Debian Heartbeat: dealing with STONITH errors when configuring a CRM cluster


STONITH is short for "Shoot The Other Node In The Head". It's a technique for fencing in computer clusters.

Fencing is the isolation of a failed node so that it does not disrupt the rest of a computer cluster. As its name suggests, STONITH fences failed nodes by resetting them or powering them down.

Uncoordinated access by several nodes can have catastrophic results, for example if both nodes write to a shared storage resource at the same time. STONITH provides effective, if rather drastic, protection against these problems.

If you try to commit the crm configuration or configure the cluster directly while STONITH is enabled but not configured, you will get errors regarding STONITH.
You can also check for these errors before committing with the following command:
crm_verify -L -V

Work log:

root@candy:~# crm_verify -L -V
crm_verify[2963]: 2014/11/27_17:55:42 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_verify[2963]: 2014/11/27_17:55:42 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_verify[2963]: 2014/11/27_17:55:42 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid

The error explains itself: to guarantee the safety of your data, STONITH is enabled by default in Pacemaker. However, Pacemaker also knows when no STONITH configuration has been supplied and reports this as a problem (since the cluster would not be able to make progress if a situation requiring node fencing arose).
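If you want to script this check before a commit, a minimal sketch could look like the one below. The helper name has_stonith_error is my own invention and not part of Pacemaker; it only searches the crm_verify output for the message shown in the work log above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker): reads crm_verify output
# on stdin and returns 0 if the "no STONITH resources" error is present.
has_stonith_error() {
    grep -q 'no STONITH resources have been defined'
}

# Intended use on a cluster node. The 2>&1 is there in case crm_verify
# writes its messages to stderr rather than stdout:
#   if crm_verify -L -V 2>&1 | has_stonith_error; then
#       echo "STONITH is enabled but no STONITH resource is defined" >&2
#   fi
```

Matching on the message text is fragile across Pacemaker versions, so treat this as a convenience for the version shown here rather than a general test.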

You have two options here. I recommend configuring STONITH rather than disabling it, but it is up to you. Both options are presented below.

To disable STONITH, we set the stonith-enabled cluster option to false:

crm configure property stonith-enabled=false
crm_verify -L -V

work log:

root@candy:~# crm configure property stonith-enabled=false
root@candy:~# crm_verify -L -V
root@candy:~#    <- no output means the configuration is valid
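To confirm which value the cluster actually holds, you can query the option with crm_attribute, which ships with Pacemaker. The sketch below is hedged: the function name stonith_state is hypothetical, and it assumes that your crm_attribute accepts the long options --type, --name, --query and --quiet.

```shell
#!/bin/sh
# Hypothetical helper: prints the current value of stonith-enabled,
# or "unset" when the option is not stored in the cluster configuration
# (Pacemaker then falls back to its default, which is enabled).
stonith_state() {
    value=$(crm_attribute --type crm_config --name stonith-enabled \
                          --query --quiet 2>/dev/null) || value=""
    echo "${value:-unset}"
}

# Intended use on a cluster node:
#   stonith_state
```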

To configure STONITH (recommended), we do the following:

crm configure property stonith-enabled=true
crm configure property stonith-action=poweroff
crm configure rsc_defaults resource-stickiness=100
crm configure property no-quorum-policy=ignore
crm configure primitive stonith_rg stonith:external/ssh params hostlist="eave candy"
crm configure clone fencing_rg stonith_rg
crm_mon -1

work log:

root@candy:~# crm configure property stonith-enabled=true
root@candy:~# crm configure property stonith-action=poweroff
root@candy:~# crm configure rsc_defaults resource-stickiness=100
root@candy:~# crm configure property no-quorum-policy=ignore
root@candy:~# crm configure primitive stonith_rg stonith:external/ssh params hostlist="eave candy"
ERROR: stonith_rg: parameter candy does not exist
Do you still want to commit? yes
root@candy:~# crm configure clone fencing_rg stonith_rg
root@candy:~# crm_mon -1
============
Last updated: Thu Nov 27 18:20:54 2014
Last change: Thu Nov 27 18:20:51 2014 via cibadmin on candy
Stack: Heartbeat
Current DC: eave (ddfe44df-4fc3-4cbf-afc1-d4846465a920) - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, unknown expected votes
2 Resources configured.
============

Node eave (ddfe44df-4fc3-4cbf-afc1-d4846465a920): pending
Node candy (ea078cb1-708c-4ae4-b7b7-5d4322422b3e): pending
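
Running the same commands on eave shows that the resources already exist, because the cluster configuration is shared between the nodes:

root@eave:~# crm configure property stonith-enabled=true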
root@eave:~# crm configure property stonith-action=poweroff
root@eave:~# crm configure rsc_defaults resource-stickiness=100
root@eave:~# crm configure property no-quorum-policy=ignore
root@eave:~# crm configure primitive stonith_rg stonith:external/ssh params hostlist="eave candy"
ERROR: stonith_rg: id is already in use
root@eave:~# crm configure clone fencing_rg stonith_rg
ERROR: fencing_rg: id is already in use
root@eave:~# crm_mon -1
============
Last updated: Thu Nov 27 18:23:24 2014
Last change: Thu Nov 27 18:20:51 2014 via cibadmin on candy
Stack: Heartbeat
Current DC: eave (ddfe44df-4fc3-4cbf-afc1-d4846465a920) - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, unknown expected votes
2 Resources configured.
============

Online: [ eave candy ]

 Clone Set: fencing_rg [stonith_rg]
     Started: [ candy eave ]
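The "parameter candy does not exist" error in the log above looks like a quoting problem. This is my reading and the log does not confirm it: the shell removes the double quotes before crm sees them, and crm then appears to split the value on the space. The sketch below only demonstrates the shell side of this; show_args is a hypothetical helper that prints each argument it receives.

```shell
#!/bin/sh
# Hypothetical helper: prints every argument on its own line, in
# brackets, so the shell's quote removal becomes visible.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# The double quotes are consumed by the shell; the program sees none:
show_args params hostlist="eave candy"

# Wrapping the double quotes in single quotes passes them through:
show_args params hostlist='"eave candy"'
```

If the reading is right, writing hostlist='"eave candy"' on the crm command line, or entering the primitive inside the interactive crm shell, should avoid the error. I have not tested that on this cluster.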

Then restart the cluster on both nodes and check the status:

service heartbeat restart
crm_mon --one-shot

work log:

root@candy:~# service heartbeat restart
Stopping High-Availability services: Done.

Starting High-Availability services: Done.

root@candy:~# crm_mon --one-shot

============
Last updated: Thu Nov 27 16:28:29 2014
Last change: Thu Nov 27 12:40:45 2014 via crmd on eave
Stack: Heartbeat
Current DC: eave (ddfe44df-4fc3-4cbf-afc1-d4846465a920) - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, unknown expected votes
0 Resources configured.
============

Online: [ eave candy ]

root@eave:~# pico /etc/heartbeat/haresources
root@eave:~# service heartbeat restart
Stopping High-Availability services: Done.

Starting High-Availability services: Done.

root@eave:~#
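To check the result from a script instead of reading the crm_mon output by eye, a small filter can pull the node names out of the "Online:" line. This is a sketch that assumes the output format shown in the logs above (Pacemaker 1.1.7); online_nodes is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads `crm_mon -1` output on stdin and prints
# the node names found on the "Online: [ ... ]" line. It prints nothing
# if no such line exists (for example while nodes are still pending).
online_nodes() {
    sed -n 's/^Online: \[ \(.*\) ]$/\1/p'
}

# Intended use on a cluster node:
#   crm_mon -1 | online_nodes
```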

DONE!

