How to manually move the services to the secondary node (candy):
This will also prevent the resource from automatically failing back once the primary node is online again, so keep that in mind when you plan several reboots of the primary node and do not want to be bothered.
crm resource migrate nfs-group candy
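Once you are done and want the resource to be able to move back to the primary node on its own, the constraint created by the migrate command has to be removed. A minimal sketch, using the same nfs-group resource (depending on the crmsh version, the subcommand is called unmigrate, unmove or clear):
crm resource unmigrate nfs-group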
How to stop/start a resource group:
crm resource stop nfs-group
crm resource start nfs-group
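To confirm the group actually changed state, a quick check with crm_mon (using the same nfs-group as above) might look like this:
crm_mon -1 | grep nfs-group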
One way to modify the cluster crm configuration:
crm configure edit
At this point, you are dropped into a vi-style editor and can edit anything you want. Do not forget to save and exit afterwards. Once done, continue with:
commit
exit
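For reference, the same change done from inside the interactive crm shell might look like the sketch below; the verify step is optional but catches syntax problems before anything is committed:
crm configure
crm(live)configure# edit
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# exit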
How to add another DRBD volume in the existing cluster configuration:
1. Add new resource:
primitive drbd_squid ocf:heartbeat:Filesystem \
    params fstype="reiserfs" directory="/nfs/squid" device="/dev/drbd1" \
    meta target-role="Started"
2. Teach the cluster to mount the filesystems in the right order (drbd_fs mounts the /nfs partition, so it must be started first):
order drbd_fs-before-drbd_squid mandatory: drbd_fs:promote drbd_squid:start
colocation drbd_squid-on-drbd_fs inf: drbd_squid drbd_fs:Master
3. Edit drbd_main and add nfsquid to drbd_resource (nfsquid must already exist as a DRBD-level resource on both nodes; a sketch of that definition follows this list):
primitive drbd_main ocf:heartbeat:drbd \
    params drbd_resource="nfs nfsquid" \
    op monitor interval="59s" role="Master" timeout="30s" \
    op monitor interval="60s" role="Slave" timeout="30s"
4. Add the new resource to the existing nfs group:
group nfs-group drbd_fs drbd_squid nfs_server nfs_common nfs_ip
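The steps above only cover the cluster side; the nfsquid resource also has to be defined at the DRBD level on both nodes before the cluster can use it. A minimal sketch of such a definition, with placeholder disk, IP and port values that have to be adapted to the actual setup:
resource nfsquid {
    on candy {
        device    /dev/drbd1;
        disk      /dev/sdb2;             # placeholder backing device
        address   192.168.1.10:7790;     # placeholder replication IP/port
        meta-disk internal;
    }
    on eave {
        device    /dev/drbd1;
        disk      /dev/sdb2;             # placeholder backing device
        address   192.168.1.11:7790;     # placeholder replication IP/port
        meta-disk internal;
    }
}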
How to restart a service within the cluster:
Once you add something to the cluster configuration, it is advisable to stop/start it only through the cluster.
So, if you want to restart (let's say) the nginx server, you have to do it with the heartbeat commands:
# crm resource restart <service>
If you only want to reload, you can do it the classic way, but make sure the configuration is OK first; otherwise the cluster will detect the failure and fail the service over to the other node (where it will fail too, since the same "wrong" configuration is there).
Example:
root@eave:~# crm resource restart nginx_srv
INFO: ordering nginx_srv to stop
INFO: ordering nginx_srv to start
root@eave:~# crm_mon -1 | grep nginx_srv
nginx_srv       (lsb:nginx):    Started eave
root@eave:~# /etc/init.d/nginx configtest
Testing nginx configuration: nginx.
root@eave:~# /etc/init.d/nginx reload
Reloading nginx configuration: nginx.
How to solve the error heartbeat: [5686]: ERROR: should_drop_message: attempted replay attack [eave]? [gen = 1417088478, curgen = 1417088491]:
In my situation, I had migrated the VMs from VirtualBox to ESXi by converting the disks associated with the VMs. When I started the 2nd node back up, I got this error continuously on the primary node:
20:43:43 root@candy:~# grep should_drop_message /var/log/heartbeat.log | tail
Jan 23 20:39:08 candy heartbeat: [5686]: ERROR: should_drop_message: attempted replay attack [eave]? [gen = 1417088478, curgen = 1417088491]
Jan 23 20:39:08 candy heartbeat: [5686]: ERROR: should_drop_message: attempted replay attack [eave]? [gen = 1417088478, curgen = 1417088491]
Jan 23 20:39:09 candy heartbeat: [5686]: ERROR: should_drop_message: attempted replay attack [eave]? [gen = 1417088478, curgen = 1417088491]
Jan 23 20:39:09 candy heartbeat: [5686]: ERROR: should_drop_message: attempted replay attack [eave]? [gen = 1417088478, curgen = 1417088491]
Jan 23 20:39:09 candy heartbeat: [5686]: ERROR: should_drop_message: attempted replay attack [eave]? [gen = 1417088478, curgen = 1417088491]
Jan 23 20:39:09 candy heartbeat: [5686]: ERROR: should_drop_message: attempted replay attack [eave]? [gen = 1417088478, curgen = 1417088491]
Jan 23 20:39:10 candy heartbeat: [5686]: ERROR: should_drop_message: attempted replay attack [eave]? [gen = 1417088478, curgen = 1417088491]
Jan 23 20:39:10 candy heartbeat: [5686]: ERROR: should_drop_message: attempted replay attack [eave]? [gen = 1417088478, curgen = 1417088491]
Jan 23 20:39:11 candy heartbeat: [5686]: ERROR: should_drop_message: attempted replay attack [eave]? [gen = 1417088478, curgen = 1417088491]
Jan 23 20:39:11 candy heartbeat: [5686]: ERROR: should_drop_message: attempted replay attack [eave]? [gen = 1417088478, curgen = 1417088491]
You can solve this by deleting the file /var/lib/heartbeat/hb_generation on the 2nd node:
1. Stop the cluster services on the 2nd node. If the stopping script hangs, just Ctrl+C and kill heartbeat's main process (a consolidated command sketch follows this list).
2. Delete the mentioned file:
20:38:24 root@eave:heartbeat# cd /var/lib/heartbeat/
20:38:32 root@eave:heartbeat# ls -la hb_generation
-rw-r--r-- 1 root root 16 Jan 23 20:27 hb_generation
3. Start the cluster services back up on the 2nd node.
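Put together, the fix on the 2nd node looks roughly like this, assuming heartbeat is controlled through its usual init script:
/etc/init.d/heartbeat stop          # if this hangs: Ctrl+C, then kill heartbeat's main process
rm /var/lib/heartbeat/hb_generation
/etc/init.d/heartbeat start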
You should be OK now.