Reading Pacemaker Logs
Pacemaker is a suite of software and not just a single component. Each component within the Pacemaker suite records to the cluster logs. This article has brief descriptions of each component and how each relates to Pacemaker logging.
Pacemaker by default logs messages of notice severity and higher to the system log, and messages of info severity and higher to the detail log, which by default is
/var/log/pacemaker/pacemaker.log
.source: ClusterLabs documentation
Corosync is responsible for communication between the nodes and will log messages when connections fail between hosts.
Jan 16 21:19:22 iscsi-0 corosync[6655]: [QUORUM] Sync members[2]: 1 3
Jan 16 21:19:22 iscsi-0 corosync[6655]: [QUORUM] Sync left[1]: 2
Jan 16 21:19:22 iscsi-0 corosync[6655]: [TOTEM ] A new membership (1.1e) was formed. Members left: 2
Jan 16 21:19:22 iscsi-0 corosync[6655]: [TOTEM ] Failed to receive the leave message. failed: 2
Jan 16 21:19:23 iscsi-0 corosync[6655]: [QUORUM] Members[2]: 1 3
Corosync will also log messages when nodes join or leave the cluster.
Jan 21 06:20:11 iscsi-0 corosync[709]: [QUORUM] Sync members[3]: 1 2 3
Jan 21 06:20:11 iscsi-0 corosync[709]: [QUORUM] Sync joined[1]: 3
Jan 21 06:20:11 iscsi-0 corosync[709]: [TOTEM ] A new membership (1.5f) was formed. Members joined: 3
Pacemaker configuration is stored in the Cluster Information Base (CIB) and managed by the pacemaker-based
daemon.
The CIB is effectively the XML file that you edit when you use the crm configure edit
command.
However, the CIB is more than just a configuration.
It also stores the present state of the cluster.
The CIB is XML in its raw format, but the crmsh and its crm
commands help make this more human readable.
You can also use pcs
commands to manage the CIB configuration for the cluster.
When the pacemaker-based
daemon logs messages, it is usually done to show the changes to the CIB in diff
format.
The following log excerpt is an example of the cluster updating a timeout value in the CIB and saving it to the configuration.
Jan 30 23:51:18.236 iscsi-0 pacemaker-based [753] (cib_process_request) info: Forwarding cib_replace operation for section configuration to all (origin=local/cibadmin/2)
Jan 30 23:51:18.247 iscsi-0 pacemaker-based [753] (cib_perform_op) info: Diff: --- 0.24.78 2
Jan 30 23:51:18.247 iscsi-0 pacemaker-based [753] (cib_perform_op) info: Diff: +++ 0.25.0 82421fd6ad12d6ea8abb9468b7ef671e
Jan 30 23:51:18.247 iscsi-0 pacemaker-based [753] (cib_perform_op) info: + /cib: @epoch=25, @num_updates=0
Jan 30 23:51:18.247 iscsi-0 pacemaker-based [753] (cib_perform_op) info: ++ /cib/configuration/resources/master[@id='ms_drbd_r0']/primitive[@id='p_drbd_r0']/instance_attributes[@id='p_drbd_r0-instan
ce_attributes']: <nvpair id="p_drbd_r0-instance_attributes-wfc_timeout" name="wfc_timeout" value="240"/>
Jan 30 23:51:18.250 iscsi-0 pacemaker-based [753] (cib_replace_notify) info: Replaced: 0.24.78 -> 0.25.0 from iscsi-0
These types of log messages are not the most useful when it comes to troubleshooting a cluster. However, you can still use them to determine exactly when a cluster was put into standby or maintenance mode.
The pacemaker-attrd
daemon maintains a database of Pacemaker attributes, and handles requests to modify them across the cluster.
This can be useful because the daemon will log messages when nodes join or leave the server.
The following log messages is an example of a node leaving the cluster.
Jan 30 20:02:33.983 iscsi-0 pacemaker-attrd [930] (node_left) info: Group attrd event 4: iscsi-1 (node 2 pid 890) left for unknown reason
And here is an example of two nodes joining the cluster:
Jan 30 20:02:38.892 iscsi-0 pacemaker-attrd [930] (pcmk_cpg_membership) info: Group attrd event 3: iscsi-0 (node 1 pid 930) is member
Jan 30 20:02:38.892 iscsi-0 pacemaker-attrd [930] (pcmk_cpg_membership) info: Group attrd event 3: iscsi-1 (node 2 pid 890) is member
These logs can be useful to determine when the cluster recognized that a node or resource has gone down.
The pacemaker-schedulerd
daemon is the updated version of the Policy Engine (PEngine or PE) daemon.
This component parses the CIB and attempts to adjust the cluster to the configured state.
The pacemaker-schedulerd
daemon is event driven and only runs when there is a cluster event, or every 15 minutes (by default).
Log messages that the pacemaker-schedulerd
daemon generates will show the current cluster state and the configured state.
In the following log message example, the current cluster state is that all resources are promoted on the node iscsi-0
, and the configured state is simply to leave things running on iscsi-0
, shown as “Leave”, under the LogActions
column.
Jan 30 16:57:17.671 iscsi-0 pacemaker-schedulerd[931] (unpack_config) notice: On loss of quorum: Ignore
Jan 30 16:57:17.672 iscsi-0 pacemaker-schedulerd[931] (determine_online_status) info: Node iscsi-0 is online
Jan 30 16:57:17.672 iscsi-0 pacemaker-schedulerd[931] (determine_online_status) info: Node iscsi-1 is online
Jan 30 16:57:17.672 iscsi-0 pacemaker-schedulerd[931] (determine_online_status) info: Node iscsi-2 is online
Jan 30 16:57:17.678 iscsi-0 pacemaker-schedulerd[931] (pcmk__set_instance_roles) info: Promoting p_drbd_r0:0 (Promoted iscsi-0)
Jan 30 16:57:17.678 iscsi-0 pacemaker-schedulerd[931] (pcmk__set_instance_roles) info: ms_drbd_r0: Promoted 1 instances of a possible 1
Jan 30 16:57:17.679 iscsi-0 pacemaker-schedulerd[931] (rsc_action_default) info: Leave p_iscsi_portblock_on0 (Started iscsi-0)
Jan 30 16:57:17.679 iscsi-0 pacemaker-schedulerd[931] (rsc_action_default) info: Leave p_vip_ip0 (Started iscsi-0)
Jan 30 16:57:17.679 iscsi-0 pacemaker-schedulerd[931] (rsc_action_default) info: Leave p_iscsi_target_0 (Started iscsi-0)
Jan 30 16:57:17.679 iscsi-0 pacemaker-schedulerd[931] (rsc_action_default) info: Leave p_iscsi_lun_0 (Started iscsi-0)
Jan 30 16:57:17.679 iscsi-0 pacemaker-schedulerd[931] (rsc_action_default) info: Leave p_iscsi_portblock_off0 (Started iscsi-0)
Jan 30 16:57:17.679 iscsi-0 pacemaker-schedulerd[931] (rsc_action_default) info: Leave p_drbd_r0:0 (Promoted iscsi-0)
Jan 30 16:57:17.679 iscsi-0 pacemaker-schedulerd[931] (rsc_action_default) info: Leave p_drbd_r0:1 (Unpromoted iscsi-1)
Jan 30 16:57:17.679 iscsi-0 pacemaker-schedulerd[931] (rsc_action_default) info: Leave p_drbd_r0:2 (Unpromoted iscsi-2)
Jan 30 16:57:17.680 iscsi-0 pacemaker-schedulerd[931] (pcmk__log_transition_summary) notice: Calculated transition 7, saving inputs in /var/lib/pacemaker/pengine/pe-input-24.bz2
Jan 30 16:57:17.680 iscsi-0 pacemaker-schedulerd[931] (handle_pecalc_op) info: Input has not changed since last time, not saving to disk
Pacemaker-controld is the cluster resource management daemon. Pacemaker-controld is responsible for coordinating actions across the cluster. The more interesting pacemaker-controld logs will almost always exist on the DC (designated controller node). Pacemaker-controld is also responsible for the DC elections.
Jun 20 07:22:56.773 nfs-1 pacemaker-execd [1382] (log_execute) info: executing - rsc:p_drbd_r0 action:notify call_id:45
Jun 20 07:22:56.866 nfs-1 pacemaker-execd [1382] (log_finished) info: p_drbd_r0 notify (call 45, PID 223600) exited with status 0 (execution time 92ms)
When troubleshooting a cluster, controld
log messages are often useful to show when a particular cluster node has left or joined the cluster.
Jan 21 06:18:22 iscsi-1 pacemaker-controld[776]: notice: Node iscsi-2 state is now lost
[...]
Jan 21 06:20:12 iscsi-1 pacemaker-controld[776]: notice: Node iscsi-2 state is now member
Jan 21 06:20:12 iscsi-1 pacemaker-controld[776]: notice: State transition S_NOT_DC -> S_PENDING
The pacemaker-fenced
daemon is the fencing tool within the Pacemaker suite.
Whenever fencing is initiated, the pacemaker-fenced
daemon is called and requests the affected node to be fenced.
You can use this knowledge to check log messages for pacemaker-fenced
and node_left
events to determine when a node was fenced, or left the cluster.
Jan 30 20:02:33.905 iscsi-0 pacemaker-fenced [928] (node_left) info: Group stonith-ng event 3: iscsi-2 (node 3 pid 848) left for unknown reason
Jan 30 20:02:33.905 iscsi-0 pacemaker-fenced [928] (crm_update_peer_proc) info: node_left: Node iscsi-2[3] - corosync-cpg is now offline
Jan 30 20:02:33.905 iscsi-0 pacemaker-fenced [928] (update_peer_state_iter) notice: Node iscsi-2 state is now lost | nodeid=3 previous=member source=crm_update_peer_proc
Jan 30 20:02:33.905 iscsi-0 pacemaker-fenced [928] (crm_reap_dead_member) info: Removing node with name iscsi-2 and id 3 from membership cache
Jan 30 20:02:33.905 iscsi-0 pacemaker-fenced [928] (reap_crm_member) notice: Purged 1 peer with id=3 and/or uname=iscsi-2 from the membership cache
Jan 30 20:02:33.906 iscsi-0 pacemaker-fenced [928] (pcmk_cpg_membership) info: Group stonith-ng event 3: iscsi-0 (node 1 pid 928) is member
Jan 30 20:02:33.906 iscsi-0 pacemaker-fenced [928] (pcmk_cpg_membership) info: Group stonith-ng event 3: iscsi-1 (node 2 pid 888) is member
The pacemaker-execd
daemon is responsible for the actual starting, stopping, and monitoring of Pacemaker managed resources that are local to a node.
The log messages that this daemon generates are often the most useful messages when troubleshooting issues with cluster resources.
The pacemaker-execd
daemon log messages will often include standard output of the commands called from the resource agent.
This information can be invaluable for troubleshooting.
The following log messages show an example of a DRBD® resource that has been started on a new node (iscsi-0
).
The pacemaker-execd
daemon starts the crm
resource p_drbd_r0
and then sends out a notification to the cluster that it has done so.
Jan 24 19:17:16.722 iscsi-0 pacemaker-execd [749] (log_execute) info: executing - rsc:p_drbd_r0 action:start call_id:27
Jan 24 19:17:17.588 iscsi-0 pacemaker-execd [749] (log_finished) info: p_drbd_r0 start (call 27, PID 2406) exited with status 0 (execution time 865ms)
Jan 24 19:17:17.614 iscsi-0 pacemaker-execd [749] (log_execute) info: executing - rsc:p_drbd_r0 action:notify call_id:28
Jan 24 19:17:17.701 iscsi-0 pacemaker-execd [749] (log_finished) info: p_drbd_r0 notify (call 28, PID 2681) exited with status 0 (execution time 86ms)
Jan 24 21:04:08.472 iscsi-0 pacemaker-execd [749] (cancel_recurring_action) info: Cancelling ocf operation p_drbd_r0_monitor_31000
Reviewed on 2021/01/27 - DJV
Edited on 2025/06/24 - AB