Corosync main process was not scheduled for x ms

This article explains what the Corosync log message "Corosync main process was not scheduled for x ms" means and where to look to address it.

The “Corosync main process was not scheduled for x ms” messages are logged when Corosync is not scheduled for CPU time for over 2 seconds (2000 ms). Corosync runs as a real-time process. Real-time processes receive the highest priority for CPU time on a Linux system, so they should be scheduled promptly whenever they are ready to run. If Corosync is not getting scheduled in a timely manner, then either the system is severely overloaded or, in the case of a virtualized cluster node, the VM is not getting the CPU time it requires from the hypervisor.
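One way to confirm that Corosync is actually running under a real-time policy on a given node is to ask the kernel for the scheduling policy of its process. The following is a minimal sketch, assuming a Linux node with Python 3 and a process whose name is `corosync`; the helper names (`policy_name`, `is_realtime`, `find_pids`) are hypothetical and exist only for this illustration.

```python
import os
from typing import List

# Map the scheduling policy constants available on this platform to names.
POLICY_NAMES = {
    getattr(os, name): name
    for name in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR", "SCHED_BATCH", "SCHED_IDLE")
    if hasattr(os, name)
}

# The two real-time policies on Linux.
REALTIME_POLICIES = {"SCHED_FIFO", "SCHED_RR"}


def policy_name(pid: int) -> str:
    """Return the scheduling policy name for a PID (0 means the calling process)."""
    policy = os.sched_getscheduler(pid)
    # The kernel may OR the reset-on-fork flag into the value; mask it out.
    policy &= ~getattr(os, "SCHED_RESET_ON_FORK", 0)
    return POLICY_NAMES.get(policy, f"unknown({policy})")


def is_realtime(pid: int) -> bool:
    """True if the process runs under SCHED_FIFO or SCHED_RR."""
    return policy_name(pid) in REALTIME_POLICIES


def find_pids(process_name: str) -> List[int]:
    """Scan /proc for processes whose command name matches process_name."""
    pids = []
    try:
        entries = os.listdir("/proc")
    except OSError:
        return pids  # no /proc available (not a Linux system)
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == process_name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return pids
```

Calling `[(pid, policy_name(pid)) for pid in find_pids("corosync")]` on a cluster node should list the Corosync process with a real-time policy. If it reports `SCHED_OTHER` instead, Corosync was not able to obtain real-time scheduling, and that is worth investigating before looking at system load.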

No amount of Corosync tuning will mitigate these messages. If this is a virtualized cluster, investigate the hypervisor load to ensure the hypervisor has enough resources to host the cluster. Also check that the hypervisor is not “freezing” the cluster VMs for any reason, such as backups or automated live migrations, both of which can do so.
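From inside a virtualized node, one rough indicator that the hypervisor is withholding CPU time is the "steal" counter on the aggregate `cpu` line of `/proc/stat`. The sketch below samples that counter twice and reports steal as a percentage of elapsed CPU time. It assumes a Linux guest whose hypervisor reports steal time; the function names are hypothetical, and a VM that is fully paused (for example during a snapshot) may not show up as steal at all, so hypervisor-side checks are still needed.

```python
import time
from typing import List


def parse_cpu_line(stat_text: str) -> List[int]:
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before: List[int], after: List[int]) -> float:
    """Percentage of CPU time reported as steal between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Sum only the first eight counters (user through steal); guest time is
    # already accounted for in user and nice.
    total = sum(deltas[:8])
    steal = deltas[7] if len(deltas) > 7 else 0
    return 100.0 * steal / total if total > 0 else 0.0


def sample_steal(interval: float = 1.0, path: str = "/proc/stat") -> float:
    """Read /proc/stat twice, interval seconds apart, and return steal percent."""
    with open(path) as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_line(f.read())
    return steal_percent(before, after)
```

Running `sample_steal(5.0)` on a cluster VM while the messages are appearing gives a quick first impression. A persistently high value suggests the hypervisor is oversubscribed, while a value near zero does not rule out VM freezes.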

 

Reviewed 2020/12/01 - DGT