This article explains "High CPU load detected" messages from Pacemaker's crmd, and how you can adjust their threshold.
The following question gets asked fairly often:
What sets the threshold for this load average?
"crmd: notice: throttle_handle_load: High CPU load detected"
This message never indicates a problem with the cluster itself. It means that the cluster has detected a high operating system load on the node.
The Pacemaker project lead gave this authoritative answer a few years ago:
Those messages indicate there is a real issue with the CPU load. When
the cluster notices high load, it reduces the number of actions it will
execute at the same time. This is generally a good idea, to avoid making
the load worse.
The messages don't hurt anything, they just let you know that there is something worth investigating.
If you've investigated the load and it's not something to be concerned about, you can change load-threshold to adjust what the cluster considers "high". The load-threshold cluster property works like this:
- It defaults to 0.8 (which means Pacemaker should try to avoid consuming more than 80% of the system's resources).
- On a single-core machine, load-threshold is multiplied by 0.6 (because with only one core you *really* don't want to consume too many resources); on a multi-core machine, load-threshold is multiplied by the number of cores (to normalize the system load per core).
- That number is then multiplied by 1.2 to get the "Noticeable CPU load detected" message (debug level), by 1.6 to get the "Moderate CPU load" message, and 2.0 to get the "High CPU load" message. These are measured against the 1-minute system load average (the same number you would get with top, uptime, etc.).
So, if you raise load-threshold above 0.8, you won't see the log messages until the load gets even higher. However, that does nothing about the actual load problem, if there is one.
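The threshold rules listed above can be worked through with a short Python sketch. It only restates the arithmetic as described in this article and is not taken from Pacemaker's source code. The function names load_message_thresholds and classify are hypothetical, chosen for illustration.

```python
def load_message_thresholds(cores, load_threshold=0.8):
    """Return the 1-minute load averages above which each message appears.

    Follows the rules as described in this article (an assumption, not
    Pacemaker's actual implementation):
      - single core: load_threshold * 0.6
      - multi-core:  load_threshold * number of cores
      - then * 1.2 (noticeable), * 1.6 (moderate), * 2.0 (high)
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if cores == 1:
        base = load_threshold * 0.6
    else:
        base = load_threshold * cores
    return {
        "noticeable": base * 1.2,
        "moderate": base * 1.6,
        "high": base * 2.0,
    }


def classify(load, cores, load_threshold=0.8):
    """Return the most severe message level the given load would trigger,
    or None if the load is below every threshold."""
    limits = load_message_thresholds(cores, load_threshold)
    for level in ("high", "moderate", "noticeable"):
        if load > limits[level]:
            return level
    return None


if __name__ == "__main__":
    # Print the thresholds for a single-core and a 4-core machine.
    for cores in (1, 4):
        print(cores, load_message_thresholds(cores))
```

Under these rules, a 4-core machine with the default load-threshold of 0.8 would log the "High CPU load" message once the 1-minute load average exceeds roughly 6.4. Raising load-threshold to 1.0 would move that point to about 8.0.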
Reviewed 2020/12/02 - DGT