Tune DRBD's resync controller to optimize resync speed and avoid over-saturating the replication network, leading to a more performant and healthy DRBD device.
The dynamic sync-rate controller for DRBD was introduced way back in version 8.3.9 as a way to *slow down* DRBD resynchronization speeds. The idea here is that if you have a write-intensive application running atop the DRBD device, it may already be close to filling up your I/O bandwidth. We introduced the dynamic rate limiter to make sure that a recovery resync does not compete for bandwidth with the ongoing write replication. To ensure that the resync does not compete with application I/O, the defaults lean toward the conservative side.
If the defaults seem slow to you or your use case, you can speed things up with a little bit of tuning in the DRBD configuration.
It is nearly impossible for DRBD to know just how much activity your storage and network backend can handle. It is fairly easy for DRBD to know how much activity it generates itself, which is why we tune how much network activity we allow DRBD to generate.
The dynamic sync-rate controller is configured using the following DRBD settings:

- resync-rate
- rs-discard-granularity
- c-max-rate
- c-min-rate
- c-fill-target
- max-buffers
- sndbuf-size / rcvbuf-size
The following sections will help you tune each of the settings mentioned above.
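Before diving into each setting, it helps to know where they live. As a sketch (the resource name is made up, and the values shown are only illustrative placeholders, not the recommendations from this article), the resync controller options belong in the disk section of a DRBD resource configuration, while the buffer options belong in the net section:

```
resource r0 {
    disk {
        # resync controller settings (illustrative values)
        resync-rate    100M;
        c-max-rate     300M;
        c-fill-target  1M;
    }
    net {
        # network buffer settings (illustrative values)
        max-buffers  40k;
        sndbuf-size  10M;
        rcvbuf-size  10M;
    }
    # ... volume and connection definitions omitted ...
}
```

After editing the configuration, running `drbdadm adjust <resource>` applies the new settings to a running resource.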
Set the resync-rate to ⅓ of the c-max-rate.
With the dynamic resync-rate controller, this value is only used as a starting point. Changing it will only have a slight effect, but it will help the resync ramp up faster.
The rs-discard-granularity size is specified in bytes, with a default value of zero and a maximum value of 1048576 (1MiB).
From the drbd.conf man page:
"[This] feature only gets active if the backing block device reads back zeroes after a discard command."
"When rs-discard-granularity is set to a non zero, positive value then DRBD tries to do a resync operation in requests of this size. In case such a block contains only zero bytes on the sync source node, the sync target node will issue a discard/trim/unmap command for the area."
This setting should be of particular interest in cases where LINSTOR is managing ZFS volumes. LINSTOR sets rs-discard-granularity to 8K (8×1024 = 8192 bytes) on ZFS volumes (zvols). For a LINSTOR volume backed by an empty (for example, newly created) ZFS volume, LINSTOR's default value results in a slower resync than would be possible with a higher value. Increasing the rs-discard-granularity value, for example, to 1M (1024K) bytes, will result in a significant speed increase. During one test, resync speeds increased from 200KiByte/sec to 200MiByte/sec!
You can view the current value of the setting using the command:
linstor volume-definition list-properties <resource> <volume_number>
It is possible to override LINSTOR's value for the rs-discard-granularity setting using the following command:
linstor volume-definition set-property <resource> <volume_number> DrbdOptions/Disk/rs-discard-granularity 1048576 # or a different value
Set c-max-rate to 100% of (or slightly more than) what your hardware can handle.
For example: if you know your network is capable of 10Gb/s, but your disk throughput is only 800MiB/s, then set this value to 800M.
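Combining that example with the earlier suggestion of setting resync-rate to ⅓ of the c-max-rate, the disk section might look something like this (a sketch; the exact numbers should come from your own hardware measurements):

```
disk {
    c-max-rate   800M;  # disk throughput (800MiB/s) is the bottleneck, not the 10Gb/s network
    resync-rate  270M;  # roughly 1/3 of c-max-rate, used only as a starting point
}
```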
Increase the c-min-rate to ⅓ of the c-max-rate.
It is usually advisable to leave this value alone, because the idea behind the dynamic sync-rate controller is to "step aside" and allow application I/O to take priority. If you really want to ensure that the resync always moves along at a minimum speed, then feel free to tune this a bit. On a production system, you may want to start with a lower value and work up.
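If you do decide to set a floor, a sketch following the ⅓-of-c-max-rate suggestion (values illustrative; start lower on a production system):

```
disk {
    c-max-rate  800M;
    c-min-rate  270M;  # roughly 1/3 of c-max-rate; guarantees a minimum resync speed
}
```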
Set c-fill-target to 1M.
Just trust us on this, and simply set it to ‘1M’.
This should be enough to get the resync rate going well beyond the defaults.
Increase max-buffers to 40k.
40k is usually a good starting point, but we’ve seen good results with anywhere between 20k to 80k.
sndbuf-size / rcvbuf-size
Set sndbuf-size and rcvbuf-size to 10M.
TCP buffers are usually auto-tuned by the kernel, but setting these to a static value may help resync speeds along. Again, on a production system, start with a more conservative value, such as 4M, and increase it slowly while observing the systems.
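Putting all of the suggestions from this article together, a fully tuned resource configuration might look like the sketch below (the resource name is hypothetical, and the rate values assume the 800MiB/s disk example from earlier; substitute numbers measured on your own hardware):

```
resource r0 {
    disk {
        resync-rate    270M;  # ~1/3 of c-max-rate, as a starting point
        c-max-rate     800M;  # 100% (or slightly more) of what the hardware can handle
        c-min-rate     270M;  # optional; usually best left at the default
        c-fill-target  1M;
    }
    net {
        max-buffers  40k;   # good results seen anywhere between 20k and 80k
        sndbuf-size  10M;   # start at a more conservative 4M on production systems
        rcvbuf-size  10M;
    }
    # ... volume and connection definitions omitted ...
}
```

Remember that on LINSTOR-managed volumes, rs-discard-granularity is set via the `linstor volume-definition set-property` command shown earlier rather than in this file.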
Reviewed 2022/2/01 - MDK