Speeding up Unmount When It's Slow or Taking Longer than Expected in a Cluster Using DRBD

There are a few aspects of DRBD® nodes and your network that can slow down the `umount` process:

  • Round-trip time (RTT) between the nodes that data is being copied and replicated to
  • Large amounts of unwritten data, for example, dirty cache, that is, data that has not yet been written to disk (you can check this as shown in the sketch after this list)
  • Data in RAM that requires a write-out to persistent storage
  • Network throughput (when working with remote nodes)

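To see how much unwritten data is currently pending, you can inspect `/proc/meminfo` and check the state of your DRBD resources; a minimal sketch of such a check follows (the `drbdadm status` command assumes DRBD 9):

```bash
# Show how much data is waiting to be written out (values are in kB).
# "Dirty" is data modified in memory but not yet written; "Writeback"
# is data currently being written out to storage.
grep -E '^(Dirty|Writeback):' /proc/meminfo

# Show the connection, replication, and disk state of all DRBD resources.
drbdadm status
```
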
When a DRBD stack spans a WAN and is configured to use the synchronous replication protocol (Protocol C), limited network throughput can increase the time it takes the `umount` process to complete. This can be true even with a stack that has premium hardware on all nodes in the cluster. If the network connection between the nodes is, for example, 300 Mbps, then that connection's throughput will limit how quickly the nodes can synchronize.
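
You can confirm which replication protocol a resource is configured to use; a minimal sketch, where `r0` is a hypothetical resource name:

```bash
# "r0" is an example resource name; replace it with your own.
# Look for a "protocol" line (for example, "protocol C;") in the output;
# if nothing is printed, the protocol is not set explicitly in the configuration.
drbdadm dump r0 | grep -i protocol
```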

Consider an example of a DRBD stack with the following network throughput speed and dirty cache size:

  • 100 Mbps throughput on the connections between nodes
  • 10 GiB of dirty cache

With a maximum possible throughput of 100 Mbps, it would take a minimum of roughly 800 seconds to process the `umount` (10 GiB × 8 bits per byte ≈ 80,000 megabits; 80,000 ÷ 100 Mbps = 800 seconds).
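
The same estimate can be scripted; a minimal sketch that reads the current amount of dirty data from `/proc/meminfo`, where the link speed in Mbps is an assumed value you would replace with your own:

```bash
# Estimate the minimum time needed to flush the current dirty data over
# a replication link. LINK_MBPS is an assumed throughput; set it to the
# actual speed of the connection between your nodes.
LINK_MBPS=100

awk -v mbps="$LINK_MBPS" '
    /^Dirty:/ { dirty_kb = $2 }
    END {
        # Convert kB to megabits: kB x 1024 bytes x 8 bits / 1,000,000.
        megabits = dirty_kb * 1024 * 8 / 1000000
        printf "Dirty data: %.0f Mb, estimated flush time at %d Mbps: %.0f s\n",
               megabits, mbps, megabits / mbps
    }' /proc/meminfo
```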

However, with adjustments to the cluster, these bottlenecks can be reduced. For example, increasing the network throughput to 1 Gbps, with a low enough round-trip time (RTT), could improve the `umount` time in the example above from roughly 800 seconds to roughly 80 seconds.
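
Before and after making such changes, it can be useful to measure the actual RTT and throughput between nodes; a minimal sketch using `ping` and `iperf3`, where `node-b` is a hypothetical peer hostname:

```bash
# Measure the round-trip time to the peer node ("node-b" is an example
# hostname; replace it with your peer's hostname or IP address).
ping -c 10 node-b

# Measure achievable TCP throughput between the nodes.
# Run "iperf3 -s" on node-b first, then from this node:
iperf3 -c node-b -t 10
```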