The Tradeoff Between Striping Disks and Increasing the Failure Domain When Using LINSTOR
Getting the maximum performance from the available hardware in your deployments is a natural concern. With a striped disk set, such as RAID 0, I/O load is distributed evenly across all members of the set. This improves I/O throughput, but there is a catch.
In many LINSTOR® deployments, a workload manager, for example, Kubernetes or Apache CloudStack, creates many volumes that are small relative to the physical storage devices. When you forgo disk striping and instead build one LVM volume group from each physical storage device in a cluster, LINSTOR distributes the resulting LVM logical volumes (LVs) evenly across the physical storage devices.
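To illustrate that placement behavior, here is a minimal Python sketch that models a simplified "place on the least-used device" policy. The device count, volume count, and volume size are made-up values, and the policy is a simplification of LINSTOR's actual autoplacement logic, which weighs more factors.

```python
# Sketch: spreading many small volumes across per-device LVM volume groups.
# The "least-used device" policy is a simplification of LINSTOR autoplacement.

N_DEVICES = 6           # hypothetical: one volume group per physical device
VOLUME_SIZE_GIB = 20    # hypothetical: small relative to the devices
N_VOLUMES = 60

used_gib = [0] * N_DEVICES  # GiB allocated on each device

for _ in range(N_VOLUMES):
    target = used_gib.index(min(used_gib))  # place on the least-used device
    used_gib[target] += VOLUME_SIZE_GIB

print(used_gib)  # [200, 200, 200, 200, 200, 200] -- an even spread
```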
If a physical storage device fails and you replace it, DRBD® resynchronizes the data to all LVs that were on that device: the failure domain is a single physical storage device. With striped disk sets, however, if one physical device in a set fails and you replace it, DRBD must resynchronize the data of every LV in the striped set.
When deciding how many physical storage devices to put in a striped disk set, remember that larger sets also create larger failure domains. A larger striped set delivers better performance, but its recovery time, that is, its DRBD resynchronization time, is longer.
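To put rough numbers on the tradeoff, the following Python sketch estimates the DRBD resynchronization time after a single device failure for several stripe widths. The device capacity, fill level, and resynchronization rate are illustrative assumptions, not measured values.

```python
# Sketch: how stripe width changes the amount of data DRBD must
# resynchronize after one device failure. All figures are assumptions.

DEVICE_CAPACITY_GIB = 4000  # hypothetical device size
USED_FRACTION = 0.5         # hypothetical fill level per device
RESYNC_RATE_MIB_S = 400     # hypothetical effective DRBD resync rate

for stripe_width in (1, 2, 4, 8):
    # A failed device forces a resync of every LV in its stripe set,
    # so the affected data grows linearly with the stripe width.
    affected_gib = stripe_width * DEVICE_CAPACITY_GIB * USED_FRACTION
    hours = affected_gib * 1024 / RESYNC_RATE_MIB_S / 3600
    print(f"stripe width {stripe_width}: ~{affected_gib:.0f} GiB, ~{hours:.1f} h resync")
```

Under these assumptions, moving from single devices to 8-wide striped sets stretches the recovery window from roughly 1.5 hours to more than 11 hours.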
From practical experience, striped sets are essential for systems with many hard disk drives: the performance of an individual drive is poor, and striping is necessary for the drives to deliver acceptable throughput. For systems with a modest number of NVMe drives, the performance of an individual drive is often already so high that the performance gain from creating a striped set does not justify the increase in the failure domain.
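A hedged back-of-the-envelope comparison makes the point; the per-device throughput figures below are rough assumptions, not benchmark results.

```python
# Sketch: striping gains for HDDs versus NVMe drives.

HDD_MIB_S = 150     # hypothetical sequential throughput of one HDD
NVME_MIB_S = 3000   # hypothetical sequential throughput of one NVMe drive

for name, per_device in (("HDD", HDD_MIB_S), ("NVMe", NVME_MIB_S)):
    for width in (1, 4):
        print(f"{name} x{width}: ~{per_device * width} MiB/s aggregate, "
              f"failure domain of {width} device(s)")
```

With these assumed figures, a 4-wide HDD set still delivers less throughput than a single NVMe drive, while quadrupling the failure domain.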
Written by PR, 2025-11-10.
Reviewed by MAT, 2025-11-10.