If you must use parity-based RAID under DRBD for local resiliency when using LINSTOR, it is best to create many separate striped logical volumes underneath many smaller DRBD devices, rather than a single large DRBD device over a single large RAID volume.
Symptoms
You are experiencing poor write performance when using a large parity RAID device underneath DRBD®. This manifests as high write latency and suboptimal write throughput, particularly with highly random write workloads.
Problem
The issue stems from using DRBD over a single large parity RAID device (such as RAID 5 or RAID 6) while using DRBD's default internal metadata storage. DRBD writes its metadata in small 4K blocks, which can cause frequent read-modify-write cycles on RAID devices with parity stripes larger than 4K, leading to significant write penalties.
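Because the read-modify-write penalty applies when writes are smaller than the array's stripe (chunk) size, it can be useful to check that value on your existing array. The device and volume group names below are only placeholders:
# mdadm --detail /dev/md0 | grep -i chunk
# lvs -o +stripesize <volume_group>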
DRBD, or more specifically the drbdmeta userland utility that drbdadm calls to create DRBD's metadata, can stripe DRBD's metadata by using the --al-stripes and --al-stripe-size options. This prevents a single device in a large RAID array from becoming a bottleneck for the entire array, while also limiting read-modify-write cycles. These drbdmeta options, however, are not currently configurable within LINSTOR®.
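For reference only, and outside of LINSTOR's management, a sketch of how these options might be passed when initializing metadata for a manually configured DRBD resource, here hypothetically named r0; the stripe count and size values are only illustrative, and you should verify the exact option spelling against the drbdmeta and drbdadm man pages for your drbd-utils version:
# drbdadm create-md r0 --al-stripes 4 --al-stripe-size 64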
Solution
For improved performance when the additional reliability of local RAID is necessary and you are using LINSTOR to create and manage DRBD devices, the following approaches mitigate the performance bottlenecks described above:
- Use LVM's RAID capabilities and LINSTOR's StorDriver/LvcreateOptions settings to create individual LVM RAID volumes for use with DRBD devices. This allows you to create many smaller DRBD devices, each backed by its own logical volume, rather than using a single large parity RAID. Striping drives this way, either through RAID or simple striping without RAID, distributes the I/O load more evenly across all physical devices.
- Keep DRBD's metadata external to the RAID array with parity. By storing DRBD metadata off any RAID arrays with parity, you avoid the performance degradation caused by frequent 4K writes triggering read-modify-write cycles. DRBD's external metadata configuration improves write efficiency by separating metadata writes from parity disk operations.
Example Commands
For reference, here are some relevant LINSTOR commands that you can use to configure RAID striped logical volumes backing DRBD devices, along with external DRBD metadata.
Create the storage pool that LINSTOR will create the striped logical volumes from, repeat for each node:
# linstor physical-storage create-device-pool \
--pool-name storage LVM <node> /dev/sdb /dev/sdc /dev/sdd \
--storage-pool storage
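Before continuing, you can verify that the new storage pools show up as expected on each node, for example by entering:
# linstor storage-pool list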
Set the stripe configuration on each of the storage pools, again, repeat for each node:
# linstor storage-pool set-property <node> storage \
StorDriver/LvcreateOptions "--type raid5 -i2 -I64"
💡 TIP: Omit the --type raid5 setting for striping without RAID. The -i lvcreate option configures the number of stripes, and -I configures the stripe size.
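For context, LINSTOR passes these options to lvcreate when it creates a resource's backing volume. A roughly equivalent manual lvcreate invocation, using an illustrative volume name and size, would look like this:
# lvcreate --type raid5 -i2 -I64 -L 7G -n example_vol storage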
Create a separate storage pool for LINSTOR to use for DRBD's metadata, repeat for each node:
# linstor physical-storage create-device-pool \
--pool-name meta LVM <node> /dev/sde --storage-pool meta
💡 TIP: Always use a physical device for metadata that is as fast or faster than the devices used when creating the storage pool for data volumes.
Finally, create a resource group that tells LINSTOR what options and storage pools to use when creating storage resources from it:
# linstor resource-group create striped_rg \
--storage-pool storage --place-count 2
# linstor resource-group set-property striped_rg \
StorPoolNameDrbdMeta meta
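You can confirm the properties set on the resource group before spawning resources from it, for example by entering:
# linstor resource-group list-properties striped_rg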
When you create resources using the guidelines above, you will see that DRBD's underlying logical volume is configured with the RAID and stripe settings specified in the StorDriver/LvcreateOptions property, and that DRBD is using the external metadata device pool:
# linstor resource-group spawn-resources striped_rg res0 7G
# lvs -a -o name,copy_percent,devices storage
LV Cpy%Sync Devices
res0_00000 100.00 res0_00000_rimage_0(0),res0_00000_rimage_1(0),res0_00000_rimage_2(0)
[res0_00000_rimage_0] /dev/sdb(1)
[res0_00000_rimage_1] /dev/sdc(1)
[res0_00000_rimage_2] /dev/sdd(1)
[res0_00000_rmeta_0] /dev/sdb(0)
[res0_00000_rmeta_1] /dev/sdc(0)
[res0_00000_rmeta_2] /dev/sdd(0)
# lsblk /dev/sdb /dev/sdc /dev/sdd /dev/sde
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sdb 252:16 0 4G 0 disk
├─storage-res0_00000_rmeta_0 253:1 0 4M 0 lvm
│ └─storage-res0_00000 253:7 0 7G 0 lvm
│ └─drbd1000 147:1000 0 7G 0 disk
└─storage-res0_00000_rimage_0 253:2 0 3.5G 0 lvm
└─storage-res0_00000 253:7 0 7G 0 lvm
└─drbd1000 147:1000 0 7G 0 disk
sdc 252:32 0 4G 0 disk
├─storage-res0_00000_rmeta_1 253:3 0 4M 0 lvm
│ └─storage-res0_00000 253:7 0 7G 0 lvm
│ └─drbd1000 147:1000 0 7G 0 disk
└─storage-res0_00000_rimage_1 253:4 0 3.5G 0 lvm
└─storage-res0_00000 253:7 0 7G 0 lvm
└─drbd1000 147:1000 0 7G 0 disk
sdd 252:48 0 4G 0 disk
├─storage-res0_00000_rmeta_2 253:5 0 4M 0 lvm
│ └─storage-res0_00000 253:7 0 7G 0 lvm
│ └─drbd1000 147:1000 0 7G 0 disk
└─storage-res0_00000_rimage_2 253:6 0 3.5G 0 lvm
└─storage-res0_00000 253:7 0 7G 0 lvm
└─drbd1000 147:1000 0 7G 0 disk
sde 252:64 0 4G 0 disk
└─meta-res0.meta_00000 253:8 0 4M 0 lvm
└─drbd1000 147:1000 0 7G 0 disk
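You can also verify that DRBD is using external metadata by examining the DRBD resource configuration file that LINSTOR generates on a satellite node. A sketch, assuming LINSTOR's default satellite configuration path; the meta-disk entry should reference a logical volume from the meta storage pool rather than internal metadata:
# grep meta-disk /var/lib/linstor.d/res0.res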
Written 2024/09/25 – MDK
Reviewed 2024/09/25 – RJR