Optimizing Write Performance With DRBD and Parity RAID

If you must use parity-based RAID under DRBD for local resiliency when using LINSTOR, it is best to create many separate striped logical volumes underneath many smaller DRBD devices, rather than layering a single large DRBD device over a single large RAID volume.

Symptoms

You are experiencing poor write performance when using a large parity RAID device underneath DRBD®. This manifests as high write latency and suboptimal throughput, particularly with highly random write workloads.

Problem

The issue stems from using DRBD over a single large parity RAID device (such as RAID 5 or RAID 6) while using DRBD's default internal metadata storage. DRBD writes its metadata in small 4k blocks, which can cause frequent read-modify-write cycles on RAID devices with parity stripes larger than 4k, leading to significant write penalties.

DRBD, or more specifically the drbdmeta userland utility that drbdadm calls to create DRBD's metadata, supports the --al-stripes and --al-stripe-size options for striping DRBD's metadata. Striping the metadata prevents a single device in a large RAID array from becoming a bottleneck for the entire array, while also limiting read-modify-write cycles. These drbdmeta options, however, are not currently configurable within LINSTOR®.
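
For context, on systems where you manage DRBD resources directly rather than through LINSTOR, striped metadata is requested when a resource's metadata is initialized, for example as sketched below. This is a sketch only; the exact option spelling and accepted units depend on the installed drbd-utils version, so verify against the drbdmeta man page before using it:

# drbdadm create-md --al-stripes 4 --al-stripe-size 32k <resource>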

Solution

When the additional reliability of local RAID is necessary and you are using LINSTOR to create and manage DRBD devices, the following approaches mitigate the performance bottlenecks described above:
  • Use LVM RAID together with LINSTOR's StorDriver/LvcreateOptions property to create individual LVM RAID volumes for use with DRBD devices. This allows you to create many smaller DRBD devices, each backed by its own logical volume, rather than using a single large parity RAID. Striping drives this way, either through RAID or simple striping without RAID, distributes the I/O load more evenly across all physical devices.
  • Keep DRBD's metadata external to any RAID array with parity. By storing DRBD metadata off parity RAID arrays, you avoid the performance degradation caused by frequent 4k writes triggering read-modify-write cycles. DRBD's external metadata configuration improves write efficiency by separating metadata writes from parity disk operations.

Example Commands

For reference, here are some relevant LINSTOR commands that you can use to configure DRBD devices backed by RAID-striped logical volumes, along with external DRBD metadata.

Create the storage pool from which LINSTOR will create the striped logical volumes, repeat for each node:

# linstor physical-storage create-device-pool \
    --pool-name storage LVM <node> /dev/sdb /dev/sdc /dev/sdd \
    --storage-pool storage
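
To verify that the pool exists on each node, you can list the storage pools (a quick check; the output will vary with your deployment):

# linstor storage-pool list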

Set the stripe configuration on each of the storage pools, again, repeat for each node:

# linstor storage-pool set-property <node> storage \
    StorDriver/LvcreateOptions "--type raid5 -i2 -I64"

💡 TIP: Omit the --type raid5 setting for striping without RAID. The -i lvcreate option configures the number of stripes, and -I configures the stripe size.
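
For illustration only, the lvcreate command that LINSTOR issues for a 7GiB volume with the options above would be roughly equivalent to the following. LINSTOR generates the volume name (res0_00000 in the later output) and runs the command itself; this sketch is shown only to clarify what -i and -I control:

# lvcreate --type raid5 -i2 -I64 -n res0_00000 -L 7G storage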

Create a separate storage pool for LINSTOR to use for DRBD's metadata, repeat for each node:

# linstor physical-storage create-device-pool \
    --pool-name meta LVM <node> /dev/sde --storage-pool meta

💡 TIP: Always use a physical device for metadata that is as fast as, or faster than, the devices backing the storage pool for data volumes.

Finally, create a resource group that tells LINSTOR what options and storage pools to use when creating storage resources from it:

# linstor resource-group create striped_rg \
    --storage-pool storage --place-count 2

# linstor resource-group set-property striped_rg \
    StorPoolNameDrbdMeta meta
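
To confirm that the property is set on the resource group, you can list its properties (if this subcommand is not available in your LINSTOR client version, check linstor resource-group --help):

# linstor resource-group list-properties striped_rg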

When you create resources using the guidelines above, you will see that DRBD's underlying logical volume is configured with the RAID and stripe settings specified in the LvcreateOptions property, and that DRBD is using the external metadata device pool:

# linstor resource-group spawn-resources striped_rg res0 7G

# lvs -a -o name,copy_percent,devices storage
  LV                    Cpy%Sync Devices
  res0_00000            100.00   res0_00000_rimage_0(0),res0_00000_rimage_1(0),res0_00000_rimage_2(0)
  [res0_00000_rimage_0]          /dev/sdb(1)
  [res0_00000_rimage_1]          /dev/sdc(1)
  [res0_00000_rimage_2]          /dev/sdd(1)
  [res0_00000_rmeta_0]           /dev/sdb(0)
  [res0_00000_rmeta_1]           /dev/sdc(0)
  [res0_00000_rmeta_2]           /dev/sdd(0)

# lsblk /dev/sdb /dev/sdc /dev/sdd /dev/sde
NAME                          MAJ:MIN  RM  SIZE RO TYPE MOUNTPOINTS
sdb                           252:16    0    4G  0 disk
├─storage-res0_00000_rmeta_0  253:1     0    4M  0 lvm
│ └─storage-res0_00000        253:7     0    7G  0 lvm
│   └─drbd1000                147:1000  0    7G  0 disk
└─storage-res0_00000_rimage_0 253:2     0  3.5G  0 lvm
  └─storage-res0_00000        253:7     0    7G  0 lvm
    └─drbd1000                147:1000  0    7G  0 disk
sdc                           252:32    0    4G  0 disk
├─storage-res0_00000_rmeta_1  253:3     0    4M  0 lvm
│ └─storage-res0_00000        253:7     0    7G  0 lvm
│   └─drbd1000                147:1000  0    7G  0 disk
└─storage-res0_00000_rimage_1 253:4     0  3.5G  0 lvm
  └─storage-res0_00000        253:7     0    7G  0 lvm
    └─drbd1000                147:1000  0    7G  0 disk
sdd                           252:48    0    4G  0 disk
├─storage-res0_00000_rmeta_2  253:5     0    4M  0 lvm
│ └─storage-res0_00000        253:7     0    7G  0 lvm
│   └─drbd1000                147:1000  0    7G  0 disk
└─storage-res0_00000_rimage_2 253:6     0  3.5G  0 lvm
  └─storage-res0_00000        253:7     0    7G  0 lvm
    └─drbd1000                147:1000  0    7G  0 disk
sde                           252:64    0    4G  0 disk
└─meta-res0.meta_00000        253:8     0    4M  0 lvm
  └─drbd1000                  147:1000  0    7G  0 disk
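
You can also confirm the external metadata placement from DRBD's perspective by dumping the resource configuration that LINSTOR generated and checking its meta-disk setting, which should reference a device from the meta storage pool rather than internal. One way to check, sketched here with the example resource name:

# drbdadm dump res0 | grep meta-disk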


Written 2024/09/25 – MDK

Reviewed 2024/09/25 – RJR