By using Kubernetes node labels and LINSTOR® auxiliary properties, you can better control the placement of your replicas within your cluster. This is useful when you need to avoid placing two replicas within a single failure domain (such as a rack or data center).
Assume that you have a six-node Kubernetes cluster with LINSTOR configured using the LINSTOR Operator for persistent storage, and you have a LINSTOR storage pool named lvm-thin configured across all nodes.
# kubectl get nodes
NAME     STATUS   ROLES           AGE     VERSION
kube-0   Ready    control-plane   6h57m   v1.26.3
kube-1   Ready    <none>          6h57m   v1.26.3
kube-2   Ready    <none>          6h57m   v1.26.3
kube-3   Ready    <none>          6h57m   v1.26.3
kube-4   Ready    <none>          6h57m   v1.26.3
kube-5   Ready    <none>          6h57m   v1.26.3

LINSTOR ==> node list
╭─────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node                                      ┊ NodeType   ┊ Addresses                   ┊ State  ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ kube-0                                    ┊ SATELLITE  ┊ 192.168.222.40:3366 (PLAIN) ┊ Online ┊
┊ kube-1                                    ┊ SATELLITE  ┊ 192.168.222.41:3366 (PLAIN) ┊ Online ┊
┊ kube-2                                    ┊ SATELLITE  ┊ 192.168.222.42:3366 (PLAIN) ┊ Online ┊
┊ kube-3                                    ┊ SATELLITE  ┊ 192.168.222.43:3366 (PLAIN) ┊ Online ┊
┊ kube-4                                    ┊ SATELLITE  ┊ 192.168.222.44:3366 (PLAIN) ┊ Online ┊
┊ kube-5                                    ┊ SATELLITE  ┊ 192.168.222.45:3366 (PLAIN) ┊ Online ┊
┊ linstor-op-cs-controller-7c7d59d98d-d82lr ┊ CONTROLLER ┊ 172.16.186.2:3366 (PLAIN)   ┊ Online ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────╯

LINSTOR ==> storage-pool list
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool ┊ Node   ┊ Driver   ┊ PoolName          ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊ SharedName ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
8<------------------------------------------------------------snip---------------------------------------------------------------8<
┊ lvm-thin    ┊ kube-0 ┊ LVM_THIN ┊ drbdpool/thinpool ┊     8.00 GiB ┊      8.00 GiB ┊ True         ┊ Ok    ┊            ┊
┊ lvm-thin    ┊ kube-1 ┊ LVM_THIN ┊ drbdpool/thinpool ┊     8.00 GiB ┊      8.00 GiB ┊ True         ┊ Ok    ┊            ┊
┊ lvm-thin    ┊ kube-2 ┊ LVM_THIN ┊ drbdpool/thinpool ┊     8.00 GiB ┊      8.00 GiB ┊ True         ┊ Ok    ┊            ┊
┊ lvm-thin    ┊ kube-3 ┊ LVM_THIN ┊ drbdpool/thinpool ┊     8.00 GiB ┊      8.00 GiB ┊ True         ┊ Ok    ┊            ┊
┊ lvm-thin    ┊ kube-4 ┊ LVM_THIN ┊ drbdpool/thinpool ┊     8.00 GiB ┊      8.00 GiB ┊ True         ┊ Ok    ┊            ┊
┊ lvm-thin    ┊ kube-5 ┊ LVM_THIN ┊ drbdpool/thinpool ┊     8.00 GiB ┊      8.00 GiB ┊ True         ┊ Ok    ┊            ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Also assume you have your six nodes evenly distributed across three separate racks within your data center, or across three separate availability zones (AZs) within a cloud region. In our examples, we'll assume kube-0 and kube-1 are in one rack or AZ, kube-2 and kube-3 are in another, and kube-4 and kube-5 are in yet another.
LINSTOR, by default, is not aware of this distribution and therefore might place both replicas of a two-replica LINSTOR volume within the same rack or AZ. This would leave your data inaccessible during a rack or AZ outage. Alternatively, you might want to keep replicas within a single rack or AZ, to keep LINSTOR's replication traffic within one failure domain or to keep replication latency to an absolute minimum.
In either situation, you will first need to add Kubernetes labels to each node. The LINSTOR Operator automatically imports Kubernetes node labels into LINSTOR and applies them as auxiliary properties on the LINSTOR node objects. Using the assumptions above, you will add the following node labels to your Kubernetes nodes, using the key zone with the values a, b, and c to differentiate your racks or AZs.
# kubectl label nodes kube-{0,1} zone=a
node/kube-0 labeled
node/kube-1 labeled
# kubectl label nodes kube-{2,3} zone=b
node/kube-2 labeled
node/kube-3 labeled
# kubectl label nodes kube-{4,5} zone=c
node/kube-4 labeled
node/kube-5 labeled
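If you want to double-check the labels before moving on, one quick way is to list the nodes with the zone label shown as an extra column:

kubectl get nodes -L zone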
You'll then see the Kubernetes node labels as auxiliary properties on each of the respective LINSTOR node objects.
LINSTOR ==> node list-properties kube-0
╭─────────────────────────────────────────────────────────────────────────────────╮
┊ Key                                                          ┊ Value            ┊
╞═════════════════════════════════════════════════════════════════════════════════╡
┊ Aux/beta.kubernetes.io/arch                                  ┊ amd64            ┊
┊ Aux/beta.kubernetes.io/os                                    ┊ linux            ┊
┊ Aux/kubernetes.io/arch                                       ┊ amd64            ┊
┊ Aux/kubernetes.io/hostname                                   ┊ kube-0           ┊
┊ Aux/kubernetes.io/os                                         ┊ linux            ┊
┊ Aux/linbit.com/hostname                                      ┊ kube-0           ┊
┊ Aux/linbit.com/sp-DfltDisklessStorPool                       ┊ true             ┊
┊ Aux/linbit.com/sp-lvm-thick                                  ┊ true             ┊
┊ Aux/linbit.com/sp-lvm-thin                                   ┊ true             ┊
┊ Aux/node-role.kubernetes.io/control-plane                    ┊                  ┊
┊ Aux/node.kubernetes.io/exclude-from-external-load-balancers  ┊                  ┊
┊ Aux/registered-by                                            ┊ linstor-operator ┊
┊ Aux/zone                                                     ┊ a                ┊
┊ CurStltConnName                                              ┊ default          ┊
┊ NodeUname                                                    ┊ kube-0           ┊
╰─────────────────────────────────────────────────────────────────────────────────╯
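To spot-check the zone property on every node at once, you could loop over the node list-properties command shown above. This is a convenience sketch that assumes the linstor client command is available in your shell (for example, on a host with the client installed, or from inside the LINSTOR controller pod):

for node in kube-0 kube-1 kube-2 kube-3 kube-4 kube-5; do
  echo -n "${node}: "
  linstor node list-properties "${node}" | grep 'Aux/zone'
done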
Placing Replicas in Different Zones
LINSTOR's storageClasses can then be configured to avoid placing replicas within a single failure domain by using the LINSTOR storageClass parameter replicasOnDifferent and naming the zone key.
cat << EOF > linstor-sc-on-diff.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-csi-lvm-thin-r2-on-diff
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"
  storagePool: lvm-thin
  replicasOnDifferent: zone
reclaimPolicy: Delete
allowVolumeExpansion: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-csi-lvm-thin-r3-on-diff
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "3"
  storagePool: lvm-thin
  replicasOnDifferent: zone
reclaimPolicy: Delete
allowVolumeExpansion: true
EOF

kubectl apply -f linstor-sc-on-diff.yaml
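After applying the manifest, you can verify that both storageClasses exist and are backed by the LINSTOR CSI provisioner:

kubectl get storageclass linstor-csi-lvm-thin-r2-on-diff linstor-csi-lvm-thin-r3-on-diff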
Creating persistent volume claims (PVCs) using the storageClasses created above will result in replicas being placed on nodes where the key zone has different values.
cat << EOF > pvcs-on-diff.yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: demo-vol-claim-diff-zone-0
spec:
  storageClassName: linstor-csi-lvm-thin-r3-on-diff
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1G
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: demo-vol-claim-diff-zone-1
spec:
  storageClassName: linstor-csi-lvm-thin-r3-on-diff
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1G
EOF

kubectl apply -f pvcs-on-diff.yaml
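Once the claims have been applied, a quick status check should show both PVCs bound to dynamically provisioned volumes:

kubectl get pvc demo-vol-claim-diff-zone-0 demo-vol-claim-diff-zone-1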
Within LINSTOR, you will see that the replicas of each LINSTOR resource have been placed in different zones.
LINSTOR ==> resource list
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ Node   ┊ Port ┊ Usage  ┊ Conns ┊    State ┊ CreatedOn           ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-c38af6c1-f02a-46db-b8ac-74b4eef20ca6 ┊ kube-0 ┊ 7000 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2023-03-24 22:39:18 ┊
┊ pvc-c38af6c1-f02a-46db-b8ac-74b4eef20ca6 ┊ kube-2 ┊ 7000 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2023-03-24 22:39:17 ┊
┊ pvc-c38af6c1-f02a-46db-b8ac-74b4eef20ca6 ┊ kube-4 ┊ 7000 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2023-03-24 22:39:18 ┊
┊ pvc-e8a5d0c8-9e61-46c3-afb5-f0ca975c4249 ┊ kube-1 ┊ 7001 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2023-03-24 22:39:17 ┊
┊ pvc-e8a5d0c8-9e61-46c3-afb5-f0ca975c4249 ┊ kube-3 ┊ 7001 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2023-03-24 22:39:19 ┊
┊ pvc-e8a5d0c8-9e61-46c3-afb5-f0ca975c4249 ┊ kube-4 ┊ 7001 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2023-03-24 22:39:19 ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Placing Replicas in the Same Zone
LINSTOR's storageClasses can also be configured to place replicas within the same zone by using the LINSTOR storageClass parameter replicasOnSame and naming the respective key and value pair (for example, zone=a).
cat << EOF > linstor-sc-on-same.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-csi-lvm-thin-r2-on-same-a
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"
  storagePool: lvm-thin
  replicasOnSame: zone=a
reclaimPolicy: Delete
allowVolumeExpansion: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-csi-lvm-thin-r2-on-same-b
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"
  storagePool: lvm-thin
  replicasOnSame: zone=b
reclaimPolicy: Delete
allowVolumeExpansion: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-csi-lvm-thin-r2-on-same-c
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"
  storagePool: lvm-thin
  replicasOnSame: zone=c
reclaimPolicy: Delete
allowVolumeExpansion: true
EOF

kubectl apply -f linstor-sc-on-same.yaml
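As before, you can verify that the three zone-pinned storageClasses were created:

kubectl get storageclass | grep on-same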
Creating PVCs using the storageClasses created above will result in replicas being placed on nodes where the key zone has the specified value.
cat << EOF > pvcs-on-same.yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: demo-vol-claim-zone-a
spec:
  storageClassName: linstor-csi-lvm-thin-r2-on-same-a
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1G
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: demo-vol-claim-zone-b
spec:
  storageClassName: linstor-csi-lvm-thin-r2-on-same-b
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1G
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: demo-vol-claim-zone-c
spec:
  storageClassName: linstor-csi-lvm-thin-r2-on-same-c
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1G
EOF

kubectl apply -f pvcs-on-same.yaml
Within LINSTOR, you will see that the replicas of each LINSTOR resource are in the same zone.
LINSTOR ==> resource list
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ Node   ┊ Port ┊ Usage  ┊ Conns ┊    State ┊ CreatedOn           ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-0ef85bf7-2a9a-4e6f-9d7b-a473518c6cee ┊ kube-2 ┊ 7001 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2023-03-24 22:17:55 ┊
┊ pvc-0ef85bf7-2a9a-4e6f-9d7b-a473518c6cee ┊ kube-3 ┊ 7001 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2023-03-24 22:17:52 ┊
┊ pvc-0fc56b3d-b249-4e6f-a225-41224cb367f9 ┊ kube-0 ┊ 7000 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2023-03-24 22:17:52 ┊
┊ pvc-0fc56b3d-b249-4e6f-a225-41224cb367f9 ┊ kube-1 ┊ 7000 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2023-03-24 22:17:53 ┊
┊ pvc-35144a76-d15f-4709-9911-b6c951e87cc1 ┊ kube-4 ┊ 7002 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2023-03-24 22:17:54 ┊
┊ pvc-35144a76-d15f-4709-9911-b6c951e87cc1 ┊ kube-5 ┊ 7002 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2023-03-24 22:17:56 ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
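If you were only testing the placement behavior, you can clean up the example objects by deleting the manifests you created earlier. Because the storageClasses use reclaimPolicy: Delete, removing the PVCs also removes the backing LINSTOR resources:

kubectl delete -f pvcs-on-same.yaml -f pvcs-on-diff.yaml
kubectl delete -f linstor-sc-on-same.yaml -f linstor-sc-on-diff.yaml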
Written by: MDK - 3/24/23