Using Kubernetes node labels and LINSTOR auxiliary properties, you can better control the placement of your replicas within your cluster. This is useful when you need to avoid placing two replicas within a single failure domain (such as a rack or data center).
Assume you have a six-node Kubernetes cluster with LINSTOR configured using the LINSTOR Operator for persistent storage, and that you have a LINSTOR storage pool named "lvm-thin" configured across all nodes.
# kubectl get nodes
NAME     STATUS   ROLES           AGE     VERSION
kube-0   Ready    control-plane   6h57m   v1.26.3
kube-1   Ready    <none>          6h57m   v1.26.3
kube-2   Ready    <none>          6h57m   v1.26.3
kube-3   Ready    <none>          6h57m   v1.26.3
kube-4   Ready    <none>          6h57m   v1.26.3
kube-5   Ready    <none>          6h57m   v1.26.3
LINSTOR ==> node list
╭───────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node ┊ NodeType ┊ Addresses ┊ State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════╡
┊ kube-0 ┊ SATELLITE ┊ 192.168.222.40:3366 (PLAIN) ┊ Online ┊
┊ kube-1 ┊ SATELLITE ┊ 192.168.222.41:3366 (PLAIN) ┊ Online ┊
┊ kube-2 ┊ SATELLITE ┊ 192.168.222.42:3366 (PLAIN) ┊ Online ┊
┊ kube-3 ┊ SATELLITE ┊ 192.168.222.43:3366 (PLAIN) ┊ Online ┊
┊ kube-4 ┊ SATELLITE ┊ 192.168.222.44:3366 (PLAIN) ┊ Online ┊
┊ kube-5 ┊ SATELLITE ┊ 192.168.222.45:3366 (PLAIN) ┊ Online ┊
┊ linstor-op-cs-controller-7c7d59d98d-d82lr ┊ CONTROLLER ┊ 172.16.186.2:3366 (PLAIN) ┊ Online ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────╯
LINSTOR ==> storage-pool list
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool ┊ Node ┊ Driver ┊ PoolName ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊ SharedName ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
8<------------------------------------------------------------snip---------------------------------------------------------------8<
┊ lvm-thin ┊ kube-0 ┊ LVM_THIN ┊ drbdpool/thinpool ┊ 8.00 GiB ┊ 8.00 GiB ┊ True ┊ Ok ┊ ┊
┊ lvm-thin ┊ kube-1 ┊ LVM_THIN ┊ drbdpool/thinpool ┊ 8.00 GiB ┊ 8.00 GiB ┊ True ┊ Ok ┊ ┊
┊ lvm-thin ┊ kube-2 ┊ LVM_THIN ┊ drbdpool/thinpool ┊ 8.00 GiB ┊ 8.00 GiB ┊ True ┊ Ok ┊ ┊
┊ lvm-thin ┊ kube-3 ┊ LVM_THIN ┊ drbdpool/thinpool ┊ 8.00 GiB ┊ 8.00 GiB ┊ True ┊ Ok ┊ ┊
┊ lvm-thin ┊ kube-4 ┊ LVM_THIN ┊ drbdpool/thinpool ┊ 8.00 GiB ┊ 8.00 GiB ┊ True ┊ Ok ┊ ┊
┊ lvm-thin ┊ kube-5 ┊ LVM_THIN ┊ drbdpool/thinpool ┊ 8.00 GiB ┊ 8.00 GiB ┊ True ┊ Ok ┊ ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Also assume you have your six nodes evenly distributed across three separate racks within your datacenter, or across three separate availability zones (AZ) within a cloud region. In our examples, we'll assume kube-0 and kube-1 are in one rack or AZ, kube-2 and kube-3 are in another, and kube-4 and kube-5 are in yet another.
LINSTOR, by default, is not aware of this distribution and therefore may place both replicas of a two-replica LINSTOR volume within the same rack or AZ. This would leave your data inaccessible during a rack or AZ outage. Alternatively, you may want to keep replicas within a single rack or AZ in order to isolate LINSTOR's replication traffic, or to keep replication latency to an absolute minimum.
In either situation, we'll first need to add Kubernetes labels to each node. The LINSTOR Operator will automatically import Kubernetes node labels into LINSTOR and apply them as auxiliary properties on the LINSTOR node objects. Using the assumptions above, we'll add the following node labels to our Kubernetes nodes, using the key "zone" with values "a", "b", and "c" to differentiate our racks or AZs.
# kubectl label nodes kube-{0,1} zone=a
node/kube-0 labeled
node/kube-1 labeled
# kubectl label nodes kube-{2,3} zone=b
node/kube-2 labeled
node/kube-3 labeled
# kubectl label nodes kube-{4,5} zone=c
node/kube-4 labeled
node/kube-5 labeled
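To confirm the labels on the Kubernetes side, you can list the nodes with the "zone" label shown as an extra column (kubectl's -L flag). The added ZONE column should show a, b, or c for each node, matching the labels applied above.
# kubectl get nodes -L zone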
You'll see the Kubernetes node labels as auxiliary properties on each of the respective LINSTOR node objects.
LINSTOR ==> node list-properties kube-0
╭────────────────────────────────────────────────────────────────────────────────╮
┊ Key ┊ Value ┊
╞════════════════════════════════════════════════════════════════════════════════╡
┊ Aux/beta.kubernetes.io/arch ┊ amd64 ┊
┊ Aux/beta.kubernetes.io/os ┊ linux ┊
┊ Aux/kubernetes.io/arch ┊ amd64 ┊
┊ Aux/kubernetes.io/hostname ┊ kube-0 ┊
┊ Aux/kubernetes.io/os ┊ linux ┊
┊ Aux/linbit.com/hostname ┊ kube-0 ┊
┊ Aux/linbit.com/sp-DfltDisklessStorPool ┊ true ┊
┊ Aux/linbit.com/sp-lvm-thick ┊ true ┊
┊ Aux/linbit.com/sp-lvm-thin ┊ true ┊
┊ Aux/node-role.kubernetes.io/control-plane ┊ ┊
┊ Aux/node.kubernetes.io/exclude-from-external-load-balancers ┊ ┊
┊ Aux/registered-by ┊ linstor-operator ┊
┊ Aux/zone ┊ a ┊
┊ CurStltConnName ┊ default ┊
┊ NodeUname ┊ kube-0 ┊
╰────────────────────────────────────────────────────────────────────────────────╯
Placing Replicas in Different Zones
StorageClasses using the LINSTOR CSI provisioner can then be configured to avoid placing replicas within a single failure domain by setting the LINSTOR storageClass parameter "replicasOnDifferent" to the "zone" key.
cat << EOF > linstor-sc-on-diff.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: "linstor-csi-lvm-thin-r2-on-diff"
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"
  storagePool: "lvm-thin"
  replicasOnDifferent: "zone"
reclaimPolicy: Delete
allowVolumeExpansion: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: "linstor-csi-lvm-thin-r3-on-diff"
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "3"
  storagePool: "lvm-thin"
  replicasOnDifferent: "zone"
reclaimPolicy: Delete
allowVolumeExpansion: true
EOF
kubectl apply -f linstor-sc-on-diff.yaml
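Before moving on, you can verify that both storageClasses were created:
# kubectl get storageclass
The two new classes, linstor-csi-lvm-thin-r2-on-diff and linstor-csi-lvm-thin-r3-on-diff, should be listed with the linstor.csi.linbit.com provisioner.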
Creating persistent volume claims (PVCs) using the storageClasses created above will result in replicas being distributed across nodes where the key "zone" has different values.
cat << EOF > pvcs-on-diff.yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: demo-vol-claim-diff-zone-0
spec:
  storageClassName: linstor-csi-lvm-thin-r3-on-diff
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1G
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: demo-vol-claim-diff-zone-1
spec:
  storageClassName: linstor-csi-lvm-thin-r3-on-diff
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1G
EOF
kubectl apply -f pvcs-on-diff.yaml
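Since neither storageClass sets a volumeBindingMode, Kubernetes defaults to Immediate binding, so LINSTOR provisions the volumes as soon as the PVCs are created. You can confirm that both claims reach the Bound state before checking LINSTOR:
# kubectl get pvc
Each claim should show a STATUS of Bound, with a pvc-* volume name matching the LINSTOR resource names shown below.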
Within LINSTOR, you will see that each replica of a LINSTOR resource is placed in a different "zone".
LINSTOR ==> resource list
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns ┊ State ┊ CreatedOn ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-c38af6c1-f02a-46db-b8ac-74b4eef20ca6 ┊ kube-0 ┊ 7000 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:39:18 ┊
┊ pvc-c38af6c1-f02a-46db-b8ac-74b4eef20ca6 ┊ kube-2 ┊ 7000 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:39:17 ┊
┊ pvc-c38af6c1-f02a-46db-b8ac-74b4eef20ca6 ┊ kube-4 ┊ 7000 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:39:18 ┊
┊ pvc-e8a5d0c8-9e61-46c3-afb5-f0ca975c4249 ┊ kube-1 ┊ 7001 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:39:17 ┊
┊ pvc-e8a5d0c8-9e61-46c3-afb5-f0ca975c4249 ┊ kube-3 ┊ 7001 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:39:19 ┊
┊ pvc-e8a5d0c8-9e61-46c3-afb5-f0ca975c4249 ┊ kube-4 ┊ 7001 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:39:19 ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Placing Replicas in the Same Zone
LINSTOR storageClasses can also be configured to place replicas within the same zone by setting the LINSTOR storageClass parameter "replicasOnSame" to the respective key and value pair.
cat << EOF > linstor-sc-on-same.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: "linstor-csi-lvm-thin-r2-on-same-a"
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"
  storagePool: "lvm-thin"
  replicasOnSame: "zone=a"
reclaimPolicy: Delete
allowVolumeExpansion: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: "linstor-csi-lvm-thin-r2-on-same-b"
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"
  storagePool: "lvm-thin"
  replicasOnSame: "zone=b"
reclaimPolicy: Delete
allowVolumeExpansion: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: "linstor-csi-lvm-thin-r2-on-same-c"
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"
  storagePool: "lvm-thin"
  replicasOnSame: "zone=c"
reclaimPolicy: Delete
allowVolumeExpansion: true
EOF
kubectl apply -f linstor-sc-on-same.yaml
Creating PVCs using the storageClasses created above will result in replicas being placed only on nodes where the key "zone" has the specified value.
cat << EOF > pvcs-on-same.yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: demo-vol-claim-zone-a
spec:
  storageClassName: linstor-csi-lvm-thin-r2-on-same-a
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1G
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: demo-vol-claim-zone-b
spec:
  storageClassName: linstor-csi-lvm-thin-r2-on-same-b
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1G
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: demo-vol-claim-zone-c
spec:
  storageClassName: linstor-csi-lvm-thin-r2-on-same-c
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1G
EOF
kubectl apply -f pvcs-on-same.yaml
Within LINSTOR, you will see that both replicas of each LINSTOR resource are in the same "zone".
LINSTOR ==> resource list
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns ┊ State ┊ CreatedOn ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-0ef85bf7-2a9a-4e6f-9d7b-a473518c6cee ┊ kube-2 ┊ 7001 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:17:55 ┊
┊ pvc-0ef85bf7-2a9a-4e6f-9d7b-a473518c6cee ┊ kube-3 ┊ 7001 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:17:52 ┊
┊ pvc-0fc56b3d-b249-4e6f-a225-41224cb367f9 ┊ kube-0 ┊ 7000 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:17:52 ┊
┊ pvc-0fc56b3d-b249-4e6f-a225-41224cb367f9 ┊ kube-1 ┊ 7000 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:17:53 ┊
┊ pvc-35144a76-d15f-4709-9911-b6c951e87cc1 ┊ kube-4 ┊ 7002 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:17:54 ┊
┊ pvc-35144a76-d15f-4709-9911-b6c951e87cc1 ┊ kube-5 ┊ 7002 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:17:56 ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
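Keep in mind that these placement constraints only control where the storage replicas live; they do not influence where Kubernetes schedules your pods. When replicas are pinned to a single zone, you will usually want to pin the consuming workload to the same zone as well. The following is a minimal sketch, using a hypothetical pod name and container image, that mounts demo-vol-claim-zone-a and uses a nodeSelector on the same "zone" label so the pod lands in zone "a" next to its replicas:
cat << EOF > pod-zone-a.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod-zone-a           # hypothetical name for this example
spec:
  nodeSelector:
    zone: "a"                     # schedule only onto nodes labeled zone=a
  containers:
    - name: demo
      image: busybox              # hypothetical image; anything that can write to a mount works
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - mountPath: /data
          name: demo-vol
  volumes:
    - name: demo-vol
      persistentVolumeClaim:
        claimName: demo-vol-claim-zone-a
EOF
kubectl apply -f pod-zone-a.yaml
The same pattern applies to zones "b" and "c". For the "on-diff" storageClasses shown earlier, pinning the pod is less critical, since LINSTOR can typically attach the volume disklessly over the network on a node that has no local replica.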
Written by: MDK - 3/24/23