This article assumes you're running LINSTOR (or Piraeus) in Kubernetes and are using etcd for your LINSTOR database.
Taking a backup of the etcd key-value store for LINSTOR® running in Kubernetes can be accomplished with the following commands:
# take a snapshot of the etcd keyspace inside the pod
kubectl exec -n linstor linstor-op-etcd-0 -- etcdctl snapshot save /tmp/save.db
# copy the snapshot out of the pod
kubectl cp -n linstor linstor-op-etcd-0:/tmp/save.db save.db
The resulting save.db file is your backup.
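Optionally, you can sanity-check the snapshot before relying on it. A minimal sketch, assuming the same pod and namespace as above:
# print the snapshot's hash, revision, total keys, and size
kubectl exec -n linstor linstor-op-etcd-0 -- etcdctl snapshot status /tmp/save.db -w table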
Restoring the etcd keyspace in Kubernetes is a bit more involved. The steps below assume you're using etcd as deployed by the LINSTOR or Piraeus operator for Kubernetes.
First, touch a file in each etcd pod instructing it NOT to start up normally, in order to prepare for disaster recovery maintenance.
# touch the nostart file
for p in $(kubectl get pods -n linstor -o=jsonpath="{.items[*].metadata.name}" -l app=etcd); do
    kubectl exec -n linstor $p -- touch /var/run/etcd/nostart
done
# restart the pods
for p in $(kubectl get pods -n linstor -o=jsonpath="{.items[*].metadata.name}" -l app=etcd); do
    kubectl delete -n linstor pod $p
done
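If you would rather not poll kubectl get pods by hand, kubectl wait can block until the restarted pods report ready, for example:
# wait for the etcd pods to become Ready again (timeout value is arbitrary)
kubectl wait --for=condition=Ready pod -n linstor -l app=etcd --timeout=120s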
Once the pods have been rescheduled and are reported as running, delete the existing etcd data directory in each pod (if you do not, you will encounter permission errors while restoring). Moving the directory aside, rather than deleting it, might be wise if you're not confident the recovery will succeed; a sketch of that alternative follows the loop below:
# delete the existing etcd data directory in each pod
for p in $(kubectl get pods -n linstor -o=jsonpath="{.items[*].metadata.name}" -l app=etcd); do
    kubectl exec -n linstor $p -- rm -rf /var/run/etcd/default.etcd
done
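If you prefer to keep the old data around, this minimal sketch moves each data directory aside instead of deleting it (the .bak suffix is arbitrary):
# alternative: move the data directory aside instead of deleting it
for p in $(kubectl get pods -n linstor -o=jsonpath="{.items[*].metadata.name}" -l app=etcd); do
    kubectl exec -n linstor $p -- mv /var/run/etcd/default.etcd /var/run/etcd/default.etcd.bak
done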
Copy the backed-up save.db to each of the etcd pods, then restore the keyspace in each pod:
# copy save.db to each etcd pod
for p in $(kubectl get pods -n linstor -o=jsonpath="{.items[*].metadata.name}" -l app=etcd); do
    kubectl cp -n linstor save.db $p:/tmp/restore.db
done
# restore the snapshot into each pod's data directory
for p in $(kubectl get pods -n linstor -o=jsonpath="{.items[*].metadata.name}" -l app=etcd); do
    kubectl exec -n linstor $p -- etcdctl snapshot restore /tmp/restore.db --data-dir /var/run/etcd/default.etcd --wal-dir /var/run/etcd/default.etcd/member/wal
done
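As a quick sanity check before starting etcd back up, you can verify that the restore populated each data directory; it should now contain snap and wal directories under member:
# verify the restored data directories
for p in $(kubectl get pods -n linstor -o=jsonpath="{.items[*].metadata.name}" -l app=etcd); do
    kubectl exec -n linstor $p -- ls /var/run/etcd/default.etcd/member
done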
If the restore was successful, delete the nostart file so etcd can start up normally, and restart (delete) the etcd pods:
# delete nostart and restart pods
for p in $(kubectl get pods -n linstor -o=jsonpath="{.items[*].metadata.name}" -l app=etcd); do
    kubectl exec -n linstor $p -- rm /var/run/etcd/nostart
done
for p in $(kubectl get pods -n linstor -o=jsonpath="{.items[*].metadata.name}" -l app=etcd); do
    kubectl delete -n linstor pod $p
done
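Once the etcd pods are back up, you can check cluster health from inside the pods. This sketch assumes the operator-deployed etcd serves clients on the default local endpoint without TLS; add the appropriate certificate flags if yours does not:
# check etcd health on each member
for p in $(kubectl get pods -n linstor -o=jsonpath="{.items[*].metadata.name}" -l app=etcd); do
    kubectl exec -n linstor $p -- etcdctl endpoint health
done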
Restart the LINSTOR controller pod, <helm-release-name>-cs-controller-<deployment-id>, so that it attaches to the restored etcd keyspace. LINSTOR's control plane should then be recovered to the point in time when the backup was taken.
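As a rough sketch, assuming the Helm release is named linstor-op (adjust the grep pattern to your release name), the controller pod can be restarted by deleting it and letting its Deployment recreate it:
# restart the LINSTOR controller pod
kubectl delete -n linstor $(kubectl get pods -n linstor -o name | grep cs-controller)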
MDK – 08/30/21