This article assumes you're running LINSTOR (or Piraeus) in Kubernetes and are using etcd for your LINSTOR database.
Taking a backup of the etcd key-value store for LINSTOR® running in Kubernetes can be accomplished with the following commands:
# take a snapshot of the etcd keyspace inside the pod
kubectl exec -n linstor linstor-op-etcd-0 -- etcdctl snapshot save /tmp/save.db
# copy the snapshot out of the pod
kubectl cp -n linstor linstor-op-etcd-0:/tmp/save.db save.db
The resulting save.db file is your backup.
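Optionally, you can sanity-check the snapshot before relying on it. A minimal sketch, assuming the same pod and namespace as above:
# print the snapshot's hash, revision, total keys, and size
kubectl exec -n linstor linstor-op-etcd-0 -- etcdctl snapshot status /tmp/save.db -w table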
Restoring the etcd keyspace in Kubernetes is a bit more involved. The steps below assume you're using etcd as deployed by the LINSTOR or Piraeus operator for Kubernetes.
First, touch a file in each etcd pod instructing it NOT to start up normally, in order to prepare for disaster recovery maintenance.
# touch the nostart file
for p in $(kubectl get pods -n linstor -o=jsonpath="{.items[*].metadata.name}" -l app=etcd); do
    kubectl exec -n linstor $p -- touch /var/run/etcd/nostart
done
# restart the pods
for p in $(kubectl get pods -n linstor -o=jsonpath="{.items[*].metadata.name}" -l app=etcd); do
    kubectl delete -n linstor pod $p
done
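If you would rather not poll kubectl get pods by hand, kubectl wait can block until the restarted pods report ready, for example:
# wait for the etcd pods to become Ready again (timeout value is arbitrary)
kubectl wait --for=condition=Ready pod -n linstor -l app=etcd --timeout=120s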
Once the pods have been rescheduled and are reported as running, delete the existing etcd data directory in each pod (if you do not, you will encounter permission errors while restoring). Moving the directory aside, rather than deleting it, might be wise if you're not confident the recovery will succeed; a sketch of that alternative follows the loop below:
# delete the existing etcd data directory in each pod
for p in $(kubectl get pods -n linstor -o=jsonpath="{.items[*].metadata.name}" -l app=etcd); do
    kubectl exec -n linstor $p -- rm -rf /var/run/etcd/default.etcd
done
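If you prefer to keep the old data around, this minimal sketch moves each data directory aside instead of deleting it (the .bak suffix is arbitrary):
# alternative: move the data directory aside instead of deleting it
for p in $(kubectl get pods -n linstor -o=jsonpath="{.items[*].metadata.name}" -l app=etcd); do
    kubectl exec -n linstor $p -- mv /var/run/etcd/default.etcd /var/run/etcd/default.etcd.bak
done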
Copy the backed-up save.db to each of the etcd pods, then restore the keyspace in each pod:
# copy save.db to each etcd pod
for p in $(kubectl get pods -n linstor -o=jsonpath="{.items[*].metadata.name}" -l app=etcd); do
    kubectl cp -n linstor save.db $p:/tmp/restore.db
done
# restore the snapshot into each pod's data directory
for p in $(kubectl get pods -n linstor -o=jsonpath="{.items[*].metadata.name}" -l app=etcd); do
    kubectl exec -n linstor $p -- etcdctl snapshot restore /tmp/restore.db --data-dir /var/run/etcd/default.etcd --wal-dir /var/run/etcd/default.etcd/member/wal
done
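As a quick sanity check before starting etcd back up, you can verify that the restore populated each data directory; it should now contain snap and wal directories under member:
# verify the restored data directories
for p in $(kubectl get pods -n linstor -o=jsonpath="{.items[*].metadata.name}" -l app=etcd); do
    kubectl exec -n linstor $p -- ls /var/run/etcd/default.etcd/member
done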
If the restore was successful, delete the nostart file so etcd can start up normally, and restart (delete) the etcd pods:
# delete nostart and restart pods
for p in $(kubectl get pods -n linstor -o=jsonpath="{.items[*].metadata.name}" -l app=etcd); do
    kubectl exec -n linstor $p -- rm /var/run/etcd/nostart
done
for p in $(kubectl get pods -n linstor -o=jsonpath="{.items[*].metadata.name}" -l app=etcd); do
    kubectl delete -n linstor pod $p
done
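Once the etcd pods are back up, you can check cluster health from inside the pods. This sketch assumes the operator-deployed etcd serves clients on the default local endpoint without TLS; add the appropriate certificate flags if yours does not:
# check etcd health on each member
for p in $(kubectl get pods -n linstor -o=jsonpath="{.items[*].metadata.name}" -l app=etcd); do
    kubectl exec -n linstor $p -- etcdctl endpoint health
done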
Restart the LINSTOR controller pod, <helm-release-name>-cs-controller-<deployment-id>, so that it attaches to the restored etcd keyspace. LINSTOR's control plane should then be recovered to the point in time when the backup was taken.
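As a rough sketch, assuming the Helm release is named linstor-op (adjust the grep pattern to your release name), the controller pod can be restarted by deleting it and letting its Deployment recreate it:
# restart the LINSTOR controller pod
kubectl delete -n linstor $(kubectl get pods -n linstor -o name | grep cs-controller)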
MDK – 08/30/21