Resolving Issues Related to Upgrading a LINSTOR Deployment Within Kubernetes

This article describes how to resolve a CrashLoop state that the LINSTOR controller pod can enter after upgrading a LINSTOR deployment within Kubernetes. This can happen when your deployment contains a large number of LINSTOR resources and uses the Kubernetes (k8s) back-end database.

Symptoms

After upgrading LINSTOR, you might find the linstor-controller pod in a CrashLoop state because the run-migration container failed. If you examine the container logs, they will show that the container tried to create a backup of the database but could not store the backup in the k8s back end, with an error similar to "Request entity too large".
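To confirm this symptom, you can inspect the run-migration container's logs directly. A minimal sketch follows; it assumes the controller pod's name contains "linstor-controller", so verify the pod name in your deployment first:

```shell
# Find the linstor-controller pod in the current namespace
# (assumes its name contains "linstor-controller"; adjust if needed)
POD=$(kubectl get pods -o name | grep linstor-controller | head -n1)

# Show the logs of the failed run-migration container in that pod
kubectl logs "$POD" -c run-migration
```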

Resolution

To resolve this situation, take the following steps:

  1. Manually create an external backup of the database by entering the following commands, which will create a crds.yaml file plus one YAML file for each LINSTOR custom resource type:
    kubectl api-resources --api-group internal.linstor.linbit.com -oname | xargs kubectl get crds -oyaml > crds.yaml
    kubectl api-resources --api-group internal.linstor.linbit.com -oname | xargs -I {} sh -c "kubectl get {} -oyaml > {}.yaml"
  2. Create a secret to indicate to the run-migration container that it is safe to continue:
    kubectl create secret generic linstor-backup-for-<linstor-controller-pod-name>
  3. Wait until the run-migration container restarts.
  3. Wait until the run-migration container restarts.

Verifying successful resolution

To verify that these steps have resolved the CrashLoop state, watch the rollout status of the linstor-controller deployment and confirm that its pod is up and running by entering the following command:

kubectl rollout status deploy/linstor-controller -w

Written by MW, 2024/10/23.

Edited by MAT, 2024/10/30.