Restoring a Kubernetes Cluster

So you are running containerized applications in a Kubernetes (K8s) cluster and suddenly the K8s control plane is acting weird. This might be due to the control plane’s database being inaccessible and/or corrupted. Although losing the control plane does not affect the currently running workloads, you will not be able to monitor or manage them until the database is back up and running.

This article describes the process of restoring a K8s cluster to a healthy state by restoring the etcd database from a backup. Etcd is the standard datastore for the K8s control plane, and the most commonly used tool to install/configure K8s clusters (the “kubeadm” tool) automatically integrates etcd into the control plane nodes, as illustrated by the “stacked etcd” topology diagram in the official Kubernetes documentation.

Using “kubeadm” to create a K8s cluster will by default install the control plane components as static pods on each control plane node. This includes the etcd database, which is used by the K8s API server to maintain information about all K8s objects in the cluster.
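On a kubeadm-based control plane node you can see these static pod manifests directly on disk. A quick sketch, assuming the kubeadm default paths:

```shell
# List the static pod manifests that kubeadm installs by default
ls /etc/kubernetes/manifests
# etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml

# The kubelet watches this folder; its location comes from the
# staticPodPath setting in the kubelet configuration
grep staticPodPath /var/lib/kubelet/config.yaml
```

The kubelet runs anything it finds in this folder as a pod, without involving the API server, which is exactly why moving the manifests out (as we do later) stops the control plane components.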

By the way … you can also configure your K8s cluster to use an external etcd database

Backup of the etcd database

In order to do any type of restore you obviously need to have a backup available first. The backup is taken with the “etcdctl snapshot save” command, which means you need access to the etcdctl command (the etcd client); the same client is also needed for the restore. You can either use the client that is pre-installed in the etcd pod, or install it on a separate system and point that client at the etcd server (for example use “sudo apt install etcd-client” to install the client, depending on your client OS).
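If you prefer not to install a separate client, you can run etcdctl from inside the etcd static pod itself. A sketch, assuming the kubeadm defaults (the pod name includes the node name, here the “myk8s-control-plane” node used later in this article):

```shell
# Run etcdctl from inside the etcd static pod via kubectl
kubectl -n kube-system exec etcd-myk8s-control-plane -- sh -c \
  'ETCDCTL_API=3 etcdctl \
     --endpoints https://127.0.0.1:2379 \
     --cacert /etc/kubernetes/pki/etcd/ca.crt \
     --cert   /etc/kubernetes/pki/etcd/server.crt \
     --key    /etc/kubernetes/pki/etcd/server.key \
     endpoint health'
```

Note that this only helps for the backup: during a restore the etcd pod is down, so you need a client on the node (or elsewhere) anyway.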

The snapshot can be taken at any time (no database “quiescing” of any kind is needed) by running the command:

ETCDCTL_API=3 etcdctl --endpoints <etcd server IP address>:2379 --cacert <ca cert> --cert <server cert> --key <server key> snapshot save etcdbackup.db

NOTE : You have to specify ETCDCTL_API=3 to signal to etcdctl that you want API version 3, which is required for the snapshot command. Alternatively you can run a one-time “export ETCDCTL_API=3” command and then use the plain “etcdctl” command.

root@myk8s-control-plane:~/etcd-backup# ETCDCTL_API=3 etcdctl snapshot save etcd-backup.db --endpoints <etcd server IP address>:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key
2024-01-10 14:26:05.034497 I | clientv3: opened snapshot stream; downloading
2024-01-10 14:26:05.208164 I | clientv3: completed snapshot read; closing
Snapshot saved at etcd-backup.db
root@myk8s-control-plane:~/etcd-backup# ls -al
total 2896
drwxr-xr-x 2 root root    4096 Jan 10 14:26 .
drwx------ 1 root root    4096 Jan 10 14:23 ..
-rw-r--r-- 1 root root 2953248 Jan 10 14:26 etcd-backup.db
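For regular backups you would typically wrap the snapshot command in a small script and verify each snapshot afterwards. A sketch, where the endpoint and certificate paths are the kubeadm defaults and the backup directory is an arbitrary choice:

```shell
#!/bin/sh
# Take a timestamped etcd snapshot and verify it.
BACKUP_DIR=/root/etcd-backup
STAMP=$(date +%Y%m%d-%H%M%S)
mkdir -p "$BACKUP_DIR"

ETCDCTL_API=3 etcdctl snapshot save "$BACKUP_DIR/etcd-$STAMP.db" \
  --endpoints https://127.0.0.1:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert   /etc/kubernetes/pki/etcd/server.crt \
  --key    /etc/kubernetes/pki/etcd/server.key

# Sanity-check the snapshot file (reports hash, revision, keys, size)
ETCDCTL_API=3 etcdctl snapshot status "$BACKUP_DIR/etcd-$STAMP.db" -w table
```

Run this from cron (and copy the files off the node) and you always have a recent snapshot to restore from.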

Test/Dev Cluster

Let’s first look at restoring a simple K8s cluster with a single control plane and a single worker node. This would be a test/dev cluster, as a single control plane offers no High Availability.

In order to restore the snapshot we created previously we first need to stop the etcd database as well as its main consumer, the kube-apiserver. In a K8s cluster created with kubeadm we can achieve this by temporarily moving the .yaml files from the static pod folder (by default this is /etc/kubernetes/manifests) to a different location. We can check that these components are no longer running by querying the container runtime (most commonly containerd these days), for example with “crictl ps”.

root@myk8s-control-plane:~/etcd-backup# mv /etc/kubernetes/manifests/*.yaml .
root@myk8s-control-plane:~/etcd-backup# ls -al
total 2912
drwxr-xr-x 2 root root    4096 Jan 10 14:39 .
drwx------ 1 root root    4096 Jan 10 14:23 ..
-rw-r--r-- 1 root root 2953248 Jan 10 14:26 etcd-backup.db
-rw------- 1 root root    2407 Jan 10 11:42 etcd.yaml
-rw------- 1 root root    3896 Jan 10 11:42 kube-apiserver.yaml
-rw------- 1 root root    3429 Jan 10 11:42 kube-controller-manager.yaml
-rw------- 1 root root    1463 Jan 10 11:42 kube-scheduler.yaml
root@myk8s-control-plane:~/etcd-backup# crictl ps
CONTAINER           IMAGE               CREATED             STATE               NAME                     ATTEMPT             POD ID              POD
3ef0d65ea9ab4       9d5429f6d7697       48 minutes ago      Running             kube-proxy               0                   3b60c02b94d78       kube-proxy-rdrdb
2b626c164cc50       697605b359357       About an hour ago   Running             speaker                  0                   1a3183dc5af73       speaker-vl678
4dc4989025874       ce18e076e9d4b       3 hours ago         Running             local-path-provisioner   0                   81fbc4589ee9d       local-path-provisioner-6bc4bddd6b-9b5g4
f7a42f910c10d       ead0a4a53df89       3 hours ago         Running             coredns                  0                   c4d800f65c5c6       coredns-5d78c9869d-m8b57
d3e268a712ab0       ead0a4a53df89       3 hours ago         Running             coredns                  0                   58bdc61343208       coredns-5d78c9869d-vcj8j

Now we can move the existing database to a safe location (just to be sure) and replace it with a restore of the etcd snapshot :

root@myk8s-control-plane:~/etcd-backup# mv /var/lib/etcd .
root@myk8s-control-plane:~/etcd-backup# ETCDCTL_API=3 etcdctl snapshot restore etcd-backup.db --data-dir /var/lib/etcd
2024-01-10 15:22:22.596939 I | mvcc: restore compact to 13014
2024-01-10 15:22:22.603321 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32

And finally we can restart the control plane components by moving the static pod definitions back :

root@myk8s-control-plane:~/etcd-backup# mv *.yaml /etc/kubernetes/manifests/
root@myk8s-control-plane:~/etcd-backup# crictl ps
CONTAINER           IMAGE               CREATED             STATE               NAME                      ATTEMPT             POD ID              POD
391ac13e3700d       86b6af7dd652c       3 seconds ago       Running             etcd                      0                   b20f741c9bf0b       etcd-myk8s-control-plane
0e2115845491f       9f8f3a9f3e8a9       3 seconds ago       Running             kube-controller-manager   0                   8ac9daf2a8bbf       kube-controller-manager-myk8s-control-plane
33557b92ad114       c604ff157f0cf       3 seconds ago       Running             kube-apiserver            0                   431fd31b57b47       kube-apiserver-myk8s-control-plane
e270a0049ee0c       205a4d549b94d       4 seconds ago       Running             kube-scheduler            0                   4cdbbdf6f9338       kube-scheduler-myk8s-control-plane
3ef0d65ea9ab4       9d5429f6d7697       2 hours ago         Running             kube-proxy                0                   3b60c02b94d78       kube-proxy-rdrdb
<rest op output hidden ...>

Your K8s cluster is now back in the state it was in when the snapshot was taken. You might need to restart the kubelet on each node; it is a good idea to do that anyway to reset the connections to the API server.
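On a kubeadm cluster the kubelet runs as a systemd unit, so the restart is a one-liner per node:

```shell
# On each node: restart the kubelet so it re-establishes its
# connections and watches against the restored API server
sudo systemctl restart kubelet

# Verify it came back up cleanly
sudo systemctl status kubelet --no-pager
```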

Note that this procedure only saves and restores the etcd database. This is fine as long as you are running stateless applications. However, when you also have stateful applications storing data on the worker nodes or (preferably) on external systems using components like persistent volumes, it makes more sense to use actual backup solutions like “Velero” or “Kasten”, to name a few. More on that in a future post.

HA Production Cluster

When you run a production K8s cluster you probably have at least 3 control plane nodes, each running a copy of the etcd database. When running in this mode etcd is distributed (the “d” in etcd) across all the control plane nodes. In order to restore the entire etcd cluster we need to restore the same etcd snapshot/backup on each node. The restore procedure described before is basically repeated on each control plane node (with all nodes down during the restore process). However, there is one major difference, which has to do with the actual etcdctl restore command. This command will be slightly different on each node: it has to be configured with the proper local IP address and local hostname, as well as with the IP addresses of all the members of the etcd cluster.
Assuming we have a cluster with 3 control plane nodes with hostnames cp1, cp2 and cp3, each with its own IP address, the command to restore the database on cp1 for example would be :

root@cp1:~/etcd-backup# ETCDCTL_API=3 etcdctl snapshot restore backup.db --data-dir /var/lib/etcd --name cp1 --initial-cluster cp1=https://<cp1 IP>:2380,cp2=https://<cp2 IP>:2380,cp3=https://<cp3 IP>:2380 --initial-cluster-token new-etcd-cluster --initial-advertise-peer-urls https://<cp1 IP>:2380
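The equivalent commands on the other two nodes differ only in the --name and the peer URL they advertise; the --initial-cluster member list is identical everywhere. A sketch for cp2, with placeholder IPs to be replaced by the real node addresses:

```shell
# On cp2: same snapshot, same member list, but this node's own
# name and peer URL in --name and --initial-advertise-peer-urls
ETCDCTL_API=3 etcdctl snapshot restore backup.db \
  --data-dir /var/lib/etcd \
  --name cp2 \
  --initial-cluster cp1=https://<cp1 IP>:2380,cp2=https://<cp2 IP>:2380,cp3=https://<cp3 IP>:2380 \
  --initial-cluster-token new-etcd-cluster \
  --initial-advertise-peer-urls https://<cp2 IP>:2380
```

The shared --initial-cluster-token ensures the three freshly restored members bootstrap into one new cluster together rather than trying to rejoin the old one.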

In an HA etcd cluster one of the nodes is the leader, which takes care of writing updates to the database and, via a log-based replication mechanism, makes sure the other member nodes can synchronize their local copy of the database. When shutting down an etcd cluster (or for example replacing all nodes with new ones) it is always a good idea to stop the leader last. To find out which node is the leader you can use the “etcdctl endpoint status” command :

root@ha-k8s-control-plane:/# ETCDCTL_API=3 etcdctl --endpoints <cp1 IP>:2379,<cp2 IP>:2379,<cp3 IP>:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key endpoint status -w table
+----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+---------+---------+-----------+-----------+------------+
| <cp1 IP>:2379  | eadff6a055b2649b |   3.5.9 |  7.0 MB |      true |         2 |     487938 |
| <cp2 IP>:2379  | 701bb008d1e08bea |   3.5.9 |  7.1 MB |     false |         2 |     487938 |
| <cp3 IP>:2379  | 926c2d51a0207869 |   3.5.9 |  7.0 MB |     false |         2 |     487938 |
+----------------+------------------+---------+---------+-----------+-----------+------------+

… and check the “IS LEADER” column.
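If you want the leader programmatically rather than reading the table, the same information is in the JSON output: each status record carries a leader field holding the member ID of the current leader, which you can compare against each member’s own ID. A sketch, assuming jq is installed and using the same placeholder IPs:

```shell
# Print "<endpoint> leader=true/false" for each etcd member
ETCDCTL_API=3 etcdctl \
  --endpoints <cp1 IP>:2379,<cp2 IP>:2379,<cp3 IP>:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert   /etc/kubernetes/pki/etcd/server.crt \
  --key    /etc/kubernetes/pki/etcd/server.key \
  endpoint status -w json \
| jq -r '.[] | "\(.Endpoint) leader=\(.Status.header.member_id == .Status.leader)"'
```

This is handy in a shutdown script that needs to stop the leader last, as recommended above.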