Debug School

Suyash Sambhare


Backup etcd in OpenShift

This post shows how to take an etcd backup before making any changes to the cluster.

Steps to take the etcd backup:

  1. Log in to the OCP cluster as kubeadmin.
  2. Start a debug session on one of the master nodes: oc debug --as-root node/node1
  3. Change the root directory to the host file system: chroot /host
  4. Create a backup directory under /home/core/ if it does not already exist: mkdir /home/core/backup
  5. If a previous backup exists, rename or delete it.
  6. Take a fresh backup: /usr/local/bin/cluster-backup.sh /home/core/backup
  7. The script creates two files in the backup directory. Change their mode to 777; otherwise, copying them to the bastion server fails with a permission error.
  8. Exit from the master host by typing exit twice.
  9. From the bastion server, copy the backup from the master node: scp core@<node_name>:/home/core/backup/* .
  10. Whenever you take a fresh backup, copy it from the bastion server to a shared drive. The bastion node itself may crash, so the shared drive keeps a second copy safe.
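Steps 4 to 6 can be scripted so that an older backup is moved aside rather than mixed with the new one. The following is a minimal sketch, not part of OpenShift: rotate_backup_dir is a hypothetical helper name, and the timestamp suffix is an assumed naming scheme.

```shell
#!/usr/bin/env bash
# Sketch: move an existing, non-empty backup directory aside, then
# recreate an empty one for the fresh backup.
# rotate_backup_dir is a hypothetical helper, not an OpenShift tool.
rotate_backup_dir() {
  local dir="$1"
  local stamp
  stamp="$(date +%Y-%m-%d_%H%M%S)"
  # Rename only when the directory exists and contains something.
  if [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]; then
    mv "$dir" "${dir}.${stamp}"
  fi
  # Create the directory if it is missing (no-op when it already exists).
  mkdir -p "$dir"
}

# Example, on the master node after chroot /host:
#   rotate_backup_dir /home/core/backup
#   /usr/local/bin/cluster-backup.sh /home/core/backup
```

Renaming instead of deleting keeps the previous backup available until the new one has been copied to the shared drive.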

Note: This is not required for any deployment or any operator installation. It's only required when you are making changes in the cluster.


Backing up etcd

etcd is the key-value store for the OpenShift Container Platform, which persists the state of all resource objects.

Back up your cluster’s etcd data regularly and store it in a secure location, ideally outside the OpenShift Container Platform environment. Do not take an etcd backup before the first certificate rotation completes, which occurs 24 hours after installation; otherwise, the backup will contain expired certificates. It is also recommended to take etcd backups during non-peak usage hours because the etcd snapshot has a high I/O cost.

Be sure to take an etcd backup after you upgrade your cluster. This is important because when you restore your cluster, you must use an etcd backup that was taken from the same z-stream release. For example, an OpenShift Container Platform 4.y.z cluster must use an etcd backup that was taken from 4.y.z.
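One way to keep track of which release a backup belongs to is to store the version string next to it. The sketch below is only an illustration: record_version, same_zstream, and the cluster_version.txt file name are hypothetical and not part of cluster-backup.sh.

```shell
#!/usr/bin/env bash
# Sketch: record the cluster version beside a backup and compare it later.
# Both helpers and the cluster_version.txt file name are hypothetical.
record_version() {
  local dir="$1"
  local version="$2"
  printf '%s\n' "$version" > "$dir/cluster_version.txt"
}

same_zstream() {
  local dir="$1"
  local current="$2"
  # The full 4.y.z string must match exactly.
  [ -f "$dir/cluster_version.txt" ] &&
    [ "$(cat "$dir/cluster_version.txt")" = "$current" ]
}

# Example, on a host where oc is logged in to the cluster:
#   version="$(oc get clusterversion version -o jsonpath='{.status.desired.version}')"
#   record_version /home/core/backup "$version"
```

Before a restore, same_zstream can be called with the backup directory and the version of the cluster being restored; a non-zero exit status means the backup was taken from a different release.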

Back up your cluster’s etcd data by performing a single invocation of the backup script on a control plane host (also known as the master host). Do not take a backup for each control plane host.

After you have an etcd backup, you can restore to a previous cluster state.

Follow these steps to back up etcd data by creating an etcd snapshot and backing up the resources for the static pods. This backup can be saved and used at a later time if you need to restore etcd.

Only save a backup from a single control plane host (also known as the master host). Do not take a backup from each control plane host in the cluster.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.
  • You have checked whether the cluster-wide proxy is enabled.

    You can check whether the proxy is enabled by reviewing the output of oc get proxy cluster -o yaml. The proxy is enabled if the httpProxy, httpsProxy, and noProxy fields have values set.
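The proxy check can also be done mechanically. The sketch below assumes a hypothetical helper, proxy_enabled, that reads the YAML from oc get proxy cluster -o yaml on standard input. It treats the proxy as enabled as soon as any one of the three fields has a non-empty value, which is an assumption; tighten it if you need all three to be set.

```shell
#!/usr/bin/env bash
# Sketch: decide whether the cluster-wide proxy is enabled, based on the
# YAML printed by `oc get proxy cluster -o yaml` (read from stdin).
# proxy_enabled is a hypothetical helper, not an oc subcommand.
proxy_enabled() {
  # A field counts as set when a non-empty value follows the key.
  # An empty value or an empty quoted string ("") does not match.
  grep -Eq '^[[:space:]]*(httpProxy|httpsProxy|noProxy):[[:space:]]*"?[^[:space:]"]'
}

# Example:
#   if oc get proxy cluster -o yaml | proxy_enabled; then
#     echo "proxy enabled: export NO_PROXY, HTTP_PROXY, and HTTPS_PROXY first"
#   fi
```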

Steps to follow

  1. Start a debug session for a control plane node: $ oc debug node/<node_name>
  2. Change your root directory to /host: sh-4.2# chroot /host
  3. If the cluster-wide proxy is enabled, be sure that you have exported the NO_PROXY, HTTP_PROXY, and HTTPS_PROXY environment variables.
  4. Run the cluster-backup.sh script and pass in the location to save the backup. The cluster-backup.sh script is maintained as a component of the etcd Cluster Operator and is a wrapper around the etcdctl snapshot save command.
        sh-4.4# /usr/local/bin/cluster-backup.sh /home/core/assets/backup


        found latest kube-apiserver: /etc/kubernetes/static-pod-resources/kube-apiserver-pod-6
        found latest kube-controller-manager: /etc/kubernetes/static-pod-resources/kube-controller-manager-pod-7
        found latest kube-scheduler: /etc/kubernetes/static-pod-resources/kube-scheduler-pod-6
        found latest etcd: /etc/kubernetes/static-pod-resources/etcd-pod-3
        ede95fe6b88b87ba86a03c15e669fb4aa5bf0991c180d3c6895ce72eaade54a1
        etcdctl version: 3.4.14
        API version: 3.4
        {"level":"info","ts":1624647639.0188997,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"/home/core/assets/backup/snapshot_2021-06-25_190035.db.part"}
        {"level":"info","ts":"2021-06-25T19:00:39.030Z","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
        {"level":"info","ts":1624647639.0301006,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://10.0.0.5:2379"}
        {"level":"info","ts":"2021-06-25T19:00:40.215Z","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
        {"level":"info","ts":1624647640.6032252,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://10.0.0.5:2379","size":"114 MB","took":1.584090459}
        {"level":"info","ts":1624647640.6047094,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"/home/core/assets/backup/snapshot_2021-06-25_190035.db"}
        Snapshot saved at /home/core/assets/backup/snapshot_2021-06-25_190035.db
        {"hash":3866667823,"revision":31407,"totalKey":12828,"totalSize":114446336}
        snapshot db and kube resources are successfully saved to /home/core/assets/backup

In this example, two files are created in the /home/core/assets/backup/ directory on the control plane host:

  • snapshot_<datetimestamp>.db: This file is the etcd snapshot. The cluster-backup.sh script confirms its validity.
  • static_kuberesources_<datetimestamp>.tar.gz: This file contains the resources for the static pods. If etcd encryption is enabled, it also contains the encryption keys for the etcd snapshot.

Note: If etcd encryption is enabled, it is recommended to store this second file separately from the etcd snapshot for security reasons. However, this file is required to restore from the etcd snapshot. Keep in mind that etcd encryption only encrypts values, not keys. This means that resource types, namespaces, and object names are unencrypted.
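Because both files are needed for a restore, it can be worth checking that they are present before copying the backup anywhere. This is a minimal sketch with a hypothetical helper name, backup_complete; it only checks file names and does not validate the snapshot contents.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a backup directory holds both expected files.
# backup_complete is a hypothetical helper; it checks names only.
backup_complete() {
  local dir="$1"
  # ls exits non-zero when a pattern matches no file.
  ls "$dir"/snapshot_*.db >/dev/null 2>&1 &&
    ls "$dir"/static_kuberesources_*.tar.gz >/dev/null 2>&1
}

# Example:
#   backup_complete /home/core/assets/backup || echo "backup is incomplete"
```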

Congratulations! 🙋🏽‍♂️🧧🥽🏞️
You have successfully backed up the etcd data!

Ref: https://docs.openshift.com/container-platform/4.8/backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.html
