Ceph Block Storage Backup And Recovery¶
A crucial part of running a Rook Ceph cluster is having a reliable backup solution, along with the ability to restore those backups efficiently!
To find the best options out there, we experimented with the existing backup solutions for Rook Ceph.
This series will take you through some of the best viable backup solutions in hope of avoiding losses and making your lives easier in case of an unfortunate disaster.
There are several kinds of solutions, but we are going to focus on:
- Ceph features and Ceph native tools
- Kubernetes native tools and projects
Rook Ceph based Disaster Recovery Solutions¶
Looking into the first category, Disaster Recovery solutions can be narrowed down by workload type:
- Block Storage: RBD mirroring, which is supported and can be enabled/managed for a block-based persistent volume.
- Filesystem Storage: Filesystem mirroring, which uses snapshots with one-way peering to create a backup of a filesystem-based volume.
- Object Storage: RGW multisite support.
External Tools based Disaster Recovery Solutions¶
These tools can be used to snapshot and export the entire application, including its Kubernetes resources. The snapshot-and-export process can be configured to run on a schedule.
Let us dive into each of these solutions and discuss which would suit your use case best!
Block Storage Backup (RBD Mirroring)¶
Block storage in a Rook Ceph cluster exists in the form of Kubernetes PersistentVolumes. To back up these resources we use a Ceph RBD feature called RBD mirroring. RBD mirroring asynchronously mirrors the RBD images (which back the PersistentVolumes) from the primary cluster to a secondary (backup) cluster.
- On your Rook Ceph cluster, enable RBD mirroring by following Rook's official block-based mirroring documentation.
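As a rough sketch of what Rook's mirroring docs walk you through, mirroring is turned on in the CephBlockPool spec and an rbd-mirror daemon is deployed via the CephRBDMirror CRD. The pool name `replicapool` and namespace `rook-ceph` below are assumptions; substitute your own:

```shell
# Enable image-mode mirroring on the pool (assumed: pool "replicapool", namespace "rook-ceph")
kubectl -n rook-ceph patch cephblockpool replicapool --type merge \
  -p '{"spec":{"mirroring":{"enabled":true,"mode":"image"}}}'

# Deploy the rbd-mirror daemon via Rook's CephRBDMirror CRD
cat <<EOF | kubectl apply -f -
apiVersion: ceph.rook.io/v1
kind: CephRBDMirror
metadata:
  name: my-rbd-mirror
  namespace: rook-ceph
spec:
  count: 1
EOF
```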
Check the daemon health status for rbd-mirror.
If any issues are found, check the logs and correct the deployment configuration.
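One way to check this is via the mirroring status Rook reports on the CephBlockPool resource (pool name `replicapool` assumed):

```shell
# Read the rbd-mirror summary from the pool's status (assumed pool "replicapool")
kubectl -n rook-ceph get cephblockpool replicapool \
  -o jsonpath='{.status.mirroringStatus.summary}'
# A healthy setup typically reports "daemon_health": "OK" and "health": "OK"
```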
To make sure mirroring is configured correctly, identify the RBD image mapped to the CSI volume using:
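A minimal sketch, assuming a PVC named `rbd-pvc` in the `default` namespace: resolve the bound PV, then read the backing image name from the CSI volume attributes.

```shell
# Resolve the PV bound to the PVC, then read the backing RBD image name
# (assumed: PVC "rbd-pvc" in namespace "default")
PV=$(kubectl -n default get pvc rbd-pvc -o jsonpath='{.spec.volumeName}')
kubectl get pv "$PV" -o jsonpath='{.spec.csi.volumeAttributes.imageName}'
# Prints an image name of the form csi-vol-<uuid>
```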
Go to the toolbox pod and check the mirroring status
For a healthy state, the RBD mirroring status for the image should look like:
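From the toolbox pod on the primary cluster, the status can be queried as follows (pool `replicapool` and the `csi-vol-<uuid>` image name are placeholders for your own values):

```shell
# From the toolbox pod on the primary cluster (assumed image "replicapool/csi-vol-<uuid>")
rbd mirror image status replicapool/csi-vol-<uuid>
# On the primary, expect "description: local image is primary";
# the peer site entry should report a state such as up+replaying
```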
When mirroring is enabled and working correctly, you should see the mirrored persistent volume / RBD image getting synced from the primary cluster to the secondary:
On the secondary cluster:
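On the secondary cluster's toolbox pod, a quick sanity check (same assumed pool and image placeholders as above):

```shell
# From the toolbox pod on the secondary cluster: the mirrored image should appear
rbd -p replicapool ls
rbd mirror image status replicapool/csi-vol-<uuid>
# The secondary side typically reports a state of up+replaying
```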
Once we have made sure mirroring is working properly, we can create a snapshot schedule by running:
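For example, from the toolbox pod (the pool name, image name, and intervals below are illustrative):

```shell
# Add an hourly mirror snapshot schedule for the whole pool (assumed pool "replicapool")
rbd mirror snapshot schedule add --pool replicapool 1h

# Or schedule a single image (assumed image "csi-vol-<uuid>") every 30 minutes
rbd mirror snapshot schedule add --pool replicapool --image csi-vol-<uuid> 30m

# List the configured schedules
rbd mirror snapshot schedule ls --pool replicapool --recursive
```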
Read more about configuring RBD snapshot schedules here.
Now everything is in place for a periodic sync of our block volume, in the form of RBD snapshots, to the secondary cluster.
What if something happens to Block Persistent volume on the primary cluster?
Restoring Backed up RBD Persistent Volume¶
If for any reason the primary goes down, or something happens to the PersistentVolume on the primary, we can use the following failback process to restore the backup.
For failback, we perform the following steps using the toolbox pod of the respective clusters:
Demote the primary cluster (cluster 1); you might momentarily observe a split-brain status.
Promote the secondary cluster to be the new primary. This step makes the cluster 1 (old primary) image sync from cluster 2 (new primary).
You should now be able to see the RBD image getting synced from cluster 2 (new primary) to cluster 1 (old primary).
Once the sync is complete, cluster 2 can be demoted and cluster 1 promoted back to being the primary cluster.
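The steps above can be sketched with the `rbd mirror image` commands, run from the toolbox pod of the indicated cluster (the image name `replicapool/csi-vol-<uuid>` is a placeholder):

```shell
# 1. On cluster 1 (old primary): demote the image
rbd mirror image demote replicapool/csi-vol-<uuid>

# 2. On cluster 2: promote it to be the new primary
rbd mirror image promote replicapool/csi-vol-<uuid>

# 3. On cluster 1: if the image reports split-brain, request a resync from cluster 2
rbd mirror image resync replicapool/csi-vol-<uuid>

# 4. After the sync completes, reverse the roles again:
rbd mirror image demote replicapool/csi-vol-<uuid>   # on cluster 2
rbd mirror image promote replicapool/csi-vol-<uuid>  # on cluster 1
```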
This completes the restore of the failed/corrupt block PersistentVolume.
Note: the size of the snapshot is expected to match the size of the data written in the image.
For a 3-node Rook Ceph cluster with 10 GB of test data on the block-based PersistentVolume:

| BACKUP STRATEGY | TIME TAKEN TO EXPORT 10 GB FILE |
| --- | --- |
| RBD Mirroring | ~31 sec |
Block-based backup uses RBD snapshot diffs natively for image mirroring. This also provides higher availability, since the secondary cluster can be used when the primary goes down, until the restore is finished.
Although this solution will certainly help you back up block volumes to a secondary backup cluster efficiently, there are some Ceph RBD mirroring concepts you may want to learn alongside it.
Rook has good documentation around such scenarios as well as planned migration documented here.