What is HCX Disaster Recovery?
The official stuff: “VMware HCX Disaster Recovery is a service intended to protect virtual workloads managed by VMware vSphere that are either deployed in a private or a public cloud. It is simple to set up, manage, and costs less than the traditional disaster recovery solutions. VMware HCX Disaster Recovery can accommodate the most demanding business critical applications and allows you to scale your protection capacity to meet variable demands.”
DR in HCX is not as feature rich as SRM, but it does provide a quick-n-dirty way of protecting your workloads. Which is great, to be honest, especially in a lab scenario.
So what’s required to enable DR in HCX? Simple: just enable DR in your Compute Profiles and Service Mesh.
Once enabled you can access it via the Hybridity UI.
But this isn’t so much a post on how to deploy or configure it, but on what the settings mean.
I have set up DR in my lab and couldn’t quite figure out how each of those settings affects the outcome. HCX DR is based on vSphere Replication, but still, I couldn’t figure it out.
I reached out to my colleagues and eventually got a good explanation from the engineering team that I’d like to share.
Let’s use an example with the following parameters:
- RPO: 6hrs
  - This is the value that determines the maximum data loss you can tolerate.
- Snapshot Interval: 3hrs
  - Replication instances are logically saved in slots. The snapshot interval defines the size of a slot in hours. A single slot can hold multiple instances, depending on the RPO. For example: if the snapshot interval is 3 hours, the slots start at fixed 3 hour intervals: 00:00, 03:00, 06:00, 09:00 etc.
- Max Snapshots: 2
  - The number of instances to keep, based on the available slot intervals. For example: if there are 5 instances in a particular slot and Max Snapshots is set to 2, HCX keeps only 2 snapshots and deletes the rest. This is done for each slot as well as across slots. Snapshots belonging to older slots are deleted over time too, so that the available snapshots are always the latest ones based on the configured RPO.
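The slot idea can be sketched in a few lines of Python. This is only an illustration of the description above, not HCX code: the function names are made up, and it assumes the slots are aligned to midnight (as the 00:00, 03:00, 06:00 example suggests) and that the interval divides evenly into 24 hours.

```python
from datetime import datetime, timedelta

def slot_boundaries(interval_hours):
    """List the slot start times for one day (hypothetical helper)."""
    return [f"{hour:02d}:00" for hour in range(0, 24, interval_hours)]

def slot_start(ts, interval_hours):
    """Return the start of the slot that contains ts.

    Assumption: slots are aligned to midnight of the same day.
    """
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    seconds_per_slot = interval_hours * 3600
    index = int((ts - midnight).total_seconds() // seconds_per_slot)
    return midnight + timedelta(hours=index * interval_hours)

# The calendar date is arbitrary; only the time of day matters here.
instance = datetime(2024, 1, 1, 4, 30)
print(slot_boundaries(3)[:4])                       # first four slot starts
print(slot_start(instance, 3).strftime("%H:%M"))    # slot holding the 4:30am instance
```

With a 3 hour Snapshot Interval, an instance saved at 4:30am lands in the slot that starts at 03:00.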
So how does it affect the number of restore points?
Instances are created at the cadence defined by the RPO (every 6hrs here). Every X hours, as defined by the Snapshot Interval (every 3hrs in my example), HCX DR removes surplus instances and keeps only the number defined by Max Snapshots (2 here).
So looking at the details here (6hrs/3hrs/2 snapshots), once the initial sync is completed, the first instance is saved, let’s say at 4:30am.
Since the Snapshot Interval is 3hrs, looking at the slots we essentially have:
- Slot 1 : Snapshot at 4:30am
- Slot 2 : Snapshot at 10:30am
- Slot 3 : No Snapshot as RPO is 6hrs
- Slot 4 : Snapshot at 4:30pm
When the snapshot in slot 4 is created, HCX checks the Max Snapshots value (here 2) and goes back through the slots to ensure that each slot holds only its latest instance, which in this case is just 1 per slot.
If the number of slots holding a snapshot is higher than the configured Max Snapshots (2 here), the oldest snapshot is deleted (4:30am).
The cycle repeats each time a new instance is created based on the RPO.
Hope this is clear enough. As mentioned previously, HCX DR is based on vSphere Replication, so you can always check its documentation for a more in-depth explanation.
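To make the walkthrough concrete, here is a minimal Python sketch of my reading of the pruning rules: create an instance every RPO hours, keep only the latest instance per slot, then keep only the newest Max Snapshots of those. It is a simplified model, not the actual HCX or vSphere Replication implementation; the function name, the midnight-aligned slots and the calendar date are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

def retained_instances(first, rpo_hours, interval_hours, max_snapshots, cycles):
    """Simulate which instances survive after `cycles` replication cycles.

    Assumptions (not confirmed HCX behaviour): one instance per RPO period,
    slots aligned to midnight, oldest snapshots pruned first.
    """
    if max_snapshots <= 0:
        return []

    # One instance every RPO hours, starting from the first completed sync.
    instances = [first + timedelta(hours=rpo_hours * i) for i in range(cycles)]

    # Keep only the latest instance in each slot.
    latest_per_slot = {}
    for ts in instances:  # already in chronological order
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        index = int((ts - midnight).total_seconds() // (interval_hours * 3600))
        latest_per_slot[(ts.date(), index)] = ts  # later instance overwrites earlier

    # Across slots, keep only the newest `max_snapshots` instances.
    survivors = sorted(latest_per_slot.values())
    return survivors[-max_snapshots:]

# Example values from this post: RPO 6hrs, Snapshot Interval 3hrs, Max Snapshots 2.
first_sync = datetime(2024, 1, 1, 4, 30)  # arbitrary date, 4:30am
for ts in retained_instances(first_sync, 6, 3, 2, cycles=3):
    print(ts.strftime("%I:%M%p"))
```

After three cycles (4:30am, 10:30am, 4:30pm) the model keeps the 10:30am and 4:30pm instances and drops the 4:30am one, which matches the outcome described above.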