Introduction:
In part 1 of this series we discussed a seamless active-active VMware Horizon View deployment in which the HO and DR sites were connected over a WAN link and users were able to connect to either site, internally or externally, using the same profile and data.
VMware Horizon 7 VDI Active-Active/Passive Multi-Site Disaster Recovery Part 1
VMware Horizon 7 Multi-Site with Microsoft Azure GSLB Traffic Manager
The reason I say WAN link is that a stretched site, or sites connected using dark fiber, is not considered a multi-site deployment, so what has been discussed does not fully apply. It applies to a certain extent, but the configuration would surely differ. One difference that comes to mind is that in a stretched-site deployment we would have to use the DFS namespace in UEM or GP rather than the independent replicated share cluster name path in each site.
Requirement:
In this post we are going to take a different scenario and approach. We have a Primary Site (Head Office) with users local to that site and a Secondary Site (Disaster Recovery) with users local to that site. Users in HO should be able to fail over to DR in case of disaster, and vice versa: users in DR should be able to fail over to HO in case of disaster. Needless to say, both sets of users should have the same profile and data.
Honestly speaking, putting this setup into words is no easy task, although once understood the logic behind it is fairly simple. So I urge you, if anything is not clear (my bad), to read it more than once while contemplating the diagram below until everything makes sense, or drop me a comment/email and we can take it from there.
I understand there are many different requirements in terms of DR and many solutions to meet them. I am not trying to cover all scenarios but rather to establish a baseline for us to build upon. Every environment is different, especially with VDI, because of the tight integration of all IT components (servers, storage, networking, security, virtualization, …) required to establish a working VDI environment, let alone an active or passive DR.
Scenario:
We have two active sites: one called HO acting as the Primary Site and one called DR acting as the Secondary Site. Each site has local or remote users that always connect to it, either internally or externally. Both sites are connected through a WAN link (IPSEC or MPLS or …), with each site having its own infrastructure.
From an infrastructure perspective, both sites are part of the same domain name structure, with each site having a couple of GC domain controllers. A Microsoft DFS-R (Distributed File System Replication) full-mesh namespace ( \\domain\DFS ) is configured between both sites using a file server failover cluster in Head Office (Primary Site) with two file servers [Anti-Affinity], \\FS-HO , and a file server failover cluster in Disaster Recovery (Secondary Site) with two file servers [Anti-Affinity], \\FS-DR . Each site has a DHCP server and an independent vCenter/vSphere deployment.
We have two independent sites connected with an IPSEC tunnel with 50MB bandwidth on the WAN link. Each site has an application delivery controller (F5) acting as both a load balancer and a Global Server Load Balancer (GSLB). Two GSLB virtual servers exist, one for external users and one for internal users, both configured with site proximity. Two VMware Access Points are deployed at each site, load balanced by the F5 and acting as access gateways to each VDI environment. Again, all of these components are deployed and configured independently at each site.
Each site hosts two load balanced connection servers, one composer server, and two load balanced App Volumes/UEM mgmt. consoles. Each site has a pool of virtual desktops created from master images that were prepared in HO and replicated to DR to avoid preparing images twice. Only the base images are replicated; nevertheless, desktop pools are provisioned independently in each site to different OUs (Organizational Units). The HO Primary Site OU that hosts HO virtual desktops is called HO-VDI-OU and the DR Secondary Site OU that hosts DR virtual desktops is called DR-VDI-OU .
Users local to the HO Primary Site are grouped in a security group called HO-Users and users local to the DR Secondary Site are grouped in a security group called DR-Users. Grouping each site's users in a specific group and placing each site's VDI virtual desktop computer accounts in a different OU are essential for this setup and a core requirement for this specific scenario.
UEM Profiles and Data (Folder Redirection) folders in HO Primary Site hosted on //FS-HO/HO-Profiles are being replicated through DFS-R to DR Secondary Site hosted on //FS-DR/HO-Profiles .
UEM Profiles and Data (Folder Redirection) folders in DR Secondary Site hosted on //FS-DR/DR-Profiles are being replicated through DFS-R to HO Primary Site hosted on //FS-HO/DR-Profiles .
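To make the replication layout easier to follow, here is a minimal Python sketch that models the two replicated folders and looks up the partner copy of any share. The dictionary name `REPLICATED_FOLDERS` and the helper `replica_partner` are hypothetical names of mine; the share paths are the ones listed above. This only models the layout, it does not talk to DFS-R.

```python
# Hypothetical model of the two DFS-R replicated folders described above.
# Each replicated folder has one member path per site.
REPLICATED_FOLDERS = {
    "HO-Profiles": {"HO": "//FS-HO/HO-Profiles", "DR": "//FS-DR/HO-Profiles"},
    "DR-Profiles": {"HO": "//FS-HO/DR-Profiles", "DR": "//FS-DR/DR-Profiles"},
}


def replica_partner(path: str) -> str:
    """Return the copy of the same replicated folder hosted on the other site."""
    for members in REPLICATED_FOLDERS.values():
        for site, member_path in members.items():
            if member_path == path:
                other_site = "DR" if site == "HO" else "HO"
                return members[other_site]
    raise KeyError(f"{path} is not a known replicated folder member")
```

For example, `replica_partner("//FS-HO/HO-Profiles")` gives the DR copy of the HO users' profiles, which is the folder a failed-over HO user would be pointed to.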
HO-VDI-OU hosting VDI machines in HO Primary Site has two group policies (Loopback Processing) linked to it:
-
HO-PS-VDI-GP group policy has UEM/Profile Redirection policies configured pointing to cluster share //FS-HO/HO-Profiles which is locally hosted in HO. HO-PS-VDI-GP group policy also has security filtering configured in the delegation tab so that it applies ONLY when members of the HO-Users group log in.
-
DR-PS-VDI-GP group policy has UEM/Profile Redirection policies configured pointing to cluster share //FS-HO/DR-Profiles which is locally hosted in HO (replicated from DR). DR-PS-VDI-GP group policy also has security filtering configured in the delegation tab so that it applies ONLY when members of the DR-Users group log in.
DR-VDI-OU hosting VDI machines in DR Secondary Site has two group policies (Loopback Processing) linked to it:
-
DR-SS-VDI-GP group policy has UEM/Profile Redirection policies configured pointing to cluster share //FS-DR/DR-Profiles which is locally hosted in DR. DR-SS-VDI-GP group policy also has security filtering configured in the delegation tab so that it applies ONLY when members of the DR-Users group log in.
-
HO-SS-VDI-GP group policy has UEM/Profile Redirection policies configured pointing to cluster share //FS-DR/HO-Profiles which is locally hosted in DR (replicated from HO). HO-SS-VDI-GP group policy also has security filtering configured in the delegation tab so that it applies ONLY when members of the HO-Users group log in.
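The four group policies can be summed up as a small lookup table. The sketch below is only a model of the selection logic (OU link plus security filtering), written in Python for readability; it does not query Active Directory or Group Policy. The `Gpo` class and `applicable_gpo` function are hypothetical names, while the GPO, OU, group and share names are the ones used in this post.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Gpo:
    name: str           # group policy name
    linked_ou: str      # OU holding the VDI machines the policy is linked to
    filter_group: str   # security filtering: only this user group gets the policy
    profile_share: str  # cluster share used for UEM profiles / folder redirection


GPOS = [
    Gpo("HO-PS-VDI-GP", "HO-VDI-OU", "HO-Users", "//FS-HO/HO-Profiles"),
    Gpo("DR-PS-VDI-GP", "HO-VDI-OU", "DR-Users", "//FS-HO/DR-Profiles"),
    Gpo("DR-SS-VDI-GP", "DR-VDI-OU", "DR-Users", "//FS-DR/DR-Profiles"),
    Gpo("HO-SS-VDI-GP", "DR-VDI-OU", "HO-Users", "//FS-DR/HO-Profiles"),
]


def applicable_gpo(desktop_ou: str, user_groups: Set[str]) -> Gpo:
    """Pick the policy linked to the desktop's OU and filtered to the user's group."""
    matches = [
        gpo for gpo in GPOS
        if gpo.linked_ou == desktop_ou and gpo.filter_group in user_groups
    ]
    # Exactly one policy should match; a user in both groups would be ambiguous.
    if len(matches) != 1:
        raise ValueError(f"expected exactly one matching GPO, found {len(matches)}")
    return matches[0]
```

Whatever desktop the user lands on, the share returned is always hosted on the same site as that desktop, which is the whole point of the setup. The check for exactly one match reflects an assumption of mine that a user belongs to only one of the two site groups.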
Horizon Pod Architecture is configured with the Head Office site and the Disaster Recovery site added to the federation. Users are assigned to a global entitlement so that they see only a single icon and have access to pools in both sites. Home sites are assigned to users in HO and DR. Even without Pod Architecture, just make sure users are assigned access on both sites, so that HO-Users and DR-Users are entitled to virtual desktop pools in each site independently.
Challenge:
So why do we have different user groups, different OUs, and different group policies pointing to different file shares !? Why is the cluster file share path used and not the DFS namespace path !? Why do we have group policy security filtering configured !? Why aren’t we just using UEM conditions !?
Our aim is to have no single point of failure for VDI users in either site, seamlessly, without downtime and/or manual intervention, while maintaining an optimal experience. To guarantee this we need to discuss what makes up a VDI session and move on from there. From a user perspective, as long as the OS is delivered with the same profile (customization settings), data (Desktop, Documents, Home Folder), and backend applications, nothing can be noticed or complained about, thus the VDI environment is considered functional.
In order to provide this optimal seamless experience we must make sure that the user profile/data is accessed locally as long as the site is up and does not go through the WAN. The WAN link bandwidth will only hold so much until it eventually breaks, and even if it does not break it would provide a user experience so bad that the user would not be able to work.
To guarantee that the user will always be assigned a virtual desktop, profile, and data from the same site, we first need to identify each site's user group for entitlements, identify the VDI machine computer accounts in each site, identify the clustered file share holding the user profiles/data in each site, and logically group all of these components when a VDI machine is delivered to an end user.
The aim is for group policy to point the user to the profile/data local to the site they are logging in to, hence the security filtering on different group policies pointing to different file shares rather than to the unified DFS namespace.
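A rough back-of-the-envelope calculation shows why the WAN is not an option for profile access. The sketch below assumes the link figure means 50 megabits per second and uses a made-up count of 1000 concurrent users and a made-up 100 MB profile; none of these numbers come from a real measurement, and replication traffic sharing the same link is ignored.

```python
def per_user_kbps(link_mbps: float, concurrent_users: int) -> float:
    """Even share of the WAN link per user, in kilobits per second."""
    return link_mbps * 1000 / concurrent_users


def profile_load_seconds(profile_mb: float, user_kbps: float) -> float:
    """Time to pull a profile of profile_mb megabytes at user_kbps kilobits per second."""
    profile_kbits = profile_mb * 8 * 1000
    return profile_kbits / user_kbps


# Assumed figures, for illustration only.
share = per_user_kbps(link_mbps=50, concurrent_users=1000)
print(f"{share:.0f} kbit/s per user")
print(f"{profile_load_seconds(100, share) / 3600:.1f} hours to load a 100 MB profile")
```

Even if the real numbers are kinder than these, the order of magnitude makes the point: profiles and redirected folders have to be served from the local site.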
Logic:
The HO site has a dedicated OU that holds all VDI machines local to the HO site infrastructure. The HO site also has a user group that includes all users local to the HO site. The VDI OU in HO has two group policies linked to it; which one applies when a user logs in depends on the AD security group membership of that user, and each of these group policies points to a different file share for profiles/data.
HO User logging-in to HO Site (Local Site) :
-
“Mo” User member of HO-Users security group logs-in to a VDI machine in HO.
-
VDI machines in HO are part of HO-VDI-OU organizational unit.
-
HO-VDI-OU has two group policies linked to it (HO-PS-VDI-GP & DR-PS-VDI-GP).
-
HO-PS-VDI-GP has a security filtering to apply only on HO-Users group and DR-PS-VDI-GP has a security filtering to apply only on DR-Users.
-
Since “Mo” is a member of HO-Users , the group policy named HO-PS-VDI-GP will apply.
-
HO-PS-VDI-GP group policy points UEM/Data to clustered share //FS-HO/HO-Profiles , this file share cluster is local to HO site so experience is optimal for user.
-
Any change done by “Mo” is replicated to DR folder //FS-DR/HO-Profiles .
DR User logging-in to HO Site (Failover) :
-
“Omar” User member of DR-Users security group logs-in to a VDI machine in HO.
-
VDI machines in HO are part of HO-VDI-OU organizational unit.
-
HO-VDI-OU has two group policies linked to it (HO-PS-VDI-GP & DR-PS-VDI-GP).
-
HO-PS-VDI-GP has a security filtering to apply only on HO-Users group and DR-PS-VDI-GP has a security filtering to apply only on DR-Users.
-
Since “Omar” is a member of DR-Users , the group policy named DR-PS-VDI-GP will apply.
-
DR-PS-VDI-GP group policy points UEM/Data to clustered share //FS-HO/DR-Profiles which is being replicated from DR site ( the actual main site of DR user Omar ), this file share cluster is local to HO site so experience is optimal for user.
-
Any change done by “Omar” is replicated back to main DR folder //FS-DR/DR-Profiles .
The DR site has a dedicated OU that holds all VDI machines local to the DR site infrastructure. The DR site also has a user group that includes all users local to the DR site. The VDI OU in DR has two group policies linked to it; which one applies when a user logs in depends on the AD security group membership of that user, and each of these group policies points to a different file share for profiles/data.
DR User logging-in to DR Site (Local Site):
-
“Omar” User member of DR-Users security group logs-in to a VDI machine in DR.
-
VDI machines in DR are part of DR-VDI-OU organizational unit.
-
DR-VDI-OU has two group policies linked to it (DR-SS-VDI-GP & HO-SS-VDI-GP).
-
DR-SS-VDI-GP has a security filtering to apply only on DR-Users group and HO-SS-VDI-GP has a security filtering to apply only on HO-Users.
-
Since “Omar” is a member of DR-Users , the group policy named DR-SS-VDI-GP will apply.
-
DR-SS-VDI-GP group policy points UEM/Data to clustered share //FS-DR/DR-Profiles , this file share cluster is local to DR site so experience is optimal for user.
-
Any change done by “Omar” is replicated to HO folder //FS-HO/DR-Profiles .
HO User logging-in to DR Site (Failover):
-
“Mo” User member of HO-Users security group logs-in to a VDI machine in DR.
-
VDI machines in DR are part of DR-VDI-OU organizational unit.
-
DR-VDI-OU has two group policies linked to it (DR-SS-VDI-GP & HO-SS-VDI-GP).
-
DR-SS-VDI-GP has a security filtering to apply only on DR-Users group and HO-SS-VDI-GP has a security filtering to apply only on HO-Users.
-
Since “Mo” is a member of HO-Users , the group policy named HO-SS-VDI-GP will apply.
-
HO-SS-VDI-GP group policy points UEM/Data to clustered share //FS-DR/HO-Profiles which is being replicated from HO site (the actual main site of HO user Mo), this file share cluster is local to DR site so experience is optimal for user.
-
Any change done by “Mo” is replicated back to main HO folder //FS-HO/HO-Profiles .
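The four walkthroughs above follow one pattern, which can be sketched in a few lines of Python. This is a simplified model under my own assumptions: the user lands on the home site whenever it is up (standing in for GSLB proximity and home site assignment) and on the other site otherwise. The function name `session_plan` and its return keys are hypothetical.

```python
from typing import Dict, Set


def session_plan(home_site: str, sites_up: Set[str]) -> Dict[str, str]:
    """Work out where a user lands and which shares are involved.

    home_site is "HO" for members of HO-Users and "DR" for members of DR-Users.
    """
    other_site = "DR" if home_site == "HO" else "HO"
    if home_site in sites_up:
        desktop_site = home_site       # normal case: local site
    elif other_site in sites_up:
        desktop_site = other_site      # failover case
    else:
        raise RuntimeError("no site is available")

    # The replicated folder follows the user's home site, the file server
    # follows the site the desktop runs in.
    folder = f"{home_site}-Profiles"
    replica_site = "DR" if desktop_site == "HO" else "HO"
    return {
        "desktop_site": desktop_site,
        "profile_share": f"//FS-{desktop_site}/{folder}",
        # Changes replicate to the copy on the other site (once it is reachable).
        "replicates_to": f"//FS-{replica_site}/{folder}",
    }
```

Calling `session_plan("HO", {"HO", "DR"})` reproduces the “Mo” local login, and `session_plan("HO", {"DR"})` reproduces the “Mo” failover to DR; the “Omar” cases are the mirror image with `"DR"` as the home site.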
Considerations:
-
A user must NOT have sessions open from different sites at the same time, since each site points to its own end of the same replicated folder and concurrent writes will cause corruption. The configuration listed above is meant to prevent this; nevertheless, do note that DFS-R cannot handle simultaneous writes from two ends of the same replicated folder.
-
Force the user to log off upon disconnection. If for some reason the user physically moves to the other site, they will be connected to the nearest site, which is different from the site where their session is still open. They then get a new virtual desktop which loads the profile from the local share, and we end up with two open profiles on the same replicated folder from different sites, which will corrupt the profile share as discussed above.
-
Depending on how users connect and how often, it is advisable to keep DFS-R replication always on rather than scheduled (this depends highly on the specifics of the environment and is not a core requirement). Also make sure DFS replication is full mesh.
-
Though UEM conditions can be used instead of multiple group policies and security filtering, remember that the conditions only apply to folder redirection (data) and not to the UEM profile (settings/customization), which still needs Group Policy to pull settings.
-
Using the cluster share path rather than the DFS namespace assumes that the whole VDI site is down whenever the file server cluster is down. That is usually the case, but not always, so keep it in mind: if the cluster fails while the VDI environment is still up, the UEM/Profile group policy is not pointing to the DFS namespace, so it will not redirect you to the DFS folder in DR. To me this is a requirement, as having thousands of users pull data over a limited shared WAN connection is not sound for production.
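The single-session rule from the first two considerations can be expressed as a simple guard. The sketch below is purely illustrative: `active_sessions` is a hypothetical in-memory inventory (user name to the set of sites where that user still has a connected or disconnected session), and nothing here calls Horizon. Where that inventory would come from in practice is outside the scope of this sketch.

```python
from typing import Dict, Set


def can_open_session(user: str, target_site: str,
                     active_sessions: Dict[str, Set[str]]) -> bool:
    """Allow a new session only if the user has no session left on another site."""
    open_sites = active_sessions.get(user, set())
    # Any session on a site other than the target means two sites could write
    # to the same replicated profile folder at once.
    return not (open_sites - {target_site})


# Hypothetical inventory: Mo still has a disconnected session in HO.
sessions = {"Mo": {"HO"}, "Omar": set()}
print(can_open_session("Mo", "DR", sessions))    # Mo must be logged off in HO first
print(can_open_session("Omar", "HO", sessions))
```

Forcing logoff on disconnect keeps that inventory empty for users who have walked away, which is why it matters so much in this design.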
Conclusion:
I could have just specified the configuration and been on my way, leaving you to figure it out, but I really tried my best to explain the logic behind this, so I hope the Albert Einstein quote doesn’t hit me in the face: “If you can’t explain it simply, you don’t understand it well enough.”
Salam .
Top post! User profile data is indeed the biggest challenge in an active/active scenario.
Kudos, awesome post! Thanks for all the detailing.
Thank you .