Introduction:
In part 1 of this series we discussed a seamless active-active Citrix XenDesktop deployment in which the HO and DR sites were connected over a WAN link and users were able to connect to either site, internally or externally, using the same profile and data.
Citrix XenDesktop 7 VDI Active-Active/Passive Multi-Site Disaster Recovery Part 1
The reason I say WAN link is that a stretched site, or sites connected using dark fiber, is not considered a multi-site deployment, so what has been discussed does not fully apply. It applies to a certain extent, but the configuration would surely differ. One difference that comes to mind is that in a stretched-site deployment we would have to use the DFS namespace in WEM or GP rather than the independent replicated share cluster name path in each site.
Requirement:
In this post we are going to take a different scenario and approach. We have a Primary Site (Head Office) with users local to that site and a Secondary Site (Disaster Recovery) with users local to that site. Users in HO should be able to fail over to DR in case of disaster, and vice versa, users in DR should be able to fail over to HO in case of disaster. Needless to say, both sets of users should have the same profile and data in either site.
Honestly speaking, putting this setup into words is no easy task, although once understood the logic behind it is fairly simple. If anything is not clear (my bad), I urge you to read it more than once while contemplating the diagram below until everything makes sense, or drop me a comment/email and we can take it from there.
I understand there are many different requirements in terms of DR and many solutions to meet them. I am not trying to cover all scenarios but rather to establish a baseline for us to build upon. Every environment is different, especially with VDI, because of the tight integration of all the IT components (servers, storage, networking, security, virtualization, …) required to establish a working VDI environment, let alone an active or passive DR.
Scenario:
We have two active sites, one called HO acting as the Primary Site and one called DR acting as the Secondary Site. Each site has local or remote users that always connect to it, either internally or externally. Both sites are connected through a WAN link (IPSEC or MPLS or …), with each site having its own infrastructure.
From an infrastructure perspective, both sites are part of the same domain name structure, with each site having a couple of GC domain controllers. A Microsoft DFS-R (Distributed File System Replication) full mesh namespace ( \\domain\DFS ) is configured between both sites using a file server failover cluster in Head Office (Primary Site) utilizing two file servers [Anti-Affinity] \\FS-HO and a file server failover cluster in Disaster Recovery (Secondary Site) utilizing two file servers [Anti-Affinity] \\FS-DR . Each site has a DHCP server and an independent vCenter/vSphere deployment.
We have two independent sites connected with an IPSEC tunnel with 50MB bandwidth on the WAN link. Each site has an application delivery controller (NetScaler) acting as a load balancer, Global Server Load Balancer (GSLB), and Access Gateway. Two GSLB virtual servers exist, one for external users and one for internal users, both configured with site proximity. All of these components are deployed and configured independently at each site. Internal GSLB is configured with SF (StoreFront) virtual servers, not Access Gateway, so if HO fails, users connect directly to the load-balanced SF in DR through the IPSEC tunnel and are provided a virtual desktop from DR. If HO is completely down, including networking, users connect through the external GSLB instead. Note that Optimal Gateway Routing is not configured, as the two sites are independent and share no StoreFront or XenDesktop configuration. Again, there are many ways this can be deployed depending on scenario requirements.
Each site hosts two load-balanced StoreFront servers, two XD delivery controllers, and two load-balanced Director/WEM (Workspace Environment Manager) infrastructure servers. Each site has a pool of virtual desktops created from master images that were prepared in HO and replicated to DR to avoid preparing images twice. Only base images are replicated; the desktop pools are nevertheless provisioned independently in each site, into different OUs (Organizational Units). The HO Primary Site OU that hosts HO virtual desktops is called HO-VDI-OU, and the DR Secondary Site OU that hosts DR virtual desktops is called DR-VDI-OU .
Users local to the HO Primary Site are grouped in a security group called HO-Users, and users local to the DR Secondary Site are grouped in a security group called DR-Users. Grouping each site's users in a specific group and placing each site's VDI computer accounts in a different OU are essential for this setup and are core requirements for this specific scenario.
UPM Profiles and Data (Folder Redirection) folders in HO Primary Site hosted on //FS-HO/HO-Profiles are being replicated through DFS-R to DR Secondary Site hosted on //FS-DR/HO-Profiles .
UPM Profiles and Data (Folder Redirection) folders in DR Secondary Site hosted on //FS-DR/DR-Profiles are being replicated through DFS-R to HO Primary Site hosted on //FS-HO/DR-Profiles .
HO-VDI-OU hosting VDI machines in HO Primary Site has two group policies (Loopback Processing) linked to it:
- HO-PS-VDI-GP has UPM/Profile Redirection policies configured pointing to cluster share //FS-HO/HO-Profiles, which is locally hosted in HO. It also has security filtering configured in the Delegation tab so that it applies ONLY when members of the HO-Users group log in.
- DR-PS-VDI-GP has UPM/Profile Redirection policies configured pointing to cluster share //FS-HO/DR-Profiles, which is locally hosted in HO (replicated from DR). It also has security filtering configured in the Delegation tab so that it applies ONLY when members of the DR-Users group log in.
DR-VDI-OU hosting VDI machines in DR Secondary Site has two group policies (Loopback Processing) linked to it:
- DR-SS-VDI-GP has UPM/Profile Redirection policies configured pointing to cluster share //FS-DR/DR-Profiles, which is locally hosted in DR. It also has security filtering configured in the Delegation tab so that it applies ONLY when members of the DR-Users group log in.
- HO-SS-VDI-GP has UPM/Profile Redirection policies configured pointing to cluster share //FS-DR/HO-Profiles, which is locally hosted in DR (replicated from HO). It also has security filtering configured in the Delegation tab so that it applies ONLY when members of the HO-Users group log in.
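To make the four policies easier to reason about, here is a minimal Python sketch that models them as a lookup table. The policy names, OUs, groups, and share paths come from the description above; the `ProfilePolicy` class and the `resolve_policy` function are hypothetical helpers for illustration only, they are not part of any Citrix or Microsoft tooling and they do not reproduce real Group Policy processing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfilePolicy:
    name: str          # group policy name as used in this post
    linked_ou: str     # OU the policy is linked to
    filter_group: str  # security group the policy is filtered to
    share: str         # profile/data share the policy points to


# The four policies described above, one row per policy.
POLICIES = [
    ProfilePolicy("HO-PS-VDI-GP", "HO-VDI-OU", "HO-Users", "//FS-HO/HO-Profiles"),
    ProfilePolicy("DR-PS-VDI-GP", "HO-VDI-OU", "DR-Users", "//FS-HO/DR-Profiles"),
    ProfilePolicy("DR-SS-VDI-GP", "DR-VDI-OU", "DR-Users", "//FS-DR/DR-Profiles"),
    ProfilePolicy("HO-SS-VDI-GP", "DR-VDI-OU", "HO-Users", "//FS-DR/HO-Profiles"),
]


def resolve_policy(desktop_ou: str, user_groups: set) -> ProfilePolicy:
    """Return the single policy that matches the desktop OU and the user's groups."""
    matches = [
        p for p in POLICIES
        if p.linked_ou == desktop_ou and p.filter_group in user_groups
    ]
    if len(matches) != 1:
        # Zero matches: the user is in neither group. Two matches: the user is
        # in both groups, which this design assumes never happens.
        raise LookupError(f"expected exactly one policy, found {len(matches)}")
    return matches[0]


if __name__ == "__main__":
    policy = resolve_policy("HO-VDI-OU", {"DR-Users"})
    print(policy.name, "->", policy.share)
```

The sketch also makes one assumption of the design explicit: a user should belong to exactly one of the two site groups, otherwise both policies linked to an OU would apply.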
Challenge:
So why do we have different user groups, different OUs, and different group policies pointing to different file shares!? Why is the cluster file share path used and not the DFS namespace path!? Why do we have group policy security filtering configured!?
Our aim is to have no single point of failure for VDI users in either site, seamlessly, without downtime or manual intervention, while maintaining an optimal experience. To guarantee this we need to discuss what makes up a VDI session and move on from there. From a user perspective, as long as the OS is delivered with the same profile (customization settings), data (Desktop, Documents, Home Folder), and backend applications, there is nothing to notice or complain about, so the VDI environment is considered functional.
In order to provide this optimal seamless experience we must make sure that the user profile/data is accessed locally as long as the site is up and does not go through the WAN. The WAN link bandwidth will only hold so much until it eventually breaks, and even if it does not break it would provide a user experience so bad that the user would not be able to work.
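A rough back-of-the-envelope calculation shows why. The sketch below assumes the link is shared evenly between users and ignores protocol overhead and any other traffic on the tunnel. The numbers are illustrative assumptions, not measurements: I read the "50MB" link as 50 Mbit/s, and the 50 MB profile size and user counts are made up for the example.

```python
def seconds_to_load_profile(link_mbit_per_s: float,
                            concurrent_users: int,
                            profile_megabytes: float) -> float:
    """Time for one user to pull a profile when the link is shared evenly."""
    if link_mbit_per_s <= 0 or concurrent_users <= 0:
        raise ValueError("link speed and user count must be positive")
    per_user_mbit_per_s = link_mbit_per_s / concurrent_users
    # Megabytes to megabits (x8), then divide by each user's share of the link.
    return profile_megabytes * 8 / per_user_mbit_per_s


if __name__ == "__main__":
    # Assumed values: 50 Mbit/s link, 50 MB profile per user.
    for users in (10, 100, 1000):
        seconds = seconds_to_load_profile(50, users, 50)
        print(f"{users:>5} users: {seconds / 60:.1f} min per profile")
```

Under these assumptions, 100 users logging in at once would each wait 800 seconds for the profile alone, before any folder redirection traffic is counted.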
To guarantee that the user is always assigned a virtual desktop, profile, and data from the same site, and therefore gets a usable experience, we first need to identify each site's user group for entitlements, identify the VDI computer accounts in each site, identify the clustered file share holding the user profiles/data in each site, and logically group all of these components when a VDI machine is delivered to an end user.
The aim is for group policy to point the user to the profile/data local to the site they are logging in to, hence the use of security filtering on different group policies pointing to different file shares instead of the unified DFS namespace.
Logic:
The HO site has a dedicated OU that holds all VDI machines local to the HO site infrastructure. The HO site also has a user group that includes all users local to the HO site. The VDI OU in HO has two group policies that apply when a user logs in, and each of these group policies points to a different file share for profiles/data depending on the AD security group membership of the user logging in.
HO User logging in to HO Site (Local Site):
- “Mo”, a member of the HO-Users security group, logs in to a VDI machine in HO.
- VDI machines in HO are part of the HO-VDI-OU organizational unit.
- HO-VDI-OU has two group policies linked to it (HO-PS-VDI-GP & DR-PS-VDI-GP).
- HO-PS-VDI-GP has security filtering to apply only to the HO-Users group, and DR-PS-VDI-GP has security filtering to apply only to DR-Users.
- Since “Mo” is a member of HO-Users, the group policy named HO-PS-VDI-GP will apply.
- HO-PS-VDI-GP points UPM/Data to clustered share //FS-HO/HO-Profiles . This file share cluster is local to the HO site, so the experience is optimal for the user.
- Any change made by “Mo” is replicated to the DR folder //FS-DR/HO-Profiles .
DR User logging in to HO Site (Failover):
- “Omar”, a member of the DR-Users security group, logs in to a VDI machine in HO.
- VDI machines in HO are part of the HO-VDI-OU organizational unit.
- HO-VDI-OU has two group policies linked to it (HO-PS-VDI-GP & DR-PS-VDI-GP).
- HO-PS-VDI-GP has security filtering to apply only to the HO-Users group, and DR-PS-VDI-GP has security filtering to apply only to DR-Users.
- Since “Omar” is a member of DR-Users, the group policy named DR-PS-VDI-GP will apply.
- DR-PS-VDI-GP points UPM/Data to clustered share //FS-HO/DR-Profiles, which is being replicated from the DR site (the actual main site of DR user Omar). This file share cluster is local to the HO site, so the experience is optimal for the user.
- Any change made by “Omar” is replicated back to the main DR folder //FS-DR/DR-Profiles .
The DR site has a dedicated OU that holds all VDI machines local to the DR site infrastructure. The DR site also has a user group that includes all users local to the DR site. The VDI OU in DR has two group policies that apply when a user logs in, and each of these group policies points to a different file share for profiles/data depending on the AD security group membership of the user logging in.
DR User logging in to DR Site (Local Site):
- “Omar”, a member of the DR-Users security group, logs in to a VDI machine in DR.
- VDI machines in DR are part of the DR-VDI-OU organizational unit.
- DR-VDI-OU has two group policies linked to it (DR-SS-VDI-GP & HO-SS-VDI-GP).
- DR-SS-VDI-GP has security filtering to apply only to the DR-Users group, and HO-SS-VDI-GP has security filtering to apply only to HO-Users.
- Since “Omar” is a member of DR-Users, the group policy named DR-SS-VDI-GP will apply.
- DR-SS-VDI-GP points UPM/Data to clustered share //FS-DR/DR-Profiles . This file share cluster is local to the DR site, so the experience is optimal for the user.
- Any change made by “Omar” is replicated to the HO folder //FS-HO/DR-Profiles .
HO User logging in to DR Site (Failover):
- “Mo”, a member of the HO-Users security group, logs in to a VDI machine in DR.
- VDI machines in DR are part of the DR-VDI-OU organizational unit.
- DR-VDI-OU has two group policies linked to it (DR-SS-VDI-GP & HO-SS-VDI-GP).
- DR-SS-VDI-GP has security filtering to apply only to the DR-Users group, and HO-SS-VDI-GP has security filtering to apply only to HO-Users.
- Since “Mo” is a member of HO-Users, the group policy named HO-SS-VDI-GP will apply.
- HO-SS-VDI-GP points UPM/Data to clustered share //FS-DR/HO-Profiles, which is being replicated from the HO site (the actual main site of HO user Mo). This file share cluster is local to the DR site, so the experience is optimal for the user.
- Any change made by “Mo” is replicated back to the main HO folder //FS-HO/HO-Profiles .
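The four walkthroughs above follow one pattern: the profile share is named after the site the user logs in to plus the user's home site, and DFS-R copies changes to the same folder name on the other site's file server cluster. Here is a small Python sketch of that pattern. The `profile_share` and `replica_share` functions are hypothetical names used only for illustration, and the sketch assumes exactly two sites as in this post.

```python
SITES = ("HO", "DR")


def _check_site(site: str) -> None:
    if site not in SITES:
        raise ValueError(f"unknown site: {site!r}")


def profile_share(login_site: str, home_site: str) -> str:
    """Share the group policy points to: always on the login site's file cluster."""
    _check_site(login_site)
    _check_site(home_site)
    return f"//FS-{login_site}/{home_site}-Profiles"


def replica_share(login_site: str, home_site: str) -> str:
    """Folder on the other site that DFS-R keeps in sync with profile_share()."""
    _check_site(login_site)
    _check_site(home_site)
    other_site = "DR" if login_site == "HO" else "HO"
    return f"//FS-{other_site}/{home_site}-Profiles"


if __name__ == "__main__":
    # Mo's home site is HO and Omar's home site is DR, as in the walkthroughs.
    for user, home in (("Mo", "HO"), ("Omar", "DR")):
        for login in SITES:
            kind = "local" if login == home else "failover"
            print(f"{user} logs in at {login} ({kind}): "
                  f"{profile_share(login, home)} replicates to "
                  f"{replica_share(login, home)}")
```

In every case the share being written to sits in the site the user is logged in to, which is the whole point of not using the DFS namespace path.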
Considerations:
- A user must NOT NOT NOT have multiple sessions open from different sites at the same time, since each site will point to the same replicated folder from its end, thus causing corruption. We make sure of that in the configuration listed above; nevertheless, do note that DFS-R cannot handle multiple writes at the same time from two ends of the same replicated folder.
- Force the user to log off upon disconnection. If for some reason the user physically moves to the other site, the user will be logged in to the nearest site, which is different from where the session is currently open. The user then gets a new virtual desktop that loads the profile from the local share, so we end up with two open profiles on the same replicated folder from different sites, which will corrupt the profile share because of the DFS-R limitation discussed earlier.
- Depending on how users will connect and how often, it is advisable that DFS-R replication is always on and not scheduled (this depends highly on environment variables and is not a core requirement). Also make sure DFS replication is full mesh.
- When using the cluster share path rather than the DFS namespace, we are assuming that the whole VDI site is down if the file server cluster is down. That will mostly be the case, but not always, so it is a consideration: if the cluster fails while the VDI environment is still up, the UPM/Profile group policy is not pointing to the DFS namespace, so it will not refer you to the DFS folder in DR. To me this is a requirement, as having thousands of users pull data over a limited shared WAN connection is not production sound.
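The first two considerations boil down to one rule: a user must never hold sessions in both sites at once. As a minimal sketch of how that rule could be checked, the Python function below takes a list of (user, site) pairs and reports any user seen in more than one site. How you collect the session list (for example, an export from each site's monitoring) is left open; the input format and the function name are assumptions for illustration, and no Citrix API is called here.

```python
from collections import defaultdict


def users_with_sessions_in_multiple_sites(sessions):
    """Map each offending user to the sorted list of sites they have sessions in.

    `sessions` is an iterable of (user, site) pairs, one pair per open session.
    """
    sites_by_user = defaultdict(set)
    for user, site in sessions:
        sites_by_user[user].add(site)
    return {
        user: sorted(sites)
        for user, sites in sites_by_user.items()
        if len(sites) > 1
    }


if __name__ == "__main__":
    # Hypothetical session list: Mo has a disconnected session left open in HO
    # and has just logged in again at DR.
    open_sessions = [("Mo", "HO"), ("Omar", "DR"), ("Mo", "DR")]
    for user, sites in users_with_sessions_in_multiple_sites(open_sessions).items():
        print(f"WARNING: {user} has sessions in {', '.join(sites)}")
```

A check like this only reports the problem after the fact. Forcing logoff on disconnect, as described above, is what actually prevents it.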
Note to Citrix:
I would like the WEM (Workspace Environment Manager) product team at Citrix to add the capability of assigning different UPM and Folder Redirection configuration policies to different user groups based on conditions/filters. As of now, UPM/FR are configured per WEM site.
VMware UEM can apply conditions to Folder Redirection policies, so that is an advantage. However, because Citrix WEM is not a profile management solution by itself, adding conditions/assignments/filters to its UPM and FR configuration would make it a much better tool for our scenario, since we could rely on it completely rather than on four group policies with security filtering. With VMware UEM that is not possible, since the UEM config is pulled from GP directly.
Conclusion:
I could have just specified the configuration and been on my way, leaving you to figure it out, but I really tried my best to explain the logic behind this, so I hope the Albert Einstein quote doesn’t hit me in the face: “If you can’t explain it simply, you don’t understand it well enough.”
Salam .
Great write-up. We are investigating putting our DR in the cloud. Your drawing shows some of the things I was not thinking about when it comes to sending data to both sites. DR users might have been hosed. I was also presented with the requirement of having a group of users always at the DR site. We are looking at Amazon, Azure, and CenturyLink. Would love to pick your brain on what you think would be the best solution. We are running Citrix on top of VMware. 2500 users.
Thanks
Bill
Solutions Architect
Hi Bill,
Having a group always at the DR site to me is always an easier approach for Active-Active scenarios rather than having users connect all over the place every time they try to login. I am an Azure fan but technically speaking no cloud component would interfere with your DR setup since all components are cloud irrelevant except your WAN link. Aside from Base Images being replicated for the sake of not having to apply changes multiple times to the DR site, a complete environment would need to be built on the cloud. I prefer Azure Traffic Manager over NetScaler GSLB when NS for DR is deployed on Azure because of its simplicity and NetScaler considerations on Azure. The challenge here is having an adequate link that can replicate your profiles and data to the cloud and back, this is where Azure Express Route is a very good idea.
Hello Bill,
Thanks for the very nice solution. I have the following scenario.
Active/Active deployment:-
On-prem Production is running (XenApp 7.15 CU2, no XenDesktop).
Now a DR site is to be set up (XenApp 7.15 CU2, no XenDesktop).
The DR site is to sync with the on-prem site (1 Citrix database with 2 sites, mirror or Always On).
Also, there is a requirement that needs to extend application on Cloud (Azure and Oracle Cloud). (Azure express route will be configured)
Client proposal is to extend on Prem site with Zones. On prem would be primary zone and Azure and Oracle cloud will be satellite zones.
Can we achieve this with the solution which you have provided? What are things to be considered?
Salam
I would like to speak to you about Citrix deployment between 3 different datacenters in Active-Active mode.
let me know if you are interested in discussing this design
Salam Haytham, sure, send me what's on your mind on the contact us page and I will email you back from my personal email, and maybe we can have a call to discuss further. Thanks.