Citrix XenDesktop 7 VDI Active-Active/Passive Multi-Site Disaster Recovery Part 2
Introduction:
Finally, VDI (Virtual Desktop Infrastructure) has taken off as it should have years ago, making 2017 undisputedly the year of VDI. Recent and rapid advancements in technology (HCI, cloud, graphics, SDN, endpoints, automation, security, ...), pricing, and the workspace have matured enough to put VDI at the forefront of business and IT demand.
VDI deployments are no walk in the park. So many components are involved in a VDI environment that everything has to be correctly in place for a successful result. Still, the post-deployment operational advantages do outweigh the hectic work required before deployment, so planning is of the essence.
That being said, “with great power comes great responsibility”. Giving your users the luxury of an anywhere, anytime, any-device workspace and then taking it away for any reason, be it a catastrophe or a minor datacenter glitch, is not an option in the world we live in and the workspace environment we have become accustomed to.
This is where the design and deployment of a multi-site active-active or active-passive disaster recovery VDI infrastructure comes into play. Initial design planning for VDI availability between sites/regions should be well thought out, as virtualization, storage, networking, and security components tie together with a very specific configuration to provide a highly available environment with minimal downtime, if any.
Scenario:
Multi-site high availability for VDI comes in different forms and flavors, as it depends on many variables, especially networking. I describe one specific scenario here, though I have worked on different ones. Nevertheless, it should be a good starting point for an availability strategy in different environments.
First things first: the scenario I am going to cover and deploy is NOT supported by Microsoft with regard to profile/data folder replication. The reason is that DFS Replication is not capable of handling simultaneous writes to a replicated folder on both ends of the namespace, which causes sync/replication issues. Because I am looking to configure a true active-active scenario, the trick here is to make sure that users are not able to open/write data from 2 different sites at the same time, and so we shall.
From an infrastructure perspective, both sites are part of the same domain, with each site having a couple of GC domain controllers. A Microsoft DFS-R (Distributed File System Replication) namespace is configured for both sites, using one file server cluster with 2 file servers in the Head Office (HO) site and one file server cluster with 2 file servers in the Disaster Recovery (DR) site. Each site has a DHCP server and an independent vCenter/vSphere (or any supported hypervisor) deployment.
We have 2 independent sites connected by an IPsec tunnel with 50MB bandwidth on the WAN link. Each site has an application delivery controller (NetScaler) acting as a load balancer and Global Server Load Balancer (GSLB). 2 GSLB virtual servers exist, one for external users and one for internal users, both configured with site proximity. 2 Citrix StoreFront servers are deployed at each site, load balanced by NetScaler and acting as the ICA entry points to each VDI environment. Again, all of these components are deployed and configured independently at each site. The internal GSLB is configured with the StoreFront (SF) virtual servers, not Access Gateway, so if HO fails, users connect directly to the load-balanced SF in DR through the IPsec tunnel and are provided a virtual desktop from DR. If HO is completely down, including networking, users connect to the external GSLB instead. Note that Optimal Gateway routing is not configured, as the two sites are independent and share no StoreFront or XenDesktop configuration. Again, there are many ways this can be deployed depending on scenario requirements.
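The failover path for an internal user described above can be summarized as a small decision function. This is only a sketch of the reasoning, not NetScaler configuration: the `SiteStatus` type, its fields, and the returned labels are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class SiteStatus:
    """Hypothetical health summary for one site."""
    network_up: bool  # site networking (and its IPsec tunnel endpoint) is reachable
    vdi_up: bool      # StoreFront/VDI services at the site are healthy


def internal_entry_point(ho: SiteStatus, dr: SiteStatus) -> str:
    """Return the path an internal user takes, following the scenario above."""
    if ho.network_up and ho.vdi_up:
        # Normal case: the internal GSLB hands out the local StoreFront.
        return "HO StoreFront via internal GSLB"
    if ho.network_up and dr.network_up and dr.vdi_up:
        # HO VDI failed but HO networking is fine: reach DR over the tunnel.
        return "DR StoreFront via IPsec tunnel"
    if dr.network_up and dr.vdi_up:
        # HO is completely down, including networking: go through the external GSLB.
        return "DR via external GSLB"
    return "no site available"


if __name__ == "__main__":
    print(internal_entry_point(SiteStatus(True, False), SiteStatus(True, True)))
```

In the real deployment these decisions are made by the GSLB monitors and DNS responses, so the function is just a way to check the design against each failure case.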
Each site hosts 2 load-balanced StoreFront servers, 2 delivery controllers, and 2 load-balanced Director/WEM (Workspace Environment Manager) infrastructure servers. Each site has a pool of virtual desktops created from master images that were prepared in HO and replicated to DR to avoid preparing images twice. Only the base images are replicated; the desktop pools are provisioned independently in each site into different OUs (Organizational Units). Each OU has a group policy linked to it that points the VDI machines to the WEM server hosted in their local site.
UPM (User Profile Manager) profiles and data (Folder Redirection) folders in HO are replicated through DFS-R to DR. This is the only shared configuration between both sites. In HO, the file cluster is named “SF1” and the shared folder hosting UPM profiles and Folder Redirection data is called “FS1”. In DR, the file cluster is named “SF2” and the shared folder that receives the replicated profiles/data from HO is called “FS2”. The DFS-R namespace could be named anything because we are not going to use it to connect to our folders; we just need it to configure replication. Now we have 2 shares holding user profiles/data, replicating from //SF1/FS1 in HO to //SF2/FS2 in DR. Don’t forget loopback processing. Both UPM and Folder Redirection policies are applied using Citrix WEM, not group policy.
WEM in HO contains a Microsoft USV Settings policy (Folder Redirection) that covers Desktop, Documents, Favorites, and so on, plus a UPM profile configuration pointing to //SF1/FS1. This policy applies to computer accounts (VDI virtual desktops) created inside the HO-VDI-OU organizational unit, since the WEM GP is assigned/enforced on that OU only.
WEM in DR contains a Microsoft USV Settings policy (Folder Redirection) that covers Desktop, Documents, Favorites, and so on, plus a UPM profile configuration pointing to //SF2/FS2. This policy applies to computer accounts (VDI virtual desktops) created inside the DR-VDI-OU organizational unit, since the WEM GP is assigned/enforced on that OU only.
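The effect of the two WEM configurations can be pictured as a lookup from the desktop's OU to the local share. The sketch below is only an illustration of that mapping, not how WEM works internally. The OU and share names come from the scenario above, while the function name and the example distinguished names are assumptions.

```python
# OU name -> profile/data share used by desktops in that OU (from the scenario above).
SITE_SHARES = {
    "HO-VDI-OU": "//SF1/FS1",
    "DR-VDI-OU": "//SF2/FS2",
}


def profile_share_for(computer_dn: str) -> str:
    """Return the share a desktop should use, based on the OU in its DN.

    Uses a naive comma split, so it does not handle escaped commas in a DN.
    """
    for rdn in computer_dn.split(","):
        key, _, value = rdn.strip().partition("=")
        if key.upper() == "OU" and value in SITE_SHARES:
            return SITE_SHARES[value]
    raise LookupError(f"No VDI OU found in: {computer_dn}")


if __name__ == "__main__":
    # Hypothetical computer accounts, one per site.
    print(profile_share_for("CN=VDI-HO-001,OU=HO-VDI-OU,DC=example,DC=com"))
    print(profile_share_for("CN=VDI-DR-001,OU=DR-VDI-OU,DC=example,DC=com"))
```

The point of the design is that the share depends on where the desktop lives, never on the user, so a user always writes to the share that is local to the session.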
Login sequence for external users would be as follows:
- An external user logs in to the external GSLB address of the VDI environment.
- Proximity load balancing points the user to the geographically nearest site.
- The user launches a pooled or session-based desktop.
- During logon, WEM recognizes that the computer account is in the local site OU and applies the UPM profile and Folder Redirection policy pointing to the DFS-R share in the same site.
- Any profile/data change made by the user is replicated to the second site's file share using DFS-R.
- In DFS-R the last write always wins, so even if the user logs off and is redirected to the second site, any change made there is also replicated back to the other site, and so on.
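To see why "last write wins" is fine for sequential sessions but dangerous for simultaneous ones, here is a deliberately simplified model of the conflict rule. It is a toy illustration and not DFS-R itself; the `FileVersion` type and `resolve_conflict` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FileVersion:
    site: str        # site where this copy was written
    modified: float  # time of the last write (arbitrary units)
    content: str


def resolve_conflict(a: FileVersion, b: FileVersion) -> FileVersion:
    """Keep the copy with the later write; the other copy loses."""
    return a if a.modified >= b.modified else b


if __name__ == "__main__":
    # Two sessions open at the same time, each editing its own copy of the file.
    ho_copy = FileVersion("HO", 100.0, "base + edit A")
    dr_copy = FileVersion("DR", 101.0, "base + edit B")
    winner = resolve_conflict(ho_copy, dr_copy)
    # The DR copy wins, so "edit A" never reaches the surviving file.
    print(winner.site, "->", winner.content)
```

When sessions are strictly sequential, the second site already holds the replicated result of the first session before the user writes again, so nothing is lost.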
Remember that all we need for an active-active multi-site VDI configuration is a consistent user experience and consistent data. The most important, and honestly the only, core requirement is for the user to see the same profile and data when logging in to any site, making it a seamless experience. All other components, replicated or not, do not affect the user experience, which is the ultimate goal of any VDI environment.
Considerations:
- A user must NOT have multiple sessions open from different sites at the same time, since each site points to the same replicated folder from its end, and that causes corruption. The configuration listed above makes sure of that. Nevertheless, do note that DFS-R cannot handle simultaneous writes from two ends of the same replicated folder, which is why this is not supported by Microsoft. It does work 100% when everything is configured so that a user's open sessions point to only one of the replicated folders.
- Force the user to log off upon disconnection. If for some reason the user moves to another location, the user will be logged in to the nearest site, which is different from the one where the session is still open. The user then gets a new virtual desktop that loads the profile from the local share, so we end up with 2 open profiles on the same replicated folder from different sites, which will corrupt the profile share, as discussed earlier for DFS-R.
- Depending on how users connect and how often, it is advisable to keep DFS-R replication always on rather than scheduled (this depends highly on environment variables and is not a core requirement). Also make sure DFS Replication is full mesh.
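As a safety net for the first consideration, the active session lists of both sites can be compared to spot users who are logged in to both at once. The sketch below assumes the user names have already been exported from each site (for example from Director or the delivery controllers). How that export is done is not shown, and the function name is hypothetical.

```python
def users_in_both_sites(ho_sessions, dr_sessions):
    """Return users (lower-cased, sorted) who have a session in both sites."""
    ho_users = {name.lower() for name in ho_sessions}
    dr_users = {name.lower() for name in dr_sessions}
    return sorted(ho_users & dr_users)


if __name__ == "__main__":
    # Hypothetical session exports from each site.
    ho = ["alice", "Bob", "dave"]
    dr = ["bob", "carol"]
    for user in users_in_both_sites(ho, dr):
        print(f"WARNING: {user} has sessions open in both HO and DR")
```

Any user reported here is writing to both replicated folders at the same time and should be logged off from one site before the profile gets damaged.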
Salam.
This is only good for pooled random catalogs, I guess. What about dedicated VDIs? Can you please suggest a DR solution for them?
Ansar, thank you for your comment. A seamless active-active DR setup for dedicated virtual desktops is not possible. There has to be some manual intervention to fail over to the secondary site, mainly because the profile/data is local to every VM, which means we cannot have a different pool, hostname, or OU for the user's VDI VM unless dedicated desktops are used with a centralized profile management solution. For an active-passive scenario, I would replicate the VMs to DR (vSphere Replication for VMware, Hyper-V Replica for Hyper-V), create a new dedicated pool in the DR XD site, add the replicated VMs to the pool as existing VMs, and then assign the same users to the replicated VMs. There are a lot of considerations here because the data/profile is local to each VM, so reverse replication when HO is restored is something to think about. The bandwidth required to replicate hundreds of dedicated, fully provisioned VMs is another consideration.
What are the advantages of using WEM folder redirection vs AD folder redirection?
It applies the policy much, much faster than GP, especially at login. There is no reliance on the AD/GP team, and you avoid AD/GP conflict and replication issues. The actual folder redirection settings are exactly the same.
Do you have a version for xenapp/xendesktop?
This works irrespective of the version being used. I was using XD 7.9, I believe.
Thanks.
Hi,
Thank you for sharing this great post !
Could you confirm that you use two WEM databases? Is there a way to synchronize them in order to avoid doing assignments twice? (This cannot be scripted, and it is a risk for user experience consistency when the PRA (disaster recovery plan) is engaged.)
Thanks for your answer !
Hi, well, I always like to completely separate components between sites in order to limit dependencies and achieve a true active-active experience, but the fact is this will always incur additional administration overhead. Yes, we use independent WEM server DBs in every site, and whenever a change is made we export the assignments from site 1 and import them to site 2. You can deploy SQL AlwaysOn between sites and point the WEM servers in the secondary site to it if you choose to do so, but I would not. Changes are not that frequent, and having some administration work is not an issue with WEM since it's basically export/import.