VDI (Virtual Desktop Infrastructure) has finally taken off as it should have years ago, making 2017 arguably the year of VDI. Rapid advancements in technology (HCI, cloud, graphics, SDN, endpoints, automation, security …), in pricing, and in the workspace itself have matured enough to place VDI at the forefront of business/IT demand.
VDI deployments are no walk in the park: so many components are involved in a VDI environment that everything has to be correctly in place for a successful result. Still, the post-deployment operational advantages outweigh the hectic pre-deployment work, so planning is of the essence.
That being said, "with great power comes great responsibility": giving your users the luxury of an anywhere, anytime, any-device workspace and then taking it away for any reason, be it a catastrophe or a minor datacenter glitch, is not an option in the world we live in and the workspace environment we have become accustomed to.
This is where multi-site Active-Active or Active-Passive disaster recovery VDI design and deployment comes into play. Initial design planning for VDI availability between sites/regions should be well thought out, as the virtualization, storage, networking, and security components tie together in a very specific configuration to provide a highly available environment with minimal downtime, if any.
Multi-site high availability for VDI comes in different forms and flavors, as it depends on many variables, especially networking. I am going to cover one specific scenario, though I have worked on others; nevertheless, it should be a good starting point for an availability strategy in different environments.
First things first: the scenario I am going to describe and deploy is NOT supported by Microsoft with regard to profile/data folder replication. The reason is that DFS Replication cannot handle simultaneous writes to a replicated folder on both ends of the namespace, which causes sync/replication issues. Because I am looking to configure a true active-active scenario, the trick is to make sure that users can never open or write data from two different sites at the same time, and so we shall.
From an infrastructure perspective, both sites are part of the same domain, with each site having a couple of Global Catalog domain controllers. A Microsoft DFS-R (Distributed File System Replication) namespace is configured across both sites, using one two-node file server cluster in the Head Office (HO) site and one two-node file server cluster in the Disaster Recovery (DR) site. Each site has a DHCP server and an independent vCenter/vSphere deployment.
We have two independent sites connected with an IPsec tunnel over a 50 Mb WAN link. Each site has an application delivery controller (F5 or NetScaler) acting as a load balancer and Global Server Load Balancer (GSLB). Two GSLB virtual servers exist, one for external users and one for internal users, both configured with site proximity. Two VMware Access Points are deployed at each site, load balanced by the F5 or NetScaler and acting as access gateways to each VDI environment. Again, all of these components are deployed and configured independently at each site.
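To make the site-proximity behavior concrete, here is a minimal sketch of what the GSLB is doing on our behalf: it hands the client the entry point of whichever site is "closest" (here modeled as lowest round-trip time). The site names and RTT values are illustrative assumptions, not taken from a real deployment.

```python
# Hypothetical sketch of GSLB site-proximity selection: the client is sent
# to whichever site answers with the lowest measured round-trip time.
# Site names ("HO", "DR") and RTT values are illustrative assumptions.

def pick_site(rtt_ms: dict) -> str:
    """Return the site with the lowest round-trip time to the client."""
    return min(rtt_ms, key=rtt_ms.get)

# An external user geographically closer to Head Office lands in HO:
print(pick_site({"HO": 18.0, "DR": 73.0}))   # HO
# If HO stops answering health checks, only DR is advertised:
print(pick_site({"DR": 73.0}))               # DR
```

The same selection logic serves both GSLB virtual servers; the external and internal ones simply advertise different addresses.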
Each site hosts two load-balanced Connection Servers, one Composer server, and two load-balanced App Volumes/UEM management consoles. Each site has a pool of virtual desktops created from master images that are prepared in HO and replicated to DR to avoid preparing images twice; only base images are replicated. Nevertheless, desktop pools are independently provisioned in each site into different OUs (Organizational Units). Each OU has a Group Policy linked to it that points VDI machines to the UEM configuration file share hosted locally in its own site. The same UEM and App Volumes instances could be shared between sites, which would require some additional configuration and a SQL dependency, but I am trying to keep this a simpler configuration that can be built upon, so configure them as totally independent.
UEM profiles and data (folder redirection) folders in HO are replicated through DFS-R to DR. This is the only shared configuration between the two sites. In HO the file cluster is named "SF1" and the shared folder hosting UEM profiles/data is called "FS1". In DR the file cluster is named "SF2" and the shared folder receiving the replicated profiles/data from HO is called "FS2". The DFS namespace name could be anything, because we are not going to use it to connect to our folders; we only need it to configure replication. We now have two shares replicating between \\SF1\FS1 in HO and \\SF2\FS2 in DR, holding user profiles/data. The UEM configuration file share is not replicated, only profiles and folder redirection data. Don't forget loopback processing …
UEM in HO contains a user environment policy, "Folder Redirection", which redirects (Desktop, Documents, Favorites, …) to \\SF1\FS1. This folder redirection policy in HO has a condition configured that instructs UEM to apply the policy only when users log in to computer accounts in the HO VDI OU.
UEM in DR contains a user environment policy, "Folder Redirection", which redirects (Desktop, Documents, Favorites, …) to \\SF2\FS2. This folder redirection policy in DR has a condition configured that instructs UEM to apply the policy only when users log in to computer accounts in the DR VDI OU.
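The effect of those two conditioned policies can be sketched as a simple lookup: the folder-redirection target follows the OU of the computer the user logged in to, never the user's account. The OU distinguished names below are hypothetical placeholders; the share paths mirror the example above.

```python
# Illustrative model of the UEM condition logic: the redirection target is
# chosen by the OU of the VDI computer, so the same user gets \\SF1\FS1 in
# HO and \\SF2\FS2 in DR. OU names here are hypothetical placeholders.

REDIRECTION_BY_OU = {
    "OU=HO-VDI,DC=example,DC=com": r"\\SF1\FS1",
    "OU=DR-VDI,DC=example,DC=com": r"\\SF2\FS2",
}

def redirection_target(computer_ou: str) -> str:
    """Return the folder-redirection share for the computer's OU."""
    try:
        return REDIRECTION_BY_OU[computer_ou]
    except KeyError:
        raise ValueError("no folder-redirection policy matches " + computer_ou)

print(redirection_target("OU=HO-VDI,DC=example,DC=com"))  # \\SF1\FS1
print(redirection_target("OU=DR-VDI,DC=example,DC=com"))  # \\SF2\FS2
```

Because both shares hold the same replicated content, whichever branch fires, the user sees the same profile and data.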
Horizon Cloud Pod Architecture is configured with the Head Office site and the Disaster Recovery site added to the federation. Assign users to a global entitlement so that they see only a single icon. Home sites can be assigned to users who are fixed to one location. Now this drawing should make a bit more sense.
An external user logs in to the external GSLB address of the VDI environment.
Proximity Load Balancing points the user to the nearest geographical site.
The user launches a pooled or session-based desktop.
While logging in, UEM recognizes that the computer account is in the local site's OU and applies the UEM profile and folder redirection policy pointing to the DFS-R share in the same site.
Any profile/data change made by the user is replicated to the second site's file share by DFS-R.
In DFS-R, the last write always wins, so even if the user logs off and is redirected to the second site, any change there will again be replicated back to the other site, and so on.
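The steps above can be sketched as a toy simulation of the two shares and DFS-R's last-writer-wins behavior. This is a simplified model, not real DFS-R: timestamps are a logical counter, replication is an explicit function call, and the file paths are made up; only the share names mirror the example.

```python
from itertools import count

# Toy model of the two replicated shares and DFS-R's last-writer-wins rule.
# Share names mirror the article's example; everything else is illustrative.

_clock = count()  # logical timestamps: every write gets a strictly larger tick

class Share:
    def __init__(self, name: str):
        self.name = name
        self.files = {}  # path -> (tick, content)

    def write(self, path: str, content: str) -> None:
        self.files[path] = (next(_clock), content)

def replicate(a: "Share", b: "Share") -> None:
    """Converge two shares: for each path, the newest write wins on both ends."""
    for path in set(a.files) | set(b.files):
        newest = max(
            (s.files[path] for s in (a, b) if path in s.files),
            key=lambda entry: entry[0],
        )
        a.files[path] = b.files[path] = newest

fs1, fs2 = Share(r"\\SF1\FS1"), Share(r"\\SF2\FS2")
fs1.write("user1/Documents/todo.txt", "written in HO")  # user works in HO
replicate(fs1, fs2)                                     # DFS-R copies to DR
fs2.write("user1/Documents/todo.txt", "edited in DR")   # user logs off, lands in DR
replicate(fs1, fs2)                                     # newest write wins everywhere
print(fs1.files["user1/Documents/todo.txt"][1])         # edited in DR
```

As long as writes to a given user's profile alternate between sites rather than overlap, both shares converge on the latest version, which is exactly the guarantee the rest of the configuration exists to enforce.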
Remember that all we need for an active-active multi-site VDI configuration is a consistent user experience and consistent data. The most important, and honestly the only, core requirement is that the user sees the same profile and data when logging in to either site, making it a seamless experience. All other components, replicated or not, do not affect the user experience, which is the ultimate goal of any VDI environment.
Users must NOT, NOT, NOT have multiple sessions open from different sites at the same time, since each site would write to the same replicated folder from its own end, causing corruption. The configuration above makes sure of that. Nevertheless, do note that DFS-R cannot handle simultaneous writes to both ends of the same replicated folder; that is why this setup is not supported by Microsoft. It works reliably only when everything is configured so that a user's open sessions always point to a single end of the replicated folder.
Force users to log off upon disconnection, because if for some reason a user moves to another location, they will be directed to the nearest site, which may differ from where their session is still open. They would then get a new virtual desktop that loads the profile from the local share, leaving two open profiles on the same replicated folder from different sites, which will corrupt the profile share through the DFS-R replication discussed earlier.
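The invariant that forced logoff protects can be expressed as a small guard: never broker a session for a user while another site still holds one. This is a hypothetical illustration of the rule, not a Horizon API; the function and data structure are invented for the sketch.

```python
# Hypothetical guard illustrating why forced logoff on disconnect matters:
# refuse to broker a session while another site still holds one (open or
# merely disconnected) for the same user. Not a real Horizon API.

open_sessions = {}   # user -> site currently holding a session

def request_session(user: str, site: str) -> str:
    current = open_sessions.get(user)
    if current is not None and current != site:
        raise RuntimeError(
            f"{user} still has a session in {current}; "
            f"it must be logged off before brokering one in {site}"
        )
    open_sessions[user] = site
    return f"{user} connected in {site}"

print(request_session("user1", "HO"))   # user1 connected in HO
del open_sessions["user1"]              # forced logoff on disconnect
print(request_session("user1", "DR"))   # user1 connected in DR
```

Without the forced logoff (the `del` line), the second request would be refused, which is exactly the situation a lingering disconnected session would otherwise create against the replicated profile share.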
Depending on how users connect and how often, it is advisable to keep DFS-R replication always on rather than scheduled (this depends heavily on environment variables and is not a core requirement). Also make sure the DFS Replication topology is full mesh.