VMware SRM: is it the right choice?

Caution: These articles are written for technical, not grammatical, accuracy. If poor grammar offends you, proceed with caution ;-)

VMware SRM is gaining a lot of traction, and many companies are quickly making it the de facto choice for DR in their environments, but is SRM the right choice? For those of you who haven’t had the opportunity to get familiar with SRM (Site Recovery Manager), it is a disaster recovery automation product from VMware that integrates with vCenter. Through SRAs (Storage Replication Adapters), SRM integrates with many storage arrays, which makes it aware of which datastores are replicated. Among its most popular features is the ability to group servers into recovery groups, so you can fail over a group of servers or an entire datacenter. It also lets you run failover tests on those same groups of servers, or on an entire site, without impacting the live production systems. That ability to run DR tests easily and non-disruptively is one of the main reasons companies are implementing SRM, and it has made the product a huge success. SRM also lets you build DR runbook automation through linear workflows that you create to perform the different steps and tasks involved in failing over from the primary site to a secondary site.

All of this is great stuff, right? What could possibly be better than this? What can’t SRM do?

With all of its great features, SRM is not without its limitations. Its biggest limitation is that it can automate and fail over only VMware virtual machines. You cannot use SRM to fail over physical servers or VMs running on other hypervisors. This is a huge limitation, because very few organizations run a 100% VMware virtualized infrastructure. Configuring SRM can also become a daunting task, especially if you are implementing it in a pre-existing environment. When creating recovery groups, all the VMs in a group need to be contained on the same datastore or set of datastores. If the VMs that belong in a recovery group are not already on the same datastores, you will have your work cut out for you deciding where each VM should live and which VMs it should live with. To make better use of replication bandwidth, it also becomes important to split VM drives across separate datastores for OS drives, data drives, log drives, page files, vswap files, and so on. Now the configuration becomes even more complex: splitting up these drives while ensuring that every VM in a recovery group sits on the same datastores can be a huge task. Don’t want to replicate pagefile data to conserve bandwidth? No problem, but be prepared for even more configuration at the secondary site to pre-seed your VMs with pagefiles, which can become a fairly large task in large environments.
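To make the datastore constraint concrete, here is a minimal Python sketch (an illustration only, not anything SRM itself exposes) that takes a hypothetical inventory of which datastores each VM’s drives sit on and works out which VMs are forced to fail over together. It assumes the rule described above: any two VMs that share a datastore must land in the same group, so the groups are the connected components of the VM/datastore graph, found here with a small union-find. The VM and datastore names are made up.

```python
from collections import defaultdict


def failover_units(vm_datastores: dict[str, set[str]]) -> list[set[str]]:
    """Group VMs that transitively share any datastore (union-find)."""
    parent: dict[str, str] = {vm: vm for vm in vm_datastores}

    def find(vm: str) -> str:
        # Walk up to the root, halving the path as we go.
        while parent[vm] != vm:
            parent[vm] = parent[parent[vm]]
            vm = parent[vm]
        return vm

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    # Remember the first VM seen on each datastore; every later VM on the
    # same datastore is merged into that VM's group.
    first_on_datastore: dict[str, str] = {}
    for vm, datastores in vm_datastores.items():
        for ds in datastores:
            if ds in first_on_datastore:
                union(vm, first_on_datastore[ds])
            else:
                first_on_datastore[ds] = vm

    groups: dict[str, set[str]] = defaultdict(set)
    for vm in vm_datastores:
        groups[find(vm)].add(vm)
    return sorted(groups.values(), key=lambda g: sorted(g))


if __name__ == "__main__":
    # Hypothetical inventory with OS, data, and log drives on separate datastores.
    inventory = {
        "web01": {"ds-os-01", "ds-data-01"},
        "web02": {"ds-os-01"},
        "sql01": {"ds-os-02", "ds-data-01", "ds-log-01"},
        "file01": {"ds-os-03"},
    }
    for group in failover_units(inventory):
        print(sorted(group))
```

In this made-up inventory, one shared data datastore is enough to pull sql01 into the same unit as both web servers; every extra split drive is another chance for that to happen.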

What if you want to fail over a single server to the secondary site? With SRM it is possible, but not realistic: each VM would need its own dedicated datastore and its own recovery group, created for each and every VM you want to be able to fail over individually. SRM wasn’t designed for single-VM failures, which makes it an unrealistic option for them. Have multiple datacenters? Want the ability to fail over to more than one datacenter, or to split your servers across different datacenters? This is not currently supported with SRM. SRM is designed around a primary and a secondary datacenter with one-directional DR failover. If you want to fail back to your primary site, be prepared for a few sleepless nights, because once you fail over, your secondary datacenter becomes your primary datacenter. This means you now need new failover groups and a new runbook to perform a DR failover back to your original primary site. The task of failing back and then re-enabling SRM to protect your site would be enough to make me consider a career change just to avoid the hassle.

What other options are available?

While there are many options available today for deploying DR in your environment, it’s important to look for the one that will be the most flexible while keeping the solution simple enough that it doesn’t become a management nightmare. The system needs to support replicating your servers to one or many remote datacenters and allow an automated runbook to handle the failover. With that said, I’m going to focus on one solution I really like, which offers greater flexibility than SRM along with all of its benefits and more.

Syncsort Backup Express is a great backup solution that can help create a highly scalable, flexible, and robust DR solution. Combining it with a workflow automation product such as DynamicOps VRM would let you build one of the most flexible and manageable solutions possible. The difficulty is that, as of today, the two products don’t have out-of-the-box integration, so some up-front customization is required. However, because VRM can be customized and integrated with just about any product, this is not an impossible task. Why these two products? What can they do for me that others can’t, and what are the benefits?

Let me paint a picture

DynamicOps VRM is a highly scalable and flexible workflow automation suite that lets you automate just about anything you want in your virtual (and soon physical) environments. It offers multi-vendor support and is currently compatible with the VMware, Hyper-V, and Xen hypervisors. It allows you to create workflows, from the most basic to the most complex, to automate tasks in your environment. This engine, combined with just about any product, can help you create very elegant solutions. VRM also lets you include multi-step approvals in your workflows, giving you governance over what takes place in your environment. Combine this with its many other features, such as server/desktop provisioning and reclamation, and you have a seriously powerful platform.

Now let’s look at some of the Syncsort Backup Express features. This product is primarily a business continuity and disaster recovery product with some unique features. Backup Express offers traditional backup and recovery with block-level incremental backups, plus its own blend of features that are instrumental in creating a highly scalable, manageable, and robust business continuity and disaster recovery solution. Unlike SRM, Backup Express uses agent-based backups for VMs. Although we are taught that agent-based backups are not ideal and should be avoided, that is not the case with Backup Express (BEX). After your first full backup, BEX allows you to use forever-incremental backups. BEX performs block-level incremental backups of your servers, which causes very little resource overhead. The overhead stays low because, working at the block level, BEX does not have to walk the server’s file system when performing backups.
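For readers who haven’t seen block-level incrementals before, the sketch below shows the general idea behind a forever-incremental chain: after one full backup, each run ships only the fixed-size blocks that changed, and those changed blocks can be rolled forward onto the previous image to rebuild the latest point in time. This is a generic Python illustration, not Syncsort’s implementation. The 4 KB block size and the use of SHA-256 hashes to detect changes are assumptions made to keep it self-contained; a production agent would more likely track changed blocks as they are written rather than rehash the whole volume.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def block_hashes(volume: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash every fixed-size block of a volume image."""
    return [hashlib.sha256(volume[i:i + block_size]).hexdigest()
            for i in range(0, len(volume), block_size)]


def incremental(volume: bytes, previous: list[str],
                block_size: int = BLOCK_SIZE) -> tuple[dict[int, bytes], list[str]]:
    """Return only the blocks that changed since the last backup, plus the new hashes."""
    current = block_hashes(volume, block_size)
    changed = {idx: volume[idx * block_size:(idx + 1) * block_size]
               for idx, digest in enumerate(current)
               if idx >= len(previous) or previous[idx] != digest}
    return changed, current


def apply_incremental(image: bytes, changed: dict[int, bytes], size: int,
                      block_size: int = BLOCK_SIZE) -> bytes:
    """Roll changed blocks forward onto the previous image (a synthetic full)."""
    out = bytearray(image[:size].ljust(size, b"\0"))
    for idx, data in changed.items():
        out[idx * block_size:idx * block_size + len(data)] = data
    return bytes(out)


if __name__ == "__main__":
    full = bytes(BLOCK_SIZE * 8)            # first full backup: 8 zeroed blocks
    hashes = block_hashes(full)
    today = bytearray(full)
    today[5 * BLOCK_SIZE] = 0xFF            # one byte changes inside block 5
    changed, hashes = incremental(bytes(today), hashes)
    print(sorted(changed))                  # [5] -- only one block is shipped
    assert apply_incremental(full, changed, len(today)) == bytes(today)
```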

I know this is all great, but how does it help me and how does it compare to SRM, right? I’m getting there. With agent-based backups, you can back up physical and virtual systems across any hardware or hypervisor, which gives you great flexibility with reduced complexity. Like other backup systems, BEX offers traditional restores, but what makes the product really flexible is its instant virtualization feature. If a server fails, BEX instant virtualization can provision a VM and boot it from an iSCSI target that is provisioned on the fly from the backup set. In other words, you can take the backup set of any server and present it as an iSCSI target. This is very powerful stuff. Once provisioned, the VM has read/write access to the backup set through delta tracking (snapshot technology), so all changes are recorded while your original backups stay intact. Once your physical server is repaired, you can schedule downtime and restore the backup to it.
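The delta tracking described above is essentially copy-on-write. The hypothetical Python class below is a toy model of it: reads come from the read-only backup set unless a block has been written since boot, writes land in a separate delta, and the backup blocks themselves are never modified. The class and method names are illustrative only; the article doesn’t describe how BEX implements this behind its iSCSI target.

```python
class DeltaOverlay:
    """Read/write view over a read-only backup image; writes go to a delta."""

    def __init__(self, backup_blocks: list[bytes]):
        self._base = tuple(backup_blocks)    # the backup set: never modified
        self._delta: dict[int, bytes] = {}   # block index -> data written since boot

    def read(self, idx: int) -> bytes:
        # A written block wins; otherwise fall through to the backup set.
        return self._delta.get(idx, self._base[idx])

    def write(self, idx: int, data: bytes) -> None:
        if not 0 <= idx < len(self._base):
            raise IndexError(f"block {idx} outside the image")
        self._delta[idx] = data

    def changed_blocks(self) -> dict[int, bytes]:
        """Everything the VM changed while running from the backup."""
        return dict(self._delta)

    def materialize(self) -> list[bytes]:
        """Backup plus delta: the current state of the VM's disk."""
        return [self.read(i) for i in range(len(self._base))]


if __name__ == "__main__":
    backup = [b"boot", b"os", b"data"]
    disk = DeltaOverlay(backup)
    disk.write(2, b"new-data")
    print(disk.read(2), disk.read(1))   # b'new-data' b'os'
    print(backup[2])                    # b'data' (backup set untouched)
```

Keeping the delta separate is what lets the original backup set stay intact while the VM runs, and materialize() shows the combined state you would want on the repaired server.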

This technology offers enormous potential. Imagine backing up every server in your environment with one technology to a disk-based storage array and replicating those backups to a remote site. Now not only do you have complete restorable backups off site, you can also leverage those backups for instant virtualization at that site. This is where DR comes in: all of your servers, physical and virtual, are present at your secondary site, not just your virtual ones. From these backup sets you can fail over a single server to run at the DR site, or all of your servers. Want to fail back? No problem: simply sync the updated backup sets to the primary site and restore the machines one at a time or many at a time.

The real power of a solution like this comes from adding workflow automation. Create a DR runbook as a workflow built in VRM. Create different workflows for different failover scenarios, and workflows for performing DR testing. Build custom integration into the self-service provisioning so users can request backups for their machines on a schedule; the request goes through the multi-step approval process, and once approved the backup jobs are scheduled automatically. Add an option for users to request DR as part of the machine request, and automatically configure the backup job to store to replicated storage, as opposed to non-replicated storage for machines that don’t need DR. Create workflows that fail over a server, physical or virtual, to a virtual instance at a DR site. Create rules that restore VMs to different sites based on criteria set in the workflow; the possibilities are endless and not bound by the constraints of SRM. No worrying about pagefiles: just exclude them from the backups. Because excluded data is never replicated, no special configuration is needed at the remote site to account for missing pagefile drives, and the setup becomes much more manageable.
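To give a feel for the kind of logic such a VRM workflow could encode, here is a hypothetical Python sketch: ordered rules pick a DR site for each server, and the runbook is generated for one server, a chosen subset, or everything that requested DR, most critical tier first. The server fields, site names, rules, and step wording are all assumptions for illustration; none of them are VRM or BEX APIs, and a real workflow would call into those products at each step.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Server:
    name: str
    tier: int          # 1 = most critical (hypothetical field)
    region: str        # where the server normally runs (hypothetical field)
    dr_enabled: bool   # did the requester ask for DR at provisioning time?


# Ordered placement rules: the first predicate that matches picks the DR site.
Rule = tuple[Callable[[Server], bool], str]
RULES: list[Rule] = [
    (lambda s: s.tier == 1, "dr-east"),            # critical servers to one site
    (lambda s: s.region == "west", "dr-central"),
    (lambda s: True, "dr-south"),                  # catch-all
]


def choose_site(server: Server, rules: list[Rule] = RULES) -> str:
    for predicate, site in rules:
        if predicate(server):
            return site
    raise LookupError(f"no DR site rule matched {server.name}")


def failover_plan(servers: list[Server], only: Optional[set[str]] = None) -> list[str]:
    """Ordered runbook steps for one server, a subset, or every DR-enabled server."""
    targets = [s for s in servers
               if s.dr_enabled and (only is None or s.name in only)]
    steps: list[str] = []
    for s in sorted(targets, key=lambda t: (t.tier, t.name)):  # most critical first
        site = choose_site(s)
        steps.append(f"{s.name}: mount latest replicated backup set at {site}")
        steps.append(f"{s.name}: instant-virtualize and power on at {site}")
        steps.append(f"{s.name}: verify services")
    return steps


if __name__ == "__main__":
    fleet = [
        Server("sql01", tier=1, region="east", dr_enabled=True),
        Server("web01", tier=2, region="west", dr_enabled=True),
        Server("print01", tier=3, region="east", dr_enabled=False),
    ]
    for step in failover_plan(fleet, only={"web01"}):  # single-server failover
        print(step)
```

Passing only={"web01"} gives the single-server failover that is impractical with SRM; leaving it out fails over everything flagged for DR, with each server routed to whichever site its rules select.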

On top of the BC and DR advantages of a solution like this, Syncsort Backup Express offers many other benefits. With BEX you can perform P2V as well as V2P conversions, all through one utility, with no need for a bunch of specialized applications for each function. These are just some of the BEX and VRM features; if I were to cover all the features and possibilities, this article would turn into a book. I hope this gave some of you insight into options other than SRM.
