Release Announcement – VMware PowerCLI 6.5 Release 1

PowerCLI just got supercharged.  New features in this release include support for vSAN, Horizon View, VVols and more.  Check below for the details.

For more information on changes made in VMware PowerCLI 6.5 Release 1, including improvements, security enhancements, and deprecated features, see the VMware PowerCLI Change Log. For more information on specific product features, see the VMware PowerCLI 6.5 Release 1 User’s Guide. For more information on specific cmdlets, see the VMware PowerCLI 6.5 Release 1 Cmdlet Reference.

You can find the PowerCLI 6.5 Release 1 download HERE. Get it today!


DailyHypervisor Forums are online.

We have just launched our DailyHypervisor Forum located at http://www.dailyhypervisor.com/forum. Stop by, contribute, and be a part of our community. The DH Forum is intended to be for all things cloud. Currently we have forums created for vCAC, vCD, vCO, Cloud General, and OpenStack. More forum categories will be coming based on demand. If you have a category you would like to see, shoot us a note and let us know.

Our goal is to create a common place where anyone can come to learn, get help, share ideas, or do just about anything that will help foster knowledge regarding cloud computing. Considering this very blog post is the announcement of our forum, you can imagine there isn’t a whole lot happening yet. So what are you waiting for? Be the first: go ask a question, post an issue, share a thought, and let’s get things rolling.

A Different Take on CEE and FCoE

Last month, I attended a Brocade Net.Ed session that covered Converged Enhanced Ethernet (CEE) and Fibre Channel over Ethernet (FCoE) and the idea of server I/O consolidation. If you missed the Net.Ed sessions, you can learn about them at Brocade’s Training Portal.  Once you register / login, click on Self-Paced Training and search or browse for FCoE 101 Introduction to Fibre Channel over Ethernet (FCoE).  It’s free. Here is an unabridged report on the Net.Ed session with some of my opinions wrapped in:

Trends

With cloud computing, the consolidation of servers, storage and I/O is becoming popular. Once upon a time, server consolidation ratios were bound by processor count and RAM capacity. With the introduction of servers with higher core counts, faster processors and higher RAM capacities, the new boundary is becoming I/O. And the I/O stack is answering the call for faster speeds. If you look at the trends, Fibre Channel speed has gone from 1Gb to 2Gb to 4Gb and now 8Gb. Soon, 16Gb FC will be the norm. Ethernet has gone from 10Mb to 100Mb to 1Gb and now 10Gb. The next chapter will bring 40Gb or 100Gb or both.

Fibre Channel and Ethernet have been in a leap frog contest since Fibre Channel was introduced. And there are plenty of arguments about which is “better” and why. Remember how iSCSI was going to take over the world with storage I/O? Why? Because people think they can implement it on the cheap. If it is implemented properly, it may not be that much cheaper than FC. I see too many instances where admins will implement iSCSI over their existing network, without thought of available bandwidth, security, I/O, etc. Then they complain how iSCSI sucks because of poor performance. Consolidation magnifies this. To top it off, iSCSI doesn’t help when dealing with things like FICON or the many tape drives that need faster throughput than what iSCSI can offer.

Hardware consolidation is also popular, and sometimes occurs during the server consolidation project. Blade servers are becoming more popular for many reasons: less rack space, fewer cables, centralized management, etc. I just LOVE walking into a data center and looking at the spaghetti mess behind the racks! Even with blade servers, the number of cables is still crazy. Some people still have Top of Rack switches, even with blades. More enlightened people have End of Row or Middle of Row switches. But there is still that mess in the back of the rack. I especially love when some genius decides to weave cables through the handles on a power supply….

Consolidate Your I/O

Enter I/O consolidation. Brocade calls it Unified I/O.  This is supposed to reduce cabling even more. I say “maybe.” In order to consolidate I/O, different protocols, adapters and switches are necessary. OH MY GAWD! New technology! This means the dreaded “C” word…Change. In a nutshell, it reduces the connections: you go from two to four NICs and two to four FC adapters down to two Converged Network Adapters (CNAs). It is supposed to reduce cabling and complexity. It’s supposed to help with OpEx and CapEx by enabling more airflow/cooling, and saving money on admin costs and cable costs, blah blah blah… Didn’t we hear this about blades too?

The Protocols (Alphabet Soup)

In order to make all of this work and become accepted, you need to worry about things like low latency, flow control and lossless quality. This needs to be addressed with standards, and the results are CEE and FCoE. The issue arises with CEE: not all of its components have been finalized. Things like Priority-based Flow Control (IEEE 802.1Qbb), Enhanced Transmission Selection (IEEE 802.1Qaz) and Congestion Management (IEEE 802.1Qau) are still being finalized. The IETF is also still working on Transparent Interconnection of Lots of Links (TRILL), which will enable Layer 2 multipathing without STP.

Feature/Standard: Benefit

  • Priority Flow Control (PFC), IEEE 802.1Qbb: Helps enable a lossless network, allowing storage and networking traffic types to share a common network link
  • Enhanced Transmission Selection (Bandwidth Management), IEEE 802.1Qaz: Enables bandwidth management by assigning bandwidth segments to different traffic flows
  • Congestion Management, IEEE 802.1Qau: Provides end-to-end congestion management for Layer 2 networks
  • Data Center Bridging Exchange Protocol (DCBX): Provides the management protocol for CEE
  • L2 Multipathing (TRILL, in the IETF): Recovers bandwidth with multiple active paths; no spanning tree
  • FCoE/FC awareness: Preserves SAN management practices
Source: Brocade Data Center Convergence Overview Net.Ed Session

My Two Cents

So, without fully functioning CEE, FCoE cannot traverse the network. This stuff is all supposed to be ratified soon. Until these components are ratified, the dream of true FCoE is just a dream. The bridging can’t be done close to the core yet, so people who decide to start using CNAs and Data Center Bridges will need to place the DCBs close to the server (no hops!) and terminate their FC at the DCB. In the case of the UCS, this is the Top of Rack or End/Middle of Row switch. In the case of an HP chassis, it’s the chassis, and they don’t even have this stuff yet.

My question is this: Why adopt a technology that is not completely ratified? Like I said before, all of this requires change. You may be in the middle of a consolidation project and you are looking at I/O consolidation. Do you really want to design your data center infrastructure to support part of a protocol? Are you willing to make changes now and then make new changes in six months to bring the storage closer to the core?

So, let’s assume everything is ratified. You have decided to consolidate your I/O. How many connections do you really save? Based on typical blade chassis configurations, it may be four to eight FC cables. But look at it another way: You are losing that bandwidth. A pair of 10Gb CNAs will give you a total of about 20Gb of bandwidth. A pair of 10GbE Adapters and a pair of 8Gb FC adapters gives you about 36Gb. So, sure, you save a few cables. But you give away bandwidth. When you think about available bandwidth, is a pair of 10Gb CNAs or NICs enough? I remember when 100Mb was plenty. If consolidation is becoming I/O bound, do you want to limit yourself?  How about politics? Will your network team and storage team play nice together? Where is the demarcation between SAN and LAN?
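The bandwidth trade-off above is simple arithmetic, but it is worth making explicit. A minimal sketch, using the adapter counts and nominal speeds from the example (not universal figures):

```python
def total_bandwidth_gb(adapters):
    """Sum nominal bandwidth (Gb/s) across (count, speed_gb) pairs."""
    return sum(count * speed for count, speed in adapters)

converged = total_bandwidth_gb([(2, 10)])          # a pair of 10Gb CNAs
separate = total_bandwidth_gb([(2, 10), (2, 8)])   # two 10GbE NICs plus two 8Gb FC adapters

print(converged)             # 20
print(separate)              # 36
print(separate - converged)  # 16
```

Sixteen gigabits of nominal headroom is the price of those saved cables, which is exactly the question to weigh per workload.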

I first saw the UCS Blades almost a year ago and I was excited about the new technology. Their time is coming soon. The HP Blades have always impressed me since they were introduced. They will never go away. I have used the IBM and Dell blades. My mother always said that if I didn’t have anything nice to say about something, don’t say anything at all…

When I take a look at the server hardware available to me now (HP and Cisco), I see pluses and minuses to both. The UCS blades have no provisions for native FC, so you need to drink the FCoE Kool-Aid or use iSCSI. The HP blades allow for more I/O connections and can support FC, but not FCoE. If you want to make the playing field similar, you should compare UCS to the HP blades with Flex-10. This makes the back-end I/O modules similar. Both act as a sort of matrix to map internal I/O to external I/O. Both will pass VLAN tags for VST and both will accommodate the Cisco Nexus 1000V dvSwitches. The catch with Flex-10 is that it requires a different management interface if you are already a Cisco shop.

There’s a fast moving freight train called CHANGE on the track. It never stops. You need to decide when you have the guts to jump on and when you have the guts to jump off.

Storage Protocol Differences and FCoE Diagrams

Just thought I would share these diagrams that I used in a recent training session. I used them to explain the differences in the storage protocols that may be used for a vStorage Cloud and how FCoE works. Click on the images for a larger view.

Storage Protocol Differences

The first image shows the differences between the common storage protocols and what it takes for the data to get from point A to point B.

FCoE Packet

This diagram demonstrates the FCoE packet. The top block is an Ethernet Packet and the bottom block is the FCoE data.

Converged Network Adapter

This diagram shows the data flow within a Converged Network Adapter (CNA).

Converged Enhanced Ethernet Bridge

This diagram is the Converged Enhanced Ethernet Bridge. CEE in one end, FC out the other.

VMTN: I/O Performance in vSphere, Block Sizes and Disk Alignment

Yes folks, it rears its ugly head again…Disk Alignment… If you have not read it yet, check out the whitepaper on disk alignment from VMware.

First, Chethan from VMware posted a great thread on VMTN about I/O performance in vSphere. The start of the thread talks about I/O, then leads into a nice discussion about block size. A couple of weeks ago, Duncan Epping posted a very informative article about block sizes. It convinced me to use 8MB blocks in VMFS designs.

Finally, the thread kicked into a discussion about disk alignment. As you know, VMFS partitions created using the VI Client will automatically be aligned. This is why I advocate NOT putting VMFS partitioning into a kickstart script. The whitepaper demonstrates how to create aligned partitions on Windows and Linux guests as well. The process is highly recommended for any I/O-intensive app. But I have always questioned the need to do this for system drives (C:) on guests. Doing it requires a multi-step process or the use of a tool like mbrscan and mbralign, and I have wondered if it was worth the effort. Well, Jason Boche gave me a reason why it should be done across the board. And it makes sense: “This is an example of where the value of the savings is greater than the sum of all of its parts.”
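The alignment check itself is just modular arithmetic: a partition is aligned when its starting byte offset lands on the storage array’s chunk boundary. A sketch, assuming classic 512-byte sectors and a hypothetical 64KB chunk boundary (your array’s chunk size may differ):

```python
SECTOR_BYTES = 512  # classic sector size

def is_aligned(start_sector, boundary_kb=64):
    """True if the partition's starting byte offset lands on the chunk boundary."""
    return (start_sector * SECTOR_BYTES) % (boundary_kb * 1024) == 0

print(is_aligned(63))    # False -- the classic misaligned MBR default start
print(is_aligned(128))   # True  -- a 64KB-aligned start
print(is_aligned(2048))  # True  -- a 1MB-aligned start
```

The old MBR default of sector 63 puts the partition 32,256 bytes in, so every chunk-sized guest I/O straddles two chunks on the array; that is the extra work alignment removes.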

Jas also outlined a very nice process for aligning Linux VMs and fixing a common Grub issue. Thanks for the tip Jas!

I should also thank everyone else involved: Chethan, Duncan and Gabe!

Business Continuity and Disaster Recovery with Virtualization

In previous years, Business Continuity and Disaster Recovery have been big buzzwords. Companies small and large vowed to launch initiatives to implement either or both in their IT strategies. My question is: what happened? Why is it that I rarely see organizations that have implemented, or even have a plan to implement, Disaster Recovery?

Is it a lack of understanding? Is it that most companies believe it is too expensive or complicated to implement? Well, it doesn’t have to be either. Most companies that are undergoing virtualization initiatives already have half, if not more, of what they need to implement Disaster Recovery. The simple fact is that if you already have at least two data centers and are virtualizing, you are a prime candidate. Here are some common questions and my answers regarding this subject:

1.) Do I need to utilize SAN replication to implement Disaster Recovery in a virtualized environment?

No! There are other options to achieve Disaster Recovery without SAN replication. If you are running VMware, you can utilize some of what you already have. VMware VCB in conjunction with VMware Converter can be used to implement Disaster Recovery. Now, this wouldn’t be as elegant as doing SAN replication, but you could implement scheduled V2Vs of your virtual machines from one site to another, and it’s a very simple solution to implement.

What about the hardware, right? Where do we get the additional hardware? The answer is simple: reuse what you already have. Take those old servers you just freed up and put them to good use. Beef them up! Need more RAM? Tear RAM out of some and add it to others, and do the same with CPUs, to build a number of more powerful servers that you can use for DR. Granted, you may need more of the reused servers to host all the VMs, but at the end of the day you would have a disaster recovery plan.

2.) What if I can’t do SAN replication but want synchronous and asynchronous replication?

This can still be achieved using software-based replication in your virtual machines. Software like NSI Double-Take and RepliStor provides this functionality at a relatively low cost. With virtualization you can cut costs even more. With physical servers you traditionally needed a 1-to-1 mapping for replication, which required a license for each host. With virtualization you can take a many-to-one approach, cutting down on the licenses you need to replicate your data.

With this approach I would still use VCB or VMware Converter to make weekly copies of your virtual machine OS drives. You can then utilize one of the mentioned applications (Double-Take or RepliStor) to synchronously replicate your data volumes. You can save licenses by installing, say, Double-Take on each of the source systems. Then you would create a single virtual machine at the DR site, add a drive to it for each source system’s data volume, and replicate each source’s data to a different data volume on the destination VM. If you ever need to fail over, just dismount the volumes from the destination VM and attach each one to its respective VM that was created through the use of VCB or VMware Converter.

3.) These methods are great but what would it take to bring an environment back up using them?

That’s rather hard to say, because it depends on the size of your environment and how many VMs you are relocating to your DR site. If your environment is large and you have specific SLAs to adhere to regarding RTO (Recovery Time Objective) and RPO (Recovery Point Objective), then you should consider SAN-to-SAN replication and something like VMware SRM, which does an outstanding job of handling this. VMware SRM also allows you to run disaster recovery simulations to determine the effectiveness of your DR strategy and whether you are meeting your SLAs.

If you are doing DR on the cheap, the real answer to this question is that you will be able to recover your systems a heck of a lot quicker than if you had to restore from backups or rebuild your systems.

4.) This is great but where do we begin?

Don’t know where to begin? The answer is easy: start small and grow into it. Find at least two servers that you can reuse, beef them up, determine a configuration for them, and deploy ESX to them. You need to have some infrastructure in place at your DR location to make DR work, so that is a good place to start. You need to add the following services at your DR location:

  • Active Directory Servers
  • DNS Servers
  • NTP Servers
  • Virtual Center Server

You may need to deploy additional servers for your specific environment, but I think you get the idea.

Next, pick a few development or test machines that you can replicate to the DR site. Develop a plan, schedule downtime, and perform a test failover to the remote site. Once you have worked out the kinks and have a written DR plan, determine your first phase of servers to incorporate into your DR site. Generally at this point you would want to pick some of your most valuable servers to ensure they are protected.

You can then break all the servers that need to be replicated into phases, determine the host requirements at the DR site, and develop a plan for each phase of your DR implementation. It would be a good idea to have a remote replication VM for every 20 or so source VMs. This really depends on the data change rate of your servers, but 20 is a good starting point.
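The one-replica-per-20-sources guideline folds easily into the phase planning. A small sketch (the counts are hypothetical; tune the ratio to your change rate):

```python
import math

def replica_vms_needed(source_vms, per_replica=20):
    """Replication VMs required at the DR site for a given number of source VMs."""
    return math.ceil(source_vms / per_replica)

print(replica_vms_needed(20))  # 1
print(replica_vms_needed(75))  # 4
```

Run it per phase as you add servers to the plan and you will know when the DR site needs another replication VM.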

This article is obviously not all inclusive and is very high level but hopefully it inspires some of you to start developing a DR strategy and at least start testing some of these solutions in your environment because data is a terrible thing to waste.

ESX local partitioning when booting from SAN

A few days ago I wrote a blog post about ESX local partitions. A good question was raised after I wrote the article concerning ESX hosts that boot from SAN. In my last article I asked, “Should the partition scheme be standardized, even across different drive sizes?” My question today is: should that standard also be used when booting from SAN? I’ve heard the argument that when booting from SAN you should make the partitions smaller to conserve space. Anyone have an opinion on this? I feel it should conform to the standard. We determine the partition sizes based on need, and that same need still exists regardless of what medium you are booting from.

My recommendation would be to develop a standard partition scheme and utilize it across all drive sizes and mediums. You can find my recommended partition scheme in my previous post mentioned above.

ESX local disk partitioning

I had a conversation with some colleagues of mine about ESX local disk partitioning and some interesting questions were raised.

How many are creating local vmfs storage on their ESX servers?
How many actually use that local vmfs storage?

Typically it is frowned upon to store VMs on local VMFS because you lose the advanced features of ESX such as VMotion, DRS, and HA. So if you don’t run VMs from the local VMFS, why create it? Creating this local datastore promotes its use just by being there. If you’re short on SAN space, need to deploy a VM, and can’t wait for the SAN admins to present you more storage, what do you do? I’m sure more often than not you deploy to the local storage to fill the need. I’m also sure that at least 20% of the time those VMs continue to live there.

Is the answer to not utilize local VMFS storage? If you don’t, what do you do with the leftover space? Not all servers are created equal; sometimes servers have different-sized local drives, so you have a few options. Do you create standards for your partitioning, set a partition such as / to grow, and have varying configurations amongst your hosts? Or do you create a standard for all partition sizes and leave the rest of the space raw?

Typically, this is the partition scheme I use for all deployments (sizes in MB):

Boot = 250 (Primary)
Swap = 1600 (Primary)
/ = Fill (Primary)
/var = 4096 (Extended)
/opt = 4096 (Extended)
/tmp = 4096 (Extended)
/home = 4096 (Extended)
vmkcore = 100 (Extended)

This configuration will create inconsistencies amongst hosts with varying drive sizes. To maintain consistency I could do something like the following and leave the rest of the space raw.

Boot = 250 (Primary)
Swap = 1600 (Primary)
/ = 8192 (Primary)
/var = 4096 (Extended)
/opt = 4096 (Extended)
/tmp = 4096 (Extended)
/home = 4096 (Extended)
vmkcore = 100 (Extended)
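Either scheme can be expressed as data, which makes it easy to sanity-check the totals against a given drive size. A sketch of the fixed-size variant (sizes in MB, as above):

```python
# The fixed-size scheme expressed as (mount point, size in MB, partition type).
scheme = [
    ("Boot",    250,  "Primary"),
    ("Swap",    1600, "Primary"),
    ("/",       8192, "Primary"),
    ("/var",    4096, "Extended"),
    ("/opt",    4096, "Extended"),
    ("/tmp",    4096, "Extended"),
    ("/home",   4096, "Extended"),
    ("vmkcore", 100,  "Extended"),
]

total_mb = sum(size for _, size, _ in scheme)
print(total_mb)  # 26526 -- roughly 25.9GB; anything beyond this stays raw
```

Any drive larger than about 26GB carries the scheme unchanged, which is the whole consistency argument.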

I’m a fan of utilizing all the space you have available, but others like consistency. What is your preference? Weigh in and let us know.

ESX Datastore sizing and allocation

I have been seeing a lot of activity in the VMTN forums regarding datastore sizing and free space.  That said, I decided to write a post about this topic.  There are endless possibilities when it comes to datastore sizing and configurations, but I’m going to focus on a few key points that should be considered when structuring your ESX datastores.

All VM files kept together

In this configuration all VM files are kept together on one datastore.  This includes the vmdk file for each drive allocated to the VM, the vmx file, log files, the nvram file, and the vswap file.  When storing virtual machines this way there are some key considerations that need to be taken into account.  You should always allow for 20% overhead on your datastores to allow enough space for snapshots and vmdk growth if necessary.   When allocating for this overhead you have to realize that when a VM is powered on a vswap file is created for the virtual machine equal in size to the VM’s memory.  This has to be accounted for when allocating your 20% overhead.

For Fibre Channel and iSCSI SANs you should also limit the number of VMs per datastore to no more than 16.  With these types of datastores, file locking and SCSI reservations create extra overhead.  Limiting the number of VMs to 16 or fewer reduces the risk of contention on the datastore.  So how big should you make your datastores?  That’s a good question, and it will vary from environment to environment.  I always recommend 500GB as a good starting point.  This is not a number that works for everyone, but I use it because it helps limit the number of VMs per datastore.

Consider the following: your standard VM template consists of two drives, an OS drive and a data drive.  Your OS drive is standardized at 25GB and your data drive starts at a default of 20GB, with larger drives when needed.  Your standard template also allocates 2GB of memory to the VM.  Anticipating a max of 16 VMs per datastore, I would allocate as follows:

((OS drive + data drive) * 16 VMs) + (memory * 16) + (16 * 100MB of log files) = total VM space needed, plus 20% overhead

(25GB + 20GB) * 16 = 720GB; + (2GB * 16 = 32GB) = 752GB; + (16 * 100MB = 1.6GB) = 753.6GB; * 1.20 = 904.32GB. Round up to 910GB needed.
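That worked example can be wrapped in a small helper so you can rerun it for other templates. A sketch (sizes in GB, with the same 100MB-per-VM log estimate and 20% overhead assumed above):

```python
def datastore_size_gb(os_gb, data_gb, mem_gb, vms=16, log_gb=0.1, overhead=0.20):
    """Estimated datastore size: vmdk space + vswap + logs, plus snapshot overhead."""
    disk = (os_gb + data_gb) * vms   # vmdk space for OS and data drives
    vswap = mem_gb * vms             # vswap created at power-on, equal to VM memory
    logs = log_gb * vms              # ~100MB of log files per VM
    return (disk + vswap + logs) * (1 + overhead)

print(round(datastore_size_gb(25, 20, 2), 2))  # 904.32
```

Change the template sizes or VM count and the standardized datastore size falls out directly.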

Depending on how you carve up your storage, you may want to bump this to 960GB or 1024GB, so as you can see, the 500GB rule was proven wrong for this scenario.  The point is that you should have standardized OS and data partitions to properly estimate and determine a standardized datastore size.  This will never be perfect, as there will always be VMs that are anomalies.

Keep in mind that if you fill your datastore and don’t leave room for the vswap file that is created when a VM powers on, you will not be able to power on the VM.  Also, if a snapshot grows to fill a datastore, the VM will crash and your only option to commit the snapshot will be to add an extent to the datastore, because you will need space to commit the changes.  Extents are not recommended and should be avoided as much as possible.

Separate VM vswap files

There are a number of options available in Virtual Infrastructure for how to handle the VM’s vswap file.  You can set the location of this file at the VM, ESX server, or cluster level.  You can choose to locate it on a local datastore or on one or more shared datastores. Below are some examples:

Assign a local datastore per ESX server for all VM’s running on that server.

This option allows you to utilize a local VMFS datastore to store the VMs’ vswap files, saving valuable shared disk space.  When using a local datastore, I recommend allocating enough storage for all the available memory in the host plus 25% for memory oversubscription.

Create one shared datastore per ESX cluster.

In this option you can set one datastore at the cluster level for all vswap files.  This allows you to create one large datastore and set the configuration option once and never worry about it again.  Again I would allocate enough space for the total amount of memory for the whole cluster +25% for over subscription.
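The sizing rule is the same whether the vswap datastore sits behind one host or a whole cluster: the total memory behind it plus 25% for oversubscription. A sketch (the 64GB host and 8-host cluster are hypothetical examples):

```python
def vswap_datastore_gb(total_mem_gb, oversub=0.25):
    """Size a dedicated vswap datastore: the memory behind it plus oversubscription headroom."""
    return total_mem_gb * (1 + oversub)

print(vswap_datastore_gb(64))      # 80.0  -- a single 64GB host
print(vswap_datastore_gb(8 * 64))  # 640.0 -- an 8-host cluster of 64GB hosts
```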

Multiple shared datastores in a cluster.

In this option you have different scenarios.  You can have one shared datastore per ESX host in the cluster, or one datastore for every two hosts in the cluster, etc.  You would need to assign the vswap datastore at the ESX host level for this configuration.

Note: Moving the vswap to a separate location can impact the performance of VMotion.  It can extend the amount of time it takes for the VM to fully migrate from one host to another.

Hybrid Configuration.

Just as it’s possible to locate the vswap on another datastore it is also possible to split the vmdk disks on to separate datastores.  For instance you could have datastores for:

OS Drives
Data Drives
Page Files
vSwap files

To achieve this you would tell the VM where to create each drive and have different datastores allocated for these different purposes.  This is especially handy when planning to implement DR: it allows you to replicate only the data you want and skip the stuff you don’t, like the vswap and page files.  With this configuration you can also have different replication strategies for the data drives and OS drives.

Hope you found this post useful.