DailyHypervisor Forums are online.

We have just launched our DailyHypervisor Forum located at http://www.dailyhypervisor.com/forum. Stop by, contribute and be a part of our community. The DH Forum is intended to be for all things cloud. Currently we have forums created for vCAC, vCD, vCO, Cloud General, and OpenStack. More forum categories will be coming based on demand. If you have a category you would like to see, shoot us a note and let us know.

Our goal is to create a common place where anyone can come to learn, get help, share ideas, or do just about anything that will help foster knowledge regarding cloud computing. Considering this very blog post is the announcement of our forum, you can imagine there isn’t a whole lot happening yet, so what are you waiting for? Be the first. Go ask a question, post an issue, share a thought and let’s get things rolling.

Is Your Blade Ready for Virtualization? Part 2 – Real Numbers

OK, so my last post brought on a blizzard of remarks questioning some of the validity of the data presented. I used what I was told during a presentation was a “Gartner recommended” configuration for a VM. My error was that I could not find this recommendation anywhere, but the sizing seems fairly valid, so I went with it. I went back to some of the assessments I have done and took data from about 2,000 servers to come up with some more real-world averages. I wanted to post these averages tonight. Remember what I said previously: This is just a set of numbers. You must ASSESS and DESIGN your virtual infrastructure properly. This is only a small piece of it.

Is Your Blade Ready for Virtualization? A Math Lesson.

I attended the second day of the HP Converged Infrastructure Roadshow in NYC last week. Most of the day was spent watching PowerPoints and demos for the HP Matrix stuff and Virtual Connect. Then came lunch. I finished my appetizer and realized that the buffet being set up was for someone else. My appetizer was actually lunch! Thank God there was cheesecake on the way…

There was a session on unified storage, which mostly covered the LeftHand line. At one point, I asked if the data de-dupe was source based or destination based. The “engineer” looked like a deer in the headlights and promptly answered “It’s hash based.” ‘Nuff said… The session covering the G6 servers was OK, but “been there done that.”

VMware Virtual Center – Physical or Virtual?

Over the years there has been some controversy over this topic. Should Virtual Center (vCenter) be physical or virtual? There is the argument that it should be physical to ensure consistent management of the virtual environment. Of course there is also the fact that Virtual Center requires a good amount of resources to handle the logging and performance information.

I’m a big proponent of virtualizing Virtual Center. With the hardware available today there is no reason not to. Even in large environments that really tax the Virtual Center server you can just throw more resources at it.

Many companies are virtualizing mission critical applications to leverage VMware HA to protect them. How is Virtual Center any different? So what do you do if Virtual Center crashes? How do you find and restart Virtual Center when DRS is enabled on the cluster?

You have a few options here.

  1. Override the DRS setting for the Virtual Center VM and set it to manual. Now you will always know where your Virtual Center server is if you need to resolve issues with it.
  2. Utilize PowerShell to track the location of your virtual machines. I wrote an article that included a simple script to do this, which I will include on our downloads page for easy access.
  3. Run an isolated 2 node ESX cluster for infrastructure machines.

So my last option warrants a little explaining. Why would you want to run a dedicated 2 node cluster just for infrastructure VMs? The real question is why wouldn’t you? Think about it. Virtual Center is a small part of the equation. VC and your ESX hosts depend on DNS, NTP, AD, and other services. What happens if you lose DNS? You lose your ability to manage your ESX hosts through VC if you follow best practice and add them by FQDN. Now if AD goes down you have much larger issues, but if your AD domain controllers are virtual and you somehow lose them both, that’s a problem. It’s a problem that could affect your ability to access Virtual Center. So why not build an isolated two node cluster that houses your infrastructure servers? You’ll always know where they will be, you can use affinity rules to keep servers that back each other up on separate hosts, and you can always have a cold spare available for Virtual Center.

Obviously this is not a good option for small environments, but if you have 10, 30, 40, 80, 100 ESX hosts and upwards of a few hundred VMs, I believe this is not only a great design solution but a much needed one. If you are managing this many ESX hosts, it’s important to know for sure where the essential infrastructure virtual machines reside, since they affect a large part of your environment, if not all of it.

VMware HA Cluster Sizing Considerations

To properly size an HA failover cluster there are a few things that need to be determined. You need to know how many hosts are going to be in your cluster, how many host failures you want to be able to tolerate (N+?), and it helps to know resource utilization information about your VMs to gauge fluctuation. Once we know this information we can use a simple formula to determine the maximum utilization for each host that maintains the appropriate failover level.

Here is an example:

Let’s say we have 5 hosts in a DRS cluster and we want to be able to lose (1) host (N+1). We also want to have 10% overhead on each server to account for resource fluctuation. First we need to take 10% off the top of all (5) servers, which leaves us with 90% usable resources on every host. Next we need to account for the loss of (1) host. In the event that a host is lost, we need to distribute its load across the remaining (4) hosts. To do this we divide the one host’s 90% of possible resources by the (4) remaining hosts. This tells us that we need to distribute 22.5% of the failed server’s load to each of the remaining hosts.

Taking into account the original 10% overhead plus the 22.5% capacity needed for failover, we need to keep 32.5% of each host’s resources available, which means we can only utilize 67.5% of each host in the cluster to maintain an N+1 failover cluster with 10% overhead for resource fluctuation. The formula for this would be:

((100 - %overhead) * #host_failures) / (num_hosts - #host_failures) + %overhead = % of each ESX host to keep in reserve

Example 1:

((100-10)*1)/(5-1)+10 = 32.5
(5 server cluster with 10% overhead allowing for 1 host failure) 67.5% of each host usable

((100-20)*2)/(8-2)+20 = 46.7
(8 server cluster with 20% overhead allowing for 2 host failures) 53.3% of each host usable

Example 2:

Fail over of 1 host

((100-20)*1)/(8-1)+20 = 31.4
(8 server cluster with 20% overhead allowing for 1 host failure) 68.6% of each host usable

Fail over of 2 hosts

((100-20)*2)/(8-2)+20 = 46.7
(8 server cluster with 20% overhead allowing for 2 host failures) 53.3% of each host usable
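
If you want to run the numbers for your own cluster, the formula above can be sketched in a few lines of Python. This is only a minimal sketch of the formula exactly as written in this post. The function names reserve_per_host and usable_per_host are made up for illustration and are not part of any VMware tool.

```python
def reserve_per_host(num_hosts, host_failures, overhead_pct):
    """Percent of each host to keep in reserve, per the formula in this post.

    num_hosts     -- total hosts in the cluster
    host_failures -- number of host failures to tolerate (the "?" in N+?)
    overhead_pct  -- percent held back on every host for resource fluctuation
    """
    if num_hosts < 1 or not 0 <= host_failures < num_hosts:
        raise ValueError("host_failures must be >= 0 and less than num_hosts")
    if not 0 <= overhead_pct < 100:
        raise ValueError("overhead_pct must be between 0 and 100")

    # Load of the failed hosts, spread across the hosts that survive.
    failover_pct = (100 - overhead_pct) * host_failures / (num_hosts - host_failures)
    return failover_pct + overhead_pct


def usable_per_host(num_hosts, host_failures, overhead_pct):
    """Percent of each host that can be used while keeping the reserve."""
    return 100 - reserve_per_host(num_hosts, host_failures, overhead_pct)


if __name__ == "__main__":
    # The same scenarios as the examples in this post.
    for hosts, failures, overhead in [(5, 1, 10), (8, 1, 20), (8, 2, 20)]:
        reserve = reserve_per_host(hosts, failures, overhead)
        usable = usable_per_host(hosts, failures, overhead)
        print(f"{hosts} hosts, N+{failures}, {overhead}% overhead: "
              f"reserve {reserve:.1f}%, usable {usable:.1f}%")
```

Keep in mind the result is only as good as the %overhead you feed it.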

Determining the %overhead can be tricky without a good capacity assessment, so be careful. If you don’t allocate enough overhead and you have host failures, performance can degrade and you could experience contention within the environment. I know some of the numbers seem dramatic, but redundancy comes with a cost no matter what form of redundancy it may be.