Stevie’s Unified Event Management, My Cloud Shangri-La

If you know Steve Chambers you know he just moved to Cisco. Before that, he was with VMware and has been a pillar of the VI:OPS boards. He is now working on a document about Unified Event Management and in the spirit of community, he is looking for comments, suggestion, etc. He called my attention to the post via Twitter as we were discussing Splunk and it’s capabilities for “Centralized Event Aggregation” (Steve’s terms). Take a look at his post when you get a chance and make some comments. You know that I have heralded the benefits of a centralized logging server. Steve just plain gets it.

And since I mentioned Cisco, I also discovered that Cisco put out a whitepaper on their take regarding the Virtualization Blueprint for the Datacenter. Its their take on how virtualization will benefit your business.  The chart shows how a business’ agility will increase as we climb the lifecycle from consolidation to virtualization and then on to automation.

It doesn’t matter what you are using underneath of it all – VMware, Xen, Hyper-V – UCS, Matrix. It just matters that you have methods to provide centralized monitoring and centralized automation. Although centralized event monitoring and centralized automation are two different things, they are both necessary if you wish to properly monitor and manage your piece of the cloud. I’ve already said my piece on the need for centralized event monitoring and Steve lays out a sample blueprint.

Automation is the new big thing when it comes to the cloud. VMware saw that way back when and they bought Dunes almost two years ago. VMware Orchestrator (VMO) was a big buzz for a little while, but great big VMware couldn’t couldn’t pull off what teenie little Dunes could when it comes to customizing the Orchestrator. They left it in a fairly decent state for smaller businesses with VMware Lifecycle Manager, but it was a hobbled state and didn’t scale very well. You can customize VMO, but you need to be good at the Dunes interface and have a decent knowledge of JavaScripting and that kind of stuff. Even being free, its not for me. The standard release of VMO allows you to set up a facility to request, approve, provision and archive VMs. A great start, but not quite enough.

A quick search for data center orchestration reveals Cisco at the top of the list. But there are others from Novell PlateSpin, Egenera, and DynamicOps that appear to do more. What we REALLY need is a way to orchestrate/ automate the entire data center. Physical servers, VMs, storage and networking can all be provisioned, monitored and managed. Can they all be managed from a common platform? Once you can have a seamless process for provisioning, managing and monitoring every component of the data center, you will see cloud computing really take off. A user (consumer / customer) that needs an application should not care if it is deployed on a physical or virtual machine, what storage devices hold the data or the network that connects it. The user should know the basic requirements for the application and the ORCHESTRATOR should make the decisions about all of these things. The orchestrator will take a request, ask for approval and make sure the application gets deployed without making mistakes. The orchestrator will interface with the monitoring facility and change management to make sure the application is accounted-for. The orchestrator will hand off to the backup facility. The orchestrator will notify you when the application as reached end of life. That’s when we will have “Cloud Shangri-La” (My term).

Setting up a Splunk Server to Monitor a VMware Environment

In a previous article, I compared syslog servers and decided to use Splunk. Splunk is easy to set up as a generic Syslog server, but it can be a pain in the ass getting the winders machines to send to it. There is a home brewed java based app on the Splunk repository of user submitted solutions, but I have heard complaints about its stability and decided that I was going to set out to find a different way to do it.

During my search, I discovered some decent (free!) agents on sourceforge. One will send event logs to a syslog server (SNARE) and one will send text based files to a syslog server (Epilog). Using the SNARE agents appear to be more stable than using the Java App and does a pretty good job. So I basically came up with a free way to set up a great Syslog server using Ubuntu Server, Splunk, SNARE and Epilog.

I created a “Proven Practice Guide” for VI:OPS and posted it there, but it seems that it is stuck in the approval process. I usually psot the doc on VI:OPS and then link to it in my blog post, and follow up later with a copy on our downloads area. To hurry things along, I also posted it in both places:

http://www.www.dailyhypervisor.com/?file_id=17

http://viops.vmware.com/home/docs/DOC-1563

SPLUNK! Goes the Syslog Server…

The use of a “syslog” server is important in today’s data center. Most network and SAN switches, along with Unix and Linux servers are capable of sending logging information to a syslog server. The obvious reason for a syslog server is to centralize all of your logs. This enables you to troubleshoot issues more efficiently. Most syslog servers allow you to do a time-line based analysis of log data so that you have an enterprise – wide view of all activity. This allows you to see how different devices interact.

An less obvious reason for a syslog server is for security purposes. The theory is that an attacker will attempt to elevate to root privileges and then try to delete or alter logs to hide evidence of the attack. If all log information is relayed to a syslog server, the hope is that this data is secured for forensic study, if needed.

I have tried a few different “free” and non-free syslog servers. I didn’t do extensive research into all available syslog servers, but I have to say that I like Splunk the best. It starts with a free server with a limited amount of data. This may be fine for smaller shops. There is also a paid version that allows for more data collection. The fully “free” syslog server that came close was the combination of syslogd and phplogcon on a Linux server. I also tried Kiwi syslog, which also has a “free” version and a paid version. But it only installs on winders. Most of the syslog servers are great. There were a few capabilities I felt made Splunk a nice syslog server:

  • Act as a standard syslog server.
  • The ability to “scrape” directories.
  • Monitor Windows logs.
  • Allow for upload of log data.
  • Provide Time line Analysis.

Acting as a standard syslog server is really a no-brainer. All of the packages that I tested worked fine in this respect. You set up pointers to the syslog server in the *nix /etc/syslog.conf file and all logs are automatically sent.

When dealing with collecting logs on an ESX server, the standard syslog.conf settings may not cut it. The HA logs reside in a different location and should be “scraped”. In this context, “scraping” is the process of reading all of the text files in a specified directory and compiling them into the syslog database.

Monitoring Windows logs is also a key ingredient in the datacenter stew. If you are going to do centralized collection of logs, collect everything. Splunk uses WMI to gather this information.

The ability to upload log data manually is also a nice option. I was recently troubleshooting an issue with VMware Consolidated Backup and I was able to manually upload all of the related VCB logs right into a Splunk server VM. I exported the Windows system and application logs to .csv files and copied them to a directory on the Splunk server. I also copied the VCB logs and ESX logs to the same directory. After a few minutes, the data was assimilated into the database and ready for analysis. I was able to look at a specific point in time and look at errors across the entire environment. I could see errors in the VCB logs and relate them to errors in the Windows system and application logs. I was also able to track all of the ESX and VM logs for the time period.

The Splunk server offers WAY more than the logging functions described here. It is also a great tool for compliance, change control, security, server management, etc. It has install packages for winders, Linux, Solaris (x86, x64 AND Sparc), Mac OSX, FreeBSD and AIX.

As you can see, the Splunk server is very useful for capturing all kinds of logs for security and troubleshooting purposes. In part two, I will dig deeper into setting up a Splunk server and configuring *nix, ESX, ESXi and winders machines to send their logs. As with the VCB Proven Practice Guide, there will be a companion doc on the VI:OPS site.