By Rich Fenton – Senior Systems Engineer
Nimble Storage announced recently that the InfoSight storage analytics service had been upgraded so that customers using VMware’s vCenter can monitor their arrays right down to the individual VM (virtual machine) level. It sounds great, but how does this translate into business value for companies trying to better manage their ever-expanding storage infrastructure?
Part of my job involves going on-site to customers’ data centers to help them diagnose and correct performance issues, and on one visit last week I saw a great example of how valuable this kind of technical capability can be.
I was working with an enterprise customer that had virtualized many of their critical business applications, and together we were trying to improve one of their key apps, whose performance had steadily degraded over time. Their legacy storage system had come bundled with software to monitor the storage arrays and report on the virtual infrastructure, but it was complex to set up and understand, and although it had been deployed, it was rarely used.
The customer initially planned to solve the problem by throwing money at it – more memory, faster CPUs, 10Gb Ethernet networking and high-speed storage. Then they discovered Nimble Storage. When I arrived on-site, it quickly became clear that the performance issues were related to their VMs, so we decided to use InfoSight’s new “per-VM monitoring” feature.
First, in order to run per-VM monitoring, you need to purchase – nothing (it’s part of Nimble’s standard support package). Next, you need to install – nothing. You just register your vCenter credentials with the Nimble array; in fact, if you have the vSphere plugin running, you’ve already done this.
Finally, you need to set up – you guessed it – nothing. Assuming your array is sending home data (which is true for most Nimble customers), all you need to do is reach out to your account team so we can enable the statistics collection, as the capability is based on an opt-in mechanism.
It Gets Better
Nimble customers who are already familiar with InfoSight will notice a few improvements in the recent update, along with changes to some menu options. In particular, the Manage menu, in addition to Assets and Volumes, now includes a Virtual Environment option:
This is the option that enabled our customer to quickly identify the root cause of their performance problem and take corrective action. I’ve recreated the process in the screenshots below, using an internal Nimble test environment so as not to expose confidential data.
In InfoSight, selecting Manage > Virtual Environment takes you to the registered vCenter plugin (the data source from which we collect virtual-infrastructure information). Below we can see the two vCenter instances we are polling; expanding the vCenter tabs shows the data centers, ESX nodes and VMs in each of the vCenter servers. These can be navigated to individually and reported on, or you can select the higher-level objects and run reports on those:
The above view shows the Data center HQ selected, with the right-hand pane showing the performance of all the hosts in that data center. The available reports include:
- Host Activity: Shows the busiest hosts during the last period
- Top VMs: Shows the busiest VMs in the data center over the last 24 hours by IOPS and latency
- Inactive VMs: Shows which VMs have been dormant and are therefore candidates for cleanup to reclaim space
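To make the last two reports concrete, here is a minimal Python sketch of the kind of ranking and filtering they describe. It is not InfoSight code: the `VmStats` record, the function names, the idle cutoff and the sample figures are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class VmStats:
    """Hypothetical per-VM summary for one reporting window."""
    name: str
    iops: float        # average IOPS over the window
    latency_ms: float  # average latency over the window


def top_vms(stats, n=3):
    """Return the n busiest VMs, ranked by IOPS (highest first)."""
    return sorted(stats, key=lambda vm: vm.iops, reverse=True)[:n]


def inactive_vms(stats, idle_iops=0.0):
    """Return VMs whose average IOPS is at or below idle_iops (an assumed cutoff)."""
    return [vm for vm in stats if vm.iops <= idle_iops]


# Made-up sample data, not taken from any real environment.
sample_vms = [
    VmStats("sql-01", 4200.0, 3.1),
    VmStats("web-01", 650.0, 1.2),
    VmStats("old-test", 0.0, 0.0),
]

print([vm.name for vm in top_vms(sample_vms, n=2)])
print([vm.name for vm in inactive_vms(sample_vms)])
```

In practice a dormant VM would more likely be judged over a longer window than a single average, so treat the cutoff here as a placeholder.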
I’m going to focus on the datastore treemap view, as this was the capability that enabled our customer to resolve their particular performance issue.
Clicking on the Datastore Treemap view displays a tree with a “heat map” of all the datastores in that data center:
Each square in the screenshot above denotes a datastore. The bigger the square, the more IOPS that datastore has seen over the last 24 hours; a smaller square means fewer IOPS. The color represents latency – a blue square means all VMs have been showing low latency, while a red square means a VM has been experiencing abnormal latency and therefore ought to be investigated. Hovering the mouse over the squares reveals the underlying figures.
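The sizing and coloring rule can be sketched in a few lines of Python. This is only an illustration of the idea, not how InfoSight computes its treemap: the function name, the input layout, the two-color scheme and the 20 ms "abnormal" threshold are all assumptions.

```python
def datastore_cells(datastores, latency_threshold_ms=20.0):
    """Build one treemap cell per datastore.

    datastores maps a datastore name to a list of
    (vm_iops, vm_latency_ms) tuples for the VMs it hosts.
    The threshold for "abnormal" latency is an assumed value.
    """
    total_iops = sum(iops for vms in datastores.values() for iops, _ in vms)
    cells = []
    for name, vms in datastores.items():
        ds_iops = sum(iops for iops, _ in vms)
        # Area is proportional to the datastore's share of all IOPS.
        share = ds_iops / total_iops if total_iops else 0.0
        # Red if any single VM exceeds the threshold, otherwise blue.
        hot = any(latency > latency_threshold_ms for _, latency in vms)
        cells.append({
            "datastore": name,
            "area_share": share,
            "color": "red" if hot else "blue",
        })
    # Largest squares first, as a treemap layout would place them.
    return sorted(cells, key=lambda cell: cell["area_share"], reverse=True)


# Made-up sample data: two datastores, one hosting a slow VM.
sample_datastores = {
    "ds-prod": [(3000.0, 4.0), (1000.0, 45.0)],
    "ds-dev": [(1000.0, 2.0)],
}
for cell in datastore_cells(sample_datastores):
    print(cell)
```

Note that one slow VM is enough to turn its whole datastore red, which matches the drill-down flow: the color tells you where to click, not which VM is at fault.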
In this case, we’ll click on the red square to see which VM is in trouble. Clicking on the red datastore opens up that datastore to show the VMs:
We now get to see the same view, but for the VMs hosted on that datastore. Hovering over an individual VM shows us the IOPS and average latency for that VM:
It looks like this particular VM is in trouble, so let’s drill down and look at that VM in detail. Clicking on the VM now gives us a historical view of that VM’s performance over time. Mousing over the charts shows us the VM’s performance with regard to latency and also which resource was contributing to that latency (i.e., host, network or storage):
We can see from the above graph that storage latency has been fine, and in fact the host is the major culprit.
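The reasoning behind that conclusion is simple enough to sketch: split the VM's latency into its host, network and storage parts and see which one dominates. The Python below is a hypothetical illustration with made-up numbers; the function name and the input format are assumptions, not an InfoSight interface.

```python
def main_latency_source(breakdown_ms):
    """Return (component, share_of_total) for the largest latency contributor.

    breakdown_ms maps a component name ("host", "network", "storage")
    to its latency contribution in milliseconds.
    """
    total = sum(breakdown_ms.values())
    if total <= 0:
        return None, 0.0
    component = max(breakdown_ms, key=breakdown_ms.get)
    return component, breakdown_ms[component] / total


# Made-up figures for a single point in time on the chart.
sample_breakdown = {"host": 38.0, "network": 1.5, "storage": 2.5}
component, share = main_latency_source(sample_breakdown)
print(f"{component} contributes {share:.0%} of the observed latency")
```

With figures like these, faster storage would address only a small fraction of the delay, which is why seeing the breakdown matters before spending on hardware.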
This is where InfoSight provides immense business value. Instead of randomly throwing money at every component in the data center, the customer can now spend only where it really matters, gaining maximum performance improvement at minimal cost.
In this case, the customer described the new InfoSight functionality as a “game changer”, and given how much money they’d just saved, I would agree.
For additional info:
- Here’s a video demonstration of the above process on YouTube
- Nimble’s VP of customer support, Rod Bagg, wrote a blog post on how InfoSight now provides both granular insight and the big-picture overview.
- Rod also worked with George Crump (a Storage Switzerland analyst) to produce an informative overview video.