By Wen Yu – Senior Product Manager

With flash becoming mainstream in the corporate data center, some people have mistakenly assumed that flash can solve any performance problem, especially in virtualized environments like VMware’s vSphere. The reality is that many storage admins and vSphere admins spend a lot of time on environmental health checks to detect issues early, and yet they still constantly struggle to isolate performance problems.

Since the launch of InfoSight™ two years ago, Nimble Storage’s customers and partners have benefited from its highly data-driven approach to storage management, one that has enabled everyone involved to better understand the dynamics of their storage infrastructure. This visibility, combined with ongoing product feedback interviews and customer councils, has helped us identify three of the top challenges in virtualized environments:

  • It’s difficult to isolate performance bottlenecks in the many layers of the virtualization stack.
  • Existing monitoring tools simply display statistics with no actionable recommendations.
  • Agent-based or CLI (command line interface) stats collection during the critical troubleshooting stage is painful and inefficient.

Customers told us they wanted InfoSight expanded to collect sensor data from the hypervisor side and to provide meaningful environmental information at per-VM granularity. With the challenges well understood, we have now expanded InfoSight to enable customers to easily drill down into their virtualized environments. The new capabilities include:

  • Agent-less collection of key performance indicator metrics across the entire virtualization stack.
  • The ability to easily pinpoint performance bottlenecks on a per-virtual-machine basis.
  • A simple-to-digest health dashboard of the entire virtualized infrastructure.

Agent-less KPI Collection and Analysis

Each Nimble array has an embedded automation plugin that communicates directly with the vCenter Server. We simply leverage this service to enable stats collection for all relevant KPIs (key performance indicators) in the vSphere environment. There is no software or agent to install or maintain on the hosts, hypervisors, or VMs. Customers simply register the vCenter plugin to start the sensor data collection.

With regard to KPIs, the following stats are collected for deep analysis:

Compute

  • CPU utilization, CPU ready
  • Memory utilization, VMkernel swap, balloon driver activities

Storage

  • VMDK read/write IOPS, MB/s throughput, latency
  • Datastore read/write IOPS, MB/s throughput, latency

Additionally, IO queue latency, VMkernel latency, and device round trip latency are captured.

Per-VM Level Granular Troubleshooting

The inventory view for the vSphere environment has a look and feel that’s similar to the vSphere client, allowing customers to easily identify specific VMs of interest, either through built-in search or through the VM folder structure:

Figure 1: The vSphere Inventory with smart search presents a familiar folder view of your VMs.

With the virtual machine identified, end users can easily pinpoint the key contributors to latency (that is, whether it’s the compute, network or storage layer):

Figure 2: A VM latency chart provides a breakdown of contributing factors, showing the influence of the host environment, the network, and the storage system.
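
To make the idea of a latency breakdown concrete, here is a minimal sketch in Python. It is not InfoSight's implementation: the function names, the three per-layer inputs, and the assumption that the layers simply add up to the total are all hypothetical and used only for illustration.

```python
def latency_breakdown(host_ms, network_ms, storage_ms):
    """Return each layer's share of total latency as a percentage.

    Assumption: the three per-layer latencies (in milliseconds) are
    already known and are additive. This is an illustrative model only.
    """
    layers = {"host": host_ms, "network": network_ms, "storage": storage_ms}
    total = sum(layers.values())
    if total <= 0:
        # No measurable latency: report zero for every layer.
        return {name: 0.0 for name in layers}
    return {name: round(100.0 * ms / total, 1) for name, ms in layers.items()}


def dominant_layer(host_ms, network_ms, storage_ms):
    """Name the layer that contributes the most latency."""
    shares = latency_breakdown(host_ms, network_ms, storage_ms)
    return max(shares, key=shares.get)


# Example: a VM whose latency is dominated by the storage layer.
print(dominant_layer(0.5, 0.2, 4.0))  # storage
```

Expressing each layer as a share of the total is what lets a chart like Figure 2 show at a glance where to look first.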

Additionally, a single click lets you identify the noisiest neighbors in the environment:

Figure 3: The “noisy neighbor” analysis makes it easy to identify the top 10 neighbor VMs consuming IOPS.
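
As a rough illustration of what a "top 10 noisy neighbors" ranking involves, the sketch below sorts the other VMs by IOPS. The function name, the input dictionary, and the VM names are hypothetical; how InfoSight actually defines and ranks neighbors is not described here.

```python
import heapq


def noisiest_neighbors(vm_iops, target_vm, top_n=10):
    """Return the top_n VMs, other than target_vm, ranked by IOPS.

    vm_iops is an assumed mapping of VM name -> average IOPS over
    some observation window (hypothetical input format).
    """
    neighbors = (
        (name, iops) for name, iops in vm_iops.items() if name != target_vm
    )
    # nlargest returns the results sorted from highest to lowest IOPS.
    return heapq.nlargest(top_n, neighbors, key=lambda item: item[1])


# Example with made-up VM names and IOPS figures.
sample = {"sql-01": 1200, "web-01": 150, "web-02": 90, "backup-01": 4300}
print(noisiest_neighbors(sample, target_vm="web-01", top_n=2))
# [('backup-01', 4300), ('sql-01', 1200)]
```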

Proactive environmental health information is revealed through the dashboard, to help answer a number of common questions:

  • In my virtualized environment, which datastore is the busiest, and which has the highest latency? In the two-dimensional treemap dashboard below, the size of each block represents the amount of IO being pushed through the datastore, and latency is represented by color (red being the highest latency).

Figure 4: The treemap dashboard provides an overview of datastore IO performance and latency.

  • Which VM is the busiest in a given datastore? This information can easily be found by drilling down in a given datastore:

Figure 5: From the treemap dashboard, you can drill down into the performance of individual VM virtual disk(s) in a given datastore.
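
The two treemap dimensions described above (block size for IO volume, color for latency) can be sketched as follows. This is only an illustration: the field names, the latency thresholds, and the yellow and green colors are assumptions. The only detail taken from the dashboard description is that red marks the highest latency.

```python
def treemap_tiles(datastores, warn_ms=5.0, crit_ms=20.0):
    """Build treemap tiles: area follows IOPS share, color follows latency.

    datastores is an assumed list of dicts with "name", "iops", and
    "latency_ms" keys. warn_ms and crit_ms are hypothetical thresholds.
    """
    total_iops = sum(d["iops"] for d in datastores)
    tiles = []
    for d in datastores:
        share = d["iops"] / total_iops if total_iops else 0.0
        if d["latency_ms"] >= crit_ms:
            color = "red"
        elif d["latency_ms"] >= warn_ms:
            color = "yellow"
        else:
            color = "green"
        tiles.append({"name": d["name"], "area_share": share, "color": color})
    # Largest (busiest) tiles first, as a treemap layout would place them.
    return sorted(tiles, key=lambda t: t["area_share"], reverse=True)


# Example with made-up datastores.
tiles = treemap_tiles([
    {"name": "ds-prod", "iops": 700, "latency_ms": 25.0},
    {"name": "ds-test", "iops": 300, "latency_ms": 2.0},
])
for tile in tiles:
    print(tile["name"], tile["color"])
```

The same function could be applied one level down, to the virtual disks inside a single datastore, to mirror the drill-down view.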

InfoSight has always provided Nimble customers with remarkable insight into their storage infrastructure, and now it provides even more visibility and control over their virtualized environments. It’s a very streamlined, automated approach: agent-less KPI stats are ingested into InfoSight for analysis; InfoSight then pinpoints performance bottlenecks and conducts a proactive health analysis; and you then take appropriate action based on the analytics and recommendations.

So, it’s true that virtualized environments benefit from flash storage. But with the addition of per-VM monitoring, it’s even more true that Nimble’s Adaptive Flash platform goes beyond flash alone to provide a powerful, easy-to-use solution for virtually all enterprise workloads.

Written by:
Wendel Yu