Introducing InfoSight Labs – A Framework That Takes Storage Analytics to New Heights
By: Karthik Krishnaswamy, InfoSight Product Manager, Nimble Storage
Enterprises are increasingly looking to analytics to improve all aspects of their businesses. Insights obtained from analytics help enterprises develop new revenue streams, identify opportunities that drive higher efficiency, and make faster, better decisions. At Nimble Storage, we strive to provide sophisticated analytics that help find the needle in the haystack and solve complex issues across the infrastructure stack.
This blog provides an overview of InfoSight Labs – a new framework that allows us to quickly introduce advanced analytics to our customers. These analytics provide granular insights that can be used for advanced troubleshooting. Labs can answer questions such as which volume is growing fastest, how large a specific application will grow, or whether any volumes are monopolizing resources. We plan to introduce new features and analytics first in Labs. We will monitor how customers use them and will use this feedback to improve these features.
Each application in Labs can be used on demand. Labs data can be aggregated in a variety of ways so that users can quickly get the information they need: for example, by volumes, applications, performance policies, and tenants (folders). Users can provide instant feedback. This feedback will be used to constantly refine both the usability and interpretability of the analytics, as well as the models behind these applications, to ensure maximum accuracy.
The following shows some of the applications currently available in Labs:
- Volume Performance Explorer: This app helps you troubleshoot performance problems at a granular level. It allows you to view historical performance metrics (IOPS, throughput, latency) at the volume level, enabling you to home in on the specific volume that is most affected by a performance problem.
This includes a “DNA view” of your application: a heatmap of read and write I/O block size. This chart plots the I/O block size across the selected time (date) range. The darker the color, the higher the contribution of a specific block size at that specific time. In the example below, 89% of IO operations on Oct 28 2016 at midnight are of size 4K – 8K.
If you want to investigate a ‘low IOPS’ issue, you can quickly get visibility into IO block sizes using this application. Arrays typically achieve higher IOPS at smaller IO sizes and higher throughput at larger IO sizes. You can determine if lower IOPS is a result of larger IO sizes by looking at this heatmap.
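The link between IO size and IOPS comes down to simple arithmetic: throughput equals IOPS multiplied by IO size. The sketch below is illustrative only. It assumes a hypothetical array limited purely by a throughput ceiling (the 400 MiB/s figure and the function name `max_iops_at` are invented for this example and are not Nimble specifications) and shows how the achievable IOPS falls as IO size grows.

```python
# Illustrative arithmetic only: throughput = IOPS x IO size.
# The throughput ceiling below is a hypothetical figure, not a Nimble spec.

KIB = 1024
MIB = 1024 * KIB


def max_iops_at(io_size_bytes: int, throughput_ceiling_bytes: int) -> float:
    """IOPS achievable if the array is limited only by throughput."""
    return throughput_ceiling_bytes / io_size_bytes


ceiling = 400 * MIB  # assumed throughput limit, in bytes per second

for size_kib in (4, 8, 64, 256):
    iops = max_iops_at(size_kib * KIB, ceiling)
    print(f"{size_kib:>3} KiB IOs -> up to {iops:,.0f} IOPS")
```

Under these assumptions, a workload that shifts from 4 KiB to 256 KiB IOs would see its IOPS ceiling drop by a factor of 64 even though the array is moving the same amount of data, which is the pattern the heatmap helps you spot.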
- Capacity Consumers Timeline: This app allows you to view historical capacity usage by volume, folder, application, and more. This is highly useful for finding those offending applications that are growing beyond their true need. For example, historical trending at the array level would indicate overall growth, but when you zoom in and look at the volume breakdown, you can quickly see which volume is growing rapidly. This application helps answer the question: “Which volumes/folders/applications are the top contributors to growth?”
- Capacity Forecast Explorer: This app allows you to forecast capacity for volumes and snapshot data. This augments our existing automatic forecasting and alerting of array-level consumption, allowing for a more fine-grained view into future capacity needs. You can now forecast growth for specific volumes or applications that will contribute to overall array-level growth and consumption.
- New and intuitive Replication Planning Tool: Quickly figure out the bandwidth needed for replication. This app estimates the bandwidth needed based on observations from the entire install base. It allows exploration of ‘what-if’ scenarios: you can provide the replication interval (frequency of replication) as well as the recovery point objective (maximum targeted period in which data might be lost due to a major incident) as inputs to obtain an estimate of the bandwidth needed for replication.
- Replication Timeline: This app allows you to view historical replication rates on a per-application, per-volume, or per-performance-policy basis. Knowing the bandwidth used for replication in the past is very helpful when planning network capacity for future needs.
- Inter-Volume Performance & Contention: This app enables you to determine which volumes are consuming the most resources. It also diagnoses the volumes that are experiencing high latency as a result of volumes that are monopolizing resources.
This is particularly helpful for identifying “Noisy Neighbor” volumes and candidates for QoS IOPS or throughput throttling.
In this example, Nimble-02 is driving high throughput between 5:00 – 7:00 PM on Jan 6, as indicated by the dark blue coloring in the top chart. Other volumes don’t seem to be driving much activity (as indicated by lighter shades of blue), but they all seem to be experiencing higher latency than Nimble-02, as indicated by the bottom chart. The darker the pink color, the higher the latency. Nimble-02 itself doesn’t seem to be experiencing high latency. The other volumes are experiencing high latency because of the high throughput activity by Nimble-02.
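The reasoning in this example can be sketched as a simple rule: flag a volume that accounts for most of the throughput in a time window while other volumes show high latency. The code below is a hypothetical illustration, not InfoSight's actual logic. The function name `find_noisy_neighbors`, both thresholds, the volume names other than Nimble-02, and all of the numbers are assumptions made up for the sketch.

```python
# Hypothetical sketch of a "noisy neighbor" check. All numbers are invented
# and the thresholds are assumptions, not InfoSight's actual logic.

def find_noisy_neighbors(throughput_mbps, latency_ms,
                         share_threshold=0.6, latency_threshold_ms=10.0):
    """Return volumes that dominate throughput while other volumes see high latency."""
    total = sum(throughput_mbps.values())
    if total == 0:
        return []
    noisy = []
    for volume, mbps in throughput_mbps.items():
        if mbps / total < share_threshold:
            continue
        # Volumes other than the candidate that are above the latency threshold.
        victims = [v for v, ms in latency_ms.items()
                   if v != volume and ms >= latency_threshold_ms]
        if victims:
            noisy.append((volume, sorted(victims)))
    return noisy


# Made-up per-volume averages for a single time window.
throughput = {"Nimble-01": 12.0, "Nimble-02": 310.0, "Nimble-03": 8.0}
latency = {"Nimble-01": 18.0, "Nimble-02": 2.5, "Nimble-03": 22.0}

print(find_noisy_neighbors(throughput, latency))
# [('Nimble-02', ['Nimble-01', 'Nimble-03'])]
```

A volume flagged this way would be a natural candidate for the QoS IOPS or throughput limits mentioned above.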
- Recurring Performance Patterns: This app allows for flexible investigation of recurring patterns in workload and performance metrics on a daily, weekly, or monthly basis for any selected subset of volumes.
In this example, there is recurring IOPS activity at 7:10 PM every day on volume SQL-Backups. It turned out that a rogue workload was running on this array every day. Once it was stopped, performance improved.
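One simple way to surface a daily pattern like this is to bucket samples by time of day and look for slots where IOPS spikes on most days. The sketch below shows that idea under stated assumptions: the function name `recurring_spike_times`, the spike threshold, the 80% cutoff, and the synthetic data are all hypothetical, and it does not represent how the Labs app is implemented.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def recurring_spike_times(samples, spike_threshold, min_fraction_of_days=0.8):
    """Return (hour, minute) slots where IOPS spikes on most observed days.

    samples: iterable of (datetime, iops) pairs.
    """
    days_seen = set()
    spike_days = defaultdict(set)
    for ts, iops in samples:
        days_seen.add(ts.date())
        if iops >= spike_threshold:
            spike_days[(ts.hour, ts.minute)].add(ts.date())
    if not days_seen:
        return []
    return sorted(slot for slot, days in spike_days.items()
                  if len(days) / len(days_seen) >= min_fraction_of_days)


# Synthetic data: 7 days of 10-minute samples with a burst at 19:10 each day.
start = datetime(2017, 1, 1)
samples = []
for day in range(7):
    for step in range(144):  # 144 ten-minute slots per day
        ts = start + timedelta(days=day, minutes=10 * step)
        iops = 9000 if (ts.hour, ts.minute) == (19, 10) else 500
        samples.append((ts, iops))

print(recurring_spike_times(samples, spike_threshold=5000))
# [(19, 10)]
```

The same grouping could be done by day of week or day of month to look for weekly or monthly patterns.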
You can access Labs from Manage->Labs. Please try it, and use the feedback button on each application to tell us what you think.
By providing granular visibility into your storage layer (and soon the virtualization layer as well), we believe Labs will be very useful in quickly identifying infrastructure issues that impact applications. Stay tuned for exciting updates – we plan to introduce new apps at a regular cadence.
- Karthik Krishnaswamy