The Escher Stairs of Efficiency Claims
An end user bombarded by the many efficiency claims made by storage vendors might be forgiven for being confused, skeptical, or both. How is it possible for so many vendors to claim they deliver storage with X% lower cost than other vendors? For all these claims to be true, the storage world would have to be the real-world equivalent of the mind-bending Penrose stairs in M.C. Escher’s drawings. What’s really going on here?
Comparing Storage Efficiency
Well, the problem is that most such claims are based on simplistic comparisons, such as only comparing capacity efficiency (usable capacity/raw capacity). Or just comparing raw performance. And even these are often inflated with unrealistic assumptions.
While interesting, such one-dimensional comparisons are typically only useful for niche applications such as archiving or HPC. For mainstream applications you typically care about multiple dimensions of a storage solution such as price/performance, data protection, availability and capacity efficiency. Knowing this, the question then is – how does one construct more meaningful comparisons?
A Better Comparison
Assuming many solutions meet your threshold of reliability and availability, here are some dimensions of storage efficiency you might consider in comparing them:
· Capacity AND Performance Efficiency
A basic definition of capacity efficiency (usable capacity/raw capacity) can be too simplistic for a couple of reasons. Often it ignores capacity-saving techniques like inline compression and cloning. More importantly, it ignores the inherent performance differences between architectures. If you could get 50% compression without a performance impact, that’s certainly nice. But if you could get the performance of high-performance drives (15K RPM disks or, better, flash SSDs) and the capacity of high-density drives (7.2K RPM disks) in a single tier of storage, that’s HUGE! When you consider that 15K RPM drives cost 500% more per GB than 7.2K RPM drives (that is, six times as much), the above example translates to 500% more capacity for the same spend, from the get-go! To capture such differences, a meaningful comparison of efficiency ought to consider both $/GB AND $/IOPS.
· Data Protection Efficiency
The most visible elements of efficient data protection are the capacity efficiency of backup storage (e.g. dedupe ratios), and the bandwidth and capacity efficiency of DR storage. It’s less common to see quantitative comparisons of the level of data protection, namely the recovery point objectives (RPOs) and recovery time objectives (RTOs) enabled by the system, although these translate to very real and potentially big costs. And then there’s another part which is sometimes overlooked and typically harder to quantify: operational efficiency, in other words how easy it is to set up and manage backups and DR on a day-to-day basis. More on this topic next.
· Operational Efficiency (i.e. Simplicity)
This is the dimension that is hardest to measure, but no less important to consider. Operational Efficiency encompasses qualitative attributes like simplicity – can an admin just install and start using a storage technology without days of training, professional services and years of experience? Does the performance adapt quickly to changing workloads? Quantitative measures might be the time (or number of steps) required for common tasks.
There’s another reason to pay close attention to operational efficiency – it helps you distinguish truly efficiently designed storage solutions from less efficiently “bundled” ones. Here’s a hypothetical example to illustrate:
What if you had a shrink-wrapped solution that bundled a small amount of expensive but fast storage together with a lot of cheap but slow storage? And also threw in some software to slowly move data back and forth, to place the right data on the right tier? And some more software to do the same for backup purposes? On paper such a solution can appear to have it all: good $/IOPS, good $/GB and automation to simplify management. So what could be missing? Potentially a LOT!
If the data transfer process is slow and heavy duty, it might take hours to complete and impact performance while it’s happening. And since application workloads change dynamically, you’d be constantly monitoring workloads and over-allocating performance tiers to ensure bursty applications don’t experience bad performance for extended periods. Despite this, it’s virtually certain that some applications would experience poor performance. As for backups and restores, you’d be constantly battling backup windows and dealing with poor recovery points and slow, painful restores. So in reality, such a package would deliver much less than the sum of its parts.
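To make the first dimension concrete, here is a minimal Python sketch that puts $/GB and $/IOPS side by side for three made-up options. Every name and number in it is hypothetical and chosen only for illustration; the 6x gap in $/GB between the "performance-optimized" and "capacity-optimized" options simply mirrors the "500% more per GB" figure mentioned above. The throughput figure is assumed to be sustained IOPS measured on your own workload, not a datasheet peak.

```python
from dataclasses import dataclass


@dataclass
class StorageOption:
    """One candidate solution; all fields are illustrative assumptions."""
    name: str
    price_usd: float   # total purchase price
    usable_gb: float   # usable (not raw) capacity
    iops: float        # sustained IOPS on a representative workload

    def cost_per_gb(self) -> float:
        return self.price_usd / self.usable_gb

    def cost_per_iops(self) -> float:
        return self.price_usd / self.iops


# Hypothetical figures, not vendor data.
options = [
    StorageOption("capacity-optimized", price_usd=50_000, usable_gb=100_000, iops=5_000),
    StorageOption("performance-optimized", price_usd=60_000, usable_gb=20_000, iops=30_000),
    StorageOption("balanced", price_usd=60_000, usable_gb=80_000, iops=30_000),
]

for o in options:
    print(f"{o.name:<22} $/GB = {o.cost_per_gb():.2f}   $/IOPS = {o.cost_per_iops():.2f}")
```

Looking at either column alone would crown a different winner, which is the point: a one-dimensional comparison hides the trade-off. Note that this sketch says nothing about how a bundled, tiered solution behaves while data is being moved, so it cannot capture the operational issues described above.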
What This Means for You
Not every application needs a multi-dimensional, well-balanced storage solution. Perhaps for an archive tier $/GB is the one overriding concern. Or maybe for a critical application you’re willing to pay a lot for performance, even if it means compromising on capacity and efficient data protection. However, the vast majority of mainstream applications need more versatile storage solutions.
One approach to picking the right one is to assign explicit weights to your criteria: for example, capacity efficiency, performance efficiency, data protection efficiency and operational efficiency might all be equally important in your environment and deserve equal weights. You can then compare storage solutions under each of these four criteria and rate each on a scale of 1-5. The overall weighted rating would give you a much better measure of storage efficiency for your applications than anything vendor marketing materials could. In upcoming blog posts we will share real-world data on how Nimble does on each of these criteria.
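The weighted rating described above could be sketched in a few lines of Python. This is only one way to do it: the criterion keys, the two solutions and their ratings are hypothetical placeholders, and the score is taken to be a plain weighted average of the 1-5 ratings.

```python
CRITERIA = ("capacity", "performance", "data_protection", "operations")


def weighted_rating(ratings: dict, weights: dict) -> float:
    """Weighted average of 1-5 ratings; weights need not sum to 1."""
    for c in CRITERIA:
        if not 1 <= ratings[c] <= 5:
            raise ValueError(f"rating for {c!r} must be between 1 and 5")
    total_weight = sum(weights[c] for c in CRITERIA)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(ratings[c] * weights[c] for c in CRITERIA) / total_weight


# Hypothetical ratings, for illustration only.
solution_a = {"capacity": 5, "performance": 2, "data_protection": 3, "operations": 2}
solution_b = {"capacity": 4, "performance": 4, "data_protection": 4, "operations": 4}

# Mainstream application: all four criteria matter equally.
equal_weights = {c: 1.0 for c in CRITERIA}

# Archive tier: capacity efficiency is the only concern.
archive_weights = {"capacity": 1.0, "performance": 0.0,
                   "data_protection": 0.0, "operations": 0.0}

for label, w in (("equal", equal_weights), ("archive", archive_weights)):
    print(label, weighted_rating(solution_a, w), weighted_rating(solution_b, w))
```

With equal weights the balanced solution comes out ahead, while the archive weighting favors the capacity-heavy one, so the same ratings can point to different choices depending on what your environment values.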