Nimble Storage was founded in early 2008 on the premise that hybrid storage arrays would be the dominant networked storage architecture over the next decade – a premise that is now widely accepted. The interesting question today is, “Are all hybrid storage arrays are created equal?” After all, SSDs and HDDs are commodities, so the only factor setting them apart is the effectiveness of the array software.
How does one compare hybrid storage arrays? Here are some key factors:
- How cost-effectively does the hybrid storage array use SSDs to minimize costs while maximizing performance?
- How cost-effectively does the hybrid storage array use HDDs to minimize costs while maximizing useable capacity?
- How responsive and flexible is the hybrid array at handling multiple workloads and workload changes?
- Aside from price/performance and price/capacity, how efficient is the array data management functionality (such as snapshots, clones, and replication)?
This blog will cover the first three. The fourth dimension of efficient data management is a very important factor in evaluating storage arrays, and a topic we’ll cover in detail in a future blog post.
How cost-effectively does the hybrid storage array use SSDs?
Most hybrid storage array architectures stage all writes to SSDs first in order to accelerate write performance, allowing data that is deemed less “hot” to be moved to HDDs at a later point. However as explained below, this is an expensive approach. Nimble storage arrays employ a unique architecture in that only data that is deemed to be cache-worthy for subsequent read access is written to SSDs, while all data is written to low-cost HDDs. Nimble’s unique architecture achieves very high write performance despite writing all data to HDDs by converting random write IOs issued by applications into sequential IOs on the fly, leveraging the fact that HDDs are very good at handling sequential IO.
- Write endurance issues demand the use of expensive SSDs. When SSDs receive random writes directly, the actual write activity within the physical SSD itself is higher than the number of logical writes issued to the SSD (a phenomenon called write amplification). This eats into the SSD lifespan, i.e. the number of write cycles that the SSD can endure. Consequently, many storage systems are forced to use higher endurance eMLC or SLC SSDs, which are far more expensive. In addition to the selective writing capability mentioned above, the Nimble architecture also optimizes the written data layout on SSDs so as to minimize write amplification. This allows the use of lower cost commodity MLC SSDs, while still delivering a 5 year lifespan.
- Overheads reduce useable capacity relative to raw capacity of SSDs. Hybrid arrays that can leverage data reduction techniques such as compression and de-duplication can significantly increase useable capacity. On the flip side, RAID parity overheads can significantly reduce useable capacity. Nimble’s architecture eliminates the need for RAID overheads on SSD entirely and further increases useable capacity by using inline compression.
- Infrequent decision-making about what data to place on SSDs and moving large-sized data chunks wastes SSD capacity. Most hybrid storage arrays determine what data gets placed on SSDs vs. HDDs by analyzing access patterns for (and eventually migrating) large “data chunks”, sometimes called pages or extents. This allows “hot” or more frequently requested data chunks to be promoted into SSDs, while keeping the “cold” or less frequently requested data on HDDs.
- Infrequent decisions on data placement cause SSD over-provisioning. Many storage systems analyze what data is “hot” on an infrequent basis (every several hours) and move that data into SSDs with no ability to react to workload changes between periods. Consequently, they have to over-provision SSD capacity to optimize performance between periods. Nimble’s architecture optimizes data placement real-time, with every IO operation.
- Optimizing data placement in large data chunks (many MB or even GB) causes SSD over-provisioning. The amount of meta-data needed to manage placement of data chunks gets larger as the data chunks get smaller. Most storage systems are not designed to manage a large amount of meta-data and they consequently use large-sized data chunks, which wastes SSD capacity. For example, if a storage array were to use data chunks that are 1GB in size, frequent access of a database record that is 8KB in size results in an entire 1GB chunk of data being treated as “hot” and getting moved into SSDs. Nimble’s architecture manages data placement in very small chunks (~4KB), thus avoiding SSD wastage.
How cost-effectively does the hybrid storage array use HDDs?
This means assessing the ratio of usable to raw HDD capacity, as well as the cost per GB of capacity. Three main areas drive this:
- Type of HDDs. Many hybrid arrays are forced to use high-RPM (10K or 15K) HDDs to handle performance needs for data that is not on SSDs, because of their (higher) random IO performance. Unfortunately high RPM HDD capacity is about 5x costlier ($/GB) vs. low RPM HDDs. As mentioned earlier, Nimble’s write-optimized architecture coalesces thousands of random writes into a small number of sequential writes. Since low-cost, high-density HDDs are good at handling sequential IO, this allows Nimble storage arrays to deliver very high random write performance with low-cost HDDs. In fact a single shelf of low RPM HDDs with the Nimble layout handily outperforms the random write performance of multiple shelves of high RPM drives.
- Data Reduction. Most hybrid arrays are unable to compress or de-duplicate data that is resident on HDDs (some may be able to compress or de-duplicate data resident on SSDs). Even among those that do, many recommend that data reduction approaches not be deployed for transactional applications (e.g., databases, mail applications, etc.). The Nimble architecture is able to compress data inline, even for high-performance applications.
- RAID and Other System Overheads. Arrays can differ significantly in how much capacity is lost due to RAID protection and other system overheads. For example many architectures force the use of mirroring (RAID-10) for performance intensive workloads. Nimble on the other hand uses a very fast version of dual parity RAID that delivers resiliency in the event of dual disk failure, allows high performance, and yet consumes low capacity overhead. This can be assessed by comparing useable capacity relative to raw capacity, while using the vendor’s RAID best practices for your application.
How responsive and flexible is the hybrid array at handling multiple workloads?
One of the main purposes of a hybrid array is to deliver responsive, high performance at a lower cost than traditional arrays. There are a couple of keys to delivering on the performance promise:
- Responsiveness to workload changes based on timeliness and granularity of data placement. As discussed earlier, hybrid arrays deliver high performance by ensuring that “hot” randomly accessed data is served out of SSDs. However many hybrid arrays manage this migration process only on a periodic basis (on the order of hours) which results in poor responsiveness if workloads change between intervals. And in most cases hybrid arrays can only manage very large data chunks for SSD migration, on the order of many MB or even GB. Unfortunately, when such large chunks are promoted into SSDs, large fractions of that can be “cold data” that is forced to be promoted because of design limitations. Then because some of the SSD capacity is used up by this cold data, not all the “hot” data that would have been SSD worthy is able to make it into SSDs. Nimble’s architecture optimizes data placement real-time, for every IO that can be as small as a 4 KB in size.
- The IO penalty of promoting “hot” data and demoting “cold” data. Hybrid arrays that rely on a migration process often find that the very process of migration can actually hurt performance when it is most in need! In a migration based approach, promotion of “hot” data into SSDs requires not just that data be read from HDDs and written to SSDs, but also that to make room for that hot data, some colder data needs to be read from SSDs and written into HDDs – which we already know are slow at handling writes. The Nimble architecture is much more efficient in that promoting hot data only requires that data be read from HDDs and written into SSDs – the reverse process is not necessary since a copy of all data is already stored in HDDs.
- Flexibly scaling the ratio of SSD to HDD on the fly. Hybrid arrays need to be flexible in that as the attributes of SSDs and HDDs change over time (performance, $/GB, sequential bandwidth, etc.), or as the workloads being consolidated on the array evolve over time, you can vary the ratio of SSD to HDD capacity within the array. A measure of this would be whether a hybrid array can change the SSD capacity on the fly without requiring application disruption, so that you can adapt the flash/disk ratio if and when needed, in the most cost effective manner.
We truly believe that storage infrastructure is going through the most significant transformation in over a decade, and that efficient hybrid storage arrays will displace modular storage over that time frame. Every storage vendor will deploy a combination of SSDs and HDDs within their arrays, and argue that they have already embraced hybrid storage architectures. The real winners over this transformation will be those who have truly engineered their product architectures to maximize the best that SSDs and HDDs together can bring to bear for Enterprise applications.
- Ajay Singh