A few months ago, I blogged about industry announcements regarding multi-core CPUs, and how the vendors introducing those products were struggling to articulate tangible customer benefits. After all, a spindle-bound architecture remains spindle bound even if you remove single-threaded bottlenecks in some obscure internal process. In fact, lots of marketing ink has been spilled on loosely defined claims of architectures supposedly leveraging multi-core CPUs and “Moore’s law.”

CASL™, the Cache-Accelerated Sequential Layout and the crown jewel of Nimble’s Adaptive Flash platform, is NOT spindle bound.  That makes it unique in primary storage. Here’s how it allows us to ride the inexorable rise of multi-core CPUs, while delivering a compelling customer benefit:

  • When we shipped our first product in 2010 (a CS220), we could get about 1,000 4K random write IOPS out of a single low RPM HDD with sub-millisecond latency (yes, we’re referring to a vanilla 7,200 RPM HDD, normally capable of something like 50 to 60 IOPS). Of course, the array had a shelf full of these – not just one – so system performance was much higher. The 20x acceleration in IOPS is the magic of CASL. And this was sustained, steady-state performance – not the short-lived boost that you might get with a vanilla write cache or tiered system (more on the CASL vs. write cache comparison in a subsequent blog). If you’re familiar with CASL, you know we achieve this level of performance without relying on flash, by converting inefficient random writes into deterministic, efficient sequential stripes (more details in this blog). Over time, this write performance grew to over 1,500 IOPS per disk, by virtue of additional software optimizations.
  • In 2012, we introduced the CS400 series – the same chassis, the same disks, but with more CPU cores. Currently, those arrays are capable of about 4,500 (4K random) write IOPS per single low RPM HDD, with sub-millisecond latency, an estimated 75x increase compared to the traditional disk layout found in just about every other storage architecture.
  • Today, we introduce the CS700 series – the same chassis, the same disks, but with more CPU cores. This product delivers a whopping 10,000 (4K random) write IOPS per single low RPM HDD, again with sub-millisecond latency. This represents an estimated 166x acceleration compared to the traditional disk layout in just about every other storage architecture. Of course, this system has a shelf of these HDDs – not just one – so system performance is much higher.
  • And, we’re not done yet: There’s yet more “arbitrage” headroom left in each commodity HDD to convert random IO to sequential IO. When the next generation of x86 CPUs comes around we’ll be ready to squeeze more performance out of the same number of spindles.
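The acceleration factors quoted in the list above are simple ratios, and a few lines of Python make the arithmetic easy to check. This is only a sketch: the per-disk IOPS figures and the 50 to 60 IOPS baseline come from this post, while the names `casl_iops_per_hdd` and `acceleration_range` are made up for illustration.

```python
# Per-disk 4K random write IOPS quoted in this post, by product generation.
casl_iops_per_hdd = {
    "CS220 (2010)": 1_000,
    "CS400 (2012)": 4_500,
    "CS700": 10_000,
}

# The post puts a vanilla 7,200 RPM HDD at roughly 50 to 60 random IOPS.
BASELINE_LOW, BASELINE_HIGH = 50, 60


def acceleration_range(iops):
    """Return (conservative, optimistic) speedup over a vanilla 7,200 RPM HDD."""
    return iops / BASELINE_HIGH, iops / BASELINE_LOW


for model, iops in casl_iops_per_hdd.items():
    conservative, optimistic = acceleration_range(iops)
    print(f"{model}: {conservative:.1f}x to {optimistic:.1f}x")
```

Run this way, the quoted 20x figure corresponds to the 50 IOPS end of the baseline, while the 75x and 166x figures correspond to the 60 IOPS end.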

Oh, and did I mention that all the numbers above include the overhead imposed by two other features that bring the typical storage system to its knees, namely inline compression and triple-parity RAID? Compression remains the most reliable way to achieve data reduction across the spectrum of enterprise applications, and we have already blogged about our “on-by-default” inline compression. Our proprietary triple-parity RAID format is brand new; it can tolerate the failure of any 3 disks in a shelf, and yet deliver extremely high performance and fast rebuild times. (More on this in a later blog.)

One other note: The numbers above aren’t just useful in demonstrating how CASL leverages CPU cores differently than spindle-bound architectures. They also allow a comparison with systems that deliver write performance via brute force – using SSDs to replace high RPM HDDs. Obviously SSDs are very expensive per GB (a downside further amplified by using 40 to 50 percent of capacity for overprovisioning and RAID), but the presumed advantage is that SSDs are faster than anything disk based. However, if you ask any storage vendor for their sizing guidelines with MLC SSDs, most can only derive 2,500 to 3,000 IOPS per SSD. In other words, CASL + multi-core CPUs + a lowly 7.2K RPM HDD actually beats the performance of multiple MLC SSDs.

If you want a more traditional comparison, this is equivalent to 50-plus high RPM HDDs (when accounting for the RAID overhead in most storage systems) or, in other words, about 3 shelves of expensive, power hungry high RPM disks.
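For readers who like to see the math, here is a rough sketch of both comparisons. It uses only numbers quoted in this post (10,000 IOPS per HDD on the CS700, 2,500 to 3,000 IOPS per MLC SSD, and "50-plus" high RPM HDDs); the function names are hypothetical, and the per-disk figure it derives for high RPM HDDs is simply what the 50-disk equivalence implies, not a measured value.

```python
CASL_IOPS_PER_HDD = 10_000            # CS700 figure quoted in this post
MLC_SSD_IOPS_RANGE = (2_500, 3_000)   # vendor sizing range quoted in this post
HIGH_RPM_HDD_COUNT = 50               # "50-plus" disks quoted in this post


def ssd_equivalents(casl_iops, ssd_iops_range):
    """Return (fewest, most) MLC SSDs needed to match one CASL-driven HDD."""
    slowest, fastest = ssd_iops_range
    return casl_iops / fastest, casl_iops / slowest


def implied_iops_per_high_rpm_hdd(casl_iops, disk_count):
    """Effective write IOPS per high RPM HDD implied by the equivalence."""
    return casl_iops / disk_count


fewest, most = ssd_equivalents(CASL_IOPS_PER_HDD, MLC_SSD_IOPS_RANGE)
print(f"One CASL-driven HDD matches about {fewest:.1f} to {most:.1f} MLC SSDs")

per_disk = implied_iops_per_high_rpm_hdd(CASL_IOPS_PER_HDD, HIGH_RPM_HDD_COUNT)
print(f"Implied effective write IOPS per high RPM HDD: at most {per_disk:.0f}")
```

In other words, the "50-plus" comparison amounts to assuming each high RPM disk contributes no more than about 200 effective write IOPS once RAID overhead is accounted for.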

So there you have it: When tackling random write performance (one of the hardest problems in storage), CASL leverages multi-core CPUs to turn inexpensive, dense, low RPM HDDs into high-performance beasts.
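To make the "random writes become sequential stripes" idea a little more concrete, here is a toy model in Python. To be clear, this is not CASL's actual implementation: it is a minimal sketch of the general technique, with a hypothetical class name, a tiny made-up stripe size, and no compression, parity, or persistence. Incoming writes to scattered logical block addresses are buffered, written out together as one sequential stripe, and located later through an index.

```python
from dataclasses import dataclass, field

STRIPE_BLOCKS = 4  # hypothetical stripe size, kept tiny for illustration


@dataclass
class ToySequentialLayout:
    stripes: list = field(default_factory=list)  # stripes appended in order
    index: dict = field(default_factory=dict)    # block address -> (stripe, slot)
    pending: list = field(default_factory=list)  # buffered (address, data) writes

    def write(self, lba, data):
        """Buffer a write; emit a full stripe once enough blocks are pending."""
        self.pending.append((lba, data))
        if len(self.pending) == STRIPE_BLOCKS:
            self.flush()

    def flush(self):
        """Append all pending blocks as one sequential stripe and index them."""
        if not self.pending:
            return
        stripe_no = len(self.stripes)
        self.stripes.append([data for _, data in self.pending])
        for slot, (lba, _) in enumerate(self.pending):
            self.index[lba] = (stripe_no, slot)  # newest location wins
        self.pending = []

    def read(self, lba):
        """Return the latest data for an address, checking the buffer first."""
        for pending_lba, data in reversed(self.pending):
            if pending_lba == lba:
                return data
        stripe_no, slot = self.index[lba]
        return self.stripes[stripe_no][slot]


layout = ToySequentialLayout()
for lba, data in [(907, b"a"), (12, b"b"), (5531, b"c"), (44, b"d")]:
    layout.write(lba, data)  # four scattered addresses, one sequential stripe
print(len(layout.stripes), layout.read(5531))
```

The disk only ever sees whole stripes appended in order, no matter how scattered the incoming addresses are; the index absorbs the randomness. A real system also has to reclaim space from overwritten blocks, which this sketch ignores.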

In the next blog, I’ll describe how Adaptive Flash tackles read performance.