All Flash: Performance, Scalability, and More
By Jeff Feierfeil – Product Management
The all flash data center is here. Flash storage is at the heart of every Nimble Storage array ever shipped, and today we extend that heritage with the AF series of All Flash arrays.
But fast flash is not enough. People now expect data velocity – instant response from their apps, both desktop and mobile. The only way IT can fulfill that demand is by predicting and preventing barriers between your apps and your data. That’s the value of Nimble’s new Predictive All Flash arrays. Instead of just responding to changing workloads, Nimble has leveraged the power of predictive analytics to optimize performance across your organization’s infrastructure.
In addition to these unmatched new capabilities, Nimble’s new All Flash arrays are also designed to stand out across a number of traditional dimensions. I’ll focus here on three that I believe are top of mind for many customers:
- Performance.
- Scalability.
- Total Cost of Ownership.
Since introducing its first Adaptive Flash array in July 2010, Nimble Storage has delivered a storage platform that extracts maximum business value from the underlying hardware components. There are two parts to the secret sauce:
- the operating system (NimbleOS), a highly multi-threaded architecture where performance is linked to CPU cores rather than to the underlying media;
- CASL (Cache Accelerated Sequential Layout), a log-structured file system designed for flash.
Nimble has taken this foundation and extended it to build an All Flash array with unparalleled performance. Because it is a log-structured file system, CASL is able to leverage the newest generation of cost-optimized flash chips (3D TLC NAND), using innovative software engineering to extend their lifespan to seven-plus years.
In the All Flash array, random data is block-coalesced in NVDIMM (non-volatile memory) and then sequentially written to SSD. During block-coalescing, the array performs inline variable block deduplication, compression and zero pattern elimination.
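The write path described above can be sketched roughly as follows. This is a minimal illustration and not NimbleOS code: it assumes fixed-size 4 KiB blocks (the array itself uses variable block deduplication), SHA-256 fingerprints, and zlib compression, and the names `coalesce_writes` and `read_block` are hypothetical. A `bytearray` stands in for the sequential segment that would be written to SSD.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def coalesce_writes(writes, block_size=BLOCK_SIZE):
    """Turn a batch of random block writes into one sequential segment.

    `writes` is a list of (logical_block_address, data) pairs, where each
    payload is exactly `block_size` bytes. Returns (segment, index,
    fingerprints); `index` records where each address's data lives.
    """
    segment = bytearray()  # stands in for the sequential write to SSD
    index = {}             # lba -> ("zero",) or ("data", offset, length)
    fingerprints = {}      # content hash -> (offset, length) in the segment
    zero_block = bytes(block_size)

    for lba, data in writes:
        if data == zero_block:
            # Zero-pattern elimination: keep metadata only, store nothing.
            index[lba] = ("zero",)
            continue
        digest = hashlib.sha256(data).digest()
        if digest in fingerprints:
            # Deduplication: point at the copy already in the segment.
            offset, length = fingerprints[digest]
        else:
            # Compression, then append so the layout stays sequential.
            compressed = zlib.compress(data)
            offset, length = len(segment), len(compressed)
            segment += compressed
            fingerprints[digest] = (offset, length)
        index[lba] = ("data", offset, length)
    return bytes(segment), index, fingerprints


def read_block(segment, index, lba, block_size=BLOCK_SIZE):
    """Reconstruct the original block for `lba` from the segment and index."""
    entry = index[lba]
    if entry[0] == "zero":
        return bytes(block_size)
    _, offset, length = entry
    return zlib.decompress(segment[offset:offset + length])
```

In this sketch the incoming addresses can arrive in any order, yet the segment is only ever appended to, which is the property that lets random writes reach the SSDs as sequential I/O.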
And CASL was designed to be extensible for future media types, whether emerging non-volatile memory technologies such as 3D XPoint, or future solid state media that haven’t yet been dreamed up.
Feeds & Speeds
Performance benchmarks are important, but only if they’re based on test conditions that match your real-world environment. In designing Nimble’s All Flash arrays, we knew up front that we’d need to deliver hundreds of thousands of IOPS or more at consistent sub-millisecond latencies, combined with effective flash capacity of a petabyte or more, regardless of read/write (RW) mix and dedupe ratios.
One big advantage we have is InfoSight, the Predictive Analytics platform that collects and analyzes more than a trillion data points a day from thousands of Nimble Storage customer arrays. The data from InfoSight is very enlightening – unlike irrelevant “average workloads” provided by vendors, it lets us graph the exact “real-life” distribution of customer workloads from a block size and RW mix perspective.
Contrary to what some vendors have stated, the data shows that most real-world workloads use I/O block sizes in the 4k-8k range, with writes making up more than 70% of the traffic. We integrated this information into our All Flash designs to ensure the highest realistic application performance.
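As a rough illustration of how a block size and RW mix distribution like this can be derived, the sketch below summarizes a list of I/O records. The record format `(op, size_bytes)` and the function names are assumptions made for this example; they are not an InfoSight API, and the sample records are made up.

```python
from collections import Counter


def size_bucket(size_bytes):
    """Round an I/O size up to the next power-of-two bucket, in KiB (minimum 1)."""
    kib = max(1, -(-size_bytes // 1024))  # ceiling division
    bucket = 1
    while bucket < kib:
        bucket *= 2
    return bucket


def summarize_workload(io_records):
    """Summarize (op, size_bytes) records, where op is "R" or "W".

    Returns (write_fraction, size_distribution); size_distribution maps a
    bucket in KiB to the fraction of all I/Os that fell into it.
    """
    total = 0
    writes = 0
    buckets = Counter()
    for op, size in io_records:
        total += 1
        if op == "W":
            writes += 1
        buckets[size_bucket(size)] += 1
    if total == 0:
        raise ValueError("no I/O records to summarize")
    distribution = {kib: count / total for kib, count in sorted(buckets.items())}
    return writes / total, distribution


# Made-up sample: seven writes and three reads of various sizes.
sample = [("W", 4096)] * 5 + [("W", 8192)] * 2 + [("R", 4096)] * 2 + [("R", 65536)]
write_fraction, distribution = summarize_workload(sample)
```

For this sample, writes account for 7 of the 10 I/Os, and the 4 KiB bucket holds 7 of the 10 as well.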
For instance, Nimble’s garbage collection (GC) routines are optimized for performance consistency for both reads and writes under heavy loads. Some all flash arrays focus mostly on reads, because that is the easier problem to solve and allows for bigger marketing claims. However, as read caches on servers become larger and more economical, the workloads a SAN (storage area network) sees are increasingly write-dominated.
To further boost performance, Nimble’s data structures make heavy use of indirection, multiple indexes, and other techniques. An efficient metadata implementation, backed by a continuous lightweight GC process, maintains consistent latency regardless of how full the system is. This also helps maintain performance consistency regardless of the “dedupability” of the data. Where the performance of some architectures varies widely with the dedupe ratio, Nimble’s performance is consistent. We explore this topic in greater detail in the blog post on Data Reduction.
Scalability
Nimble’s scale-to-fit paradigm allows our customers to grow the capacity and performance of their existing arrays independently and non-disruptively, and we’ve taken the same approach with our All Flash arrays.
Customers can non-disruptively scale up (add performance), scale deep (add capacity), and scale out (cluster multiple arrays into a single pool), both on the current Adaptive Flash platform and with the All Flash arrays (AFAs). Nimble offers the only AFA that can effectively scale in all three dimensions (competing architectures can scale in one or two dimensions but not all three).
- Scale Up allows non-disruptive upgrades to bigger CPU and memory hardware to enable higher performance. If you need more performance, but not more capacity, a Nimble AFA can be non-disruptively upgraded to a higher-end controller. With Timeless Storage this is automatically included as part of the initial three-year support purchase and subsequent three-year support purchases.
- Scale Deep allows capacity to be added non-disruptively behind a single All Flash array. Nimble’s DRAM-efficient architecture enables an order of magnitude higher capacity scalability than many competing architectures, lowering the total cost per gigabyte.
- Scale Out allows multiple Nimble All Flash arrays to be clustered together to present a single storage pool. A Nimble All Flash array scale-out cluster, managed as a single entity, can non-disruptively scale beyond the limits of other All Flash arrays to over 8PB effective (assuming 5x dedupe), delivering more than 1.2 million IOPS at less than 1ms response time. In addition to performance and capacity, the Nimble scale-out architecture also balances SSD wear across arrays and enables non-disruptive data migration as new arrays are added or removed.
Total Cost of Ownership
Nimble All Flash arrays provide an unbeatable level of performance, at 33% to 66% lower TCO than other solutions. This is due to a number of factors, including memory requirements, storage efficiency, SSD design, and efficient backup and disaster recovery (DR).
Nimble arrays have a very low memory overhead (10 to 30x less than other All Flash arrays), which provides customers with a significant cost advantage. This is coupled with the ability to scale capacities as high as ~410TB usable (553TB raw) per array before any data reduction, or over 2PB of effective capacity once data reduction is applied.
For instance, Nimble’s highest-end array has 448GB of DRAM and can address 553TB of raw flash, whereas competing arrays have as much as 1TB of DRAM per array to address significantly less capacity. More memory, in this case, is not better, especially when you consider the cost of all that DRAM. And higher capacity scalability for Nimble means the array cost is spread out over more terabytes. Nimble’s dedupe implementation is also very memory efficient, allowing a given amount of memory to support a much higher logical deduped capacity than competing architectures.
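The per-array figures quoted above can be checked with a few lines of arithmetic. The raw, usable, and DRAM numbers come from the text; the 5x data reduction ratio is an assumption borrowed from the scale-out example earlier in this post, and `capacity_summary` is a hypothetical helper name.

```python
def capacity_summary(raw_tb, usable_tb, dram_gb, reduction_ratio):
    """Derive simple efficiency figures from an array's headline numbers."""
    return {
        # Share of raw flash left after RAID, sparing, metadata, etc.
        "usable_ratio": usable_tb / raw_tb,
        # Logical capacity once data reduction is applied.
        "effective_tb": usable_tb * reduction_ratio,
        # How much DRAM the array needs per terabyte of raw flash.
        "dram_gb_per_raw_tb": dram_gb / raw_tb,
    }


# 553TB raw, ~410TB usable, 448GB DRAM; 5x reduction is an assumed ratio.
summary = capacity_summary(raw_tb=553, usable_tb=410, dram_gb=448, reduction_ratio=5)
print(round(summary["usable_ratio"], 3))        # 0.741
print(summary["effective_tb"])                  # 2050
print(round(summary["dram_gb_per_raw_tb"], 2))  # 0.81
```

Under these assumptions the array works out to roughly 74% usable capacity, about 2PB effective, and less than 1GB of DRAM per raw terabyte.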
To compare the efficiency of different storage systems fairly, it’s important to include RAID and other overhead factors. One simple way to measure this is an array’s capacity ratio (usable capacity divided by raw capacity); Nimble’s is as high as 74% (after Triple+ Parity, integrated hot sparing, metadata, and overprovisioning). Relative to some AFA competitors, we see a 20% advantage on usable capacity (raw capacity being equal), which can be comparable to the difference between 3x and 5x data reduction. In the end, the figure that matters is the cost of usable capacity ($/GB), computed by dividing the system cost by the usable capacity. This reveals how efficient the architecture really is.
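The $/GB comparison can be sketched as below. Only the 74% capacity ratio comes from the text; the system price, the raw capacity, and the competitor's 60% ratio are made-up numbers chosen purely to show the calculation, and `cost_per_usable_gb` is a hypothetical helper.

```python
def cost_per_usable_gb(system_cost, raw_tb, capacity_ratio):
    """Cost of usable capacity in $/GB: system cost divided by usable GB."""
    usable_gb = raw_tb * 1000 * capacity_ratio  # decimal units: 1TB = 1000GB
    return system_cost / usable_gb


# Two hypothetical systems with the same price and the same raw capacity.
array_a = cost_per_usable_gb(system_cost=500_000, raw_tb=100, capacity_ratio=0.74)
array_b = cost_per_usable_gb(system_cost=500_000, raw_tb=100, capacity_ratio=0.60)
```

With identical price and raw capacity, the system with the higher capacity ratio comes out cheaper per usable gigabyte, which is why overhead has to be part of the comparison.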
Snapshots and Replication
Nimble’s lightweight snapshots and zero-copy clones don’t employ the per-block, metadata-heavy approaches used by some vendors, and instead use the more efficient block-sharing scheme that we’ve employed since day one. This lets you keep far more snapshots and clones without requiring additional CPU or memory to create and manage them, and affords much more efficient system-level garbage collection.
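To make the idea of block sharing concrete, here is a generic toy model of snapshots and zero-copy clones over an append-only block log. It is not Nimble's implementation: the class names are hypothetical, and copying a Python dict per snapshot is a simplification that only demonstrates that block payloads are shared rather than duplicated.

```python
class BlockStore:
    """Append-only block log shared by a volume, its snapshots, and its clones."""

    def __init__(self):
        self.log = []  # block payloads; a list position is a block's location

    def append(self, data):
        self.log.append(data)
        return len(self.log) - 1


class Volume:
    def __init__(self, store, block_map=None):
        self.store = store
        self.block_map = dict(block_map or {})  # lba -> location in the log
        self.snapshots = {}                     # snapshot name -> frozen map

    def write(self, lba, data):
        # Never overwrite in place: new data gets a new location, so maps
        # that still point at the previous location remain valid.
        self.block_map[lba] = self.store.append(data)

    def read(self, lba, snapshot=None):
        block_map = self.block_map if snapshot is None else self.snapshots[snapshot]
        return self.store.log[block_map[lba]]

    def snapshot(self, name):
        # Copies pointers only; no block payload is duplicated.
        self.snapshots[name] = dict(self.block_map)

    def clone(self, snapshot):
        # A zero-copy clone starts from the snapshot's map and shares its blocks.
        return Volume(self.store, self.snapshots[snapshot])
```

In this model, taking a snapshot or creating a clone adds no entries to the block log; the log only grows when a volume or clone writes new data.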
For companies needing the performance of All Flash arrays, Nimble can deliver a significantly lower-cost solution by retaining long-term snapshots and replicas on cost-optimized Adaptive Flash arrays containing both flash and disk. This provides customers with the most cost-effective replication and vaulting solution in the industry: less-expensive Adaptive Flash arrays are used natively at secondary sites, with All Flash systems located in the primary data center. Adaptive Flash arrays are the perfect companion to All Flash arrays for backup, disaster recovery, and archival. Backup and archival copies can be retained for longer at a much lower cost. And since both primary and secondary arrays run the same version of NimbleOS, customers get the same functionality and user experience across all Nimble devices.
Nimble has always provided cost-effective, non-disruptive (no forklift) in-place upgrades across technology generations, with all-inclusive licensing and flat support pricing over the product line’s lifetime. We’ve continued that tradition with our All Flash products, offering support options that include a free upgrade to faster controllers at the end of three years, something no other vendor can match.
In summary, Nimble’s new All Flash arrays have been designed to meet and exceed the requirements of customers in corporate data centers and at data service providers.
To learn more, check out the NimbleConnect community online, which will feature a series of in-depth technical blog posts over the next couple of weeks.
- Jeff Feierfeil