It’s a privilege for me to help realize the vision of converged storage put forth by Varun. Similar ideas have been implemented by other innovative vendors, but not with the architectural foundation needed to support them fully. At Nimble, we are fortunate to be able to leverage technological advances such as flash memory and multi-core CPUs in a clean-slate design that addresses real pain points of storage users. We were also able to assemble a team of seasoned architects to build it.
Our recipe for converging primary and backup storage is simple and has two parts.
1. Capacity optimization: Storing backups for 30–90 days needs lots of capacity. In a system not designed to store backups, they can easily use 10–20x the space used in primary storage. We handle this problem as follows:
- Store all data on high-capacity disk drives. These disks have over 3x the capacity of high-performance disks at one-sixth the cost per GB. They also deliver only one-third the performance, but we deal with that separately. (The high-capacity disks have often been called SATA disks, but that is quickly becoming a misnomer as high-capacity SAS drives enter the market.)
- Use data reduction techniques such as compression and block sharing. These techniques can reduce the space used by backups by 10–20x. Block sharing can take many forms, e.g., snapshots and dedupe, and it is important to pick judiciously based on the context. I will write further about this in the next article.
2. Performance optimization, especially for random IO: Common business applications such as Exchange and SQL Server generate lots of random IO. Hard disks are generally bad at random IO, and high-capacity disks are particularly bad. We use two techniques that more than make up for this slowness:
- Accelerate random reads using flash as a large cache. Most storage vendors have a story around using flash. However, flash has some peculiar characteristics, and how a system uses flash is more important than whether it uses flash. In particular, flash is not a performance cure-all; e.g., it might not be cost effective in accelerating random writes.
- Accelerate random writes by sequentializing them on disk. This technique has been known for some time from log-structured file systems, but it has become more interesting recently because of new enabling technologies.
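To make the idea of sequentializing random writes concrete, here is a minimal sketch in Python. It is a toy illustration of the general log-structured approach, not a description of how Nimble implements it. The class name `LogStructuredStore`, its methods, and the in-memory `bytearray` standing in for the disk are all hypothetical, and the sketch leaves out garbage collection of overwritten blocks, persistence of the index, and crash recovery.

```python
class LogStructuredStore:
    """Toy block store that turns random block writes into sequential appends.

    Hypothetical illustration only: the "disk" is an in-memory bytearray.
    """

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.log = bytearray()  # stands in for the disk; only ever appended to
        self.index = {}         # logical block number -> byte offset in the log

    def write(self, block_no, data):
        """Append the block at the tail of the log, wherever it logically lives."""
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        offset = len(self.log)
        self.log += data
        # Point the index at the newest copy; any older copy becomes garbage.
        self.index[block_no] = offset
        return offset

    def read(self, block_no):
        """Look up the newest copy of the block through the index."""
        offset = self.index[block_no]
        return bytes(self.log[offset:offset + self.block_size])

    def live_bytes(self):
        """Bytes still referenced by the index (the rest awaits cleanup)."""
        return len(self.index) * self.block_size


# Writes to scattered logical blocks land at consecutive log offsets.
store = LogStructuredStore(block_size=4)
offsets = [store.write(n, bytes([n % 256]) * 4) for n in (907, 12, 455)]
print(offsets)  # [0, 4, 8]

# Overwriting block 12 appends a new copy instead of seeking back to the old one.
store.write(12, b"new!")
print(store.read(12))                      # b'new!'
print(len(store.log), store.live_bytes())  # 16 12
```

The logical block numbers are scattered, yet every write lands at the tail of the log, so the disk sees a sequential stream. The price is the index needed to find the newest copy of each block, plus a background process to reclaim the space held by stale copies, which the sketch does not attempt.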
I will be writing more about these and related issues in this blog. Your thoughts and questions are most welcome.