Implementing Synchronous Replication
By Richard Jooss, Director, Product Management

Synchronous replication (also often called ‘synchronous mirroring’) is the concept of ensuring that written data is safely stored in two separate locations before a successful acknowledgment is sent to the host. Internally, our Triple+ Parity RAID provides better availability within one location, but providing a recovery point objective (RPO) of 0 across two locations requires replicating or mirroring the writes inline before acknowledging the write to the host.
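To make the acknowledgment rule concrete, here is a minimal sketch in Python. The `Site` class and the `synchronous_write` function are hypothetical names invented for illustration; they are not part of NimbleOS or any real product API, and the sketch ignores networking, ordering and resynchronization after a failure.

```python
class Site:
    """Hypothetical stand-in for a storage array at one location."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.blocks = {}  # maps a logical block address to its data

    def persist(self, lba, data):
        """Store the block and report whether it was safely written."""
        if not self.healthy:
            return False
        self.blocks[lba] = data
        return True


def synchronous_write(local, remote, lba, data):
    """Return True (the host acknowledgment) only if both sites stored the data."""
    stored_locally = local.persist(lba, data)
    stored_remotely = remote.persist(lba, data)
    return stored_locally and stored_remotely


if __name__ == "__main__":
    site_a, site_b = Site("site-a"), Site("site-b")
    print(synchronous_write(site_a, site_b, lba=0, data=b"payload"))
```

The point of the sketch is only the last line of `synchronous_write`: the host sees success when, and only when, both copies exist. What a real array does when one side cannot persist the write is a design decision this example does not model.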

Most customers choose locations that are a few kilometers (km) apart, but the locations can sometimes be as close as the next rack or as far as the next city. Since response time is important for most workloads that use block storage, the latency of the links between sites is the limiting factor. Light travels through fiber-optic cable at roughly 2/3 of its speed in a vacuum, so 100 km of distance adds, at a minimum, a full millisecond of round-trip write latency.
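The distance figure can be checked with a short calculation. The sketch below assumes the 2/3 velocity factor quoted above and counts propagation delay only; switching, protocol and array processing time would all add to it. The function name and its `round_trips` parameter are illustrative choices, not anything taken from a product.

```python
# Speed of light in a vacuum, in km/s.
C_VACUUM_KM_PER_S = 299_792.458
# Assumption: light in fiber travels at roughly 2/3 of c, as stated in the text.
FIBER_VELOCITY_FACTOR = 2 / 3


def min_added_write_latency_ms(distance_km, round_trips=1):
    """Lower bound on write latency added by propagation delay alone."""
    speed_km_per_s = C_VACUUM_KM_PER_S * FIBER_VELOCITY_FACTOR
    one_way_s = distance_km / speed_km_per_s
    # Each round trip covers the distance twice; convert seconds to ms.
    return 2 * one_way_s * round_trips * 1000.0


if __name__ == "__main__":
    for km in (0.01, 5, 100):
        print(f"{km:>6} km -> {min_added_write_latency_ms(km):.4f} ms")
```

At 100 km this comes out to roughly one millisecond for a single round trip. A protocol that needs two round trips per write would double that figure, which is why the number of round trips matters as much as the distance.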

Implementing synchronous replication is always a difficult proposition that includes a number of challenges, such as optimizing latency. In addition, it’s critical to maintain data consistency in the face of various site, network and array failure scenarios. As can be seen by the various products in the marketplace, there is a vast array (pun fully intended) of design and functionality choices. While it doesn’t make sense to go into great detail here, these are three aspects of our ongoing development work that we believe are important:

  • Extension of our scale-out cluster capabilities
    We can non-disruptively scale in three dimensions: (UP) with increased IO capability through increased compute resources, (DEEP) with increased capacity and (OUT) with increased compute and capacity. Our ability to create a scale-out cluster is a great match for synchronous replication. It provides a proven, robust communications infrastructure to efficiently pass data between arrays, enabling us to keep latency to a minimum by having a single round-trip between locations. Perhaps more importantly, it provides an existing SCSI target that spans multiple arrays, enabling a logical unit (LUN) to be visible across the scale-out cluster. We’ll talk about why that’s important in the next bullet.

  • Automatic Transparent Failover (ATF)
    When we talk about ATF, we aren’t referring to the Bureau of Alcohol, Tobacco and Firearms but to Automatic Transparent Failover. We realize that when customers want synchronous replication they want not only RPO=zero, i.e. no data loss because every write is safely stored in two separate locations, but also a recovery time objective (RTO) of zero, where IO continues in the face of site or array failure without requiring special host-clustering software integrations to drive the failover process. As mentioned above, we can leverage our scale-out capabilities to make a volume or LUN available on multiple arrays across sites, so that standard host multipathing solutions are all that is needed for host integration.
  • Simplicity
    We place a lot of value on keeping things simple, and put a lot of work into it, and it’s no different for synchronous replication. Today, a large percentage of our customers use our snapshot-based replication. From a management and configuration perspective, customers will use the same simple management paradigms that we have in place for periodic snapshot replication. For example, the existing management infrastructure includes the concept of volume collections, which allow the user to have volume-level granularity while managing a smaller number of objects.
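The failover idea in the ATF bullet can be illustrated with a toy model of host multipathing. Everything below is hypothetical: the `Path` class, the `PathDown` exception and `submit_with_failover` are invented for this sketch and do not correspond to any real multipath driver or to our implementation. The sketch only shows why a LUN that is visible through both sites lets IO continue when one site disappears.

```python
class PathDown(Exception):
    """Raised by a hypothetical path when its array or site is unreachable."""


class Path:
    """One route from the host to the same LUN, through one site."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up

    def submit(self, io):
        if not self.up:
            raise PathDown(self.name)
        return f"{io} completed via {self.name}"


def submit_with_failover(paths, io):
    """Try each path to the LUN in order; fail only if every path is down."""
    for path in paths:
        try:
            return path.submit(io)
        except PathDown:
            continue  # this path is unavailable, so try the next one
    raise PathDown("all paths down")


if __name__ == "__main__":
    paths = [Path("site-a"), Path("site-b")]
    print(submit_with_failover(paths, "write"))
    paths[0].up = False  # simulate losing the first site
    print(submit_with_failover(paths, "write"))
```

Because both paths lead to the same logical unit, the host needs no cluster-aware failover logic; retrying on the surviving path is enough in this simplified picture.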

Synchronous replication is currently in development and planned for a future NimbleOS release. There will be no cost for existing customers who maintain an active support contract.  Here’s the section our lawyers force us to include (I try to read it as fast as the people on the TV commercials):

Any unreleased services, features or functions referenced in this document, our website or other press releases or public statements that are not currently available are subject to change at Nimble Storage’s discretion and may not be delivered as planned or at all. Customers who purchase Nimble Storage’s products and services should make their purchase decisions based upon services, features and functions that are currently available.