Rocksteady: Fast Migration for Low-latency In-memory Storage

Chinmay Kulkarni, Aniraj Kesavan, Tian Zhang, Robert Ricci, and Ryan Stutsman

Proceedings of the Symposium on Operating Systems Principles (SOSP) 2017.

Abstract

Scalable in-memory key-value stores provide low-latency access times of a few microseconds and perform millions of operations per second per server. With all data in memory, these systems should provide a high level of reconfigurability. Ideally, they should scale up, scale down, and rebalance load more rapidly and flexibly than disk-based systems. Rapid reconfiguration is especially important in these systems since a) DRAM is expensive and b) they are the last defense against highly dynamic workloads that suffer from hot spots, skew, and unpredictable load. However, so far, work on in-memory key-value stores has generally focused on performance and availability, leaving reconfiguration as a secondary concern.

We present Rocksteady, a live migration technique for the RAMCloud scale-out in-memory key-value store. It balances three competing goals: it migrates data quickly, it minimizes response time impact, and it allows arbitrary, fine-grained splits. Rocksteady migrates 758 MB/s between servers under high load while maintaining a median and 99.9th percentile latency of less than 40 and 250 μs, respectively, for concurrent operations (compared to 6 and 45 μs during normal operation), without pauses, downtime, or risk to durability. To do this, it relies on pipelined and parallel replay and a lineage-like approach to fault tolerance to defer re-replication costs during migration. Rocksteady allows RAMCloud to defer all repartitioning work until the moment of migration, giving it precise and timely control for load balancing.