In Datacenter Performance, The Only Constant Is Change
Proceedings of the Twentieth IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid) 2020.
All computing infrastructure suffers from performance variability, be it bare-metal or virtualized. This phenomenon originates from many sources: some transient, such as noisy neighbors, and others more permanent but sudden, such as changes or wear in hardware, changes in the underlying hypervisor stack, or even undocumented interactions between the policies of the computing resource provider and the active workloads. Thus, performance measurements obtained on clouds, HPC facilities, and, more generally, datacenter environments are almost guaranteed to exhibit performance regimes that evolve over time, which leads to undesirable nonstationarities in application performance. In this paper, we present our analysis of the performance of the bare-metal hardware available on the CloudLab testbed, focusing on quantifying the evolving performance regimes using changepoint detection. We describe our findings, backed by a dataset with nearly 6.9M benchmark results collected from over 1600 machines over a period of 2 years and 9 months. These findings yield a comprehensive characterization of real-world performance variability patterns in one computing facility and a methodology for studying such patterns on other infrastructures, and they contribute to a better understanding of performance variability in general.
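
To make the idea of changepoint detection on benchmark results concrete, the following is a minimal sketch, not the method used in the paper. The abstract does not name a specific algorithm, so the sketch assumes binary segmentation with a least-squares (mean-shift) cost. The function names, the `penalty` and `min_size` parameters, and the synthetic series are all hypothetical and chosen only for illustration.

```python
def _segment_cost(prefix, prefix_sq, i, j):
    """Sum of squared deviations from the mean of values[i:j]."""
    n = j - i
    s = prefix[j] - prefix[i]
    sq = prefix_sq[j] - prefix_sq[i]
    return sq - s * s / n


def best_split(values, min_size=5):
    """Return (index, gain) of the single split that most reduces the cost."""
    n = len(values)
    prefix, prefix_sq = [0.0], [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)
    total = _segment_cost(prefix, prefix_sq, 0, n)
    best_idx, best_gain = None, 0.0
    for k in range(min_size, n - min_size + 1):
        cost = (_segment_cost(prefix, prefix_sq, 0, k)
                + _segment_cost(prefix, prefix_sq, k, n))
        gain = total - cost
        if gain > best_gain:
            best_idx, best_gain = k, gain
    return best_idx, best_gain


def detect_changepoints(values, penalty, min_size=5, offset=0):
    """Binary segmentation: split recursively while the gain exceeds the penalty.

    Returns sorted indices at which a new performance regime starts.
    The penalty is an assumed tuning knob, not a value from the paper.
    """
    if len(values) < 2 * min_size:
        return []
    idx, gain = best_split(values, min_size)
    if idx is None or gain <= penalty:
        return []
    left = detect_changepoints(values[:idx], penalty, min_size, offset)
    right = detect_changepoints(values[idx:], penalty, min_size, offset + idx)
    return left + [offset + idx] + right


# Synthetic, deterministic "benchmark" series (hypothetical data):
# 50 runs around 100.0 followed by 50 runs around 110.0.
series = [100.0 + (((i * 7) % 5) - 2) * 0.1 for i in range(50)]
series += [110.0 + (((i * 7) % 5) - 2) * 0.1 for i in range(50, 100)]

print(detect_changepoints(series, penalty=50.0))
```

The penalty controls how large a shift must be before it is reported as a new regime. In practice it would need to be tuned to the noise level of each benchmark, and real measurements may call for a more robust cost function than the mean-shift cost assumed here.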