Flux Research Group / School of Computing

The Part-Time Cloud: Enabling Balanced Elasticity Between Diverse Computing Environments

Dmitry Duplyakin, David Johnson, and Robert Ricci

Proceedings of the Eighth Workshop on Scientific Cloud Computing (ScienceCloud) 2017.



Clouds, HPC clusters, HTC systems, and testbeds all serve different parts of the computing ecosystem: each is designed for different types of workloads and suited to different types of research and commercial users. We propose that an effective way to share resources among these diverse applications is not to shoehorn them all into the same resource management framework, but to partition a common hardware substrate among different frameworks: for example, to have part of a cluster managed by a cloud framework such as OpenStack, part of it managed by an HPC scheduler such as SLURM, and so on. To manage such a shared resource efficiently, it must be possible to adjust the set of resources controlled by each framework elastically.

While resource allocation and scheduling within each of these types of environments are well studied, what we consider in this paper is elasticity between them. Our goal is to enable each management framework to independently manage the resources currently within its own domain, scheduling jobs, VMs, etc. according to its own needs and policies. At the same time, the frameworks can coordinate with one another so that, when resources must be moved between them, the move is made as fairly and efficiently as possible. We evaluate our ideas using a prototype that shares resources between a testbed and an HPC cluster, and through simulations driven by real workload traces. We find that, with only minimal information flow, it is possible to elastically adjust resource assignments while each framework optimizes for its own internal criteria.
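The abstract does not spell out the coordination protocol, so the following is only an illustrative Python sketch under stated assumptions, not the paper's mechanism. It assumes each framework privately scores its own nodes with a hypothetical "release cost" (for an HPC scheduler, perhaps the remaining runtime of jobs on a node) and that, when nodes must move, the only information crossing the boundary is the donor's ranked list of nodes it would rather give up. The names `Framework`, `release_cost`, `offer`, and `rebalance` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Framework:
    """A resource manager (e.g., a testbed or an HPC scheduler) owning part of a shared cluster."""
    name: str
    # Hypothetical per-node cost of giving a node up, computed by the framework's
    # own internal policy; other frameworks never see these values directly.
    release_cost: dict[str, float] = field(default_factory=dict)

    def offer(self, k: int) -> list[str]:
        """Return the k nodes this framework would most prefer to give up (cheapest first)."""
        ranked = sorted(self.release_cost, key=lambda n: (self.release_cost[n], n))
        return ranked[:k]

    def release(self, nodes: list[str]) -> None:
        """Drop the given nodes from this framework's domain."""
        for n in nodes:
            del self.release_cost[n]

    def acquire(self, nodes: list[str], cost: float = 0.0) -> None:
        """Add nodes to this framework's domain; assumed idle, hence zero release cost."""
        for n in nodes:
            self.release_cost[n] = cost


def rebalance(donor: Framework, receiver: Framework, k: int) -> list[str]:
    """Move k nodes from donor to receiver using only the donor's offer list."""
    if k > len(donor.release_cost):
        raise ValueError(f"{donor.name} owns fewer than {k} nodes")
    nodes = donor.offer(k)
    donor.release(nodes)
    receiver.acquire(nodes)
    return nodes


if __name__ == "__main__":
    hpc = Framework("hpc", {"n1": 5.0, "n2": 0.0, "n3": 2.5, "n4": 0.0})
    testbed = Framework("testbed")
    print(rebalance(hpc, testbed, 2))  # → ['n2', 'n4']
```

The design choice here is that each side keeps its scheduling policy private: the receiver learns only which node names it gets, and the donor decides which nodes are cheapest for it to lose, which mirrors the abstract's goal of minimal information flow.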