Publications from these projects can be found on my pubs page.
Many of the ideas that drive modern cloud computing, such as server virtualization, network slicing, and robust distributed storage, arose from the research community. Despite this success, today’s clouds have become environments that are unsuitable for moving this research agenda forward: they have particular, unmalleable implementations of these ideas baked in. We are building CloudLab, a facility that will enable fundamental advances in cloud architecture. CloudLab will not be a cloud; CloudLab will be large-scale, distributed scientific infrastructure on top of which many different clouds can be built. It will support thousands of researchers and run hundreds of different, experimental clouds simultaneously. The Phase I CloudLab deployment will provide approximately 15,000 cores in data centers at Clemson (with Dell equipment), Utah (HP), and Wisconsin (Cisco), with each industrial partner collaborating to explore next-generation ideas for cloud architectures.
The use of cloud computing has revolutionized the way in which cyber infrastructure is used and managed. The on-demand access to seemingly infinite resources provided by this paradigm has enabled technical innovation and indeed innovative business models and practices. This rosy picture is threatened, however, by increasing nefarious interest in cloud platforms. Specifically, the shared-tenant, shared-resource nature of cloud platforms, as well as the natural accrual of valuable information in cloud platforms, provide both the incentive and the possible means of exploitation.
To address these concerns we are developing a self-defending, self-evolving, and self-accounting trustworthy cloud platform, the TCloud. Our approach in realizing TCloud holds to the following five tenets. First, defense-in-depth through innate containment, separation and diversification at the architectural level. Second, least authority by clear separation of functionality and associated privilege within the architecture. Third, explicit orchestration of security functions based on cloud-derived and external intelligence. Fourth, moving-target-defense through deception and dynamic evolution of the platform. Fifth, verifiable accountability through lightweight validation and auditable monitoring, record keeping, and analysis.
Operational complexity counts among the top challenges faced by network operators. This complexity arises, in part, because of the scale and continued growth of modern networks, the inherent complexity and intricate dependencies of the protocols that these networks run, and the increased expectations of network users due to the increasing importance that network connectivity and networked services play in society. This complexity has been heightened by recent developments that make networks much more dynamic, adding a whole new dimension to the complexities of network management and operations (M&O). The resulting state of affairs acts to impede the pace of innovation and change in networks. In short, research on network M&O has not kept pace with the research transforming the networks themselves. We identify three primary research challenges that stand in the way of improving M&O and designing systems for autonomic network M&O. The first is the lack of a holistic network-wide management framework that puts knowledge, policy, and practices into software, rather than in the hands of operators. The second is the fact that, despite the inherent structure present in networks, data from these networks are highly unstructured, semantically deficient, and suffer from data uncertainty. Finally, though there is some understanding of how to set the myriad discrete configuration options in a modern network individually, it is an open problem to set them dynamically on a network-wide basis, responding to changing conditions.
To address these deficiencies, we propose to realize a Knowledge-Centric Software-Defined Network Management and Operations architecture (KnowOps). We propose to create a network operations framework (NOF) as a systematic and principled foundation for comprehensive network management and operations. We will combine this foundation with information-centric data mining methods to create a structured information base that captures, in a systematic manner, the status of the network and exposes it to other network management functions. We plan to create a knowledge base capable of systematically capturing operational procedures and policies as specified by domain experts. Finally, we will develop search-based policy execution strategies to allow the setting of network operating points to be optimized based on current network conditions.
The importance of mobile devices and the mobile networks that support them can hardly be overemphasized. Despite fantastic advances in wireless technologies and mobile devices, current mobile network architectures, while packet based, overwhelmingly resemble their circuit-switched forebears. To enable the fundamental research and innovation demanded to advance mobile networking beyond the state of the art, a new facility, called PhantomNet, will be developed and coupled with the Emulab testbed at the University of Utah. PhantomNet will be a fully programmable end-to-end mobile testbed with unique features to facilitate research efforts at the intersection of mobile networking, cloud computing, and software-defined networking. PhantomNet will enable hands-on teaching in mobile networking technology in a manner that simply does not exist today. Further, the facility will enable forward-thinking research that reconsiders the technical and economic factors involved in the interplay between mobile networks, software-defined networks, and cloud technologies. The availability of a physical facility will help researchers to transition new mobile network designs from theory into practice.
Emulab is a network testbed, giving researchers a wide range of environments in which to develop, debug, and evaluate their systems. The name Emulab refers both to a facility and to a software system. The primary Emulab installation is run by the Flux Group, part of the School of Computing at the University of Utah. There are also installations of the Emulab software at more than two dozen sites around the world, ranging from testbeds with a handful of nodes up to testbeds with hundreds of nodes. Emulab is widely used by computer science researchers in the fields of networking and distributed systems. It is also designed to support education, and has been used to teach classes in those fields. Emulab is a public facility, available without charge to most researchers worldwide.
GENI, the Global Environment for Network Innovations, is a national facility that supports exploration of radical designs for a future global networking infrastructure. It is a research network/testbed that is geographically distributed, contains diverse devices including wireless, supports many simultaneous experimenters, and allows end-users to use and exploit those experimental protocols. ProtoGENI is an NSF-funded and GPO-funded prototype implementation and deployment of GENI, led by the Flux research group at the University of Utah, and largely based on our Emulab software. ProtoGENI is the Control Framework for GENI Cluster C, the largest set of integrated projects in GENI.
PRObE is an NSF-sponsored project aimed at providing a large-scale, low-level systems research facility. It is a collaborative effort by the New Mexico Consortium (NMC), Los Alamos National Laboratory, Carnegie Mellon University, the University of Utah, and the University of New Mexico. It is housed at NMC in the Los Alamos Research Park. PRObE provides a highly reconfigurable, remotely accessible and controllable environment that researchers can use to perform experiments that are not possible at a smaller scale. PRObE at full production scale provides at least two 1024-node clusters, one 200-node cluster, and some smaller machines with extreme core counts and bleeding-edge technology. The machines are retired large clusters donated by DOE facilities.
KGPU is a GPU computing framework for the Linux kernel. It allows the Linux kernel to call CUDA programs running on GPUs directly. The motivation is to augment operating systems with GPUs so that not only userspace applications but also the operating system itself can benefit from GPU acceleration. It can also free the CPU from some computation-intensive work by enabling the GPU as an extra computing device.
Modern GPUs can be used for more than just graphics processing; they can run general-purpose programs as well. While not well-suited to all types of programs, they excel on code that can make use of their high degree of parallelism. Most uses of so-called “General Purpose GPU” (GPGPU) computation have been outside the realm of systems software. However, recent work on software routers and encrypted network connections has given examples of how GPGPUs can be applied to tasks more traditionally within the realm of operating systems. These uses are only scratching the surface. Other examples of system-level tasks that can take advantage of GPUs include general cryptography, pattern matching, program analysis, and acceleration of basic commonly-used algorithms; we give more details in our whitepaper. These tasks have applications on the desktop, on the server, and in the datacenter.
Trusted Disk Loading System (TDLS)
Network testbeds like Emulab allocate physical computers to users for the
duration of an experiment. During an experiment, a user has nearly unfettered
access to the devices under his or her control. Thus, at the end of an
experiment, an allocated computer can be in an arbitrary state. A testbed must
reclaim devices and ensure they are properly configured for future experiments.
This is particularly important for security-related experiments: for example, a
testbed must ensure that malware cannot persist on a device from one experiment
to the next.
We have implemented a trusted disk-loading system (TDLS) for Emulab. When
Emulab allocates a PC to an experiment, the TDLS ensures that if experiment
set-up succeeds, the PC is configured to boot the operating system specified by
the user. The TDLS uses the Trusted Platform Module (TPM) of an allocated PC
to securely communicate with Emulab’s control infrastructure and attest to
the PC’s configuration. The TDLS prevents state from surviving from one
experiment to another, and it prevents devices in the testbed from
impersonating one another. The TDLS addresses the challenges of providing a
scalable and flexible service, which allows large testbeds to support a wide
range of systems research.
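To illustrate the general idea behind TPM-based attestation, the sketch below models a TPM 1.2-style Platform Configuration Register (PCR) as a SHA-1 hash chain and shows how a verifier could replay an expected list of boot-stage measurements and compare the result with a reported PCR value. This is not the TDLS implementation: the function names (`extend`, `replay`, `config_matches`) and the example boot stages are hypothetical, and a real deployment would also verify a TPM-signed quote containing a fresh nonce, which is omitted here.

```python
import hashlib
import hmac

PCR_SIZE = 20  # SHA-1 digest length, as used by TPM 1.2 PCRs


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """Model of a PCR extend: new_pcr = SHA1(old_pcr || SHA1(measurement))."""
    digest = hashlib.sha1(measurement).digest()
    return hashlib.sha1(pcr + digest).digest()


def replay(measurements) -> bytes:
    """Recompute the PCR value that results from extending an all-zero
    register with each measurement in order."""
    pcr = bytes(PCR_SIZE)
    for blob in measurements:
        pcr = extend(pcr, blob)
    return pcr


def config_matches(reported_pcr: bytes, expected_measurements) -> bool:
    """Return True if the reported PCR equals the value obtained by
    replaying the expected measurements (hypothetical verifier-side check)."""
    return hmac.compare_digest(reported_pcr, replay(expected_measurements))


if __name__ == "__main__":
    # Hypothetical boot stages; the byte strings stand in for real images.
    expected = [b"boot-loader", b"disk-loader", b"user-os-image"]
    reported = replay(expected)                       # what an honest node reports
    print(config_matches(reported, expected))         # same stages, same order
    print(config_matches(replay(expected[::-1]), expected))  # reordered stages
```

Because each extend folds the previous register value into the next hash, the final value depends on every measurement and on their order, so a node that booted different or reordered software cannot produce the expected value.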
Fair Cloud Storage
A common problem with disk-based cloud storage services is that performance can
vary greatly and become highly unpredictable in a multi-tenant environment. A
fundamental reason is the interference between workloads co-located on the same
physical disk. We observe that different IO patterns interfere with each other
significantly, which makes the performance of different types of workloads
unpredictable when they are executed concurrently. Unpredictability implies
that users may not get a fair share of the system resources from the cloud
services they are using. At the same time, replication is commonly used in
cloud storage for high reliability. Connecting these two facts, we propose a
cloud storage system designed to minimize workload interference without
increasing storage costs or sacrificing the overall system throughput. Our
design leverages log-structured disk layout, chain replication and a
workload-based replica selection strategy to minimize interference, striking a
balance between performance and fairness. Our initial results suggest that this
approach is a promising way to improve the performance and predictability of
Performance Analysis and Visualization of Xentrace Logs
Diagnosing performance problems in virtualized environments can be challenging: analyzing and improving performance in VMs requires a deep understanding of the dependencies between hypervisors, guest and host OS kernels, and applications. This project seeks to produce a framework for writing analyses of logs produced by the xentrace facility in the Xen hypervisor, and to produce visualizations of those logs so that developers can easily
- Agile Freenet
- Search within NUCA caches