Abstract

Despite the fact that large-scale shared-memory multiprocessors have been commercially available for several years, system software that fully utilizes all their features is still not available, mostly due to the complexity and cost of making the required changes to the operating system. A recently proposed approach, called Disco, substantially reduces this development cost by using a virtual machine monitor that leverages the existing operating system technology.

In this paper we present a system called Cellular Disco that extends the Disco work to provide all the advantages of the hardware partitioning and scalable operating system approaches. We argue that Cellular Disco can achieve these benefits at only a small fraction of the development cost of modifying the operating system. Cellular Disco effectively turns a large-scale shared-memory multiprocessor into a virtual cluster that supports fault containment and heterogeneity, while avoiding operating system scalability bottlenecks. Yet at the same time, Cellular Disco preserves the benefits of a shared-memory multiprocessor by implementing dynamic, fine-grained resource sharing, and by allowing users to overcommit resources such as processors and memory. This hybrid approach requires a scalable resource manager that makes local decisions with limited information while still providing good global performance and fault containment.

In this paper we describe our experience with a Cellular Disco prototype on a 32-processor SGI Origin 2000 system. We show that the execution time penalty for this approach is low, typically within 10% of the best available commercial operating system for most workloads, and that it can manage the CPU and memory resources of the machine significantly better than the hardware partitioning approach.

Keywords

Computer science, Scalability, Multiprocessing, Embedded system, Page fault, Virtual machine, Distributed computing, Virtual memory, Shared memory, Operating system, Memory management, Overlay

Publication Info

Year: 1999
Type: article
Pages: 154-169
Citations: 114
Access: Closed

Cite This

Kinshuk Govil, Dan Teodosiu, Yongqiang Huang et al. (1999). Cellular Disco, 154-169. https://doi.org/10.1145/319151.319162

Identifiers

DOI: 10.1145/319151.319162