[NRG] CCI Talk - Wed. Feb 6, 2013: The Convergence of HPC, BigData, and the Cloud

Bestavros, Azer best at bu.edu
Mon Feb 4 17:13:55 EST 2013


Title: 	 The Convergence of HPC, BigData, and the Cloud
Speaker: David Cohen
Speaker Affiliation: EMC

DATE: 		February 6, 2013
TIME: 		1:00 PM - 3:00 PM
LOCATION: 	Hariri Conference Room

Over the past decade or more, the SuperComputing community has refined the 
notion of a “Scalable Unit (SU).” This consists of a data center 
rack/frame that comes preconfigured with network, storage, and compute 
resources in well-defined, balanced ratios.  Many of these SUs are 
aggregated into larger resource pools via an aggregation layer of 
switching infrastructure. The resulting “cluster” provides a partitioning 
scheme with supporting software so that per-node resources are 
disaggregated and treated independently. Disaggregated resources are 
scaled into fabric-wide pools, managed by cluster resource managers that 
work in conjunction with job schedulers.

More recently, and emerging in parallel, Cloud Computing has refined the 
notions of an SU, fabric-based scaling, and resource disaggregation via 
virtualization. Certainly, Amazon Web Services (AWS) stands as the 
trailblazer. Of note is that AWS fielded an HPC cluster that entered the 
Top500 list at 42nd at the 2011 annual SuperComputing event. Competing 
directly with AWS, Google Compute Engine (GCE) and Microsoft Windows Azure 
are fast followers. On the HPC front, Windows Azure’s Big Compute entered 
the Top500 at 165th at last year’s annual SuperComputing event. Clearly, 
these Cloud operators are building infrastructure that can support HPC 
workloads.

However, the respective infrastructures of Amazon, Google, and Microsoft 
are proprietary systems, closed to innovation from the outside. The 
emergence of the OpenStack project is enabling others to transform their 
data centers into Cloud infrastructure. Rackspace certainly stands as an 
example, while other, smaller entrants include DreamHost and Endurance 
International Group. It is in this context that we pose the questions: Can 
this so-called Cloud architecture be employed by the Massachusetts Open 
Cloud (MOC) initiative? And if so, can such an architecture satisfy the 
demands of HPC and BigData workloads?

Dave Cohen’s Bio

Dave Cohen is a Director at EMC, reporting to John Roese, EMC’s CTO. Dave 
is responsible for a variety of activities in the area of network 
virtualization, especially as it relates to storage and data management. 
He is the consummate technologist with a diverse set of skills and 
experiences. Over the course of his tenure at EMC, Dave served as the 
Atmos Cloud Storage product group’s acting CTO and most recently has provided 
technical leadership in the areas of OpenStack, OpenCompute, and Software 
Defined Networking. His efforts in these areas have been key to EMC 
joining the OpenStack and OpenCompute communities as well as instrumental 
to VMware’s acquisition of Nicira.

Dave joined EMC from Wall Street, where he worked most recently for
Goldman Sachs and previously Merrill Lynch. Over the course of his
30-year career, he has designed, engineered, and successfully
delivered large-scale, distributed systems for numerous enterprises
across industries. Dave is a published author, a sought-after speaker,
and a widely respected practitioner in the field of distributed
computing.

References

DOE/Sandia Cplant – Concepts (see “Scalable Units”)
http://www.cs.sandia.gov/cplant/project/concepts.html

Barney, “Linux Clusters Overview,” 2013 (see “Cluster Configurations
and Scalable Units”)
https://computing.llnl.gov/tutorials/linux_clusters/

Winett, “Building Fast, Scalable I/O Infrastructures for
High-Performance Computing Clusters,” 2005
http://www.dell.com/downloads/global/power/ps4q05-20050332-DataDirect.pdf

Greenberg et al., “Enabling Department-Scale SuperComputing,” 1997
http://dakota.sandia.gov/papers/DeptScaleSC.pdf

