
Principal Investigator: Rajeev Thakur
Co-PIs: Marc Snir, Pavan Balaji, Ewing Lusk
Mathematics and Computer Science Division
Argonne National Laboratory

Email: {thakur, snir, balaji, lusk}@mcs.anl.gov


The vast majority of DOE's parallel scientific applications running on the largest HPC systems are written in a distributed-memory style using MPI as the standard interface for communication between processes. These application codes represent billions of dollars' worth of investment. As we transition from today's petascale systems to exascale systems by the end of this decade, it is not clear what the right programming model for the future will be. However, until a viable alternative to MPI is available and large DOE application codes have been ported to the new model, MPI must evolve to run as efficiently as possible on future systems. This situation requires that both the MPI standard and MPI implementations address the challenges posed by the architectural features, limitations, and constraints expected in future post-petascale and exascale systems.

The most critical issue is likely to be interoperability with intranode programming models that use a high thread count. This requirement has implications both for the definition of the MPI standard itself and for MPI implementations. Other important issues, also affecting both the standard and its implementation, include scalability, performance, enhanced functionality based on application experience, and topics that become more significant as we move to the next generation of HPC architectures: memory utilization, power consumption, and resilience.
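The thread-interoperability requirement can be illustrated with a minimal sketch: a hybrid program that asks the MPI library for full multithreaded support (MPI_THREAD_MULTIPLE) so that intranode threads, for example from OpenMP or Pthreads, can issue MPI calls concurrently. This is a generic example rather than project code; how efficiently an implementation can satisfy such a request is precisely the kind of challenge discussed here.

/* Sketch only: request full thread support from the MPI library. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;

    /* Ask for MPI_THREAD_MULTIPLE; the library reports the level it can provide. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (provided < MPI_THREAD_MULTIPLE && rank == 0)
        printf("MPI provides thread level %d, not MPI_THREAD_MULTIPLE\n", provided);

    /* Threads created here (e.g., via OpenMP or Pthreads) may each make
     * MPI calls concurrently if MPI_THREAD_MULTIPLE was provided. */

    MPI_Finalize();
    return 0;
}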

Our group at Argonne has been a leader in MPI from the beginning, including the MPI standardization effort; research into efficient MPI implementation that has resulted in a large number of publications; and development of a high-performance, production-quality MPI implementation (MPICH) that has been adopted by leading vendors (IBM, Cray, Intel, Microsoft, Myricom) and runs on most of the largest machines in the world. This project continues the ongoing MPI-related research and development work at Argonne, with the overall goal of enabling MPI to run effectively at exascale. Specific goals of this project fall into three categories:

  1. Continued enhancement of the MPI standard through participation in the MPI Forum to ensure that the standard evolves to meet the needs of future systems and also of applications, libraries, and higher-level languages.
  2. Continued enhancement of the MPICH implementation of MPI to support the new features in future versions of the MPI standard (MPI-3 and beyond) and to address the specific challenges posed by exascale architectures, such as lower memory per core, higher thread concurrency, lower power consumption, scalability, and resilience (a brief sketch of one MPI-3 feature appears after this list).
  3. Investigation of new programming approaches to be potentially included in future versions of the MPI standard, including generalized user-defined callbacks, lightweight tasking, and extensions for heterogeneous computing systems and accelerators.
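As a concrete illustration of the MPI-3 features referred to in goal 2, the sketch below uses a nonblocking collective (MPI_Iallreduce), which allows a collective operation to be overlapped with independent computation. The buffers and the placement of the overlapping work are illustrative assumptions, not code from the project.

/* Sketch only: overlap a global sum with independent computation using MPI-3. */
#include <mpi.h>

int main(int argc, char **argv)
{
    double local = 1.0, global = 0.0;   /* illustrative buffers */
    MPI_Request req;

    MPI_Init(&argc, &argv);

    /* Start the reduction without blocking the caller. */
    MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    /* ... independent computation can proceed here while the collective progresses ... */

    MPI_Wait(&req, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}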

We have close ties with various DOE applications that are targeted to scale to exascale, including the exascale codesign centers. We will work with these applications, particularly the mini-apps and skeleton codes from the codesign centers, to study the effectiveness of our MPI implementation and of the new features in the MPI standard. We will also continue our collaboration with vendors, particularly IBM, Cray, and Intel, to codesign MPICH such that it remains the leading implementation running on the fastest machines in the world.

MPICH Project web site: www.mpich.org

Other Synergistic Efforts
  1. Center for Exascale Simulation of Advanced Reactors (CESAR codesign center)
  2. Exascale Operating Systems and Runtime Technical Council
  3. Computation-Driven Discovery for the Dark Universe, SciDAC-3 project, PI: Salman Habib
  4. Compiled MPI (CoMPI)
  5. Exploring Efficient Data Movement Strategies for Exascale Systems with Deep Memory Hierarchies, Pavan Balaji, DOE Early Career Award
  6. Every major DOE application code uses MPI, so this project will benefit all applications.
  7. Since MPICH is used as the basis of MPI implementations from IBM, Cray, and Intel, the major DOE supercomputer installations will benefit from this work.
Publications
  1. See publications list at www.mcs.anl.gov/~thakur/papers and http://www.mcs.anl.gov/~balaji/publications.php.
Presentations
  1. Message Passing in Hierarchical and Heterogeneous Environments: MPI-3 and Beyond, Pavan Balaji, Workshop on Productive Programming Models for Exascale, Portland, OR, August 2012
  2. Challenges for Communication Libraries and Runtime Systems at Exascale, Rajeev Thakur, Workshop on Clusters, Clouds, and Data for Scientific Computing, Dareizé, France, September 2012
  3. The Role of MPI in the Migration of Legacy Applications, Rajeev Thakur, DOE Exascale Research Conference, Arlington, VA, October 2012