ECMWF has held full membership of the ETP4HPC since 2014, the only forecasting centre to do so, with the aim of generating a concerted European approach to sustainable excellence in HPC for weather and climate prediction. ESCAPE was the first project targeted at specific aspects of the ETP4HPC research agenda and its realisation in multi-scale numerical modelling of the atmosphere. ESCAPE-2 extends this effort to a much wider community, to the full Earth system and to a wider set of representative models, with substantial technical enhancements towards fulfilling the ETP4HPC SRA's multi-dimensional HPC vision.
Technical Research Priorities (and milestones):
- HPC system architecture and components: ESCAPE-2 prepares for diverse options to use specialized compute units but does not contribute directly to hardware development. Based on existing and pre-release technology to be made available by BULL and HPC centres associated with the ESCAPE-2 consortium partners, the performance portability and programmability of ESCAPE-2 benchmarks will be tested, and performance (with a focus on energy efficiency and time-to-solution) will be assessed against defined metrics. Hence, ESCAPE-2 will directly impact the application side of co-design. Moreover, compilers are an essential part of HPC installations. By providing weather & climate dwarfs and the HPCW to compiler developers, ESCAPE-2 enables systematic and routine testing of vendor-provided compilers against archetypical domain algorithms, which will aid and accelerate compiler development cycles and add robustness together with enhanced customer satisfaction.
Relevant SRA-2 milestones: M-ARCH-1, M-ARCH-7.
- Programming environment: ESCAPE-2 contributes directly to the programming environment by exploiting domain-specific languages to enhance productivity, accelerate development cycles and achieve performance portability, in terms of computing and energy efficiency, for key algorithmic components. The design and implementation of a weather and climate domain-specific language (DSL) concept based on the tools introduced in ESCAPE will bridge the chasm between highly complex heritage codes and software layers with substantial hardware-specific design elements. This development is considered crucial for enabling the mathematical and algorithmic developments to be (a) usable across different models and (b) applicable to and portable between existing and future hardware technologies. The collaboration with BULL as a partner introduces an interface to compiler design in support of hardware abstraction towards a flexible management of data locality and concurrency. ESCAPE-2's weather and climate DSL design will dramatically transform the implementation and adaptation efficiency of weather and climate prediction applications throughout the FET programme's co-design phase.
Relevant SRA-2 milestones: M-PROG-API-1, M-PROG-API-2, M-PROG-API-5.
- Energy and resilience: Enhancing energy efficiency in weather and climate prediction is essential when approaching global kilometre-scale simulations under stringent operational time constraints. ESCAPE combined a paradigm change for the relevant algorithms with a concept for employing specialized hardware in a heterogeneous environment for dedicated tasks dealing with the resolved flow (model dynamics) and unresolved processes (physical parameterizations). New approaches to improving time-to-solution and energy-to-solution (e.g. energy per forecast) at the same time represent a key objective of ESCAPE-2. ESCAPE-2 proposes the development of novel numerical techniques that combine highly effective large-time-step advection with highly scalable, flexible-order spatial discretization, thus minimizing communication and enhancing data locality without compromising time-to-solution. The definition of metrics and the employment of generic VVUQ tools will achieve community-wide applicability by providing a detailed quantification of the performance portability achieved through domain-specific language implementations.
ESCAPE-2 directly addresses resilience with hierarchical concepts for fault-tolerant solvers that support application resilience during large-scale parallel simulations under strict weather and climate work schedule constraints. The solvers will be tested by implementing a fault detection scheme and iterative data recovery schemes that preserve the numerical performance of the solver.
Relevant SRA-2 milestones: M-ENR-MS-2, M-ENR-FT-6, M-ENR-AR-7, M-ENR-AR-8.
- Mathematics and algorithms for extreme-scale HPC systems: ESCAPE-2 aims to deliver a breakthrough in the time-to-solution effectiveness of highly scalable, flexible-order spatial discretizations. It introduces fault-tolerant algorithms supported by hierarchical multigrid tools and a controlled sensitivity to numerical precision, as well as surrogate neural network models that move training periods outside the critical path and transform the low-flop operations typical of physical parameterizations into efficient matrix-multiply operations. Connecting and combining these techniques, ESCAPE-2 will directly address the software gap between complex hardware and complex applications through its focus on advancing energy-efficient algorithmic building blocks optimized for data flow, data locality and communication patterns across processors. Weather and climate dwarfs pioneered in ESCAPE are emerging as an accepted development template for the entire weather and climate prediction community. Performance portability for emerging hardware is a second cornerstone that will ensure sustainable development productivity of software cycles with complex weather & climate codes. The developments will impact the European science community by advancing productivity and showcasing performance portability with world-leading and highly complex forecasting models. These models are at the core of operational service providers, and the ESCAPE-2 developments will affect (a) science implementation roadmaps and (b) future HPC procurements throughout the community as its workloads approach the exascale era.
Relevant SRA-2 milestones: M-ALG-1, M-ALG-2, M-ALG-8, M-ALG-9.
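The fault detection and iterative data recovery idea described under "Energy and resilience" can be illustrated with a minimal sketch. It is not the ESCAPE-2 solver: a plain Jacobi iteration stands in for the hierarchical multigrid solvers, the fault is injected artificially, and all names (`resilient_jacobi`, `fault_at`, `growth_limit`, `checkpoint_every`) are hypothetical. Detection is assumed to be a simple check that the residual norm stays finite and does not jump; recovery is assumed to be a rollback to the last in-memory checkpoint.

```python
import math

def jacobi_step(A, b, x):
    """One Jacobi sweep: x_new[i] = (b[i] - sum_{j != i} A[i][j] * x[j]) / A[i][i]."""
    n = len(b)
    return [
        (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
        for i in range(n)
    ]

def residual_norm(A, b, x):
    """Euclidean norm of the residual b - A x."""
    n = len(b)
    return math.sqrt(sum(
        (b[i] - sum(A[i][j] * x[j] for j in range(n))) ** 2 for i in range(n)
    ))

def resilient_jacobi(A, b, tol=1e-10, max_iter=500, checkpoint_every=5,
                     growth_limit=10.0, fault_at=None):
    """Jacobi iteration with residual-based fault detection and rollback.

    fault_at is a hypothetical test hook: at that iteration one entry of the
    iterate is overwritten to mimic a soft error.
    Returns (solution, iterations used, number of recoveries).
    """
    x = [0.0] * len(b)
    checkpoint = list(x)                  # last state known to be good
    last_norm = residual_norm(A, b, x)
    recoveries = 0
    for it in range(1, max_iter + 1):
        x = jacobi_step(A, b, x)
        if it == fault_at:
            x[0] = 1.0e12                 # injected corruption (illustration only)
        norm = residual_norm(A, b, x)
        # Detection: a non-finite or suddenly much larger residual is treated as a fault.
        if not math.isfinite(norm) or norm > growth_limit * last_norm:
            x = list(checkpoint)          # recovery: roll back and keep iterating
            last_norm = residual_norm(A, b, x)
            recoveries += 1
            continue
        last_norm = norm
        if norm < tol:
            return x, it, recoveries
        if it % checkpoint_every == 0:
            checkpoint = list(x)
    return x, max_iter, recoveries

# Small strictly diagonally dominant system whose exact solution is [1, 2, 3].
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x_clean, it_clean, rec_clean = resilient_jacobi(A, b)
x_fault, it_fault, rec_fault = resilient_jacobi(A, b, fault_at=7)
print("recoveries without / with injected fault:", rec_clean, rec_fault)
```

In this sketch the faulty run reaches the same tolerance as the clean run and pays only for the iterations repeated since the last checkpoint, which is the sense in which recovery preserves the numerical behaviour of the solver. A production scheme would need detection criteria and checkpoint intervals tuned to the actual solver and schedule constraints.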
Extreme-scale Demonstrators:
- ESCAPE-2 will play an important role in defining a key European weather and climate prediction application benchmark (HPCW) for Extreme-scale Demonstrators. ESCAPE-2 will further develop the dwarf concept pioneered in ESCAPE and the Kronos workload simulator to generate ready-to-use applications for co-design projects (e.g. EuroEXA, NextGenIO) and Extreme-scale Demonstrators. The inclusion of the ICON and NEMO models and the establishment of a weather & climate specific DSL concept, which allows novel mathematical concepts and algorithms to be implemented across models and hardware platforms, prepare the weather and climate applications for deployment on the Extreme-scale Demonstrators. The combined outcomes of ESCAPE, NextGenIO, EuroEXA and ESCAPE-2 will be readily available in phase B of the demonstrators.
Ecosystem at large - stakeholders and initiatives:
- European Extreme Data and Computing Initiative: Weather and climate prediction represents a dedicated application area within the current EXDCI project (its work package 3), co-led by an ESCAPE-2 partner institute (CMCC). ESCAPE-2 will impact the definition of the science case for the weather and climate community and act as a focal point for the transition of community models to the exascale with the centre of excellence (ESiWACE) as a dissemination hub. Note that ESCAPE-2 will be the only core development project supporting this transition within FET.
- Centres of Excellence in Computing Applications: ESCAPE-2 develops user-driven application components that provide scalable benchmarks for weather and climate prediction. These benchmarks will be used to support the definition of the use cases that represent the grand science challenges addressed by the weather and climate prediction centre of excellence ESiWACE (and its potential successor). ESCAPE-2 partners include the ESiWACE co-leading institutes (DKRZ and ECMWF) and key partners (MPIM, CMCC, BSC, BULL). ESCAPE-2 will be instrumental in defining the scope of community models supported by future centres of excellence acting on behalf of the weather and climate prediction community.
ETP4HPC SRA, Completing the value chain:
- ECMWF combines advanced research and operational applications, which benefits both the application and service layers spanned by ECMWF (including Copernicus services), its member states and the ESCAPE-2 project partners, as together they represent a significant portion of the European weather and climate forecasting community. The push-through of the envisaged ESCAPE-2 developments follows the same impact route. While the ETP4HPC SRA focuses its recommendations on industrial impact, a similar value-chain template applies to environmental application and service provision.