Mesh adaptivity, heterogeneous modelling and resilience.

Realistic turbulent flow problems quickly require large-scale simulation capabilities. The most crucial aspect in these situations is the proper representation of small-scale flow structures in time-dependent transport problems. This must be accomplished with minimal dissipation, as errors accumulated at small scales may become dominant when propagated through large computational domains over long integration times. Reliable simulation of turbulent flows with significant regions of separation requires schemes based on high-order accurate discretisations, e.g. spectral element methods. Such discretisation techniques offer high accuracy and fast, nearly exponential convergence, as well as large-scale parallelism and flexibility in prescribing mesh topology. An additional aspect of real flow problems is that simulations may in practice be heterogeneous, with different physical models and/or algorithms applied in different regions.
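The "nearly exponential convergence" of high-order discretisations can be illustrated with a minimal sketch. It interpolates a smooth test function with a single polynomial on Chebyshev-Gauss-Lobatto points and records the maximum error as the polynomial degree grows. The test function, the degrees and all function names are assumptions chosen for illustration; production spectral element codes typically work element by element and often use Gauss-Lobatto-Legendre points instead.

```python
import math


def chebyshev_points(n):
    """Return the n+1 Chebyshev-Gauss-Lobatto points on [-1, 1]."""
    return [math.cos(math.pi * j / n) for j in range(n + 1)]


def barycentric_interpolate(xs, fs, x):
    """Evaluate the degree-n interpolant through (xs, fs) at x.

    Uses the barycentric formula with the known weights for
    Chebyshev-Gauss-Lobatto points: (-1)^j, halved at both end points.
    """
    n = len(xs) - 1
    num = 0.0
    den = 0.0
    for j, (xj, fj) in enumerate(zip(xs, fs)):
        if x == xj:
            return fj  # x coincides with a node
        w = (-1.0) ** j * (0.5 if j in (0, n) else 1.0)
        t = w / (x - xj)
        num += t * fj
        den += t
    return num / den


def max_interp_error(f, n, n_samples=501):
    """Maximum interpolation error of degree n, sampled uniformly on [-1, 1]."""
    xs = chebyshev_points(n)
    fs = [f(x) for x in xs]
    err = 0.0
    for k in range(n_samples):
        x = -1.0 + 2.0 * k / (n_samples - 1)
        err = max(err, abs(barycentric_interpolate(xs, fs, x) - f(x)))
    return err


def smooth_test_function(x):
    """Hypothetical smooth (analytic) profile used only for this demo."""
    return math.exp(x) * math.sin(5.0 * x)


# Doubling the degree reduces the error by many orders of magnitude,
# rather than by the fixed factor typical of a low-order scheme.
errors = {n: max_interp_error(smooth_test_function, n) for n in (4, 8, 16, 32)}
for n, err in errors.items():
    print(f"degree {n:2d}: max error {err:.3e}")
```

For a smooth function the error falls faster than any fixed power of the degree until it reaches round-off level, which is the behaviour that makes high-order methods attractive for resolving small-scale structures with few degrees of freedom.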

Flexibility in mesh topology is instrumental, as simulation accuracy depends strongly on the quality of the mesh, which in turn must be adjusted to the (a priori unknown) flow, potentially with heterogeneous approaches. This is why mesh generation is considered a significant bottleneck in current and future CFD. As more powerful HPC resources enable the simulation of more complex, realistic and industrially relevant flow problems, reliable mesh generation becomes more difficult, resulting in significant uncertainties in the simulation result. Although numerical uncertainties arise from many sources, including errors due to spatial and temporal discretisations or incomplete convergence, they can be minimised during the simulation by appropriate adaptation of the grid structure to the dynamic flow solution. Such automated mesh adaptivity, combining error estimation with dynamic refinement, is considered an essential feature for large-scale, computationally expensive simulations. It considerably improves the efficient use of resources, simplifies grid generation and ensures consistent accuracy (depending on the chosen measure) of the solution.
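The combination of error estimation and dynamic refinement described above can be sketched in one dimension. The indicator used here (the mismatch between a function and its linear interpolant at the element midpoint) and the names `midpoint_indicator`, `adapt_mesh` and `steep_profile` are assumptions for illustration only; practical indicators for spectral element methods are usually based on, for example, the decay of spectral coefficients or on adjoint information.

```python
import math


def midpoint_indicator(f, a, b):
    """Crude local error indicator (an assumption for this sketch):
    difference between f and its linear interpolant at the midpoint."""
    mid = 0.5 * (a + b)
    return abs(f(mid) - 0.5 * (f(a) + f(b)))


def adapt_mesh(f, nodes, tol, max_passes=30):
    """Bisect every element whose indicator exceeds tol, and repeat
    until no element is flagged or max_passes is reached."""
    for _ in range(max_passes):
        new_nodes = [nodes[0]]
        refined = False
        for a, b in zip(nodes, nodes[1:]):
            if midpoint_indicator(f, a, b) > tol:
                new_nodes.append(0.5 * (a + b))  # split the flagged element
                refined = True
            new_nodes.append(b)
        nodes = new_nodes
        if not refined:
            break
    return nodes


def steep_profile(x):
    """Hypothetical profile with a thin layer at x = 0.5."""
    return math.tanh(50.0 * (x - 0.5))


# Start from a uniform coarse mesh; the location of the layer is not
# given to the algorithm, only the indicator drives the refinement.
mesh = adapt_mesh(steep_profile, [0.0, 0.25, 0.5, 0.75, 1.0], tol=1e-3)
widths = [b - a for a, b in zip(mesh, mesh[1:])]
print(f"{len(widths)} elements, widths from {min(widths):.5f} to {max(widths):.2f}")
```

The elements cluster around the steep layer while the smooth outer regions keep their coarse resolution, so the prescribed tolerance is met without refining the whole domain.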

With the number of computing cores expected to exceed one million on an exascale platform, the risk of hardware failures, either permanent or transient, becomes statistically significant. Indeed, some estimates predict the mean time between failures on an exascale platform to be on the order of minutes. This is a serious challenge that must be addressed at both the hardware and the algorithmic level. Regardless of the source of the fault, resilience requires the ability to recover the lost results with some fidelity. For this we shall pursue the development of low-storage, low-complexity models, formulated in situ and executing in a mirror state on different nodes. If a node fails, these models can be activated to recover the lost solution and regenerate the operators. While such a strategy may help recovery in cases where the hardware indicates a fault, we will additionally pursue the development of suitable error indicators to detect silent errors, e.g. bit-flips due to radiation or low power consumption. This combination, embedded into the key computational engine of the CFD models, will ensure fault tolerance and resilience even on very large platforms.
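A minimal sketch of the two ingredients above, approximate recovery from a low-storage model and detection of a silent bit-flip, is given below. The text does not specify the form of the reduced models or of the error indicators, so everything here is an assumption made for illustration: the reduced model is simply a subsampled copy of the field (as might be mirrored on another node), recovery is linear interpolation, and the silent-error indicator flags entries that deviate from the reduced model by more than a tolerance.

```python
import math
import struct


def build_reduced_model(field, stride):
    """Low-storage copy: keep every stride-th sample plus the last one."""
    idx = list(range(0, len(field), stride))
    if idx[-1] != len(field) - 1:
        idx.append(len(field) - 1)
    return [(i, field[i]) for i in idx]


def reconstruct(reduced, n):
    """Recover an n-sample field by linear interpolation between
    the stored samples (recovery 'with some fidelity')."""
    out = [0.0] * n
    for (i0, v0), (i1, v1) in zip(reduced, reduced[1:]):
        for i in range(i0, i1 + 1):
            t = (i - i0) / (i1 - i0)
            out[i] = (1.0 - t) * v0 + t * v1
    return out


def flip_bit(value, bit):
    """Flip one bit of the IEEE-754 double representation of value."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def suspect_indices(field, reduced, tol):
    """Flag entries deviating from the reduced model by more than tol.
    Written as 'not <=' so that NaN entries are flagged as well."""
    approx = reconstruct(reduced, len(field))
    return [i for i, (u, v) in enumerate(zip(field, approx))
            if not abs(u - v) <= tol]


# Hypothetical smooth field standing in for a solution on one node.
field = [math.sin(2.0 * math.pi * i / 200) for i in range(201)]
reduced = build_reduced_model(field, stride=10)  # 21 of 201 samples kept

# Node failure: rebuild the lost field from the mirrored reduced model.
recovered = reconstruct(reduced, len(field))
recovery_error = max(abs(u - v) for u, v in zip(field, recovered))

# Silent error: flip a high exponent bit in one entry, then detect it.
corrupted = list(field)
corrupted[57] = flip_bit(corrupted[57], 62)
flagged = suspect_indices(corrupted, reduced, tol=0.05)
print(f"recovery error {recovery_error:.4f}, flagged entries {flagged}")
```

The trade-off in such a scheme is between storage and fidelity: a larger stride costs less memory on the mirror node but gives a coarser recovery and forces a looser detection tolerance, so that small bit-flips in low-order mantissa bits go unnoticed.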

A final crucial aspect is flexibility in applying different physical models and/or solution algorithms to different flow regions, which enables more of the relevant flow physics in complex geometries to be captured at reasonable cost. Such heterogeneous computing combines different modelling methods such as Reynolds-Averaged Navier-Stokes (RANS), Large-Eddy Simulation (LES) and Direct Numerical Simulation (DNS), embedding high-resolution zones computed with more costly algorithms within a coarser, less expensive calculation. Hybrid RANS-LES or RANS-DNS is a promising alternative for simulating problems of practical engineering interest.

Error control, mesh adaptivity and heterogeneous modelling are recognised as essential aspects of, and important challenges for, exascale CFD workflows.