Parallelisation
ANUGA uses two main approaches to parallelisation: MPI-based domain decomposition (multiple processes, distributed memory) and OpenMP threading (multiple threads, shared memory). The two approaches can be combined.
An experimental OpenMP target-offloading GPU backend is under development in the sp26 branch; see OpenMP Parallelisation for details.
For long parallel runs, see Checkpointing for how to save and restart simulations from periodic checkpoints.
Choosing an MPI distribution strategy
Four functions are available for distributing a domain across MPI ranks. The right choice depends on mesh size and how you construct the domain.
| Function | Rank-0 peak memory | When to use |
|---|---|---|
| distribute() | Full domain + quantities | Default choice. Rank 0 builds the complete |
| distribute_collaborative() | Full domain + quantities (shared) | Same interface as |
| distribute_basic_mesh() | Topology only | Rank 0 builds a lightweight |
| distribute_basic_mesh_collaborative() | Topology only (shared) | Like |
Decision guide
Does rank 0 have enough RAM to hold the full Domain (mesh + quantities)?
│
├─ Yes ──► Do multiple ranks share the same node?
│          ├─ No  ──► distribute()
│          └─ Yes ──► distribute_collaborative()
│
└─ No ──► Does rank 0 have enough RAM for topology only (no quantities)?
          ├─ Yes ──► Do multiple ranks share the same node?
          │          ├─ No  ──► distribute_basic_mesh()
          │          └─ Yes ──► distribute_basic_mesh_collaborative()
          └─ No ──► Reduce mesh resolution or use more nodes
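The decision guide can also be written as a small helper. The following is a minimal sketch in plain Python, not part of ANUGA: the function name `choose_distribution_function` and its three boolean parameters are hypothetical, and it only returns the name of the suggested function as a string, without calling it.

```python
from typing import Optional


def choose_distribution_function(full_domain_fits: bool,
                                 topology_fits: bool,
                                 ranks_share_node: bool) -> Optional[str]:
    """Return the name of the function suggested by the decision guide.

    All three arguments are answers the user supplies (hypothetical helper):
      full_domain_fits  -- rank 0 can hold the full Domain (mesh + quantities)
      topology_fits     -- rank 0 can hold the topology only (no quantities)
      ranks_share_node  -- multiple MPI ranks run on the same node

    Returns None when neither fits, in which case the guide suggests
    reducing the mesh resolution or using more nodes.
    """
    if full_domain_fits:
        if ranks_share_node:
            return "distribute_collaborative"
        return "distribute"

    if topology_fits:
        if ranks_share_node:
            return "distribute_basic_mesh_collaborative"
        return "distribute_basic_mesh"

    return None


# Example: the full domain does not fit on rank 0, the topology does,
# and several ranks share each node.
suggestion = choose_distribution_function(
    full_domain_fits=False, topology_fits=True, ranks_share_node=True
)
print(suggestion)
```

The helper keeps the same question order as the tree: memory on rank 0 is checked first, and node sharing only selects between the plain and collaborative variant.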
As a rough guide, a mesh of N triangles with P quantities requires approximately 8 × N × P bytes on rank 0 for the quantity arrays alone (double precision). Topology (coordinates + connectivity) is ~56 × N bytes.
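To make the rule of thumb concrete, here is a minimal sketch that evaluates both estimates. The helper name `estimate_rank0_memory` and the example figures (10 million triangles, 4 quantities) are illustrative assumptions; the result covers only the quantity arrays and the topology, so treat it as a lower bound rather than a measured footprint.

```python
BYTES_PER_DOUBLE = 8              # double-precision value
TOPOLOGY_BYTES_PER_TRIANGLE = 56  # coordinates + connectivity, rough figure


def estimate_rank0_memory(n_triangles: int, n_quantities: int) -> tuple[int, int]:
    """Return (quantity_bytes, topology_bytes) for rank 0.

    Hypothetical helper applying the rule of thumb:
      quantities ~ 8 * N * P bytes, topology ~ 56 * N bytes.
    """
    quantity_bytes = BYTES_PER_DOUBLE * n_triangles * n_quantities
    topology_bytes = TOPOLOGY_BYTES_PER_TRIANGLE * n_triangles
    return quantity_bytes, topology_bytes


# Illustrative mesh: 10 million triangles, 4 quantities (assumed values).
quantity_bytes, topology_bytes = estimate_rank0_memory(10_000_000, 4)
print(f"quantities: {quantity_bytes / 1e9:.2f} GB")  # quantities: 0.32 GB
print(f"topology:   {topology_bytes / 1e9:.2f} GB")  # topology:   0.56 GB
```

Comparing these two numbers with the RAM available to rank 0 answers the two memory questions in the decision guide.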
See also
ANUGA User Manual — Chapter 12: Parallel Simulation covers the MPI domain decomposition in depth, including ghost cell communication, scalability benchmarks, and HPC cluster setup.