MPI Distribute Mesh-first parallel workflow (`distribute_basic_mesh`)

Overview

The distribute_basic_mesh workflow is the recommended way to start a parallel ANUGA simulation when your mesh can be described as a structured grid (rectangular or rectangular-cross) or when you have a Pmesh object from create_mesh_from_regions.

The key idea is that only rank 0 builds the mesh. All other ranks pass None to distribute_basic_mesh and receive their submesh automatically via MPI. Initial conditions (quantities, boundaries, operators) are then set on every rank’s local Parallel_domain after distribution.

This contrasts with the traditional distribute() workflow, where rank 0 builds a full Domain including quantities, which means all the elevation, stage, and momentum arrays are allocated on rank 0 before any MPI communication takes place. For large meshes this can be a significant memory and time saving.

Basic_mesh

Basic_mesh is a lightweight mesh container that stores only the topology needed for partitioning:

Attribute	Description
`nodes`	`(M, 2)` float array of node x, y coordinates
`triangles`	`(N, 3)` int array of triangle vertex indices
`boundary`	dict mapping `(triangle_id, edge_index)` → tag string
`geo_reference`	coordinate origin and UTM zone
`number_of_triangles`	`N`
`number_of_nodes`	`M`
`neighbours`	`(N, 3)` int array, -1 for boundary edges (computed lazily)
`centroid_coordinates`	`(N, 2)` float array (computed lazily)

Unlike the full Mesh class, Basic_mesh does not compute normals, edge lengths, areas, radii, or vertex coordinates at construction time. Those are only needed by the finite-volume solver and are computed later on each rank when the Parallel_domain is built.

Factory functions

Structured rectangular grids

from anuga.abstract_2d_finite_volumes.basic_mesh import (
    rectangular_basic_mesh,
    rectangular_cross_basic_mesh,
)

# 2*m*n triangles, (m+1)*(n+1) nodes
bm = rectangular_basic_mesh(m, n, len1=Lx, len2=Ly)

# 4*m*n triangles, (m+1)*(n+1) + m*n nodes  (recommended for accuracy)
bm = rectangular_cross_basic_mesh(m, n, len1=Lx, len2=Ly)

Both functions accept an optional origin=(x0, y0) keyword argument.

Polygon regions

For meshes created with create_mesh_from_regions, use pmesh_to_basic_mesh to obtain a Basic_mesh without building a full Domain:

import anuga
from anuga.abstract_2d_finite_volumes.pmesh2domain import pmesh_to_basic_mesh

pmesh = anuga.create_mesh_from_regions(
    bounding_polygon,
    boundary_tags={'left': [0], 'right': [1], 'top': [2], 'bottom': [3]},
    maximum_triangle_area=0.01,
    filename=None,
    use_cache=False,
)
bm = pmesh_to_basic_mesh(pmesh)

`distribute_basic_mesh`

from anuga.parallel.parallel_api import distribute_basic_mesh

domain = distribute_basic_mesh(basic_mesh, verbose=False, parameters=None)

Partitions the mesh on rank 0 and sends each rank its local submesh.

Argument	Description
`basic_mesh`	A `Basic_mesh` on rank 0; `None` on all other ranks
`verbose`	Print partitioning progress (default `False`)
`parameters`	Dict of partitioning options, e.g. `{'partition_scheme': 'morton'}`

Available partition schemes are 'metis' (default, best quality), 'morton', and 'hilbert'.

Returns a Parallel_domain on every rank, or a plain Domain when numprocs == 1 so the same script runs serially without modification.

After the call, every rank holds only its local submesh — full and ghost triangles for that rank’s partition. The caller then sets quantities, boundaries, operators, and structures on the returned domain exactly as in a serial script.

General script structure

"""Example parallel simulation using distribute_basic_mesh."""

import anuga
from anuga import (
    Reflective_boundary,
    Dirichlet_boundary,
    myid, numprocs, barrier, finalize,
)
from anuga.abstract_2d_finite_volumes.basic_mesh import rectangular_cross_basic_mesh
from anuga.parallel.parallel_api import distribute_basic_mesh

# 1. Build the mesh on rank 0 only
if myid == 0:
    bm = rectangular_cross_basic_mesh(100, 100, len1=100.0, len2=100.0)
else:
    bm = None

# 2. Distribute the mesh to all ranks
domain = distribute_basic_mesh(bm, parameters={'partition_scheme': 'metis'})

# Give the simulation a name.
# Each rank automatically appends _P<nproc>_<rank> to the filename.
domain.set_name('my_simulation')
domain.set_datadir('output')

# 3. Set initial conditions on the local submesh.
#    Each rank evaluates quantities only over its own triangles.
def topography(x, y):
    return -x / 100.0

domain.set_quantity('elevation', topography)
domain.set_quantity('friction',  0.03)
domain.set_quantity('stage',     expression='elevation')

# 4. Set boundary conditions.
#    ghost and exterior edges are handled automatically by Parallel_domain.
Br = Reflective_boundary(domain)
Bd = Dirichlet_boundary([-0.2, 0.0, 0.0])
domain.set_boundary({'left': Br, 'right': Bd, 'top': Br, 'bottom': Br})

# 5. Evolve
for t in domain.evolve(yieldstep=60.0, finaltime=3600.0):
    if myid == 0:
        domain.print_timestepping_statistics()

# 6. Merge per-rank SWW files into a single file (optional)
domain.sww_merge(delete_old=False)

finalize()

Run with:

mpirun -np 4 python my_simulation.py

Notes

Only rank 0 needs the `Basic_mesh`

if myid == 0:
    bm = rectangular_cross_basic_mesh(M, N, len1=Lx, len2=Ly)
else:
    bm = None                         # other ranks pass None

domain = distribute_basic_mesh(bm)   # all ranks call this

Allocating the full mesh on every rank defeats the purpose of the workflow and increases peak memory usage.

Quantities are set after distribution

Unlike distribute(), which copies quantity arrays from rank 0, with distribute_basic_mesh each rank sets its own quantities independently after receiving its submesh:

domain.set_quantity('elevation', topography)  # evaluated per-rank
domain.set_quantity('stage',     0.0)

This is more memory-efficient and often faster for large initial-condition functions, since rank 0 never holds the full quantity arrays.

Boundary tags

The boundary tags in the Basic_mesh boundary dict must match the strings used in domain.set_boundary(). For rectangular and rectangular-cross meshes the built-in tags are 'left', 'right', 'top', and 'bottom'.

Parallel_domain.set_boundary automatically handles the internal 'ghost' and 'exterior' tags that appear on inter-rank edges — you do not need to include them in your boundary map.

Serial compatibility

When numprocs == 1, distribute_basic_mesh returns a plain Domain built directly from the Basic_mesh with no MPI calls. The same script therefore runs correctly both serially and in parallel without any modifications.

SWW output

Each rank writes its own SWW file named <name>_P<nproc>_<rank>.sww. Call domain.sww_merge() after the evolve loop to combine them into a single <name>.sww for visualisation. To suppress SWW output entirely:

domain.set_quantities_to_be_stored(None)

`distribute_basic_mesh_collaborative`

distribute_basic_mesh_collaborative is an alternative to distribute_basic_mesh that uses shared memory within each compute node to reduce inter-process data movement. The API is identical:

from anuga.parallel.parallel_api import distribute_basic_mesh_collaborative

domain = distribute_basic_mesh_collaborative(basic_mesh, verbose=False, parameters=None)

Argument	Description
`basic_mesh`	A `Basic_mesh` on rank 0; `None` on all other ranks
`verbose`	Print partitioning progress (default `False`)
`parameters`	Dict of partitioning options, e.g. `{'partition_scheme': 'metis'}`

Returns a Parallel_domain on every rank, or a plain Domain when numprocs == 1.

How it differs from `distribute_basic_mesh`

Both functions share the same mesh-first interface — rank 0 builds a Basic_mesh, all other ranks pass None, and quantities are set per-rank after distribution. The difference is in how the mesh topology reaches each rank:

Step	`distribute_basic_mesh`	`distribute_basic_mesh_collaborative`
Partition	rank 0 (METIS / Morton / Hilbert)	rank 0 (same)
Topology broadcast	Point-to-point `send` / `recv` per rank	Shared-memory window on each node; one physical copy per node
Ghost layer	Extracted from global submesh data sent by rank 0	Per-rank BFS on the local shared-memory copy
Quantities	None distributed (set per-rank after return)	None distributed (set per-rank after return)

The collaborative variant is advantageous when many MPI ranks share the same node (e.g. 32–64 ranks on a single large-memory node) because the full topology is stored only once in shared memory per node rather than once per rank.

Example script

The script structure is identical to distribute_basic_mesh; simply swap the function name:

"""Example parallel simulation using distribute_basic_mesh_collaborative."""

import anuga
from anuga import (
    Reflective_boundary,
    Dirichlet_boundary,
    myid, numprocs, barrier, finalize,
)
from anuga.abstract_2d_finite_volumes.basic_mesh import rectangular_cross_basic_mesh
from anuga.parallel.parallel_api import distribute_basic_mesh_collaborative

# 1. Build the mesh on rank 0 only
if myid == 0:
    bm = rectangular_cross_basic_mesh(100, 100, len1=100.0, len2=100.0)
else:
    bm = None

# 2. Distribute the mesh to all ranks (shared-memory broadcast)
domain = distribute_basic_mesh_collaborative(bm, parameters={'partition_scheme': 'metis'})

domain.set_name('my_simulation')
domain.set_datadir('output')

# 3. Set initial conditions per-rank
def topography(x, y):
    return -x / 100.0

domain.set_quantity('elevation', topography)
domain.set_quantity('friction',  0.03)
domain.set_quantity('stage',     expression='elevation')

# 4. Set boundary conditions
Br = Reflective_boundary(domain)
Bd = Dirichlet_boundary([-0.2, 0.0, 0.0])
domain.set_boundary({'left': Br, 'right': Bd, 'top': Br, 'bottom': Br})

# 5. Evolve
for t in domain.evolve(yieldstep=60.0, finaltime=3600.0):
    if myid == 0:
        domain.print_timestepping_statistics()

# 6. Merge per-rank SWW files (optional)
domain.sww_merge(delete_old=False)

finalize()