Dear Alexei,

I echo the comments that Barry and others have made. Some more comments inline below.
On 5 Mar 2021, at 21:06, Alexei Colin <acolin@isi.edu> wrote:
To PETSc DMPlex users, Firedrake users, Dr. Knepley and Dr. Karpeev:
Is it expected for the mesh distribution step to (A) take a share of 50-99% of the total time-to-solution of an FEM problem, and
We hope not!
(B) take an amount of time that increases with the number of ranks, and (C) take an amount of memory on rank 0 that does not decrease with the number of ranks?
This is a consequence, as Matt notes, of us making a serial mesh and then doing a one-to-all distribution.
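To illustrate the pattern, here is a rough petsc4py sketch (not the exact code path Firedrake takes; the mesh size is just a placeholder):

from petsc4py import PETSc

# Classic DMPlex workflow: the generated mesh initially lives in full on
# rank 0 (the other ranks hold empty plexes).
dm = PETSc.DMPlex().createBoxMesh([64, 64], simplex=True, comm=PETSc.COMM_WORLD)

# One-to-all distribution: partition the serial mesh and scatter the cells
# to all ranks (overlap=0 means no ghost cell layer). Rank 0's memory and
# the communication here scale with the total mesh size, not the per-rank size.
dm.distribute(overlap=0)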
1a. Is mesh distribution fundamentally necessary for any FEM framework, or is it only needed by Firedrake? If the latter, then how do other frameworks partition the mesh and execute in parallel with MPI but avoid the non-scalable mesh distribution step?
Matt points out that we should do something smarter (namely make and distribute a small mesh from serial to parallel and then do refinement and repartitioning in parallel). This is not implemented out of the box, but here is some code that (in up-to-date Firedrake/PETSc) does that:

from firedrake import *
from firedrake.cython.dmcommon import CELL_SETS_LABEL, FACE_SETS_LABEL
from firedrake.cython.mgimpl import filter_labels
from firedrake.petsc import PETSc

# Create a small mesh that is cheap to distribute.
mesh = UnitSquareMesh(10, 10)
dm = mesh.topology_dm
dm.setRefinementUniform(True)

# Refine it a bunch of times, edge midpoint division.
rdm = dm.refine()
rdm = rdm.refine()
rdm = rdm.refine()

# Remove some labels that will be reconstructed.
filter_labels(rdm, rdm.getHeightStratum(1), "exterior_facets", "boundary_faces", FACE_SETS_LABEL)
filter_labels(rdm, rdm.getHeightStratum(0), CELL_SETS_LABEL)
for label in ["interior_facets", "pyop2_core", "pyop2_owned", "pyop2_ghost"]:
    rdm.removeLabel(label)

# Redistribute for better load-balanced partitions (this happens in parallel).
rdm.distribute()

# Now make the Firedrake mesh object.
rmesh = Mesh(rdm, distribution_parameters={"partition": False})

# Now do things in parallel.

This is probably something we should push into the library (it's quite fiddly!), so if you can try it out easily and check that it works, please let us know!

Thanks,

Lawrence
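P.S. A hypothetical (untested) way to sanity-check the redistribution: append something like the following to the script above to print the number of cells each rank owns, and run with e.g. mpiexec -n 4:

# Rough load-balance check, reusing rdm and PETSc from the snippet above.
cell_start, cell_end = rdm.getHeightStratum(0)
PETSc.Sys.syncPrint(f"rank {rdm.comm.rank}: {cell_end - cell_start} cells")
PETSc.Sys.syncFlush()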