Re: [firedrake] DMPlex in Firedrake: scaling of mesh distribution
On Fri, Mar 5, 2021 at 4:06 PM Alexei Colin <acolin@isi.edu> wrote:
To PETSc DMPlex users, Firedrake users, Dr. Knepley and Dr. Karpeev:
Is it expected for mesh distribution step to (A) take a share of 50-99% of total time-to-solution of an FEM problem, and
No
(B) take an amount of time that increases with the number of ranks, and
See below.
(C) take an amount of memory on rank 0 that does not decrease with the number of ranks
The problem here is that a serial mesh is being partitioned and sent to all processes. This is fundamentally non-scalable, but it is easy and works well for modest clusters < 100 nodes or so. Above this, it will take increasing amounts of time. There are a few techniques for mitigating this.
a) For simple domains, you can distribute a coarse grid, then regularly refine that in parallel with DMRefine() or -dm_refine <k>. These steps can be repeated easily, and redistribution in parallel is fast, as shown for example in [1].
b) For complex meshes, you can read them in parallel, and then repeat a). This is done in [1]. It is a little more involved, but not much.
c) You can do a multilevel partitioning, as they do in [2]. I cannot find the paper in which they describe this right now. It is feasible, but definitely the most expert approach.
Does this make sense?
Thanks,
Matt
[1] Fully Parallel Mesh I/O using PETSc DMPlex with an Application to Waveform Modeling, Hapla et al. https://arxiv.org/abs/2004.08729
[2] On the robustness and performance of entropy stable discontinuous collocation methods for the compressible Navier-Stokes equations, Rojas et al. https://arxiv.org/abs/1911.10966
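To make a) concrete, below is a minimal C sketch of the distribute-then-refine pattern. It assumes a PETSc of roughly the 3.13/3.14 vintage discussed in this thread (the DMPlexCreateBoxMesh signature differs in other releases), and the coarse size and refinement count are placeholders; the same thing can be driven from the command line with -dm_refine <k>.
#include <petscdmplex.h>
int main(int argc, char **argv)
{
  DM             dm, dmDist, dmRef;
  PetscInt       faces[3] = {8, 8, 8};   /* deliberately coarse serial mesh */
  PetscInt       r;
  PetscErrorCode ierr;
  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  /* the only serial work happens at this small size */
  ierr = DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, faces, NULL, NULL, NULL, PETSC_TRUE, &dm);CHKERRQ(ierr);
  /* distributing the coarse mesh is cheap */
  ierr = DMPlexDistribute(dm, 0, NULL, &dmDist);CHKERRQ(ierr);
  if (dmDist) {ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmDist;}
  /* refine in parallel: each rank refines only the cells it owns */
  for (r = 0; r < 3; ++r) {
    ierr = DMRefine(dm, PETSC_COMM_WORLD, &dmRef);CHKERRQ(ierr);
    if (dmRef) {ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmRef;}
  }
  ierr = DMDestroy(&dm);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}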
The attached plots suggest (A), (B), and (C) are happening for a Cahn-Hilliard problem (from the firedrake-bench repo) on a 2D 8Kx8K unit-square mesh. The implementation is here [1]. Versions are Firedrake, PyOp2: 20200204.0; PETSc 3.13.1; ParMETIS 4.0.3.
Two questions, one on (A) and the other on (B)+(C):
1. Is the (A) result expected? Given (A), any effort to improve the quality of the compiled assembly kernels (or anything else other than mesh distribution) appears futile since it takes 1% of end-to-end execution time, or am I missing something?
1a. Is mesh distribution fundamentally necessary for any FEM framework, or is it only needed by Firedrake? If the latter, then how do other frameworks partition the mesh and execute in parallel with MPI but avoid the non-scalable mesh distribution step?
2. Results (B) and (C) suggest that the mesh distribution step does not scale. Is it a fundamental property of the mesh distribution problem that it has a central bottleneck in the master process, or is it a limitation of the current implementation in PETSc-DMPlex?
2a. Our (B) result seems to agree with Figure 4 (left) of [2]. Fig 6 of [2] suggests a way to reduce the time spent on the sequential bottleneck by "parallel mesh refinement" that creates high-resolution meshes from an initial coarse mesh. Is this approach implemented in DMPlex? If so, any pointers on how to try it out with Firedrake? If not, any other directions for reducing this bottleneck?
2b. Fig 6 in [3] shows plots for Assembly and Solve steps that scale well up to 96 cores -- is mesh distribution included in those times? Is anyone reading this aware of any other publications with evaluations of Firedrake that measure mesh distribution (or explain how to avoid or exclude it)?
Thank you for your time and any info or tips.
[1] https://github.com/ISI-apex/firedrake-bench/blob/master/cahn_hilliard/firedr...
[2] Unstructured Overlapping Mesh Distribution in Parallel, Matthew G. Knepley, Michael Lange, Gerard J. Gorman, 2015. https://arxiv.org/pdf/1506.06194.pdf
[3] Efficient mesh management in Firedrake using PETSc-DMPlex, Michael Lange, Lawrence Mitchell, Matthew G. Knepley and Gerard J. Gorman, SISC, 38(5), S143-S155, 2016. http://arxiv.org/abs/1506.07749
On Sat, Mar 6, 2021 at 12:27 PM Matthew Knepley <knepley@buffalo.edu> wrote:
On Fri, Mar 5, 2021 at 4:06 PM Alexei Colin <acolin@isi.edu> wrote:
To PETSc DMPlex users, Firedrake users, Dr. Knepley and Dr. Karpeev:
Is it expected for mesh distribution step to (A) take a share of 50-99% of total time-to-solution of an FEM problem, and
No
(B) take an amount of time that increases with the number of ranks, and
See below.
(C) take an amount of memory on rank 0 that does not decrease with the number of ranks
The problem here is that a serial mesh is being partitioned and sent to all processes. This is fundamentally non-scalable, but it is easy and works well for modest clusters < 100 nodes or so. Above this, it will take increasing amounts of time. There are a few techniques for mitigating this.
Is this one-to-all communication only done once? If yes, one MPI_Scatterv() is enough and should not cost much (a sketch of such a scatter is included after the quoted text below).
a) For simple domains, you can distribute a coarse grid, then regularly
refine that in parallel with DMRefine() or -dm_refine <k>. These steps can be repeated easily, and redistribution in parallel is fast, as shown for example in [1].
b) For complex meshes, you can read them in parallel, and then repeat a). This is done in [1]. It is a little more involved, but not much.
c) You can do a multilevel partitioning, as they do in [2]. I cannot find the paper in which they describe this right now. It is feasible, but definitely the most expert approach.
Does this make sense?
Thanks,
Matt
[1] Fully Parallel Mesh I/O using PETSc DMPlex with an Application to Waveform Modeling, Hapla et al. https://arxiv.org/abs/2004.08729
[2] On the robustness and performance of entropy stable discontinuous collocation methods for the compressible Navier-Stokes equations, Rojas et al. https://arxiv.org/abs/1911.10966
The attached plots suggest (A), (B), and (C) are happening for a Cahn-Hilliard problem (from the firedrake-bench repo) on a 2D 8Kx8K unit-square mesh. The implementation is here [1]. Versions are Firedrake, PyOp2: 20200204.0; PETSc 3.13.1; ParMETIS 4.0.3.
Two questions, one on (A) and the other on (B)+(C):
1. Is the (A) result expected? Given (A), any effort to improve the quality of the compiled assembly kernels (or anything else other than mesh distribution) appears futile since it takes 1% of end-to-end execution time, or am I missing something?
1a. Is mesh distribution fundamentally necessary for any FEM framework, or is it only needed by Firedrake? If the latter, then how do other frameworks partition the mesh and execute in parallel with MPI but avoid the non-scalable mesh distribution step?
2. Results (B) and (C) suggest that the mesh distribution step does not scale. Is it a fundamental property of the mesh distribution problem that it has a central bottleneck in the master process, or is it a limitation of the current implementation in PETSc-DMPlex?
2a. Our (B) result seems to agree with Figure 4 (left) of [2]. Fig 6 of [2] suggests a way to reduce the time spent on the sequential bottleneck by "parallel mesh refinement" that creates high-resolution meshes from an initial coarse mesh. Is this approach implemented in DMPlex? If so, any pointers on how to try it out with Firedrake? If not, any other directions for reducing this bottleneck?
2b. Fig 6 in [3] shows plots for Assembly and Solve steps that scale well up to 96 cores -- is mesh distribution included in those times? Is anyone reading this aware of any other publications with evaluations of Firedrake that measure mesh distribution (or explain how to avoid or exclude it)?
Thank you for your time and any info or tips.
[1] https://github.com/ISI-apex/firedrake-bench/blob/master/cahn_hilliard/firedr...
[2] Unstructured Overlapping Mesh Distribution in Parallel, Matthew G. Knepley, Michael Lange, Gerard J. Gorman, 2015. https://arxiv.org/pdf/1506.06194.pdf
[3] Efficient mesh management in Firedrake using PETSc-DMPlex, Michael Lange, Lawrence Mitchell, Matthew G. Knepley and Gerard J. Gorman, SISC, 38(5), S143-S155, 2016. http://arxiv.org/abs/1506.07749
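Regarding the MPI_Scatterv() point above, here is a rough sketch of what a single one-to-all scatter of partitioned cell connectivity could look like. The names (conn, sendcounts, scatter_connectivity) are made up for illustration; DMPlexDistribute does considerably more than this (it also migrates coordinates and labels and builds the point SF, over several communication rounds).
#include <mpi.h>
#include <stdlib.h>
/* Illustrative only: scatter variable-sized chunks of a serial connectivity
   array from rank 0 with one collective call. */
void scatter_connectivity(int *conn, int *sendcounts, MPI_Comm comm,
                          int **myconn, int *mycount)
{
  int rank, size, p, *displs = NULL;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &size);
  if (rank == 0) {                /* rank 0 holds the whole serial mesh */
    displs = (int *) malloc(size * sizeof(int));
    displs[0] = 0;
    for (p = 1; p < size; ++p) displs[p] = displs[p-1] + sendcounts[p-1];
  }
  /* tell every rank how many entries it will receive */
  MPI_Scatter(sendcounts, 1, MPI_INT, mycount, 1, MPI_INT, 0, comm);
  *myconn = (int *) malloc((size_t)(*mycount) * sizeof(int));
  /* one collective moves all the pieces */
  MPI_Scatterv(conn, sendcounts, displs, MPI_INT,
               *myconn, *mycount, MPI_INT, 0, comm);
  free(displs);
}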
I observed poor scaling with mat/tests/ex13 on Fugaku recently. I was running this test as is (e.g., no threads and 4 MPI processes per node/chip, which seems recommended). I did not dig into this. A test with about 10% of the machine took about 45 minutes to run.
Mark
On Sat, Mar 6, 2021 at 9:49 PM Junchao Zhang <junchao.zhang@gmail.com> wrote:
On Sat, Mar 6, 2021 at 12:27 PM Matthew Knepley <knepley@buffalo.edu> wrote:
On Fri, Mar 5, 2021 at 4:06 PM Alexei Colin <acolin@isi.edu> wrote:
To PETSc DMPlex users, Firedrake users, Dr. Knepley and Dr. Karpeev:
Is it expected for mesh distribution step to (A) take a share of 50-99% of total time-to-solution of an FEM problem, and
No
(B) take an amount of time that increases with the number of ranks, and
See below.
(C) take an amount of memory on rank 0 that does not decrease with the number of ranks
The problem here is that a serial mesh is being partitioned and sent to all processes. This is fundamentally non-scalable, but it is easy and works well for modest clusters < 100 nodes or so. Above this, it will take increasing amounts of time. There are a few techniques for mitigating this.
Is this one-to-all communication only done once? If yes, one MPI_Scatterv() is enough and should not cost much.
a) For simple domains, you can distribute a coarse grid, then regularly
refine that in parallel with DMRefine() or -dm_refine <k>. These steps can be repeated easily, and redistribution in parallel is fast, as shown for example in [1].
b) For complex meshes, you can read them in parallel, and then repeat a). This is done in [1]. It is a little more involved, but not much.
c) You can do a multilevel partitioning, as they do in [2]. I cannot find the paper in which they describe this right now. It is feasible, but definitely the most expert approach.
Does this make sense?
Thanks,
Matt
[1] Fully Parallel Mesh I/O using PETSc DMPlex with an Application to Waveform Modeling, Hapla et al. https://arxiv.org/abs/2004.08729
[2] On the robustness and performance of entropy stable discontinuous collocation methods for the compressible Navier-Stokes equations, Rojas et al. https://arxiv.org/abs/1911.10966
The attached plots suggest (A), (B), and (C) are happening for a Cahn-Hilliard problem (from the firedrake-bench repo) on a 2D 8Kx8K unit-square mesh. The implementation is here [1]. Versions are Firedrake, PyOp2: 20200204.0; PETSc 3.13.1; ParMETIS 4.0.3.
Two questions, one on (A) and the other on (B)+(C):
1. Is the (A) result expected? Given (A), any effort to improve the quality of the compiled assembly kernels (or anything else other than mesh distribution) appears futile since it takes 1% of end-to-end execution time, or am I missing something?
1a. Is mesh distribution fundamentally necessary for any FEM framework, or is it only needed by Firedrake? If the latter, then how do other frameworks partition the mesh and execute in parallel with MPI but avoid the non-scalable mesh distribution step?
2. Results (B) and (C) suggest that the mesh distribution step does not scale. Is it a fundamental property of the mesh distribution problem that it has a central bottleneck in the master process, or is it a limitation of the current implementation in PETSc-DMPlex?
2a. Our (B) result seems to agree with Figure 4 (left) of [2]. Fig 6 of [2] suggests a way to reduce the time spent on the sequential bottleneck by "parallel mesh refinement" that creates high-resolution meshes from an initial coarse mesh. Is this approach implemented in DMPlex? If so, any pointers on how to try it out with Firedrake? If not, any other directions for reducing this bottleneck?
2b. Fig 6 in [3] shows plots for Assembly and Solve steps that scale well up to 96 cores -- is mesh distribution included in those times? Is anyone reading this aware of any other publications with evaluations of Firedrake that measure mesh distribution (or explain how to avoid or exclude it)?
Thank you for your time and any info or tips.
[1] https://github.com/ISI-apex/firedrake-bench/blob/master/cahn_hilliard/firedr...
[2] Unstructured Overlapping Mesh Distribution in Parallel, Matthew G. Knepley, Michael Lange, Gerard J. Gorman, 2015. https://arxiv.org/pdf/1506.06194.pdf
[3] Efficient mesh management in Firedrake using PETSc-DMPlex, Michael Lange, Lawrence Mitchell, Matthew G. Knepley and Gerard J. Gorman, SISC, 38(5), S143-S155, 2016. http://arxiv.org/abs/1506.07749
mat/tests/ex13.c creates a sequential AIJ matrix, converts it to the same format, reorders it, and then prints it and the reordering in ASCII. Each of these steps is sequential and takes place on each rank. The prints are ASCII stdout on the ranks.
ierr = MatCreateSeqAIJ(PETSC_COMM_SELF,m*n,m*n,5,NULL,&C);CHKERRQ(ierr); /* create the matrix for the five point stencil, YET AGAIN */
for (i=0; i<m; i++) {
  for (j=0; j<n; j++) {
    v = -1.0; Ii = j + n*i;
    if (i>0)   {J = Ii - n; ierr = MatSetValues(C,1,&Ii,1,&J,&v,INSERT_VALUES);CHKERRQ(ierr);}
    if (i<m-1) {J = Ii + n; ierr = MatSetValues(C,1,&Ii,1,&J,&v,INSERT_VALUES);CHKERRQ(ierr);}
    if (j>0)   {J = Ii - 1; ierr = MatSetValues(C,1,&Ii,1,&J,&v,INSERT_VALUES);CHKERRQ(ierr);}
    if (j<n-1) {J = Ii + 1; ierr = MatSetValues(C,1,&Ii,1,&J,&v,INSERT_VALUES);CHKERRQ(ierr);}
    v = 4.0; ierr = MatSetValues(C,1,&Ii,1,&Ii,&v,INSERT_VALUES);CHKERRQ(ierr);
  }
}
ierr = MatAssemblyBegin(C,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
ierr = MatAssemblyEnd(C,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
ierr = MatConvert(C,MATSAME,MAT_INITIAL_MATRIX,&A);CHKERRQ(ierr);
ierr = MatGetOrdering(A,MATORDERINGND,&perm,&iperm);CHKERRQ(ierr);
ierr = ISView(perm,PETSC_VIEWER_STDOUT_SELF);CHKERRQ(ierr);
ierr = ISView(iperm,PETSC_VIEWER_STDOUT_SELF);CHKERRQ(ierr);
ierr = MatView(A,PETSC_VIEWER_STDOUT_SELF);CHKERRQ(ierr);
I think each rank would simply be running the same code and dumping everything to its own stdout.
At some point within the system/MPI executor there is code that merges and prints out the stdout of each rank. If the test does truly take 45 minutes, then Fugaku has a classic bug of not being able to efficiently merge stdout from each of the ranks. Nothing really to do with PETSc, just neglect by the Fugaku developers to respect all aspects of developing an HPC system. Heck, they only had a billion dollars, can't expect them to do what other scalable systems do :-).
One should be able to reproduce this with a simple MPI program that prints a moderate amount of data to stdout on each rank.
Barry
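A minimal reproducer of the kind described above could be as simple as the following; the amount of output per rank is arbitrary, the point is only that every rank writes to its own stdout and the job launcher has to collect and merge it.
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv)
{
  int rank, size, i;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  /* a few tens of kilobytes of text per rank */
  for (i = 0; i < 1000; ++i) {
    printf("rank %d of %d: line %d of filler output\n", rank, size, i);
  }
  fflush(stdout);
  MPI_Finalize();
  return 0;
}
If the wall time of this grows badly with the rank count, the launcher's stdout path is the bottleneck rather than anything in the application.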
On Mar 6, 2021, at 9:46 PM, Mark Adams <mfadams@lbl.gov> wrote:
I observed poor scaling with mat/tests/ex13 on Fugaku recently. I was running this test as is (e.g., no threads and 4 MPI processes per node/chip, which seems recommended). I did not dig into this. A test with about 10% of the machine took about 45 minutes to run.
Mark
On Sat, Mar 6, 2021 at 9:49 PM Junchao Zhang <junchao.zhang@gmail.com> wrote:
On Sat, Mar 6, 2021 at 12:27 PM Matthew Knepley <knepley@buffalo.edu> wrote:
On Fri, Mar 5, 2021 at 4:06 PM Alexei Colin <acolin@isi.edu> wrote:
To PETSc DMPlex users, Firedrake users, Dr. Knepley and Dr. Karpeev:
Is it expected for mesh distribution step to (A) take a share of 50-99% of total time-to-solution of an FEM problem, and
No
(B) take an amount of time that increases with the number of ranks, and
See below.
(C) take an amount of memory on rank 0 that does not decrease with the number of ranks
The problem here is that a serial mesh is being partitioned and sent to all processes. This is fundamentally non-scalable, but it is easy and works well for modest clusters < 100 nodes or so. Above this, it will take increasing amounts of time. There are a few techniques for mitigating this.
Is this one-to-all communication only done once? If yes, one MPI_Scatterv() is enough and should not cost much.
a) For simple domains, you can distribute a coarse grid, then regularly refine that in parallel with DMRefine() or -dm_refine <k>. These steps can be repeated easily, and redistribution in parallel is fast, as shown for example in [1].
b) For complex meshes, you can read them in parallel, and then repeat a). This is done in [1]. It is a little more involved, but not much.
c) You can do a multilevel partitioning, as they do in [2]. I cannot find the paper in which they describe this right now. It is feasible, but definitely the most expert approach.
Does this make sense?
Thanks,
Matt
[1] Fully Parallel Mesh I/O using PETSc DMPlex with an Application to Waveform Modeling, Hapla et al. https://arxiv.org/abs/2004.08729
[2] On the robustness and performance of entropy stable discontinuous collocation methods for the compressible Navier-Stokes equations, Rojas et al. https://arxiv.org/abs/1911.10966
The attached plots suggest (A), (B), and (C) are happening for a Cahn-Hilliard problem (from the firedrake-bench repo) on a 2D 8Kx8K unit-square mesh. The implementation is here [1]. Versions are Firedrake, PyOp2: 20200204.0; PETSc 3.13.1; ParMETIS 4.0.3.
Two questions, one on (A) and the other on (B)+(C):
1. Is the (A) result expected? Given (A), any effort to improve the quality of the compiled assembly kernels (or anything else other than mesh distribution) appears futile since it takes 1% of end-to-end execution time, or am I missing something?
1a. Is mesh distribution fundamentally necessary for any FEM framework, or is it only needed by Firedrake? If the latter, then how do other frameworks partition the mesh and execute in parallel with MPI but avoid the non-scalable mesh distribution step?
2. Results (B) and (C) suggest that the mesh distribution step does not scale. Is it a fundamental property of the mesh distribution problem that it has a central bottleneck in the master process, or is it a limitation of the current implementation in PETSc-DMPlex?
2a. Our (B) result seems to agree with Figure 4 (left) of [2]. Fig 6 of [2] suggests a way to reduce the time spent on the sequential bottleneck by "parallel mesh refinement" that creates high-resolution meshes from an initial coarse mesh. Is this approach implemented in DMPlex? If so, any pointers on how to try it out with Firedrake? If not, any other directions for reducing this bottleneck?
2b. Fig 6 in [3] shows plots for Assembly and Solve steps that scale well up to 96 cores -- is mesh distribution included in those times? Is anyone reading this aware of any other publications with evaluations of Firedrake that measure mesh distribution (or explain how to avoid or exclude it)?
Thank you for your time and any info or tips.
[1] https://github.com/ISI-apex/firedrake-bench/blob/master/cahn_hilliard/firedrake_cahn_hilliard_problem.py
[2] Unstructured Overlapping Mesh Distribution in Parallel, Matthew G. Knepley, Michael Lange, Gerard J. Gorman, 2015. https://arxiv.org/pdf/1506.06194.pdf
[3] Efficient mesh management in Firedrake using PETSc-DMPlex, Michael Lange, Lawrence Mitchell, Matthew G. Knepley and Gerard J. Gorman, SISC, 38(5), S143-S155, 2016. http://arxiv.org/abs/1506.07749
Whoops, snes/tests/ex13.c. This is what I used for the Summit runs that I presented a while ago.
On Sun, Mar 7, 2021 at 6:12 AM Barry Smith <bsmith@petsc.dev> wrote:
mat/tests/ex13.c creates a sequential AIJ matrix, converts it to the same format, reorders it and then prints it and the reordering in ASCII. Each of these steps is sequential and takes place on each rank. The prints are ASCII stdout on the ranks.
ierr = MatCreateSeqAIJ(PETSC_COMM_SELF,m*n,m*n,5,NULL,&C);CHKERRQ(ierr); /* create the matrix for the five point stencil, YET AGAIN */
for (i=0; i<m; i++) {
  for (j=0; j<n; j++) {
    v = -1.0; Ii = j + n*i;
    if (i>0)   {J = Ii - n; ierr = MatSetValues(C,1,&Ii,1,&J,&v,INSERT_VALUES);CHKERRQ(ierr);}
    if (i<m-1) {J = Ii + n; ierr = MatSetValues(C,1,&Ii,1,&J,&v,INSERT_VALUES);CHKERRQ(ierr);}
    if (j>0)   {J = Ii - 1; ierr = MatSetValues(C,1,&Ii,1,&J,&v,INSERT_VALUES);CHKERRQ(ierr);}
    if (j<n-1) {J = Ii + 1; ierr = MatSetValues(C,1,&Ii,1,&J,&v,INSERT_VALUES);CHKERRQ(ierr);}
    v = 4.0; ierr = MatSetValues(C,1,&Ii,1,&Ii,&v,INSERT_VALUES);CHKERRQ(ierr);
  }
}
ierr = MatAssemblyBegin(C,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
ierr = MatAssemblyEnd(C,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
ierr = MatConvert(C,MATSAME,MAT_INITIAL_MATRIX,&A);CHKERRQ(ierr);
ierr = MatGetOrdering(A,MATORDERINGND,&perm,&iperm);CHKERRQ(ierr);
ierr = ISView(perm,PETSC_VIEWER_STDOUT_SELF);CHKERRQ(ierr);
ierr = ISView(iperm,PETSC_VIEWER_STDOUT_SELF);CHKERRQ(ierr);
ierr = MatView(A,PETSC_VIEWER_STDOUT_SELF);CHKERRQ(ierr);
I think each rank would simply be running the same code and dumping everything to its own stdout.
At some point within the system/MPI executor there is code that merges and prints out the stdout of each rank. If the test does truly take 45 minutes, then Fugaku has a classic bug of not being able to efficiently merge stdout from each of the ranks. Nothing really to do with PETSc, just neglect by the Fugaku developers to respect all aspects of developing an HPC system. Heck, they only had a billion dollars, can't expect them to do what other scalable systems do :-).
One should be able to reproduce this with a simple MPI program that prints a moderate amount of data to stdout on each rank.
Barry
On Mar 6, 2021, at 9:46 PM, Mark Adams <mfadams@lbl.gov> wrote:
I observed poor scaling with mat/tests/ex13 on Fugaku recently. I was running this test as is (e.g., no threads and 4 MPI processes per node/chip, which seems recommended). I did not dig into this. A test with about 10% of the machine took about 45 minutes to run.
Mark
On Sat, Mar 6, 2021 at 9:49 PM Junchao Zhang <junchao.zhang@gmail.com> wrote:
On Sat, Mar 6, 2021 at 12:27 PM Matthew Knepley <knepley@buffalo.edu> wrote:
On Fri, Mar 5, 2021 at 4:06 PM Alexei Colin <acolin@isi.edu> wrote:
To PETSc DMPlex users, Firedrake users, Dr. Knepley and Dr. Karpeev:
Is it expected for mesh distribution step to (A) take a share of 50-99% of total time-to-solution of an FEM problem, and
No
(B) take an amount of time that increases with the number of ranks, and
See below.
(C) take an amount of memory on rank 0 that does not decrease with the number of ranks
The problem here is that a serial mesh is being partitioned and sent to all processes. This is fundamentally non-scalable, but it is easy and works well for modest clusters < 100 nodes or so. Above this, it will take increasing amounts of time. There are a few techniques for mitigating this.
Is this one-to-all communication only done once? If yes, one MPI_Scatterv() is enough and should not cost much.
a) For simple domains, you can distribute a coarse grid, then regularly
refine that in parallel with DMRefine() or -dm_refine <k>. These steps can be repeated easily, and redistribution in parallel is fast, as shown for example in [1].
b) For complex meshes, you can read them in parallel, and then repeat a). This is done in [1]. It is a little more involved, but not much.
c) You can do a multilevel partitioning, as they do in [2]. I cannot find the paper in which they describe this right now. It is feasible, but definitely the most expert approach.
Does this make sense?
Thanks,
Matt
[1] Fully Parallel Mesh I/O using PETSc DMPlex with an Application to Waveform Modeling, Hapla et al. https://arxiv.org/abs/2004.08729
[2] On the robustness and performance of entropy stable discontinuous collocation methods for the compressible Navier-Stokes equations, Rojas et al. https://arxiv.org/abs/1911.10966
The attached plots suggest (A), (B), and (C) are happening for a Cahn-Hilliard problem (from the firedrake-bench repo) on a 2D 8Kx8K unit-square mesh. The implementation is here [1]. Versions are Firedrake, PyOp2: 20200204.0; PETSc 3.13.1; ParMETIS 4.0.3.
Two questions, one on (A) and the other on (B)+(C):
1. Is the (A) result expected? Given (A), any effort to improve the quality of the compiled assembly kernels (or anything else other than mesh distribution) appears futile since it takes 1% of end-to-end execution time, or am I missing something?
1a. Is mesh distribution fundamentally necessary for any FEM framework, or is it only needed by Firedrake? If the latter, then how do other frameworks partition the mesh and execute in parallel with MPI but avoid the non-scalable mesh distribution step?
2. Results (B) and (C) suggest that the mesh distribution step does not scale. Is it a fundamental property of the mesh distribution problem that it has a central bottleneck in the master process, or is it a limitation of the current implementation in PETSc-DMPlex?
2a. Our (B) result seems to agree with Figure 4 (left) of [2]. Fig 6 of [2] suggests a way to reduce the time spent on the sequential bottleneck by "parallel mesh refinement" that creates high-resolution meshes from an initial coarse mesh. Is this approach implemented in DMPlex? If so, any pointers on how to try it out with Firedrake? If not, any other directions for reducing this bottleneck?
2b. Fig 6 in [3] shows plots for Assembly and Solve steps that scale well up to 96 cores -- is mesh distribution included in those times? Is anyone reading this aware of any other publications with evaluations of Firedrake that measure mesh distribution (or explain how to avoid or exclude it)?
Thank you for your time and any info or tips.
[1] https://github.com/ISI-apex/firedrake-bench/blob/master/cahn_hilliard/firedr...
[2] Unstructured Overlapping Mesh Distribution in Parallel, Matthew G. Knepley, Michael Lange, Gerard J. Gorman, 2015. https://arxiv.org/pdf/1506.06194.pdf
[3] Efficient mesh management in Firedrake using PETSc-DMPlex, Michael Lange, Lawrence Mitchell, Matthew G. Knepley and Gerard J. Gorman, SISC, 38(5), S143-S155, 2016. http://arxiv.org/abs/1506.07749
[2] On the robustness and performance of entropy stable discontinuous
collocation methods for the compressible Navier-Stokes equations, Rojas et al. https://arxiv.org/abs/1911.10966
This is not the proper reference; here is the correct one: https://www.sciencedirect.com/science/article/pii/S0021999120306185?dgcid=rs... However, there the algorithm is only outlined, and performance related to the mesh distribution is not really reported.
We observed a large gain for large core counts and one-to-all distributions (from minutes to seconds) by splitting the several communication rounds needed by DMPlex into stages: from rank 0 to one rank per node, and then decomposing independently within the node. Attached is the total time for one-to-all DMPlexDistribute for a 128^3 mesh.
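At the MPI level, the staging can be pictured with two communicators per rank, one for the node and one for the node leaders; the sketch below only shows how such communicators could be built, not the DMPlex code that actually performs the staged distribution.
#include <mpi.h>
/* Sketch of the communicator layout for a two-stage distribution:
   stage 1 scatters from rank 0 across 'leaders' (one rank per node),
   stage 2 redistributes within each 'nodecomm'. Illustrative only. */
void build_two_stage_comms(MPI_Comm world, MPI_Comm *nodecomm, MPI_Comm *leaders)
{
  int wrank, nrank;
  MPI_Comm_rank(world, &wrank);
  /* ranks that share a node form one communicator */
  MPI_Comm_split_type(world, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, nodecomm);
  MPI_Comm_rank(*nodecomm, &nrank);
  /* node-local rank 0 is the node leader; leaders get their own communicator,
     everyone else gets MPI_COMM_NULL */
  MPI_Comm_split(world, nrank == 0 ? 0 : MPI_UNDEFINED, wrank, leaders);
}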
The attached plots suggest (A), (B), and (C) are happening for a Cahn-Hilliard problem (from the firedrake-bench repo) on a 2D 8Kx8K unit-square mesh. The implementation is here [1]. Versions are Firedrake, PyOp2: 20200204.0; PETSc 3.13.1; ParMETIS 4.0.3.
Two questions, one on (A) and the other on (B)+(C):
1. Is the (A) result expected? Given (A), any effort to improve the quality of the compiled assembly kernels (or anything else other than mesh distribution) appears futile since it takes 1% of end-to-end execution time, or am I missing something?
1a. Is mesh distribution fundamentally necessary for any FEM framework, or is it only needed by Firedrake? If the latter, then how do other frameworks partition the mesh and execute in parallel with MPI but avoid the non-scalable mesh distribution step?
2. Results (B) and (C) suggest that the mesh distribution step does not scale. Is it a fundamental property of the mesh distribution problem that it has a central bottleneck in the master process, or is it a limitation of the current implementation in PETSc-DMPlex?
2a. Our (B) result seems to agree with Figure 4 (left) of [2]. Fig 6 of [2] suggests a way to reduce the time spent on the sequential bottleneck by "parallel mesh refinement" that creates high-resolution meshes from an initial coarse mesh. Is this approach implemented in DMPlex? If so, any pointers on how to try it out with Firedrake? If not, any other directions for reducing this bottleneck?
2b. Fig 6 in [3] shows plots for Assembly and Solve steps that scale well up to 96 cores -- is mesh distribution included in those times? Is anyone reading this aware of any other publications with evaluations of Firedrake that measure mesh distribution (or explain how to avoid or exclude it)?
Thank you for your time and any info or tips.
[1] https://github.com/ISI-apex/firedrake-bench/blob/master/cahn_hilliard/firedr...
[2] Unstructured Overlapping Mesh Distribution in Parallel, Matthew G. Knepley, Michael Lange, Gerard J. Gorman, 2015. https://arxiv.org/pdf/1506.06194.pdf
[3] Efficient mesh management in Firedrake using PETSc-DMPlex, Michael Lange, Lawrence Mitchell, Matthew G. Knepley and Gerard J. Gorman, SISC, 38(5), S143-S155, 2016. http://arxiv.org/abs/1506.07749
-- Stefano
Is phase 1 the old method and 2 the new? Is this 128^3 mesh per process?
On Sun, Mar 7, 2021 at 7:27 AM Stefano Zampini <stefano.zampini@gmail.com> wrote:
[2] On the robustness and performance of entropy stable discontinuous
collocation methods for the compressible Navier-Stokes equations, Rojas et al. https://arxiv.org/abs/1911.10966
This is not the proper reference; here is the correct one: https://www.sciencedirect.com/science/article/pii/S0021999120306185?dgcid=rs... However, there the algorithm is only outlined, and performance related to the mesh distribution is not really reported.
We observed a large gain for large core counts and one-to-all distributions (from minutes to seconds) by splitting the several communication rounds needed by DMPlex into stages: from rank 0 to one rank per node, and then decomposing independently within the node. Attached is the total time for one-to-all DMPlexDistribute for a 128^3 mesh.
The attached plots suggest (A), (B), and (C) are happening for a Cahn-Hilliard problem (from the firedrake-bench repo) on a 2D 8Kx8K unit-square mesh. The implementation is here [1]. Versions are Firedrake, PyOp2: 20200204.0; PETSc 3.13.1; ParMETIS 4.0.3.
Two questions, one on (A) and the other on (B)+(C):
1. Is the (A) result expected? Given (A), any effort to improve the quality of the compiled assembly kernels (or anything else other than mesh distribution) appears futile since it takes 1% of end-to-end execution time, or am I missing something?
1a. Is mesh distribution fundamentally necessary for any FEM framework, or is it only needed by Firedrake? If the latter, then how do other frameworks partition the mesh and execute in parallel with MPI but avoid the non-scalable mesh distribution step?
2. Results (B) and (C) suggest that the mesh distribution step does not scale. Is it a fundamental property of the mesh distribution problem that it has a central bottleneck in the master process, or is it a limitation of the current implementation in PETSc-DMPlex?
2a. Our (B) result seems to agree with Figure 4 (left) of [2]. Fig 6 of [2] suggests a way to reduce the time spent on the sequential bottleneck by "parallel mesh refinement" that creates high-resolution meshes from an initial coarse mesh. Is this approach implemented in DMPlex? If so, any pointers on how to try it out with Firedrake? If not, any other directions for reducing this bottleneck?
2b. Fig 6 in [3] shows plots for Assembly and Solve steps that scale well up to 96 cores -- is mesh distribution included in those times? Is anyone reading this aware of any other publications with evaluations of Firedrake that measure mesh distribution (or explain how to avoid or exclude it)?
Thank you for your time and any info or tips.
[1] https://github.com/ISI-apex/firedrake-bench/blob/master/cahn_hilliard/firedr...
[2] Unstructured Overlapping Mesh Distribution in Parallel, Matthew G. Knepley, Michael Lange, Gerard J. Gorman, 2015. https://arxiv.org/pdf/1506.06194.pdf
[3] Efficient mesh management in Firedrake using PETSc-DMPlex, Michael Lange, Lawrence Mitchell, Matthew G. Knepley and Gerard J. Gorman, SISC, 38(5), S143-S155, 2016. http://arxiv.org/abs/1506.07749
-- Stefano
128^3 is the entire mesh. The blue line (1 phase) is with DMPlexDistribute, the red line with the two-stage approach.
On Sun, Mar 7, 2021 at 16:20 Mark Adams <mfadams@lbl.gov> wrote:
Is phase 1 the old method and 2 the new? Is this 128^3 mesh per process?
On Sun, Mar 7, 2021 at 7:27 AM Stefano Zampini <stefano.zampini@gmail.com> wrote:
[2] On the robustness and performance of entropy stable discontinuous
collocation methods for the compressible Navier-Stokes equations, Rojas et al. https://arxiv.org/abs/1911.10966
This is not the proper reference; here is the correct one: https://www.sciencedirect.com/science/article/pii/S0021999120306185?dgcid=rs... However, there the algorithm is only outlined, and performance related to the mesh distribution is not really reported.
We observed a large gain for large core counts and one-to-all distributions (from minutes to seconds) by splitting the several communication rounds needed by DMPlex into stages: from rank 0 to one rank per node, and then decomposing independently within the node. Attached is the total time for one-to-all DMPlexDistribute for a 128^3 mesh.
The attached plots suggest (A), (B), and (C) are happening for a Cahn-Hilliard problem (from the firedrake-bench repo) on a 2D 8Kx8K unit-square mesh. The implementation is here [1]. Versions are Firedrake, PyOp2: 20200204.0; PETSc 3.13.1; ParMETIS 4.0.3.
Two questions, one on (A) and the other on (B)+(C):
1. Is the (A) result expected? Given (A), any effort to improve the quality of the compiled assembly kernels (or anything else other than mesh distribution) appears futile since it takes 1% of end-to-end execution time, or am I missing something?
1a. Is mesh distribution fundamentally necessary for any FEM framework, or is it only needed by Firedrake? If the latter, then how do other frameworks partition the mesh and execute in parallel with MPI but avoid the non-scalable mesh distribution step?
2. Results (B) and (C) suggest that the mesh distribution step does not scale. Is it a fundamental property of the mesh distribution problem that it has a central bottleneck in the master process, or is it a limitation of the current implementation in PETSc-DMPlex?
2a. Our (B) result seems to agree with Figure 4 (left) of [2]. Fig 6 of [2] suggests a way to reduce the time spent on the sequential bottleneck by "parallel mesh refinement" that creates high-resolution meshes from an initial coarse mesh. Is this approach implemented in DMPlex? If so, any pointers on how to try it out with Firedrake? If not, any other directions for reducing this bottleneck?
2b. Fig 6 in [3] shows plots for Assembly and Solve steps that scale well up to 96 cores -- is mesh distribution included in those times? Is anyone reading this aware of any other publications with evaluations of Firedrake that measure mesh distribution (or explain how to avoid or exclude it)?
Thank you for your time and any info or tips.
[1] https://github.com/ISI-apex/firedrake-bench/blob/master/cahn_hilliard/firedr...
[2] Unstructured Overlapping Mesh Distribution in Parallel, Matthew G. Knepley, Michael Lange, Gerard J. Gorman, 2015. https://arxiv.org/pdf/1506.06194.pdf
[3] Efficient mesh management in Firedrake using PETSc-DMPlex, Michael Lange, Lawrence Mitchell, Matthew G. Knepley and Gerard J. Gorman, SISC, 38(5), S143-S155, 2016. http://arxiv.org/abs/1506.07749
-- Stefano
FWIW, Here is the output from ex13 on 32K processes (8K Fugaku nodes/sockets, 4 MPI/node, which seems recommended) with 128^3 vertex mesh (64^3 Q2 3D Laplacian). Almost an hour. Attached is solver scaling. 0 SNES Function norm 3.658334849208e+00 Linear solve converged due to CONVERGED_RTOL iterations 22 1 SNES Function norm 1.609000373074e-12 Nonlinear solve converged due to CONVERGED_ITS iterations 1 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ../ex13 on a named i07-4008c with 32768 processors, by a04199 Fri Feb 12 23:27:13 2021 Using Petsc Development GIT revision: v3.14.4-579-g4cb72fa GIT Date: 2021-02-05 15:19:40 +0000 Max Max/Min Avg Total Time (sec): 3.373e+03 1.000 3.373e+03 Objects: 1.055e+05 14.797 7.144e+03 Flop: 5.376e+10 1.176 4.885e+10 1.601e+15 Flop/sec: 1.594e+07 1.176 1.448e+07 4.745e+11 MPI Messages: 6.048e+05 30.010 8.833e+04 2.894e+09 MPI Message Lengths: 1.127e+09 4.132 6.660e+03 1.928e+13 MPI Reductions: 1.824e+03 1.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 3.2903e+03 97.5% 2.4753e+14 15.5% 3.538e+08 12.2% 1.779e+04 32.7% 9.870e+02 54.1% 1: PCSetUp: 4.3062e+01 1.3% 1.8160e+13 1.1% 1.902e+07 0.7% 3.714e+04 3.7% 1.590e+02 8.7% 2: KSP Solve only: 3.9685e+01 1.2% 1.3349e+15 83.4% 2.522e+09 87.1% 4.868e+03 63.7% 6.700e+02 36.7% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage PetscBarrier 5 1.0 1.9907e+00 2.2 0.00e+00 0.0 3.8e+06 7.7e+01 2.0e+01 0 0 0 0 1 0 0 1 0 2 0 BuildTwoSided 62 1.0 7.3272e+0214.1 0.00e+00 0.0 6.7e+06 8.0e+00 0.0e+00 5 0 0 0 0 5 0 2 0 0 0 BuildTwoSidedF 59 1.0 3.1132e+01 7.4 0.00e+00 0.0 4.8e+06 2.5e+05 0.0e+00 0 0 0 6 0 0 0 1 19 0 0 SNESSolve 1 1.0 1.7468e+02 1.0 7.83e+09 1.3 3.4e+08 1.3e+04 8.8e+02 5 13 12 23 48 5 85 96 70 89 1205779 SNESSetUp 1 1.0 2.4195e+01 1.0 0.00e+00 0.0 3.7e+06 3.7e+05 1.3e+01 1 0 0 7 1 1 0 1 22 1 0 SNESFunctionEval 3 1.0 1.1359e+01 1.2 1.17e+09 1.0 1.6e+06 1.4e+04 2.0e+00 0 2 0 0 0 0 15 0 0 0 3344744 SNESJacobianEval 2 1.0 1.6829e+02 1.0 1.52e+09 1.0 1.1e+06 8.3e+05 0.0e+00 5 3 0 5 0 5 20 0 14 0 293588 DMCreateMat 1 1.0 2.4107e+01 1.0 0.00e+00 0.0 3.7e+06 3.7e+05 1.3e+01 1 0 0 7 1 1 0 1 22 1 0 Mesh Partition 1 1.0 5.0133e+02 1.0 0.00e+00 0.0 1.3e+05 2.7e+02 6.0e+00 15 0 0 0 0 15 0 0 0 1 0 Mesh Migration 1 1.0 1.5494e+03 1.0 0.00e+00 0.0 7.3e+05 1.9e+02 2.4e+01 45 0 0 0 1 46 0 0 0 2 0 DMPlexPartSelf 1 1.0 1.1498e+002367.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexPartLblInv 1 1.0 3.6698e+00 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexPartLblSF 1 1.0 2.8522e-01 1.7 0.00e+00 0.0 4.9e+04 1.5e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexPartStrtSF 1 1.0 4.9474e+023520.8 0.00e+00 0.0 3.3e+04 4.3e+02 0.0e+00 14 0 0 0 0 15 0 0 0 0 0 DMPlexPointSF 1 1.0 9.8750e+021264.8 0.00e+00 0.0 6.6e+04 5.4e+02 0.0e+00 28 0 0 0 0 29 0 0 0 0 0 DMPlexInterp 84 1.0 4.3219e-0158.6 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 0 0 0 0 1 0 DMPlexDistribute 1 1.0 3.0000e+03 1.5 0.00e+00 0.0 9.3e+05 2.3e+02 3.0e+01 88 0 0 0 2 90 0 0 0 3 0 DMPlexDistCones 1 1.0 1.0688e+03 2.6 0.00e+00 0.0 1.8e+05 3.1e+02 1.0e+00 31 0 0 0 0 31 0 0 0 0 0 DMPlexDistLabels 1 1.0 2.9172e+02 1.0 0.00e+00 0.0 3.1e+05 1.9e+02 2.1e+01 9 0 0 0 1 9 0 0 0 2 0 DMPlexDistField 1 1.0 1.8688e+02 1.2 0.00e+00 0.0 2.1e+05 9.3e+01 1.0e+00 5 0 0 0 0 5 0 0 0 0 0 DMPlexStratify 118 1.0 6.2852e+023280.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.6e+01 1 0 0 0 1 1 0 0 0 2 0 DMPlexSymmetrize 118 1.0 6.7634e-02 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexPrealloc 1 1.0 2.3741e+01 1.0 0.00e+00 0.0 3.7e+06 3.7e+05 1.1e+01 1 0 0 7 1 1 0 1 22 1 0 DMPlexResidualFE 3 1.0 1.0634e+01 1.2 1.16e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 15 0 0 0 3569848 DMPlexJacobianFE 2 1.0 1.6809e+02 1.0 1.51e+09 1.0 6.5e+05 1.4e+06 0.0e+00 5 3 0 5 0 5 20 0 14 0 293801 SFSetGraph 87 1.0 2.7673e-03 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetUp 62 1.0 7.3283e+0213.6 0.00e+00 0.0 2.0e+07 2.7e+04 0.0e+00 5 0 1 3 0 5 0 6 9 0 0 SFBcastOpBegin 107 1.0 1.5770e+00452.5 0.00e+00 0.0 2.1e+07 1.8e+04 0.0e+00 0 0 1 2 0 0 0 6 6 0 0 SFBcastOpEnd 107 1.0 2.9430e+03 4.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 80 0 0 0 0 82 0 0 0 0 0 SFReduceBegin 12 1.0 2.4825e-01172.8 
0.00e+00 0.0 2.4e+06 2.0e+05 0.0e+00 0 0 0 2 0 0 0 1 8 0 0 SFReduceEnd 12 1.0 3.8286e+014865.8 3.74e+04 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 31 SFFetchOpBegin 2 1.0 2.4497e-0390.2 0.00e+00 0.0 4.3e+05 3.5e+05 0.0e+00 0 0 0 1 0 0 0 0 2 0 0 SFFetchOpEnd 2 1.0 6.1349e-0210.9 0.00e+00 0.0 4.3e+05 3.5e+05 0.0e+00 0 0 0 1 0 0 0 0 2 0 0 SFCreateEmbed 3 1.0 3.6800e+013261.5 0.00e+00 0.0 4.7e+05 1.7e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFDistSection 9 1.0 4.4325e+02 1.5 0.00e+00 0.0 2.8e+06 1.1e+04 9.0e+00 11 0 0 0 0 11 0 1 1 1 0 SFSectionSF 11 1.0 2.3898e+02 4.7 0.00e+00 0.0 9.2e+05 1.7e+05 0.0e+00 5 0 0 1 0 5 0 0 2 0 0 SFRemoteOff 2 1.0 3.2868e-0143.1 0.00e+00 0.0 8.7e+05 8.2e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFPack 1023 1.0 2.5215e-0176.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 1025 1.0 5.1600e-0216.8 5.62e+0521.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 54693 MatMult 1549525.4 3.4810e+00 1.3 4.35e+09 1.1 2.2e+08 6.1e+03 0.0e+00 0 8 8 7 0 0 54 62 21 0 38319208 MatMultAdd 132 1.0 6.9168e-01 3.0 7.97e+07 1.2 2.8e+07 4.6e+02 0.0e+00 0 0 1 0 0 0 1 8 0 0 3478717 MatMultTranspose 132 1.0 5.9967e-01 1.6 8.00e+07 1.2 3.0e+07 4.5e+02 0.0e+00 0 0 1 0 0 0 1 9 0 0 4015214 MatSolve 22 0.0 6.8431e-04 0.0 7.41e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1082 MatLUFactorSym 1 1.0 5.9569e-0433.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 1 1.0 1.6236e-03773.2 1.46e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 897 MatConvert 6 1.0 1.4290e-01 1.2 0.00e+00 0.0 3.0e+06 3.7e+03 0.0e+00 0 0 0 0 0 0 0 1 0 0 0 MatScale 18 1.0 3.7962e-01 1.3 4.11e+07 1.2 2.0e+06 5.5e+03 0.0e+00 0 0 0 0 0 0 0 1 0 0 3253392 MatResidual 132 1.0 6.8256e-01 1.4 8.27e+08 1.2 4.4e+07 5.5e+03 0.0e+00 0 2 2 1 0 0 10 13 4 0 36282014 MatAssemblyBegin 244 1.0 3.1181e+01 6.6 0.00e+00 0.0 4.8e+06 2.5e+05 0.0e+00 0 0 0 6 0 0 0 1 19 0 0 MatAssemblyEnd 244 1.0 6.3232e+00 1.9 3.17e+06 6.9 0.0e+00 0.0e+00 1.4e+02 0 0 0 0 8 0 0 0 0 15 7655 MatGetRowIJ 1 0.0 2.5780e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCreateSubMat 10 1.0 1.5162e+00 1.0 0.00e+00 0.0 1.6e+05 3.4e+05 1.3e+02 0 0 0 0 7 0 0 0 1 13 0 MatGetOrdering 1 0.0 1.0899e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCoarsen 6 1.0 3.5837e-01 1.3 0.00e+00 0.0 1.6e+07 1.2e+04 3.9e+01 0 0 1 1 2 0 0 5 3 4 0 MatZeroEntries 8 1.0 5.3730e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAXPY 6 1.0 2.6245e-01 1.1 2.66e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 33035 MatTranspose 12 1.0 3.0731e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMatMultSym 18 1.0 2.1398e+00 1.4 0.00e+00 0.0 6.1e+06 5.5e+03 4.8e+01 0 0 0 0 3 0 0 2 1 5 0 MatMatMultNum 6 1.0 1.1243e+00 1.0 3.76e+07 1.2 2.0e+06 5.5e+03 0.0e+00 0 0 0 0 0 0 0 1 0 0 1001203 MatPtAPSymbolic 6 1.0 1.7280e+01 1.0 0.00e+00 0.0 1.2e+07 3.2e+04 4.2e+01 1 0 0 2 2 1 0 3 6 4 0 MatPtAPNumeric 6 1.0 1.8047e+01 1.0 1.49e+09 5.1 2.8e+06 1.1e+05 2.4e+01 1 1 0 2 1 1 5 1 5 2 663675 MatTrnMatMultSym 1 1.0 3.0221e+01 1.0 0.00e+00 0.0 2.4e+06 5.8e+05 1.1e+01 1 0 0 7 1 1 0 1 22 1 0 MatGetLocalMat 19 1.0 1.3904e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetBrAoCol 18 1.0 1.9926e-01 5.0 0.00e+00 0.0 1.4e+07 2.3e+04 0.0e+00 0 0 0 2 0 0 0 4 5 0 0 MatGetSymTrans 2 1.0 1.8996e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecTDot 176 1.0 7.0632e-01 4.5 3.48e+07 1.0 0.0e+00 0.0e+00 1.8e+02 0 0 0 0 10 0 0 0 0 18 1608728 VecNorm 60 1.0 1.4074e+0012.2 
1.58e+07 1.0 0.0e+00 0.0e+00 6.0e+01 0 0 0 0 3 0 0 0 0 6 366467 VecCopy 422 1.0 5.1259e-02 3.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 653 1.0 2.3974e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 165 1.0 6.5622e-03 1.3 3.42e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 170485467 VecAYPX 861 1.0 7.8529e-02 1.2 6.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 25785252 VecAXPBYCZ 264 1.0 4.1343e-02 1.5 5.85e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 46135592 VecAssemblyBegin 21 1.0 2.3463e-01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 21 1.0 1.4457e-04 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 600 1.0 5.7510e-02 1.2 2.66e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 15075754 VecScatterBegin 902 1.0 5.1188e-01 1.2 0.00e+00 0.0 2.9e+08 5.3e+03 0.0e+00 0 0 10 8 0 0 0 82 25 0 0 VecScatterEnd 902 1.0 1.2143e+00 3.2 5.50e+0537.9 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1347 VecSetRandom 6 1.0 2.6354e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DualSpaceSetUp 7 1.0 5.3467e-0112.0 4.26e+03 1.0 0.0e+00 0.0e+00 1.3e+01 0 0 0 0 1 0 0 0 0 1 261 FESetUp 7 1.0 1.7541e-01128.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 15 1.0 2.7470e-01 1.1 2.04e+08 1.2 1.0e+07 5.5e+03 1.3e+02 0 0 0 0 7 0 2 3 1 13 22477233 KSPSolve 1 1.0 4.3257e+00 1.0 4.33e+09 1.1 2.5e+08 4.8e+03 6.6e+01 0 8 9 6 4 0 54 72 20 7 30855976 PCGAMGGraph_AGG 6 1.0 5.0969e+00 1.0 3.76e+07 1.2 5.1e+06 4.4e+03 4.8e+01 0 0 0 0 3 0 0 1 0 5 220852 PCGAMGCoarse_AGG 6 1.0 3.1121e+01 1.0 0.00e+00 0.0 2.5e+07 6.9e+04 5.5e+01 1 0 1 9 3 1 0 7 27 6 0 PCGAMGProl_AGG 6 1.0 5.8196e-01 1.0 0.00e+00 0.0 6.6e+06 9.3e+03 7.2e+01 0 0 0 0 4 0 0 2 1 7 0 PCGAMGPOpt_AGG 6 1.0 3.2414e+00 1.0 2.42e+08 1.2 2.1e+07 5.3e+03 1.6e+02 0 0 1 1 9 0 3 6 2 17 2256493 GAMG: createProl 6 1.0 4.0042e+01 1.0 2.80e+08 1.2 5.8e+07 3.3e+04 3.4e+02 1 1 2 10 19 1 3 16 31 34 210778 Graph 12 1.0 5.0926e+00 1.0 3.76e+07 1.2 5.1e+06 4.4e+03 4.8e+01 0 0 0 0 3 0 0 1 0 5 221038 MIS/Agg 6 1.0 3.5850e-01 1.3 0.00e+00 0.0 1.6e+07 1.2e+04 3.9e+01 0 0 1 1 2 0 0 5 3 4 0 SA: col data 6 1.0 3.0509e-01 1.0 0.00e+00 0.0 5.4e+06 9.2e+03 2.4e+01 0 0 0 0 1 0 0 2 1 2 0 SA: frmProl0 6 1.0 2.3467e-01 1.1 0.00e+00 0.0 1.3e+06 9.5e+03 2.4e+01 0 0 0 0 1 0 0 0 0 2 0 SA: smooth 6 1.0 2.7855e+00 1.0 4.14e+07 1.2 8.1e+06 5.5e+03 6.3e+01 0 0 0 0 3 0 1 2 1 6 446491 GAMG: partLevel 6 1.0 3.7266e+01 1.0 1.49e+09 5.1 1.5e+07 4.9e+04 3.2e+02 1 1 1 4 17 1 5 4 12 32 321395 repartition 5 1.0 2.0343e+00 1.1 0.00e+00 0.0 4.0e+05 1.4e+05 2.5e+02 0 0 0 0 14 0 0 0 1 25 0 Invert-Sort 5 1.0 1.5021e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+01 0 0 0 0 2 0 0 0 0 3 0 Move A 5 1.0 1.1548e+00 1.0 0.00e+00 0.0 1.6e+05 3.4e+05 7.0e+01 0 0 0 0 4 0 0 0 1 7 0 Move P 5 1.0 4.2799e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 7.5e+01 0 0 0 0 4 0 0 0 0 8 0 PCGAMG Squ l00 1 1.0 3.0221e+01 1.0 0.00e+00 0.0 2.4e+06 5.8e+05 1.1e+01 1 0 0 7 1 1 0 1 22 1 0 PCGAMG Gal l00 1 1.0 8.7411e+00 1.0 2.93e+08 1.1 5.4e+06 4.5e+04 1.2e+01 0 1 0 1 1 0 4 2 4 1 1092355 PCGAMG Opt l00 1 1.0 1.9734e+00 1.0 3.36e+07 1.1 3.2e+06 1.2e+04 9.0e+00 0 0 0 0 0 0 0 1 1 1 555327 PCGAMG Gal l01 1 1.0 1.0153e+00 1.0 3.50e+07 1.4 5.9e+06 3.9e+04 1.2e+01 0 0 0 1 1 0 0 2 4 1 1079887 PCGAMG Opt l01 1 1.0 7.4812e-02 1.0 5.35e+05 1.2 3.2e+06 1.1e+03 9.0e+00 0 0 0 0 0 0 0 1 0 1 232542 PCGAMG Gal l02 1 1.0 1.8063e+00 1.0 7.43e+07 0.0 3.0e+06 5.9e+04 1.2e+01 0 0 0 1 1 0 0 1 3 1 593392 PCGAMG Opt 
l02 1 1.0 1.1580e-01 1.1 6.93e+05 0.0 1.6e+06 1.3e+03 9.0e+00 0 0 0 0 0 0 0 0 0 1 93213 PCGAMG Gal l03 1 1.0 6.1075e+00 1.0 2.72e+08 0.0 2.6e+05 9.2e+04 1.1e+01 0 0 0 0 1 0 0 0 0 1 36155 PCGAMG Opt l03 1 1.0 8.0836e-02 1.0 1.55e+06 0.0 1.4e+05 1.4e+03 8.0e+00 0 0 0 0 0 0 0 0 0 1 18229 PCGAMG Gal l04 1 1.0 1.6203e+01 1.0 9.44e+08 0.0 1.4e+04 3.0e+05 1.1e+01 0 0 0 0 1 0 0 0 0 1 2366 PCGAMG Opt l04 1 1.0 1.2663e-01 1.0 2.01e+06 0.0 6.9e+03 2.2e+03 8.0e+00 0 0 0 0 0 0 0 0 0 1 817 PCGAMG Gal l05 1 1.0 1.4800e+00 1.0 3.16e+08 0.0 9.0e+01 1.6e+05 1.1e+01 0 0 0 0 1 0 0 0 0 1 796 PCGAMG Opt l05 1 1.0 8.1763e-02 1.1 2.50e+06 0.0 4.8e+01 4.6e+03 8.0e+00 0 0 0 0 0 0 0 0 0 1 114 PCSetUp 2 1.0 7.7969e+01 1.0 1.97e+09 2.8 8.3e+07 3.3e+04 8.1e+02 2 2 3 14 44 2 11 23 43 82 341051 PCSetUpOnBlocks 22 1.0 2.4609e-0317.2 1.46e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 592 PCApply 22 1.0 3.6455e+00 1.1 3.57e+09 1.2 2.4e+08 4.3e+03 0.0e+00 0 7 8 5 0 0 43 67 16 0 29434967 --- Event Stage 1: PCSetUp BuildTwoSided 4 1.0 1.5980e-01 2.7 0.00e+00 0.0 2.1e+05 8.0e+00 0.0e+00 0 0 0 0 0 0 0 1 0 0 0 BuildTwoSidedF 6 1.0 1.3169e+01 5.5 0.00e+00 0.0 1.9e+06 1.9e+05 0.0e+00 0 0 0 2 0 28 0 10 51 0 0 SFSetGraph 5 1.0 4.9640e-0519.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetUp 4 1.0 1.6038e-01 2.3 0.00e+00 0.0 6.4e+05 9.1e+02 0.0e+00 0 0 0 0 0 0 0 3 0 0 0 SFPack 30 1.0 3.3376e-04 4.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 30 1.0 1.2101e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 30 1.0 1.5544e-01 1.5 1.87e+08 1.2 1.0e+07 5.5e+03 0.0e+00 0 0 0 0 0 0 31 53 8 0 35930640 MatAssemblyBegin 43 1.0 1.3201e+01 4.7 0.00e+00 0.0 1.9e+06 1.9e+05 0.0e+00 0 0 0 2 0 28 0 10 51 0 0 MatAssemblyEnd 43 1.0 1.1159e+01 1.0 2.77e+07705.7 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 1 26 0 0 0 13 1036 MatZeroEntries 6 1.0 4.7315e-0410.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatTranspose 12 1.0 2.5142e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMatMultSym 10 1.0 5.8783e-0117.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatPtAPSymbolic 5 1.0 1.4489e+01 1.0 0.00e+00 0.0 6.2e+06 3.6e+04 3.5e+01 0 0 0 1 2 34 0 32 31 22 0 MatPtAPNumeric 6 1.0 2.8457e+01 1.0 1.50e+09 5.1 2.7e+06 1.6e+05 2.0e+01 1 1 0 2 1 66 66 14 61 13 421190 MatGetLocalMat 6 1.0 9.8574e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetBrAoCol 6 1.0 3.7669e-01 2.3 0.00e+00 0.0 5.1e+06 3.8e+04 0.0e+00 0 0 0 1 0 0 0 27 28 0 0 VecTDot 66 1.0 6.5271e-02 4.1 5.85e+06 1.0 0.0e+00 0.0e+00 6.6e+01 0 0 0 0 4 0 1 0 0 42 2922260 VecNorm 36 1.0 1.1226e-02 3.2 3.19e+06 1.0 0.0e+00 0.0e+00 3.6e+01 0 0 0 0 2 0 1 0 0 23 9268067 VecCopy 12 1.0 1.2805e-03 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 11 1.0 6.6620e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 60 1.0 1.0763e-03 1.5 5.32e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 161104914 VecAYPX 24 1.0 2.0581e-03 1.3 2.13e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 33701038 VecPointwiseMult 36 1.0 3.5709e-03 1.3 1.60e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14567861 VecScatterBegin 30 1.0 2.9079e-03 7.8 0.00e+00 0.0 1.0e+07 5.5e+03 0.0e+00 0 0 0 0 0 0 0 53 8 0 0 VecScatterEnd 30 1.0 3.7015e-0263.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 7 1.0 2.3165e-01 1.0 2.04e+08 1.2 1.0e+07 5.5e+03 1.0e+02 0 0 0 0 6 1 34 53 8 64 26654598 PCGAMG Gal l00 1 1.0 4.7415e+00 1.0 2.94e+08 1.1 1.8e+06 7.8e+04 
0.0e+00 0 1 0 1 0 11 53 9 20 0 2015623 PCGAMG Gal l01 1 1.0 1.2103e+00 1.0 3.50e+07 1.4 4.8e+06 6.2e+04 1.2e+01 0 0 0 2 1 3 6 25 41 8 905938 PCGAMG Gal l02 1 1.0 3.4334e+00 1.0 7.41e+07 0.0 2.2e+06 8.7e+04 1.2e+01 0 0 0 1 1 8 6 11 27 8 312184 PCGAMG Gal l03 1 1.0 9.6062e+00 1.0 2.71e+08 0.0 1.9e+05 1.3e+05 1.1e+01 0 0 0 0 1 22 1 1 4 7 22987 PCGAMG Gal l04 1 1.0 2.2482e+01 1.0 9.43e+08 0.0 8.7e+03 4.8e+05 1.1e+01 1 0 0 0 1 52 0 0 1 7 1705 PCGAMG Gal l05 1 1.0 1.5961e+00 1.1 3.16e+08 0.0 6.8e+01 2.2e+05 1.1e+01 0 0 0 0 1 4 0 0 0 7 738 PCSetUp 1 1.0 4.3191e+01 1.0 1.70e+09 3.6 1.9e+07 3.7e+04 1.6e+02 1 1 1 4 9 100100100100100 420463 --- Event Stage 2: KSP Solve only SFPack 8140 1.0 7.4247e-02 4.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 8140 1.0 1.2905e-02 5.2 5.50e+0637.9 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1267207 MatMult 5500 1.0 2.9994e+01 1.2 3.98e+10 1.1 2.0e+09 6.1e+03 0.0e+00 1 76 68 62 0 70 92 78 98 0 40747181 MatMultAdd 1320 1.0 6.2192e+00 2.7 7.97e+08 1.2 2.8e+08 4.6e+02 0.0e+00 0 2 10 1 0 14 2 11 1 0 3868976 MatMultTranspose 1320 1.0 4.0304e+00 1.7 8.00e+08 1.2 2.8e+08 4.6e+02 0.0e+00 0 2 10 1 0 7 2 11 1 0 5974153 MatSolve 220 0.0 6.7366e-03 0.0 7.41e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1100 MatLUFactorSym 1 1.0 5.8691e-0435.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 1 1.0 1.5955e-03756.2 1.46e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 913 MatResidual 1320 1.0 6.4920e+00 1.3 8.27e+09 1.2 4.4e+08 5.5e+03 0.0e+00 0 15 15 13 0 14 19 18 20 0 38146350 MatGetRowIJ 1 0.0 2.7820e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 0.0 9.6940e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecTDot 440 1.0 4.6162e+00 6.9 2.31e+08 1.0 0.0e+00 0.0e+00 4.4e+02 0 0 0 0 24 5 1 0 0 66 1635124 VecNorm 230 1.0 3.9605e-02 1.6 1.21e+08 1.0 0.0e+00 0.0e+00 2.3e+02 0 0 0 0 13 0 0 0 0 34 99622387 VecCopy 3980 1.0 5.4166e-01 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 4640 1.0 1.4216e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 440 1.0 4.2829e-02 1.3 2.31e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 176236363 VecAYPX 8130 1.0 7.3998e-01 1.2 5.78e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 2 1 0 0 0 25489392 VecAXPBYCZ 2640 1.0 3.9974e-01 1.5 5.85e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 1 0 0 0 47716315 VecPointwiseMult 5280 1.0 5.9845e-01 1.5 2.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 1 0 0 0 12748927 VecScatterBegin 8140 1.0 4.9231e-01 5.9 0.00e+00 0.0 2.5e+09 4.9e+03 0.0e+00 0 0 87 64 0 1 0100100 0 0 VecScatterEnd 8140 1.0 1.0172e+01 3.6 5.50e+0637.9 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 13 0 0 0 0 1608 KSPSetUp 1 1.0 9.5996e-07 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 10 1.0 3.9685e+01 1.0 4.33e+10 1.1 2.5e+09 4.9e+03 6.7e+02 1 83 87 64 37 100100100100100 33637495 PCSetUp 1 1.0 2.4149e-0318.1 1.46e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 603 PCSetUpOnBlocks 220 1.0 2.6945e-03 8.9 1.46e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 540 PCApply 220 1.0 3.2921e+01 1.1 3.57e+10 1.2 2.3e+09 4.3e+03 0.0e+00 1 67 81 53 0 81 80 93 82 0 32595360 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Container 112 112 69888 0. 
SNES 1 1 1532 0. DMSNES 1 1 720 0. Distributed Mesh 449 449 30060888 0. DM Label 790 790 549840 0. Quadrature 579 579 379824 0. Index Set 100215 100210 361926232 0. IS L to G Mapping 8 13 4356552 0. Section 771 771 598296 0. Star Forest Graph 897 897 1053640 0. Discrete System 521 521 533512 0. GraphPartitioner 118 118 91568 0. Matrix 432 462 2441805304 0. Matrix Coarsen 6 6 4032 0. Vector 354 354 65492968 0. Linear Space 7 7 5208 0. Dual Space 111 111 113664 0. FE Space 7 7 5992 0. Field over DM 6 6 4560 0. Krylov Solver 21 21 37560 0. DMKSP interface 1 1 704 0. Preconditioner 21 21 21632 0. Viewer 2 1 896 0. PetscRandom 12 12 8520 0. --- Event Stage 1: PCSetUp Index Set 10 15 85367336 0. IS L to G Mapping 5 0 0 0. Star Forest Graph 5 5 6600 0. Matrix 50 20 73134024 0. Vector 28 28 6235096 0. --- Event Stage 2: KSP Solve only Index Set 5 5 8296 0. Matrix 1 1 273856 0. ======================================================================================================================== Average time to get PetscTime(): 6.40051e-08 Average time for MPI_Barrier(): 8.506e-06 Average time for zero size MPI_Send(): 6.6027e-06 #PETSc Option Table entries: -benchmark_it 10 -dm_distribute -dm_plex_box_dim 3 -dm_plex_box_faces 32,32,32 -dm_plex_box_lower 0,0,0 -dm_plex_box_simplex 0 -dm_plex_box_upper 1,1,1 -dm_refine 5 -ksp_converged_reason -ksp_max_it 150 -ksp_norm_type unpreconditioned -ksp_rtol 1.e-12 -ksp_type cg -log_view -matptap_via scalable -mg_levels_esteig_ksp_max_it 5 -mg_levels_esteig_ksp_type cg -mg_levels_ksp_max_it 2 -mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 2000 -pc_gamg_coarse_grid_layout_type spread -pc_gamg_esteig_ksp_max_it 5 -pc_gamg_esteig_ksp_type cg -pc_gamg_process_eq_limit 500 -pc_gamg_repartition false -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 -pc_gamg_threshold 0.01 -pc_gamg_threshold_scale .5 -pc_gamg_type agg -pc_type gamg -petscpartitioner_simple_node_grid 8,8,8 -petscpartitioner_simple_process_grid 4,4,4 -petscpartitioner_type simple -potential_petscspace_degree 2 -snes_converged_reason -snes_max_it 1 -snes_monitor -snes_rtol 1.e-8 -snes_type ksponly #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with 64 bit PetscInt Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8 Configure options: CC=mpifccpx CXX=mpiFCCpx CFLAGS="-L /opt/FJSVxtclanga/tcsds-1.2.29/lib64 -lfjlapack" CXXFLAGS="-L /opt/FJSVxtclanga/tcsds-1.2.29/lib64 -lfjlapack" COPTFLAGS=-Kfast CXXOPTFLAGS=-Kfast --with-fc=0 --package-prefix-hash=/home/ra010009/a04199/petsc-hash-pkgs --with-batch=1 --with-shared-libraries=yes --with-debugging=no --with-64-bit-indices=1 PETSC_ARCH=arch-fugaku-fujitsu ----------------------------------------- Libraries compiled on 2021-02-12 02:27:41 on fn01sv08 Machine characteristics: Linux-3.10.0-957.27.2.el7.x86_64-x86_64-with-redhat-7.6-Maipo Using PETSc directory: /home/ra010009/a04199/petsc Using PETSc arch: ----------------------------------------- Using C compiler: mpifccpx -L /opt/FJSVxtclanga/tcsds-1.2.29/lib64 -lfjlapack -fPIC -Kfast ----------------------------------------- Using include paths: -I/home/ra010009/a04199/petsc/include -I/home/ra010009/a04199/petsc/arch-fugaku-fujitsu/include ----------------------------------------- Using C linker: mpifccpx Using libraries: -Wl,-rpath,/home/ra010009/a04199/petsc/lib -L/home/ra010009/a04199/petsc/lib -lpetsc 
-Wl,-rpath,/opt/FJSVxos/devkit/aarch64/lib/gcc/aarch64-linux-gnu/8 -L/opt/FJSVxos/devkit/aarch64/lib/gcc/aarch64-linux-gnu/8 -Wl,-rpath,/opt/FJSVxtclanga/tcsds-1.2.29/lib64 -L/opt/FJSVxtclanga/tcsds-1.2.29/lib64 -Wl,-rpath,/opt/FJSVxtclanga/.common/MELI022/lib64 -L/opt/FJSVxtclanga/.common/MELI022/lib64 -Wl,-rpath,/opt/FJSVxos/devkit/aarch64/aarch64-linux-gnu/lib64 -L/opt/FJSVxos/devkit/aarch64/aarch64-linux-gnu/lib64 -Wl,-rpath,/opt/FJSVxos/devkit/aarch64/rfs/usr/lib64 -L/opt/FJSVxos/devkit/aarch64/rfs/usr/lib64 -Wl,-rpath,/opt/FJSVxos/devkit/aarch64/rfs/opt/FJSVxos/mmm/lib64 -L/opt/FJSVxos/devkit/aarch64/rfs/opt/FJSVxos/mmm/lib64 -Wl,-rpath,/opt/FJSVxtclanga/tcsds-1.2.29/lib64/nofjobj -L/opt/FJSVxtclanga/tcsds-1.2.29/lib64/nofjobj -lX11 -lfjprofmpi -lfjlapack -ldl -lmpi_cxx -lmpi -lfjstring_internal -lfj90i -lfj90fmt_sve -lfj90f -lfjsrcinfo -lfjcrt -lfjprofcore -lfjprofomp -lfjc++ -lfjc++abi -lfjdemgl -lmpg -lm -lrt -lpthread -lelf -lz -lgcc_s -ldl -----------------------------------------
And this data puts one cell per process, distributes, and then refines 5 (or 2, 3, 4 in the plots) times.

On Sun, Mar 7, 2021 at 8:27 AM Mark Adams <mfadams@lbl.gov> wrote:
FWIW, here is the output from ex13 on 32K processes (8K Fugaku nodes/sockets, 4 MPI ranks per node, which seems to be the recommended configuration) with a 128^3-vertex mesh (a 64^3 Q2 3D Laplacian). Almost an hour. Attached is the solver scaling plot.
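For concreteness, a minimal sketch of the coarse-mesh-then-refine flow described above. This is not ex13 itself; it assumes a recent PETSc in which DMSetFromOptions() builds the box mesh from the -dm_plex_box_* options seen in the log below, it does the distribution and refinement explicitly (so it would be run without -dm_distribute / -dm_refine), and error handling is abbreviated:

/* Sketch only: distribute a tiny coarse box mesh, then refine in parallel. */
#include <petscdmplex.h>

int main(int argc, char **argv)
{
  DM             dm, dmDist, dmRef;
  PetscInt       r, nrefine = 5;   /* 5 parallel refinements, as in the run above */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = DMCreate(PETSC_COMM_WORLD, &dm);CHKERRQ(ierr);
  ierr = DMSetType(dm, DMPLEX);CHKERRQ(ierr);
  ierr = DMSetFromOptions(dm);CHKERRQ(ierr);                    /* small serial box mesh from -dm_plex_box_* */
  ierr = DMPlexDistribute(dm, 0, NULL, &dmDist);CHKERRQ(ierr);  /* cheap, because the mesh is still tiny */
  if (dmDist) { ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmDist; }
  for (r = 0; r < nrefine; ++r) {                               /* refinement is local work plus a halo exchange */
    ierr = DMRefine(dm, PETSC_COMM_WORLD, &dmRef);CHKERRQ(ierr);
    if (dmRef) { ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmRef; }
  }
  ierr = DMDestroy(&dm);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

The point of this ordering is that DMPlexDistribute() only ever sees the tiny coarse mesh, and all subsequent growth happens in parallel through DMRefine().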
  0 SNES Function norm 3.658334849208e+00
    Linear solve converged due to CONVERGED_RTOL iterations 22
  1 SNES Function norm 1.609000373074e-12
Nonlinear solve converged due to CONVERGED_ITS iterations 1
Linear solve converged due to CONVERGED_RTOL iterations 22
Linear solve converged due to CONVERGED_RTOL iterations 22
Linear solve converged due to CONVERGED_RTOL iterations 22
Linear solve converged due to CONVERGED_RTOL iterations 22
Linear solve converged due to CONVERGED_RTOL iterations 22
Linear solve converged due to CONVERGED_RTOL iterations 22
Linear solve converged due to CONVERGED_RTOL iterations 22
Linear solve converged due to CONVERGED_RTOL iterations 22
Linear solve converged due to CONVERGED_RTOL iterations 22
Linear solve converged due to CONVERGED_RTOL iterations 22
************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
../ex13 on a named i07-4008c with 32768 processors, by a04199 Fri Feb 12 23:27:13 2021 Using Petsc Development GIT revision: v3.14.4-579-g4cb72fa GIT Date: 2021-02-05 15:19:40 +0000
                          Max        Max/Min     Avg        Total
Time (sec):           3.373e+03      1.000   3.373e+03
Objects:              1.055e+05     14.797   7.144e+03
Flop:                 5.376e+10      1.176   4.885e+10   1.601e+15
Flop/sec:             1.594e+07      1.176   1.448e+07   4.745e+11
MPI Messages:         6.048e+05     30.010   8.833e+04   2.894e+09
MPI Message Lengths:  1.127e+09      4.132   6.660e+03   1.928e+13
MPI Reductions:       1.824e+03      1.000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages:   ----- Time ------  ----- Flop ------   --- Messages ---   -- Message Lengths --   -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg        %Total      Count    %Total
 0:      Main Stage: 3.2903e+03  97.5%  2.4753e+14  15.5%  3.538e+08  12.2%  1.779e+04      32.7%   9.870e+02   54.1%
 1:         PCSetUp: 4.3062e+01   1.3%  1.8160e+13   1.1%  1.902e+07   0.7%  3.714e+04       3.7%   1.590e+02    8.7%
 2:  KSP Solve only: 3.9685e+01   1.2%  1.3349e+15  83.4%  2.522e+09  87.1%  4.868e+03      63.7%   6.700e+02   36.7%
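In other words, of the 3.373e+03 s total run time, roughly 3.29e+03 s (97.5%) is spent in the Main Stage, which contains the mesh distribution and setup, while the ten timed solves in the "KSP Solve only" stage take under 40 s combined: essentially all of the "almost an hour" is setup rather than solve.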
------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
PetscBarrier 5 1.0 1.9907e+00 2.2 0.00e+00 0.0 3.8e+06 7.7e+01 2.0e+01 0 0 0 0 1 0 0 1 0 2 0 BuildTwoSided 62 1.0 7.3272e+0214.1 0.00e+00 0.0 6.7e+06 8.0e+00 0.0e+00 5 0 0 0 0 5 0 2 0 0 0 BuildTwoSidedF 59 1.0 3.1132e+01 7.4 0.00e+00 0.0 4.8e+06 2.5e+05 0.0e+00 0 0 0 6 0 0 0 1 19 0 0 SNESSolve 1 1.0 1.7468e+02 1.0 7.83e+09 1.3 3.4e+08 1.3e+04 8.8e+02 5 13 12 23 48 5 85 96 70 89 1205779 SNESSetUp 1 1.0 2.4195e+01 1.0 0.00e+00 0.0 3.7e+06 3.7e+05 1.3e+01 1 0 0 7 1 1 0 1 22 1 0 SNESFunctionEval 3 1.0 1.1359e+01 1.2 1.17e+09 1.0 1.6e+06 1.4e+04 2.0e+00 0 2 0 0 0 0 15 0 0 0 3344744 SNESJacobianEval 2 1.0 1.6829e+02 1.0 1.52e+09 1.0 1.1e+06 8.3e+05 0.0e+00 5 3 0 5 0 5 20 0 14 0 293588 DMCreateMat 1 1.0 2.4107e+01 1.0 0.00e+00 0.0 3.7e+06 3.7e+05 1.3e+01 1 0 0 7 1 1 0 1 22 1 0 Mesh Partition 1 1.0 5.0133e+02 1.0 0.00e+00 0.0 1.3e+05 2.7e+02 6.0e+00 15 0 0 0 0 15 0 0 0 1 0 Mesh Migration 1 1.0 1.5494e+03 1.0 0.00e+00 0.0 7.3e+05 1.9e+02 2.4e+01 45 0 0 0 1 46 0 0 0 2 0 DMPlexPartSelf 1 1.0 1.1498e+002367.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexPartLblInv 1 1.0 3.6698e+00 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexPartLblSF 1 1.0 2.8522e-01 1.7 0.00e+00 0.0 4.9e+04 1.5e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexPartStrtSF 1 1.0 4.9474e+023520.8 0.00e+00 0.0 3.3e+04 4.3e+02 0.0e+00 14 0 0 0 0 15 0 0 0 0 0 DMPlexPointSF 1 1.0 9.8750e+021264.8 0.00e+00 0.0 6.6e+04 5.4e+02 0.0e+00 28 0 0 0 0 29 0 0 0 0 0 DMPlexInterp 84 1.0 4.3219e-0158.6 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 0 0 0 0 1 0 DMPlexDistribute 1 1.0 3.0000e+03 1.5 0.00e+00 0.0 9.3e+05 2.3e+02 3.0e+01 88 0 0 0 2 90 0 0 0 3 0 DMPlexDistCones 1 1.0 1.0688e+03 2.6 0.00e+00 0.0 1.8e+05 3.1e+02 1.0e+00 31 0 0 0 0 31 0 0 0 0 0 DMPlexDistLabels 1 1.0 2.9172e+02 1.0 0.00e+00 0.0 3.1e+05 1.9e+02 2.1e+01 9 0 0 0 1 9 0 0 0 2 0 DMPlexDistField 1 1.0 1.8688e+02 1.2 0.00e+00 0.0 2.1e+05 9.3e+01 1.0e+00 5 0 0 0 0 5 0 0 0 0 0 DMPlexStratify 118 1.0 6.2852e+023280.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.6e+01 1 0 0 0 1 1 0 0 0 2 0 DMPlexSymmetrize 118 1.0 6.7634e-02 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexPrealloc 1 1.0 2.3741e+01 1.0 0.00e+00 0.0 3.7e+06 3.7e+05 1.1e+01 1 0 0 7 1 1 0 1 22 1 0 DMPlexResidualFE 3 1.0 1.0634e+01 1.2 1.16e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 15 0 0 0 3569848 DMPlexJacobianFE 2 1.0 1.6809e+02 1.0 1.51e+09 1.0 6.5e+05 1.4e+06 0.0e+00 5 3 0 5 0 5 20 0 14 0 293801 SFSetGraph 87 1.0 2.7673e-03 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetUp 62 1.0 7.3283e+0213.6 0.00e+00 0.0 2.0e+07 2.7e+04 0.0e+00 5 0 1 3 0 5 0 6 9 0 0 SFBcastOpBegin 107 1.0 1.5770e+00452.5 0.00e+00 0.0 2.1e+07 1.8e+04 0.0e+00 0 0 1 2 0 0 0 6 6 0 0 SFBcastOpEnd 107 1.0 2.9430e+03 4.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 80 0 0 0 0 82 0 0 0 0 0 SFReduceBegin 12 1.0 2.4825e-01172.8 0.00e+00 0.0 2.4e+06 2.0e+05 0.0e+00 0 0 0 2 0 0 0 1 8 0 0 SFReduceEnd 12 1.0 3.8286e+014865.8 3.74e+04 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 31 SFFetchOpBegin 2 1.0 2.4497e-0390.2 0.00e+00 0.0 4.3e+05 3.5e+05 0.0e+00 0 0 0 1 0 0 0 0 2 0 0 SFFetchOpEnd 2 1.0 6.1349e-0210.9 0.00e+00 0.0 4.3e+05 3.5e+05 0.0e+00 0 0 0 1 0 0 0 0 2 0 0 SFCreateEmbed 3 1.0 3.6800e+013261.5 0.00e+00 0.0 4.7e+05 1.7e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFDistSection 9 1.0 4.4325e+02 1.5 0.00e+00 0.0 2.8e+06 1.1e+04 9.0e+00 11 0 0 0 0 11 0 1 1 1 0 SFSectionSF 11 1.0 2.3898e+02 4.7 0.00e+00 0.0 9.2e+05 1.7e+05 0.0e+00 5 0 0 1 0 5 0 0 2 0 0 SFRemoteOff 2 1.0 3.2868e-0143.1 0.00e+00 0.0 8.7e+05 8.2e+03 
0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFPack 1023 1.0 2.5215e-0176.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 1025 1.0 5.1600e-0216.8 5.62e+0521.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 54693 MatMult 1549525.4 3.4810e+00 1.3 4.35e+09 1.1 2.2e+08 6.1e+03 0.0e+00 0 8 8 7 0 0 54 62 21 0 38319208 MatMultAdd 132 1.0 6.9168e-01 3.0 7.97e+07 1.2 2.8e+07 4.6e+02 0.0e+00 0 0 1 0 0 0 1 8 0 0 3478717 MatMultTranspose 132 1.0 5.9967e-01 1.6 8.00e+07 1.2 3.0e+07 4.5e+02 0.0e+00 0 0 1 0 0 0 1 9 0 0 4015214 MatSolve 22 0.0 6.8431e-04 0.0 7.41e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1082 MatLUFactorSym 1 1.0 5.9569e-0433.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 1 1.0 1.6236e-03773.2 1.46e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 897 MatConvert 6 1.0 1.4290e-01 1.2 0.00e+00 0.0 3.0e+06 3.7e+03 0.0e+00 0 0 0 0 0 0 0 1 0 0 0 MatScale 18 1.0 3.7962e-01 1.3 4.11e+07 1.2 2.0e+06 5.5e+03 0.0e+00 0 0 0 0 0 0 0 1 0 0 3253392 MatResidual 132 1.0 6.8256e-01 1.4 8.27e+08 1.2 4.4e+07 5.5e+03 0.0e+00 0 2 2 1 0 0 10 13 4 0 36282014 MatAssemblyBegin 244 1.0 3.1181e+01 6.6 0.00e+00 0.0 4.8e+06 2.5e+05 0.0e+00 0 0 0 6 0 0 0 1 19 0 0 MatAssemblyEnd 244 1.0 6.3232e+00 1.9 3.17e+06 6.9 0.0e+00 0.0e+00 1.4e+02 0 0 0 0 8 0 0 0 0 15 7655 MatGetRowIJ 1 0.0 2.5780e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCreateSubMat 10 1.0 1.5162e+00 1.0 0.00e+00 0.0 1.6e+05 3.4e+05 1.3e+02 0 0 0 0 7 0 0 0 1 13 0 MatGetOrdering 1 0.0 1.0899e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCoarsen 6 1.0 3.5837e-01 1.3 0.00e+00 0.0 1.6e+07 1.2e+04 3.9e+01 0 0 1 1 2 0 0 5 3 4 0 MatZeroEntries 8 1.0 5.3730e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAXPY 6 1.0 2.6245e-01 1.1 2.66e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 33035 MatTranspose 12 1.0 3.0731e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMatMultSym 18 1.0 2.1398e+00 1.4 0.00e+00 0.0 6.1e+06 5.5e+03 4.8e+01 0 0 0 0 3 0 0 2 1 5 0 MatMatMultNum 6 1.0 1.1243e+00 1.0 3.76e+07 1.2 2.0e+06 5.5e+03 0.0e+00 0 0 0 0 0 0 0 1 0 0 1001203 MatPtAPSymbolic 6 1.0 1.7280e+01 1.0 0.00e+00 0.0 1.2e+07 3.2e+04 4.2e+01 1 0 0 2 2 1 0 3 6 4 0 MatPtAPNumeric 6 1.0 1.8047e+01 1.0 1.49e+09 5.1 2.8e+06 1.1e+05 2.4e+01 1 1 0 2 1 1 5 1 5 2 663675 MatTrnMatMultSym 1 1.0 3.0221e+01 1.0 0.00e+00 0.0 2.4e+06 5.8e+05 1.1e+01 1 0 0 7 1 1 0 1 22 1 0 MatGetLocalMat 19 1.0 1.3904e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetBrAoCol 18 1.0 1.9926e-01 5.0 0.00e+00 0.0 1.4e+07 2.3e+04 0.0e+00 0 0 0 2 0 0 0 4 5 0 0 MatGetSymTrans 2 1.0 1.8996e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecTDot 176 1.0 7.0632e-01 4.5 3.48e+07 1.0 0.0e+00 0.0e+00 1.8e+02 0 0 0 0 10 0 0 0 0 18 1608728 VecNorm 60 1.0 1.4074e+0012.2 1.58e+07 1.0 0.0e+00 0.0e+00 6.0e+01 0 0 0 0 3 0 0 0 0 6 366467 VecCopy 422 1.0 5.1259e-02 3.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 653 1.0 2.3974e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 165 1.0 6.5622e-03 1.3 3.42e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 170485467 VecAYPX 861 1.0 7.8529e-02 1.2 6.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 25785252 VecAXPBYCZ 264 1.0 4.1343e-02 1.5 5.85e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 46135592 VecAssemblyBegin 21 1.0 2.3463e-01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 21 1.0 1.4457e-04 1.6 0.00e+00 0.0 0.0e+00 
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 600 1.0 5.7510e-02 1.2 2.66e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 15075754 VecScatterBegin 902 1.0 5.1188e-01 1.2 0.00e+00 0.0 2.9e+08 5.3e+03 0.0e+00 0 0 10 8 0 0 0 82 25 0 0 VecScatterEnd 902 1.0 1.2143e+00 3.2 5.50e+0537.9 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1347 VecSetRandom 6 1.0 2.6354e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DualSpaceSetUp 7 1.0 5.3467e-0112.0 4.26e+03 1.0 0.0e+00 0.0e+00 1.3e+01 0 0 0 0 1 0 0 0 0 1 261 FESetUp 7 1.0 1.7541e-01128.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 15 1.0 2.7470e-01 1.1 2.04e+08 1.2 1.0e+07 5.5e+03 1.3e+02 0 0 0 0 7 0 2 3 1 13 22477233 KSPSolve 1 1.0 4.3257e+00 1.0 4.33e+09 1.1 2.5e+08 4.8e+03 6.6e+01 0 8 9 6 4 0 54 72 20 7 30855976 PCGAMGGraph_AGG 6 1.0 5.0969e+00 1.0 3.76e+07 1.2 5.1e+06 4.4e+03 4.8e+01 0 0 0 0 3 0 0 1 0 5 220852 PCGAMGCoarse_AGG 6 1.0 3.1121e+01 1.0 0.00e+00 0.0 2.5e+07 6.9e+04 5.5e+01 1 0 1 9 3 1 0 7 27 6 0 PCGAMGProl_AGG 6 1.0 5.8196e-01 1.0 0.00e+00 0.0 6.6e+06 9.3e+03 7.2e+01 0 0 0 0 4 0 0 2 1 7 0 PCGAMGPOpt_AGG 6 1.0 3.2414e+00 1.0 2.42e+08 1.2 2.1e+07 5.3e+03 1.6e+02 0 0 1 1 9 0 3 6 2 17 2256493 GAMG: createProl 6 1.0 4.0042e+01 1.0 2.80e+08 1.2 5.8e+07 3.3e+04 3.4e+02 1 1 2 10 19 1 3 16 31 34 210778 Graph 12 1.0 5.0926e+00 1.0 3.76e+07 1.2 5.1e+06 4.4e+03 4.8e+01 0 0 0 0 3 0 0 1 0 5 221038 MIS/Agg 6 1.0 3.5850e-01 1.3 0.00e+00 0.0 1.6e+07 1.2e+04 3.9e+01 0 0 1 1 2 0 0 5 3 4 0 SA: col data 6 1.0 3.0509e-01 1.0 0.00e+00 0.0 5.4e+06 9.2e+03 2.4e+01 0 0 0 0 1 0 0 2 1 2 0 SA: frmProl0 6 1.0 2.3467e-01 1.1 0.00e+00 0.0 1.3e+06 9.5e+03 2.4e+01 0 0 0 0 1 0 0 0 0 2 0 SA: smooth 6 1.0 2.7855e+00 1.0 4.14e+07 1.2 8.1e+06 5.5e+03 6.3e+01 0 0 0 0 3 0 1 2 1 6 446491 GAMG: partLevel 6 1.0 3.7266e+01 1.0 1.49e+09 5.1 1.5e+07 4.9e+04 3.2e+02 1 1 1 4 17 1 5 4 12 32 321395 repartition 5 1.0 2.0343e+00 1.1 0.00e+00 0.0 4.0e+05 1.4e+05 2.5e+02 0 0 0 0 14 0 0 0 1 25 0 Invert-Sort 5 1.0 1.5021e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+01 0 0 0 0 2 0 0 0 0 3 0 Move A 5 1.0 1.1548e+00 1.0 0.00e+00 0.0 1.6e+05 3.4e+05 7.0e+01 0 0 0 0 4 0 0 0 1 7 0 Move P 5 1.0 4.2799e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 7.5e+01 0 0 0 0 4 0 0 0 0 8 0 PCGAMG Squ l00 1 1.0 3.0221e+01 1.0 0.00e+00 0.0 2.4e+06 5.8e+05 1.1e+01 1 0 0 7 1 1 0 1 22 1 0 PCGAMG Gal l00 1 1.0 8.7411e+00 1.0 2.93e+08 1.1 5.4e+06 4.5e+04 1.2e+01 0 1 0 1 1 0 4 2 4 1 1092355 PCGAMG Opt l00 1 1.0 1.9734e+00 1.0 3.36e+07 1.1 3.2e+06 1.2e+04 9.0e+00 0 0 0 0 0 0 0 1 1 1 555327 PCGAMG Gal l01 1 1.0 1.0153e+00 1.0 3.50e+07 1.4 5.9e+06 3.9e+04 1.2e+01 0 0 0 1 1 0 0 2 4 1 1079887 PCGAMG Opt l01 1 1.0 7.4812e-02 1.0 5.35e+05 1.2 3.2e+06 1.1e+03 9.0e+00 0 0 0 0 0 0 0 1 0 1 232542 PCGAMG Gal l02 1 1.0 1.8063e+00 1.0 7.43e+07 0.0 3.0e+06 5.9e+04 1.2e+01 0 0 0 1 1 0 0 1 3 1 593392 PCGAMG Opt l02 1 1.0 1.1580e-01 1.1 6.93e+05 0.0 1.6e+06 1.3e+03 9.0e+00 0 0 0 0 0 0 0 0 0 1 93213 PCGAMG Gal l03 1 1.0 6.1075e+00 1.0 2.72e+08 0.0 2.6e+05 9.2e+04 1.1e+01 0 0 0 0 1 0 0 0 0 1 36155 PCGAMG Opt l03 1 1.0 8.0836e-02 1.0 1.55e+06 0.0 1.4e+05 1.4e+03 8.0e+00 0 0 0 0 0 0 0 0 0 1 18229 PCGAMG Gal l04 1 1.0 1.6203e+01 1.0 9.44e+08 0.0 1.4e+04 3.0e+05 1.1e+01 0 0 0 0 1 0 0 0 0 1 2366 PCGAMG Opt l04 1 1.0 1.2663e-01 1.0 2.01e+06 0.0 6.9e+03 2.2e+03 8.0e+00 0 0 0 0 0 0 0 0 0 1 817 PCGAMG Gal l05 1 1.0 1.4800e+00 1.0 3.16e+08 0.0 9.0e+01 1.6e+05 1.1e+01 0 0 0 0 1 0 0 0 0 1 796 PCGAMG Opt l05 1 1.0 8.1763e-02 1.1 2.50e+06 0.0 4.8e+01 4.6e+03 8.0e+00 0 0 0 0 0 0 0 0 0 1 114 PCSetUp 2 1.0 
7.7969e+01 1.0 1.97e+09 2.8 8.3e+07 3.3e+04 8.1e+02 2 2 3 14 44 2 11 23 43 82 341051 PCSetUpOnBlocks 22 1.0 2.4609e-0317.2 1.46e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 592 PCApply 22 1.0 3.6455e+00 1.1 3.57e+09 1.2 2.4e+08 4.3e+03 0.0e+00 0 7 8 5 0 0 43 67 16 0 29434967
--- Event Stage 1: PCSetUp
BuildTwoSided 4 1.0 1.5980e-01 2.7 0.00e+00 0.0 2.1e+05 8.0e+00 0.0e+00 0 0 0 0 0 0 0 1 0 0 0 BuildTwoSidedF 6 1.0 1.3169e+01 5.5 0.00e+00 0.0 1.9e+06 1.9e+05 0.0e+00 0 0 0 2 0 28 0 10 51 0 0 SFSetGraph 5 1.0 4.9640e-0519.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetUp 4 1.0 1.6038e-01 2.3 0.00e+00 0.0 6.4e+05 9.1e+02 0.0e+00 0 0 0 0 0 0 0 3 0 0 0 SFPack 30 1.0 3.3376e-04 4.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 30 1.0 1.2101e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 30 1.0 1.5544e-01 1.5 1.87e+08 1.2 1.0e+07 5.5e+03 0.0e+00 0 0 0 0 0 0 31 53 8 0 35930640 MatAssemblyBegin 43 1.0 1.3201e+01 4.7 0.00e+00 0.0 1.9e+06 1.9e+05 0.0e+00 0 0 0 2 0 28 0 10 51 0 0 MatAssemblyEnd 43 1.0 1.1159e+01 1.0 2.77e+07705.7 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 1 26 0 0 0 13 1036 MatZeroEntries 6 1.0 4.7315e-0410.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatTranspose 12 1.0 2.5142e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMatMultSym 10 1.0 5.8783e-0117.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatPtAPSymbolic 5 1.0 1.4489e+01 1.0 0.00e+00 0.0 6.2e+06 3.6e+04 3.5e+01 0 0 0 1 2 34 0 32 31 22 0 MatPtAPNumeric 6 1.0 2.8457e+01 1.0 1.50e+09 5.1 2.7e+06 1.6e+05 2.0e+01 1 1 0 2 1 66 66 14 61 13 421190 MatGetLocalMat 6 1.0 9.8574e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetBrAoCol 6 1.0 3.7669e-01 2.3 0.00e+00 0.0 5.1e+06 3.8e+04 0.0e+00 0 0 0 1 0 0 0 27 28 0 0 VecTDot 66 1.0 6.5271e-02 4.1 5.85e+06 1.0 0.0e+00 0.0e+00 6.6e+01 0 0 0 0 4 0 1 0 0 42 2922260 VecNorm 36 1.0 1.1226e-02 3.2 3.19e+06 1.0 0.0e+00 0.0e+00 3.6e+01 0 0 0 0 2 0 1 0 0 23 9268067 VecCopy 12 1.0 1.2805e-03 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 11 1.0 6.6620e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 60 1.0 1.0763e-03 1.5 5.32e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 161104914 VecAYPX 24 1.0 2.0581e-03 1.3 2.13e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 33701038 VecPointwiseMult 36 1.0 3.5709e-03 1.3 1.60e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14567861 VecScatterBegin 30 1.0 2.9079e-03 7.8 0.00e+00 0.0 1.0e+07 5.5e+03 0.0e+00 0 0 0 0 0 0 0 53 8 0 0 VecScatterEnd 30 1.0 3.7015e-0263.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 7 1.0 2.3165e-01 1.0 2.04e+08 1.2 1.0e+07 5.5e+03 1.0e+02 0 0 0 0 6 1 34 53 8 64 26654598 PCGAMG Gal l00 1 1.0 4.7415e+00 1.0 2.94e+08 1.1 1.8e+06 7.8e+04 0.0e+00 0 1 0 1 0 11 53 9 20 0 2015623 PCGAMG Gal l01 1 1.0 1.2103e+00 1.0 3.50e+07 1.4 4.8e+06 6.2e+04 1.2e+01 0 0 0 2 1 3 6 25 41 8 905938 PCGAMG Gal l02 1 1.0 3.4334e+00 1.0 7.41e+07 0.0 2.2e+06 8.7e+04 1.2e+01 0 0 0 1 1 8 6 11 27 8 312184 PCGAMG Gal l03 1 1.0 9.6062e+00 1.0 2.71e+08 0.0 1.9e+05 1.3e+05 1.1e+01 0 0 0 0 1 22 1 1 4 7 22987 PCGAMG Gal l04 1 1.0 2.2482e+01 1.0 9.43e+08 0.0 8.7e+03 4.8e+05 1.1e+01 1 0 0 0 1 52 0 0 1 7 1705 PCGAMG Gal l05 1 1.0 1.5961e+00 1.1 3.16e+08 0.0 6.8e+01 2.2e+05 1.1e+01 0 0 0 0 1 4 0 0 0 7 738 PCSetUp 1 1.0 4.3191e+01 1.0 1.70e+09 3.6 1.9e+07 3.7e+04 1.6e+02 1 1 1 4 9 100100100100100 420463
--- Event Stage 2: KSP Solve only
SFPack 8140 1.0 7.4247e-02 4.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 8140 1.0 1.2905e-02 5.2 5.50e+0637.9 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1267207 MatMult 5500 1.0 2.9994e+01 1.2 3.98e+10 1.1 2.0e+09 6.1e+03 0.0e+00 1 76 68 62 0 70 92 78 98 0 40747181 MatMultAdd 1320 1.0 6.2192e+00 2.7 7.97e+08 1.2 2.8e+08 4.6e+02 0.0e+00 0 2 10 1 0 14 2 11 1 0 3868976 MatMultTranspose 1320 1.0 4.0304e+00 1.7 8.00e+08 1.2 2.8e+08 4.6e+02 0.0e+00 0 2 10 1 0 7 2 11 1 0 5974153 MatSolve 220 0.0 6.7366e-03 0.0 7.41e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1100 MatLUFactorSym 1 1.0 5.8691e-0435.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 1 1.0 1.5955e-03756.2 1.46e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 913 MatResidual 1320 1.0 6.4920e+00 1.3 8.27e+09 1.2 4.4e+08 5.5e+03 0.0e+00 0 15 15 13 0 14 19 18 20 0 38146350 MatGetRowIJ 1 0.0 2.7820e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 0.0 9.6940e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecTDot 440 1.0 4.6162e+00 6.9 2.31e+08 1.0 0.0e+00 0.0e+00 4.4e+02 0 0 0 0 24 5 1 0 0 66 1635124 VecNorm 230 1.0 3.9605e-02 1.6 1.21e+08 1.0 0.0e+00 0.0e+00 2.3e+02 0 0 0 0 13 0 0 0 0 34 99622387 VecCopy 3980 1.0 5.4166e-01 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 4640 1.0 1.4216e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 440 1.0 4.2829e-02 1.3 2.31e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 176236363 VecAYPX 8130 1.0 7.3998e-01 1.2 5.78e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 2 1 0 0 0 25489392 VecAXPBYCZ 2640 1.0 3.9974e-01 1.5 5.85e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 1 0 0 0 47716315 VecPointwiseMult 5280 1.0 5.9845e-01 1.5 2.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 1 0 0 0 12748927 VecScatterBegin 8140 1.0 4.9231e-01 5.9 0.00e+00 0.0 2.5e+09 4.9e+03 0.0e+00 0 0 87 64 0 1 0100100 0 0 VecScatterEnd 8140 1.0 1.0172e+01 3.6 5.50e+0637.9 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 13 0 0 0 0 1608 KSPSetUp 1 1.0 9.5996e-07 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 10 1.0 3.9685e+01 1.0 4.33e+10 1.1 2.5e+09 4.9e+03 6.7e+02 1 83 87 64 37 100100100100100 33637495 PCSetUp 1 1.0 2.4149e-0318.1 1.46e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 603 PCSetUpOnBlocks 220 1.0 2.6945e-03 8.9 1.46e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 540 PCApply 220 1.0 3.2921e+01 1.1 3.57e+10 1.2 2.3e+09 4.3e+03 0.0e+00 1 67 81 53 0 81 80 93 82 0 32595360
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0.
--- Event Stage 0: Main Stage
Container 112 112 69888 0. SNES 1 1 1532 0. DMSNES 1 1 720 0. Distributed Mesh 449 449 30060888 0. DM Label 790 790 549840 0. Quadrature 579 579 379824 0. Index Set 100215 100210 361926232 0. IS L to G Mapping 8 13 4356552 0. Section 771 771 598296 0. Star Forest Graph 897 897 1053640 0. Discrete System 521 521 533512 0. GraphPartitioner 118 118 91568 0. Matrix 432 462 2441805304 0. Matrix Coarsen 6 6 4032 0. Vector 354 354 65492968 0. Linear Space 7 7 5208 0. Dual Space 111 111 113664 0. FE Space 7 7 5992 0. Field over DM 6 6 4560 0. Krylov Solver 21 21 37560 0. DMKSP interface 1 1 704 0. Preconditioner 21 21 21632 0. Viewer 2 1 896 0. PetscRandom 12 12 8520 0.
--- Event Stage 1: PCSetUp
Index Set 10 15 85367336 0. IS L to G Mapping 5 0 0 0. Star Forest Graph 5 5 6600 0. Matrix 50 20 73134024 0. Vector 28 28 6235096 0.
--- Event Stage 2: KSP Solve only
Index Set 5 5 8296 0. Matrix 1 1 273856 0.
======================================================================================================================== Average time to get PetscTime(): 6.40051e-08 Average time for MPI_Barrier(): 8.506e-06 Average time for zero size MPI_Send(): 6.6027e-06 #PETSc Option Table entries: -benchmark_it 10 -dm_distribute -dm_plex_box_dim 3 -dm_plex_box_faces 32,32,32 -dm_plex_box_lower 0,0,0 -dm_plex_box_simplex 0 -dm_plex_box_upper 1,1,1 -dm_refine 5 -ksp_converged_reason -ksp_max_it 150 -ksp_norm_type unpreconditioned -ksp_rtol 1.e-12 -ksp_type cg -log_view -matptap_via scalable -mg_levels_esteig_ksp_max_it 5 -mg_levels_esteig_ksp_type cg -mg_levels_ksp_max_it 2 -mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 2000 -pc_gamg_coarse_grid_layout_type spread -pc_gamg_esteig_ksp_max_it 5 -pc_gamg_esteig_ksp_type cg -pc_gamg_process_eq_limit 500 -pc_gamg_repartition false -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 -pc_gamg_threshold 0.01 -pc_gamg_threshold_scale .5 -pc_gamg_type agg -pc_type gamg -petscpartitioner_simple_node_grid 8,8,8 -petscpartitioner_simple_process_grid 4,4,4 -petscpartitioner_type simple -potential_petscspace_degree 2 -snes_converged_reason -snes_max_it 1 -snes_monitor -snes_rtol 1.e-8 -snes_type ksponly #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with 64 bit PetscInt Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8 Configure options: CC=mpifccpx CXX=mpiFCCpx CFLAGS="-L /opt/FJSVxtclanga/tcsds-1.2.29/lib64 -lfjlapack" CXXFLAGS="-L /opt/FJSVxtclanga/tcsds-1.2.29/lib64 -lfjlapack" COPTFLAGS=-Kfast CXXOPTFLAGS=-Kfast --with-fc=0 --package-prefix-hash=/home/ra010009/a04199/petsc-hash-pkgs --with-batch=1 --with-shared-libraries=yes --with-debugging=no --with-64-bit-indices=1 PETSC_ARCH=arch-fugaku-fujitsu ----------------------------------------- Libraries compiled on 2021-02-12 02:27:41 on fn01sv08 Machine characteristics: Linux-3.10.0-957.27.2.el7.x86_64-x86_64-with-redhat-7.6-Maipo Using PETSc directory: /home/ra010009/a04199/petsc Using PETSc arch: -----------------------------------------
Using C compiler: mpifccpx -L /opt/FJSVxtclanga/tcsds-1.2.29/lib64 -lfjlapack -fPIC -Kfast -----------------------------------------
Using include paths: -I/home/ra010009/a04199/petsc/include -I/home/ra010009/a04199/petsc/arch-fugaku-fujitsu/include -----------------------------------------
Using C linker: mpifccpx Using libraries: -Wl,-rpath,/home/ra010009/a04199/petsc/lib -L/home/ra010009/a04199/petsc/lib -lpetsc -Wl,-rpath,/opt/FJSVxos/devkit/aarch64/lib/gcc/aarch64-linux-gnu/8 -L/opt/FJSVxos/devkit/aarch64/lib/gcc/aarch64-linux-gnu/8 -Wl,-rpath,/opt/FJSVxtclanga/tcsds-1.2.29/lib64 -L/opt/FJSVxtclanga/tcsds-1.2.29/lib64 -Wl,-rpath,/opt/FJSVxtclanga/.common/MELI022/lib64 -L/opt/FJSVxtclanga/.common/MELI022/lib64 -Wl,-rpath,/opt/FJSVxos/devkit/aarch64/aarch64-linux-gnu/lib64 -L/opt/FJSVxos/devkit/aarch64/aarch64-linux-gnu/lib64 -Wl,-rpath,/opt/FJSVxos/devkit/aarch64/rfs/usr/lib64 -L/opt/FJSVxos/devkit/aarch64/rfs/usr/lib64 -Wl,-rpath,/opt/FJSVxos/devkit/aarch64/rfs/opt/FJSVxos/mmm/lib64 -L/opt/FJSVxos/devkit/aarch64/rfs/opt/FJSVxos/mmm/lib64 -Wl,-rpath,/opt/FJSVxtclanga/tcsds-1.2.29/lib64/nofjobj -L/opt/FJSVxtclanga/tcsds-1.2.29/lib64/nofjobj -lX11 -lfjprofmpi -lfjlapack -ldl -lmpi_cxx -lmpi -lfjstring_internal -lfj90i -lfj90fmt_sve -lfj90f -lfjsrcinfo -lfjcrt -lfjprofcore -lfjprofomp -lfjc++ -lfjc++abi -lfjdemgl -lmpg -lm -lrt -lpthread -lelf -lz -lgcc_s -ldl -----------------------------------------
Mark,

   Thanks for the numbers.

   Extremely problematic. DMPlexDistribute takes 88 percent of the total run time, and SFBcastOpEnd takes 80 percent.

   Probably Matt is right: PetscSF is flooding the network, which it cannot handle. IMHO fixing PetscSF would be a far better route than writing all kinds of fancy DMPLEX hierarchical distributors. PetscSF needs to detect that it is sending too many messages at once and do the messaging in appropriate waves; at the moment PetscSF is as dumb as a stone, it just shoves everything out as fast as it can. Junchao needs access to this machine. If everything in PETSc is going to depend on PetscSF, then it simply has to scale on systems where you cannot just flood the network with MPI.

   Barry

Mesh Partition         1 1.0 5.0133e+02 1.0    0.00e+00 0.0 1.3e+05 2.7e+02 6.0e+00 15  0  0  0  0  15  0  0  0  1     0
Mesh Migration         1 1.0 1.5494e+03 1.0    0.00e+00 0.0 7.3e+05 1.9e+02 2.4e+01 45  0  0  0  1  46  0  0  0  2     0
DMPlexPartStrtSF       1 1.0 4.9474e+02 3520.8 0.00e+00 0.0 3.3e+04 4.3e+02 0.0e+00 14  0  0  0  0  15  0  0  0  0     0
DMPlexPointSF          1 1.0 9.8750e+02 1264.8 0.00e+00 0.0 6.6e+04 5.4e+02 0.0e+00 28  0  0  0  0  29  0  0  0  0     0
DMPlexDistribute       1 1.0 3.0000e+03 1.5    0.00e+00 0.0 9.3e+05 2.3e+02 3.0e+01 88  0  0  0  2  90  0  0  0  3     0
DMPlexDistCones        1 1.0 1.0688e+03 2.6    0.00e+00 0.0 1.8e+05 3.1e+02 1.0e+00 31  0  0  0  0  31  0  0  0  0     0
DMPlexDistLabels       1 1.0 2.9172e+02 1.0    0.00e+00 0.0 3.1e+05 1.9e+02 2.1e+01  9  0  0  0  1   9  0  0  0  2     0
DMPlexDistField        1 1.0 1.8688e+02 1.2    0.00e+00 0.0 2.1e+05 9.3e+01 1.0e+00  5  0  0  0  0   5  0  0  0  0     0
SFSetUp               62 1.0 7.3283e+02 13.6   0.00e+00 0.0 2.0e+07 2.7e+04 0.0e+00  5  0  1  3  0   5  0  6  9  0     0
SFBcastOpBegin       107 1.0 1.5770e+00 452.5  0.00e+00 0.0 2.1e+07 1.8e+04 0.0e+00  0  0  1  2  0   0  0  6  6  0     0
SFBcastOpEnd         107 1.0 2.9430e+03 4.8    0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 80  0  0  0  0  82  0  0  0  0     0
SFDistSection          9 1.0 4.4325e+02 1.5    0.00e+00 0.0 2.8e+06 1.1e+04 9.0e+00 11  0  0  0  0  11  0  1  1  1     0
SFSectionSF           11 1.0 2.3898e+02 4.7    0.00e+00 0.0 9.2e+05 1.7e+05 0.0e+00  5  0  0  1  0   5  0  0  2  0     0
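To make the "appropriate waves" idea concrete, here is a rough, hypothetical sketch of throttled point-to-point traffic. It is emphatically not PetscSF's current code (the point above is that PetscSF does not do this today); the wave size of 64 and the function name are made up for illustration, and the matching receives are elided:

#include <mpi.h>

#define WAVE 64  /* arbitrary illustration: max outstanding sends per wave */

/* Send buf[i] (len[i] bytes) to rank dest[i] for i = 0..n-1, but only WAVE
 * messages at a time, draining each wave before posting the next.
 * Matching receives on the destination ranks are elided for brevity. */
void send_in_waves(int n, char **buf, const int *len, const int *dest, MPI_Comm comm)
{
  MPI_Request req[WAVE];
  for (int start = 0; start < n; start += WAVE) {
    int batch = (n - start < WAVE) ? (n - start) : WAVE;
    for (int i = 0; i < batch; ++i)
      MPI_Isend(buf[start + i], len[start + i], MPI_BYTE, dest[start + i], 0, comm, &req[i]);
    MPI_Waitall(batch, req, MPI_STATUSES_IGNORE);  /* throttle: finish this wave first */
  }
}

A real implementation would presumably overlap the receive side and choose the wave size from what the network can absorb rather than a hard-coded constant.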
Mark,

Since this looks like an MPI issue, you should run with -log_sync. From your log, the problem seems to be SFSetUp, which is called many times (62), with the timings mostly associated with the SF "reveal ranks" phase. DMPlex makes heavy use of the embedded SF, which I presume can be optimized further. It should run a cheaper operation (someone has to write the code), since the communication graph of the embedded SF is a subgraph of the original.
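For example, re-running the same case with the flag added next to -log_view in the option list above (everything else unchanged):

   -log_view -log_sync

-log_sync adds a barrier at the start of each logged event, so time spent waiting on other ranks is reported separately from the communication itself, and the huge max/min ratios in events like SFBcastOpEnd should become easier to attribute.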
On Mar 7, 2021, at 10:01 PM, Barry Smith <bsmith@petsc.dev> wrote:
On Mar 7, 2021, at 7:35 AM, Mark Adams <mfadams@lbl.gov> wrote:
And this data puts one cell per process, distributes, and then refines 5 times (the plot also shows 2, 3, and 4 refinements).
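For orientation, the distribute-then-refine sequence being timed corresponds roughly to the sketch below; in the actual runs these steps are driven by options from the log (-dm_distribute, -dm_refine 5, and the -dm_plex_box_* options) rather than by explicit calls, so this is an approximation of the setup path, not ex13's source.

#include <petscdmplex.h>

/* Sketch: create a small coarse mesh, distribute it (the step profiled as
   DMPlexDistribute), then refine it regularly in parallel. */
static PetscErrorCode CoarseDistributeRefine(MPI_Comm comm, PetscInt nrefine, DM *mesh)
{
  PetscErrorCode ierr;
  DM             dm, dmDist, dmRef;
  PetscInt       r;

  PetscFunctionBeginUser;
  ierr = DMCreate(comm, &dm);CHKERRQ(ierr);
  ierr = DMSetType(dm, DMPLEX);CHKERRQ(ierr);
  ierr = DMSetFromOptions(dm);CHKERRQ(ierr);                    /* coarse mesh shape comes from options */
  ierr = DMPlexDistribute(dm, 0, NULL, &dmDist);CHKERRQ(ierr);  /* one-to-all distribution of the coarse mesh */
  if (dmDist) { ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmDist; }
  for (r = 0; r < nrefine; r++) {                               /* regular refinement in parallel */
    ierr = DMRefine(dm, comm, &dmRef);CHKERRQ(ierr);
    ierr = DMDestroy(&dm);CHKERRQ(ierr);
    dm   = dmRef;
  }
  *mesh = dm;
  PetscFunctionReturn(0);
}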
On Sun, Mar 7, 2021 at 8:27 AM Mark Adams <mfadams@lbl.gov> wrote:
FWIW, here is the output from ex13 on 32K processes (8K Fugaku nodes/sockets, 4 MPI ranks per node, which seems to be the recommended configuration) with a 128^3 vertex mesh (64^3 Q2 3D Laplacian). Almost an hour. Attached is solver scaling.
0 SNES Function norm 3.658334849208e+00 Linear solve converged due to CONVERGED_RTOL iterations 22 1 SNES Function norm 1.609000373074e-12 Nonlinear solve converged due to CONVERGED_ITS iterations 1 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
../ex13 on a named i07-4008c with 32768 processors, by a04199 Fri Feb 12 23:27:13 2021 Using Petsc Development GIT revision: v3.14.4-579-g4cb72fa GIT Date: 2021-02-05 15:19:40 +0000
Max Max/Min Avg Total Time (sec): 3.373e+03 1.000 3.373e+03 Objects: 1.055e+05 14.797 7.144e+03 Flop: 5.376e+10 1.176 4.885e+10 1.601e+15 Flop/sec: 1.594e+07 1.176 1.448e+07 4.745e+11 MPI Messages: 6.048e+05 30.010 8.833e+04 2.894e+09 MPI Message Lengths: 1.127e+09 4.132 6.660e+03 1.928e+13 MPI Reductions: 1.824e+03 1.000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 3.2903e+03 97.5% 2.4753e+14 15.5% 3.538e+08 12.2% 1.779e+04 32.7% 9.870e+02 54.1% 1: PCSetUp: 4.3062e+01 1.3% 1.8160e+13 1.1% 1.902e+07 0.7% 3.714e+04 3.7% 1.590e+02 8.7% 2: KSP Solve only: 3.9685e+01 1.2% 1.3349e+15 83.4% 2.522e+09 87.1% 4.868e+03 63.7% 6.700e+02 36.7%
------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
PetscBarrier 5 1.0 1.9907e+00 2.2 0.00e+00 0.0 3.8e+06 7.7e+01 2.0e+01 0 0 0 0 1 0 0 1 0 2 0 BuildTwoSided 62 1.0 7.3272e+0214.1 0.00e+00 0.0 6.7e+06 8.0e+00 0.0e+00 5 0 0 0 0 5 0 2 0 0 0 BuildTwoSidedF 59 1.0 3.1132e+01 7.4 0.00e+00 0.0 4.8e+06 2.5e+05 0.0e+00 0 0 0 6 0 0 0 1 19 0 0 SNESSolve 1 1.0 1.7468e+02 1.0 7.83e+09 1.3 3.4e+08 1.3e+04 8.8e+02 5 13 12 23 48 5 85 96 70 89 1205779 SNESSetUp 1 1.0 2.4195e+01 1.0 0.00e+00 0.0 3.7e+06 3.7e+05 1.3e+01 1 0 0 7 1 1 0 1 22 1 0 SNESFunctionEval 3 1.0 1.1359e+01 1.2 1.17e+09 1.0 1.6e+06 1.4e+04 2.0e+00 0 2 0 0 0 0 15 0 0 0 3344744 SNESJacobianEval 2 1.0 1.6829e+02 1.0 1.52e+09 1.0 1.1e+06 8.3e+05 0.0e+00 5 3 0 5 0 5 20 0 14 0 293588 DMCreateMat 1 1.0 2.4107e+01 1.0 0.00e+00 0.0 3.7e+06 3.7e+05 1.3e+01 1 0 0 7 1 1 0 1 22 1 0 Mesh Partition 1 1.0 5.0133e+02 1.0 0.00e+00 0.0 1.3e+05 2.7e+02 6.0e+00 15 0 0 0 0 15 0 0 0 1 0 Mesh Migration 1 1.0 1.5494e+03 1.0 0.00e+00 0.0 7.3e+05 1.9e+02 2.4e+01 45 0 0 0 1 46 0 0 0 2 0 DMPlexPartSelf 1 1.0 1.1498e+002367.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexPartLblInv 1 1.0 3.6698e+00 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexPartLblSF 1 1.0 2.8522e-01 1.7 0.00e+00 0.0 4.9e+04 1.5e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexPartStrtSF 1 1.0 4.9474e+023520.8 0.00e+00 0.0 3.3e+04 4.3e+02 0.0e+00 14 0 0 0 0 15 0 0 0 0 0 DMPlexPointSF 1 1.0 9.8750e+021264.8 0.00e+00 0.0 6.6e+04 5.4e+02 0.0e+00 28 0 0 0 0 29 0 0 0 0 0 DMPlexInterp 84 1.0 4.3219e-0158.6 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 0 0 0 0 1 0 DMPlexDistribute 1 1.0 3.0000e+03 1.5 0.00e+00 0.0 9.3e+05 2.3e+02 3.0e+01 88 0 0 0 2 90 0 0 0 3 0 DMPlexDistCones 1 1.0 1.0688e+03 2.6 0.00e+00 0.0 1.8e+05 3.1e+02 1.0e+00 31 0 0 0 0 31 0 0 0 0 0 DMPlexDistLabels 1 1.0 2.9172e+02 1.0 0.00e+00 0.0 3.1e+05 1.9e+02 2.1e+01 9 0 0 0 1 9 0 0 0 2 0 DMPlexDistField 1 1.0 1.8688e+02 1.2 0.00e+00 0.0 2.1e+05 9.3e+01 1.0e+00 5 0 0 0 0 5 0 0 0 0 0 DMPlexStratify 118 1.0 6.2852e+023280.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.6e+01 1 0 0 0 1 1 0 0 0 2 0 DMPlexSymmetrize 118 1.0 6.7634e-02 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexPrealloc 1 1.0 2.3741e+01 1.0 0.00e+00 0.0 3.7e+06 3.7e+05 1.1e+01 1 0 0 7 1 1 0 1 22 1 0 DMPlexResidualFE 3 1.0 1.0634e+01 1.2 1.16e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 15 0 0 0 3569848 DMPlexJacobianFE 2 1.0 1.6809e+02 1.0 1.51e+09 1.0 6.5e+05 1.4e+06 0.0e+00 5 3 0 5 0 5 20 0 14 0 293801 SFSetGraph 87 1.0 2.7673e-03 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetUp 62 1.0 7.3283e+0213.6 0.00e+00 0.0 2.0e+07 2.7e+04 0.0e+00 5 0 1 3 0 5 0 6 9 0 0 SFBcastOpBegin 107 1.0 1.5770e+00452.5 0.00e+00 0.0 2.1e+07 1.8e+04 0.0e+00 0 0 1 2 0 0 0 6 6 0 0 SFBcastOpEnd 107 1.0 2.9430e+03 4.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 80 0 0 0 0 82 0 0 0 0 0 SFReduceBegin 12 1.0 2.4825e-01172.8 0.00e+00 0.0 2.4e+06 2.0e+05 0.0e+00 0 0 0 2 0 0 0 1 8 0 0 SFReduceEnd 12 1.0 3.8286e+014865.8 3.74e+04 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 31 SFFetchOpBegin 2 1.0 2.4497e-0390.2 0.00e+00 0.0 4.3e+05 3.5e+05 0.0e+00 0 0 0 1 0 0 0 0 2 0 0 SFFetchOpEnd 2 1.0 6.1349e-0210.9 0.00e+00 0.0 4.3e+05 3.5e+05 0.0e+00 0 0 0 1 0 0 0 0 2 0 0 SFCreateEmbed 3 1.0 3.6800e+013261.5 0.00e+00 0.0 4.7e+05 1.7e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFDistSection 9 1.0 4.4325e+02 1.5 0.00e+00 0.0 2.8e+06 1.1e+04 9.0e+00 11 0 0 0 0 11 0 1 1 1 0 SFSectionSF 11 1.0 2.3898e+02 4.7 0.00e+00 0.0 9.2e+05 1.7e+05 0.0e+00 5 0 0 1 0 5 0 0 2 0 0 SFRemoteOff 2 1.0 3.2868e-0143.1 0.00e+00 0.0 8.7e+05 8.2e+03 
0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFPack 1023 1.0 2.5215e-0176.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 1025 1.0 5.1600e-0216.8 5.62e+0521.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 54693 MatMult 1549525.4 3.4810e+00 1.3 4.35e+09 1.1 2.2e+08 6.1e+03 0.0e+00 0 8 8 7 0 0 54 62 21 0 38319208 MatMultAdd 132 1.0 6.9168e-01 3.0 7.97e+07 1.2 2.8e+07 4.6e+02 0.0e+00 0 0 1 0 0 0 1 8 0 0 3478717 MatMultTranspose 132 1.0 5.9967e-01 1.6 8.00e+07 1.2 3.0e+07 4.5e+02 0.0e+00 0 0 1 0 0 0 1 9 0 0 4015214 MatSolve 22 0.0 6.8431e-04 0.0 7.41e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1082 MatLUFactorSym 1 1.0 5.9569e-0433.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 1 1.0 1.6236e-03773.2 1.46e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 897 MatConvert 6 1.0 1.4290e-01 1.2 0.00e+00 0.0 3.0e+06 3.7e+03 0.0e+00 0 0 0 0 0 0 0 1 0 0 0 MatScale 18 1.0 3.7962e-01 1.3 4.11e+07 1.2 2.0e+06 5.5e+03 0.0e+00 0 0 0 0 0 0 0 1 0 0 3253392 MatResidual 132 1.0 6.8256e-01 1.4 8.27e+08 1.2 4.4e+07 5.5e+03 0.0e+00 0 2 2 1 0 0 10 13 4 0 36282014 MatAssemblyBegin 244 1.0 3.1181e+01 6.6 0.00e+00 0.0 4.8e+06 2.5e+05 0.0e+00 0 0 0 6 0 0 0 1 19 0 0 MatAssemblyEnd 244 1.0 6.3232e+00 1.9 3.17e+06 6.9 0.0e+00 0.0e+00 1.4e+02 0 0 0 0 8 0 0 0 0 15 7655 MatGetRowIJ 1 0.0 2.5780e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCreateSubMat 10 1.0 1.5162e+00 1.0 0.00e+00 0.0 1.6e+05 3.4e+05 1.3e+02 0 0 0 0 7 0 0 0 1 13 0 MatGetOrdering 1 0.0 1.0899e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCoarsen 6 1.0 3.5837e-01 1.3 0.00e+00 0.0 1.6e+07 1.2e+04 3.9e+01 0 0 1 1 2 0 0 5 3 4 0 MatZeroEntries 8 1.0 5.3730e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAXPY 6 1.0 2.6245e-01 1.1 2.66e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 33035 MatTranspose 12 1.0 3.0731e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMatMultSym 18 1.0 2.1398e+00 1.4 0.00e+00 0.0 6.1e+06 5.5e+03 4.8e+01 0 0 0 0 3 0 0 2 1 5 0 MatMatMultNum 6 1.0 1.1243e+00 1.0 3.76e+07 1.2 2.0e+06 5.5e+03 0.0e+00 0 0 0 0 0 0 0 1 0 0 1001203 MatPtAPSymbolic 6 1.0 1.7280e+01 1.0 0.00e+00 0.0 1.2e+07 3.2e+04 4.2e+01 1 0 0 2 2 1 0 3 6 4 0 MatPtAPNumeric 6 1.0 1.8047e+01 1.0 1.49e+09 5.1 2.8e+06 1.1e+05 2.4e+01 1 1 0 2 1 1 5 1 5 2 663675 MatTrnMatMultSym 1 1.0 3.0221e+01 1.0 0.00e+00 0.0 2.4e+06 5.8e+05 1.1e+01 1 0 0 7 1 1 0 1 22 1 0 MatGetLocalMat 19 1.0 1.3904e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetBrAoCol 18 1.0 1.9926e-01 5.0 0.00e+00 0.0 1.4e+07 2.3e+04 0.0e+00 0 0 0 2 0 0 0 4 5 0 0 MatGetSymTrans 2 1.0 1.8996e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecTDot 176 1.0 7.0632e-01 4.5 3.48e+07 1.0 0.0e+00 0.0e+00 1.8e+02 0 0 0 0 10 0 0 0 0 18 1608728 VecNorm 60 1.0 1.4074e+0012.2 1.58e+07 1.0 0.0e+00 0.0e+00 6.0e+01 0 0 0 0 3 0 0 0 0 6 366467 VecCopy 422 1.0 5.1259e-02 3.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 653 1.0 2.3974e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 165 1.0 6.5622e-03 1.3 3.42e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 170485467 VecAYPX 861 1.0 7.8529e-02 1.2 6.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 25785252 VecAXPBYCZ 264 1.0 4.1343e-02 1.5 5.85e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 46135592 VecAssemblyBegin 21 1.0 2.3463e-01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 21 1.0 1.4457e-04 1.6 0.00e+00 0.0 0.0e+00 
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 600 1.0 5.7510e-02 1.2 2.66e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 15075754 VecScatterBegin 902 1.0 5.1188e-01 1.2 0.00e+00 0.0 2.9e+08 5.3e+03 0.0e+00 0 0 10 8 0 0 0 82 25 0 0 VecScatterEnd 902 1.0 1.2143e+00 3.2 5.50e+0537.9 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1347 VecSetRandom 6 1.0 2.6354e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DualSpaceSetUp 7 1.0 5.3467e-0112.0 4.26e+03 1.0 0.0e+00 0.0e+00 1.3e+01 0 0 0 0 1 0 0 0 0 1 261 FESetUp 7 1.0 1.7541e-01128.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 15 1.0 2.7470e-01 1.1 2.04e+08 1.2 1.0e+07 5.5e+03 1.3e+02 0 0 0 0 7 0 2 3 1 13 22477233 KSPSolve 1 1.0 4.3257e+00 1.0 4.33e+09 1.1 2.5e+08 4.8e+03 6.6e+01 0 8 9 6 4 0 54 72 20 7 30855976 PCGAMGGraph_AGG 6 1.0 5.0969e+00 1.0 3.76e+07 1.2 5.1e+06 4.4e+03 4.8e+01 0 0 0 0 3 0 0 1 0 5 220852 PCGAMGCoarse_AGG 6 1.0 3.1121e+01 1.0 0.00e+00 0.0 2.5e+07 6.9e+04 5.5e+01 1 0 1 9 3 1 0 7 27 6 0 PCGAMGProl_AGG 6 1.0 5.8196e-01 1.0 0.00e+00 0.0 6.6e+06 9.3e+03 7.2e+01 0 0 0 0 4 0 0 2 1 7 0 PCGAMGPOpt_AGG 6 1.0 3.2414e+00 1.0 2.42e+08 1.2 2.1e+07 5.3e+03 1.6e+02 0 0 1 1 9 0 3 6 2 17 2256493 GAMG: createProl 6 1.0 4.0042e+01 1.0 2.80e+08 1.2 5.8e+07 3.3e+04 3.4e+02 1 1 2 10 19 1 3 16 31 34 210778 Graph 12 1.0 5.0926e+00 1.0 3.76e+07 1.2 5.1e+06 4.4e+03 4.8e+01 0 0 0 0 3 0 0 1 0 5 221038 MIS/Agg 6 1.0 3.5850e-01 1.3 0.00e+00 0.0 1.6e+07 1.2e+04 3.9e+01 0 0 1 1 2 0 0 5 3 4 0 SA: col data 6 1.0 3.0509e-01 1.0 0.00e+00 0.0 5.4e+06 9.2e+03 2.4e+01 0 0 0 0 1 0 0 2 1 2 0 SA: frmProl0 6 1.0 2.3467e-01 1.1 0.00e+00 0.0 1.3e+06 9.5e+03 2.4e+01 0 0 0 0 1 0 0 0 0 2 0 SA: smooth 6 1.0 2.7855e+00 1.0 4.14e+07 1.2 8.1e+06 5.5e+03 6.3e+01 0 0 0 0 3 0 1 2 1 6 446491 GAMG: partLevel 6 1.0 3.7266e+01 1.0 1.49e+09 5.1 1.5e+07 4.9e+04 3.2e+02 1 1 1 4 17 1 5 4 12 32 321395 repartition 5 1.0 2.0343e+00 1.1 0.00e+00 0.0 4.0e+05 1.4e+05 2.5e+02 0 0 0 0 14 0 0 0 1 25 0 Invert-Sort 5 1.0 1.5021e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+01 0 0 0 0 2 0 0 0 0 3 0 Move A 5 1.0 1.1548e+00 1.0 0.00e+00 0.0 1.6e+05 3.4e+05 7.0e+01 0 0 0 0 4 0 0 0 1 7 0 Move P 5 1.0 4.2799e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 7.5e+01 0 0 0 0 4 0 0 0 0 8 0 PCGAMG Squ l00 1 1.0 3.0221e+01 1.0 0.00e+00 0.0 2.4e+06 5.8e+05 1.1e+01 1 0 0 7 1 1 0 1 22 1 0 PCGAMG Gal l00 1 1.0 8.7411e+00 1.0 2.93e+08 1.1 5.4e+06 4.5e+04 1.2e+01 0 1 0 1 1 0 4 2 4 1 1092355 PCGAMG Opt l00 1 1.0 1.9734e+00 1.0 3.36e+07 1.1 3.2e+06 1.2e+04 9.0e+00 0 0 0 0 0 0 0 1 1 1 555327 PCGAMG Gal l01 1 1.0 1.0153e+00 1.0 3.50e+07 1.4 5.9e+06 3.9e+04 1.2e+01 0 0 0 1 1 0 0 2 4 1 1079887 PCGAMG Opt l01 1 1.0 7.4812e-02 1.0 5.35e+05 1.2 3.2e+06 1.1e+03 9.0e+00 0 0 0 0 0 0 0 1 0 1 232542 PCGAMG Gal l02 1 1.0 1.8063e+00 1.0 7.43e+07 0.0 3.0e+06 5.9e+04 1.2e+01 0 0 0 1 1 0 0 1 3 1 593392 PCGAMG Opt l02 1 1.0 1.1580e-01 1.1 6.93e+05 0.0 1.6e+06 1.3e+03 9.0e+00 0 0 0 0 0 0 0 0 0 1 93213 PCGAMG Gal l03 1 1.0 6.1075e+00 1.0 2.72e+08 0.0 2.6e+05 9.2e+04 1.1e+01 0 0 0 0 1 0 0 0 0 1 36155 PCGAMG Opt l03 1 1.0 8.0836e-02 1.0 1.55e+06 0.0 1.4e+05 1.4e+03 8.0e+00 0 0 0 0 0 0 0 0 0 1 18229 PCGAMG Gal l04 1 1.0 1.6203e+01 1.0 9.44e+08 0.0 1.4e+04 3.0e+05 1.1e+01 0 0 0 0 1 0 0 0 0 1 2366 PCGAMG Opt l04 1 1.0 1.2663e-01 1.0 2.01e+06 0.0 6.9e+03 2.2e+03 8.0e+00 0 0 0 0 0 0 0 0 0 1 817 PCGAMG Gal l05 1 1.0 1.4800e+00 1.0 3.16e+08 0.0 9.0e+01 1.6e+05 1.1e+01 0 0 0 0 1 0 0 0 0 1 796 PCGAMG Opt l05 1 1.0 8.1763e-02 1.1 2.50e+06 0.0 4.8e+01 4.6e+03 8.0e+00 0 0 0 0 0 0 0 0 0 1 114 PCSetUp 2 1.0 
7.7969e+01 1.0 1.97e+09 2.8 8.3e+07 3.3e+04 8.1e+02 2 2 3 14 44 2 11 23 43 82 341051 PCSetUpOnBlocks 22 1.0 2.4609e-0317.2 1.46e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 592 PCApply 22 1.0 3.6455e+00 1.1 3.57e+09 1.2 2.4e+08 4.3e+03 0.0e+00 0 7 8 5 0 0 43 67 16 0 29434967
--- Event Stage 1: PCSetUp
BuildTwoSided 4 1.0 1.5980e-01 2.7 0.00e+00 0.0 2.1e+05 8.0e+00 0.0e+00 0 0 0 0 0 0 0 1 0 0 0 BuildTwoSidedF 6 1.0 1.3169e+01 5.5 0.00e+00 0.0 1.9e+06 1.9e+05 0.0e+00 0 0 0 2 0 28 0 10 51 0 0 SFSetGraph 5 1.0 4.9640e-0519.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetUp 4 1.0 1.6038e-01 2.3 0.00e+00 0.0 6.4e+05 9.1e+02 0.0e+00 0 0 0 0 0 0 0 3 0 0 0 SFPack 30 1.0 3.3376e-04 4.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 30 1.0 1.2101e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 30 1.0 1.5544e-01 1.5 1.87e+08 1.2 1.0e+07 5.5e+03 0.0e+00 0 0 0 0 0 0 31 53 8 0 35930640 MatAssemblyBegin 43 1.0 1.3201e+01 4.7 0.00e+00 0.0 1.9e+06 1.9e+05 0.0e+00 0 0 0 2 0 28 0 10 51 0 0 MatAssemblyEnd 43 1.0 1.1159e+01 1.0 2.77e+07705.7 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 1 26 0 0 0 13 1036 MatZeroEntries 6 1.0 4.7315e-0410.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatTranspose 12 1.0 2.5142e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMatMultSym 10 1.0 5.8783e-0117.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatPtAPSymbolic 5 1.0 1.4489e+01 1.0 0.00e+00 0.0 6.2e+06 3.6e+04 3.5e+01 0 0 0 1 2 34 0 32 31 22 0 MatPtAPNumeric 6 1.0 2.8457e+01 1.0 1.50e+09 5.1 2.7e+06 1.6e+05 2.0e+01 1 1 0 2 1 66 66 14 61 13 421190 MatGetLocalMat 6 1.0 9.8574e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetBrAoCol 6 1.0 3.7669e-01 2.3 0.00e+00 0.0 5.1e+06 3.8e+04 0.0e+00 0 0 0 1 0 0 0 27 28 0 0 VecTDot 66 1.0 6.5271e-02 4.1 5.85e+06 1.0 0.0e+00 0.0e+00 6.6e+01 0 0 0 0 4 0 1 0 0 42 2922260 VecNorm 36 1.0 1.1226e-02 3.2 3.19e+06 1.0 0.0e+00 0.0e+00 3.6e+01 0 0 0 0 2 0 1 0 0 23 9268067 VecCopy 12 1.0 1.2805e-03 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 11 1.0 6.6620e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 60 1.0 1.0763e-03 1.5 5.32e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 161104914 VecAYPX 24 1.0 2.0581e-03 1.3 2.13e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 33701038 VecPointwiseMult 36 1.0 3.5709e-03 1.3 1.60e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14567861 VecScatterBegin 30 1.0 2.9079e-03 7.8 0.00e+00 0.0 1.0e+07 5.5e+03 0.0e+00 0 0 0 0 0 0 0 53 8 0 0 VecScatterEnd 30 1.0 3.7015e-0263.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 7 1.0 2.3165e-01 1.0 2.04e+08 1.2 1.0e+07 5.5e+03 1.0e+02 0 0 0 0 6 1 34 53 8 64 26654598 PCGAMG Gal l00 1 1.0 4.7415e+00 1.0 2.94e+08 1.1 1.8e+06 7.8e+04 0.0e+00 0 1 0 1 0 11 53 9 20 0 2015623 PCGAMG Gal l01 1 1.0 1.2103e+00 1.0 3.50e+07 1.4 4.8e+06 6.2e+04 1.2e+01 0 0 0 2 1 3 6 25 41 8 905938 PCGAMG Gal l02 1 1.0 3.4334e+00 1.0 7.41e+07 0.0 2.2e+06 8.7e+04 1.2e+01 0 0 0 1 1 8 6 11 27 8 312184 PCGAMG Gal l03 1 1.0 9.6062e+00 1.0 2.71e+08 0.0 1.9e+05 1.3e+05 1.1e+01 0 0 0 0 1 22 1 1 4 7 22987 PCGAMG Gal l04 1 1.0 2.2482e+01 1.0 9.43e+08 0.0 8.7e+03 4.8e+05 1.1e+01 1 0 0 0 1 52 0 0 1 7 1705 PCGAMG Gal l05 1 1.0 1.5961e+00 1.1 3.16e+08 0.0 6.8e+01 2.2e+05 1.1e+01 0 0 0 0 1 4 0 0 0 7 738 PCSetUp 1 1.0 4.3191e+01 1.0 1.70e+09 3.6 1.9e+07 3.7e+04 1.6e+02 1 1 1 4 9 100100100100100 420463
--- Event Stage 2: KSP Solve only
SFPack 8140 1.0 7.4247e-02 4.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 8140 1.0 1.2905e-02 5.2 5.50e+0637.9 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1267207 MatMult 5500 1.0 2.9994e+01 1.2 3.98e+10 1.1 2.0e+09 6.1e+03 0.0e+00 1 76 68 62 0 70 92 78 98 0 40747181 MatMultAdd 1320 1.0 6.2192e+00 2.7 7.97e+08 1.2 2.8e+08 4.6e+02 0.0e+00 0 2 10 1 0 14 2 11 1 0 3868976 MatMultTranspose 1320 1.0 4.0304e+00 1.7 8.00e+08 1.2 2.8e+08 4.6e+02 0.0e+00 0 2 10 1 0 7 2 11 1 0 5974153 MatSolve 220 0.0 6.7366e-03 0.0 7.41e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1100 MatLUFactorSym 1 1.0 5.8691e-0435.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 1 1.0 1.5955e-03756.2 1.46e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 913 MatResidual 1320 1.0 6.4920e+00 1.3 8.27e+09 1.2 4.4e+08 5.5e+03 0.0e+00 0 15 15 13 0 14 19 18 20 0 38146350 MatGetRowIJ 1 0.0 2.7820e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 0.0 9.6940e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecTDot 440 1.0 4.6162e+00 6.9 2.31e+08 1.0 0.0e+00 0.0e+00 4.4e+02 0 0 0 0 24 5 1 0 0 66 1635124 VecNorm 230 1.0 3.9605e-02 1.6 1.21e+08 1.0 0.0e+00 0.0e+00 2.3e+02 0 0 0 0 13 0 0 0 0 34 99622387 VecCopy 3980 1.0 5.4166e-01 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 4640 1.0 1.4216e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 440 1.0 4.2829e-02 1.3 2.31e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 176236363 VecAYPX 8130 1.0 7.3998e-01 1.2 5.78e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 2 1 0 0 0 25489392 VecAXPBYCZ 2640 1.0 3.9974e-01 1.5 5.85e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 1 0 0 0 47716315 VecPointwiseMult 5280 1.0 5.9845e-01 1.5 2.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 1 0 0 0 12748927 VecScatterBegin 8140 1.0 4.9231e-01 5.9 0.00e+00 0.0 2.5e+09 4.9e+03 0.0e+00 0 0 87 64 0 1 0100100 0 0 VecScatterEnd 8140 1.0 1.0172e+01 3.6 5.50e+0637.9 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 13 0 0 0 0 1608 KSPSetUp 1 1.0 9.5996e-07 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 10 1.0 3.9685e+01 1.0 4.33e+10 1.1 2.5e+09 4.9e+03 6.7e+02 1 83 87 64 37 100100100100100 33637495 PCSetUp 1 1.0 2.4149e-0318.1 1.46e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 603 PCSetUpOnBlocks 220 1.0 2.6945e-03 8.9 1.46e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 540 PCApply 220 1.0 3.2921e+01 1.1 3.57e+10 1.2 2.3e+09 4.3e+03 0.0e+00 1 67 81 53 0 81 80 93 82 0 32595360 ------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0.
--- Event Stage 0: Main Stage
Container 112 112 69888 0. SNES 1 1 1532 0. DMSNES 1 1 720 0. Distributed Mesh 449 449 30060888 0. DM Label 790 790 549840 0. Quadrature 579 579 379824 0. Index Set 100215 100210 361926232 0. IS L to G Mapping 8 13 4356552 0. Section 771 771 598296 0. Star Forest Graph 897 897 1053640 0. Discrete System 521 521 533512 0. GraphPartitioner 118 118 91568 0. Matrix 432 462 2441805304 0. Matrix Coarsen 6 6 4032 0. Vector 354 354 65492968 0. Linear Space 7 7 5208 0. Dual Space 111 111 113664 0. FE Space 7 7 5992 0. Field over DM 6 6 4560 0. Krylov Solver 21 21 37560 0. DMKSP interface 1 1 704 0. Preconditioner 21 21 21632 0. Viewer 2 1 896 0. PetscRandom 12 12 8520 0.
--- Event Stage 1: PCSetUp
Index Set 10 15 85367336 0. IS L to G Mapping 5 0 0 0. Star Forest Graph 5 5 6600 0. Matrix 50 20 73134024 0. Vector 28 28 6235096 0.
--- Event Stage 2: KSP Solve only
Index Set 5 5 8296 0. Matrix 1 1 273856 0. ======================================================================================================================== Average time to get PetscTime(): 6.40051e-08 Average time for MPI_Barrier(): 8.506e-06 Average time for zero size MPI_Send(): 6.6027e-06 #PETSc Option Table entries: -benchmark_it 10 -dm_distribute -dm_plex_box_dim 3 -dm_plex_box_faces 32,32,32 -dm_plex_box_lower 0,0,0 -dm_plex_box_simplex 0 -dm_plex_box_upper 1,1,1 -dm_refine 5 -ksp_converged_reason -ksp_max_it 150 -ksp_norm_type unpreconditioned -ksp_rtol 1.e-12 -ksp_type cg -log_view -matptap_via scalable -mg_levels_esteig_ksp_max_it 5 -mg_levels_esteig_ksp_type cg -mg_levels_ksp_max_it 2 -mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 2000 -pc_gamg_coarse_grid_layout_type spread -pc_gamg_esteig_ksp_max_it 5 -pc_gamg_esteig_ksp_type cg -pc_gamg_process_eq_limit 500 -pc_gamg_repartition false -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 -pc_gamg_threshold 0.01 -pc_gamg_threshold_scale .5 -pc_gamg_type agg -pc_type gamg -petscpartitioner_simple_node_grid 8,8,8 -petscpartitioner_simple_process_grid 4,4,4 -petscpartitioner_type simple -potential_petscspace_degree 2 -snes_converged_reason -snes_max_it 1 -snes_monitor -snes_rtol 1.e-8 -snes_type ksponly #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with 64 bit PetscInt Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8 Configure options: CC=mpifccpx CXX=mpiFCCpx CFLAGS="-L /opt/FJSVxtclanga/tcsds-1.2.29/lib64 -lfjlapack" CXXFLAGS="-L /opt/FJSVxtclanga/tcsds-1.2.29/lib64 -lfjlapack" COPTFLAGS=-Kfast CXXOPTFLAGS=-Kfast --with-fc=0 --package-prefix-hash=/home/ra010009/a04199/petsc-hash-pkgs --with-batch=1 --with-shared-libraries=yes --with-debugging=no --with-64-bit-indices=1 PETSC_ARCH=arch-fugaku-fujitsu ----------------------------------------- Libraries compiled on 2021-02-12 02:27:41 on fn01sv08 Machine characteristics: Linux-3.10.0-957.27.2.el7.x86_64-x86_64-with-redhat-7.6-Maipo Using PETSc directory: /home/ra010009/a04199/petsc Using PETSc arch: -----------------------------------------
Using C compiler: mpifccpx -L /opt/FJSVxtclanga/tcsds-1.2.29/lib64 -lfjlapack -fPIC -Kfast -----------------------------------------
Using include paths: -I/home/ra010009/a04199/petsc/include -I/home/ra010009/a04199/petsc/arch-fugaku-fujitsu/include -----------------------------------------
Using C linker: mpifccpx Using libraries: -Wl,-rpath,/home/ra010009/a04199/petsc/lib -L/home/ra010009/a04199/petsc/lib -lpetsc -Wl,-rpath,/opt/FJSVxos/devkit/aarch64/lib/gcc/aarch64-linux-gnu/8 -L/opt/FJSVxos/devkit/aarch64/lib/gcc/aarch64-linux-gnu/8 -Wl,-rpath,/opt/FJSVxtclanga/tcsds-1.2.29/lib64 -L/opt/FJSVxtclanga/tcsds-1.2.29/lib64 -Wl,-rpath,/opt/FJSVxtclanga/.common/MELI022/lib64 -L/opt/FJSVxtclanga/.common/MELI022/lib64 -Wl,-rpath,/opt/FJSVxos/devkit/aarch64/aarch64-linux-gnu/lib64 -L/opt/FJSVxos/devkit/aarch64/aarch64-linux-gnu/lib64 -Wl,-rpath,/opt/FJSVxos/devkit/aarch64/rfs/usr/lib64 -L/opt/FJSVxos/devkit/aarch64/rfs/usr/lib64 -Wl,-rpath,/opt/FJSVxos/devkit/aarch64/rfs/opt/FJSVxos/mmm/lib64 -L/opt/FJSVxos/devkit/aarch64/rfs/opt/FJSVxos/mmm/lib64 -Wl,-rpath,/opt/FJSVxtclanga/tcsds-1.2.29/lib64/nofjobj -L/opt/FJSVxtclanga/tcsds-1.2.29/lib64/nofjobj -lX11 -lfjprofmpi -lfjlapack -ldl -lmpi_cxx -lmpi -lfjstring_internal -lfj90i -lfj90fmt_sve -lfj90f -lfjsrcinfo -lfjcrt -lfjprofcore -lfjprofomp -lfjc++ -lfjc++abi -lfjdemgl -lmpg -lm -lrt -lpthread -lelf -lz -lgcc_s -ldl -----------------------------------------
There is some use of Iscatterv in the SF implementations (though it looks like perhaps not in PetscSFBcast, where the root nodes are consolidated on a root rank). We should perhaps have a function that analyzes the graph to set the type, rather than requiring the caller to call PetscSFSetType.
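A sketch of what such an analysis function might look like: walk the SF graph, detect the everything-rooted-on-rank-0 pattern produced by serial mesh distribution, and switch to a specialized type. The heuristic and the choice of PETSCSFALLGATHERV are illustrative assumptions; whether that type is valid for a given graph, and whether the type can still be changed after the graph has been attached, are exactly the details a real implementation would have to get right.

#include <petscsf.h>

/* Sketch: if every remote root referenced by this SF lives on rank 0, flag the
   SF as a candidate for a collective-based implementation instead of the
   default point-to-point one. Illustrative only. */
static PetscErrorCode SFMaybeSpecializeType(PetscSF sf)
{
  PetscErrorCode     ierr;
  PetscInt           nroots, nleaves, i;
  const PetscInt    *ilocal;
  const PetscSFNode *iremote;
  PetscBool          rootsOnZero = PETSC_TRUE, allRootsOnZero;
  MPI_Comm           comm;

  PetscFunctionBeginUser;
  ierr = PetscObjectGetComm((PetscObject)sf, &comm);CHKERRQ(ierr);
  ierr = PetscSFGetGraph(sf, &nroots, &nleaves, &ilocal, &iremote);CHKERRQ(ierr);
  for (i = 0; i < nleaves; i++) {
    if (iremote[i].rank != 0) { rootsOnZero = PETSC_FALSE; break; }
  }
  ierr = MPI_Allreduce(&rootsOnZero, &allRootsOnZero, 1, MPIU_BOOL, MPI_LAND, comm);CHKERRQ(ierr);
  if (allRootsOnZero) {
    /* Everything fans out from one rank; an allgatherv-style implementation
       (assumed here, not checked against the type's requirements) may be a
       better fit than flooding the network with point-to-point messages. */
    ierr = PetscSFSetType(sf, PETSCSFALLGATHERV);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}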
Yes, we should investigate whether DMPlex uses PetscSF in a wrong way or whether PetscSF needs to detect this pattern. I need to learn more about DMPlex.
--Junchao Zhang
On Sun, Mar 7, 2021 at 6:40 PM Jed Brown <jed@jedbrown.org> wrote:
There is some use of Iscatterv in SF implementations (though it looks like perhaps not PetscSFBcast where the root nodes are consolidated on a root rank).
We should perhaps have a function that analyzes the graph to set the type rather than requiring the caller to PetscSFSetType.
On Mar 7, 2021, at 7:35 AM, Mark Adams <mfadams@lbl.gov> wrote:
And this data puts one cell per process, distributes, and then refines 5 (or 2,3,4 in plot) times.
On Sun, Mar 7, 2021 at 8:27 AM Mark Adams <mfadams@lbl.gov> wrote:
FWIW, here is the output from ex13 on 32K processes (8K Fugaku nodes/sockets, 4 MPI ranks per node, which seems to be the recommended configuration) with a 128^3 vertex mesh (64^3 Q2 3D Laplacian). Almost an hour. Attached is solver scaling.
0 SNES Function norm 3.658334849208e+00 Linear solve converged due to CONVERGED_RTOL iterations 22 1 SNES Function norm 1.609000373074e-12 Nonlinear solve converged due to CONVERGED_ITS iterations 1 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22 Linear solve converged due to CONVERGED_RTOL iterations 22
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
../ex13 on a named i07-4008c with 32768 processors, by a04199 Fri Feb 12 23:27:13 2021
Using Petsc Development GIT revision: v3.14.4-579-g4cb72fa GIT Date: 2021-02-05 15:19:40 +0000
                            Max       Max/Min     Avg       Total
Time (sec):              3.373e+03     1.000   3.373e+03
Objects:                 1.055e+05    14.797   7.144e+03
Flop:                    5.376e+10     1.176   4.885e+10  1.601e+15
Flop/sec:                1.594e+07     1.176   1.448e+07  4.745e+11
MPI Messages:            6.048e+05    30.010   8.833e+04  2.894e+09
MPI Message Lengths:     1.127e+09     4.132   6.660e+03  1.928e+13
MPI Reductions:          1.824e+03     1.000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages:   ----- Time ------  ----- Flop ------   --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg        %Total     Count   %Total
 0:      Main Stage: 3.2903e+03  97.5%  2.4753e+14  15.5%  3.538e+08  12.2%  1.779e+04      32.7%  9.870e+02  54.1%
 1:         PCSetUp: 4.3062e+01   1.3%  1.8160e+13   1.1%  1.902e+07   0.7%  3.714e+04       3.7%  1.590e+02   8.7%
 2:  KSP Solve only: 3.9685e+01   1.2%  1.3349e+15  83.4%  2.522e+09  87.1%  4.868e+03      63.7%  6.700e+02  36.7%
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
Event                Count      Time (sec)     Flop                               --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct   %T %F %M %L %R  %T %F %M %L %R Mflop/s
--- Event Stage 0: Main Stage
PetscBarrier 5 1.0 1.9907e+00 2.2 0.00e+00 0.0 3.8e+06 7.7e+01 2.0e+01 0 0 0 0 1 0 0 1 0 2 0
BuildTwoSided 62 1.0 7.3272e+0214.1 0.00e+00 0.0 6.7e+06 8.0e+00 0.0e+00 5 0 0 0 0 5 0 2 0 0 0 BuildTwoSidedF 59 1.0 3.1132e+01 7.4 0.00e+00 0.0 4.8e+06 2.5e+05 0.0e+00 0 0 0 6 0 0 0 1 19 0 0 SNESSolve 1 1.0 1.7468e+02 1.0 7.83e+09 1.3 3.4e+08 1.3e+04 8.8e+02 5 13 12 23 48 5 85 96 70 89 1205779 SNESSetUp 1 1.0 2.4195e+01 1.0 0.00e+00 0.0 3.7e+06 3.7e+05 1.3e+01 1 0 0 7 1 1 0 1 22 1 0 SNESFunctionEval 3 1.0 1.1359e+01 1.2 1.17e+09 1.0 1.6e+06 1.4e+04 2.0e+00 0 2 0 0 0 0 15 0 0 0 3344744 SNESJacobianEval 2 1.0 1.6829e+02 1.0 1.52e+09 1.0 1.1e+06 8.3e+05 0.0e+00 5 3 0 5 0 5 20 0 14 0 293588 DMCreateMat 1 1.0 2.4107e+01 1.0 0.00e+00 0.0 3.7e+06 3.7e+05 1.3e+01 1 0 0 7 1 1 0 1 22 1 0 Mesh Partition 1 1.0 5.0133e+02 1.0 0.00e+00 0.0 1.3e+05 2.7e+02 6.0e+00 15 0 0 0 0 15 0 0 0 1 0 Mesh Migration 1 1.0 1.5494e+03 1.0 0.00e+00 0.0 7.3e+05 1.9e+02 2.4e+01 45 0 0 0 1 46 0 0 0 2 0 DMPlexPartSelf 1 1.0 1.1498e+002367.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexPartLblInv 1 1.0 3.6698e+00 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexPartLblSF 1 1.0 2.8522e-01 1.7 0.00e+00 0.0 4.9e+04 1.5e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexPartStrtSF 1 1.0 4.9474e+023520.8 0.00e+00 0.0 3.3e+04 4.3e+02 0.0e+00 14 0 0 0 0 15 0 0 0 0 0 DMPlexPointSF 1 1.0 9.8750e+021264.8 0.00e+00 0.0 6.6e+04 5.4e+02 0.0e+00 28 0 0 0 0 29 0 0 0 0 0 DMPlexInterp 84 1.0 4.3219e-0158.6 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 0 0 0 0 1 0 DMPlexDistribute 1 1.0 3.0000e+03 1.5 0.00e+00 0.0 9.3e+05 2.3e+02 3.0e+01 88 0 0 0 2 90 0 0 0 3 0 DMPlexDistCones 1 1.0 1.0688e+03 2.6 0.00e+00 0.0 1.8e+05 3.1e+02 1.0e+00 31 0 0 0 0 31 0 0 0 0 0 DMPlexDistLabels 1 1.0 2.9172e+02 1.0 0.00e+00 0.0 3.1e+05 1.9e+02 2.1e+01 9 0 0 0 1 9 0 0 0 2 0 DMPlexDistField 1 1.0 1.8688e+02 1.2 0.00e+00 0.0 2.1e+05 9.3e+01 1.0e+00 5 0 0 0 0 5 0 0 0 0 0 DMPlexStratify 118 1.0 6.2852e+023280.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.6e+01 1 0 0 0 1 1 0 0 0 2 0 DMPlexSymmetrize 118 1.0 6.7634e-02 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexPrealloc 1 1.0 2.3741e+01 1.0 0.00e+00 0.0 3.7e+06 3.7e+05 1.1e+01 1 0 0 7 1 1 0 1 22 1 0 DMPlexResidualFE 3 1.0 1.0634e+01 1.2 1.16e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 15 0 0 0 3569848 DMPlexJacobianFE 2 1.0 1.6809e+02 1.0 1.51e+09 1.0 6.5e+05 1.4e+06 0.0e+00 5 3 0 5 0 5 20 0 14 0 293801 SFSetGraph 87 1.0 2.7673e-03 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetUp 62 1.0 7.3283e+0213.6 0.00e+00 0.0 2.0e+07 2.7e+04 0.0e+00 5 0 1 3 0 5 0 6 9 0 0 SFBcastOpBegin 107 1.0 1.5770e+00452.5 0.00e+00 0.0 2.1e+07 1.8e+04 0.0e+00 0 0 1 2 0 0 0 6 6 0 0 SFBcastOpEnd 107 1.0 2.9430e+03 4.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 80 0 0 0 0 82 0 0 0 0 0 SFReduceBegin 12 1.0 2.4825e-01172.8 0.00e+00 0.0 2.4e+06 2.0e+05 0.0e+00 0 0 0 2 0 0 0 1 8 0 0 SFReduceEnd 12 1.0 3.8286e+014865.8 3.74e+04 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 31 SFFetchOpBegin 2 1.0 2.4497e-0390.2 0.00e+00 0.0 4.3e+05 3.5e+05 0.0e+00 0 0 0 1 0 0 0 0 2 0 0 SFFetchOpEnd 2 1.0 6.1349e-0210.9 0.00e+00 0.0 4.3e+05 3.5e+05 0.0e+00 0 0 0 1 0 0 0 0 2 0 0 SFCreateEmbed 3 1.0 3.6800e+013261.5 0.00e+00 0.0 4.7e+05 1.7e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFDistSection 9 1.0 4.4325e+02 1.5 0.00e+00 0.0 2.8e+06 1.1e+04 9.0e+00 11 0 0 0 0 11 0 1 1 1 0 SFSectionSF 11 1.0 2.3898e+02 4.7 0.00e+00 0.0 9.2e+05 1.7e+05 0.0e+00 5 0 0 1 0 5 0 0 2 0 0 SFRemoteOff 2 1.0 3.2868e-0143.1 0.00e+00 0.0 8.7e+05 8.2e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFPack 1023 1.0 2.5215e-0176.6 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 1025 1.0 5.1600e-0216.8 5.62e+0521.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 54693 MatMult 1549525.4 3.4810e+00 1.3 4.35e+09 1.1 2.2e+08 6.1e+03 0.0e+00 0 8 8 7 0 0 54 62 21 0 38319208 MatMultAdd 132 1.0 6.9168e-01 3.0 7.97e+07 1.2 2.8e+07 4.6e+02 0.0e+00 0 0 1 0 0 0 1 8 0 0 3478717 MatMultTranspose 132 1.0 5.9967e-01 1.6 8.00e+07 1.2 3.0e+07 4.5e+02 0.0e+00 0 0 1 0 0 0 1 9 0 0 4015214 MatSolve 22 0.0 6.8431e-04 0.0 7.41e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1082 MatLUFactorSym 1 1.0 5.9569e-0433.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 1 1.0 1.6236e-03773.2 1.46e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 897 MatConvert 6 1.0 1.4290e-01 1.2 0.00e+00 0.0 3.0e+06 3.7e+03 0.0e+00 0 0 0 0 0 0 0 1 0 0 0 MatScale 18 1.0 3.7962e-01 1.3 4.11e+07 1.2 2.0e+06 5.5e+03 0.0e+00 0 0 0 0 0 0 0 1 0 0 3253392 MatResidual 132 1.0 6.8256e-01 1.4 8.27e+08 1.2 4.4e+07 5.5e+03 0.0e+00 0 2 2 1 0 0 10 13 4 0 36282014 MatAssemblyBegin 244 1.0 3.1181e+01 6.6 0.00e+00 0.0 4.8e+06 2.5e+05 0.0e+00 0 0 0 6 0 0 0 1 19 0 0 MatAssemblyEnd 244 1.0 6.3232e+00 1.9 3.17e+06 6.9 0.0e+00 0.0e+00 1.4e+02 0 0 0 0 8 0 0 0 0 15 7655 MatGetRowIJ 1 0.0 2.5780e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCreateSubMat 10 1.0 1.5162e+00 1.0 0.00e+00 0.0 1.6e+05 3.4e+05 1.3e+02 0 0 0 0 7 0 0 0 1 13 0 MatGetOrdering 1 0.0 1.0899e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCoarsen 6 1.0 3.5837e-01 1.3 0.00e+00 0.0 1.6e+07 1.2e+04 3.9e+01 0 0 1 1 2 0 0 5 3 4 0 MatZeroEntries 8 1.0 5.3730e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAXPY 6 1.0 2.6245e-01 1.1 2.66e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 33035 MatTranspose 12 1.0 3.0731e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMatMultSym 18 1.0 2.1398e+00 1.4 0.00e+00 0.0 6.1e+06 5.5e+03 4.8e+01 0 0 0 0 3 0 0 2 1 5 0 MatMatMultNum 6 1.0 1.1243e+00 1.0 3.76e+07 1.2 2.0e+06 5.5e+03 0.0e+00 0 0 0 0 0 0 0 1 0 0 1001203 MatPtAPSymbolic 6 1.0 1.7280e+01 1.0 0.00e+00 0.0 1.2e+07 3.2e+04 4.2e+01 1 0 0 2 2 1 0 3 6 4 0 MatPtAPNumeric 6 1.0 1.8047e+01 1.0 1.49e+09 5.1 2.8e+06 1.1e+05 2.4e+01 1 1 0 2 1 1 5 1 5 2 663675 MatTrnMatMultSym 1 1.0 3.0221e+01 1.0 0.00e+00 0.0 2.4e+06 5.8e+05 1.1e+01 1 0 0 7 1 1 0 1 22 1 0 MatGetLocalMat 19 1.0 1.3904e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetBrAoCol 18 1.0 1.9926e-01 5.0 0.00e+00 0.0 1.4e+07 2.3e+04 0.0e+00 0 0 0 2 0 0 0 4 5 0 0 MatGetSymTrans 2 1.0 1.8996e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecTDot 176 1.0 7.0632e-01 4.5 3.48e+07 1.0 0.0e+00 0.0e+00 1.8e+02 0 0 0 0 10 0 0 0 0 18 1608728 VecNorm 60 1.0 1.4074e+0012.2 1.58e+07 1.0 0.0e+00 0.0e+00 6.0e+01 0 0 0 0 3 0 0 0 0 6 366467 VecCopy 422 1.0 5.1259e-02 3.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 653 1.0 2.3974e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 165 1.0 6.5622e-03 1.3 3.42e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 170485467 VecAYPX 861 1.0 7.8529e-02 1.2 6.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 25785252 VecAXPBYCZ 264 1.0 4.1343e-02 1.5 5.85e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 46135592 VecAssemblyBegin 21 1.0 2.3463e-01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 21 1.0 1.4457e-04 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 600 1.0 5.7510e-02 1.2 2.66e+07 1.0 
0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 15075754 VecScatterBegin 902 1.0 5.1188e-01 1.2 0.00e+00 0.0 2.9e+08 5.3e+03 0.0e+00 0 0 10 8 0 0 0 82 25 0 0 VecScatterEnd 902 1.0 1.2143e+00 3.2 5.50e+0537.9 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1347 VecSetRandom 6 1.0 2.6354e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DualSpaceSetUp 7 1.0 5.3467e-0112.0 4.26e+03 1.0 0.0e+00 0.0e+00 1.3e+01 0 0 0 0 1 0 0 0 0 1 261 FESetUp 7 1.0 1.7541e-01128.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 15 1.0 2.7470e-01 1.1 2.04e+08 1.2 1.0e+07 5.5e+03 1.3e+02 0 0 0 0 7 0 2 3 1 13 22477233 KSPSolve 1 1.0 4.3257e+00 1.0 4.33e+09 1.1 2.5e+08 4.8e+03 6.6e+01 0 8 9 6 4 0 54 72 20 7 30855976 PCGAMGGraph_AGG 6 1.0 5.0969e+00 1.0 3.76e+07 1.2 5.1e+06 4.4e+03 4.8e+01 0 0 0 0 3 0 0 1 0 5 220852 PCGAMGCoarse_AGG 6 1.0 3.1121e+01 1.0 0.00e+00 0.0 2.5e+07 6.9e+04 5.5e+01 1 0 1 9 3 1 0 7 27 6 0 PCGAMGProl_AGG 6 1.0 5.8196e-01 1.0 0.00e+00 0.0 6.6e+06 9.3e+03 7.2e+01 0 0 0 0 4 0 0 2 1 7 0 PCGAMGPOpt_AGG 6 1.0 3.2414e+00 1.0 2.42e+08 1.2 2.1e+07 5.3e+03 1.6e+02 0 0 1 1 9 0 3 6 2 17 2256493 GAMG: createProl 6 1.0 4.0042e+01 1.0 2.80e+08 1.2 5.8e+07 3.3e+04 3.4e+02 1 1 2 10 19 1 3 16 31 34 210778 Graph 12 1.0 5.0926e+00 1.0 3.76e+07 1.2 5.1e+06 4.4e+03 4.8e+01 0 0 0 0 3 0 0 1 0 5 221038 MIS/Agg 6 1.0 3.5850e-01 1.3 0.00e+00 0.0 1.6e+07 1.2e+04 3.9e+01 0 0 1 1 2 0 0 5 3 4 0 SA: col data 6 1.0 3.0509e-01 1.0 0.00e+00 0.0 5.4e+06 9.2e+03 2.4e+01 0 0 0 0 1 0 0 2 1 2 0 SA: frmProl0 6 1.0 2.3467e-01 1.1 0.00e+00 0.0 1.3e+06 9.5e+03 2.4e+01 0 0 0 0 1 0 0 0 0 2 0 SA: smooth 6 1.0 2.7855e+00 1.0 4.14e+07 1.2 8.1e+06 5.5e+03 6.3e+01 0 0 0 0 3 0 1 2 1 6 446491 GAMG: partLevel 6 1.0 3.7266e+01 1.0 1.49e+09 5.1 1.5e+07 4.9e+04 3.2e+02 1 1 1 4 17 1 5 4 12 32 321395 repartition 5 1.0 2.0343e+00 1.1 0.00e+00 0.0 4.0e+05 1.4e+05 2.5e+02 0 0 0 0 14 0 0 0 1 25 0 Invert-Sort 5 1.0 1.5021e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+01 0 0 0 0 2 0 0 0 0 3 0 Move A 5 1.0 1.1548e+00 1.0 0.00e+00 0.0 1.6e+05 3.4e+05 7.0e+01 0 0 0 0 4 0 0 0 1 7 0 Move P 5 1.0 4.2799e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 7.5e+01 0 0 0 0 4 0 0 0 0 8 0 PCGAMG Squ l00 1 1.0 3.0221e+01 1.0 0.00e+00 0.0 2.4e+06 5.8e+05 1.1e+01 1 0 0 7 1 1 0 1 22 1 0 PCGAMG Gal l00 1 1.0 8.7411e+00 1.0 2.93e+08 1.1 5.4e+06 4.5e+04 1.2e+01 0 1 0 1 1 0 4 2 4 1 1092355 PCGAMG Opt l00 1 1.0 1.9734e+00 1.0 3.36e+07 1.1 3.2e+06 1.2e+04 9.0e+00 0 0 0 0 0 0 0 1 1 1 555327 PCGAMG Gal l01 1 1.0 1.0153e+00 1.0 3.50e+07 1.4 5.9e+06 3.9e+04 1.2e+01 0 0 0 1 1 0 0 2 4 1 1079887 PCGAMG Opt l01 1 1.0 7.4812e-02 1.0 5.35e+05 1.2 3.2e+06 1.1e+03 9.0e+00 0 0 0 0 0 0 0 1 0 1 232542 PCGAMG Gal l02 1 1.0 1.8063e+00 1.0 7.43e+07 0.0 3.0e+06 5.9e+04 1.2e+01 0 0 0 1 1 0 0 1 3 1 593392 PCGAMG Opt l02 1 1.0 1.1580e-01 1.1 6.93e+05 0.0 1.6e+06 1.3e+03 9.0e+00 0 0 0 0 0 0 0 0 0 1 93213 PCGAMG Gal l03 1 1.0 6.1075e+00 1.0 2.72e+08 0.0 2.6e+05 9.2e+04 1.1e+01 0 0 0 0 1 0 0 0 0 1 36155 PCGAMG Opt l03 1 1.0 8.0836e-02 1.0 1.55e+06 0.0 1.4e+05 1.4e+03 8.0e+00 0 0 0 0 0 0 0 0 0 1 18229 PCGAMG Gal l04 1 1.0 1.6203e+01 1.0 9.44e+08 0.0 1.4e+04 3.0e+05 1.1e+01 0 0 0 0 1 0 0 0 0 1 2366 PCGAMG Opt l04 1 1.0 1.2663e-01 1.0 2.01e+06 0.0 6.9e+03 2.2e+03 8.0e+00 0 0 0 0 0 0 0 0 0 1 817 PCGAMG Gal l05 1 1.0 1.4800e+00 1.0 3.16e+08 0.0 9.0e+01 1.6e+05 1.1e+01 0 0 0 0 1 0 0 0 0 1 796 PCGAMG Opt l05 1 1.0 8.1763e-02 1.1 2.50e+06 0.0 4.8e+01 4.6e+03 8.0e+00 0 0 0 0 0 0 0 0 0 1 114 PCSetUp 2 1.0 7.7969e+01 1.0 1.97e+09 2.8 8.3e+07 3.3e+04 8.1e+02 2 2 3 14 44 2 11 23 43 82 341051 
PCSetUpOnBlocks 22 1.0 2.4609e-0317.2 1.46e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 592 PCApply 22 1.0 3.6455e+00 1.1 3.57e+09 1.2 2.4e+08 4.3e+03 0.0e+00 0 7 8 5 0 0 43 67 16 0 29434967
--- Event Stage 1: PCSetUp
BuildTwoSided 4 1.0 1.5980e-01 2.7 0.00e+00 0.0 2.1e+05 8.0e+00 0.0e+00 0 0 0 0 0 0 0 1 0 0 0 BuildTwoSidedF 6 1.0 1.3169e+01 5.5 0.00e+00 0.0 1.9e+06 1.9e+05 0.0e+00 0 0 0 2 0 28 0 10 51 0 0 SFSetGraph 5 1.0 4.9640e-0519.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetUp 4 1.0 1.6038e-01 2.3 0.00e+00 0.0 6.4e+05 9.1e+02 0.0e+00 0 0 0 0 0 0 0 3 0 0 0 SFPack 30 1.0 3.3376e-04 4.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 30 1.0 1.2101e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 30 1.0 1.5544e-01 1.5 1.87e+08 1.2 1.0e+07 5.5e+03 0.0e+00 0 0 0 0 0 0 31 53 8 0 35930640 MatAssemblyBegin 43 1.0 1.3201e+01 4.7 0.00e+00 0.0 1.9e+06 1.9e+05 0.0e+00 0 0 0 2 0 28 0 10 51 0 0 MatAssemblyEnd 43 1.0 1.1159e+01 1.0 2.77e+07705.7 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 1 26 0 0 0 13 1036 MatZeroEntries 6 1.0 4.7315e-0410.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatTranspose 12 1.0 2.5142e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMatMultSym 10 1.0 5.8783e-0117.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatPtAPSymbolic 5 1.0 1.4489e+01 1.0 0.00e+00 0.0 6.2e+06 3.6e+04 3.5e+01 0 0 0 1 2 34 0 32 31 22 0 MatPtAPNumeric 6 1.0 2.8457e+01 1.0 1.50e+09 5.1 2.7e+06 1.6e+05 2.0e+01 1 1 0 2 1 66 66 14 61 13 421190 MatGetLocalMat 6 1.0 9.8574e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetBrAoCol 6 1.0 3.7669e-01 2.3 0.00e+00 0.0 5.1e+06 3.8e+04 0.0e+00 0 0 0 1 0 0 0 27 28 0 0 VecTDot 66 1.0 6.5271e-02 4.1 5.85e+06 1.0 0.0e+00 0.0e+00 6.6e+01 0 0 0 0 4 0 1 0 0 42 2922260 VecNorm 36 1.0 1.1226e-02 3.2 3.19e+06 1.0 0.0e+00 0.0e+00 3.6e+01 0 0 0 0 2 0 1 0 0 23 9268067 VecCopy 12 1.0 1.2805e-03 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 11 1.0 6.6620e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 60 1.0 1.0763e-03 1.5 5.32e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 161104914 VecAYPX 24 1.0 2.0581e-03 1.3 2.13e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 33701038 VecPointwiseMult 36 1.0 3.5709e-03 1.3 1.60e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14567861 VecScatterBegin 30 1.0 2.9079e-03 7.8 0.00e+00 0.0 1.0e+07 5.5e+03 0.0e+00 0 0 0 0 0 0 0 53 8 0 0 VecScatterEnd 30 1.0 3.7015e-0263.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 7 1.0 2.3165e-01 1.0 2.04e+08 1.2 1.0e+07 5.5e+03 1.0e+02 0 0 0 0 6 1 34 53 8 64 26654598 PCGAMG Gal l00 1 1.0 4.7415e+00 1.0 2.94e+08 1.1 1.8e+06 7.8e+04 0.0e+00 0 1 0 1 0 11 53 9 20 0 2015623 PCGAMG Gal l01 1 1.0 1.2103e+00 1.0 3.50e+07 1.4 4.8e+06 6.2e+04 1.2e+01 0 0 0 2 1 3 6 25 41 8 905938 PCGAMG Gal l02 1 1.0 3.4334e+00 1.0 7.41e+07 0.0 2.2e+06 8.7e+04 1.2e+01 0 0 0 1 1 8 6 11 27 8 312184 PCGAMG Gal l03 1 1.0 9.6062e+00 1.0 2.71e+08 0.0 1.9e+05 1.3e+05 1.1e+01 0 0 0 0 1 22 1 1 4 7 22987 PCGAMG Gal l04 1 1.0 2.2482e+01 1.0 9.43e+08 0.0 8.7e+03 4.8e+05 1.1e+01 1 0 0 0 1 52 0 0 1 7 1705 PCGAMG Gal l05 1 1.0 1.5961e+00 1.1 3.16e+08 0.0 6.8e+01 2.2e+05 1.1e+01 0 0 0 0 1 4 0 0 0 7 738 PCSetUp 1 1.0 4.3191e+01 1.0 1.70e+09 3.6 1.9e+07 3.7e+04 1.6e+02 1 1 1 4 9 100100100100100 420463
--- Event Stage 2: KSP Solve only
SFPack 8140 1.0 7.4247e-02 4.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 8140 1.0 1.2905e-02 5.2 5.50e+0637.9 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1267207 MatMult 5500 1.0 2.9994e+01 1.2 3.98e+10 1.1 2.0e+09 6.1e+03 0.0e+00 1 76 68 62 0 70 92 78 98 0 40747181 MatMultAdd 1320 1.0 6.2192e+00 2.7 7.97e+08 1.2 2.8e+08 4.6e+02 0.0e+00 0 2 10 1 0 14 2 11 1 0 3868976 MatMultTranspose 1320 1.0 4.0304e+00 1.7 8.00e+08 1.2 2.8e+08 4.6e+02 0.0e+00 0 2 10 1 0 7 2 11 1 0 5974153 MatSolve 220 0.0 6.7366e-03 0.0 7.41e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1100 MatLUFactorSym 1 1.0 5.8691e-0435.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 1 1.0 1.5955e-03756.2 1.46e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 913 MatResidual 1320 1.0 6.4920e+00 1.3 8.27e+09 1.2 4.4e+08 5.5e+03 0.0e+00 0 15 15 13 0 14 19 18 20 0 38146350 MatGetRowIJ 1 0.0 2.7820e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 0.0 9.6940e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecTDot 440 1.0 4.6162e+00 6.9 2.31e+08 1.0 0.0e+00 0.0e+00 4.4e+02 0 0 0 0 24 5 1 0 0 66 1635124 VecNorm 230 1.0 3.9605e-02 1.6 1.21e+08 1.0 0.0e+00 0.0e+00 2.3e+02 0 0 0 0 13 0 0 0 0 34 99622387 VecCopy 3980 1.0 5.4166e-01 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 4640 1.0 1.4216e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 440 1.0 4.2829e-02 1.3 2.31e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 176236363 VecAYPX 8130 1.0 7.3998e-01 1.2 5.78e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 2 1 0 0 0 25489392 VecAXPBYCZ 2640 1.0 3.9974e-01 1.5 5.85e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 1 0 0 0 47716315 VecPointwiseMult 5280 1.0 5.9845e-01 1.5 2.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 1 0 0 0 12748927 VecScatterBegin 8140 1.0 4.9231e-01 5.9 0.00e+00 0.0 2.5e+09 4.9e+03 0.0e+00 0 0 87 64 0 1 0100100 0 0 VecScatterEnd 8140 1.0 1.0172e+01 3.6 5.50e+0637.9 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 13 0 0 0 0 1608 KSPSetUp 1 1.0 9.5996e-07 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 10 1.0 3.9685e+01 1.0 4.33e+10 1.1 2.5e+09 4.9e+03 6.7e+02 1 83 87 64 37 100100100100100 33637495 PCSetUp 1 1.0 2.4149e-0318.1 1.46e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 603 PCSetUpOnBlocks 220 1.0 2.6945e-03 8.9 1.46e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 540 PCApply 220 1.0 3.2921e+01 1.1 3.57e+10 1.2 2.3e+09 4.3e+03 0.0e+00 1 67 81 53 0 81 80 93 82 0 32595360
Memory usage is given in bytes:
Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Container 112 112 69888 0.
SNES 1 1 1532 0.
DMSNES 1 1 720 0.
Distributed Mesh 449 449 30060888 0.
DM Label 790 790 549840 0.
Quadrature 579 579 379824 0.
Index Set 100215 100210 361926232 0.
IS L to G Mapping 8 13 4356552 0.
Section 771 771 598296 0.
Star Forest Graph 897 897 1053640 0.
Discrete System 521 521 533512 0.
GraphPartitioner 118 118 91568 0.
Matrix 432 462 2441805304 0.
Matrix Coarsen 6 6 4032 0.
Vector 354 354 65492968 0.
Linear Space 7 7 5208 0.
Dual Space 111 111 113664 0.
FE Space 7 7 5992 0.
Field over DM 6 6 4560 0.
Krylov Solver 21 21 37560 0.
DMKSP interface 1 1 704 0.
Preconditioner 21 21 21632 0.
Viewer 2 1 896 0.
PetscRandom 12 12 8520 0.
--- Event Stage 1: PCSetUp
Index Set 10 15 85367336 0.
IS L to G Mapping 5 0 0 0.
Star Forest Graph 5 5 6600 0.
Matrix 50 20 73134024 0.
Vector 28 28 6235096 0.
--- Event Stage 2: KSP Solve only
Index Set 5 5 8296 0.
Matrix 1 1 273856 0.
========================================================================================================================
Average time to get PetscTime(): 6.40051e-08
Average time for MPI_Barrier(): 8.506e-06
Average time for zero size MPI_Send(): 6.6027e-06
#PETSc Option Table entries:
-benchmark_it 10
-dm_distribute
-dm_plex_box_dim 3
-dm_plex_box_faces 32,32,32
-dm_plex_box_lower 0,0,0
-dm_plex_box_simplex 0
-dm_plex_box_upper 1,1,1
-dm_refine 5
-ksp_converged_reason
-ksp_max_it 150
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-log_view
-matptap_via scalable
-mg_levels_esteig_ksp_max_it 5
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 2
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-pc_gamg_agg_nsmooths 1
-pc_gamg_coarse_eq_limit 2000
-pc_gamg_coarse_grid_layout_type spread
-pc_gamg_esteig_ksp_max_it 5
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 500
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 1
-pc_gamg_threshold 0.01
-pc_gamg_threshold_scale .5
-pc_gamg_type agg
-pc_type gamg
-petscpartitioner_simple_node_grid 8,8,8
-petscpartitioner_simple_process_grid 4,4,4
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_converged_reason
-snes_max_it 1
-snes_monitor
-snes_rtol 1.e-8
-snes_type ksponly
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with 64 bit PetscInt
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: CC=mpifccpx CXX=mpiFCCpx CFLAGS="-L /opt/FJSVxtclanga/tcsds-1.2.29/lib64 -lfjlapack" CXXFLAGS="-L /opt/FJSVxtclanga/tcsds-1.2.29/lib64 -lfjlapack" COPTFLAGS=-Kfast CXXOPTFLAGS=-Kfast --with-fc=0 --package-prefix-hash=/home/ra010009/a04199/petsc-hash-pkgs --with-batch=1 --with-shared-libraries=yes --with-debugging=no --with-64-bit-indices=1 PETSC_ARCH=arch-fugaku-fujitsu
-----------------------------------------
Libraries compiled on 2021-02-12 02:27:41 on fn01sv08
Machine characteristics: Linux-3.10.0-957.27.2.el7.x86_64-x86_64-with-redhat-7.6-Maipo
Using PETSc directory: /home/ra010009/a04199/petsc
Using PETSc arch:
-----------------------------------------
Using C compiler: mpifccpx -L /opt/FJSVxtclanga/tcsds-1.2.29/lib64 -lfjlapack -fPIC -Kfast -----------------------------------------
Using include paths: -I/home/ra010009/a04199/petsc/include -I/home/ra010009/a04199/petsc/arch-fugaku-fujitsu/include -----------------------------------------
Using C linker: mpifccpx Using libraries: -Wl,-rpath,/home/ra010009/a04199/petsc/lib -L/home/ra010009/a04199/petsc/lib -lpetsc -Wl,-rpath,/opt/FJSVxos/devkit/aarch64/lib/gcc/aarch64-linux-gnu/8 -L/opt/FJSVxos/devkit/aarch64/lib/gcc/aarch64-linux-gnu/8 -Wl,-rpath,/opt/FJSVxtclanga/tcsds-1.2.29/lib64 -L/opt/FJSVxtclanga/tcsds-1.2.29/lib64 -Wl,-rpath,/opt/FJSVxtclanga/.common/MELI022/lib64 -L/opt/FJSVxtclanga/.common/MELI022/lib64 -Wl,-rpath,/opt/FJSVxos/devkit/aarch64/aarch64-linux-gnu/lib64 -L/opt/FJSVxos/devkit/aarch64/aarch64-linux-gnu/lib64 -Wl,-rpath,/opt/FJSVxos/devkit/aarch64/rfs/usr/lib64 -L/opt/FJSVxos/devkit/aarch64/rfs/usr/lib64 -Wl,-rpath,/opt/FJSVxos/devkit/aarch64/rfs/opt/FJSVxos/mmm/lib64 -L/opt/FJSVxos/devkit/aarch64/rfs/opt/FJSVxos/mmm/lib64 -Wl,-rpath,/opt/FJSVxtclanga/tcsds-1.2.29/lib64/nofjobj -L/opt/FJSVxtclanga/tcsds-1.2.29/lib64/nofjobj -lX11 -lfjprofmpi -lfjlapack -ldl -lmpi_cxx -lmpi -lfjstring_internal -lfj90i -lfj90fmt_sve -lfj90f -lfjsrcinfo -lfjcrt -lfjprofcore -lfjprofomp -lfjc++ -lfjc++abi -lfjdemgl -lmpg -lm -lrt -lpthread -lelf -lz -lgcc_s -ldl -----------------------------------------
participants (6): Barry Smith, Jed Brown, Junchao Zhang, Mark Adams, Matthew Knepley, Stefano Zampini