Hey Firedrakers,

The attached simple code is failing in parallel but works fine in serial. Specifically, when I run

    mpirun -n 1 python3 failing_example.py

everything is fine. However,

    mpirun -n 2 python3 failing_example.py

fails with a segfault in PETSc.

As far as I can tell (by adding print statements to various parts of the Firedrake source), the segfault is occurring in the set_function method of the _SNESContext class from solving_utils.py, specifically line 130:

    with self._F.dat.vec_wo as v:

I don't know enough about the internals of Firedrake to diagnose the issue beyond that. Any ideas what could be going wrong?

I am running a new copy of Firedrake on Ubuntu 18.04.

Regards,
Chris Eldred

--
Chris Eldred
https://www.math.univ-paris13.fr/~eldred/
Research Scientist, INRIA/Laboratoire Jean Kuntzmann
Postdoctoral Fellow, LAGA, University of Paris 13
PhD, Atmospheric Science, Colorado State University, 2015
DOE Computational Science Graduate Fellow (Alumni)
B.S. Applied Computational Physics, Carnegie Mellon University, 2009
chris.eldred@gmail.com
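(The failing_example.py attachment is not reproduced in the archive. For context, a minimal parallel Firedrake solve in the same spirit, purely an illustrative stand-in rather than the attached code, would look something like this:)

    from firedrake import *

    # A small Poisson problem; under MPI the mesh is distributed across ranks.
    mesh = UnitSquareMesh(8, 8)
    V = FunctionSpace(mesh, "CG", 1)

    u = TrialFunction(V)
    v = TestFunction(V)
    a = inner(grad(u), grad(v)) * dx
    L = inner(Constant(1.0), v) * dx
    bc = DirichletBC(V, 0.0, "on_boundary")

    uh = Function(V)
    # solve() routes through _SNESContext, where the segfault was reported.
    solve(a == L, uh, bcs=[bc])
    print(f"rank {mesh.comm.rank} of {mesh.comm.size}: solve completed")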
Hi Chris,
Ubuntu's openmpi is a "known broken" MPI implementation. We recently switched to using mpich, but possibly not cleanly. What MPI implementation are you using? You can check with

    mpicc -show
    mpiexec --help

Cheers,
Lawrence
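(A complementary check, an illustrative addition rather than part of the original exchange, is to ask mpi4py from inside the Firedrake virtualenv which MPI library Python itself is linked against:)

    # Report the MPI library behind Python's MPI bindings.
    # Get_library_version() wraps the MPI-3 call MPI_Get_library_version.
    from mpi4py import MPI

    print(MPI.Get_library_version())
    # e.g. "Open MPI v2.1.1, ..." or "MPICH Version: ..."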
Hey Lawrence,

Yep that seems to be the issue! If I run mpicc -show in the Firedrake virtual environment I get back

    gcc -I/usr/lib/x86_64-linux-gnu/openmpi/include/openmpi -I/usr/lib/x86_64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent -I/usr/lib/x86_64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/include -I/usr/lib/x86_64-linux-gnu/openmpi/include -pthread -L/usr//lib -L/usr/lib/x86_64-linux-gnu/openmpi/lib -lmpi

I've attached my firedrake-install.log.

-Chris
Hey Lawrence,

I followed the advice at https://github.com/firedrakeproject/firedrake/issues/1325 and used

    sudo update-alternatives --config mpi
    sudo update-alternatives --config mpirun

to select mpich instead of openmpi. Now mpicc -show gives

    gcc -Wl,-Bsymbolic-functions -Wl,-z,relro -I/usr/include/mpich -L/usr/lib/x86_64-linux-gnu -lmpich

which seems right. But when I run in parallel using mpirun -n X python3 foo.py, it simply runs X copies of foo.py in serial (as noted in the issue referenced above).

What is the correct way to get Firedrake working in parallel on Ubuntu 18.04? Also, I would suggest writing up some instructions and putting them on the website (I am happy to do this), since the "obvious" approach of mpirun -n X python3 foo.py has a nasty and non-obvious failure mode.

Thanks!
-Chris
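(This failure mode is easy to test for directly, since it shows up as every process believing it is rank 0 of a 1-process world. A minimal check, illustrative and not from the original thread, with a hypothetical file name:)

    # rank_check.py -- a minimal MPI sanity check.
    # Run with: mpirun -n 2 python3 rank_check.py
    # A healthy setup prints "rank 0 of 2" and "rank 1 of 2".
    # If each process prints "rank 0 of 1", mpirun and Python's MPI
    # library come from different MPI implementations.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    print(f"rank {comm.rank} of {comm.size}")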
You also need to rebuild firedrake to link against MPICH. Run with

    firedrake-install --no-package-manager ...

Cheers,
Lawrence
That worked, thanks for the help!

-Chris
participants (2):
- Chris Eldred
- Lawrence Mitchell