Installing Firedrake on an HPC machine
Hi everyone,

I have installed firedrake on my university's HPC machine, and whenever I attempt to run any Firedrake program, I get this error:

--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

Local host: compute-0-0 (PID 28214)
MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------

Why is this happening? The program seems fine when I only use a few processes, but when I use two or more compute nodes, my program hangs.

I'll do my best to explain how I built firedrake:

1) Our system has openmpi-1.8.3 and gcc-4.9.2 available, so those are the only modules I have loaded; everything else was downloaded.
2) I built Python 2.7.10 from source and installed it in ${HOME}/.local
3) pip was obtained via python get-pip.py
4) PETSc was installed with --with-cc=mpicc and --with-cxx=mpicxx, and has all the necessary external packages
5) petsc4py was installed via git clone ... and set up with python setup.py build/install
6) PyOP2 was built with python setup.py build_ext --inplace
7) Firedrake was built via make, and everything else was installed via 'pip install <package>'. I had to manually install swig and pcre.

Any ideas what this may be? And/or how I can circumvent this issue?

Thanks,
Justin
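P.S. In case it helps, a rough sketch of the build sequence above as shell commands (module names are site-specific, the petsc4py clone URL is left as a placeholder as above, and the PETSc external-package flags shown are illustrative only, not the exact flags used):

module load openmpi-1.8.3 gcc-4.9.2      # exact module names are site-specific

# Python 2.7.10 from source into ${HOME}/.local, then pip
./configure --prefix=${HOME}/.local && make && make install
python get-pip.py

# PETSc with the MPI compiler wrappers; external-package flags are illustrative
./configure --with-cc=mpicc --with-cxx=mpicxx --download-ctetgen   # plus the other external packages
make all

# petsc4py (URL elided above), PyOP2, then Firedrake
git clone <petsc4py-repo-url> && cd petsc4py
python setup.py build && python setup.py install
cd ../PyOP2 && python setup.py build_ext --inplace
cd ../firedrake && make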
Hi Justin,

On 06/08/15 06:16, Justin Chang wrote:
Hi everyone,
I have installed firedrake on my university's HPC machine, and whenever i attempt to run any Firedrake program, I get this error:
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.
The process that invoked fork was:
Local host: compute-0-0 (PID 28214)
MPI_COMM_WORLD rank: 0
If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
So I recently made some changes to PyOP2 to make us more robust in the face of OpenMPI not allowing forking, which we need to do to invoke compilers when jit-compiling code. To do this, we attempt to fork a single process /before/ MPI is initialized (which is safe, because OpenMPI doesn't see it); this child process then does subsequent forks. Naturally, this will fail if MPI is already initialized by the time we come to fork. So possibly the programs you're running are initialising MPI?

Let's check some things.

Let's first try something that doesn't invoke fork at all:

cat > no-fork.py << EOF
from mpi4py import MPI
print MPI.COMM_WORLD.size
EOF
mpiexec -n 2 python no-fork.py

Now something that does call fork, but /before/ initialising MPI:

cat > fork-before.py << EOF
import os

def my_fork():
    ret = os.fork()
    if ret == 0:
        print 'child exiting'
        os._exit(0)
    else:
        pass

my_fork()
from mpi4py import MPI
print MPI.COMM_WORLD.size
EOF
mpiexec -n 2 python fork-before.py

I hope this one works!

Now fork afterwards (which I expect to fail with the error message above):

cat > fork-after.py << EOF
import os

def my_fork():
    ret = os.fork()
    if ret == 0:
        print 'child exiting'
        os._exit(0)
    else:
        pass

from mpi4py import MPI
print MPI.COMM_WORLD.size
my_fork()
EOF
mpiexec -n 2 python fork-after.py

Now something more like how PyOP2/Firedrake does things:

cat > closer-test.py << EOF
import os
import socket

def child(sock):
    val = sock.recv(1)
    import mpi4py.rc
    mpi4py.rc.initialize = False
    from mpi4py import MPI
    print 'In child', MPI.Is_initialized()
    os._exit(0)

def parent(sock):
    from mpi4py import MPI
    print 'In parent', MPI.Is_initialized()
    sock.send("1")

a, b = socket.socketpair()
ret = os.fork()
if ret == 0:
    a.close()
    child(b)
else:
    b.close()
    parent(a)
EOF
mpiexec -n 2 python closer-test.py

Now let's try doing it the way PyOP2/firedrake does this:

cat > fork-pyop2.py << EOF
from pyop2_utils import enable_mpi_prefork
enable_mpi_prefork()
from mpi4py import MPI
print MPI.COMM_WORLD.size
EOF
mpiexec -n 2 python fork-pyop2.py

I hope this works, because it's effectively just doing what fork-before.py does.

Now let's just run PyOP2 on its own:

cat > pyop2.py << EOF
from pyop2 import op2
op2.init()
EOF
mpiexec -n 2 python pyop2.py

And then firedrake:

cat > import-firedrake.py << EOF
from firedrake import *
EOF
mpiexec -n 2 python import-firedrake.py

And finally a short test in firedrake:

cat > firedrake-test.py << EOF
from firedrake import *
mesh = UnitSquareMesh(3, 3)
print assemble(Constant(1)*dx(domain=mesh))
EOF
mpiexec -n 2 python firedrake-test.py

Hopefully these tests will allow us to better see where things are going wrong.

Cheers,
Lawrence
Lawrence,
mpiexec -n 2 python no-fork.py
2
2
mpiexec -n 2 python fork-before.py
child exiting
child exiting
2
2
mpiexec -n 2 python fork-after.py
2
2
child exiting
child exiting
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

Local host: compute-0-0 (PID 43057)
MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
[compute-0-0.local:43055] 1 more process has sent help message help-mpi-runtime.txt / mpi_init:warn-fork
[compute-0-0.local:43055] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
mpiexec -n 2 python closer-test.py
In parent True
In parent True
In child False
In child False
mpiexec -n 2 python fork-pyop2.py
2
2

The last three examples (pyop2.py, import-firedrake.py, and firedrake-test.py) did not run; they fail with "ImportError: cannot import name op2". And now all of my Firedrake programs run into this exact error, which is confusing.

Thanks,
Justin
On 06/08/15 15:26, Justin Chang wrote:
Lawrence,
mpiexec -n 2 python no-fork.py
2
2
OK, good.
mpiexec -n 2 python fork-before.py
child exiting
child exiting
2
2
Also good.
mpiexec -n 2 python fork-after.py
2
2
child exiting
child exiting
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.
The process that invoked fork was:
Local host: compute-0-0 (PID 43057)
MPI_COMM_WORLD rank: 0
If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
[compute-0-0.local:43055] 1 more process has sent help message help-mpi-runtime.txt / mpi_init:warn-fork
[compute-0-0.local:43055] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
As expected, we fork()ed after MPI_Init and so OpenMPI complains.
mpiexec -n 2 python closer-test.py
In parent True
In parent True
In child False
In child False
OK good, so if we fork before MPI_Init then OpenMPI doesn't complain.
mpiexec -n 2 python fork-pyop2.py
2
2
So this one just does the same as before, but using the PyOP2-internal "early fork", so that works too.
The last three examples pyop2.py, import-firedrake.py, and firedrake-test.py did not run because they say "ImportError: cannot import name op2". And now all of my firedrake programs run into this exact error, which is confusing.
Ah, the pyop2.py example was badly named. If you rename it to run-pyop2.py and do

mpiexec -n 2 python run-pyop2.py

I hope things work again. What's happened is that this file, named "pyop2", is now on your PYTHONPATH, so when Python tries to import pyop2 it looks in this file for all the symbols rather than in the proper PyOP2 package. Can you do that and then try again?

Lawrence
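P.S. In other words, something like the following (the rm of the stale byte-compiled file is my own addition; a leftover pyop2.pyc in the working directory would shadow the real package in just the same way):

mv pyop2.py run-pyop2.py
rm -f pyop2.pyc          # a stale byte-compiled copy would still shadow the real PyOP2 package
mpiexec -n 2 python run-pyop2.py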
mpiexec -n 2 python run-pyop2.py
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

Local host: compute-0-0 (PID 49631)
MPI_COMM_WORLD rank: 1

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
[compute-0-0.local:49628] 1 more process has sent help message help-mpi-runtime.txt / mpi_init:warn-fork
[compute-0-0.local:49628] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

mpiexec -n 2 python import-firedrake.py

--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

Local host: compute-0-0 (PID 49798)
MPI_COMM_WORLD rank: 1

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
[compute-0-0.local:49795] 1 more process has sent help message help-mpi-runtime.txt / mpi_init:warn-fork
[compute-0-0.local:49795] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
mpiexec -n 2 python firedrake-test.py
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

Local host: compute-0-0 (PID 49534)
MPI_COMM_WORLD rank: 1

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
Compiling form form

Compiler stage 1: Analyzing form(s)
-----------------------------------
  Geometric dimension:       2
  Number of cell subdomains: 0
  Rank:                      0
  Arguments:                 '()'
  Number of coefficients:    1
  Coefficients:              '[w_0]'
  Unique elements:           'R0(?)'
  Unique sub elements:       'R0(?)'
  representation:            quadrature
  quadrature_degree:         auto --> 0
  quadrature_rule:           auto --> default
Compiler stage 1 finished in 0.0059092 seconds.

Compiler stage 2: Computing intermediate representation
-------------------------------------------------------
  Computing representation of integrals
  Computing quadrature representation
  Transforming cell integral
Compiler stage 2 finished in 0.0390902 seconds.

Compiler stage 3: Optimizing intermediate representation
--------------------------------------------------------
  Skipping optimizations, add -O to optimize
Compiler stage 3 finished in 0.000159979 seconds.

Compiler stage 4 finished in 0.00311899 seconds.

FFC finished in 0.0489531 seconds.
[0] pyop2:INFO Compiling wrapper...
[0] pyop2:INFO Compiling wrapper...done
1.0
1.0
[compute-0-0.local:49531] 1 more process has sent help message help-mpi-runtime.txt / mpi_init:warn-fork
[compute-0-0.local:49531] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Thanks,
Justin
On 06/08/15 15:42, Justin Chang wrote:
mpiexec -n 2 python run-pyop2.py
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.
Huh, OK. I was expecting this to work! To confirm, run-pyop2.py just contains:

from pyop2 import op2
op2.init()

Can you edit some files in PyOP2 just to check what's going on.

In PyOP2/pyop2/__init__.py replace:

from pyop2_utils import enable_mpi_prefork

with:

print 'init pyop2'
from pyop2_utils import enable_mpi_prefork

In PyOP2/pyop2/mpi.py replace:

from decorator import decorator

with:

print 'init mpi'
from decorator import decorator

In PyOP2/pyop2_utils/__init__.py replace:

prefork.enable_prefork()

with:

print 'forking'
prefork.enable_prefork()

And then try running run-pyop2.py on just one MPI process. I see:

init pyop2
forking
init mpi

But maybe you get something else?

And then let's test if the following works:

cat > manual-fork.py << EOF
from pyop2_utils import enable_mpi_prefork
enable_mpi_prefork()
from firedrake import *
from mpi4py import MPI
print MPI.COMM_WORLD.size
EOF
mpiexec -n 2 python manual-fork.py

Cheers,
Lawrence
Yes run-pyop2.py contains exactly those two lines
mpiexec -n 1 python run-pyop2.py
init pyop2
forking
init mpi
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

Local host: compute-0-0 (PID 52215)
MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------

So I think I am getting exactly what you're getting, except I am still getting the fork error. And when I try your latest test:
mpiexec -n 2 python manual-fork.py
forking
forking
init pyop2
init mpi
init pyop2
init mpi
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

Local host: compute-0-0 (PID 52345)
MPI_COMM_WORLD rank: 1

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
2
2
[compute-0-0.local:52342] 1 more process has sent help message help-mpi-runtime.txt / mpi_init:warn-fork
[compute-0-0.local:52342] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
On 06/08/15 16:05, Justin Chang wrote:
...
So I think I am getting exactly what you're getting except I am still getting the fork error. And when I try your latest test:
OK, so I think I know what's going on. I thought there was only one place in the PyOP2/Firedrake code where fork was called from, but that was incorrect. Can you try the following:

Edit PyOP2/pyop2/__init__.py and remove the lines:

from ._version import get_versions
__version__ = get_versions(default={"version": ver, "full": ""})['version']
del get_versions

and edit firedrake/firedrake/__init__.py and similarly remove the lines:

from ._version import get_versions
__version__ = get_versions(default={"version": ver, "full": ""})['version']
del get_versions

And then try running the latest test, then the run-pyop2 test and the firedrake-test examples. Fingers crossed, this will nail the problem!

Cheers,
Lawrence
Still the same errors. I just commented out the lines you mentioned; did I need to rebuild (or clear the cache of) anything?
On 6 Aug 2015, at 16:29, Justin Chang <jychang48@gmail.com> wrote:
Still same errors. I just commented out the lines you have mentioned, did I need to rebuild (or clear the cache of) anything?
I don't think so. I thought that was all the instances, but I may have missed some. I'll have a play around and try and get back to you tomorrow with a proper fix.

Cheers,
Lawrence
Okay, that's fine. Again, thank you very much for all your help.

Justin
Hi Justin,
On 6 Aug 2015, at 16:44, Justin Chang <jychang48@gmail.com> wrote:
Okay, that's fine. Again thank you very much for all your help.
Can you please try with the "prefork-everywhere" branches of firedrake and PyOP2?

If you're using checkouts of firedrake and PyOP2, do:

git fetch origin
git checkout prefork-everywhere

in both firedrake and PyOP2, and to be safe, rebuild the extension modules.

In PyOP2:

make ext

In firedrake:

make clean all

Alternately, if you've installed them via pip, do the normal pip install except:

pip install git+https://github.com/OP2/PyOP2.git@prefork-everywhere#egg=PyOP2
pip install git+https://github.com/firedrakeproject/firedrake.git@prefork-everywhere#egg=fir...

On my system, having installed a handler like OpenMPI does to check if I'm calling fork after MPI initialisation, I get no output when running firedrake programs, indicating that I'm not forking in an MPI process.

Hope this solves the problem!

Cheers,
Lawrence
Hi Lawrence,

After several trial-and-error experiments these last few days, I still get the warning even with the pre-fork branches. However, if I use our system's intel-compiled libraries and compilers, I get no such errors when using mpirun. I suspect that this may have something to do with the HPC system that I am running on.

But I am now running into a stranger issue:

I have attached an RT0 code that I am working with. Basically, it takes as input the seed size (aka the number of cells to generate in each spatial direction). The command line argument goes like this: "mpirun -n 1 python Test_RT0.py <seed>"

I get strange errors when the seed number changes. I ran all problems with -log_trace. For <seed> = 3, I get this:

$ python Test_RT0.py 3
Discretization: RT0
[0] 2.14577e-06 Event begin: DMPlexStratify
[0] 0.00011301 Event end: DMPlexStratify
[0] 0.000148058 Event begin: VecSet
[0] 0.000167131 Event end: VecSet

The program freezes at this point. I had to forcibly cancel the process.

For <seed>=4, I get this:

$ python Test_RT0.py 4
Discretization: RT0
[0] 3.09944e-06 Event begin: DMPlexStratify
[0] 0.000130892 Event end: DMPlexStratify
[0] 0.000170946 Event begin: VecSet
[0] 0.000179052 Event end: VecSet
[0] 0.00339103 Event begin: DMPlexStratify
[0] 0.00343394 Event end: DMPlexStratify
[0] 0.00343895 Event begin: DMPlexInterp
[0] 0.00394988 Event begin: DMPlexStratify
[0] 0.00421691 Event end: DMPlexStratify
[0] 0.00490594 Event begin: DMPlexStratify
[0] 0.00530601 Event end: DMPlexStratify
[0] 0.00533199 Event end: DMPlexInterp
[0] 0.00535703 Event begin: VecSet
[0] 0.00536108 Event end: VecSet
[0] 0.00722694 Event begin: VecSet
[0] 0.0072329 Event end: VecSet
[0] 0.00725293 Event begin: SFSetGraph
[0] 0.00726795 Event end: SFSetGraph
[0] 0.0721381 Event begin: SFSetGraph
[0] 0.0721519 Event end: SFSetGraph
[0] 0.0721741 Event begin: SFSetGraph
[0] 0.0721779 Event end: SFSetGraph
[0] 0.072628 Event begin: VecSet
[0] 0.0726349 Event end: VecSet
[0] 0.122044 Event begin: SFSetGraph
[0] 0.122067 Event end: SFSetGraph
[0] 0.122077 Event begin: SFSetGraph
[0] 0.122081 Event end: SFSetGraph
[0] 0.122534 Event begin: VecSet
[0] 0.122541 Event end: VecSet
[0] 0.123546 Event begin: SFSetGraph
[0] 0.123561 Event end: SFSetGraph
[0] 0.12357 Event begin: SFSetGraph
[0] 0.123574 Event end: SFSetGraph
[0] 0.123893 Event begin: VecSet
[0] 0.1239 Event end: VecSet
[0] 0.12432 Event begin: VecSet
[0] 0.124328 Event end: VecSet
[0] 0.124644 Event begin: VecScatterBegin
[0] 0.124655 Event end: VecScatterBegin
[0] 0.124675 Event begin: VecScatterBegin
[0] 0.124679 Event end: VecScatterBegin
[0] 0.124693 Event begin: VecSet
[0] 0.124697 Event end: VecSet
MPI processes 1: solving...
[0] 0.19036 Event begin: MatAssemblyBegin
[0] 0.190368 Event end: MatAssemblyBegin
[0] 0.190371 Event begin: MatAssemblyEnd
[0] 0.190405 Event end: MatAssemblyEnd
[0] 0.190592 Event begin: MatAssemblyBegin
[0] 0.190598 Event end: MatAssemblyBegin
[0] 0.1906 Event begin: MatAssemblyEnd
[0] 0.190623 Event end: MatAssemblyEnd
[0] 0.190784 Event begin: MatAssemblyBegin
[0] 0.190789 Event end: MatAssemblyBegin
[0] 0.190792 Event begin: MatAssemblyEnd
[0] 0.190802 Event end: MatAssemblyEnd
[0] 0.190931 Event begin: MatAssemblyBegin
[0] 0.190937 Event end: MatAssemblyBegin
[0] 0.190939 Event begin: MatAssemblyEnd
[0] 0.190948 Event end: MatAssemblyEnd
pyop2:INFO Compiling wrapper...
Traceback (most recent call last):
  File "Test_RT0.py", line 80, in <module>
    solver = LinearSolver(A,solver_parameters=selfp_parameters,options_prefix="selfp_")
  File "/home/jchang23/firedrake-deps/firedrake/firedrake/linear_solver.py", line 83, in __init__
    self.ksp.setOperators(A=self.A.M.handle, P=self.P.M.handle)
  File "/home/jchang23/firedrake-deps/firedrake/firedrake/matrix.py", line 145, in M
    self._M._force_evaluation()
  File "/home/jchang23/firedrake-deps/PyOP2/pyop2/base.py", line 1565, in _force_evaluation
    _trace.evaluate(reads, writes)
  File "/home/jchang23/firedrake-deps/PyOP2/pyop2/base.py", line 169, in evaluate
    comp._run()
  File "/home/jchang23/firedrake-deps/PyOP2/pyop2/base.py", line 4014, in _run
    return self.compute()
  File "/home/jchang23/firedrake-deps/PyOP2/pyop2/base.py", line 4038, in compute
    fun = self._jitmodule
  File "/home/jchang23/firedrake-deps/PyOP2/pyop2/utils.py", line 64, in __get__
    obj.__dict__[self.__name__] = result = self.fget(obj)
  File "/home/jchang23/firedrake-deps/PyOP2/pyop2/sequential.py", line 158, in _jitmodule
    direct=self.is_direct, iterate=self.iteration_region)
  File "/home/jchang23/firedrake-deps/PyOP2/pyop2/caching.py", line 203, in __new__
    obj = make_obj()
  File "/home/jchang23/firedrake-deps/PyOP2/pyop2/caching.py", line 193, in make_obj
    obj.__init__(*args, **kwargs)
  File "/home/jchang23/firedrake-deps/PyOP2/pyop2/host.py", line 704, in __init__
    self.compile()
  File "/home/jchang23/firedrake-deps/PyOP2/pyop2/host.py", line 802, in compile
    compiler=compiler.get('name'))
  File "/home/jchang23/firedrake-deps/PyOP2/pyop2/compilation.py", line 269, in load
    dll = compiler.get_so(src, extension)
  File "/home/jchang23/firedrake-deps/PyOP2/pyop2/compilation.py", line 138, in get_so
    Original error: %s""" % (cc, logfile, errfile, e))
pyop2.exceptions.CompilationError: Command "['mpicc', '-std=c99', '-fPIC', '-Wall', '-g', '-O3', '-fno-tree-vectorize', '-I/home/jchang23/petsc-dev/include', '-I/home/jchang23/petsc-dev/arch-linux2-c-opt/include', '-I/home/jchang23/firedrake-deps/firedrake/firedrake', '-I/home/jchang23/firedrake-deps/PyOP2/pyop2', '-msse', '-o', '/tmp/pyop2-cache-uid3003/32e0bd01cb649f218f2092c503c1d41f.so.tmp', '/tmp/pyop2-cache-uid3003/32e0bd01cb649f218f2092c503c1d41f.c', '-shared', '-L/home/jchang23/petsc-dev/lib', '-L/home/jchang23/petsc-dev/arch-linux2-c-opt/lib', '-Wl,-rpath,/home/jchang23/petsc-dev/lib', '-Wl,-rpath,/home/jchang23/petsc-dev/arch-linux2-c-opt/lib', '-lpetsc', '-lm']" returned with error.
Unable to compile code
Compile log in /tmp/pyop2-cache-uid3003/32e0bd01cb649f218f2092c503c1d41f.log
Compile errors in /tmp/pyop2-cache-uid3003/32e0bd01cb649f218f2092c503c1d41f.err
Original error: status 1 invoking 'mpicc -std=c99 -fPIC -Wall -g -O3 -fno-tree-vectorize -I/home/jchang23/petsc-dev/include -I/home/jchang23/petsc-dev/arch-linux2-c-opt/include -I/home/jchang23/firedrake-deps/firedrake/firedrake -I/home/jchang23/firedrake-deps/PyOP2/pyop2 -msse -o /tmp/pyop2-cache-uid3003/32e0bd01cb649f218f2092c503c1d41f.so.tmp /tmp/pyop2-cache-uid3003/32e0bd01cb649f218f2092c503c1d41f.c -shared -L/home/jchang23/petsc-dev/lib -L/home/jchang23/petsc-dev/arch-linux2-c-opt/lib -Wl,-rpath,/home/jchang23/petsc-dev/lib -Wl,-rpath,/home/jchang23/petsc-dev/arch-linux2-c-opt/lib -lpetsc -lm'

For <seed>=5 I get this:

$ python Test_RT0.py 5
Discretization: RT0
[0] 1.90735e-06 Event begin: DMPlexStratify
[0] 0.000158072 Event end: DMPlexStratify
[0] 0.000201941 Event begin: VecSet
[0] 0.000209093 Event end: VecSet
Traceback (most recent call last):
  File "Test_RT0.py", line 31, in <module>
    mesh = UnitCubeMesh(seed, seed, seed)
  File "/home/jchang23/firedrake-deps/firedrake/firedrake/utility_meshes.py", line 511, in UnitCubeMesh
    return CubeMesh(nx, ny, nz, 1, reorder=reorder)
  File "/home/jchang23/firedrake-deps/firedrake/firedrake/utility_meshes.py", line 491, in CubeMesh
    return BoxMesh(nx, ny, nz, L, L, L, reorder=reorder)
  File "/home/jchang23/firedrake-deps/firedrake/firedrake/utility_meshes.py", line 443, in BoxMesh
    plex = PETSc.DMPlex().generate(boundary)
  File "PETSc/DMPlex.pyx", line 451, in petsc4py.PETSc.DMPlex.generate (src/petsc4py.PETSc.c:221438)
petsc4py.PETSc.Error: error code 77
[0] DMPlexGenerate() line 1080 in /home/jchang23/petsc-dev/src/dm/impls/plex/plexgenerate.c
[0] DMPlexGenerate_CTetgen() line 834 in /home/jchang23/petsc-dev/src/dm/impls/plex/plexgenerate.c
[0] TetGenTetrahedralize() line 21483 in /home/jchang23/petsc-dev/arch-linux2-c-opt/externalpackages/ctetgen/ctetgen.c
[0] TetGenMeshDelaunizeVertices() line 12113 in /home/jchang23/petsc-dev/arch-linux2-c-opt/externalpackages/ctetgen/ctetgen.c
[0] TetGenMeshDelaunayIncrFlip() line 12046 in /home/jchang23/petsc-dev/arch-linux2-c-opt/externalpackages/ctetgen/ctetgen.c
[0] TetGenMeshInsertVertexBW() line 11559 in /home/jchang23/petsc-dev/arch-linux2-c-opt/externalpackages/ctetgen/ctetgen.c
[0] TetGenMeshInSphereS() line 5411 in /home/jchang23/petsc-dev/arch-linux2-c-opt/externalpackages/ctetgen/ctetgen.c
[0] Petsc has generated inconsistent data
[0] This is wrong

I am very confused by these strange results. Any explanation for these?

Thanks,
Justin
On 10/08/15 18:16, Justin Chang wrote:
Hi Lawrence,
After several trial and run experimentations these last few days, I still get the warning even with the pre-fork branches. However, if I use our system's intel-compiled libraries and compilers, I get no such errors when using mpirun. I suspect that this may have something to do with the HPC system that I am running on.
Reading between the lines, it seems to be some bad interaction between OpenMPI, forking, and the infiniband OpenFabrics transport. I have filed a bug that will remind us about it: I think our local HPC system also has the same setup, but I have not run there recently.
But I am now running into a stranger issue:
I have attached an RT0 code that I am working with. Basically, it takes as input the seed size (aka the number of cells to generate in each spatial direction). The command line argument goes like this: "mpirun -n 1 python Test_RT0.py <seed>"
I get strange errors when the seed number changes. I ran all problems with -log_trace. For <seed> = 3, i get this:
$ python Test_RT0.py 3
...
pyop2.exceptions.CompilationError: Command "['mpicc', '-std=c99', '-fPIC', '-Wall', '-g', '-O3', '-fno-tree-vectorize', '-I/home/jchang23/petsc-dev/include', '-I/home/jchang23/petsc-dev/arch-linux2-c-opt/include', '-I/home/jchang23/firedrake-deps/firedrake/firedrake', '-I/home/jchang23/firedrake-deps/PyOP2/pyop2', '-msse', '-o', '/tmp/pyop2-cache-uid3003/32e0bd01cb649f218f2092c503c1d41f.so.tmp',
'/tmp/pyop2-cache-uid3003/32e0bd01cb649f218f2092c503c1d41f.c',
'-shared', '-L/home/jchang23/petsc-dev/lib', '-L/home/jchang23/petsc-dev/arch-linux2-c-opt/lib', '-Wl,-rpath,/home/jchang23/petsc-dev/lib', '-Wl,-rpath,/home/jchang23/petsc-dev/arch-linux2-c-opt/lib', '-lpetsc', '-lm']" returned with error.
Unable to compile code
This error, I think, comes from using intel compilers, but passing gcc options. Can you try with:

parameters["pyop2_options"]["compiler"] = "intel"

which has some prebaked options for the intel compiler.
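A minimal sketch of where that setting would go, modelled on the earlier heredoc examples (the script name and mesh size are placeholders, not the actual Test_RT0.py code; the only essential line is the parameters one):

cat > intel-compiler-test.py << EOF
from firedrake import *
# Set the PyOP2 compiler before any form is assembled, so the generated
# wrappers are built with the intel-specific flags rather than the gcc ones.
parameters["pyop2_options"]["compiler"] = "intel"
mesh = UnitCubeMesh(4, 4, 4)
print assemble(Constant(1)*dx(domain=mesh))
EOF
mpirun -n 1 python intel-compiler-test.py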
For <seed>=5 I get this:
petsc4py.PETSc.Error: error code 77
[0] DMPlexGenerate() line 1080 in /home/jchang23/petsc-dev/src/dm/impls/plex/plexgenerate.c
[0] DMPlexGenerate_CTetgen() line 834 in /home/jchang23/petsc-dev/src/dm/impls/plex/plexgenerate.c
[0] TetGenTetrahedralize() line 21483 in /home/jchang23/petsc-dev/arch-linux2-c-opt/externalpackages/ctetgen/ctetgen.c
[0] TetGenMeshDelaunizeVertices() line 12113 in /home/jchang23/petsc-dev/arch-linux2-c-opt/externalpackages/ctetgen/ctetgen.c
[0] TetGenMeshDelaunayIncrFlip() line 12046 in /home/jchang23/petsc-dev/arch-linux2-c-opt/externalpackages/ctetgen/ctetgen.c
[0] TetGenMeshInsertVertexBW() line 11559 in /home/jchang23/petsc-dev/arch-linux2-c-opt/externalpackages/ctetgen/ctetgen.c
[0] TetGenMeshInSphereS() line 5411 in /home/jchang23/petsc-dev/arch-linux2-c-opt/externalpackages/ctetgen/ctetgen.c
[0] Petsc has generated inconsistent data
[0] This is wrong
I am very confused by these strange results. Any explanation for these?
This one, I /think/ I've seen before. I believe that the intel compiler produces bad code for ctetgen with optimisations on. I think this is plausibly a petsc-only problem; try with the same setup but do:

from petsc4py import PETSc
bdy = PETSc.DMPlex().create()
bdy.setDimension(2)
bdy.createCubeBoundary([0, 0, 0], [1, 1, 1], [5, 5, 5])
dm = PETSc.DMPlex().generate(bdy)

I don't have access to the intel compiler stack here, but maybe it's worth reporting the bug in this form (assuming it shows up) to petsc-maint.

If you're able to use gmsh to build the appropriate meshes, that is probably a more robust option. Alternately you could try (although I never have) using tetgen (rather than ctetgen). I think you'll need to configure PETSc with --with-clanguage=c++ --download-tetgen (rather than --download-ctetgen).

Cheers,
Lawrence
Lawrence,

When I compile everything with MPICH-3.1.4 on this machine, I get no complaints whatsoever. It only happens when I use OpenMPI. I don't like the default binding options (or lack thereof) for MPICH and would prefer to use OpenMPI. Could this have something to do with the compilers that I am using? And/or how I am configuring openmpi and/or python?

I could try this out on another HPC system I have access to (Intel Xeon E5-2670) to see if I can reproduce the problem, but this other machine has a firewall and makes the installation process even more troublesome...

Justin
On 13 Aug 2015, at 19:53, Justin Chang <jychang48@gmail.com> wrote:
Lawrence,
When I compile everything with MPICH-3.1.4 on this machine, I get no complains whatsoever. It only happens when I use OpenMPI. I don't like the default binding options (or lack thereof) for MPICH and would prefer to use OpenMPI. Could this have something to do with the compilers that I am using? And/or how I am configuring openmpi and/or python?
I don't think it's to do with how you're configuring openmpi. It's rather that the infiniband support is "known bad" when forking; see this OpenMPI FAQ: https://www.open-mpi.org/faq/?category=openfabrics#ofa-fork

Our use of fork falls into the "calling system() or popen()" case, so plausibly you might be able to turn off that warning and continue. However, I recall you saying that your code just hangs when you do this, so maybe that's no good.
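For concreteness, a minimal sketch of what that would look like on the command line (the process count and script name are placeholders; the second command, which excludes the openib transport to test whether the InfiniBand layer is responsible for the hang, is my own suggestion rather than something from that FAQ, and will cost performance):

# Suppress the fork() warning via the MCA parameter named in the message
mpirun --mca mpi_warn_on_fork 0 -n 40 python myprogram.py

# Assumption: fall back to TCP by excluding the openib BTL, purely as a
# diagnostic to see whether the InfiniBand transport is what makes fork unsafe
mpirun --mca btl ^openib -n 40 python myprogram.py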
I could try this out on another HPC system i have access to (Intel Xeon E5-2670) to see if I can reproduce the problem, but this other machine has a firewall and makes the installation process even more troublesome...
I think we have infiniband-based clusters here, so hopefully we can reproduce at this end. There do appear to be some issues with robustness on these kinds of systems though, so I'm definitely keen to fix things.

Lawrence
Okay, so on one compute node (20 cores, 2 sockets) everything works fine, even with the warning (originally my code hung at 20 cores). However, if my sbatch script calls for more than one compute node, my program freezes for anything > 20 processes. This also happens when I use MPICH, so now I am not sure if it's simply an issue with our university's HPC system or if MPICH has the same problems as OpenMPI.
On 18/08/15 08:38, Justin Chang wrote:
Okay so on one compute node (20 cores, 2 sockets) works fine, even with the warning (originally my code hangs at 20 cores). However, if my sbatch script calls for more than one compute node my program freezes for anything > 20 processes. However, this also happens when I use MPICH. Now I am not sure if it's simply an issue with our university's HPC system or if MPICH also has the same problems as OpenMPI.
Hmm, plausibly something goes wrong somewhere in code generation (such that different processes go down different code paths). Can you try running with:

parameters["pyop2_options"]["check_src_hashes"] = True

Cheers,
Lawrence
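P.S. A minimal sketch of a driver that sets this, modelled on the firedrake-test.py example from earlier in the thread (the script name is a placeholder and the option line is the only essential part; run it across the two nodes where the hang appears):

cat > hash-check-test.py << EOF
from firedrake import *
# Check that all ranks generated identical source before compiling, so a
# diverging rank produces an error rather than a silent hang.
parameters["pyop2_options"]["check_src_hashes"] = True
mesh = UnitSquareMesh(3, 3)
print assemble(Constant(1)*dx(domain=mesh))
EOF
mpiexec -n 40 python hash-check-test.py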
Hi Lawrence,

Coming back to this.

I have managed to install Firedrake on LANL's Mustang machine (Intel Xeon E5-2670) through the firedrake-install script (with --developer --no_package_manager --disable_ssh), and it works great on a single compute node (16 cores). However, I am still running into the same issue: the program freezes when I use two or more compute nodes. I even did a simple mpirun -n 2 python myprogram.py where I allocated 2 nodes and 1 ppn.

As with my university's cluster, I am using their openmpi-1.6.5 but installed my own Python version, because the system-installed python/2.7-anaconda doesn't have the latest packages (i.e., my install script freezes when I attempt to use the provided python/pip).

I copied and pasted 'parameters["pyop2_options"]["check_src_hashes"] = True' into my code, but it didn't do anything.

Any suggestions or ideas on what I should do?

Justin
Hi Justin,

I can't remember if this was ever resolved, and it clearly dropped off my radar.
On 2 Oct 2015, at 09:22, Justin Chang <jychang48@gmail.com> wrote:
Hi Lawrence,
Coming back to this.
I have managed to install Firedrake on LANL’s Mustang Machine (Intel Xeon E5-2670) through the firedrake-install script (w/ —developer —no_package_manager —disable_ssh), and it works great on a single compute node (16 cores). However, I am still running into the same issue: the program freezes when I use two or more compute nodes.
The just-in-time code generation requires that the generated code is written to a filesystem all the MPI processes can see: on more than one node this can't be a node-local temporary directory. Maybe things work if you set the environment variables

FIREDRAKE_FFC_KERNEL_CACHE_DIR

and

PYOP2_CACHE_DIR

to directories that all the processes can see.

Lawrence
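P.S. Concretely, something like the following (the directory is just a placeholder; any path on a filesystem that every rank can see, such as your home or scratch directory, will do):

# Hypothetical shared cache location under $HOME, visible to all compute nodes
export FIREDRAKE_FFC_KERNEL_CACHE_DIR=$HOME/firedrake-cache/ffc
export PYOP2_CACHE_DIR=$HOME/firedrake-cache/pyop2
mkdir -p $FIREDRAKE_FFC_KERNEL_CACHE_DIR $PYOP2_CACHE_DIR
mpiexec -n 32 python myprogram.py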
Lawrence,

Yes, this issue had been resolved. I did exactly that.

Thanks,
Justin
Participants (2): Justin Chang, Lawrence Mitchell