-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Justin,

On 06/08/15 06:16, Justin Chang wrote:
> Hi everyone,
>
> I have installed Firedrake on my university's HPC machine, and whenever I attempt to run any Firedrake program, I get this error:
>
> --------------------------------------------------------------------------
> An MPI process has executed an operation involving a call to the
> "fork()" system call to create a child process. Open MPI is currently
> operating in a condition that could result in memory corruption or
> other system errors; your MPI job may hang, crash, or produce silent
> data corruption. The use of fork() (or system() or other calls that
> create child processes) is strongly discouraged.
>
> The process that invoked fork was:
>
> Local host: compute-0-0 (PID 28214)
> MPI_COMM_WORLD rank: 0
>
> If you are *absolutely sure* that your application will successfully
> and correctly survive a call to fork(), you may disable this warning
> by setting the mpi_warn_on_fork MCA parameter to 0.

So I recently made some changes to PyOP2 to make us more robust in the face of OpenMPI not allowing forking, which we need to do to invoke compilers when jit-compiling code. To do this, we attempt to fork a single process /before/ MPI is initialized (which is safe, because OpenMPI doesn't see it); this child process then does all subsequent forks. Naturally, this will fail if MPI is already initialized by the time we come to fork.

So possibly the programs you're running are initialising MPI? Let's check some things.

Let's first try something that doesn't invoke fork at all:

cat > no-fork.py << EOF
from mpi4py import MPI
print MPI.COMM_WORLD.size
EOF

mpiexec -n 2 python no-fork.py

Now something that does call fork, but /before/ initialising MPI:

cat > fork-before.py << EOF
import os

def my_fork():
    ret = os.fork()
    if ret == 0:
        print 'child exiting'
        os._exit(0)
    else:
        pass

my_fork()
from mpi4py import MPI
print MPI.COMM_WORLD.size
EOF

mpiexec -n 2 python fork-before.py

I hope this one works!

Now fork afterwards (which I expect to fail with the error message above):

cat > fork-after.py << EOF
import os

def my_fork():
    ret = os.fork()
    if ret == 0:
        print 'child exiting'
        os._exit(0)
    else:
        pass

from mpi4py import MPI
print MPI.COMM_WORLD.size
my_fork()
EOF

mpiexec -n 2 python fork-after.py

Now something more like how PyOP2/Firedrake does things:

cat > closer-test.py << EOF
import os
import socket

def child(sock):
    val = sock.recv(1)
    import mpi4py.rc
    mpi4py.rc.initialize = False
    from mpi4py import MPI
    print 'In child', MPI.Is_initialized()
    os._exit(0)

def parent(sock):
    from mpi4py import MPI
    print 'In parent', MPI.Is_initialized()
    sock.send("1")

a, b = socket.socketpair()
ret = os.fork()
if ret == 0:
    a.close()
    child(b)
else:
    b.close()
    parent(a)
EOF

mpiexec -n 2 python closer-test.
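
As an aside, the pre-fork idea described at the top of this mail can be sketched in plain Python with no MPI in it at all. This is only an illustration, not PyOP2's actual code: the function names (start_fork_server, run_in_helper, stop_fork_server) and the one-JSON-line-per-command protocol are made up for the example, and it is written for Python 3, unlike the Python 2 test scripts in this mail. The point is that the process that will later initialise MPI forks exactly once, up front, and afterwards only talks to the helper over a socket; the helper does every later fork.

```python
import json
import os
import socket
import subprocess
import sys


def _serve(sock):
    """Helper-process loop: run each requested command, reply with its exit status."""
    stream = sock.makefile("rwb")
    while True:
        line = stream.readline()
        if not line:  # the parent closed its end, so we are done
            break
        argv = json.loads(line.decode())
        try:
            # The fork/exec happens here, in the helper, not in the MPI process.
            status = subprocess.call(argv)
        except OSError:
            status = 127  # the command could not be started
        stream.write(("%d\n" % status).encode())
        stream.flush()


def start_fork_server():
    """Fork the helper (hypothetical name). Call this before MPI is initialised."""
    ours, theirs = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        ours.close()
        try:
            _serve(theirs)
        finally:
            os._exit(0)  # never fall back into the parent's code
    theirs.close()
    return pid, ours, ours.makefile("rwb")


def run_in_helper(stream, argv):
    """Ask the helper to run argv and return the exit status it reports."""
    stream.write(json.dumps(argv).encode() + b"\n")
    stream.flush()
    return int(stream.readline().decode())


def stop_fork_server(pid, sock, stream):
    """Close our end of the socket (the helper then sees EOF) and reap the helper."""
    stream.close()
    sock.close()
    os.waitpid(pid, 0)


if __name__ == "__main__":
    pid, sock, stream = start_fork_server()
    # "from mpi4py import MPI" would go here, after the helper already exists.
    print(run_in_helper(stream, [sys.executable, "-c", "pass"]))
    stop_fork_server(pid, sock, stream)
```

In a real setting the command sent to the helper would be the compiler invocation; here it is just a trivial Python subprocess so the sketch can be run anywhere.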

Now let's try doing it the way PyOP2/Firedrake does this:

cat > fork-pyop2.py << EOF
from pyop2_utils import enable_mpi_prefork
enable_mpi_prefork()
from mpi4py import MPI
print MPI.COMM_WORLD.size
EOF

mpiexec -n 2 python fork-pyop2.py

I hope this works, because it's effectively just doing what fork-before.py does.

Now let's just run pyop2 on its own (the script is deliberately not called pyop2.py, since a file of that name in the current directory would shadow the pyop2 package on import):

cat > pyop2-init.py << EOF
from pyop2 import op2
op2.init()
EOF

mpiexec -n 2 python pyop2-init.py

And then firedrake:

cat > import-firedrake.py << EOF
from firedrake import *
EOF

mpiexec -n 2 python import-firedrake.py

And finally a short test in firedrake:

cat > firedrake-test.py << EOF
from firedrake import *
mesh = UnitSquareMesh(3, 3)
print assemble(Constant(1)*dx(domain=mesh))
EOF

mpiexec -n 2 python firedrake-test.py

Hopefully these tests will allow us to better see where things are going wrong.

Cheers,

Lawrence
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBAgAGBQJVwyZ9AAoJECOc1kQ8PEYv+dEIAIYn6MfkhLS1XVKbqzTfhQ6T
Yb+uoGm2/hXnUki5JYoVRWrWrc3gOYDBxBFWBEBKQHy/d5tzutDvEZyM66nmzAhl
YXSZEcfputIbT9d6VlmAzdjW39Yi/V6v+imuuyIhsAVDo8P/J5bD4xR2Q6DC+v30
+QglNfStcAfuQGrlfE7uQpR0SV4+PdkpQHCsbhuV8fGrXptQTSB+Q6GqNxrIK72X
BmLR20dLZCW01pW0GYoSqak92E8SpFgaFTScPHHj4jV2yDyJpvWBnuxcdbfnOV3r
0hOh2gk2pHRcHdetL/pdhdQ2WkXevXTtrGeeqwMaw19Jq/XRaQa9umR4m1O5FKU=
=68Vi
-----END PGP SIGNATURE-----