What is going on here? It looks like you are using subprocess. Why would you do that on a cluster rather than MPI?
Matt
---------- Forwarded message ----------
From: Buesing, Henrik <HBuesing@eonerc.rwth-aachen.de>
Date: Thu, Mar 30, 2017 at 8:25 AM
Subject: Re: [petsc-maint] Weak scaling test: Fieldsplit questions
To: Matthew Knepley <knepley@gmail.com>
Cc: Barry Smith <bsmith@mcs.anl.gov>, Hong <hzhang@mcs.anl.gov>, "petsc-maint@mcs.anl.gov" <petsc-maint@mcs.anl.gov>
In my opinion, there is some kind of race condition in Firedrake when running on more than one node. Until this is fixed, it is very unlikely that I will get the 64-core case running.
Hmm, we are running Firedrake in parallel with no problems here. What is the error?
[Buesing, Henrik] See [1] for the error message and the three attached logs (for the 32-core case, 2 of 5 runs succeeded and 3 of 5 crashed).
This is just for running the compiled code. During the compile stage I had problems, too. What I did was the following: 1) run Firedrake on 1 node (this works); now all the *.so files are in place. 2) Run Firedrake on more than one node. This crashes more often the more processes I use.
I suspect a race condition because on 17 cores (1 node + 1 core) my problem runs fine, on 32 cores it sometimes runs, and on 64 cores it has, so far, never run.
But if you are not having these problems, and if the provided code reproduces the MatCreateSubMats problem, then you can run tests on your own. A lot of ifs, but better than nothing.
Thank you!
Henrik
[1]
Traceback (most recent call last):
  File "/work/hb111949/Firedrake/twophase/2pDrake/2pinjection.py", line 228, in <module>
    solver = NonlinearVariationalSolver(problem,options_prefix="")
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/firedrake/firedrake/variational_solver.py", line 156, in __init__
    pre_function_callback=pre_f_callback)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/firedrake/firedrake/solving_utils.py", line 260, in __init__
    form_compiler_parameters=fcp)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/firedrake/firedrake/assemble.py", line 143, in create_assembly_callable
    collect_loops=True)
  File "<decorator-gen-279>", line 2, in _assemble
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/firedrake/firedrake/utils.py", line 62, in wrapper
    return f(*args, **kwargs)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/firedrake/firedrake/assemble.py", line 192, in _assemble
    kernels = tsfc_interface.compile_form(f, "form", parameters=form_compiler_parameters, inverse=inverse)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/firedrake/firedrake/tsfc_interface.py", line 193, in compile_form
    number_map).kernels
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/PyOP2/pyop2/caching.py", line 200, in __new__
    obj = make_obj()
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/PyOP2/pyop2/caching.py", line 190, in make_obj
    obj.__init__(*args, **kwargs)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/firedrake/firedrake/tsfc_interface.py", line 121, in __init__
    kernels.append(KernelInfo(kernel=Kernel(ast, ast.name, opts=opts),
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/PyOP2/pyop2/caching.py", line 200, in __new__
    obj = make_obj()
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/PyOP2/pyop2/caching.py", line 190, in make_obj
    obj.__init__(*args, **kwargs)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/PyOP2/pyop2/base.py", line 3843, in __init__
    self._code = self._ast_to_c(self._ast, opts)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/PyOP2/pyop2/sequential.py", line 73, in _ast_to_c
    ast_handler.plan_cpu(self._opts)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/COFFEE/coffee/plan.py", line 121, in plan_cpu
    loop_opt.rewrite(rewrite)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/COFFEE/coffee/optimizer.py", line 117, in rewrite
    ew.sharing_graph_rewrite()
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/COFFEE/coffee/rewriter.py", line 619, in sharing_graph_rewrite
    prob.solve(ilp.GLPK(msg=0))
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/lib/python2.7/site-packages/pulp/pulp.py", line 1651, in solve
    status = solver.actualSolve(self, **kwargs)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/lib/python2.7/site-packages/pulp/solvers.py", line 383, in actualSolve
    rc = subprocess.call(proc, stdout = pipe, stderr = pipe)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/lib/python2.7/site-packages/subprocess32.py", line 578, in call
    p = Popen(*popenargs, **kwargs)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/lib/python2.7/site-packages/subprocess32.py", line 825, in __init__
    restore_signals, start_new_session)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/lib/python2.7/site-packages/subprocess32.py", line 1574, in _execute_child
    raise child_exception_type(errno_num, err_msg)
OSError: [Errno 14] Bad address
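The last frames show the trigger: PuLP shells out to the external GLPK binary via subprocess, so with N MPI ranks you get N simultaneous fork/exec attempts. A common way to sidestep that class of race is to run the external command on one rank only and broadcast the result. A minimal sketch of that pattern (call_on_rank0 is a hypothetical helper, not PyOP2 or PuLP API; any object with mpi4py-style .rank and .bcast works as the communicator):

```python
import subprocess


def call_on_rank0(cmd, comm):
    """Run cmd as a subprocess on rank 0 only; every rank gets the return code.

    comm is any communicator-like object exposing .rank and .bcast,
    e.g. mpi4py's MPI.COMM_WORLD.
    """
    # Only rank 0 forks a child process; the others wait at the broadcast.
    rc = subprocess.call(cmd) if comm.rank == 0 else None
    # All ranks receive rank 0's return code.
    return comm.bcast(rc, root=0)
```

With mpi4py this would be invoked as call_on_rank0(cmd, MPI.COMM_WORLD); the broadcast also acts as a synchronization point, so no rank proceeds until the external solver has finished.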
Thanks,
Matt
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener