What is going on here? It looks like you are using subprocess. Why would you do that on a cluster rather than MPI?

  Matt

---------- Forwarded message ----------
From: Buesing, Henrik <HBuesing@eonerc.rwth-aachen.de>
Date: Thu, Mar 30, 2017 at 8:25 AM
Subject: AW: [petsc-maint] Weak scaling test: Fieldsplit questions
To: Matthew Knepley <knepley@gmail.com>
Cc: Barry Smith <bsmith@mcs.anl.gov>, Hong <hzhang@mcs.anl.gov>, "petsc-maint@mcs.anl.gov" <petsc-maint@mcs.anl.gov>

[Buesing, Henrik] In my opinion, there is some kind of race condition in Firedrake when running on more than one node. Until this is fixed, it is very unlikely that I will get the 64-core case running.

Hmm, we are running Firedrake in parallel with no problems here. What is the error?

[Buesing, Henrik] See [1] for the error message and the three attached logs (for the 32-core case, 2 of 5 runs completed and 3 of 5 crashed). This is just for running the compiled code; I had problems during the compile stage, too. What I did is the following:

1) Run Firedrake on 1 node (this works). Now all the *.so files are in place.
2) Run Firedrake on more than one node. This crashes more often the more processes I use.

I suspect a race condition, because on 17 cores (1 node + 1 core) my problem runs fine. On 32 cores it sometimes runs, and on 64 cores it has, so far, never run. But if you are not having these problems, and if the provided code reproduces the MatCreateSubMats problem, then you can run tests on your own. A lot of ifs, but better than nothing.

Thank you!
Henrik

[1]
Traceback (most recent call last):
  File "/work/hb111949/Firedrake/twophase/2pDrake/2pinjection.py", line 228, in <module>
    solver = NonlinearVariationalSolver(problem, options_prefix="")
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/firedrake/firedrake/variational_solver.py", line 156, in __init__
    pre_function_callback=pre_f_callback)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/firedrake/firedrake/solving_utils.py", line 260, in __init__
    form_compiler_parameters=fcp)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/firedrake/firedrake/assemble.py", line 143, in create_assembly_callable
    collect_loops=True)
  File "<decorator-gen-279>", line 2, in _assemble
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/firedrake/firedrake/utils.py", line 62, in wrapper
    return f(*args, **kwargs)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/firedrake/firedrake/assemble.py", line 192, in _assemble
    kernels = tsfc_interface.compile_form(f, "form", parameters=form_compiler_parameters, inverse=inverse)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/firedrake/firedrake/tsfc_interface.py", line 193, in compile_form
    number_map).kernels
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/PyOP2/pyop2/caching.py", line 200, in __new__
    obj = make_obj()
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/PyOP2/pyop2/caching.py", line 190, in make_obj
    obj.__init__(*args, **kwargs)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/firedrake/firedrake/tsfc_interface.py", line 121, in __init__
    kernels.append(KernelInfo(kernel=Kernel(ast, ast.name, opts=opts),
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/PyOP2/pyop2/caching.py", line 200, in __new__
    obj = make_obj()
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/PyOP2/pyop2/caching.py", line 190, in make_obj
    obj.__init__(*args, **kwargs)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/PyOP2/pyop2/base.py", line 3843, in __init__
    self._code = self._ast_to_c(self._ast, opts)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/PyOP2/pyop2/sequential.py", line 73, in _ast_to_c
    ast_handler.plan_cpu(self._opts)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/COFFEE/coffee/plan.py", line 121, in plan_cpu
    loop_opt.rewrite(rewrite)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/COFFEE/coffee/optimizer.py", line 117, in rewrite
    ew.sharing_graph_rewrite()
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/COFFEE/coffee/rewriter.py", line 619, in sharing_graph_rewrite
    prob.solve(ilp.GLPK(msg=0))
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/lib/python2.7/site-packages/pulp/pulp.py", line 1651, in solve
    status = solver.actualSolve(self, **kwargs)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/lib/python2.7/site-packages/pulp/solvers.py", line 383, in actualSolve
    rc = subprocess.call(proc, stdout = pipe, stderr = pipe)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/lib/python2.7/site-packages/subprocess32.py", line 578, in call
    p = Popen(*popenargs, **kwargs)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/lib/python2.7/site-packages/subprocess32.py", line 825, in __init__
    restore_signals, start_new_session)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/lib/python2.7/site-packages/subprocess32.py", line 1574, in _execute_child
    raise child_exception_type(errno_num, err_msg)
OSError: [Errno 14] Bad address

Thanks,

  Matt

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
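[Editor's note on where the traceback bottoms out: PuLP solves COFFEE's ILP by shelling out to an external GLPK binary via subprocess.call, so during kernel compilation every MPI rank performs its own fork/exec, which is the behavior Matt is questioning. A minimal sketch of that call pattern follows; it uses a trivial Python child command in place of the real solver binary, which is an assumption for illustration only.]

```python
import subprocess
import sys

# Sketch of the call pattern at the bottom of the traceback: invoke an
# external program, silence its output, and inspect the return code.
# A trivial Python child stands in for the real solver executable here.
proc = [sys.executable, "-c", "print('solver ran')"]
rc = subprocess.call(proc,
                     stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL)
print("return code:", rc)  # 0 when the child exits cleanly
```

In serial this is unremarkable; the thread's point is that when every MPI rank on every node performs such a fork/exec at once, the launch becomes fragile, and here it fails inside _execute_child with OSError: [Errno 14] Bad address before the child ever runs.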