What is going on here? It looks like you are using subprocess. Why would you do that on a cluster rather than MPI?

  Matt

---------- Forwarded message ----------
From: Buesing, Henrik <HBuesing@eonerc.rwth-aachen.de>
Date: Thu, Mar 30, 2017 at 8:25 AM
Subject: AW: [petsc-maint] Weak scaling test: Fieldsplit questions
To: Matthew Knepley <knepley@gmail.com>
Cc: Barry Smith <bsmith@mcs.anl.gov>, Hong <hzhang@mcs.anl.gov>, "petsc-maint@mcs.anl.gov" <petsc-maint@mcs.anl.gov>


In my opinion, there is some kind of race condition in Firedrake when running on more than one node. Thus, until this is fixed, it is very unlikely that I will get the 64-core case running.


Hmm, we are running Firedrake in parallel with no problems here. What is the error?

 

[Buesing, Henrik] See [1] for the error message and the three attached logs (in the 32-core case, 2 of 5 runs succeeded and 3 of 5 crashed).

This is just for running the compiled code. During the compile stage I had problems, too. What I did is the following:
1) Run Firedrake on 1 node (this works). Afterwards, all the *.so files are in place.
2) Run Firedrake on more than one node. This crashes more often the more processes I use.

I suspect a race condition, because on 17 cores (1 node + 1 core) my problem runs fine. On 32 cores it sometimes runs, and on 64 cores it has never run so far.

But if you are not having these problems, and if the provided code reproduces the MatCreateSubMats problem, then you can run tests on your own. Well, a lot of ifs, but better than nothing.

Thank you!
Henrik

 

[1]

 

Traceback (most recent call last):
  File "/work/hb111949/Firedrake/twophase/2pDrake/2pinjection.py", line 228, in <module>
    solver = NonlinearVariationalSolver(problem,options_prefix="")
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/firedrake/firedrake/variational_solver.py", line 156, in __init__
    pre_function_callback=pre_f_callback)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/firedrake/firedrake/solving_utils.py", line 260, in __init__
    form_compiler_parameters=fcp)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/firedrake/firedrake/assemble.py", line 143, in create_assembly_callable
    collect_loops=True)
  File "<decorator-gen-279>", line 2, in _assemble
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/firedrake/firedrake/utils.py", line 62, in wrapper
    return f(*args, **kwargs)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/firedrake/firedrake/assemble.py", line 192, in _assemble
    kernels = tsfc_interface.compile_form(f, "form", parameters=form_compiler_parameters, inverse=inverse)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/firedrake/firedrake/tsfc_interface.py", line 193, in compile_form
    number_map).kernels
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/PyOP2/pyop2/caching.py", line 200, in __new__
    obj = make_obj()
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/PyOP2/pyop2/caching.py", line 190, in make_obj
    obj.__init__(*args, **kwargs)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/firedrake/firedrake/tsfc_interface.py", line 121, in __init__
    kernels.append(KernelInfo(kernel=Kernel(ast, ast.name, opts=opts),
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/PyOP2/pyop2/caching.py", line 200, in __new__
    obj = make_obj()
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/PyOP2/pyop2/caching.py", line 190, in make_obj
    obj.__init__(*args, **kwargs)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/PyOP2/pyop2/base.py", line 3843, in __init__
    self._code = self._ast_to_c(self._ast, opts)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/PyOP2/pyop2/sequential.py", line 73, in _ast_to_c
    ast_handler.plan_cpu(self._opts)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/COFFEE/coffee/plan.py", line 121, in plan_cpu
    loop_opt.rewrite(rewrite)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/COFFEE/coffee/optimizer.py", line 117, in rewrite
    ew.sharing_graph_rewrite()
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/src/COFFEE/coffee/rewriter.py", line 619, in sharing_graph_rewrite
    prob.solve(ilp.GLPK(msg=0))
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/lib/python2.7/site-packages/pulp/pulp.py", line 1651, in solve
    status = solver.actualSolve(self, **kwargs)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/lib/python2.7/site-packages/pulp/solvers.py", line 383, in actualSolve
    rc = subprocess.call(proc, stdout = pipe, stderr = pipe)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/lib/python2.7/site-packages/subprocess32.py", line 578, in call
    p = Popen(*popenargs, **kwargs)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/lib/python2.7/site-packages/subprocess32.py", line 825, in __init__
    restore_signals, start_new_session)
  File "/rwthfs/rz/cluster/work/hb111949/Firedrake/firedrake/lib/python2.7/site-packages/subprocess32.py", line 1574, in _execute_child
    raise child_exception_type(errno_num, err_msg)
OSError: [Errno 14] Bad address
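For reading the last line: the frames above show COFFEE's sharing_graph_rewrite handing an ILP to PuLP's GLPK interface, which launches an external solver process through subprocess.call. The failure is raised from subprocess32's _execute_child, i.e. while that child process was being spawned. Errno 14 is EFAULT ("Bad address"), meaning a system call was handed an invalid memory address. The following is a minimal sketch, using only the Python standard library, to confirm that mapping on the local machine. It assumes a Linux or other Unix host, because errno numbers are platform-specific.

```python
import errno
import os

# Errno numbers are platform-specific; on Linux (and most Unix systems) 14 is EFAULT.
code = 14
name = errno.errorcode.get(code, "unknown")  # symbolic constant name, e.g. "EFAULT"
message = os.strerror(code)                   # the text OSError prints after "[Errno 14]"

print("%d %s %s" % (code, name, message))
```

This only decodes the error number. It does not by itself explain why spawning the child process failed; whether that is tied to starting processes from inside an MPI job is the question raised at the top of this thread.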

  Thanks,

    Matt

--

What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener



