On 13 Aug 2015, at 19:53, Justin Chang <jychang48@gmail.com> wrote:
Lawrence,
When I compile everything with MPICH-3.1.4 on this machine, I get no complaints whatsoever. It only happens when I use OpenMPI. I don't like the default binding options (or lack thereof) for MPICH and would prefer to use OpenMPI. Could this have something to do with the compilers I am using? And/or how I am configuring OpenMPI and/or Python?
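(For reference, the kind of explicit binding control I mean is what OpenMPI's mpirun exposes directly; the executable name and process count below are placeholders, not from the actual run:)

```shell
# Bind each MPI rank to a core and print the resulting bindings
# (--bind-to and --report-bindings are OpenMPI mpirun options;
#  ./my_app and -n 4 are placeholders)
mpirun -n 4 --bind-to core --report-bindings ./my_app
```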
I don't think it's to do with how you're configuring openmpi. It's rather that the infiniband support is "known bad" when forking, see this OpenMPI FAQ: https://www.open-mpi.org/faq/?category=openfabrics#ofa-fork Our use of fork falls into the "calling system() or popen()" case, so plausibly you might be able to turn off that warning and continue. However, I recall you saying that your code just hangs when you do this, so maybe that's no good.
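For what it's worth, a sketch of how one might suppress that warning and see whether the run still hangs (assuming the MCA parameter is `mpi_warn_on_fork`, as the FAQ describes; the executable and process count are placeholders):

```shell
# Disable OpenMPI's fork() warning for a single run
# (./my_app and -n 4 are placeholders)
mpirun -n 4 --mca mpi_warn_on_fork 0 ./my_app

# Or set it via the environment for every run in this shell
export OMPI_MCA_mpi_warn_on_fork=0
```

This only silences the diagnostic; if the underlying infiniband/fork interaction is the real problem, the job may still hang as you described.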
I could try this out on another HPC system I have access to (Intel Xeon E5-2670) to see if I can reproduce the problem, but this other machine has a firewall and makes the installation process even more troublesome...
I think we have infiniband-based clusters here, so hopefully we can reproduce at this end. There do appear to be some issues with robustness on these kinds of systems though, so I'm definitely keen to fix things.

Lawrence