Hi Bhavesh,

The PETSc options look about right (and if the tests pass, that is a good indicator that they are right). libpython3.7m.so.1.0 is the main shared library that makes Python work, so this looks like your cluster is failing to launch Python under mpiexec. To test this:

1. WITHOUT activating the Firedrake venv, get MPI to run a trivial Python "hello world" program.
2. If that works, repeat the exercise with the venv active.

If one of those fails, then the issue is that something about the MPI on your cluster, or the set of directories that the compute nodes can see, is preventing MPI from running Python. If the issue is what the compute nodes can see, then you won't even be able to run a Python job in serial on a compute node.
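For concreteness, a minimal version of that test might look like the sketch below; the script name, process count and venv path are placeholders rather than anything taken from this installation.

  # hello.py -- trivial test script; deliberately imports nothing from Firedrake or mpi4py
  import sys, socket
  print("hello from", socket.gethostname(), "using", sys.executable)

  # 1. on a compute node, without the venv
  mpiexec -n 2 python3 hello.py
  # 2. the same with the venv active
  source /path/to/firedrake/bin/activate
  mpiexec -n 2 python3 hello.py

Printing sys.executable also shows which interpreter mpiexec actually launched, which helps when the venv Python is not the one being picked up. If both runs work, repeating the exercise with "from mpi4py import MPI" added to the script exercises the MPI bindings themselves.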
Let us know how you get on with that.

Regards,

David

--
Dr David Ham
Department of Mathematics
Imperial College London
https://www.imperial.ac.uk/people/david.ham

From: <firedrake-bounces@imperial.ac.uk> on behalf of "Shrimali, Bhavesh" <bshrima2@illinois.edu>
Date: Sunday, 27 January 2019 at 22:42
To: firedrake <firedrake@imperial.ac.uk>
Subject: Re: [firedrake] Firedrake Installation on Cluster

Hello David,

With the help of some people at our Campus Cluster, and following your suggestions, I was able to install Firedrake successfully (letting it build its own PETSc) with the following configure options:

  export PETSC_CONFIGURE_OPTIONS="--download-eigen=/projects/meca/shared_resources/firedrake/src/eigen-3.3.3.tgz --download-fblaslapack --with-shared-libraries=1 --with-fortran-bindings=0 --download-chaco --download-metis --download-parmetis --download-scalapack --download-hypre --download-mumps --with-zlib --download-netcdf --download-hdf5 --download-pnetcdf --download-exodusii"

Does this seem right? I tried running the tests, as well as several examples that I had from FEniCS, and with some changes to the code all of them run pretty much successfully. However, when I try to use MPI to run the calculations in parallel I get the following error:

  python3: error while loading shared libraries: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory
  python3: error while loading shared libraries: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory

Would you happen to have a guess at what the problem could be here, or could you point me to a resource for debugging this?

Thanks for the help.

Bhavesh
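One way to narrow down a "cannot open shared object file" error like this is to ask the dynamic linker, on a compute node, whether it can find libpython at all. The paths below are only guesses assembled from elsewhere in this thread (the venv prefix from the eigen path above, the Python 3.7 prefix from the traceback near the end of the thread), so substitute your own:

  # On a compute node: can the venv's python3 resolve libpython3.7m.so.1.0?
  ldd /projects/meca/shared_resources/firedrake/bin/python3 | grep libpython
  # If this reports "not found", the directory holding libpython3.7m.so.1.0
  # (typically <python prefix>/lib, e.g. /usr/local/python/3.7.0/lib) is not
  # visible to the linker on that node. Loading the same Python module in the
  # job script, or exporting the path explicitly, is the usual fix:
  export LD_LIBRARY_PATH=/usr/local/python/3.7.0/lib:$LD_LIBRARY_PATH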
________________________________
From: Shrimali, Bhavesh
Sent: Thursday, January 24, 2019 11:43 AM
To: firedrake
Subject: RE: [firedrake] Firedrake Installation on Cluster

Hello David,

Thanks for the prompt response. I really appreciate your help on this. Since the cluster does not let firedrake build its own PETSc with the correct configuration, I decided to do the following:

1. Build a docker image (docker://bhaveshshrimali/firedrakestable2) with firedrake installed (and hopefully stable). I tested the installed firedrake using "make alltest". It does seem to pass a lot of the tests (if I understand correctly, F stands for a failed test, right?), with some exceptions. However, when it tests regression/octahedral_hemisphere.py it seems to fail (it does warn 'cannot create build.log: Permission denied'):

  tests/regression/test_octahedral_hemisphere.py ....Makefile:60: recipe for target 'test' failed
  make: *** [test] Killed

Is there a way to debug what happened here? Note that up until this point everything is on my local machine.

2. Since there is no direct way to run the image using Singularity on our cluster, I first built a singularity image by pulling the above image from DockerHub on my local machine and then just copied the '.simg' file to a folder on the cluster. There seem to be some issues running it there (I do not understand whether it is solely due to file/folder permissions):

  (firedrake) ........... > make alltest
  Building extension modules
  /bin/sh: 1: cannot create build.log: Permission denied
  running build_ext
  skipping 'firedrake/dmplex.c' Cython extension (up-to-date)
  skipping 'firedrake/extrusion_numbering.c' Cython extension (up-to-date)
  skipping 'firedrake/hdf5interface.c' Cython extension (up-to-date)
  skipping 'firedrake/spatialindex.c' Cython extension (up-to-date)
  skipping 'firedrake/mg/impl.c' Cython extension (up-to-date)
  Linting firedrake codebase
  Linting firedrake test suite
  Linting firedrake scripts
  Running all regression tests
  --------------------------------------------------------------------------
  A process has executed an operation involving a call to the "fork()" system call
  to create a child process. Open MPI is currently operating in a condition that
  could result in memory corruption or other system errors; your job may hang,
  crash, or produce silent data corruption. The use of fork() (or system() or
  other calls that create child processes) is strongly discouraged.

  The process that invoked fork was:

    Local host: [[0,1],0] (PID 756)

  If you are *absolutely sure* that your application will successfully and
  correctly survive a call to fork(), you may disable this warning by setting
  the mpi_warn_on_fork MCA parameter to 0.
  --------------------------------------------------------------------------
  ======================================================== test session starts ========================================================
  platform linux -- Python 3.6.7, pytest-3.10.0, py-1.7.0, pluggy-0.8.0
  benchmark: 3.1.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
  rootdir: /firedrake/src/firedrake, inifile: setup.cfg
  plugins: xdist-1.24.0, forked-0.2, benchmark-3.1.1
  collected 4442 items

  tests/test_0init.py ...                                                  [ 0%]
  tests/test_tsfc_interface.py ..........                                  [ 0%]
  tests/benchmarks/test_assembly_overheads.py FFFFFFFFFFFFFFFFFFFFFFFFFF   [ 0%]
  tests/benchmarks/test_solver_overheads.py FFFFFF.FFF.F                   [ 1%]
  tests/demos/test_demos_run.py FKilled

3. Is there an existing singularity image with firedrake installed on SingularityHub, to your knowledge? If so, I can directly try doing singularity pull shub://... ?

4. I also tried changing the file/folder permissions for different folders in the firedrake directory, and tried running the Helmholtz example through the command line, but it fails too...

Lastly, I tried updating firedrake inside the singularity container, but that failed as well. I am attaching the firedrake-update.log file here. Please let me know if anything else is needed. Any help or references for debugging the above will be appreciated.

Thanks again

Bhavesh

________________________________
From: Ham, David A [david.ham@imperial.ac.uk]
Sent: Sunday, January 20, 2019 5:14 AM
To: Shrimali, Bhavesh; firedrake
Subject: Re: [firedrake] Firedrake Installation on Cluster

Dear Bhavesh,

There are a few things going on here that are problematic. The first is around PETSc. While it is often a good idea on a cluster to build PETSc separately from Firedrake, the PETSc installed on the cluster is very unlikely to have the right configuration settings for us, so you do need to build your own. You probably want to use the Firedrake PETSc version (https://github.com/firedrakeproject/petsc), which is known to be compatible with the petsc4py used in Firedrake. If you are building PETSc yourself then you need to specify the right configure options, which you can obtain by running:

  python3 firedrake-install --show-petsc-configure-options

Alternatively you could just let firedrake-install build PETSc for you. This might be easier. See also the comments below:

From: <firedrake-bounces@imperial.ac.uk> on behalf of "Shrimali, Bhavesh" <bshrima2@illinois.edu>
Date: Friday, 18 January 2019 at 19:31
To: firedrake <firedrake@imperial.ac.uk>
Subject: [firedrake] Firedrake Installation on Cluster

Hello,

I have been trying to install firedrake on our campus cluster, however I've had little success so far. I am not from a CS background, so I have mostly been googling things and trying different approaches. I am trying to make use of the modules available on our cluster, and use the PETSc already installed on the cluster.

I assume the following environment variables were advised by the cluster admins. If they were obtained from the internet they are almost certainly wrong:

  export I_MPI_CC=gcc
  export I_MPI_CXX=g++
  export I_MPI_F90=gfortran
  export I_MPI_FC="$I_MPI_F90"
  export I_MPI_F77="$I_MPI_F90"

These export commands don't do what you think. Each one is overwriting the previous one; they don't accumulate. This is also not the correct set of PETSc configure options for Firedrake, for that please see the command given above. Finally, setting PETSc configure options doesn't actually help unless you are building your own PETSc.

  export PETSC_CONFIGURE_OPTIONS=--download-fblaslapack
  export PETSC_CONFIGURE_OPTIONS=--download-scalapack
  export PETSC_CONFIGURE_OPTIONS=--download-mumps
  export PETSC_CONFIGURE_OPTIONS=--download-superlu_dist
  export PETSC_CONFIGURE_OPTIONS=--download-hypre
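To make the shell point concrete: each of the export lines above replaces the value set by the one before it, so only --download-hypre would survive. If the intention were to pass several options, they would all have to go into a single assignment, for example (this combined line only illustrates the shell syntax; the option set Firedrake actually needs comes from firedrake-install --show-petsc-configure-options, as noted above):

  export PETSC_CONFIGURE_OPTIONS="--download-fblaslapack --download-scalapack --download-mumps --download-superlu_dist --download-hypre"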
  curl -O https://raw.githubusercontent.com/firedrakeproject/firedrake/master/scripts/...

You almost certainly do not want --honour-pythonpath here. It is almost always better to unset the PYTHONPATH environment variable. It is very unlikely that you actually want external packages to be included in the Firedrake Python venv.

  python3 firedrake-install --no-package-manager --honour-pythonpath --honour-petsc-dir

This throws an exception at the following point (also attached separately as firedrakeErrortxt):

  Removing existing h5py installations
  Installing h5py/
  Installing libspatialindex
  Cloning libspatialindex
  Failed to clone libspatialindex using ssh, falling back to https.
  Successfully cloned repository libspatialindex.
  Checking out branch master
  Successfully checked out branch master
  Installing petsc4py/
  Traceback (most recent call last):
    File "firedrake-install", line 1145, in <module>
      install(p+"/")
    File "firedrake-install", line 642, in install
      run_pip_install(["--ignore-installed", package])
    File "firedrake-install", line 600, in run_pip_install
      check_call(pipinstall + pipargs)
    File "firedrake-install", line 438, in check_call
      log.debug(subprocess.check_output(arguments, stderr=subprocess.STDOUT, env=env).decode())
    File "/usr/local/python/3.7.0/lib/python3.7/subprocess.py", line 376, in check_output
      **kwargs).stdout
    File "/usr/local/python/3.7.0/lib/python3.7/subprocess.py", line 468, in run
      output=stdout, stderr=stderr)
  subprocess.CalledProcessError: Command '['/projects/meca/bshrima2/packages/firedrake/bin/python', '-m', 'pip', 'install', '--no-binary', 'mpi4py', '--no-deps', '-vvv', '--ignore-installed', 'petsc4py/']' returned non-zero exit status 1.

The complete log file is also attached along with this email. Please let me know if you can be of some help, as I have received little-to-no help from the people managing the cluster. Also let me know if this is due to an error on my side while running the installer.

Thank you

--Bhavesh

Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10
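For anyone hitting the same failure: the CalledProcessError above only reports that pip failed, not why; the underlying compiler or configure error ends up in the log. One way to surface it is to re-run the pip command from the traceback by hand and watch the output directly. The interpreter path below is copied verbatim from the traceback, while the src directory is an assumption about where firedrake-install checks packages out:

  cd /projects/meca/bshrima2/packages/firedrake/src
  /projects/meca/bshrima2/packages/firedrake/bin/python -m pip install \
      --no-binary mpi4py --no-deps -vvv --ignore-installed petsc4py/ 2>&1 | tee petsc4py-pip.log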