Hi all,

I've returned to this task, and tried to follow David's suggestion of building on a clean conda environment. This time, I used the Intel compiler (mpiicc).

Trying to follow the steps in the install script, we cloned firedrake, ufl, fiat, PyOP2, COFFEE, loopy, petsc, supermesh, spatialindex, etc. Then we can successfully install all these. In particular, we can configure/make/make install petsc.

Finally, we run, in firedrake,

    python setup.py install

with LDSHARED, CFLAGS, CC, CXX, and prefix set appropriately. This **seems** to work OK, and culminates in the output

    running install_egg_info
    Writing /MYOB/.conda/envs/firedrake/lib/python3.6/site-packages/firedrake-0.13.0_2890.g8f880fd3-py3.6.egg-info

But I am at a loss as to how to complete the installation. (My fuzzy understanding of conda and venvs does not help). That is, if I try

    echo "import firedrake" | python

I get

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/MYOB/r/libs/firedrake/firedrake/__init__.py", line 5, in <module>
        if "PETSC_DIR" in os.environ and not config["options"]["honour_petsc_dir"]:
    TypeError: 'NoneType' object is not subscriptable

Unsetting PETSC_DIR does not help (note change in error):

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/MYOB/r/libs/firedrake/firedrake/__init__.py", line 8, in <module>
        elif "PETSC_DIR" not in os.environ and config["options"]["honour_petsc_dir"]:
    TypeError: 'NoneType' object is not subscriptable

I note that the "activate" script is not in firedrake/bin. Neither, as it happens, is firedrake-update, but firedrake-clean, firedrake-install, and firedrake-zenodo are.

Suggestions most welcome...

Niall.
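Both tracebacks fail while indexing `config["options"]`, whether or not PETSC_DIR is set, which points at `config` itself being `None` rather than at the PETSc environment. The sketch below only reproduces that symptom and shows what a guarded check could look like. It assumes, without confirmation from the thread, that `config` is `None` because no install-time configuration was recorded for this environment; `check_petsc_dir` and `reproduce_error` are hypothetical helpers, not part of Firedrake.

```python
import os


def check_petsc_dir(config, environ=os.environ):
    """Hypothetical stand-in for the PETSC_DIR check seen in the traceback.

    `config` mimics the object indexed as config["options"] in
    firedrake/__init__.py; here it may be None (assumed cause, see above).
    """
    if config is None:
        # Guard that the failing code path lacks: name the likely real cause.
        return "no configuration found"
    honour = config["options"]["honour_petsc_dir"]
    if "PETSC_DIR" in environ and not honour:
        return "PETSC_DIR set but not honoured"
    if "PETSC_DIR" not in environ and honour:
        return "PETSC_DIR expected but unset"
    return "ok"


def reproduce_error():
    """Index None the way the traceback does and return the error text."""
    config = None
    try:
        config["options"]["honour_petsc_dir"]
    except TypeError as exc:
        return str(exc)
    return None
```

Calling `check_petsc_dir(None)` returns the "no configuration found" message instead of raising, which is why setting or unsetting PETSC_DIR alone would not be expected to change the outcome.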
--
________________________________
From: Ham, David A <david.ham@imperial.ac.uk>
Sent: Tuesday 6 August 2019 11:33
To: Sagiyama, Koki <k.sagiyama@imperial.ac.uk>; Madden, Niall <niall.madden@nuigalway.ie>; Lawrence Mitchell <wence@gmx.li>
Cc: firedrake <firedrake@imperial.ac.uk>
Subject: Re: [firedrake] Installing firedrake on a HPC system without package manager

To be a bit more expansive, mpi4py is failing to build because it can’t find the symbols that Koki points to:

    building 'mpi4py.dl' extension
    mpicc -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DHAVE_DLFCN_H=1 -I/ichec/home/users/whoever/r/libs/firedrake/include -I/ichec/packages/conda/2/envs/python3/include/python3.7m -c src/dynload.c -o build/temp.linux-x86_64-3.7/src/dynload.o
    gcc -pthread -shared -B /ichec/packages/conda/2/envs/python3/compiler_compat -L/ichec/packages/conda/2/envs/python3/lib -Wl,-rpath=/ichec/packages/conda/2/envs/python3/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/src/dynload.o -Lbuild/temp.linux-x86_64-3.7 -o build/lib.linux-x86_64-3.7/mpi4py/dl.cpython-37m-x86_64-linux-gnu.so
    checking for MPI compile and link ...
    /ichec/packages/openmpi/gcc/3.1.2/bin/mpicc -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/ichec/home/users/whoever/r/libs/firedrake/include -I/ichec/packages/conda/2/envs/python3/include/python3.7m -c _configtest.c -o _configtest.o
    success!
    removing: _configtest.c _configtest.o
    /ichec/packages/openmpi/gcc/3.1.2/bin/mpicc -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/ichec/home/users/whoever/r/libs/firedrake/include -I/ichec/packages/conda/2/envs/python3/include/python3.7m -c _configtest.c -o _configtest.o
    /ichec/packages/openmpi/gcc/3.1.2/bin/mpicc _configtest.o -Lbuild/temp.linux-x86_64-3.7 -o _configtest
    /ichec/packages/libfabric/1.7.1/lib/libfabric.so.1: undefined reference to `psm2_mq_ipeek_dequeue_multi@PSM2_1.0'
    /ichec/packages/libfabric/1.7.1/lib/libfabric.so.1: undefined reference to `psm2_info_query@PSM2_1.0'
    collect2: error: ld returned 1 exit status
    failure.
    removing: _configtest.c _configtest.o
    error: Cannot link MPI programs. Check your configuration!!!

You could try building mpi4py in a clean venv. If this fails in the same way then it will produce a relatively small problem which you can take to the cluster admins.

Regards,
David

From: "Sagiyama, Koki" <k.sagiyama@imperial.ac.uk>
Date: Tuesday, 6 August 2019 at 10:31
To: "Madden, Niall" <niall.madden@nuigalway.ie>, "Ham, David A" <david.ham@imperial.ac.uk>, Lawrence Mitchell <wence@gmx.li>
Cc: firedrake <firedrake@imperial.ac.uk>
Subject: Re: [firedrake] Installing firedrake on a HPC system without package manager

Hi Niall,

So it seems the linker is unable to find function definitions that are most likely in (based on your ldd result):

    libpsm2.so.2 => /lib64/libpsm2.so.2 (0x00007fab642dd000)

I wonder if you need additional packages. Just wondering, does this file exist?

Thanks,
Koki

________________________________
From: Madden, Niall <niall.madden@nuigalway.ie>
Sent: Friday, August 2, 2019 11:52:35 PM
To: Ham, David A <david.ham@imperial.ac.uk>; Lawrence Mitchell <wence@gmx.li>; Sagiyama, Koki <k.sagiyama@imperial.ac.uk>
Cc: firedrake <firedrake@imperial.ac.uk>
Subject: Re: [firedrake] Installing firedrake on a HPC system without package manager

Hi David,

Thanks for the reply.
Emboldened by your suggestion that Firedrake and conda might not be so orthogonal after all, I tried that, first rebuilding PETSc (just to be sure). Right now, building Python seems a little daunting. Anyway, firedrake-install still failed in the same place, though this time without the complaint about Python.h. I attach the latest log file.

Again, suggestions welcome. Have a good weekend.

Niall.

________________________________
From: Ham, David A <david.ham@imperial.ac.uk>
Sent: Friday 2 August 2019 12:12
To: Madden, Niall <niall.madden@nuigalway.ie>; Lawrence Mitchell <wence@gmx.li>; Sagiyama, Koki <k.sagiyama@imperial.ac.uk>
Cc: firedrake <firedrake@imperial.ac.uk>
Subject: Re: [firedrake] Installing firedrake on a HPC system without package manager

Dear all,

Currently it would be safer to say that anaconda is rather untested. I am aware that anaconda venv support has come on and Firedrake has been known to successfully build on anaconda; it’s just a very untested route. You could try to do that.

Alternatively, some people who use Firedrake on supercomputers just build Python from source. That’s a fairly straightforward thing to do. For example the script that is used to build Python for Firedrake on ARCHER (the UK national supercomputer) is at:

https://github.com/firedrakeproject/firedrake-archer/blob/master/build_python3.7_archer.sh

Regards,
David

From: <firedrake-bounces@imperial.ac.uk> on behalf of "Madden, Niall" <niall.madden@nuigalway.ie>
Date: Friday, 2 August 2019 at 12:06
To: Lawrence Mitchell <wence@gmx.li>, "Sagiyama, Koki" <k.sagiyama@imperial.ac.uk>
Cc: firedrake <firedrake@imperial.ac.uk>
Subject: Re: [firedrake] Installing firedrake on a HPC system without package manager

Hi Koki, Lawrence

Many thanks for getting back to me.
You want to have an appropriate package loaded.
If you have the `module` command available,
Yes, I used the Modules package, but python-dev is not avail(able) (because... conda)
So it would appear that /usr/include/python3.6m doesn't include
the Python header files, can you confirm?
Correct. That dir contains only pyconfig-64.h. Instead I would have to load the conda module, and activate python3. Then /.../packages/conda/2/envs/python3/include/python3.7m/ does indeed have Python.h, and another 100 header files. So, since the docs say "The installation script does not work with anaconda<https://www.continuum.io/downloads> based python installations. This is due to venv issues in anaconda", I am in somewhat of a bind.
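Whether a given interpreter can build C extensions such as mpi4py depends on whether its own include directory ships Python.h. A minimal sketch of that check, using only the standard library, is below; `python_header_status` is a name made up for this illustration.

```python
import os
import sysconfig


def python_header_status():
    """Return (include_dir, has_header) for the interpreter running this code."""
    # sysconfig reports where this interpreter expects its C headers to live.
    include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    return include_dir, os.path.isfile(header)


if __name__ == "__main__":
    include_dir, has_header = python_header_status()
    print(include_dir, "has Python.h" if has_header else "is missing Python.h")
```

Run once with the system python3 and once with the conda python3 activated, this should show which of the two can be used for the build.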
So I wonder if there is a module that should be loaded but isn't. What does: ldd /ichec/packages/libfabric/1.7.1/lib/libfabric.so.1 return.
    linux-vdso.so.1 => (0x00007ffd5b8f3000)
    librdmacm.so.1 => /lib64/librdmacm.so.1 (0x00007fab64757000)
    libibverbs.so.1 => /lib64/libibverbs.so.1 (0x00007fab64540000)
    libpsm2.so.2 => /lib64/libpsm2.so.2 (0x00007fab642dd000)
    librt.so.1 => /lib64/librt.so.1 (0x00007fab640d5000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fab63eb9000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007fab63cb5000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fab638e8000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fab64c82000)
    libnl-route-3.so.200 => /lib64/libnl-route-3.so.200 (0x00007fab6367b000)
    libnl-3.so.200 => /lib64/libnl-3.so.200 (0x00007fab6345a000)
    libnuma.so.1 => /lib64/libnuma.so.1 (0x00007fab6324e000)
    libm.so.6 => /lib64/libm.so.6 (0x00007fab62f4c000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fab62d36000)
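Every library in this listing resolves to a path, including libpsm2.so.2, so nothing is reported as "not found". A small sketch for checking such output mechanically follows; `parse_ldd` and the shortened `SAMPLE` are illustrative names, and the parser assumes the usual `name => path (address)` layout of `ldd` output.

```python
def parse_ldd(output):
    """Map each library name in `ldd` output to its resolved path (or None)."""
    resolved = {}
    for line in output.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # e.g. the dynamic loader line, which has no "=>"
        name, _, target = line.partition("=>")
        target = target.strip()
        if target.startswith("not found"):
            resolved[name.strip()] = None
        else:
            # Drop the trailing load address such as "(0x00007fab642dd000)";
            # entries like linux-vdso have no path and map to None.
            path = target.split("(")[0].strip()
            resolved[name.strip()] = path or None
    return resolved


# Shortened copy of the listing above.
SAMPLE = """\
linux-vdso.so.1 => (0x00007ffd5b8f3000)
libpsm2.so.2 => /lib64/libpsm2.so.2 (0x00007fab642dd000)
libc.so.6 => /lib64/libc.so.6 (0x00007fab638e8000)
/lib64/ld-linux-x86-64.so.2 (0x00007fab64c82000)
"""
```

With the listing above, `parse_ldd(SAMPLE)["libpsm2.so.2"]` gives `/lib64/libpsm2.so.2`, so the open question is what that file exports, not whether it is found.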
And also /ichec/packages/openmpi/gcc/3.1.2/bin/mpicc -show

    gcc -I/ichec/packages/openmpi/gcc/3.1.2/include -fexceptions -pthread -L/ichec/packages/libfabric/1.7.1/lib -L/usr/lib64 -Wl,-rpath -Wl,/ichec/packages/libfabric/1.7.1/lib -Wl,-rpath -Wl,/usr/lib64 -Wl,-rpath -Wl,/ichec/packages/openmpi/gcc/3.1.2/lib -Wl,--enable-new-dtags -L/ichec/packages/openmpi/gcc/3.1.2/lib -lmpi
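The wrapper line is easier to reason about once split into its library search directories and the libraries it names. The sketch below does that with `shlex`; `split_link_flags` is a hypothetical helper, and `SHOW` is the wrapper output quoted above.

```python
import shlex


def split_link_flags(show_output):
    """Return (compiler, library_dirs, libraries) from a wrapper's -show line."""
    tokens = shlex.split(show_output)
    compiler = tokens[0]
    # -L<dir> adds a link-time search directory; -l<name> names a library.
    library_dirs = [t[2:] for t in tokens[1:] if t.startswith("-L")]
    libraries = [t[2:] for t in tokens[1:] if t.startswith("-l")]
    return compiler, library_dirs, libraries


SHOW = (
    "gcc -I/ichec/packages/openmpi/gcc/3.1.2/include -fexceptions -pthread "
    "-L/ichec/packages/libfabric/1.7.1/lib -L/usr/lib64 "
    "-Wl,-rpath -Wl,/ichec/packages/libfabric/1.7.1/lib "
    "-Wl,-rpath -Wl,/usr/lib64 "
    "-Wl,-rpath -Wl,/ichec/packages/openmpi/gcc/3.1.2/lib "
    "-Wl,--enable-new-dtags -L/ichec/packages/openmpi/gcc/3.1.2/lib -lmpi"
)
```

For this line the only library named explicitly is `mpi`; libfabric and libpsm2 are not on the command line, so they presumably enter the link as dependencies of the MPI library.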
We can also see that you have openmpi loaded (`module list` will show you openmpi),
but it is known to cause some issues, so you probably want to unload openmpi and load mpich
This is a bit of red herring I think. If the ICHEC machine suggests using openmpi,
then I think that is fine.
Good. mpich is not available. Though I could use the Intel compiler, instead of gcc, if that were preferable.

Thanks for all the (continuing) help guys.

Niall.

________________________________
From: Lawrence Mitchell <wence@gmx.li>
Sent: Friday 2 August 2019 10:52
To: Sagiyama, Koki <k.sagiyama@imperial.ac.uk>
Cc: Madden, Niall <niall.madden@nuigalway.ie>; firedrake <firedrake@imperial.ac.uk>
Subject: Re: [firedrake] Installing firedrake on a HPC system without package manager

Hi Niall, Koki,
On 2 Aug 2019, at 10:39, Sagiyama, Koki <k.sagiyama@imperial.ac.uk> wrote:
Dear Niall,
It seems to me the following line is critical:
src/dynload.c:5:10: fatal error: Python.h: No such file or directory
You want to have an appropriate package loaded. If you have the `module` command available, you could try `module search python3-dev` (or `module search python-dev`), followed by `module load package_name`.
This certainly looks like an issue. The relevant compile line is:

    mpicc -Wno-unused-result -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -DHAVE_DLFCN_H=1 -I/MYOB/r/libs/firedrake/include -I/usr/include/python3.6m -c src/dynload.c -o build/temp.linux-x86_64-3.6/src/dynload.o
    src/dynload.c:5:10: fatal error: Python.h: No such file or directory

So it would appear that /usr/include/python3.6m doesn't include the Python header files, can you confirm?

In addition to this there is also a link error:

    /ichec/packages/openmpi/gcc/3.1.2/bin/mpicc -Wno-unused-result -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/MYOB/r/libs/firedrake/include -I/usr/include/python3.6m -c _configtest.c -o _configtest.o
    /ichec/packages/openmpi/gcc/3.1.2/bin/mpicc _configtest.o -L/usr/lib64 -Lbuild/temp.linux-x86_64-3.6 -o _configtest
    /ichec/packages/libfabric/1.7.1/lib/libfabric.so.1: undefined reference to `psm2_mq_ipeek_dequeue_multi@PSM2_1.0'
    /ichec/packages/libfabric/1.7.1/lib/libfabric.so.1: undefined reference to `psm2_info_query@PSM2_1.0'
    collect2: error: ld returned 1 exit status
    failure.
    removing: _configtest.c _configtest.o
    error: Cannot link MPI programs. Check your configuration!!!

So I wonder if there is a module that should be loaded but isn't. What does:

    ldd /ichec/packages/libfabric/1.7.1/lib/libfabric.so.1

return? And also

    /ichec/packages/openmpi/gcc/3.1.2/bin/mpicc -show
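When taking a failure like this to cluster admins, it can help to reduce the log to the object that needs each symbol, the symbol name, and the symbol version. The following is a minimal sketch that assumes the GNU ld message format shown above; `undefined_symbols` is a made-up helper name and `LOG` is copied from the link error.

```python
import re

# Matches GNU ld lines of the form:
#   <object>: undefined reference to `<symbol>@<version>'
UNDEFINED = re.compile(
    r"^(?P<obj>\S+): undefined reference to `(?P<sym>[^@']+)(?:@(?P<ver>[^']+))?'",
    re.MULTILINE,
)


def undefined_symbols(log):
    """Return (object, symbol, version) for each undefined reference in `log`."""
    return [(m["obj"], m["sym"], m["ver"]) for m in UNDEFINED.finditer(log)]


LOG = """\
/ichec/packages/libfabric/1.7.1/lib/libfabric.so.1: undefined reference to `psm2_mq_ipeek_dequeue_multi@PSM2_1.0'
/ichec/packages/libfabric/1.7.1/lib/libfabric.so.1: undefined reference to `psm2_info_query@PSM2_1.0'
collect2: error: ld returned 1 exit status
"""
```

Here both missing symbols are wanted by libfabric.so.1 and carry the version tag PSM2_1.0, which narrows the question to which PSM2 library the linker picks up.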
We can also see that you have openmpi loaded (`module list` will show you openmpi), but it is known to cause some issues, so you probably want to unload openmpi and load mpich
This is a bit of red herring I think. If the ICHEC machine suggests using openmpi, then I think that is fine.
Though it is not directly related to the error you are having right now, petsc usually requires some additional parameters when configuring on clusters (--with-batch --known-mpi-shared-libraries=0) (https://www.mcs.anl.gov/petsc/documentation/installation.html), which we don't see in your previous email.
This is dependent on whether or not the compilation nodes can execute MPI programs, which the ICHEC machine seemingly allows.

Cheers,
Lawrence