Hi all,
I've returned to this task and tried to follow David's suggestion
of building in a clean conda environment. This time, I used the Intel compiler (mpiicc).

Trying to follow the steps in the install script, we cloned
firedrake, ufl, fiat, PyOP2, COFFEE, loopy, petsc, supermesh, spatialindex, etc.

We were then able to install all of these successfully. In particular,
configure/make/make install for PETSc works.

Finally, in the firedrake directory, we run
    python setup.py install
with LDSHARED, CFLAGS, CC, CXX and the prefix set appropriately.
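A quick way to see what `python setup.py install` does by default is to ask the interpreter for the build settings it was configured with: distutils starts from these values and lets CC, CXX, LDSHARED and CFLAGS in the environment override them (CFLAGS is appended rather than replaced). A minimal standard-library sketch; the helper name `build_defaults` is hypothetical.

```python
import sysconfig


def build_defaults(names=("CC", "CXX", "LDSHARED", "CFLAGS")):
    """Return the compiler/linker settings recorded when this Python was built."""
    # get_config_var returns None for any name the build did not record.
    return {name: sysconfig.get_config_var(name) for name in names}


if __name__ == "__main__":
    # Run this with the same python that will run setup.py.
    for name, value in build_defaults().items():
        print(f"{name:9} {value}")
```

On Linux, setting CC alone does not change LDSHARED, which may be why the mpi4py log further down this thread compiles with mpicc but links its `dl` extension with plain `gcc -pthread -shared`.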

This **seems** to work OK, and culminates in the output
   running install_egg_info
   Writing /MYOB/.conda/envs/firedrake/lib/python3.6/site-packages/firedrake-0.13.0_2890.g8f880fd3-py3.6.egg-info


But I am at a loss as to how to complete the installation. (My fuzzy understanding of conda and venvs does not help.)

That is, if I try
echo "import firedrake" | python
I get
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/MYOB/r/libs/firedrake/firedrake/__init__.py", line 5, in <module>
    if "PETSC_DIR" in os.environ and not config["options"]["honour_petsc_dir"]:
TypeError: 'NoneType' object is not subscriptable

Unsetting PETSC_DIR does not help (note the change in the error):
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/MYOB/r/libs/firedrake/firedrake/__init__.py", line 8, in <module>
    elif "PETSC_DIR" not in os.environ and config["options"]["honour_petsc_dir"]:
TypeError: 'NoneType' object is not subscriptable

I note that the "activate" script is not in firedrake/bin.
Nor, as it happens, is firedrake-update, although firedrake-clean, firedrake-install and firedrake-zenodo are.
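Two checks that may help narrow this down. First, an `activate` script presumably comes from a venv (created by `python -m venv` or by firedrake-install), and a conda environment is not a venv in that sense. Second, the traceback points at /MYOB/r/libs/firedrake/firedrake/__init__.py, the source checkout, rather than the installed egg; one possibility is that Python was started from the checkout directory, which puts the checkout's `firedrake` package first on the import path. A minimal standard-library sketch that reports both; the helper names are hypothetical.

```python
import importlib.util
import sys


def in_venv():
    """True when running inside a PEP 405 virtual environment (python -m venv)."""
    # A conda environment is not a venv, so this is False there even when "activated".
    return sys.prefix != sys.base_prefix


def import_origin(module_name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("venv active:   ", in_venv())
    print("sys.prefix:    ", sys.prefix)
    print("firedrake from:", import_origin("firedrake"))
```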

Suggestions most welcome...
Niall.
--



From: Ham, David A <david.ham@imperial.ac.uk>
Sent: Tuesday 6 August 2019 11:33
To: Sagiyama, Koki <k.sagiyama@imperial.ac.uk>; Madden, Niall <niall.madden@nuigalway.ie>; Lawrence Mitchell <wence@gmx.li>
Cc: firedrake <firedrake@imperial.ac.uk>
Subject: Re: [firedrake] Installing firedrake on a HPC system without package manager
 

To be a bit more expansive, mpi4py is failing to build because the linker can't find the symbols that Koki points to:

 

    building 'mpi4py.dl' extension
    mpicc -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DHAVE_DLFCN_H=1 -I/ichec/home/users/whoever/r/libs/firedrake/include -I/ichec/packages/conda/2/envs/python3/include/python3.7m -c src/dynload.c -o build/temp.linux-x86_64-3.7/src/dynload.o
    gcc -pthread -shared -B /ichec/packages/conda/2/envs/python3/compiler_compat -L/ichec/packages/conda/2/envs/python3/lib -Wl,-rpath=/ichec/packages/conda/2/envs/python3/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/src/dynload.o -Lbuild/temp.linux-x86_64-3.7 -o build/lib.linux-x86_64-3.7/mpi4py/dl.cpython-37m-x86_64-linux-gnu.so
    checking for MPI compile and link ...
    /ichec/packages/openmpi/gcc/3.1.2/bin/mpicc -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/ichec/home/users/whoever/r/libs/firedrake/include -I/ichec/packages/conda/2/envs/python3/include/python3.7m -c _configtest.c -o _configtest.o
    success!
    removing: _configtest.c _configtest.o
    /ichec/packages/openmpi/gcc/3.1.2/bin/mpicc -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/ichec/home/users/whoever/r/libs/firedrake/include -I/ichec/packages/conda/2/envs/python3/include/python3.7m -c _configtest.c -o _configtest.o
    /ichec/packages/openmpi/gcc/3.1.2/bin/mpicc _configtest.o -Lbuild/temp.linux-x86_64-3.7 -o _configtest
    /ichec/packages/libfabric/1.7.1/lib/libfabric.so.1: undefined reference to `psm2_mq_ipeek_dequeue_multi@PSM2_1.0'
    /ichec/packages/libfabric/1.7.1/lib/libfabric.so.1: undefined reference to `psm2_info_query@PSM2_1.0'
    collect2: error: ld returned 1 exit status
    failure.
    removing: _configtest.c _configtest.o
    error: Cannot link MPI programs. Check your configuration!!!

 

You could try building mpi4py in a clean venv. If this fails in the same way, you will have a small, self-contained problem that you can take to the cluster admins.
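A minimal sketch of that experiment, assuming the standard-library `venv` module and pip are usable on the cluster; the venv location and helper names are hypothetical, and the mpicc path is the one from the log above. mpi4py's build reads the MPICC environment variable to choose the MPI compiler wrapper, and `--no-binary mpi4py` asks pip to build from source rather than use a prebuilt wheel.

```python
import os
import subprocess
import venv


def mpi4py_build_command(venv_dir, mpicc):
    """Return (argv, env) for a from-source pip build of mpi4py inside venv_dir."""
    python = os.path.join(venv_dir, "bin", "python")
    argv = [python, "-m", "pip", "install", "--no-cache-dir",
            "--no-binary", "mpi4py", "mpi4py"]
    env = dict(os.environ, MPICC=mpicc)  # tell mpi4py's build which wrapper to use
    return argv, env


def try_mpi4py(venv_dir, mpicc):
    """Create a fresh venv (wiping any old one) and attempt the build; return pip's exit status."""
    venv.create(venv_dir, clear=True, with_pip=True)
    argv, env = mpi4py_build_command(venv_dir, mpicc)
    return subprocess.call(argv, env=env)


if __name__ == "__main__":
    mpicc = "/ichec/packages/openmpi/gcc/3.1.2/bin/mpicc"  # path from the failing log
    if os.path.exists(mpicc):
        status = try_mpi4py(os.path.expanduser("~/mpi4py-test-venv"), mpicc)
        print("pip exit status:", status)
    else:
        print("mpicc not found at", mpicc)
```

If this fails with the same undefined `psm2_*` references, the problem lies in the MPI/libfabric installation rather than in anything Firedrake-specific.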

 

Regards,

 

David

 

 

 

From: "Sagiyama, Koki" <k.sagiyama@imperial.ac.uk>
Date: Tuesday, 6 August 2019 at 10:31
To: "Madden, Niall" <niall.madden@nuigalway.ie>, "Ham, David A" <david.ham@imperial.ac.uk>, Lawrence Mitchell <wence@gmx.li>
Cc: firedrake <firedrake@imperial.ac.uk>
Subject: Re: [firedrake] Installing firedrake on a HPC system without package manager

 

Hi Niall,

 

So it seems the linker is unable to find function definitions that are most likely in (based on your ldd output):

libpsm2.so.2 => /lib64/libpsm2.so.2 (0x00007fab642dd000)

 

I wonder if you need additional packages. Does this file exist?
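One way to answer that, and to check whether the library found actually provides the two symbols in the error, is to load it and look the names up. A minimal sketch using the standard-library `ctypes` module; the helper name is hypothetical. It resolves the default version of each symbol, so it is a rough check that ignores the `@PSM2_1.0` version tag; `nm -D /lib64/libpsm2.so.2` gives similar information from the shell.

```python
import ctypes
import os


def missing_symbols(lib, symbols):
    """Return the subset of `symbols` that shared library `lib` (name or path) does not export."""
    handle = ctypes.CDLL(lib)  # raises OSError if the library cannot be loaded
    # ctypes raises AttributeError for names dlsym cannot resolve, so hasattr is a lookup test.
    return [name for name in symbols if not hasattr(handle, name)]


if __name__ == "__main__":
    path = "/lib64/libpsm2.so.2"  # from the ldd output
    wanted = ["psm2_mq_ipeek_dequeue_multi", "psm2_info_query"]
    if os.path.exists(path):
        print("missing:", missing_symbols(path, wanted))
    else:
        print(path, "does not exist")
```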

 

Thanks,

Koki

 

 


From: Madden, Niall <niall.madden@nuigalway.ie>
Sent: Friday, August 2, 2019 11:52:35 PM
To: Ham, David A <david.ham@imperial.ac.uk>; Lawrence Mitchell <wence@gmx.li>; Sagiyama, Koki <k.sagiyama@imperial.ac.uk>
Cc: firedrake <firedrake@imperial.ac.uk>
Subject: Re: [firedrake] Installing firedrake on a HPC system without package manager

 

Hi David,

Thanks for the reply. Emboldened by your suggestion that Firedrake and conda might not be so orthogonal after all, I tried that, first rebuilding PETSc (just to be sure).

Right now, building Python seems a little daunting.

 

Anyway, firedrake-install still failed in the same place, though this time without the complaint about Python.h.

I attach the latest log file.

Again, suggestions welcome.

 

Have a good weekend.

Niall.


From: Ham, David A <david.ham@imperial.ac.uk>
Sent: Friday 2 August 2019 12:12
To: Madden, Niall <niall.madden@nuigalway.ie>; Lawrence Mitchell <wence@gmx.li>; Sagiyama, Koki <k.sagiyama@imperial.ac.uk>
Cc: firedrake <firedrake@imperial.ac.uk>
Subject: Re: [firedrake] Installing firedrake on a HPC system without package manager

 

Dear all,

 

Currently it would be safer to say that Anaconda is largely untested. I am aware that Anaconda's venv support has improved, and Firedrake has been known to build successfully on Anaconda; it's just a very untested route. You could try that.

 

Alternatively, some people who use Firedrake on supercomputers just build Python from source. That’s a fairly straightforward thing to do. For example the script that is used to build Python for Firedrake on ARCHER (the UK national supercomputer) is at: https://github.com/firedrakeproject/firedrake-archer/blob/master/build_python3.7_archer.sh

 

Regards,

 

David

 

From: <firedrake-bounces@imperial.ac.uk> on behalf of "Madden, Niall" <niall.madden@nuigalway.ie>
Date: Friday, 2 August 2019 at 12:06
To: Lawrence Mitchell <wence@gmx.li>, "Sagiyama, Koki" <k.sagiyama@imperial.ac.uk>
Cc: firedrake <firedrake@imperial.ac.uk>
Subject: Re: [firedrake] Installing firedrake on a HPC system without package manager

 

Hi Koki, Lawrence

Many thanks for getting back to me.

 

> > You want to have an appropriate package loaded.

> >  If you have the `module` command available,

Yes, I used the Modules package, but python-dev is not avail(able) (because... conda)

 

> So it would appear that /usr/include/python3.6m doesn't include

> the Python header files, can you confirm?

Correct. That dir contains only pyconfig-64.h

 

Instead, I would have to load the conda module and activate python3.

Then /.../packages/conda/2/envs/python3/include/python3.7m/

does indeed have Python.h, and another 100 header files.
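Since the compile lines pass `-I` for one specific Python's include directory, it is worth confirming that the interpreter used for the build actually ships its headers. A minimal sketch with the standard-library `sysconfig` module (helper name hypothetical): it prints the include directory the running interpreter reports and whether Python.h is there. Run it with the same `python` that firedrake-install uses.

```python
import os
import sysconfig


def python_header_dir():
    """Return (include_dir, has_python_h) for the running interpreter."""
    include_dir = sysconfig.get_paths()["include"]
    return include_dir, os.path.isfile(os.path.join(include_dir, "Python.h"))


if __name__ == "__main__":
    include_dir, ok = python_header_dir()
    print(include_dir, "has Python.h" if ok else "has NO Python.h")
```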

 

So, since the docs say "The installation script does not work with anaconda based python installations. This is due to venv issues in anaconda", I am in somewhat of a bind.

 

> So I wonder if there is a module that should be loaded but isn't. What does:
>      ldd /ichec/packages/libfabric/1.7.1/lib/libfabric.so.1
> return.

   linux-vdso.so.1 =>  (0x00007ffd5b8f3000)
   librdmacm.so.1 => /lib64/librdmacm.so.1 (0x00007fab64757000)
   libibverbs.so.1 => /lib64/libibverbs.so.1 (0x00007fab64540000)
   libpsm2.so.2 => /lib64/libpsm2.so.2 (0x00007fab642dd000)
   librt.so.1 => /lib64/librt.so.1 (0x00007fab640d5000)
   libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fab63eb9000)
   libdl.so.2 => /lib64/libdl.so.2 (0x00007fab63cb5000)
   libc.so.6 => /lib64/libc.so.6 (0x00007fab638e8000)
   /lib64/ld-linux-x86-64.so.2 (0x00007fab64c82000)
   libnl-route-3.so.200 => /lib64/libnl-route-3.so.200 (0x00007fab6367b000)
   libnl-3.so.200 => /lib64/libnl-3.so.200 (0x00007fab6345a000)
   libnuma.so.1 => /lib64/libnuma.so.1 (0x00007fab6324e000)
   libm.so.6 => /lib64/libm.so.6 (0x00007fab62f4c000)
   libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fab62d36000)

 

> And also
>   /ichec/packages/openmpi/gcc/3.1.2/bin/mpicc -show
gcc -I/ichec/packages/openmpi/gcc/3.1.2/include -fexceptions -pthread -L/ichec/packages/libfabric/1.7.1/lib -L/usr/lib64 -Wl,-rpath -Wl,/ichec/packages/libfabric/1.7.1/lib -Wl,-rpath -Wl,/usr/lib64 -Wl,-rpath -Wl,/ichec/packages/openmpi/gcc/3.1.2/lib -Wl,--enable-new-dtags -L/ichec/packages/openmpi/gcc/3.1.2/lib -lmpi
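The `-show` output is easier to compare against the ldd listing once the library search paths are pulled out of it. A small sketch that extracts the `-L` and `-Wl,-rpath` directories from such a line; the helper name is hypothetical, and it only handles the two-token `-Wl,-rpath -Wl,DIR` spelling that appears above.

```python
import os
import shlex
import subprocess


def wrapper_lib_dirs(show_output):
    """Split `mpicc -show` output into (-L dirs, rpath dirs), keeping first-seen order."""
    link_dirs, rpath_dirs = [], []
    tokens = shlex.split(show_output)
    for i, tok in enumerate(tokens):
        if tok.startswith("-L") and tok[2:] not in link_dirs:
            link_dirs.append(tok[2:])
        # The wrapper passes "-Wl,-rpath -Wl,DIR" as two tokens; DIR follows "-Wl,-rpath".
        if tok == "-Wl,-rpath" and i + 1 < len(tokens) and tokens[i + 1].startswith("-Wl,"):
            rpath_dirs.append(tokens[i + 1][len("-Wl,"):])
    return link_dirs, rpath_dirs


if __name__ == "__main__":
    mpicc = "/ichec/packages/openmpi/gcc/3.1.2/bin/mpicc"
    if os.path.exists(mpicc):
        out = subprocess.check_output([mpicc, "-show"], universal_newlines=True)
        print(wrapper_lib_dirs(out))
```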

 

> > We can also see that you have openmpi loaded (`module list` will show you openmpi),

> > but it is known to cause some issues, so you probably want to unload openmpi and load mpich

> This is a bit of a red herring, I think. If the ICHEC machine suggests using openmpi,

> then I think that is fine.

Good; MPICH is not available anyway. I could, though, use the Intel compiler instead of gcc, if that were preferable.

 

Thanks for all the (continuing) help, guys.

 

Niall.


From: Lawrence Mitchell <wence@gmx.li>
Sent: Friday 2 August 2019 10:52
To: Sagiyama, Koki <k.sagiyama@imperial.ac.uk>
Cc: Madden, Niall <niall.madden@nuigalway.ie>; firedrake <firedrake@imperial.ac.uk>
Subject: Re: [firedrake] Installing firedrake on a HPC system without package manager

 

Hi Niall, Koki,

> On 2 Aug 2019, at 10:39, Sagiyama, Koki <k.sagiyama@imperial.ac.uk> wrote:
>
> Dear Niall,
>
> It seems to me the following line is critical:
>
> src/dynload.c:5:10: fatal error: Python.h: No such file or directory
>
> You want to have an appropriate package loaded.  If you have the `module` command available,
> you could try `module search python3-dev` or (`module search python-dev`), followed by `module load package_name`.

This certainly looks like an issue. The relevant compile line is:

   mpicc -Wno-unused-result -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -DHAVE_DLFCN_H=1 -I/MYOB/r/libs/firedrake/include -I/usr/include/python3.6m -c src/dynload.c -o build/temp.linux-x86_64-3.6/src/dynload.o
   src/dynload.c:5:10: fatal error: Python.h: No such file or directory

So it would appear that /usr/include/python3.6m doesn't include the Python header files, can you confirm?

In addition to this there is also a link error:

   /ichec/packages/openmpi/gcc/3.1.2/bin/mpicc -Wno-unused-result -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/MYOB/r/libs/firedrake/include -I/usr/include/python3.6m -c _configtest.c -o _configtest.o
   /ichec/packages/openmpi/gcc/3.1.2/bin/mpicc _configtest.o -L/usr/lib64 -Lbuild/temp.linux-x86_64-3.6 -o _configtest
   /ichec/packages/libfabric/1.7.1/lib/libfabric.so.1: undefined reference to `psm2_mq_ipeek_dequeue_multi@PSM2_1.0'
   /ichec/packages/libfabric/1.7.1/lib/libfabric.so.1: undefined reference to `psm2_info_query@PSM2_1.0'
   collect2: error: ld returned 1 exit status
   failure.
   removing: _configtest.c _configtest.o
   error: Cannot link MPI programs. Check your configuration!!!
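The failing step is mpi4py's configuration check, which compiles and links a trivial MPI program. A sketch that reproduces that kind of check outside any Python build, so the failure can be shown to the cluster admins on its own; the test program and helper name are illustrative, not mpi4py's actual `_configtest.c`.

```python
import os
import subprocess
import tempfile

TEST_PROGRAM = r"""
#include <mpi.h>
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    MPI_Finalize();
    return 0;
}
"""


def try_mpi_link(mpicc):
    """Compile and link a trivial MPI program with `mpicc`; return (ok, compiler output)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mpi_link_test.c")
        with open(src, "w") as f:
            f.write(TEST_PROGRAM)
        try:
            proc = subprocess.run([mpicc, src, "-o", os.path.join(tmp, "mpi_link_test")],
                                  stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                                  universal_newlines=True)
        except OSError as exc:  # e.g. the wrapper does not exist
            return False, "could not run %s: %s" % (mpicc, exc)
        return proc.returncode == 0, proc.stdout


if __name__ == "__main__":
    ok, log = try_mpi_link("/ichec/packages/openmpi/gcc/3.1.2/bin/mpicc")
    print("link OK" if ok else "link FAILED")
    print(log)
```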

So I wonder if there is a module that should be loaded but isn't. What does:

ldd /ichec/packages/libfabric/1.7.1/lib/libfabric.so.1

return?

And also

/ichec/packages/openmpi/gcc/3.1.2/bin/mpicc -show


> We can also see that you have openmpi loaded (`module list` will show you openmpi), but it is known to cause some issues, so you probably want to unload openmpi and load mpich

This is a bit of a red herring, I think. If the ICHEC machine suggests using openmpi, then I think that is fine.


> Though it is not directly related to the error you are having right now, petsc usually requires some additional parameters when configuring on clusters (--with-batch --known-mpi-shared-libraries=0)  (https://www.mcs.anl.gov/petsc/documentation/installation.html ),
> which we don't see in your previous email.

This depends on whether the compilation nodes can execute MPI programs, which the ICHEC machine seemingly allows.

Cheers,

Lawrence