Dear Dr. Cantwell,
I installed Nektar++ on the cluster with the NEKTAR_USE_ACML option switched ON. There are 92 nodes, with 24 processors on each node. When I use 1 or 2 processors on one node, the analysis runs. However, when I increase the number of processors to 4 on one node, I get the error.
Regards, Kamil
On 02.12.2014 23:43, Chris Cantwell wrote:
Dear Kamil,
Your problem sounds like it is specific to the cluster you are using, or to the use of ACML.
Do you use ACML on your workstation where KovaFlow_m8.xml ran successfully using mpirun? How many cores was this on, and how many were you using on the cluster?
We will need to see a backtrace at the point when the segmentation fault occurs to be able to diagnose what is going wrong and help further. How you do this will depend on what debugging software is available on your cluster. Your system administrator should be able to help you with this.
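For example, if gdb is available on the compute nodes, something along the following lines should produce a backtrace (a minimal sketch only; the core file name is a placeholder and any module-loading steps will depend on your system):

    ulimit -c unlimited                         # allow a core file to be written when the crash occurs
    mpirun -np 4 ./IncNavierStokesSolver KovaFlow_m8.xml
    # after the segmentation fault, print the backtrace recorded in the core file
    gdb --batch -ex bt ./IncNavierStokesSolver core.<pid>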
Cheers, Chris
On 02/12/14 21:27, Kamil ÖZDEN wrote:
Dear Dr. Cantwell,
The latest situation with the Nektar++ installation on the cluster with ACML is that the KovaFlow_m8.xml analysis runs with 1 and 2 processors, but when I try to run it with 4 processors I get the segmentation fault error.
Regards, Kamil
On 02.12.2014 14:58, Kamil Ozden wrote:
Dear Dr. Cantwell,
As additional information, I want to state that the KovaFlow_m8.xml analysis runs from the command line using the mpirun command, but it does not run when submitted to the cluster with a script, giving the error below:

    mpirun noticed that process rank 2 with PID 32190 on node mercan155.yonetim exited on signal 11 (Segmentation fault).

Is there any option that needs to be changed in the Nektar++ configuration to run the analysis on the cluster?
NOTE: I have used both the mpirun and mpiexec commands in the script, but I get the same error. If you want, I can also send the script to you.
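In outline, the script is of the following form and is submitted with sbatch (a minimal sketch; the job name, partition name and resource values below are placeholders rather than the exact script):

    #!/bin/bash
    #SBATCH --job-name=KovaFlow_m8       # placeholder job name
    #SBATCH --nodes=1                    # single node
    #SBATCH --ntasks=4                   # number of MPI ranks
    #SBATCH --partition=partition_name   # placeholder partition name

    mpirun -np 4 ./IncNavierStokesSolver KovaFlow_m8.xml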
Regards, Kamil
On 01-12-2014 23:26, Kamil ÖZDEN wrote:
Dear Dr. Cantwell,
I tried to run the Nektar++ test file KovaFlow_m8.xml via a script file and got the same segmentation fault error.
Then I copied the same file to the directory nektar++-4.0.0/build/solvers/IncNavierStokesSolver/ and tried to run it from the command line by typing the command

    ./IncNavierStokesSolver KovaFlow_m8.xml

but I got the following error:

    ./IncNavierStokesSolver: error while loading shared libraries: libacml_mv.so: cannot open shared object file: No such file or directory
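From what I understand, this loader error means the directory containing the ACML shared libraries is not on LD_LIBRARY_PATH at run time. The workaround I intend to try is along these lines (a sketch only, assuming libacml_mv.so sits in the same lib directory as the libacml.so used in my CMake configuration):

    # assumption: libacml_mv.so lives alongside libacml.so in the ACML lib directory
    export LD_LIBRARY_PATH=/truba/sw/centos6.4/lib/acml/4.4.0/gfortran64/lib:$LD_LIBRARY_PATH
    ./IncNavierStokesSolver KovaFlow_m8.xml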
Regards, Kamil
On 01.12.2014 22:42, Chris Cantwell wrote:
Dear Kamil,
The first error is simply that more memory was needed than the amount you allocated to the job (as you probably realised). The second error is a segmentation fault.
Can you reproduce the problem using a (much) smaller job?
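If you do need a job of that size, the memory request in your SLURM script will also have to be raised; typically this is done with something like the following (the value is purely illustrative):

    #SBATCH --mem=32000     # per-node memory request in MB; illustrative value only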
Cheers, Chris
On 30/11/14 21:41, Kamil ÖZDEN wrote:
Dear Dr. Cantwell,
Thanks for your help. I'll try this and inform you about the result.
Meanwhile, I made another installation with ACML on the same cluster, with the following ACML and MPI configuration:
    ACML                       /truba/sw/centos6.4/lib/acml/4.4.0/gfortran64/lib/libacml.so
    ACML_INCLUDE_PATH          /truba/sw/centos6.4/lib/acml/4.4.0/gfortran64/include
    ACML_SEARCH_PATHS          /truba/sw/centos6.4/lib/acml/4.4.0/gfortran64/include
    ACML_USE_OPENMP_LIBRARIES  OFF
    ACML_USE_SHARED_LIBRARIES  ON

    MPIEXEC                    /usr/mpi/gcc/openmpi-1.6.5/bin/mpiexec
    MPIEXEC_MAX_NUMPROCS       2
    MPIEXEC_NUMPROC_FLAG       -np
    MPIEXEC_POSTFLAGS
    MPIEXEC_PREFLAGS
    MPI_CXX_COMPILER           /usr/mpi/gcc/openmpi-1.6.5/bin/mpicxx
    MPI_CXX_COMPILE_FLAGS
    MPI_CXX_INCLUDE_PATH       /usr/mpi/gcc/openmpi-1.6.5/include
    MPI_CXX_LIBRARIES          /usr/mpi/gcc/openmpi-1.6.5/lib64/libmpi_cxx.so;/usr/mpi/gcc/openmpi-1.6.5/lib64/libmpi.so;/usr/lib64/libdl.so;/usr/lib64/libm.so;/usr/lib64/librt.so;/usr/lib64/libnsl.so;/usr/lib64/libutil.so;/usr/lib64/libm.so;/usr/lib64/libdl.so
    MPI_CXX_LINK_FLAGS         -Wl,--export-dynamic
    MPI_C_COMPILER             /usr/mpi/gcc/openmpi-1.6.5/bin/mpicc
    MPI_C_COMPILE_FLAGS
    MPI_C_INCLUDE_PATH         /usr/mpi/gcc/openmpi-1.6.5/include
    MPI_C_LIBRARIES            /usr/mpi/gcc/openmpi-1.6.5/lib64/libmpi.so;/usr/lib64/libdl.so;/usr/lib64/libm.so;/usr/lib64/librt.so;/usr/lib64/libnsl.so;/usr/lib64/libutil.so;/usr/lib64/libm.so;/usr/lib64/libdl.so
    MPI_C_LINK_FLAGS           -Wl,--export-dynamic
    MPI_EXTRA_LIBRARY          /usr/mpi/gcc/openmpi-1.6.5/lib64/libmpi.so;/usr/lib64/libdl.so;/usr/lib64/libm.so;/usr/lib64/librt.so;/usr/lib64/libnsl.so;/usr/lib64/libutil.so;/usr/lib64/libm.so;/usr/lib64/libdl.so
    MPI_LIBRARY                /usr/mpi/gcc/openmpi-1.6.5/lib64/libmpi_cxx.so
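For reference, these cache values correspond to a configure step roughly like the one below (the build directory and the exact set of flags are illustrative rather than the precise command I used):

    # illustrative configure command; paths and options other than NEKTAR_USE_ACML are assumptions
    cd nektar++-4.0.0/build
    cmake -DNEKTAR_USE_ACML=ON \
          -DNEKTAR_USE_MPI=ON \
          -DACML_SEARCH_PATHS=/truba/sw/centos6.4/lib/acml/4.4.0/gfortran64/include \
          ..
    make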
Nektar++ seems to be installed successfully. However, when I try to submit a job with a script, using the mpirun command, to the AMD processors of the cluster (the cluster uses the SLURM resource manager), I face the following issue.
When I tried to run with 4 processors, the initial conditions are read and the first .chk directory starts to be written, as seen below:
    =======================================================================
    EquationType: UnsteadyNavierStokes
    Session Name: Re_1_v2_N6
    Spatial Dim.: 3
    Max SEM Exp. Order: 7
    Expansion Dim.: 3
    Projection Type: Continuous Galerkin
    Advection: explicit
    Diffusion: explicit
    Time Step: 0.01
    No. of Steps: 300
    Checkpoints (steps): 30
    Integration Type: IMEXOrder1
    =======================================================================
    Initial Conditions:
    - Field u: 0
    - Field v: 0
    - Field w: 0.15625
    - Field p: 0
    Writing: Re_1_v2_N6_0.chk
But after that, the analysis ends with the error below:
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    slurmd[mercan115]: Job 405433 exceeded memory limit (22245156 > 20480000), being killed
    slurmd[mercan115]: Exceeded job memory limit
    slurmd[mercan115]: *** JOB 405433 CANCELLED AT 2014-11-30T23:15:28 ***
However, when I try to run the analysis with 8 processors, it ends immediately with the error below:
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    --------------------------------------------------------------------------
    mpirun noticed that process rank 2 with PID 24004 on node mercan146.yonetim exited on signal 11 (Segmentation fault).
What may be the reason for this problem?
Regards, Kamil
On 30.11.2014 13:08, Chris Cantwell wrote:
> Dear Kamil,
>
> This still seems to suggest that the version in your home directory is
> not compiled with -fPIC.
>
> Try deleting all library files (*.a) and all compiled object code (*.o)
> from within the LAPACK source tree and try compiling from fresh again.
> Also note that you need to add the -fPIC flag to both the OPTS and
> NOOPT variables in your LAPACK make.inc file (which presumably is what
> your system administrator altered).
>
> Cheers,
> Chris
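In concrete terms, the quoted advice amounts to roughly the following steps (the LAPACK source path and version are placeholders, and the OPTS/NOOPT lines are examples of the make.inc edit rather than the exact contents of that file):

    cd $HOME/lapack-3.x.y          # placeholder path to the LAPACK source tree
    find . -name '*.o' -delete     # remove previously compiled object code
    find . -name '*.a' -delete     # remove previously built static libraries
    # in make.inc, append -fPIC to both the OPTS and NOOPT variables, e.g.:
    #   OPTS  = -O2 -fPIC
    #   NOOPT = -O0 -fPIC
    make                           # rebuild LAPACK from scratch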