Dear Dr. Cantwell,

As additional information: the KovaFlow_m8.xml analysis runs from the command line using the mpirun command, but it fails when submitted to the cluster via a script, giving the error below.

mpirun noticed that process rank 2 with PID 32190 on node mercan155.yonetim exited on signal 11 (Segmentation fault).

Is there any option that needs to be changed in the Nektar++ configuration to run the analysis on the cluster?

NOTE: I have used both the mpirun and mpiexec commands in the script, but I got the same error. If you want, I can also send the script to you; it is essentially of the form sketched below.
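For reference, a minimal sketch of the kind of script I am submitting (the partition, module name, and resource values here are placeholders, not my exact settings):

#!/bin/bash
#SBATCH --job-name=KovaFlow_m8
#SBATCH --ntasks=4                 # number of MPI ranks
#SBATCH --mem=20000                # memory request in MB
#SBATCH --time=01:00:00

# load the same MPI that Nektar++ was built against (placeholder module name)
module load openmpi/1.6.5

mpirun -np 4 ./IncNavierStokesSolver KovaFlow_m8.xml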

Regards,
Kamil

On 01-12-2014 23:26, Kamil ÖZDEN wrote:
Dear Dr. Cantwell,

I tried to run the Nektar++ test file KovaFlow_m8.xml via a script file and got the same segmentation fault error.

Then I copied the same file to the directory nektar++-4.0.0/build/solvers/IncNavierStokesSolver/ and tried to run it from the command line by typing the command

./IncNavierStokesSolver KovaFlow_m8.xml

but I got the following error:

./IncNavierStokesSolver: error while loading shared libraries: libacml_mv.so: cannot open shared object file: No such file or directory
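I suspect this just means the ACML library directory is not on LD_LIBRARY_PATH in my interactive shell, so I will try something like the following (the lib path is taken from the ACML configuration quoted below; this is a guess on my part):

export LD_LIBRARY_PATH=/truba/sw/centos6.4/lib/acml/4.4.0/gfortran64/lib:$LD_LIBRARY_PATH
./IncNavierStokesSolver KovaFlow_m8.xml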

Regards,
Kamil

On 01.12.2014 22:42, Chris Cantwell wrote:
Dear Kamil,

The first error is simply that more memory was needed than the amount you allocated to the job (as you probably realised). The second error is a segmentation fault.

Can you reproduce the problem using a (much) smaller job?
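For the memory error itself, you can also simply request more memory from SLURM in your job script, for example (the value is illustrative, and assumes your site allows per-job memory requests):

#SBATCH --mem=24000    # in MB; your job was killed just above the 20480000 KB limit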

Cheers,
Chris


On 30/11/14 21:41, Kamil ÖZDEN wrote:
Dear Dr. Cantwell,

Thanks for your help. I'll try this and inform you about the result.

Meanwhile, I made another installation with ACML on the same cluster, with the following ACML and MPI configuration:

****************
ACML                       /truba/sw/centos6.4/lib/acml/4.4.0/gfortran64/lib/libacml.so
ACML_INCLUDE_PATH          /truba/sw/centos6.4/lib/acml/4.4.0/gfortran64/include
ACML_SEARCH_PATHS          /truba/sw/centos6.4/lib/acml/4.4.0/gfortran64/include
ACML_USE_OPENMP_LIBRARIES  OFF
ACML_USE_SHARED_LIBRARIES  ON
****************
MPIEXEC                    /usr/mpi/gcc/openmpi-1.6.5/bin/mpiexec
MPIEXEC_MAX_NUMPROCS       2
MPIEXEC_NUMPROC_FLAG       -np
MPIEXEC_POSTFLAGS
MPIEXEC_PREFLAGS
MPI_CXX_COMPILER           /usr/mpi/gcc/openmpi-1.6.5/bin/mpicxx
MPI_CXX_COMPILE_FLAGS
MPI_CXX_INCLUDE_PATH       /usr/mpi/gcc/openmpi-1.6.5/include
MPI_CXX_LIBRARIES          /usr/mpi/gcc/openmpi-1.6.5/lib64/libmpi_cxx.so;/usr/mpi/gcc/openmpi-1.6.5/lib64/libmpi.so;/usr/lib64/libdl.so;/usr/lib64/libm.so;/usr/lib64/librt.so;/usr/lib64/libnsl.so;/usr/lib64/libutil.so;/usr/lib64/libm.so;/usr/lib64/libdl.so
MPI_CXX_LINK_FLAGS         -Wl,--export-dynamic
MPI_C_COMPILER             /usr/mpi/gcc/openmpi-1.6.5/bin/mpicc
MPI_C_COMPILE_FLAGS
MPI_C_INCLUDE_PATH         /usr/mpi/gcc/openmpi-1.6.5/include
MPI_C_LIBRARIES            /usr/mpi/gcc/openmpi-1.6.5/lib64/libmpi.so;/usr/lib64/libdl.so;/usr/lib64/libm.so;/usr/lib64/librt.so;/usr/lib64/libnsl.so;/usr/lib64/libutil.so;/usr/lib64/libm.so;/usr/lib64/libdl.so
MPI_C_LINK_FLAGS           -Wl,--export-dynamic
MPI_EXTRA_LIBRARY          /usr/mpi/gcc/openmpi-1.6.5/lib64/libmpi.so;/usr/lib64/libdl.so;/usr/lib64/libm.so;/usr/lib64/librt.so;/usr/lib64/libnsl.so;/usr/lib64/libutil.so;/usr/lib64/libm.so;/usr/lib64/libdl.so
MPI_LIBRARY                /usr/mpi/gcc/openmpi-1.6.5/lib64/libmpi_cxx.so
****************

Nektar++ seems to have installed successfully. However, when I submit a job using the mpirun command in a script to the AMD processors of the cluster (the cluster uses the SLURM resource manager), I run into the following issue.

When I run with 4 processors, the initial conditions are read and the first .chk directory starts to be written, as seen below:

=======================================================================
EquationType:        UnsteadyNavierStokes
Session Name:        Re_1_v2_N6
Spatial Dim.:        3
Max SEM Exp. Order:  7
Expansion Dim.:      3
Projection Type:     Continuous Galerkin
Advection:           explicit
Diffusion:           explicit
Time Step:           0.01
No. of Steps:        300
Checkpoints (steps): 30
Integration Type:    IMEXOrder1
=======================================================================
Initial Conditions:
- Field u: 0
- Field v: 0
- Field w: 0.15625
- Field p: 0
Writing: Re_1_v2_N6_0.chk

But after that, the analysis ends with the error below:

Warning: Conflicting CPU frequencies detected, using: 2300.000000.   [printed 4 times]

slurmd[mercan115]: Job 405433 exceeded memory limit (22245156 > 20480000), being killed
slurmd[mercan115]: Exceeded job memory limit
slurmd[mercan115]: *** JOB 405433 CANCELLED AT 2014-11-30T23:15:28 ***

However, when I try to run the analysis with 8 processors, it ends immediately with the error below:

Warning: Conflicting CPU frequencies detected, using: 2300.000000.   [printed 8 times]

--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 24004 on node mercan146.yonetim exited on signal 11 (Segmentation fault).

What could be the reason for this problem?

Regards,
Kamil

On 30.11.2014 13:08, Chris Cantwell wrote:
Dear Kamil,

This still seems to suggest that the version in your home directory is
not compiled with -fPIC.

Try deleting all library files (*.a) and all compiled object code (*.o) from within the LAPACK source tree, then compile again from scratch. Also note that you need to add the -fPIC flag to both the OPTS and NOOPT variables in your LAPACK make.inc file (which is presumably what your system administrator altered), along the lines of the example below.
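Concretely, something like this from the top of the LAPACK source tree (the optimisation flags in your make.inc may differ; just append -fPIC to whatever is already there):

# remove previously compiled objects and archives
find . -name '*.o' -delete
find . -name '*.a' -delete

# in make.inc, append -fPIC to both variables, e.g.:
#   OPTS  = -O2 -fPIC
#   NOOPT = -O0 -fPIC

# then rebuild
make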

Cheers,
Chris