Dear Kamil,
The first error is simply that more memory was needed than the amount you allocated to the job (as you probably realised). The second error is a segmentation fault. Can you reproduce the problem using a (much) smaller job?
Cheers, Chris
On 30/11/14 21:41, Kamil ÖZDEN wrote:
Dear Dr. Cantwell,
Thanks for your help. I'll try this and inform you about the result.
Meanwhile, I made another installation with ACML on the same cluster, using the following ACML and MPI configuration:
****************
    ACML                       /truba/sw/centos6.4/lib/acml/4.4.0/gfortran64/lib/libacml.so
    ACML_INCLUDE_PATH          /truba/sw/centos6.4/lib/acml/4.4.0/gfortran64/include
    ACML_SEARCH_PATHS          /truba/sw/centos6.4/lib/acml/4.4.0/gfortran64/include
    ACML_USE_OPENMP_LIBRARIES  OFF
    ACML_USE_SHARED_LIBRARIES  ON
****************
    MPIEXEC                    /usr/mpi/gcc/openmpi-1.6.5/bin/mpiexec
    MPIEXEC_MAX_NUMPROCS       2
    MPIEXEC_NUMPROC_FLAG       -np
    MPIEXEC_POSTFLAGS
    MPIEXEC_PREFLAGS
    MPI_CXX_COMPILER           /usr/mpi/gcc/openmpi-1.6.5/bin/mpicxx
    MPI_CXX_COMPILE_FLAGS
    MPI_CXX_INCLUDE_PATH       /usr/mpi/gcc/openmpi-1.6.5/include
    MPI_CXX_LIBRARIES          /usr/mpi/gcc/openmpi-1.6.5/lib64/libmpi_cxx.so;/usr/mpi/gcc/openmpi-1.6.5/lib64/libmpi.so;/usr/lib64/libdl.so;/usr/lib64/libm.so;/usr/lib64/librt.so;/usr/lib64/libnsl.so;/usr/lib64/libutil.so;/usr/lib64/libm.so;/usr/lib64/libdl.so
    MPI_CXX_LINK_FLAGS         -Wl,--export-dynamic
    MPI_C_COMPILER             /usr/mpi/gcc/openmpi-1.6.5/bin/mpicc
    MPI_C_COMPILE_FLAGS
    MPI_C_INCLUDE_PATH         /usr/mpi/gcc/openmpi-1.6.5/include
    MPI_C_LIBRARIES            /usr/mpi/gcc/openmpi-1.6.5/lib64/libmpi.so;/usr/lib64/libdl.so;/usr/lib64/libm.so;/usr/lib64/librt.so;/usr/lib64/libnsl.so;/usr/lib64/libutil.so;/usr/lib64/libm.so;/usr/lib64/libdl.so
    MPI_C_LINK_FLAGS           -Wl,--export-dynamic
    MPI_EXTRA_LIBRARY          /usr/mpi/gcc/openmpi-1.6.5/lib64/libmpi.so;/usr/lib64/libdl.so;/usr/lib64/libm.so;/usr/lib64/librt.so;/usr/lib64/libnsl.so;/usr/lib64/libutil.so;/usr/lib64/libm.so;/usr/lib64/libdl.so
    MPI_LIBRARY                /usr/mpi/gcc/openmpi-1.6.5/lib64/libmpi_cxx.so
****************
Nektar seems to have installed successfully. However, when I submit a job to the cluster's AMD processors using a script that calls mpirun (the cluster uses the SLURM resource manager), I run into the following issue.
When I run with 4 processors, the initial conditions are read and the first .chk directory starts to be written, as seen below:
    =======================================================================
    EquationType: UnsteadyNavierStokes
    Session Name: Re_1_v2_N6
    Spatial Dim.: 3
    Max SEM Exp. Order: 7
    Expansion Dim.: 3
    Projection Type: Continuous Galerkin
    Advection: explicit
    Diffusion: explicit
    Time Step: 0.01
    No. of Steps: 300
    Checkpoints (steps): 30
    Integration Type: IMEXOrder1
    =======================================================================
    Initial Conditions:
    - Field u: 0
    - Field v: 0
    - Field w: 0.15625
    - Field p: 0
    Writing: Re_1_v2_N6_0.chk
But after that, the analysis ends with the error below:
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    slurmd[mercan115]: Job 405433 exceeded memory limit (22245156 > 20480000), being killed
    slurmd[mercan115]: Exceeded job memory limit
    slurmd[mercan115]: *** JOB 405433 CANCELLED AT 2014-11-30T23:15:28 ***
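To see how far over the limit the job went, the slurmd message can be unpacked with a few lines of Python. This is only a minimal sketch: it assumes the two figures are in kilobytes (the log itself does not state the unit), and the function name parse_memory_limit is made up for illustration.

```python
import re

# Message copied verbatim from the job output.
LINE = ("slurmd[mercan115]: Job 405433 exceeded memory limit "
        "(22245156 > 20480000), being killed")


def parse_memory_limit(line):
    """Return (job_id, used, limit) as integers, or None if the line does not match."""
    m = re.search(r"Job (\d+) exceeded memory limit \((\d+) > (\d+)\)", line)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())


job_id, used_kb, limit_kb = parse_memory_limit(LINE)
excess_kb = used_kb - limit_kb

# Assumption: values are kilobytes, so dividing by 1024 gives MiB.
print(f"job {job_id}: used {used_kb / 1024:.0f} MiB, limit {limit_kb / 1024:.0f} MiB")
print(f"over by {excess_kb / 1024:.0f} MiB ({100 * excess_kb / limit_kb:.1f}% of the limit)")
```

Under that assumption the job was roughly 9% over its allocation when it was killed, so the memory request in the job script would need to be raised by more than that margin, or the problem size reduced.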
However, when I run the analysis with 8 processors, it ends immediately with the error below:
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    Warning: Conflicting CPU frequencies detected, using: 2300.000000.
    --------------------------------------------------------------------------
    mpirun noticed that process rank 2 with PID 24004 on node mercan146.yonetim exited on signal 11 (Segmentation fault).
What may be the reason for this problem?
Regards, Kamil
On 30.11.2014 13:08, Chris Cantwell wrote:
Dear Kamil,
This still seems to suggest that the version in your home directory is not compiled with -fPIC.
Try deleting all library files (*.a) and all compiled object code (*.o) from within the LAPACK source tree, then compile from scratch again. Also note that you need to add the -fPIC flag to both the OPTS and NOOPT variables in your LAPACK make.inc file (which presumably is what your system administrator altered).
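The make.inc change could be scripted roughly as follows. This is only a sketch: the helper add_fpic and the sample make.inc contents are hypothetical, and it assumes each variable is assigned on a single line of the form NAME = value, with no line continuations or trailing comments.

```python
import re


def add_fpic(make_inc_text, variables=("OPTS", "NOOPT")):
    """Append -fPIC to each listed variable assignment that does not already have it."""
    out = []
    for line in make_inc_text.splitlines():
        m = re.match(r"\s*(\w+)\s*=(.*)$", line)
        # Only touch the requested variables, and only if -fPIC is not already a flag.
        if m and m.group(1) in variables and "-fPIC" not in m.group(2).split():
            line = line.rstrip() + " -fPIC"
        out.append(line)
    return "\n".join(out) + "\n"


# Illustrative contents only; a real make.inc has many more settings.
SAMPLE = """\
FORTRAN  = gfortran
OPTS     = -O2
NOOPT    = -O0
"""

print(add_fpic(SAMPLE), end="")
```

Running the function twice leaves the text unchanged after the first pass, so it is safe to apply to a file that has already been partly edited. The stale *.a and *.o files still have to be removed before rebuilding, otherwise objects compiled without -fPIC may be reused.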
Cheers, Chris
--
Chris Cantwell
Imperial College London
South Kensington Campus
London SW7 2AZ
Email: c.cantwell@imperial.ac.uk
www.imperial.ac.uk/people/c.cantwell