Some parallel ctests hang
Hi there,

We're trying to install Nektar++ 5.6.0 from source on our HPC system and are having trouble passing two parallel tests:

MultiRegions_Helmholtz2D_CG_P7_Modes_AllBCs_iter_ml_par3
MultiRegions_Helmholtz3D_CG_Hex_AllBCs_iter_ml_par3

In both cases they hang. We have tried openmpi, intelmpi and mvapich2. Attaching strace to the processes reveals a lot of polling and not much else.

To simplify, I've reproduced this on a separate Rocky 8.10 system with EPEL packages, building against master (7b1aa23bb) in case it has already been fixed. openmpi is 4.1.1:

  git clone http://gitlab.nektar.info/nektar/nektar.git
  mkdir build
  cd build
  cmake -D NEKTAR_USE_MPI=ON ../nektar
  make -j24
  ctest

Are these tests known to be problematic, please?

Thanks,

Mark
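For anyone reproducing this, the two hanging tests can be run in isolation from the build directory with verbose output and a time limit, so ctest kills them rather than blocking indefinitely; the regular expression and the 300-second limit here are only illustrative:

  # run from the build directory; adjust the timeout as needed
  ctest -R "Helmholtz2D_CG_P7_Modes_AllBCs_iter_ml_par3|Helmholtz3D_CG_Hex_AllBCs_iter_ml_par3" --output-on-failure --timeout 300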
Hi Mark,

As far as I am aware these tests typically pass. We have a GitLab CI system you can check, at gitlab.nektar.info, to see what builds we typically run.

Best,
Spencer
Hi Mark,

Quick one: are you running these on the login nodes, or are you submitting through your cluster's scheduler/queuing system?

If you aren't, I would suggest trying through a job submission. MPI libraries aren't always fully available on the login nodes as a protection, even if you load them as a module.

Thanks,
James.
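If the cluster scheduler is Slurm (an assumption here; adapt the directives to whatever the site actually runs), a minimal job script for the 3-rank tests could look like the sketch below. The module name and build path are placeholders:

  #!/bin/bash
  #SBATCH --nodes=1
  #SBATCH --ntasks=3            # the _par3 tests use 3 MPI ranks
  #SBATCH --time=00:30:00
  module load openmpi           # placeholder module name
  cd /path/to/nektar/build      # placeholder build directory
  ctest -R "iter_ml_par3" --output-on-failure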
Hi James,

Thanks for the suggestion - I'm afraid there's no difference in behaviour between running on the login nodes or through the queue. And note that I reproduced the problem on my standalone desktop.

Hi Spencer,

Taking a look at the Nektar CI system, it seems that these tests aren't habitually run on RHEL or its derivatives like Rocky/Alma. In all cases I've been using Rocky 8.

I'm about to go on holiday, but what's the most constructive thing I can do? Trace Helmholtz2D/Helmholtz3D?

Best,

Mark
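One straightforward way to trace a hang like this is to attach a debugger to the stuck MPI ranks and capture C++ backtraces, which usually shows which solver or communication call each rank is blocked in. The process name pattern below is a guess; match it against whatever binary the test actually launches:

  # find the hung test processes (pattern is illustrative)
  pgrep -af Helmholtz2D
  # attach to one rank non-interactively and dump backtraces for all threads
  gdb -p <PID> -batch -ex "thread apply all bt"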
Do older versions run fine?

Spencer
Hi Mark,

I managed to track down the problem; the following merge request should fix it: https://gitlab.nektar.info/nektar/nektar/-/merge_requests/1850

Please note that you can fetch the merge request directly from GitLab using:

  git fetch origin merge-requests/1850/head:fix/iterativestaticcond-with-absolution-tolerance
  git checkout fix/iterativestaticcond-with-absolution-tolerance

Cheers,
Jacques
Hi Jacques, everyone,

Sorry for the delay, I've been away... Thanks so much for this - I confirm that merge request 1850 fixes the MultiRegions_Helmholtz2D_CG_P7_Modes_AllBCs_iter_ml_par3 and MultiRegions_Helmholtz3D_CG_Hex_AllBCs_iter_ml_par3 tests on our Rocky 8 systems. Well done!

Would you recommend we continue to use Nektar++ 5.6.0 as-is, or apply these patches on top?

Best wishes,

Mark
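For reference, one way to carry the fix on top of the 5.6.0 release rather than tracking master (assuming the release is tagged v5.6.0 and the merge-request branch applies cleanly there) is to branch from the tag and merge the fix locally; the branch name below is illustrative:

  git fetch origin merge-requests/1850/head:fix/iterativestaticcond-with-absolution-tolerance
  git checkout -b v5.6.0-with-mr1850 v5.6.0
  git merge fix/iterativestaticcond-with-absolution-tolerance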
participants (4)
- Mark Dixon
- Sherwin, Spencer J
- Slaughter, James
- Xing, Jacques