Hi all,

I'm trying to run a few scaling tests on the cluster I have access to. I'm using a mesh with 10,800 elements at an expansion order of 5, and the simulation is set to run for 10,000 time steps. The issue that I'm running into is as follows: doubling the number of processors *increases* the total wall time.

Procs   Wall Time
1       201 s
20      209 s
40      242 s

I believe this is due to the overhead caused by writing checkpoint files (each parallel stream seems to write a separate checkpoint file). I have reduced the output frequency to the point that only one checkpoint file should be written for the entire simulation time; however, this still requires n checkpoint files to be written, where n is the number of processors the case is parallelised over.

In all cases I use the mpirun command. For example:

mpirun -np n IncNavierStokesSolver case.xml

Could I have some pointers for proceeding further with this issue?

Sincerely,

--
Amitvikram Dutta
Graduate Research Assistant
Fluid Mechanics Research Lab
Multi-Physics Interaction Lab
University of Waterloo
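For reference, a scaling sweep like the one described above could be driven by a short script along these lines. This is only a sketch: the process counts and file names come from the message, the log-file names are made up, and timing is done with bash's time keyword rather than anything Nektar++-specific.

  #!/bin/bash
  # Sketch of a scaling sweep: run the solver at several process counts
  # and record the wall time of each run. Log-file names are illustrative.
  for n in 1 20 40; do
      echo "=== $n MPI process(es) ==="
      # bash's 'time' keyword prints real/user/sys once mpirun returns
      time mpirun -np "$n" IncNavierStokesSolver case.xml > "run_np${n}.log" 2>&1
  done

Comparing the "real" times across runs gives the same picture as the table above while keeping each run's solver output for later inspection.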
Hi Amitvikram,

There are a number of possible reasons for this behaviour. Aside from any geometry-related issues that others on this list may be able to advise on, it is most likely related to the configuration of your cluster and how you're building/running Nektar++. It could be a result of either the cluster interconnect and the related configuration of MPI and your Nektar++ build, or the use of a shared filesystem that is suffering performance problems when each node tries to write out its section of the checkpoint to the output directory concurrently.

Does your cluster have any sort of batch submission system on it? I assume not, since it looks from your description like you're running mpirun -np n directly (i.e. I assume you have a group of nodes and you SSH directly to one of them, from where you run mpirun?).

Some things to try/investigate:

1) Assuming that you're currently running with some sort of shared filesystem between the cluster nodes, and that you have the ability to log in to each of the nodes, try setting up your computation to store its output in a location on each node that is on a local disk. For testing, you could perhaps use /tmp. You may need to first copy your case.xml input file to the directory that you're running from on each node, although in theory you might be able to run with the input file only on the submission node. So, for example, if you were to copy your case.xml file to /tmp/job/ on your submission node, create /tmp/job on each of your other compute nodes, and then run "mpirun -np n IncNavierStokesSolver /tmp/job/case.xml", I believe Nektar++ will detect that it's not running on a shared filesystem and write out the relevant parts of the partitioned mesh on each compute node. The computation will then run without needing to write data to a shared disk. Someone with more experience of the software can confirm whether you need to push your input file out to each compute node first, but using this general approach you should be able to identify whether shared storage is affecting performance. There is a sketch of this layout after point 3 below.

2) The other thing to investigate is the interconnect between your nodes. If this is a standard gigabit Ethernet connection, communication latency could be an issue, although at the number of processors you're trying to run with I would have thought you'd see at least some speedup from running in parallel. What version of MPI are you using? Are you building Nektar++ yourself, i.e. from source? If your cluster has any sort of specialist high-performance interconnect, does it have its own libraries that you need to link Nektar++ against at build time in order to use the interconnect correctly?

3) Another thought that springs to mind, although I think this is unlikely, is whether your cluster is somehow overcommitting processes to nodes. This would certainly result in a drop in performance. The MPI machinefile, which is presumably in a standard location since you're not having to specify it on the command line, should list all the nodes in the cluster and can specify the number of 'slots', i.e. MPI processes, that can be run on each node.
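A minimal sketch of the setup described in point 1, assuming two compute nodes reachable by SSH as node01 and node02 (the hostnames, paths and process count here are illustrative only, not taken from any real cluster):

  # Stage the input file onto local disk on each compute node (hypothetical hostnames)
  for host in node01 node02; do
      ssh "$host" mkdir -p /tmp/job
      scp case.xml "$host:/tmp/job/"
  done

  # Create the same layout on the submission node, then run from the local copy
  mkdir -p /tmp/job && cp case.xml /tmp/job/
  cd /tmp/job
  mpirun -np 40 IncNavierStokesSolver /tmp/job/case.xml

If Nektar++ does detect the non-shared filesystem as described above, each rank should then read and write only on its own node's local disk, which makes it straightforward to compare timings against the shared-storage runs.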
I don't have experience of this myself in terms of performance tests, and again maybe someone else on the list can comment, but I also wonder, if you have CPUs with hyperthreading enabled, whether MPI is trying to run processes on the number of cores reported by the operating system, which would be twice the number of physical cores on the CPU(s). This is not something I've looked at directly, but it could well be causing an issue. You could test this by increasing the number of processes one at a time and seeing whether performance improves up to a certain point and then starts to drop.

These are just a few thoughts that come to mind - someone else might be able to give you a more concrete answer to your query, but I hope this helps and provides some ideas on things to look at to explain the issue you're experiencing.

Cheers,

Jeremy

On 8 May 2018, at 18:09, Amitvikram Dutta <amitvdutta23@gmail.com> wrote:
Hi all,
I'm trying to run a few scaling tests on the cluster I have access to.
I'm using a mesh with 10,800 elements at an expansion order of 5. The simulation is set to run for 10,000 time steps. The issue that I'm running into is as follows:
Doubling the number of processors increases the total cpu wall time.
Procs   Wall Time
1       201 s
20      209 s
40      242 s
I believe this is due to the overhead caused by writing checkpoint files (each parallel stream seems to write a separate checkpoint file). I have reduced the output frequency to the point that only one checkpoint file should be written for the entire simulation time; however, this still requires n checkpoint files to be written, where n is the number of processors the case is parallelised over.
In all cases I use the mpirun command. For example:
mpirun -np n IncNavierStokesSolver case.xml
Could I have some pointers for proceeding further with this issue?
Sincerely,
--
Amitvikram Dutta
Graduate Research Assistant
Fluid Mechanics Research Lab
Multi-Physics Interaction Lab
University of Waterloo