Re: [firedrake] parallelization
Dear Francis,

What is job.py? If it is a "normal" firedrake script then approach 1) will not be running in parallel. If it is a python script that executes "mpirun -n 16 python simulation.py", on the other hand...

Best,
Andrew

On 25 April 2016 at 18:59, Francis Poulin <fpoulin@uwaterloo.ca> wrote:
> Hello,
>
> So far I have only been running firedrake jobs using
>
> python job.py
>
> and I thought it was in serial.
>
> I just tried a run with a higher resolution on my dell that has 2x8 cores and it seems to be using nearly 32,000% of my cpu. I am very happy to know that it is parallelizing automatically, it makes my life a lot easier, but I have two questions.
>
> 1) What kind of parallelization is it using? From the high percentage it seems like it's threaded but I am not sure how to check.
>
> 2) Is there any advantage to running
>
> mpirun -np 16 python job.py
>
> versus doing the simpler execution above?
>
> If not I will continue doing the simple approach.
>
> Cheers, Francis
>
> ------------------
> Francis Poulin
> Associate Professor
> Department of Applied Mathematics
> University of Waterloo
>
> email: fpoulin@uwaterloo.ca
> Web: https://uwaterloo.ca/poulin-research-group/
> Telephone: +1 519 888 4567 x32637
Dear Andrew,

Thanks for the response.

Sorry. When I said job.py I meant any "normal" firedrake script. When I run my job on my mac using just python it does run in serial and I have tested that I can use mpirun to run it in parallel. This is what I expected. It is on my ubuntu machine that it seems to take up 32 cores even though I just used python, not mpirun. I wonder whether it might be using threads instead of cores?

I have not tried mpirun on this machine yet but I can easily do that. I am curious to better learn how efficient the parallelization is so I plan to do tests for different numbers of cores.

Best regards,
Francis
On 26/04/16 11:48, Francis Poulin wrote:
> Dear Andrew,
>
> Thanks for the response.
>
> Sorry. When I said job.py I meant any "normal" firedrake script. When I run my job on my mac using just python it does run in serial and I have tested that I can use mpirun to run it in parallel. This is what I expected. It is on my ubuntu machine that it seems to take up 32 cores even though I just used python, not mpirun. I wonder whether it might be using threads instead of cores?
>
> I have not tried mpirun on this machine yet but I can easily do that. I am curious to better learn how efficient the parallelization is so I plan to do tests for different numbers of cores.

On an ubuntu machine, if you haven't explicitly set OMP_NUM_THREADS then any code that links against the openmp library will use all available cores. So, does this oversubscription go away if you do:

OMP_NUM_THREADS=1 python script.py

Thanks,
Lawrence
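For readers who would rather pin the thread count from inside the script than on the command line, a minimal sketch follows. The helper name pin_openmp_threads is hypothetical, and the approach assumes the OpenMP runtime reads OMP_NUM_THREADS when it is first loaded, so the call has to come before any import that pulls in an OpenMP-linked library.

```python
import os


def pin_openmp_threads(n=1):
    """Set OMP_NUM_THREADS for this process and return the previous value.

    Hypothetical helper. Assumes it is called before any OpenMP-linked
    library has been loaded, since the setting is normally read at startup.
    """
    previous = os.environ.get("OMP_NUM_THREADS")
    os.environ["OMP_NUM_THREADS"] = str(n)
    return previous


previous = pin_openmp_threads(1)
print("OMP_NUM_THREADS was", previous, "and is now", os.environ["OMP_NUM_THREADS"])

# Imports of heavy numerical libraries (for example firedrake) would go here.
```

Setting the variable in the shell, as in the command above, avoids any dependence on import order and is probably the safer of the two.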
Thanks Lawrence and Fabio for your feedback.

Fabio: I tried running it on 4 cores using mpirun and there were 4 jobs, two of which used 100% or so and the other two used 1500% or so. It seems that ubuntu optimizes to use as many threads as possible. A nice feature.

Lawrence: Yes, when I tried what you suggested it only used one core, as you predicted. Thanks for pointing that out.

I am not sure if a 2D problem with a 200x200 grid can really be very efficient on 16 cores but I will try running an mpi job on 4 or maybe even 8 and see how that compares with the threading.

Now that I know that firedrake does automatic threading on ubuntu but has the ability to do mpi, I'd like to know which one I should use to be efficient.

Thanks again for all the help.

Francis
On 26/04/16 16:54, Francis Poulin wrote:
> Thanks Lawrence and Fabio for your feedback.
>
> Fabio: I tried running it on 4 cores using mpirun and there were 4 jobs, two of which used 100% or so and the other two used 1500% or so. It seems that ubuntu optimizes to use as many threads as possible. A nice feature.

Please note that firedrake doesn't use any threaded parallelisation right now. So I don't know what the other cores are doing running at 1500% of CPU, but I suspect they are slowing things down!

> Lawrence: Yes, when I tried what you suggested it only used one core, as you predicted. Thanks for pointing that out.

This is your best option.

> I am not sure if a 2D problem with a 200x200 grid can really be very efficient on 16 cores but I will try running an mpi job on 4 or maybe even 8 and see how that compares with the threading.
>
> Now that I know that firedrake does automatic threading on ubuntu but has the ability to do mpi, I'd like to know which one I should use to be efficient.

Firedrake doesn't actually use any automatic threading! It's just that any library which happens to have linked against openmp on ubuntu spawns loads of threads that sit spinning idly. To achieve parallelisation please set OMP_NUM_THREADS to 1 and use mpirun.

Thanks,
Lawrence
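The recommendation above (OMP_NUM_THREADS set to 1, parallelism through mpirun) could be wrapped in a small launcher, sketched here. The function name build_mpi_launch and the file name script.py are placeholders, and the sketch assumes a single workstation where the MPI ranks inherit the launcher's environment; other setups may need the variable forwarded explicitly.

```python
import os
import shutil
import subprocess


def build_mpi_launch(script, nprocs, launcher="mpirun"):
    """Return (argv, env) for an MPI run with OpenMP limited to one thread."""
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    argv = [launcher, "-n", str(nprocs), "python", script]
    env = dict(os.environ)          # copy, so the caller's environment is untouched
    env["OMP_NUM_THREADS"] = "1"    # stop OpenMP-linked libraries spawning idle threads
    return argv, env


argv, env = build_mpi_launch("script.py", 4)
print(" ".join(argv))

# Only launch when both the launcher and the (placeholder) script exist.
if shutil.which(argv[0]) and os.path.exists("script.py"):
    subprocess.run(argv, env=env, check=True)
```

The shell equivalent is simply to prefix the mpirun command with OMP_NUM_THREADS=1.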
Thanks again Lawrence, that is very helpful. I will probably set OMP_NUM_THREADS to 1 all the time on ubuntu, just to be safe, as you recommend.

In case you're curious, the job that I ran in serial (but with lots of threads running around in circles) took 4.5 hours. When I ran it on 16 cores it took about 0.5 hours. When I used all the cores it didn't make much of a difference whether I set the number of threads or not. I suppose that's because there were no spare cores to waste.

I suspect if I ran it on 8 cores it might take the same amount of time since the grid is pretty coarse, but I'm very happy to know that the parallelization can give me an order of magnitude of improvement on my desktop.

Thank you firedrakers!

Francis
Hello again,

In case you're curious, on my ubuntu machine with 2x8 cores I get, approximately,

np = 16    30 mins
np = 8     35 mins
np = 4     57 mins
np = 2     ??
np = 1     ~4.5 hours

I don't think I'll necessarily try np = 2 but it does seem that 8 is probably the optimal value for this case. I am sure that with higher resolution (or a third dimension) the efficiency would improve but it's already looking very good.

Francis
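To put numbers on the scaling, the timings above can be turned into speedup (time on one process divided by time on np processes) and parallel efficiency (speedup divided by np). The sketch below uses only the approximate figures quoted in the message; the names timings_min and scaling_table are illustrative.

```python
# Approximate wall-clock times in minutes, copied from the message above.
# np = 2 was not measured, so it is left out.
timings_min = {1: 4.5 * 60, 4: 57, 8: 35, 16: 30}


def scaling_table(timings):
    """Return (np, speedup, efficiency) rows relative to the np = 1 time."""
    t1 = timings[1]
    rows = []
    for n in sorted(timings):
        speedup = t1 / timings[n]
        rows.append((n, speedup, speedup / n))
    return rows


for n, speedup, efficiency in scaling_table(timings_min):
    print(f"np = {n:2d}  speedup = {speedup:5.2f}  efficiency = {efficiency:4.2f}")
```

With these figures the speedup at np = 16 is 270/30 = 9, an efficiency of about 0.56, while np = 8 gives roughly 7.7 and 0.96, which fits the remark that 8 is close to optimal here. The efficiency at np = 4 comes out above 1, which suggests the ~4.5 hour baseline is pessimistic, possibly because that run also carried the idle OpenMP threads; that is an inference, not something the thread confirms.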
participants (3)
- Andrew McRae
- Francis Poulin
- Lawrence Mitchell