Fwd: profiling screenshot
Hi Helmut,

Chris ran the 1D solver and profiled it locally. When using Collections with auto-tuning he got the attached profile. With auto-tuning turned off, we did observe mutex_lock as the second highest entry and dgemv at the top of the list. However, in your earlier profiling you had a boost method we do not recognise at the top.

Could you perhaps profile your run again, but with auto-tuning turned on, to see whether this boost method is still at the top of your profiling list? If so, the discrepancy is presumably related to this method and we should try to find out why it is arising.

Cheers,
Spencer

PS We do have a threading library which holds a lock on the memory pool, and this is probably why running without auto-tuning leads to a high number of locks. We will look into turning off this feature if threading is not being used.

PPS Douglas is going to look into using boost 1.57 (since we are generally using 1.58 or higher) to see if he observes a slowdown similar to the one you see.

Begin forwarded message:

From: Chris Cantwell <c.cantwell@imperial.ac.uk>
Subject: profiling screenshot
Date: 18 May 2016 at 18:04:26 BST
To: Spencer Sherwin <s.sherwin@imperial.ac.uk>

[inline image: profiling screenshot]

Spencer Sherwin
McLaren Racing/Royal Academy of Engineering Research Chair,
Professor of Computational Fluid Mechanics,
Department of Aeronautics,
Imperial College London
South Kensington Campus
London SW7 2AZ
s.sherwin@imperial.ac.uk
+44 (0) 20 759 45052
Hi Spencer,

Attached you will find my profiles for auto-tune (= SumFac), SumFac, and tuning off. The SumFac profiles look rather different from your screenshot. They were obtained using boost 1.57 (which comes with the third-party build). I also ran with the latest boost 1.61, with no difference in performance. Turning the boost multithreading option in ccmake on or off does not make any difference either.

Should the code run single-threaded or multi-threaded? With OpenBLAS it runs single-threaded.

However, I am currently testing on a rather old, low-performance mobile CPU (i5-2520m). Next week, in the office, I will repeat the test on my up-to-date Xeon workstation.

Cheers,
Helmut

________________________________
From: Sherwin, Spencer J [s.sherwin@imperial.ac.uk]
Sent: Wednesday, 18 May 2016 19:36
To: Kühnelt Helmut
Cc: nektar-users
Subject: Fwd: profiling screenshot
participants (2)
- Kühnelt Helmut
- Sherwin, Spencer J