Chris ran the 1D solver and profiled it locally. When using Collections with auto-tuning turned on, he got the attached profile. With auto-tuning turned off, we did observe mutex_lock as the second highest entry and dgemv at the top of the list. However,
in your earlier profiling you had a boost method we do not recognise at the top. Could you perhaps profile your run again with auto-tuning turned on, to see whether this boost method is still at the top of your profiling list? If so, the discrepancy
is presumably related to this method, and we should try to find out why it arises.
Cheers,
Spencer
PS We do have a threading library which takes a lock on the memory pool, and this is probably why running without auto-tuning leads to a high number of locks. We will look into turning off this feature when threading is not being used.
PPS Douglas is going to look into using boost 1.57 (we are generally using 1.58 or higher) to see whether he observes a slowdown similar to the one you see.