Hi Kurt,
Your use of AllReduce looks correct, but AllReduce is a collective operation, so you have to guarantee that every process in the communicator calls it.
This kind of hang usually occurs when there is a mismatch in the MPI calls, so that some processes wait forever for a corresponding call on the other ranks.
Since you are working with boundaries, this is easy to get wrong, because in general only some of the partitions will contain the boundary. You can get around the problem by making the other ranks perform a dummy communication, i.e. the same AllReduce call with a zero contribution. For example, your code could look like:
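flowrate = 0.0;
if (bnd)
{
    // This rank holds part of the boundary: compute its local contribution.
    flowrate = BndCondExp->VectorFlux(bndVelocity);
    m_comm->AllReduce(flowrate, LibUtilities::ReduceSum);
    // (do other things)
}
else
{
    // No boundary on this rank: contribute zero, but still join the collective.
    m_comm->AllReduce(flowrate, LibUtilities::ReduceSum);
}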
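If it helps, an equivalent way to write the same thing is to take the collective call out of the branch, so that it cannot be skipped by accident. The following is only a minimal sketch: SerialCommStub and GlobalFlowrate are hypothetical names that are not part of Nektar++, the stub only exists so the snippet compiles without MPI, and localFlux stands in for the result of BndCondExp->VectorFlux(bndVelocity).

```cpp
// Hypothetical single-rank stand-in for the communicator, used only so this
// sketch compiles without MPI. With one rank, a sum reduction leaves the
// value unchanged. In the solver the call would instead be
// m_comm->AllReduce(flowrate, LibUtilities::ReduceSum).
struct SerialCommStub
{
    int nCalls = 0; // how many times the collective was entered

    void AllReduceSum(double &value)
    {
        (void)value; // sum over a single rank is the value itself
        ++nCalls;
    }
};

// Returns the flow rate summed over all ranks of 'comm'.
// 'bnd' tells whether this rank holds part of the boundary; 'localFlux' is
// this rank's contribution and is ignored when 'bnd' is false.
template <typename Comm>
double GlobalFlowrate(Comm &comm, bool bnd, double localFlux)
{
    // Ranks without boundary elements contribute zero to the sum.
    double flowrate = bnd ? localFlux : 0.0;

    // Called unconditionally, so every rank takes part in the collective.
    comm.AllReduceSum(flowrate);

    return flowrate;
}
```

Any work that only the boundary ranks need to do (the "do other things" part) can then go in a separate if (bnd) block after the call.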
Cheers,
Douglas