Possible MPI bug in HDF5 1.10.2


#1

I have a code that uses netCDF 4.6.1 and HDF5 1.10.2 in parallel, and I seem to have found a bug in the HDF5 library.

I am calling the netCDF-4 function nc_get_vara_int from 4 tasks to read a 4-dimensional variable.
All tasks have start = {0,0,0,0}; task 0 has count = {1,2,10,2}, and all other tasks have count = {0,0,0,0}.
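
To make the selection concrete, here is a minimal sketch of the per-task start/count arrays described above. It is not taken from the actual model code: the helper names selection_for_rank and selection_size are hypothetical, and the netCDF and MPI setup (opening the file in parallel, obtaining ncid and varid) is left out.

```c
#include <stddef.h>

#define NDIMS 4   /* the variable being read has 4 dimensions */

/* Hypothetical helper: fill start/count for one MPI task.
 * Every task starts at the origin; only task 0 selects any data. */
void selection_for_rank(int rank, size_t start[NDIMS], size_t count[NDIMS])
{
    static const size_t rank0_count[NDIMS] = {1, 2, 10, 2};

    for (int d = 0; d < NDIMS; d++) {
        start[d] = 0;
        count[d] = (rank == 0) ? rank0_count[d] : 0;
    }
}

/* Hypothetical helper: number of elements a count vector selects
 * (the product of its entries). */
size_t selection_size(const size_t count[NDIMS])
{
    size_t n = 1;
    for (int d = 0; d < NDIMS; d++)
        n *= count[d];
    return n;
}
```

Each task would then pass its own start and count to nc_get_vara_int(ncid, varid, start, count, buf). With these values task 0 selects 1*2*10*2 = 40 integers and tasks 1-3 select none, so only task 0 has a non-empty selection.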

This results in tasks 1-3 hanging in an MPI_Bcast that task 0 will never reach.

The conditional at line 1157 of H5Dmpio.c and the comment at line 1283 imply that some tasks will take one branch and the rest will take the other (in my case task 0 has num_chunk=6 and tasks 1-3 have num_chunk=0). However, the call at line 1289 leads to a collective MPI_Bcast that all of the tasks are expected to reach. Since task 0 never arrives there, the model hangs.


#2

Hi Jim,

For your reference, I entered HDFFV-10501 for this issue. (Currently our bug database is only available internally.) We will investigate it.

Thanks!

-Barbara