HDF5 parallel read of a 2D dataset with 32 processors


#1

In my cluster, each node has 12 processors. My code reads a multi-dimensional dataset in parallel. When I run it on one or two nodes (no more than 24 processors), it works properly. However, when it runs across three nodes, it never finishes and hangs indefinitely.
I am confident that I open the HDF5 file in parallel correctly. The lines that read the 2D dataset are as follows:

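! Open the dataset and query its dimensions
call h5dopen_f(h5%lid, dame, set_id, ierr)
call h5dget_space_f(set_id, space_id, ierr)
call h5sget_simple_extent_dims_f(space_id, dims, maxdim, ierr)  ! ierr = rank on success
allocate(value(dims(1),dims(2)))
! Default transfer properties: independent (non-collective) read of the whole dataset
call h5dread_f(set_id, h5kind_to_type(kind(value),H5_REAL_KIND), value, dims, ierr)
call h5sclose_f(space_id, ierr)  ! release the dataspace as well
call h5dclose_f(set_id, ierr)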

Any response will be appreciated.


#2

To get more replies, try uploading a minimal compilable example that demonstrates the behaviour. I don’t write software in Fortran and can’t spot the error, if there is one. Having said that, since each process has a separate memory space, there should be no interaction between them: reading a dataset from one process doesn’t affect the other processes.

Did you try scheduling the program with the h5dread_f call commented out to see whether you still have the same problem? If you do, keep reducing the original program by removing lines until the problem vanishes.

Is it a reliable cluster with a reliable parallel file system? Here is a link to a parallel Fortran example; can you check whether it runs properly on your system? Notice the mandatory MPI setup/shutdown and the linking against MPI and the parallel HDF5 library (phdf5).
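A minimal test program along those lines might look like the sketch that follows. It is an illustration, not the linked example: it assumes the dataset is a 2-D array of double-precision reals, takes the file and dataset names as the first two command-line arguments, and has every rank read the whole dataset with the same calls as in the question (default, independent transfer). Each rank prints a line once its read returns, so a rank that never reports is where the hang occurs.

```fortran
! Minimal parallel HDF5 read test (a sketch, not the linked example).
! Assumes: the dataset is 2-D double precision; the file and dataset
! names are passed as the first two command-line arguments.
program phdf5_read_test
  use mpi
  use hdf5
  implicit none

  integer, parameter :: dp = kind(1.0d0)
  character(len=256) :: filename, dsetname
  integer(hid_t)   :: fapl_id, file_id, dset_id, space_id
  integer(hsize_t) :: dims(2), maxdims(2)
  real(dp), allocatable :: val(:,:)
  integer :: ierr, ndims, mpierr, myrank, nprocs

  ! MPI must be initialised before any parallel HDF5 call
  call MPI_Init(mpierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, myrank, mpierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, mpierr)

  call get_command_argument(1, filename)
  call get_command_argument(2, dsetname)

  call h5open_f(ierr)

  ! File-access property list selecting the MPI-IO driver
  call h5pcreate_f(H5P_FILE_ACCESS_F, fapl_id, ierr)
  call h5pset_fapl_mpio_f(fapl_id, MPI_COMM_WORLD, MPI_INFO_NULL, ierr)

  ! Opening (and closing) the file is collective: every rank must call it
  call h5fopen_f(trim(filename), H5F_ACC_RDONLY_F, file_id, ierr, access_prp=fapl_id)
  call h5pclose_f(fapl_id, ierr)

  ! Same sequence as in the question: every rank reads the whole dataset
  call h5dopen_f(file_id, trim(dsetname), dset_id, ierr)
  call h5dget_space_f(dset_id, space_id, ierr)
  ! On success the last argument receives the dataspace rank, not 0
  call h5sget_simple_extent_dims_f(space_id, dims, maxdims, ndims)
  allocate(val(dims(1), dims(2)))
  call h5dread_f(dset_id, h5kind_to_type(dp, H5_REAL_KIND), val, dims, ierr)
  print '(a,i0,a,i0,a,i0)', 'rank ', myrank, ' of ', nprocs, ': h5dread_f status ', ierr

  call h5sclose_f(space_id, ierr)
  call h5dclose_f(dset_id, ierr)
  call h5fclose_f(file_id, ierr)
  call h5close_f(ierr)

  ! Matching MPI shutdown
  call MPI_Finalize(mpierr)
end program phdf5_read_test
```

On most installations this can be built with the parallel HDF5 compiler wrapper, e.g. `h5pfc phdf5_read_test.f90 -o phdf5_read_test`, and launched across three nodes with your site's MPI launcher, e.g. `mpiexec -n 36 ./phdf5_read_test yourfile.h5 yourdataset` (launcher names and options vary between clusters).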

steve