Good afternoon,
We’re looking at `H5Dread_multi` to read from multiple (1000s or more) datasets in a single call. Reading the [RFC], the new feature seems very promising. To my understanding, the idea is to: compute a list of offsets and sizes to read; sort the list by file offset (and maybe remove overlapping regions, or join reads separated by small gaps); read collectively using MPI I/O; and then have HDF5 convert the bytes into what the user expects (decompression, big/little-endian conversion, etc.).
To a naive reader none of this requires the following condition:
> All datasets must be in the same HDF5 file, and each unique dataset may only be listed once. If this function is called collectively in parallel, each rank must pass exactly the same list of datasets in `dset_id`, though the other parameters may differ.
Our use case is that each MPI rank wants to read mostly distinct datasets, and it therefore doesn’t seem to naturally satisfy the restriction above. Details on how many datasets there are and how large each one is have been described in [1]; feel free to ask for further information as needed.
We have the following questions:
- Is `H5Dread_multi` intended to work in our use case?
- If yes:
  - How would we use it? One idea would be to exchange the names of all datasets, open all datasets on every MPI rank, and tell almost every MPI rank to read `0` elements. I’m skeptical of this approach, because it increases the size of the problem in proportion to the number of MPI ranks just to benefit from improved access patterns to the parallel filesystem.
  - Can HIDs be `MPI_Allgather`ed, or are they only valid in the process that created them?
- If not: can this feature be extended to support the case of many small datasets? Are there any plans to do so?
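For reference, here is how we imagine the zero-element workaround might look, as an untested sketch. The file name, dataset names, ownership predicate (`i % 2 == rank % 2`), and `double` element type are all placeholders for illustration; error checking and resource cleanup are omitted:

```c
#include <hdf5.h>
#include <mpi.h>
#include <stdlib.h>

#define NDSETS 4  /* illustrative: the globally agreed dataset count */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Identical list on every rank, e.g. from an earlier name exchange. */
    const char *names[NDSETS] = { "/d0", "/d1", "/d2", "/d3" };

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fopen("data.h5", H5F_ACC_RDONLY, fapl);

    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    hid_t dsets[NDSETS], mtypes[NDSETS], mspaces[NDSETS], fspaces[NDSETS];
    void *bufs[NDSETS];

    for (int i = 0; i < NDSETS; i++) {
        dsets[i]   = H5Dopen2(file, names[i], H5P_DEFAULT);
        fspaces[i] = H5Dget_space(dsets[i]);
        mtypes[i]  = H5T_NATIVE_DOUBLE;
        if (i % 2 == rank % 2) {   /* stand-in for "this rank wants it" */
            hsize_t dims[1] =
                { (hsize_t)H5Sget_simple_extent_npoints(fspaces[i]) };
            mspaces[i] = H5Screate_simple(1, dims, NULL);
            bufs[i]    = malloc(dims[0] * sizeof(double));
        } else {                   /* this rank reads 0 elements */
            H5Sselect_none(fspaces[i]);
            hsize_t dims[1] = { 1 };
            mspaces[i] = H5Screate_simple(1, dims, NULL);
            H5Sselect_none(mspaces[i]);
            bufs[i]    = NULL;
        }
    }

    H5Dread_multi(NDSETS, dsets, mtypes, mspaces, fspaces, dxpl, bufs);

    /* ... use data, then close all HIDs and MPI_Finalize() ... */
    return 0;
}
```

Even if this is the intended usage, it still has the scaling concern raised above: every rank opens every dataset, so the per-rank bookkeeping grows with the global dataset count.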
Thank you for your time and help.