H5Dopen collective, driver MPIO


Hi all,

I’m trying to write to an HDF5 dataset that is opened by only one process. This process does not modify the dataset metadata; it only writes data via write_direct:

ds = group[ds_name]

However, the first line (opening the dataset) seems to be collective. The actual collective call happens on line 288 of h5py/_hl/group.py:

oid = h5o.open(self.id, self._e(name), lapl=self._lapl)

According to the documentation (https://support.hdfgroup.org/HDF5/doc/RM/CollectiveCalls.html), the call does not need to be collective if the dataset is not modified, which is confirmed in the thread "Collective H5Dopen". Anything else would make little sense to me, because every dataset operation would then have to be collective.

Is this a bug or am I using the h5py API wrong?

Thanks very much!




Hi all, since there has been no reaction here, I dug into it myself and verified with the C library that the call is indeed not collective there:

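/* File access property list that selects the MPI-IO driver. */
plist_id = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_fapl_mpio(plist_id, comm, info);

/* All ranks open the file collectively, read-only. */
file_id = H5Fopen(H5FILE_NAME, H5F_ACC_RDONLY, plist_id);
if (mpi_rank == 0) {
    /* Only rank 0 opens the dataset. */
    dset_id = H5Oopen(file_id, DATASETNAME, H5P_DEFAULT);
}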

This code finishes normally when run with multiple processes, even though only rank 0 calls H5Oopen.

Since the method

h5o.open(self.id, self._e(name), lapl=self._lapl)

should only be a Cython wrapper around the C function I tested above, the only explanation I can think of is that it does not use the H5P_DEFAULT property list, and that self._lapl carries some property that forces h5o.open to behave collectively.

I also found the default configuration in base.py:

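def default_lapl():
    """ Default link access property list """
    lapl = h5p.create(h5p.LINK_ACCESS)
    fapl = h5p.create(h5p.FILE_ACCESS)
    # Files reached through external links are opened with this access list
    fapl.set_fclose_degree(h5f.CLOSE_STRONG)
    lapl.set_elink_fapl(fapl)
    return lapl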

but I cannot find any suspicious property there. Can I somehow print the whole property list, so that I can compare the one used by the C library with the one used by h5py?
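One way to approximate such a dump from Python, as a rough sketch only: the helpers below call every zero-argument get_* method on two objects and report the results that differ. The names dump_getters, diff_getters and compare_with_h5py are hypothetical and not part of h5py. The sketch assumes that h5py's low-level property list objects expose their settings through get_* methods, and that default_lapl can be imported from the private module h5py._hl.base, which may change between versions.

```python
def dump_getters(obj):
    """Call every zero-argument ``get_*`` method on obj and collect the results."""
    values = {}
    for name in sorted(dir(obj)):
        if not name.startswith("get_"):
            continue
        method = getattr(obj, name)
        if not callable(method):
            continue
        try:
            values[name] = method()
        except Exception as exc:
            # The getter needs arguments or does not apply to this object;
            # record the exception type instead of a value.
            values[name] = "<%s>" % type(exc).__name__
    return values


def diff_getters(a, b):
    """Return {getter_name: (value_a, value_b)} for getters whose results differ."""
    va, vb = dump_getters(a), dump_getters(b)
    return {
        key: (va.get(key), vb.get(key))
        for key in sorted(set(va) | set(vb))
        if va.get(key) != vb.get(key)
    }


def compare_with_h5py():
    """Diff a freshly created link access list against h5py's default one.

    Assumptions: h5py is installed, and default_lapl lives in the private
    module h5py._hl.base.
    """
    from h5py import h5p
    from h5py._hl.base import default_lapl

    plain = h5p.create(h5p.LINK_ACCESS)
    return diff_getters(plain, default_lapl())
```

Getters that return another object, such as a nested file access property list, are compared as objects here, so they may be reported as different even when their contents match. In that case dump_getters can be applied to the returned objects as well.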

Another thing that came to mind is that I am using track_order=True for my file structure. Could that also force the operations to be collective?
Edit: On second thought, track_order is irrelevant, since the C program mentioned above works with the same H5 file and opens the dataset independently. That brings me back to the only possible cause: the properties.

Anyway, I am already at the limit of what I am able to debug. Please help.