Hi all, since there has been no reaction here, I dug into it myself and verified with the C library that the call is indeed not collective there:
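/* File access property list that uses the MPI-IO driver */
plist_id = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_fapl_mpio(plist_id, comm, info);
/* Every rank opens the file; the flags argument is H5P_DEFAULT, which is 0 like H5F_ACC_RDONLY */
file_id = H5Fopen(H5FILE_NAME, H5P_DEFAULT, plist_id);
/* Only rank 0 opens the dataset, using the default link access property list */
if (mpi_rank == 0) {
    dset_id = H5Oopen(file_id, DATASETNAME, H5P_DEFAULT);
}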
This does indeed finish normally when run with multiple processes. It doesn't matter that only rank 0 calls H5Oopen.
Since the method:
h5o.open(self.id, self._e(name), lapl=self._lapl)
should only be a Cython wrapper around the C function I tested above, the only logical explanation I can think of is that it does not use the H5P_DEFAULT property list, and that self._lapl carries some property that forces h5o.open to behave collectively.
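One way to test this hypothesis from Python would be to bypass self._lapl and call the low-level function directly with lapl=None, which, as far as I can tell, h5py passes on as H5P_DEFAULT. The sketch below is only that: the helper names, the file path and the dataset name are placeholders, and rank0_only_open assumes a parallel h5py build together with mpi4py.

```python
import h5py
from h5py import h5o


def open_with_default_lapl(h5file, name):
    """Open an object through the low-level API with lapl=None.

    Assumption: h5py translates lapl=None into H5P_DEFAULT, which would
    make this the closest Python equivalent of the C call tested above.
    """
    return h5o.open(h5file.id, name.encode("utf-8"), lapl=None)


def rank0_only_open(path, name):
    """Mirror the C test: all ranks open the file, only rank 0 opens the object.

    Needs h5py built with MPI support and mpi4py; 'path' and 'name' are
    placeholders for the real file and dataset.
    """
    from mpi4py import MPI  # imported here so the helper above works without MPI

    comm = MPI.COMM_WORLD
    with h5py.File(path, "r", driver="mpio", comm=comm) as f:
        if comm.rank == 0:
            open_with_default_lapl(f, name)
```

If rank0_only_open returns under mpiexec while the high-level f[name] access on rank 0 hangs, that would point at the property list rather than at h5o.open itself.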
I also found the default configuration in base.py:
def default_lapl():
    """ Default link access property list """
    lapl = h5p.create(h5p.LINK_ACCESS)
    fapl = h5p.create(h5p.FILE_ACCESS)
    fapl.set_fclose_degree(h5f.CLOSE_STRONG)
    lapl.set_elink_fapl(fapl)
    return lapl
but I cannot find any suspicious property there. Is there a way to print the whole property list, so that I can compare the one used by the C library with the one used by h5py?
Another thing that came to mind is that I am using track_order=True for my file structure. Could that also force the operations to be collective?
Edit: On second thought, track_order is irrelevant, because the C program above works with the same H5 file and opens the dataset independently. That brings me back to the only remaining cause I can see: the properties.
Anyway, I am at the limit of what I am able to debug. Please help.
Cheers,
Jiri