We ran into unexpected behavior when opening an HDF5 file with file locking disabled and accessing an external link: the file targeted by the external link is opened with file locking enabled.
We expected that, by default, externally linked files would be opened with the same file locking setting as the main file, i.e., that file locking would be “inherited”, as is the case for the read/write mode.
This script reproduces the issue: accessing data in the main file works but not in the external link file: link_locking_issue.py (757 Bytes)
I’m raising this issue here rather than in the h5py repository, because even if this might be fixable in h5py, I was expecting that libhdf5 would at least by default inherit the file access property list (or some of it) from the main file when accessing an external link (related to External links & VFDs).
To try and propose a fix for this issue to h5py, I looked at retrieving the file locking value from the opened file id’s access property list in order to use it for opening external links.
However, it seems the file locking value is not set in the access property list retrieved from the opened file. This script reproduces the issue:
I believe this is mostly an oversight and the library should be fixed so that files opened through external links inherit the file locking settings from the parent file when a default FAPL is used. I have a branch now that does this, however a few things I noted:
When I run your first example, I actually get a “file locking flag values don’t match” failure when opening the main file with locking=False during the read() method. This is because the __main__ method already has the main file open in append mode with the default file locking setting of on (at least on my machine and with my default build of HDF5). Did you otherwise have file locking in HDF5 disabled through the environment variable or the library configure/build time option?
While in my branch the library will now cause the file locking setting to be inherited when a default FAPL is used during opening of files through external links, from your second link h5py appears to be creating a FAPL so that it can call set_fclose_degree on it before setting it on the LAPL with set_elink_fapl. In this case, the library will not cause the external file to inherit the parent file’s locking setting since the FAPL used isn’t a default FAPL. That said, with my changes you should be able to properly retrieve the setting from the parent file’s access property list (your second example works correctly with my branch) and set that on the FAPL as well before it gets set on the LAPL with set_elink_fapl.
When I run your first example, I actually get a “file locking flag values don’t match”
I tested it on macOS. On Linux, I get the same issue as you do.
This is due to the way Python multiprocessing starts processes by default: “spawn” on macOS but “fork” on Linux… Please check with this version of the script, which uses spawn on all platforms: link_locking_issue.py (800 Bytes)
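For readers without the attachment, forcing the same start method on every platform can be done with a multiprocessing context. This is a minimal sketch rather than the attached script; run_in_spawned_process and SPAWN_CTX are illustrative names. My understanding is that a forked child inherits the parent's already-initialized state (including the file the parent holds open), whereas a spawned child starts from a fresh interpreter.

```python
import multiprocessing

# Context that always uses "spawn", regardless of the platform default
# ("fork" on Linux, "spawn" on macOS, as noted above).
SPAWN_CTX = multiprocessing.get_context("spawn")


def run_in_spawned_process(target, *args):
    """Run target(*args) in a freshly spawned child and return its exit code."""
    proc = SPAWN_CTX.Process(target=target, args=args)
    proc.start()
    proc.join()
    return proc.exitcode
```

With “spawn”, the target function must be importable from the main module, and the call has to sit under an `if __name__ == "__main__":` guard.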
the file locking setting to be inherited when a default FAPL is used during opening of files through external links
Sounds good, thanks!
you should be able to properly retrieve the setting from the parent file
Great, if it’s not possible for h5py to use a default FAPL, then this will be really useful.
BTW, let me know if/when your branch with the fixes is publicly available, so I can check what can be done on h5py side with it.
This PR fixes an issue in h5py with the H5F_close_degree_t enum and makes use of the default link access property list to let libhdf5 handle this (which fixes the external link + file locking issue with the latest version of libhdf5, thanks!).
Since this change can have side effects, it would be great if someone from The HDF Group could have a look at it too.