We stumbled into an unexpected behavior when opening an HDF5 file with file locking disabled and accessing an external link: the file that the external link points to is accessed with file locking enabled.
We were expecting that, by default, externally linked files would be opened with the same file locking setting as the main file, i.e., the file locking would be “inherited”, as is the case for the read/write mode.
This script reproduces the issue (accessing data in the main file works, but accessing data in the externally linked file does not):
link_locking_issue.py (757 Bytes)
The main file is opened by passing locking=False to h5py.File (which calls H5Pset_file_locking to disable file locking in the file access property list used to open the file).
Then h5py opens the external link with a default file access property list, and so with file locking:
See https://github.com/h5py/h5py/blob/6b512e5edf80f6660e0f07c704ab59faa733008b/h5py/_hl/group.py#L357 and https://github.com/h5py/h5py/blob/6b512e5edf80f6660e0f07c704ab59faa733008b/h5py/_hl/base.py#L129-L135
I’m raising this issue here rather than in the h5py repository, because even if this might be fixable in h5py, I was expecting that libhdf5 would at least by default inherit the file access property list (or some of it) from the main file when accessing an external link (related to External links & VFDs).
To try and propose a fix for this issue to h5py, I looked at retrieving the file locking value from the opened file id’s access property list in order to use it for opening external links.
However, it seems the file locking value is not set in the access property list retrieved from the opened file. This script reproduces the issue:
fapl_locking_issue.py (513 Bytes)
Any idea how to tackle this issue? What is the proper way to handle file access property lists?
I believe this is mostly an oversight and the library should be fixed so that files opened through external links inherit the file locking settings from the parent file when a default FAPL is used. I have a branch now that does this, but I noted a few things:
When I run your first example, I actually get a “file locking flag values don’t match” failure when opening the main file with locking=False during the read() method. This is because the __main__ method already has the main file open in append mode with the default file locking setting of on (at least on my machine and with my default build of HDF5). Did you otherwise have file locking in HDF5 disabled through the environment variable or the library configure/build time option?
While in my branch the library will now cause the file locking setting to be inherited when a default FAPL is used during opening of files through external links, from your second link h5py appears to be creating a FAPL so that it can call set_fclose_degree on it before setting it on the LAPL with set_elink_fapl. In this case, the library will not cause the external file to inherit the parent file’s locking setting, since the FAPL used isn’t a default FAPL. That said, with my changes you should be able to properly retrieve the setting from the parent file’s access property list (your second example works correctly with my branch) and set that on the FAPL as well before it gets set on the LAPL.
Hi @jhenderson ,
Thanks for your answer!
When I run your first example, I actually get a “file locking flag values don’t match”
I tested it on macOS. On Linux, I get the same issue as you.
This is due to the way Python starts a Process by default: “spawn” on macOS but “fork” on Linux. Please check with this version of the script, which uses spawn on all platforms: link_locking_issue.py (800 Bytes)
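For reference, forcing the start method with the standard library can be done along these lines. This is a generic sketch rather than the content of the attached script, and the helper name is made up.

```python
import multiprocessing as mp


def run_in_spawned_process(target, *args):
    """Run target(*args) in a child started with "spawn" on every platform."""
    # A context object selects the start method without changing the global default.
    ctx = mp.get_context("spawn")
    process = ctx.Process(target=target, args=args)
    process.start()
    process.join()
    return process.exitcode
```

With spawn the child starts from a fresh interpreter, so target must be importable (defined at module level) and the calling code should sit under an if __name__ == "__main__": guard.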
the file locking setting to be inherited when a default FAPL is used during opening of files through external links
Sounds good, thanks!
you should be able to properly retrieve the setting from the parent file
Great, if it’s not possible for h5py to use a default FAPL, then this will be really useful.
BTW, let me know if/when your branch with the fixes is publicly available, so I can check what can be done on the h5py side with it.