Issue unlocking HDF5 file?


#1

We’ve run into a strange, intermittent problem with some HDF5 files.

A large number of files are created while processing data. When we interact with those files much later, a small fraction of them (< 10%) cannot be opened by the current version of the h5 tools (h5ls, h5stat, h5dump, etc.). Strangely, this only happens with the original files; a copy of the file works fine. Also, only the current version of the h5 tools, such as h5ls 1.10.6, has trouble; h5ls 1.8.12 (and related tools) works fine. The original file and its copy are identical in every way that I can find (md5sum, stat, ls -ls, ls -laZ), except for the inode, modification timestamp, and filename. Even renaming the original file doesn't help; the v1.10 h5 tools still don’t work on it. Also, lslocks and fuser turn up no locks on the problematic file.

Some digging indicated that it’s probably an issue with an HDF5 lock on the file, a feature I think was introduced in HDF5 1.10.x. Disabling file locking with an environment variable, e.g. HDF5_USE_FILE_LOCKING=FALSE h5ls [badfile].h5, works for that command, but it doesn’t fix the files. I haven’t been able to figure out where the HDF5 lock is stored (in the superblock? I don’t know where that is stored or how to easily read it). I did find that h5clear should be able to clear the lock (e.g., page 10 here: https://support.hdfgroup.org/HDF5/docNewFeatures/SWMR/Design-HDF5-FileLocking.pdf ). However, running h5clear -s [badfile].h5 returns only the terse error “h5clear error: h5tools_fopen”, and googling doesn’t turn up anything useful. Running the problematic commands with the error stack enabled, e.g. h5stat --enable-error-stack [badfile].h5, returns lines like these:
#000: H5F.c line 509 in H5Fopen(): unable to open file
#001: H5Fint.c line 1567 in H5F_open(): unable to lock the file
#002: H5FD.c line 1640 in H5FD_lock(): driver lock request failed
#003: H5FDsec2.c line 959 in H5FD_sec2_lock(): unable to lock file, errno = 11, error message = 'Resource temporarily unavailable'
which seem like they might be helpful to the right person, but that person is not me.

Does anyone have any insight as to what is going on here?

An ideal solution would be a 1-line command to identify locked files (beyond just trying to read a file and checking the exit code) and (more importantly) a 1-line command to remove a lock that should have already been removed. (But we’ll take any hints that someone might have.)
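A minimal sketch of the "identify locked files" half in Python, assuming Linux and assuming that HDF5 1.10's default sec2 driver takes its lock with a non-blocking flock() (LOCK_SH for read-only opens such as h5ls, LOCK_EX for read-write opens), which is my reading of H5FD_sec2_lock(). On Linux, errno 11 in the stack above is EAGAIN, meaning a conflicting lock is already held through some other open descriptor. The probe requests the same shared lock and drops it immediately, without reading the file. The function names and the *.h5 pattern are illustrative, not part of any HDF5 tool; from the shell, util-linux's flock -n -s FILE true does a similar probe. There is no matching one-liner to remove the lock: an flock() lock lasts until every descriptor sharing that open file description (including copies inherited by forked children) is closed, usually when the holding process exits.

```python
import errno
import fcntl
import os
from pathlib import Path


def is_flock_locked(path):
    """Return True if a non-blocking shared flock() on `path` is refused.

    Assumption: this mirrors the lock a read-only HDF5 1.10 open requests
    (flock with LOCK_SH | LOCK_NB in the sec2 driver).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
    except OSError as exc:
        if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return True  # a conflicting (exclusive) lock is held elsewhere
        raise
    else:
        return False  # we got the lock; closing fd below releases it
    finally:
        os.close(fd)


def find_locked(directory, pattern="*.h5"):
    """Recursively list files matching `pattern` that refuse a shared lock."""
    return [str(p) for p in sorted(Path(directory).rglob(pattern))
            if p.is_file() and is_flock_locked(p)]
```

Usage, with a placeholder path: print("\n".join(find_locked("/path/to/data"))). On NFS the result depends on how the client handles flock(), so treat it as a hint rather than proof.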


#2

In superblock versions 2 and 3 of the file format spec., there is a file consistency flags byte. Assuming there is no user block, this would be byte 12 (1-based) in the file. What does

od -j 11 -N 1 <HDF5 file>

return?

G.


#3

We had the same issue, but it turned out to be a forked process that kept a file description open. It showed up in lslocks as “…” because the file had been renamed after the lock was taken, and it could only be found by comparing the inode number from ls -i with the entries in cat /proc/locks.
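The inode comparison described above can be scripted. A sketch in Python, assuming the Linux /proc/locks layout: an ordinal, an optional "->" marking a blocked request, the lock class (FLOCK, POSIX, ...), ADVISORY or MANDATORY, READ or WRITE, the PID, then major:minor:inode with the device numbers in hex, then the byte range. LockEntry, parse_proc_locks and lock_holders are hypothetical names. Like the manual check, it matches on inode number only, and inode numbers are unique only within one filesystem, so confirm a hit by looking at /proc/<PID>/fd. It can only see locks known to the local kernel; on NFS, a lock taken from another client would presumably not show up.

```python
import os
from typing import List, NamedTuple, Optional, Tuple


class LockEntry(NamedTuple):
    kind: str             # FLOCK, POSIX, OFDLCK, ...
    access: str           # READ or WRITE
    pid: int
    dev: Tuple[int, int]  # (major, minor), decoded from hex
    inode: int
    waiting: bool         # True for "->" lines, i.e. blocked requests


def parse_proc_locks(text: str) -> List[LockEntry]:
    """Parse /proc/locks text, skipping lines we don't understand."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        waiting = len(fields) > 1 and fields[1] == "->"
        if waiting:
            fields = [fields[0]] + fields[2:]
        if len(fields) < 8:
            continue
        kind, _mode, access, pid, devino = fields[1:6]
        parts = devino.split(":")
        if len(parts) != 3 or not parts[2].isdigit():
            continue  # e.g. "<none>:0" when the kernel has no inode to show
        entries.append(LockEntry(kind, access, int(pid),
                                 (int(parts[0], 16), int(parts[1], 16)),
                                 int(parts[2]), waiting))
    return entries


def lock_holders(path: str, text: Optional[str] = None) -> List[LockEntry]:
    """Granted locks whose inode matches the one `path` currently points to."""
    if text is None:
        with open("/proc/locks") as f:
            text = f.read()
    inode = os.stat(path).st_ino
    return [e for e in parse_proc_locks(text)
            if e.inode == inode and not e.waiting]
```

Because it goes through os.stat() on the current name, a renamed file is still matched by its inode, which is exactly the case lslocks displayed as “…” above.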


#4

Following up on your suggestion, I checked out the format spec. (BTW - thanks for the link! That answered several questions I had, including where the superblock is.)

Here is the first line of output from xxd <HDF5 file>:
0000000: 8948 4446 0d0a 1a0a 0000 0000 0008 0800 .HDF…

As you can see, byte 12 does not have any of its bits set, so there does not seem to be a lock on the file. This is consistent with the result that a bitwise-identical copy of the file (verified with md5sum) is readable by h5ls et al.
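A short Python sketch of the check from #2 that also looks at the superblock version first, assuming (as #2 does) no user block, so the signature is at offset 0. As I read the format spec, the one-byte consistency flags field at offset 11 (byte 12, 1-based) exists only in superblock versions 2 and 3, where bit 0 means "open for write access" and bit 2 "open for SWMR write access"; in versions 0 and 1 that byte is reserved and the flags are a 4-byte field at offset 20. The version is the byte right after the 8-byte signature, and in the dump above it is 0x00, which would make byte 12 the reserved byte. The function names and the returned dict layout are illustrative.

```python
import struct

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def superblock_flags(header: bytes) -> dict:
    """Decode superblock version and consistency flags from the first 24 bytes.

    Assumes no user block, i.e. the signature is at offset 0.
    """
    if header[:8] != HDF5_SIGNATURE:
        raise ValueError("no HDF5 signature at offset 0 (user block?)")
    version = header[8]
    if version in (0, 1):
        # 4-byte little-endian flags after the two 2-byte group B-tree K values
        (flags,) = struct.unpack_from("<I", header, 20)
        return {"version": version, "flags_offset": 20, "flags": flags}
    if version in (2, 3):
        flags = header[11]
        return {"version": version, "flags_offset": 11, "flags": flags,
                "write_access": bool(flags & 0x01),
                "swmr_write_access": bool(flags & 0x04)}
    raise ValueError(f"unknown superblock version {version}")


def read_superblock_flags(path: str) -> dict:
    with open(path, "rb") as f:
        return superblock_flags(f.read(24))
```

Either way, as #5 explains below, the 1.10 lock that h5ls trips over is an OS-level flock() and is not recorded in these bytes.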

On the other hand, the file is readable by the 1.8.12 h5 tools but not by the 1.10.6 ones, unless the HDF5_USE_FILE_LOCKING='FALSE' environment variable is set before running the 1.10.6 tool. That strongly implies that the issue is an HDF5 lock on the file.

One new piece of information that might be relevant: the filesystem that the file is on is mounted over NFS, and I have been told that HDF5 file locks and NFS mounts are a bad combination. (Not my choice - I inherited this issue.) Even so, we still can’t figure out what mechanism is locking the file.


#5

The primary locking is done by the operating system via the flock() function, which doesn’t modify the contents of the file. This fits with what you see: a copy of the file is unlocked, but renaming it doesn’t unlock it, because the OS knows which inode the lock is associated with. The locks are ‘advisory’, so they only affect processes that try to take the lock, which is why you can bypass them with the HDF5_USE_FILE_LOCKING environment variable.
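The behaviour described here (lock tied to the inode, copies unaffected, renames not, and purely advisory) can be checked locally with a short Python script. This is a sketch assuming Linux or another POSIX system and a local filesystem for the temporary directory; the file names are made up, and the "writer" is simulated by a descriptor in the same process holding LOCK_EX. That works because flock() treats each separate open() as an independent open file description, so the probe is refused just as it would be for another process.

```python
import errno
import fcntl
import os
import shutil
import tempfile


def shared_lock_refused(path):
    """True if a non-blocking LOCK_SH on a fresh descriptor is refused."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
        return False
    except OSError as exc:
        if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return True
        raise
    finally:
        os.close(fd)  # closing our descriptor drops our probe lock


def demo():
    results = {}
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original.h5")
        with open(original, "wb") as f:
            f.write(b"\x89HDF\r\n\x1a\n")  # contents are irrelevant to flock()

        holder = os.open(original, os.O_RDWR)  # stands in for the writer
        fcntl.flock(holder, fcntl.LOCK_EX)

        copy = os.path.join(d, "copy.h5")
        shutil.copyfile(original, copy)        # new inode, so no lock
        renamed = os.path.join(d, "renamed.h5")
        os.rename(original, renamed)           # same inode, still locked

        results["renamed_locked"] = shared_lock_refused(renamed)
        results["copy_locked"] = shared_lock_refused(copy)
        with open(renamed, "rb") as f:         # advisory: plain reads still work
            results["plain_read"] = f.read(4)

        os.close(holder)                       # only descriptor closed, lock gone
        results["after_close_locked"] = shared_lock_refused(renamed)
    return results


print(demo())
```

On a local filesystem the renamed file reports locked, the copy does not, the plain read still succeeds, and the lock disappears once the holder's descriptor is closed, which matches the symptoms in #1 and the forked-process explanation in #3.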

I don’t know why tools like lslocks and fuser aren’t showing the locks. It wouldn’t surprise me if NFS was part of the answer, though.