Possible bug in HDF5 on Windows with files located on a network drive


#1

Dear all!

I am using h5py on Windows 10 and want to access files on network storage. I have encountered unexpected behaviour that I believe results from the underlying HDF5 library. After discussing the issue on the h5py issue tracker, it seems that contacting this forum should be the next step. The h5py report can be found at https://github.com/h5py/h5py/issues/1518

Summary

When multiple files are created on a network drive and remain open in the same session, the contents of the second file seem to merge into the first, and the second file becomes corrupted in the process. This does not happen when

  • only one file is open at a time, or
  • the files are located on a local hard drive.
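
The scenario above can be sketched roughly as follows. This is a minimal illustration, not the exact script from the h5py report: the function name write_two_open_files, the file names, and the dataset names are all made up for the example. It creates two files, keeps both open while writing, and then reopens each one on its own to list its contents.

```python
import os
import tempfile

import h5py
import numpy as np


def write_two_open_files(directory):
    """Create two HDF5 files, keep both open while writing, then read each back."""
    # Hypothetical file names, chosen only for this illustration.
    path_a = os.path.join(directory, "first.h5")
    path_b = os.path.join(directory, "second.h5")

    file_a = h5py.File(path_a, "w")
    file_b = h5py.File(path_b, "w")  # opened while file_a is still open
    try:
        file_a.create_dataset("a", data=np.arange(10))
        file_b.create_dataset("b", data=np.arange(10) * 2)
    finally:
        file_b.close()
        file_a.close()

    # Reopen the files one at a time and list the top-level names in each.
    with h5py.File(path_a, "r") as f:
        names_a = sorted(f.keys())
    with h5py.File(path_b, "r") as f:
        names_b = sorted(f.keys())
    return names_a, names_b


if __name__ == "__main__":
    # A local temporary directory serves as the control case; pass a
    # directory on the network drive instead to exercise the reported setup.
    with tempfile.TemporaryDirectory() as tmp:
        print(write_two_open_files(tmp))
```

On a local drive each file should contain only its own dataset. If the behaviour described above occurs, the first file may list both names, or reopening the second file may fail with an error.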

I’m sorry that I have not yet had the time to reproduce the issue using a lower-level interface directly, but I was able to follow the Python error trace down to the library calls.

Is this the correct forum for such a report? If not, can you point me in the right direction?

Best Regards,
L. Lindenbauer


#2

L, how are you? This looks a lot like one of the corner cases described in the documentation of H5Fopen (https://portal.hdfgroup.org/display/HDF5/H5F_OPEN). Under "Special cases - Multiple opens" it states:

"In some cases, such as files on a local Unix file system, the HDF5 library can detect that a file is multiply opened and will maintain coherent access among the file identifiers.

"But in many other cases, such as parallel file systems or networked file systems, it is not always possible to detect multiple opens of the same physical file. In such cases, HDF5 will treat the file identifiers as though they are accessing different files and will be unable to maintain coherent access."

The documentation describes multiple H5Fopen calls on the same physical file, which is not quite your case, but your situation appears similar in the sense that the library cannot detect that it is actually dealing with different physical files. I believe the root cause is the same in both cases: the lack of a portable, reliable way to identify and distinguish physical files on file systems that are accessed over a network.
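
To make the identity problem concrete, here is a small sketch in plain Python. It is not HDF5's actual code; it only assumes that "same file" is decided by comparing identifiers reported by the operating system, here the (st_dev, st_ino) pair from os.stat. The helper names file_identity and looks_like_same_file are made up for the illustration.

```python
import os
import tempfile


def file_identity(path):
    """Return the (device, inode) pair the operating system reports for path."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def looks_like_same_file(path_a, path_b):
    """Treat two paths as one physical file when their identities match."""
    return file_identity(path_a) == file_identity(path_b)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first.bin")
        second = os.path.join(tmp, "second.bin")
        for path in (first, second):
            with open(path, "wb") as handle:
                handle.write(b"x")
        # On a local file system, distinct files get distinct identities.
        print(file_identity(first), file_identity(second))
        print(looks_like_same_file(first, second))
```

A check of this kind is only as good as the identifiers it is given. If a network file system reports values that are not unique per file, two different files can look identical to it, which would be consistent with the behaviour you describe.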

Best, G.