Virtual dataset in read-write file missing data from read-only file

If I build a virtual dataset in a file with read/write access, but pointing to data in files with read-only access, I can only see the data by opening the new file in read-only mode. When I open it in read/write mode, the virtual dataset contains the fill value instead of the data.

Here’s a notebook replicating the issue via h5py: https://gist.github.com/takluyver/577a0edf2fe67ea3590d8d5bced39c56
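For reference, here is a minimal sketch of the kind of setup the notebook demonstrates (file and dataset names are placeholders; the actual reproduction is in the gist):

```python
import os
import stat
import h5py
import numpy as np

# Source file containing the real data
with h5py.File("source.h5", "w") as f:
    f["data"] = np.arange(100)

# Virtual dataset in a separate file, mapping onto the source file
layout = h5py.VirtualLayout(shape=(100,), dtype="i8")
layout[:] = h5py.VirtualSource("source.h5", "data", shape=(100,))
with h5py.File("vds.h5", "w") as f:
    f.create_virtual_dataset("data", layout, fillvalue=-1)

# Make the source file read-only, as if it lived on a read-only filesystem
os.chmod("source.h5", stat.S_IRUSR)

# Open the VDS file read-only: the real data is visible
with h5py.File("vds.h5", "r") as f:
    print(f["data"][:5])   # real data

# Open the VDS file read/write: only the fill value shows up
with h5py.File("vds.h5", "r+") as f:
    print(f["data"][:5])   # fill value (-1)
```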

I’m guessing that HDF5 tries to open the source files with the same mode as the main file, and when that fails, it gives up on accessing the data. But in this case, opening the source files in read-only mode would work.

(This is exacerbated because h5py by default tries to open files for read/write access if possible. We’re aiming to change the default to read-only in the future.)
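In the meantime, explicitly requesting read-only access to the file containing the virtual dataset avoids the problem (continuing the placeholder names from the sketch above):

```python
import h5py

# Asking for mode="r" explicitly means the VDS file is opened read-only,
# and the source data is then visible as described above.
with h5py.File("vds.h5", "r") as f:
    data = f["data"][:]
```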


I have also reproduced this bug, in a conda environment.
It would be nice if HDF5 opened the source files in read-only mode in this case.

In any case, it would be helpful if HDF5 reported an error or warning when the source files cannot be accessed. Silently filling the virtual dataset slices with the fill value when a source file cannot be opened (or is missing) should not be the default behaviour: a scientist could believe that all the data was recorded correctly when in reality some slices are missing. Alternatively, each source file could carry a flag indicating whether it was accessed successfully, so that one could check whether the VDS was correctly populated with the data from the different source files. A rough user-side check of this kind is sketched below.
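As a stopgap on the user side, something like the following can at least report which mapped source files are reachable, using h5py's `virtual_sources()` introspection (the helper name and the simple path handling are my own, not part of h5py):

```python
import h5py

def check_vds_sources(dset):
    """Try to open each source file mapped by a virtual dataset read-only.

    Rough sketch: returns {file_name: True/False}. Note that source paths
    may be stored relative to the file containing the VDS, so they may need
    to be resolved against that file's directory first.
    """
    if not dset.is_virtual:
        return {}
    status = {}
    for vs in dset.virtual_sources():
        try:
            with h5py.File(vs.file_name, "r") as src:
                status[vs.file_name] = vs.dset_name in src
        except OSError:
            status[vs.file_name] = False
    return status

with h5py.File("vds.h5", "r") as f:
    print(check_vds_sources(f["data"]))
```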
