Virtual dataset in read-write file missing data from read-only file

If I build a virtual dataset in a file that I have read/write access to, but which points to data in files that I can only read, I can only see the data by opening the new file in read-only mode. When I open it in read/write mode, the virtual dataset contains the fill value instead of the data.

Here’s a notebook replicating the issue via h5py: https://gist.github.com/takluyver/577a0edf2fe67ea3590d8d5bced39c56
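
For readers who cannot open the notebook, here is a minimal sketch of the setup described above. It is not copied from the notebook; the file names, dataset names and the fill value of -1 are assumptions chosen for illustration. Whether the read/write read actually returns the fill value will depend on the HDF5 version and on file permissions (for example, running as root usually bypasses the read-only bit).

```python
import os
import stat
import tempfile

import numpy as np
import h5py


def build_files(workdir):
    """Create a read-only source file and a writable file holding a VDS."""
    src_path = os.path.join(workdir, "source.h5")  # hypothetical name
    vds_path = os.path.join(workdir, "vds.h5")     # hypothetical name
    data = np.arange(10, dtype="i4")

    with h5py.File(src_path, "w") as f:
        f.create_dataset("data", data=data)
    # Remove write permission from the source file.
    os.chmod(src_path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)

    # Map the whole source dataset into a virtual dataset of the same shape.
    layout = h5py.VirtualLayout(shape=data.shape, dtype=data.dtype)
    layout[:] = h5py.VirtualSource(src_path, "data", shape=data.shape)
    with h5py.File(vds_path, "w") as f:
        f.create_virtual_dataset("virtual", layout, fillvalue=-1)
    return vds_path, data


def read_vds(vds_path, mode):
    """Read the virtual dataset with the main file opened in `mode`."""
    with h5py.File(vds_path, mode) as f:
        return f["virtual"][:]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        vds_path, data = build_files(tmp)
        print("r :", read_vds(vds_path, "r"))
        # On affected setups this read is reported to return the fill value.
        print("r+:", read_vds(vds_path, "r+"))
```

Comparing the two printed arrays shows whether a given installation is affected.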

I’m guessing that HDF5 tries to open the source files with the same mode as the main file, and when that fails, it gives up on accessing the data. But in this case, opening the source files in read-only mode would work.

(This is exacerbated because h5py by default tries to open files for read/write access if possible. We’re aiming to change the default to read-only in the future.)


I also reproduced this bug in a conda environment.
It would be nice if HDF5 opened the source files in read-only mode in this case.

In any case, it would be useful if HDF5 reported an error or warning when the source files cannot be accessed. Filling the virtual dataset slices with the fill value when a source file cannot be opened (or is missing) should not be the default behavior. A scientist could think that all the data was recorded correctly when in reality some data slices are missing. Alternatively, there could be a flag for each source file indicating whether it was accessed successfully, which would make it possible to check that the VDS has been correctly assembled from the data in the different source files.
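
Until something like that exists in the library, a check along these lines can be done on the user side. The sketch below is only an approximation under stated assumptions: the helper name `check_source_files` is hypothetical, it resolves relative names against the directory of the VDS file (which does not cover everything HDF5 does, such as the virtual prefix), and it only looks at file system permissions rather than asking HDF5 whether the open succeeded.

```python
import os


def check_source_files(vds_file_path, source_file_names):
    """Return (name, problem) pairs for source files that look inaccessible.

    Hypothetical helper: relative names are resolved against the directory
    of the VDS file, which only approximates HDF5's own lookup rules.
    """
    base = os.path.dirname(os.path.abspath(vds_file_path))
    problems = []
    for name in source_file_names:
        if name == ".":
            # "." means the source dataset lives in the VDS file itself.
            continue
        path = name if os.path.isabs(name) else os.path.join(base, name)
        if not os.path.isfile(path):
            problems.append((name, "missing"))
        elif not os.access(path, os.R_OK):
            problems.append((name, "not readable"))
        elif not os.access(path, os.W_OK):
            # Readable but not writable: the case discussed in this thread.
            problems.append((name, "read-only"))
    return problems
```

With h5py, the source file names can be collected from an open virtual dataset with `[m.file_name for m in dset.virtual_sources()]` and passed to this function before trusting the values read from the VDS.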


Hi,

We’re also facing this issue when accessing read-only files through a virtual dataset defined in a file opened in read-write mode.

Changing acc_flags in the link access property list is taken into account for external links, as expected, but has no impact on virtual datasets (see sample code to reproduce based on h5py).

Looking at the different HDF5 property lists, I couldn’t find an equivalent of H5Pset_elink_acc_flags and H5Pset_elink_fapl for virtual datasets.

Since there is already an H5Pset_virtual_prefix for the dataset access property list, wouldn’t it make sense to also have set_virtual_acc_flags and set_virtual_fapl?

Best,

Hi @thomas.vincent,

you may be interested in tracking the GitHub issue “Virtual datasets: control how errors are handled when opening source files” (HDFGroup/hdf5, Issue #5088), previously reported by @thomas1. We’re at least aware of the issues, though no progress toward a solution has been made yet.