Virtual Data Set

bogdan.oancea.ext · November 27, 2021, 3:53pm

Hi,

We are prototyping splitting a large HDF5 file into multiple .h5 files using Virtual Data Sets (VDS). For our application, we need to return an error when a caller tries to read data from a VDS and some of the referenced files that store the requested data are not available.

For example, consider a VDS that contains 3 rows and splits the data by rows. Assume the referenced files for rows 1 and 2 are available, but the file containing row 3 is not.
We would like to read data with any chunk size from chunks containing data from rows 1 and 2, and get an error when trying to read data from chunks containing data from row 3 (for example, when reading data by columns with chunks 1x3).

Is it possible to specify, either when creating the VDS or when opening an HDF5 file for reading, that the HDF5 library should return an error for data missing from the referenced files rather than using the VDS fill value?

Thanks,
Bogdan

gheber · November 29, 2021, 10:25pm

Bogdan, I think VDS was designed with a “forgiving” mindset w.r.t. missing elements, and I don’t see a way to change the error behavior. H5Pset_virtual_view() lets you control how aggressive you want to be in the substitution of missing values for missing mapped elements, but this doesn’t affect the error behavior.

Since a VDS is essentially metadata, you can do manually what you set out to do: through a combination of H5Pget_virtual_count(), H5Pget_virtual_filename(), etc., you can discover what is or isn’t available. This will be particularly tedious if some of your VDS contain a mixture of non-virtual and virtual datasets. Otherwise, this shouldn’t be too bad. A dimension with an indefinite extent combined with printf-style file definitions might give you another headache, but in the end, only you can know what’s really there.
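As a rough illustration of that manual check, here is a minimal sketch in Python against h5py's high-level wrappers around the same metadata (`Dataset.is_virtual`, `Dataset.virtual_sources()`, and `SpaceID.get_select_bounds()`). The helper names `boxes_overlap`, `missing_sources`, and `check_read` are made up for this example. It assumes fixed-size mappings, assumes relative source file names resolve against the directory of the VDS file, and compares bounding boxes only, so a strided mapping can be flagged even when the read does not actually touch it. It does not handle printf-style file names, unlimited selections, or a prefix set through H5Pset_virtual_prefix() or HDF5_VDS_PREFIX, and it only tests that the file exists, not that the source dataset inside it does.

```python
import os


def boxes_overlap(a_start, a_end, b_start, b_end):
    """Return True if two inclusive N-dimensional index boxes intersect."""
    return all(
        a_lo <= b_hi and b_lo <= a_hi
        for a_lo, a_hi, b_lo, b_hi in zip(a_start, a_end, b_start, b_end)
    )


def missing_sources(dset, start, end):
    """List absent source files whose mapped region overlaps [start, end].

    `dset` is expected to be an open h5py.Dataset; `start` and `end` are
    inclusive corner coordinates of the region the caller wants to read.
    """
    if not dset.is_virtual:
        return []
    # Assumption: relative source names resolve against the VDS file's directory.
    vds_dir = os.path.dirname(os.path.abspath(dset.file.filename))
    missing = []
    for mapping in dset.virtual_sources():
        # Bounding box of the region this mapping fills in the virtual dataset.
        m_start, m_end = mapping.vspace.get_select_bounds()
        if not boxes_overlap(start, end, m_start, m_end):
            continue
        # "." means the source dataset lives in the VDS file itself.
        if mapping.file_name == ".":
            continue
        path = os.path.join(vds_dir, mapping.file_name)
        if not os.path.exists(path) and path not in missing:
            missing.append(path)
    return missing


def check_read(dset, start, end):
    """Raise FileNotFoundError instead of silently reading fill values."""
    missing = missing_sources(dset, start, end)
    if missing:
        raise FileNotFoundError(
            "VDS source file(s) not available: " + ", ".join(missing)
        )
```

With h5py installed, one would call `check_read(f["data"], (0, 0), (2, 0))` on a dataset from an open `h5py.File` before reading the first column of a 3-row VDS (the dataset name `data` is hypothetical). In the scenario above, that call would raise because the file backing row 3 is absent, while a read restricted to rows 1 and 2 would pass.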

Best, G.
