Virtual Data Set



We are prototyping splitting a large HDF5 file into multiple h5 files by using Virtual Data Sets. For our application, we need to return an error when the caller tries to read data from a VDS and some of the referenced files that store the requested data are not available.

For example, consider a VDS that contains 3 rows and splits its data by rows. Assume the referenced files for rows 1 and 2 are available but the file containing row 3 is not.
We would like to read data with any chunk size from chunks containing data from rows 1 and 2 and get an error when trying to read data from chunks containing data from row 3 (for example, if reading data by columns with chunks 1x3).

Is it possible to specify, either when creating the VDS or when opening an HDF5 file for reading, that the HDF5 library should return an error for missing data from VDS files rather than using the VDS fill value?



Bogdan, I think VDS was designed with a “forgiving” mindset w.r.t. missing elements, and I don’t see a way to change the error behavior. H5Pset_virtual_view() lets you control how aggressive you want to be in the substitution of missing values for missing mapped elements, but this doesn’t affect the error behavior.

Since a VDS is essentially metadata, you can do manually what you set out to do: through a combination of H5Pget_virtual_count(), H5Pget_virtual_filename(), etc., you can discover what is or isn’t available. This will be particularly tedious if some of your VDS contain a mixture of non-virtual and virtual datasets. Otherwise, this shouldn’t be too bad. A dimension with an indefinite extent combined with printf-style file definitions might give you another headache, but in the end, only you can know what’s really there.
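
A rough sketch of that manual check, in Python with h5py, whose low-level get_virtual_count() and get_virtual_filename() methods wrap the C calls above. The function names (unavailable_sources, vds_source_files, require_all_sources) are made up for illustration. It assumes relative source names resolve against the directory of the VDS file, which will not hold if a virtual prefix is set, and it skips printf-style names rather than expanding them.

```python
import os


def unavailable_sources(vds_path, source_names):
    """Return the source file names that cannot be found on disk.

    Assumption: relative names are resolved against the directory of the
    VDS file. A virtual prefix (H5Pset_virtual_prefix or HDF5_VDS_PREFIX)
    would change where the library actually looks.
    """
    base = os.path.dirname(os.path.abspath(vds_path))
    missing = []
    for name in source_names:
        if name == ".":
            # "." means the source dataset lives in the VDS file itself.
            continue
        if "%" in name:
            # printf-style pattern: cannot be checked by name alone, skipped here.
            continue
        path = name if os.path.isabs(name) else os.path.join(base, name)
        if not os.path.isfile(path) and name not in missing:
            missing.append(name)
    return missing


def vds_source_files(vds_path, dataset_name):
    """List the source file name of every mapping in a virtual dataset."""
    import h5py  # imported here so the helper above can be used without h5py

    with h5py.File(vds_path, "r") as f:
        dcpl = f[dataset_name].id.get_create_plist()
        if dcpl.get_layout() != h5py.h5d.VIRTUAL:
            return []  # not a virtual dataset, so there are no source files
        return [dcpl.get_virtual_filename(i)
                for i in range(dcpl.get_virtual_count())]


def require_all_sources(vds_path, dataset_name):
    """Raise instead of letting a read silently return fill values."""
    names = vds_source_files(vds_path, dataset_name)
    missing = unavailable_sources(vds_path, names)
    if missing:
        raise FileNotFoundError(
            "VDS %s in %s is missing source files: %s"
            % (dataset_name, vds_path, ", ".join(missing)))
```

Calling require_all_sources() before a read rejects the whole dataset if any source file is absent. To fail only for reads that touch the missing rows, you would additionally have to compare each mapping's virtual selection (H5Pget_virtual_vspace()) with the selection being read, which this sketch does not attempt.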

Best, G.