I am using a passthrough VOL connector (abbreviated as pt-VOL below) to trace the HDF5 operations issued by a Python program (using h5py). I am using HDF5 13.0.1 with h5py 3.7. My VOL is very similar to the vol-external-passthrough template from the HDFGroup/vol-toolkit repo, and it is also written in C. My two questions below concern accessing dataset layout information before and during I/O.
(1) In a pt-VOL, is there a way to obtain information the way h5stat and h5dump do (by directly interpreting the metadata and object headers), without using callbacks?
At the VOL level, I can obtain information such as the dataset name, dataset offset, and storage size using callbacks. To get that information through callbacks, however, the file object or the dataset object must first be opened (e.g. with H5VLfile_open or H5VLobject_open). Is there a way to directly interpret the header and metadata of a dataset in a pt-VOL before the dataset object is opened?
Below is an example callback function I use to obtain a dataset's offset. For a newly created dataset, this callback only obtains the offset successfully after H5VLdataset_write has been called at the VOL layer. However, it seems the offset information could be available before H5VLdataset_write.
```c
#include "hdf5.h"

/* Return the file offset of a dataset via the native VOL's
 * "get offset" optional operation, or HADDR_UNDEF on failure. */
static haddr_t dataset_get_offset(void *under_dset, hid_t under_vol_id, hid_t dxpl_id)
{
    H5VL_optional_args_t                vol_cb_args;               /* Arguments to VOL callback */
    H5VL_native_dataset_optional_args_t dset_opt_args;             /* Arguments for optional operation */
    haddr_t                             dset_offset = HADDR_UNDEF; /* Dataset's offset */

    /* Set up VOL callback arguments */
    dset_opt_args.get_offset.offset = &dset_offset;
    vol_cb_args.op_type             = H5VL_NATIVE_DATASET_GET_OFFSET;
    vol_cb_args.args                = &dset_opt_args;

    /* Forward the request to the underlying connector to get the offset */
    if (H5VLdataset_optional(under_dset, under_vol_id, &vol_cb_args, dxpl_id, NULL) < 0)
        return HADDR_UNDEF;

    return dset_offset;
}
```
One alternative I know of is a public function such as H5Dget_offset(hid_t dset_id), which returns the same information. However, I can't find a straightforward way to obtain the hid_t dset_id of the current dataset object from inside the pt-VOL.
(2) In a pt-VOL, is there a way to know the index of the element currently being written in the H5VLblob_put function?
For a variable-length datatype, each call to H5VLblob_put writes a single element. Some datasets have dimensions such as [rows, columns]. Is there a direct way, in a pt-VOL, to get the row and column of the element currently being written?
I've tried H5Sget_select_bounds(space_id, start, end), but start and end only give the bounds of the whole selection (here, the entire dataset), not the position of the current element.
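A possible workaround, sketched under assumptions I have not verified: if the library issues exactly one blob_put per selected element, in C row-major order of the file selection, and the selection is a single rectangular block (for example H5S_ALL, or one hyperslab block), then the pt-VOL could reset a counter in its dataset write callback and convert that counter into coordinates in each blob_put call, using the start and end arrays from H5Sget_select_bounds as the block's corners. The type elem_tracker_t and the functions tracker_begin and tracker_next are hypothetical names, and uint64_t stands in for HDF5's hsize_t so the sketch compiles on its own. This would not hold for irregular or multi-block selections (where the bounding box is larger than the selection), or if blob_put is also called for other reasons during a write.

```c
#include <stdint.h>

/* Hypothetical per-write tracker; 32 mirrors H5S_MAX_RANK. */
#define TRACK_MAX_RANK 32

typedef struct {
    unsigned rank;
    uint64_t start[TRACK_MAX_RANK]; /* lower corner, e.g. from H5Sget_select_bounds */
    uint64_t count[TRACK_MAX_RANK]; /* end - start + 1 in each dimension */
    uint64_t next;                  /* blob_put calls seen so far in this write */
} elem_tracker_t;

/* Intended to be called once from the dataset write callback.
 * Returns 0 on success, -1 for an unsupported rank or inverted bounds. */
static int tracker_begin(elem_tracker_t *t, unsigned rank,
                         const uint64_t *start, const uint64_t *end)
{
    if (rank == 0 || rank > TRACK_MAX_RANK)
        return -1;
    for (unsigned d = 0; d < rank; d++) {
        if (end[d] < start[d])
            return -1;
        t->start[d] = start[d];
        t->count[d] = end[d] - start[d] + 1;
    }
    t->rank = rank;
    t->next = 0;
    return 0;
}

/* Intended to be called from the blob_put callback: turn the running
 * element counter into coordinates, last dimension varying fastest. */
static void tracker_next(elem_tracker_t *t, uint64_t *coords)
{
    uint64_t linear = t->next++;
    for (unsigned d = t->rank; d-- > 0;) {
        coords[d] = t->start[d] + linear % t->count[d];
        linear /= t->count[d];
    }
}
```

For H5S_ALL on a [rows, columns] dataset, start is {0, 0} and end is {rows - 1, columns - 1}, so under these assumptions the k-th blob_put call would map to row k / columns and column k % columns.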
The structs H5VL_blob_specific_t and H5VL_blob_optional (using H5VL_optional_args_t) don't seem to contain this information either.