Passthrough-VOL: Getting Dataset Layout Information

I am using a passthrough-VOL (abbreviated pt-VOL below) to trace the HDF5 operations coming from a Python program (using h5py). I am using HDF5 13.0.1 with h5py 3.7. The VOL I am using is very similar to the vol-external-passthrough template from the HDFGroup/vol-toolkit repo, and it is also written in C. My two questions below relate to accessing dataset layout information before and during I/O.

(1) In a passthrough-VOL, is there a way to obtain information through functions similar to h5stat and h5dump (directly interpreting metadata and header info), without using callbacks?

From the VOL level, I can obtain information such as the dataset name, dataset offset, and storage size using callbacks. In order to get that information through callbacks, the file object or the dataset object has to be opened (e.g. H5VLfile_open or H5VLobject_open) first. Is there a way to directly interpret the header and meta info of a dataset in a pt-VOL before the dataset object is opened?

Below is an example callback function I use to obtain the dataset offset. For a newly created dataset, it only returns a valid offset after H5VLdataset_write has happened at the VOL layer. But the offset information could be available before the H5VLdataset_write.

static haddr_t dataset_get_offset(void *under_dset, hid_t under_vol_id, hid_t dxpl_id)
{
    H5VL_optional_args_t                vol_cb_args;               /* Arguments to VOL callback */
    H5VL_native_dataset_optional_args_t dset_opt_args;             /* Arguments for optional operation */
    haddr_t                             dset_offset = HADDR_UNDEF; /* Dataset's offset (undefined until set) */

    /* Set up VOL callback arguments */
    dset_opt_args.get_offset.offset = &dset_offset;
    vol_cb_args.op_type             = H5VL_NATIVE_DATASET_GET_OFFSET;
    vol_cb_args.args                = &dset_opt_args;

    /* Get the offset */
    if (H5VLdataset_optional(under_dset, under_vol_id, &vol_cb_args, dxpl_id, NULL) < 0)
        return HADDR_UNDEF;
    
    return dset_offset;
}

Currently, the only alternative to the callback that I know of is a public function such as H5Dget_offset(hid_t dset_id). But for this, I can’t find a straightforward way to obtain the dset_id of the current dataset object from inside the pt-VOL.

(2) In a pt-VOL, is there a way to know the particular element index that is currently being written in the H5VLblob_put function?

For a variable-length datatype, a single call to H5VLblob_put writes one element. Some datasets have dimensions such as [rows, columns]. Is there a direct way to get the row and column of the element currently being written in a pt-VOL?
I’ve tried using H5Sget_select_bounds(space_id,start,end), but the start and end it returns cover the entire dataset.
And the structs H5VL_blob_specific_t and H5VL_blob_optional (using H5VL_optional_args_t) don’t seem to have the information I am looking for.

(1) I’m confused here - are you trying to get the dataset offset after dataset open and before dataset write, or are you trying to get the offset before dataset open? If it’s the former, it probably isn’t working because the dataset hasn’t been allocated yet so there’s no data offset to return. If it’s the latter, it won’t work because the dataset needs to be open first - there’s no way around that, as that information isn’t present in the arguments to the VOL callback.

(2) Are you referring to calls to blob_put that originate in the type conversion code of the native connector (such as during H5Dwrite)? In this case, there’s unfortunately currently no way to get this information in the blob_put callback.


Thanks so much for the reply! You answered my questions.

(1) I want to get the dataset offset before the dataset open.
(2) My code is Python code using h5py, which I think calls the HDF5 C API through a. The blob_put occurs with a variable length datatype, which I think is the blob_put you are talking about.

Additionally, I didn’t find much documentation about the HDF5 BLOB (binary large object) class. It would be very helpful if you could point me to more information about BLOB, and to more documentation on the passthrough-VOL (other than the VOL Connector Author Guide).

The blob callbacks are used to (put) store an opaque object in a file and create a blob id associated with it, and (get) retrieve that object given the blob id previously returned by put. Currently these are only called (in the HDF5 library) by the reference and variable-length type conversion routines. The size of the blob id is constant per file, and is given by the VOL connector through the “file_get” callback with the H5VL_FILE_GET_CONT_INFO operation.
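
To make the put/get contract described above concrete, here is a minimal, self-contained toy model in C. It is only a sketch of the idea and is not the HDF5 VOL API: toy_file_t, toy_blob_put, toy_blob_get, toy_file_close, TOY_BLOB_ID_SIZE and TOY_MAX_BLOBS are hypothetical names invented for illustration, and the choice of a 4-byte counter as the blob id is an assumption. What it shows is that put stores opaque bytes and hands back a fixed-size id, and that get needs nothing but that id to return the bytes.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy container, NOT an HDF5 type. */
#define TOY_BLOB_ID_SIZE sizeof(uint32_t) /* id size is constant for the whole "file" */
#define TOY_MAX_BLOBS    16

typedef struct {
    void    *data[TOY_MAX_BLOBS]; /* stored copies of the opaque objects */
    size_t   size[TOY_MAX_BLOBS]; /* size in bytes of each stored object */
    uint32_t count;               /* number of blobs stored so far */
} toy_file_t;

/* put: store `size` opaque bytes and write a fixed-size id into `blob_id`.
 * Returns 0 on success, -1 on failure. */
static int toy_blob_put(toy_file_t *f, const void *buf, size_t size, void *blob_id)
{
    uint32_t id;
    void    *copy;

    if (f->count >= TOY_MAX_BLOBS)
        return -1;

    /* Allocate at least one byte so an empty blob still gets a valid slot */
    copy = malloc(size > 0 ? size : 1);
    if (copy == NULL)
        return -1;
    if (size > 0)
        memcpy(copy, buf, size);

    id          = f->count++;
    f->data[id] = copy;
    f->size[id] = size;

    /* The caller only ever sees these TOY_BLOB_ID_SIZE bytes */
    memcpy(blob_id, &id, TOY_BLOB_ID_SIZE);
    return 0;
}

/* get: look up the object by id and copy it into `buf` (capacity `size`).
 * Returns 0 on success, -1 if the id is unknown or `buf` is too small. */
static int toy_blob_get(const toy_file_t *f, const void *blob_id, void *buf, size_t size)
{
    uint32_t id;

    memcpy(&id, blob_id, TOY_BLOB_ID_SIZE);
    if (id >= f->count || size < f->size[id])
        return -1;
    if (f->size[id] > 0)
        memcpy(buf, f->data[id], f->size[id]);
    return 0;
}

/* Release every stored blob and reset the container. */
static void toy_file_close(toy_file_t *f)
{
    uint32_t i;

    for (i = 0; i < f->count; i++)
        free(f->data[i]);
    f->count = 0;
}
```

A caller would zero-initialize a toy_file_t, reserve TOY_BLOB_ID_SIZE bytes for the id, call toy_blob_put, and later pass the same id bytes to toy_blob_get. Note that the id carries no information about which dataset element the bytes belong to, which is consistent with the earlier answer that blob_put gives no way to recover the element index.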

There isn’t really any other documentation about passthrough VOLs that I know of at this time. The best point of reference right now is the no-op passthrough VOL included with the library (its purpose is to serve as such an example, and for testing). This is certainly something we hope to improve upon going forward.
