RFC on extending VOL connector capability reporting

nfortne2 · April 15, 2025, 6:42pm

In order to facilitate easier automated testing of VOL connectors, we want to extend the VOL API and the public HDF5 API to allow VOL connectors to provide fine grained reporting of which operations are and are not supported. There are several methods to accomplish this, and we want to open it up to discussion so we can pick an interface that works for everyone. This RFC discusses some different options:

https://github.com/HDFGroup/hdf5doc/blob/master/RFCs/HDF5_Library/VOL_Capability/RFC-VOL-capability-v2.pdf

We will discuss this at this Thursday’s Working Group meeting.

koziol · April 30, 2025, 11:09pm

I re-read this, and reviewed the related code in the library. Here are my thoughts and the options I would choose:

I believe that there’s a 6th option (i.e. 3.6 + 4.6) for how to indicate that an operation was unsupported:

For 3.6: the VOL connector could return the typical failure value (<0 or NULL) but retain state recording that the failure was due to the operation being unsupported, not a true failure. We could then add a new ‘op’ to the ‘optional’ callback that the library calls when it receives a failure value, to check whether the failure was due to the operation being unsupported. In some ways, this is the reverse of option 3.2: the library makes the ‘was this unsupported’ call into the VOL connector, instead of (per option 3.2) the VOL connector calling into the library to say ‘the failure I’m about to return really means the operation was unsupported’. As with option 3.5, the advantage is that the existing callback parameters and return values (and error stack, I suppose) don’t change, so the callback interface stays completely identical. However, the connector would have to retain state (per-thread, if it was threadsafe) that the last failure was an unsupported operation, and reset that state when the next normal callback (i.e. not the query about the unsupported status) was received. VOL connectors that wanted to support this “unsupported operation” protocol would therefore likely require more changes than under other options, especially if they wanted to be threadsafe.
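To make the bookkeeping for 3.6 concrete, here is a rough sketch of what a connector might have to carry around. Every name in it is hypothetical (none of this is in the VOL API today), and the real callbacks take far more parameters; the two booleans only stand in for "does the connector implement this" and "did the I/O work". It assumes a C11 compiler for `_Thread_local`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; nothing here is part of the HDF5 VOL API. */

/* Per-thread record: did the most recent failure mean "unsupported"? */
static _Thread_local bool demo_last_failure_unsupported = false;

/* Stand-in for a normal connector callback. It resets the per-thread state
 * on entry and returns the ordinary failure value (<0) for both kinds of
 * failure, so its signature and return values are unchanged. */
int demo_dataset_op(bool implemented, bool io_ok)
{
    demo_last_failure_unsupported = false; /* reset on every normal callback */

    if (!implemented) {
        demo_last_failure_unsupported = true;
        return -1;
    }
    return io_ok ? 0 : -1; /* a true failure leaves the flag clear */
}

/* Stand-in for the new 'optional' op: the library calls this after seeing
 * a failure value. It only reads the state and does not reset it. */
bool demo_query_unsupported(void)
{
    return demo_last_failure_unsupported;
}

/* What the library side of the exchange might look like. */
void demo_walkthrough(void)
{
    int ret = demo_dataset_op(false, true);
    assert(ret < 0 && demo_query_unsupported());  /* unsupported */

    ret = demo_dataset_op(true, false);
    assert(ret < 0 && !demo_query_unsupported()); /* genuine failure */
}
```

Even in this toy form, every normal callback needs the reset on entry, which is where the extra work for connector authors would come from.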

To apply the same idea to the public API for applications to use (i.e. 4.6), the library would have to provide a new API routine (e.g. “H5Ewas_error_really_an_unsupported_op”), and applications would have to call it after each operation for which they wanted to distinguish “failure” from “unsupported”. This seems like a large amount of work to push onto users who care about the distinction. It does have the upside that applications which don’t care don’t have to make any changes at all (just like for option 4.5).

And, there’s another option for the VOL connector callback section, 3.7: Extend the capability flags dramatically (like with 10,000 flags), perhaps in an algorithmic approach, to attempt to cover future combinations that we haven’t thought of. This preserves the semantics of the existing callback parameters and return values (and error stacks), but at the expense of a vast and non-scalable space of capability flags that all VOL connector authors would have to set correctly.

OK, with those options on the table, I would rank the options for the VOL connector (i.e. 3.x) from best to worst like this:

Option 3.1, with the “magic” non-NULL value for callbacks that return a pointer
Option 3.1, with the “revise all the callbacks to return herr_t” idea
Option 3.2, since the bookkeeping burden will be in the library and implemented only once
Option 3.6, since it retains full semantics of the existing callbacks
Options 3.3, 3.4, 3.5, and 3.7 are all pretty poor options and if we start to seriously consider them, I would return to the drawing board to try to come up with new ideas.
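For the top-ranked choice, the “magic” non-NULL value could be as simple as the address of a private object that is never handed out as a real object. This is only a sketch of that idea under my own assumptions: the names are hypothetical, and the RFC may define the sentinel differently (a fixed cast constant, for example).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; not taken from the RFC or the library. */

/* The address of this private object serves as the "unsupported" sentinel.
 * It can never equal NULL and never aliases a real connector object. */
static char demo_unsupported_tag;
#define DEMO_VOL_UNSUPPORTED ((void *)&demo_unsupported_tag)

typedef enum { DEMO_OK, DEMO_FAILED, DEMO_UNSUPPORTED } demo_status_t;

/* Stand-in for a pointer-returning callback such as an "open". */
void *demo_object_open(bool implemented, bool io_ok)
{
    static int fake_object = 42;

    if (!implemented)
        return DEMO_VOL_UNSUPPORTED; /* magic non-NULL value */
    if (!io_ok)
        return NULL;                 /* ordinary failure */
    return &fake_object;             /* ordinary success */
}

/* Library-side classification of whatever the callback returned. */
demo_status_t demo_classify(const void *ret)
{
    if (ret == NULL)
        return DEMO_FAILED;
    if (ret == DEMO_VOL_UNSUPPORTED)
        return DEMO_UNSUPPORTED;
    return DEMO_OK;
}

/* The three outcomes side by side. */
void demo_walkthrough(void)
{
    assert(demo_classify(demo_object_open(true, true)) == DEMO_OK);
    assert(demo_classify(demo_object_open(true, false)) == DEMO_FAILED);
    assert(demo_classify(demo_object_open(false, true)) == DEMO_UNSUPPORTED);
}
```

The catch is that the library must check for the sentinel before treating any returned pointer as an object; a connector that returns it to an older library would have it mistaken for a valid object.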

For the public API, I would rank the options (i.e. 4.x) from best to worst like this:

Option 4.1, it’s simple, and most applications wouldn’t have to change
Option 4.3, it’s similar enough to how errno works that most application developers could probably deal with it.
Options 4.2, 4.4, 4.5, and 4.6 all are pretty poor to me, and it would be back to the drawing board time if we started seriously thinking about choosing one of them.
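To illustrate what I mean by “how errno works”, here is a generic sketch of that pattern. It is not the RFC’s definition of option 4.3; the function names and codes are all made up for illustration, and it assumes a C11 compiler for `_Thread_local`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; a generic errno-style pattern only. */

typedef enum {
    DEMO_ERR_NONE = 0,
    DEMO_ERR_FAILED,
    DEMO_ERR_UNSUPPORTED
} demo_errcode_t;

/* Per-thread "last error" code, set only when a call fails. */
static _Thread_local demo_errcode_t demo_last_error = DEMO_ERR_NONE;

demo_errcode_t demo_get_last_error(void)
{
    return demo_last_error;
}

/* Stand-in for a public API call that returns <0 on any failure. */
int demo_api_call(bool supported, bool io_ok)
{
    if (!supported) {
        demo_last_error = DEMO_ERR_UNSUPPORTED;
        return -1;
    }
    if (!io_ok) {
        demo_last_error = DEMO_ERR_FAILED;
        return -1;
    }
    return 0; /* success leaves the code untouched, as with errno */
}

/* How a test harness might use it: skip, rather than fail, a test whose
 * operation the connector does not support. */
bool demo_should_skip(bool supported, bool io_ok)
{
    if (demo_api_call(supported, io_ok) >= 0)
        return false;
    return demo_get_last_error() == DEMO_ERR_UNSUPPORTED;
}
```

As with errno, the code is only meaningful immediately after a call that reported failure, which is the habit application developers already have.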

For the considerations in section 5:

5.1: I like the idea of suppressing the error stack for unsupported operations, and we may want to make it the default
5.2: I agree that a VOL connector that can only partially support an aggregate operation would be better not executing any part of it, and returning unsupported for the entire operation. Anything else is pretty much asking for undefined behavior.
5.3: Yes, we’ll need to update the async error info struct, and there might be some sub-options here, depending on which of the 4.x options we choose. If we went with option 4.1, an easy way to change H5ES_err_info_t might be to set the ‘err_stack_id’ field to H5I_INVALID_HID for unsupported operations and to a valid hid_t value for actual errors. That would mean that the structure wouldn’t have to change and that only applications that really cared about why an operation failed would need to be updated.
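On the application side, the err_stack_id idea for 5.3 might be consumed roughly like this. The types below are stand-ins so the sketch compiles without HDF5: `demo_err_info_t` models only two fields of H5ES_err_info_t, `DEMO_INVALID_HID` mirrors H5I_INVALID_HID, and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for HDF5 types; real code would include hdf5.h instead. */
typedef int64_t demo_hid_t;
#define DEMO_INVALID_HID ((demo_hid_t)-1) /* mirrors H5I_INVALID_HID */

typedef struct {
    const char *api_name;     /* name of the API routine that failed */
    demo_hid_t  err_stack_id; /* invalid ID would mean "unsupported" */
} demo_err_info_t;

/* Proposed convention: no error stack means the op was unsupported. */
bool demo_is_unsupported(const demo_err_info_t *info)
{
    return info->err_stack_id == DEMO_INVALID_HID;
}

/* Count how many of the reported failures were really "unsupported". */
size_t demo_count_unsupported(const demo_err_info_t *infos, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (demo_is_unsupported(&infos[i]))
            count++;
    return count;
}

/* One unsupported op and one real error, as an application might see them. */
void demo_walkthrough(void)
{
    const demo_err_info_t infos[] = {
        {"H5Dwrite_async", DEMO_INVALID_HID}, /* unsupported */
        {"H5Dread_async", 7},                 /* real error, valid stack ID */
    };

    assert(demo_count_unsupported(infos, 2) == 1);
}
```

Applications that never look at err_stack_id keep working as before; only those that care about the reason for a failure add the one comparison.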

I think it would also be good to flesh out the “VOL Connector Report Card” idea that we talked about at the last meeting. It would give connector authors something concrete to aim for and would allow them to know when they had correctly implemented a callback to the library’s satisfaction.

