I’m creating an HDF5 filter that needs access to arbitrary metadata from the underlying file. The filter API won’t let me do that, however; the callbacks only provide access to the dataset’s property list, datatype, and dataspace.
I have also been working on another filter that, for performance reasons, needs to allocate memory once and reuse it across the various calls to the filter. While I can allocate that memory in set_local(), the filter never knows when it’s safe to deallocate it, as the current filter design does not include a teardown callback.
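To make the problem concrete, here is a self-contained sketch of the situation under the existing v2 filter class. The `hid_t`/`herr_t` typedefs are stubbed (the real ones come from hdf5.h), and the struct mirrors the field layout of `H5Z_class2_t` from H5Zpublic.h; the point is that the class exposes no slot where a buffer allocated in set_local() could be released.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for the HDF5 typedefs, stubbed so this sketch is
 * self-contained; the real definitions live in hdf5.h. */
typedef long long hid_t;
typedef int herr_t;
typedef int htri_t;

/* Field layout mirroring the existing v2 filter class: note there is
 * no teardown slot, so memory allocated in set_local() has no
 * official release point. */
typedef struct {
    int version;              /* class-struct version                */
    int id;                   /* filter identifier                   */
    unsigned encoder_present;
    unsigned decoder_present;
    const char *name;
    htri_t (*can_apply)(hid_t dcpl_id, hid_t type_id, hid_t space_id);
    herr_t (*set_local)(hid_t dcpl_id, hid_t type_id, hid_t space_id);
    size_t (*filter)(unsigned flags, size_t cd_nelmts,
                     const unsigned cd_values[], size_t nbytes,
                     size_t *buf_size, void **buf);
} h5z_class2_sketch_t;

static void *scratch; /* buffer meant to be reused across filter calls */

static herr_t my_set_local(hid_t dcpl, hid_t type, hid_t space) {
    (void)dcpl; (void)type; (void)space;
    if (!scratch)
        scratch = malloc(1 << 20); /* allocated once here...          */
    return scratch ? 0 : -1;       /* ...but never freed: no teardown */
}
```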
I have a working patchset that introduces two new optional callbacks: init() and teardown(). In the former, the file id embedded in the pipeline object (H5O_shared_t) is provided to the user, along with the three other well-known hid_t objects. The signature of the latter is the same as set_local()/can_apply(), and it’s called from H5D_close().
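These callbacks are not part of any released HDF5 header; the following is a hypothetical sketch of what the two proposed signatures might look like, based on the description above: init() takes the file id plus the usual dcpl/type/space triple, while teardown() mirrors set_local()/can_apply().

```c
#include <assert.h>

/* Stubs for the hdf5.h typedefs so the sketch compiles standalone. */
typedef long long hid_t;
typedef int herr_t;

/* Hypothetical signatures for the two proposed optional callbacks. */
typedef herr_t (*H5Z_init_func_t)(hid_t file_id, hid_t dcpl_id,
                                  hid_t type_id, hid_t space_id);
typedef herr_t (*H5Z_teardown_func_t)(hid_t dcpl_id, hid_t type_id,
                                      hid_t space_id);

static int init_called, teardown_called;

/* init(): would run once, receiving the file id from the pipeline
 * message, e.g. to open sibling datasets or allocate scratch memory. */
static herr_t my_init(hid_t file, hid_t dcpl, hid_t type, hid_t space) {
    (void)file; (void)dcpl; (void)type; (void)space;
    init_called = 1;
    return 0;
}

/* teardown(): would run from H5D_close(), the safe point to free
 * whatever init()/set_local() allocated. */
static herr_t my_teardown(hid_t dcpl, hid_t type, hid_t space) {
    (void)dcpl; (void)type; (void)space;
    teardown_called = 1;
    return 0;
}
```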
My first question to you is: is this the right place to discuss API changes? The second question is: are you willing to accept such modifications? Lastly, my understanding is that we’d need a new H5Z_class3_t structure to avoid breaking existing applications. If that’s the case, then we could probably have a new version of set_local() as well, one that simply takes an extra file handle object, as opposed to introducing one more callback.
Thank you in advance for your attention and guidance.
Lucas
Hi Lucas,
This is a good place to talk about it. And, yes, I believe you will need additional callbacks in a v3 of the I/O filter class. What do you need the file ID for? This, and your other suggestion about init/teardown callbacks, is similar to what I’ve heard from other people, and I can work with you to refine an update to the class struct, if you’d like.
The file ID is useful so I can retrieve data from other datasets and combine them in different ways through user-provided scripts. Note that this is different from HDF5 virtual datasets, which basically let one create mosaics from a collection of datasets’ slices.
I already have a working patchset, but it modifies the existing v2 structure. I’ll update it to use v3 and then I’ll point you to the GitHub patch so you can take a look. Thanks for volunteering!
Please note that this version which adds H5Z_class3_t has not been properly tested yet – I’ve simply pushed it to the repository so you could give some early advice on the overall structural changes.
In particular, I’d like to hear your feedback on the following:
The new original_version member of H5Z_class3_t: since H5Zregister needs to deal with deprecated symbols and the new structure contains an enhanced API for set_local, we need a way to tell which version of that function the filter implements. What I dislike about it is that the initialization of filters becomes somewhat redundant for the casual reader (see my changes to c++/test/dsets.cpp for an example).
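To illustrate the redundancy mentioned above, here is a hypothetical initializer; the field names follow this post, not any released header, and the callback members are omitted. The reader has to state versioning information twice, once in version and again in original_version.

```c
#include <assert.h>

/* Hypothetical v3 class layout; field names follow the discussion
 * above, not a released H5Zpublic.h. Callback members omitted. */
typedef struct {
    int version;          /* class-struct version                         */
    int original_version; /* which set_local() variant the filter implements */
    int id;
    unsigned encoder_present;
    unsigned decoder_present;
    const char *name;
    /* ... can_apply/set_local/filter callbacks would follow ... */
} h5z_class3_sketch_t;

static const h5z_class3_sketch_t my_filter = {
    /* version          */ 3,
    /* original_version */ 3, /* same information twice: redundant
                               * for the casual reader              */
    /* id               */ 256,
    /* encoder_present  */ 1,
    /* decoder_present  */ 1,
    /* name             */ "example-filter",
};
```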
The file id is being extracted from the pipeline’s sh_loc.file object (please refer to the changes made to H5Z_prelude_callback()). Is that the right way to do it?
I have thought about replacing the new original_version member with an init callback, but I was discouraged because init and set_local would be semantically similar – and we would need to keep both in H5Z_class3_t for backwards compatibility purposes.
The questions I raised in my post above are no longer relevant, as I have either worked around the problem or learned more about the management of objects in HDF5.
Here is the most recent version of the code. It’s been tested to a good extent.