Feature request: more callback functions for dynamically loaded filters

Hi,

I'm currently working on a CUDA-based compression filter, and I would find
it very useful to have some more callback functions to manage GPU
resources. Unfortunately, allocating memory on a CUDA device has
considerable overhead, and having to do it separately for each chunk in
the dataset slows down compression and decompression by up to a factor of
10.

Right now I have two workarounds for writing the data (compression):

   1. Allocate the device memory in the *set local* callback function and
   store the pointer in the cd_values array. Disadvantage: possible memory
   leak if resources are not freed manually afterwards.
   2. Manage resources and compression manually and use H5DOwrite_chunk to
   write the dataset. Disadvantage: more difficult to include in other
   applications.
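
To make workaround 1 concrete, here is a rough sketch of how a device
pointer could be squeezed into cd_values, which is an array of unsigned
int. The helper names pack_ptr and unpack_ptr are made up for this post,
and the sketch assumes pointers of at most 64 bits and an unsigned int of
at least 32 bits. It leaves out the actual HDF5 and CUDA calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not part of HDF5: one pointer occupies two
 * consecutive cd_values slots (low 32 bits first, then high 32 bits). */
#define PTR_SLOTS 2

/* Store ptr in cd_values[first] and cd_values[first + 1]. */
static void pack_ptr(unsigned int cd_values[], size_t first, void *ptr)
{
    uint64_t bits = (uint64_t)(uintptr_t)ptr;

    assert(cd_values != NULL);
    cd_values[first]     = (unsigned int)(bits & 0xFFFFFFFFu);
    cd_values[first + 1] = (unsigned int)(bits >> 32);
}

/* Rebuild the pointer from the two slots written by pack_ptr. */
static void *unpack_ptr(const unsigned int cd_values[], size_t first)
{
    uint64_t bits;

    assert(cd_values != NULL);
    bits = (uint64_t)cd_values[first]
         | ((uint64_t)cd_values[first + 1] << 32);
    return (void *)(uintptr_t)bits;
}
```

In the filter, pack_ptr would be called from the *set local* callback and
unpack_ptr from the filter function. Nothing in this scheme frees the
buffer, which is exactly the leak mentioned above.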

For reading (decompression) I have no workaround to keep track of the
device memory, so it has to be allocated and deallocated for every chunk
when reading the dataset.

It would be great to have two more callback functions for this kind of
task: one would be called before opening the dataset, and the other after
closing the dataset. These callback functions could then pass information
to the filter function similarly to the *set local* callback: either
through cd_values or, if that could cause some inconsistencies, then maybe
through a new variable.
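
Roughly, what I have in mind looks like the sketch below. Everything in it
is hypothetical: filter_callbacks, on_open, on_close and the ctx argument
do not exist in HDF5, and plain malloc stands in for the device
allocation. It only shows the intended call order: allocate once, run the
filter on every chunk, free once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-dataset state; a CUDA filter would keep device
 * buffers here instead of host memory. */
typedef struct {
    void  *scratch;
    size_t scratch_size;
} filter_ctx;

/* Hypothetical callback table, not an HDF5 structure. */
typedef struct {
    void  *(*on_open)(size_t chunk_bytes);                 /* once per dataset */
    size_t (*filter)(void *ctx, size_t nbytes, void *buf); /* once per chunk   */
    void   (*on_close)(void *ctx);                         /* once per dataset */
} filter_callbacks;

static int demo_alloc_calls = 0; /* counts scratch allocations */
static int demo_free_calls  = 0; /* counts scratch releases    */

static void *demo_open(size_t chunk_bytes)
{
    filter_ctx *ctx = malloc(sizeof *ctx);
    if (ctx == NULL) return NULL;
    ctx->scratch = malloc(chunk_bytes); /* stand-in for a device allocation */
    if (ctx->scratch == NULL) { free(ctx); return NULL; }
    ctx->scratch_size = chunk_bytes;
    demo_alloc_calls++;
    return ctx;
}

static size_t demo_filter(void *ctx, size_t nbytes, void *buf)
{
    filter_ctx *c = ctx;
    if (nbytes > c->scratch_size) return 0; /* 0 means failure */
    memcpy(c->scratch, buf, nbytes);        /* placeholder for real work */
    return nbytes;
}

static void demo_close(void *ctx)
{
    filter_ctx *c = ctx;
    free(c->scratch);
    free(c);
    demo_free_calls++;
}

/* Simulates what the library would do; returns the number of chunks
 * processed, or -1 if the open callback fails. */
static int process_dataset(const filter_callbacks *cb, unsigned char *data,
                           size_t nchunks, size_t chunk_bytes)
{
    void *ctx;
    int done = 0;
    size_t i;

    assert(cb && cb->on_open && cb->filter && cb->on_close);
    ctx = cb->on_open(chunk_bytes);
    if (ctx == NULL) return -1;
    for (i = 0; i < nchunks; i++) {
        if (cb->filter(ctx, chunk_bytes, data + i * chunk_bytes) == 0) break;
        done++;
    }
    cb->on_close(ctx);
    return done;
}
```

Passing the state as a separate ctx argument corresponds to the "new
variable" option; it keeps runtime pointers out of cd_values altogether.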

Having these features would really help to boost my filter's performance,
and I think other third-party filters could also benefit from them.

Cheers,
Bálint