Registering Custom Filter Issues


#1

I’ve written a custom compression filter and have successfully used it with the HDF5 C API. However, I can’t get it to work properly in Python with h5py and am looking for help.

I’m running into issues properly registering the filter with h5py. I can manually read and write data to chunks using dataset.id.write_direct_chunk and dataset.id.read_direct_chunk, and I have confirmed with my C code that the data is being read and written correctly. Because my filter isn’t properly registered, though, I can’t create an HDF5 file in Python, which I would really like to do. Also, when opening a file written from my C code, the dataset itself doesn’t know that it was compressed and returns None for both dataset.compression and dataset.compression_opts. I can, however, find that it was compressed by diving into the low-level API as shown below, and I can decompress the raw data by passing the output of read_direct_chunk into my own function. To be clear, this is the only filter in the pipeline, and the code flags it as not being available.
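For readers following along, here is a minimal sketch of the direct-chunk approach described above. It is not the original poster's code: gzip stands in for the custom filter so the example runs without any plugin, and zlib.decompress plays the role of the poster's own decompression function. The file and dataset names are placeholders, and read_direct_chunk is assumed to be available (h5py built against HDF5 1.10.2 or later).

```python
import os
import tempfile
import zlib

import h5py
import numpy as np

data = np.arange(100, dtype="<i4")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "direct_chunk_demo.h5")  # placeholder file name

    # Write a chunked dataset. gzip is only a stand-in for the custom filter.
    with h5py.File(path, "w") as f:
        f.create_dataset("x", data=data, chunks=(10,), compression="gzip")

    with h5py.File(path, "r") as f:
        ds = f["x"]
        # Fetch the first chunk exactly as stored on disk, bypassing the
        # HDF5 filter pipeline. The argument is the chunk's logical offset.
        filter_mask, raw = ds.id.read_direct_chunk((0,))
        # Decode outside HDF5; a custom filter would call its own decoder here.
        first_chunk = np.frombuffer(zlib.decompress(raw), dtype="<i4")

print(filter_mask, first_chunk)
```

A filter_mask of 0 means every filter in the pipeline was applied to that chunk when it was written.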

I’ve tried adding the directory containing the .so file that defines my filter function to the plugin search path using h5pl.append(), but that doesn’t seem to solve the problem. I’ve also tried imitating how Bitshuffle registers its filter, using code similar to the register_h5_filter() function shown in https://github.com/kiyo-masui/bitshuffle/blob/master/bitshuffle/h5.pyx (modified for my particular filter, of course), but that doesn’t solve the problem either.
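As a rough illustration of the h5pl route, the sketch below appends a directory to the HDF5 plugin search path and reads the table back to confirm the entry was added. The directory is hypothetical, and the h5pl module is assumed to be present (h5py built against HDF5 1.10.1 or later). Note that the path is passed as bytes and should point to the directory holding the shared library, not to the .so file itself.

```python
import h5py

# Hypothetical directory containing the compiled filter plugin (.so / .dll).
plugin_dir = b"/opt/my_filter/lib"

# Add the directory to the end of the HDF5 plugin search path table.
h5py.h5pl.append(plugin_dir)

# Read the table back to confirm what HDF5 will search.
search_paths = [h5py.h5pl.get(i) for i in range(h5py.h5pl.size())]
print(search_paths)
```

Being on the search path is necessary but not sufficient: for HDF5 to load the library as a dynamic plugin, it also has to export the H5PLget_plugin_type and H5PLget_plugin_info entry points.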

Here is some example code that shows the error I’m seeing.

Any help or links to existing guides would be great.


#2

What’s the output from h5py.h5z.filter_avail(your_filter_id) and h5py.h5z.get_filter_info(your_filter_id)?
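A small sketch of how those two calls could be combined into one report. The helper name describe_filter is made up for illustration, and the built-in deflate filter is used only so the example runs without a plugin; a custom filter would pass its own numeric ID instead.

```python
from h5py import h5z


def describe_filter(filter_id):
    """Report whether HDF5 can see a filter and what it can do with it."""
    available = bool(h5z.filter_avail(filter_id))
    can_encode = can_decode = False
    if available:
        # get_filter_info returns a bitmask of capability flags.
        info = h5z.get_filter_info(filter_id)
        can_encode = bool(info & h5z.FILTER_CONFIG_ENCODE_ENABLED)
        can_decode = bool(info & h5z.FILTER_CONFIG_DECODE_ENABLED)
    return {"available": available, "encode": can_encode, "decode": can_decode}


# deflate (gzip) is used here only as a filter that is normally present.
report = describe_filter(h5z.FILTER_DEFLATE)
print(report)
```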


#3

It appears dataset.compression and dataset.compression_opts are only defined for known compression plugins (so None will always be returned for your plugin). I’m undecided whether that’s a bug or not, but what you could do is look at the contents of dataset._filters and post that as well.
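A quick sketch of the comparison suggested here, using gzip on an in-memory file so it runs without any plugin. Keep in mind that _filters is a private attribute and could change between h5py releases. In the h5py versions I have looked at, a filter that h5py does not recognise shows up in this dictionary under its numeric ID as a string, but treat that as an assumption to check against your installed version.

```python
import h5py
import numpy as np

# In-memory file; nothing is written to disk. The file name is a placeholder.
with h5py.File("filters_demo.h5", "w", driver="core", backing_store=False) as f:
    ds = f.create_dataset(
        "x", data=np.arange(100), chunks=(10,),
        compression="gzip", compression_opts=4,
    )
    # High-level view: only populated for the filters h5py knows by name.
    high_level = (ds.compression, ds.compression_opts)
    # Private attribute: the whole pipeline, including unrecognised filters.
    pipeline = dict(ds._filters)

print(high_level, pipeline)
```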

You could also have a look at https://github.com/silx-kit/hdf5plugin, which collects together a number of filters and makes them available via h5py.


#4

Depending on the platform, your filter might need to be compiled against the same HDF5 library that h5py uses, and if you installed h5py from a Python wheel, the HDF5 library is embedded in the wheel. That’s what the hdf5plugin package takes care of (and also bitshuffle now - not yet released): it adds a layer that dynamically loads the HDF5 library used by h5py so that the plugins use it too.
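To see which HDF5 library a given h5py installation is using, something like the following sketch can help when deciding what to compile the filter against. It only reports version and location; it does not check binary compatibility.

```python
import os

import h5py

# Version of the HDF5 library that h5py is running against.
hdf5_version = h5py.version.hdf5_version
hdf5_version_tuple = h5py.version.hdf5_version_tuple

# Where the h5py package lives; wheel installs bundle HDF5 near this location.
package_dir = os.path.dirname(h5py.__file__)

print("HDF5 version:", hdf5_version)
print("h5py package directory:", package_dir)
```

The same details, plus build information, are available in one block of text via h5py.version.info.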

As for the h5py.Dataset.compression and compression_opts properties, they look to only support “gzip”, “lzf” and “szip” filters. So you have to go through the low level API. Maybe that’s too restrictive?
See: https://github.com/h5py/h5py/blob/c62b34c1841afc05c4c7252317bba40f05f65e26/h5py/_hl/dataset.py#L500
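Going through the low-level API could look roughly like the sketch below, which walks the dataset creation property list and reports every filter in the pipeline, named or not. The helper name list_filters is made up for illustration, and gzip on an in-memory file is used only so the example runs without a plugin.

```python
import h5py
import numpy as np


def list_filters(dataset):
    """Return one dict per filter in the dataset's pipeline."""
    dcpl = dataset.id.get_create_plist()
    entries = []
    for i in range(dcpl.get_nfilters()):
        # get_filter returns (filter ID, flags, cd_values tuple, name).
        code, flags, values, name = dcpl.get_filter(i)
        entries.append({
            "id": code,
            "flags": flags,
            "cd_values": values,
            "name": name,
            # Whether this HDF5 build can actually apply the filter.
            "available": bool(h5py.h5z.filter_avail(code)),
        })
    return entries


# In-memory file; the file name is a placeholder.
with h5py.File("plist_demo.h5", "w", driver="core", backing_store=False) as f:
    ds = f.create_dataset(
        "x", data=np.arange(100), chunks=(10,),
        compression="gzip", compression_opts=4,
    )
    filters = list_filters(ds)

print(filters)
```

Because this reads the numeric ID straight from the property list, it works the same way for a custom filter that dataset.compression reports as None.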


#5

I was able to basically copy how bitshuffle works with HDF5 and solve a lot of my issues. Initially, h5py.h5z.filter_avail and h5py.h5z.get_filter_info weren’t working (although I was able to get their analogous functions working in C), but both now work after copying the bitshuffle installation procedure. It does seem like h5py.Dataset.compression and compression_opts won’t work, but that’s honestly okay since I can tell users of this filter to check the low-level API. So long as it’s programmatically available, it’s fine with me.

Currently, having copied the bitshuffle installation procedure, I have a working filter system for both C and Python (through h5py) on Ubuntu 20.04 and Windows Subsystem for Linux (also Ubuntu, but I’m not sure which version). I’m waiting on a reply from a Mac OSX user to see if they’ve had success compiling the binaries with the “python setup.py install” procedure as laid out in the bitshuffle GitHub repository. I haven’t been able to compile on Windows yet.

I haven’t had success with conda-based installation, in particular on Windows with Anaconda Prompt. The issues I’m running into so far seem to involve environment variables not being passed correctly to the setup scripts. I’ll try to emulate what hdf5plugin does, because its conda installation process was seamless for me on both Linux and Windows (with Anaconda Prompt). I’ll reply here if I run into more issues.

Thanks for the help so far!