Thanks @kittisopikulm for bringing this up. Since we are not talking about a giant piece of code, perhaps we (The HDF Group) can adopt a fork, and move forward from there. Let me talk to my colleagues!
You have a fork that is different from the current Zstandard. We do use forks of some compression libraries so that we can build from source with CMake. We could create a compression repo on GitHub for “HDF” alterations. I have always noted my changes in a README.HDF file.
Add an HDF5 filter plugin to the hdf5_plugins project. That is easily doable if the filter plugin file exists; I can just use a tar.gz file (or git repo) for the compression library ExternalProject build.
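For context, pulling a pinned release tarball into an ExternalProject build could look roughly like the sketch below. This is an illustrative config fragment, not the hdf5_plugins project's actual build code: the target name, pinned version, and options are examples (zstd's CMake files live under `build/cmake`, and `ZSTD_BUILD_SHARED`/`ZSTD_BUILD_PROGRAMS` are upstream zstd options).

```cmake
include(ExternalProject)

# Sketch: build a pinned zstd release as part of the plugin build.
# The version and URL here are examples only.
ExternalProject_Add(zstd_external
    URL            https://github.com/facebook/zstd/releases/download/v1.5.6/zstd-1.5.6.tar.gz
    SOURCE_SUBDIR  build/cmake            # zstd keeps its CMakeLists here
    CMAKE_ARGS     -DZSTD_BUILD_SHARED=OFF
                   -DZSTD_BUILD_PROGRAMS=OFF
                   -DCMAKE_INSTALL_PREFIX=<INSTALL_DIR>
)
```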
I am not forking the Zstandard library or proposing to do so. That repository lives at https://github.com/facebook/zstd. In our testing, the compression is generally applicable and tunable to achieve high speed and/or a high compression ratio. Additionally, the implementation of the Zstandard-HDF5 plugin is compatible with the implementation in Zarr/numcodecs. Thus we are very interested in the capabilities of this compression codec.
Currently, the registered Zstandard plugin for HDF5 with filter ID 32015 is assigned to Andrey Paramonov (@paramon). I am proposing to change this registration.
Six months ago I proposed a pull request to allow for negative compression levels:
There are other features of the upstream library from Meta such as dictionary compression that would be great to enable.
In summary, the original registrant of the Zstandard plugin for HDF5 appears to no longer be available with no activity for more than three years. The upstream Zstandard library from Meta has gained new capabilities since then. In order to update this plugin, I propose to change the registration.
If HDF Group would like to adopt this plugin directly or incorporate it into the main code base, that would be amazing. I would then send my pull requests there.
After some discussion and code investigation, I think I understand the problem.
I can easily incorporate this filter into our hdf5_plugins repo. I will copy the filter code, format it to fit our filter plugin conventions, add some testing code, and pull in a specific version of the Zstandard compression library for compiling and packaging.
Of course we will need to get the legal attribution correct for the original code and any other contributions.
Is https://github.com/hdfgroup/hdf5_plugins becoming the “official” repository for HDF5 compression filters?
It would be great to be clear about where the reference source code of the filters is, in order to avoid having two forks of a filter registered under the same filter ID becoming incompatible…
I’m one of the maintainers of the hdf5plugin Python package, which bundles HDF5 compression filters for use with h5py, and as such I am really interested in having a single maintained upstream source for each filter.
The goal of that repository is to allow HDF5 builds to build the filter plugins on our supported platforms. The repo has checks for which filters can be built on a given platform-compiler combination. The compression libraries are also built; the compression repo (or tar.gz file) is preferred. If need be, as a last resort, we use a copy of the compression library with in-house changes to allow building on our systems. The filters in the repo were requested by users, who usually provided assistance.
At HDF5 library release time, we try to update the filters if possible.
The primary goal of the repo is convenience; the secondary goal is portability.