Zstandard has allowed negative compression levels since 2018:
@paramon , are you able to accept this pull request that I submitted in March 2022?
If @paramon is not able to respond, how should we proceed with updating the Zstandard HDF5 plugin? I notice that he has not had any GitHub activity since May 2019.
Thanks @kittisopikulm for bringing this up. Since we are not talking about a giant piece of code, perhaps we (The HDF Group) can adopt a fork, and move forward from there. Let me talk to my colleagues!
You have a fork that is different than the current Zstandard. We do use forks of some compression libs so that we can build from source with CMake. We could create a compression repo on GitHub for “HDF” alterations. I have always noted my changes in a README.HDF file.
Add an HDF5 filter plugin to the hdf5_plugins project. That is easily doable if the filter plugin file exists and I can just use a tar.gz file (or git repo) for the compression library ExternalProject build.
I am not forking the Zstandard library or proposing to do so. That repository lives at https://github.com/facebook/zstd. In our testing, the compression is generally applicable and tunable to achieve high speeds and/or high compression ratios. Additionally, the implementation of the Zstandard-HDF5 plugin is compatible with the implementation in Zarr/numcodecs. Thus we are very interested in the capabilities of this compression codec.
Currently, the registered Zstandard plugin for HDF5 with filter ID 32015 is assigned to Andrey Paramonov (@paramon). I am proposing to change this registration.
Six months ago I proposed a pull request to allow for negative compression levels:
There are other features of the upstream library from Meta such as dictionary compression that would be great to enable.
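On the negative compression levels mentioned above, one detail is worth spelling out. HDF5 passes filter parameters to a plugin as an array of unsigned integers (`cd_values`), so a negative level cannot be stored directly. Below is a minimal Python sketch of one way to carry a signed level through an unsigned slot, assuming a 32-bit two's-complement reinterpretation. The helper names are hypothetical, and this thread does not say how the pull request actually encodes the level, so treat it as an illustration only.

```python
import struct

def level_to_cd_value(level: int) -> int:
    """Reinterpret a signed 32-bit level as an unsigned 32-bit value.

    Assumption: the plugin would read cd_values[0] back as a signed int.
    """
    return struct.unpack("<I", struct.pack("<i", level))[0]

def cd_value_to_level(cd_value: int) -> int:
    """Inverse of level_to_cd_value: recover the signed level."""
    return struct.unpack("<i", struct.pack("<I", cd_value))[0]

# Round-trip a few positive and negative levels.
for level in (3, -1, -5):
    encoded = level_to_cd_value(level)
    print(level, encoded, cd_value_to_level(encoded))
```

Positive levels pass through unchanged, so under this assumption existing files and callers that use levels such as 3 would not be affected.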
In summary, the original registrant of the Zstandard plugin for HDF5 appears to no longer be available, with no activity for more than three years. The upstream Zstandard library from Meta has gained new capabilities since then. In order to update this plugin, I propose to change the registration.
If HDF Group would like to adopt this plugin directly or incorporate it into the main code base, that would be amazing. I would then send my pull requests there.
After some discussion and code investigation, I think I understand the problem.
I can easily incorporate this filter into our hdf5_plugins repo. I will copy the filter code and format it to fit our filter plugins convention, add some testing code, and pull in a specific version of the Zstandard compression library for compiling and packaging.
Of course we will need to get the legal attribution correct for the original code and any other contributions.
I have the initial implementation completed. Once we get the legal stuff updated, I will create a PR. Then we go forward fixing/changing/updating the code.
Is https://github.com/hdfgroup/hdf5_plugins becoming the “official” repository for HDF5 compression filters?
It would be great to be clear about where the reference source code of the filters is, in order to avoid having 2 forks of a filter registered under the same filter ID becoming incompatible…
I’m one of the maintainers of the hdf5plugin Python package, which bundles HDF5 compression filters for use with h5py, and as such I am really interested in having a single maintained upstream source for each filter.
The goal of that repository is to allow builds of hdf5 to build filter plugins on our supported platforms. The repo has checks for which filters can be built on a certain platform-compiler combination. The compression libraries are also built; the compression library's own repo (or tar.gz file) is the preferred source. If need be, we use a copy of the compression library with in-house changes to allow building on our systems (last resort). The filters in the repo were requested by users, who usually provided assistance.
At HDF5 library release time, we try to update the filters if possible.
Primary goal of repo is convenience, secondary is portability.
I have the zstd plugin mostly ready to go, waiting on a PR to be approved in hdf5. The remaining issues are some cleanup in the filter repo to remove pre-CMake-3.18 workarounds.
I agree with thomas.vincent. The main reason we would refer to a 3rd party repository for a plugin is because the registered plugin page points us there.
Of course we need to remember that there are two parts:
1. The compression library that is registered and pointed to by the Filters page.
2. The filter function, which we maintain for those listed in the plugins repo (we could add a note to the Filters page). Otherwise the compression maintainer is responsible.
I think we’re about to replace #2 with H5Zzstd.c in the HDF Group Github organization. Should the registered plugins page point to the Facebook repository?
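To make the split between the two parts concrete, here is a minimal Python sketch of a filter function as a thin wrapper around a compression library. It uses `zlib` from the Python standard library purely as a stand-in for the codec, and `example_filter` is a hypothetical name; the real plugin is C code built against the Zstandard library. The only detail taken from HDF5 itself is the `H5Z_FLAG_REVERSE` flag (0x0100), which signals that a chunk is being read back and should be decompressed.

```python
import zlib

# HDF5 sets this flag when the filter runs in the read (decompress) direction.
H5Z_FLAG_REVERSE = 0x0100

def example_filter(flags: int, cd_values: tuple, buf: bytes) -> bytes:
    """Hypothetical filter function: a thin wrapper over the codec.

    Part 1 (the compression library) is zlib here, as a stand-in.
    Part 2 (the filter function) is this wrapper.
    """
    if flags & H5Z_FLAG_REVERSE:
        return zlib.decompress(buf)
    # The first filter parameter is treated as the compression level.
    level = cd_values[0] if cd_values else 6
    return zlib.compress(buf, level)

chunk = b"HDF5 chunk data " * 64
stored = example_filter(0, (9,), chunk)
restored = example_filter(H5Z_FLAG_REVERSE, (), stored)
print(len(chunk), len(stored), restored == chunk)
```

The wrapper is small and rarely changes, while the codec behind it evolves upstream, which is why the two parts can reasonably live in different repositories.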