Zstandard plugin for HDF5 does not allow negative compression levels

Zstandard has allowed negative compression levels since 2018:

@paramon , are you able to accept this pull request that I submitted in March 2022?
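For context on what the change involves: HDF5 passes filter parameters as an array of `unsigned int` (`cd_values`), so a negative level has to survive a round trip through an unsigned slot. Below is a minimal C sketch of one way to do that. It is not the code in the pull request. It assumes the level sits in `cd_values[0]`. The helper names, the fallback default of 3, and the bounds `HYPOTHETICAL_MIN_LEVEL`/`HYPOTHETICAL_MAX_LEVEL` are illustrative assumptions; a real plugin should ask the linked library through `ZSTD_minCLevel()` and `ZSTD_maxCLevel()`.

```c
#include <limits.h>
#include <stddef.h>

/* Illustrative bounds only (assumption): a real plugin should query the
 * linked library via ZSTD_minCLevel() and ZSTD_maxCLevel(). */
#define HYPOTHETICAL_MIN_LEVEL (-131072)
#define HYPOTHETICAL_MAX_LEVEL 22
/* Same value as zstd's ZSTD_CLEVEL_DEFAULT. */
#define HYPOTHETICAL_DEFAULT_LEVEL 3

/* Store a signed level in an unsigned cd_values slot; int -> unsigned
 * conversion is well defined (it wraps modulo UINT_MAX + 1). */
unsigned int level_to_cd_value(int level)
{
    return (unsigned int)level;
}

/* Recover the signed level without relying on the implementation-defined
 * conversion of an out-of-range unsigned value to int. */
int cd_value_to_level(unsigned int u)
{
    if (u <= (unsigned int)INT_MAX)
        return (int)u;
    /* Here u == UINT_MAX + 1 + level, with level < 0. */
    return -(int)(UINT_MAX - u) - 1;
}

/* Keep the requested level inside the supported range. */
int clamp_level(int level)
{
    if (level < HYPOTHETICAL_MIN_LEVEL)
        return HYPOTHETICAL_MIN_LEVEL;
    if (level > HYPOTHETICAL_MAX_LEVEL)
        return HYPOTHETICAL_MAX_LEVEL;
    return level;
}

/* Read the level from the filter's client data, falling back to the default. */
int level_from_cd_values(size_t cd_nelmts, const unsigned int cd_values[])
{
    if (cd_nelmts == 0 || cd_values == NULL)
        return HYPOTHETICAL_DEFAULT_LEVEL;
    return clamp_level(cd_value_to_level(cd_values[0]));
}
```

Since a Zstandard frame carries what the decoder needs, the level only matters when writing. That write path is where two implementations under the same filter ID could disagree about what a given `cd_values[0]` means.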

If @paramon is not able to respond, how should we proceed with updating the Zstandard HDF5 plugin? I notice that he has not had any GitHub activity since May 2019.

Thanks @kittisopikulm for bringing this up. Since we are not talking about a giant piece of code, perhaps we (The HDF Group) can adopt a fork, and move forward from there. Let me talk to my colleagues!

G.


That would work.

Perhaps it can go into github.com/hdfgroup/hdf5_plugins ?

Alternatively, I’ve started to prepare a fork here:

Also, welcome back!

Two issues here, if I understand correctly:

  1. You have a fork that is different from the current Zstandard. We do use forks of some compression libraries so that we can build them from source with CMake. We could create a compression repo in GitHub for “HDF” alterations. I have always noted my changes in a README.HDF file.

  2. Add an HDF5 filter plugin to the hdf5_plugins project. That is easily doable if the filter plugin file exists and I can use a tar.gz file (or git repo) for the compression library's ExternalProject build.

I am not forking the Zstandard library or proposing to do so. That repository lives at https://github.com/facebook/zstd. In our testing, the compression is generally applicable and can be tuned for high speed and/or a high compression ratio. Additionally, the implementation of the Zstandard-HDF5 plugin is compatible with the implementation in Zarr/numcodecs. Thus we are very interested in the capabilities of this compression codec.

Currently, the registered Zstandard plugin for HDF5 with filter ID 32015 is assigned to Andrey Paramonov (@paramon). I am proposing to change this registration.

Unfortunately, we have not heard from Andrey Paramonov in three years, and he has had no GitHub activity in that time.

I am trying to update the HDF5 plugin for Zstandard to take advantage of recent capabilities of the Zstandard C library from Meta (formerly Facebook).

My proposed fork is currently at the same commit as the registered upstream repository.

Six months ago I proposed a pull request to allow for negative compression levels:

There are other features of the upstream library from Meta such as dictionary compression that would be great to enable.

In summary, the original registrant of the Zstandard plugin for HDF5 appears to be unavailable, with no activity for more than three years. The upstream Zstandard library from Meta has gained new capabilities since then. In order to update this plugin, I propose to change the registration.

If The HDF Group would like to adopt this plugin directly or incorporate it into the main code base, that would be amazing. I would then send my pull requests there.

The Zstandard filter ID is 32015. Other than that, I understand you want to change the registered repo to the new location.

@gheber, What is the procedure to get this done?

After some discussion and code investigation, I think I understand the problem.

I can easily incorporate this filter into our hdf5_plugins repo. I will copy the filter code, format it to fit our filter plugin conventions, add some testing code, and pull in a specific version of the Zstandard compression library for compiling and packaging.

Of course we will need to get the legal attribution correct for the original code and any other contributions.

You can then make PRs against that repo.


I have the initial implementation completed. Once we get the legal stuff updated, I will create a PR, and then we can go forward fixing, changing, and updating the code.


Hi. Any updates here?

Hi,

Is https://github.com/hdfgroup/hdf5_plugins becoming the “official” repository for HDF5 compression filters?
It would be great to be clear about where the reference source code of each filter lives, in order to avoid two forks of a filter registered under the same filter ID becoming incompatible.

I’m one of the maintainers of the hdf5plugin Python package, which bundles HDF5 compression filters for use with h5py, and as such I am really interested in having a single maintained upstream source for each filter.

Best

The goal of that repository is to allow builds of HDF5 to build filter plugins on our supported platforms. The repo has checks for which filters can be built on a given platform-compiler combination. The compression libraries are also built; the compression library's own repo (or a tar.gz file) is preferred. If need be, as a last resort, we use a copy of the compression library with in-house changes that allow it to build on our systems. The filters in the repo were requested by users, who usually provided assistance.

At HDF5 library release time, we try to update the filters if possible.

The primary goal of the repo is convenience; the secondary goal is portability.

I have the zstd plugin mostly ready to go, waiting on a PR to be approved in hdf5. The remaining issues are some cleanup in the filter repo to remove pre-CMake-3.18 workarounds.

I should mention that some filter/compression developers refer to our repo for the filter reference code.

Then it’s maybe good to update: https://portal.hdfgroup.org/display/support/Filters
to advertise the maintained reference code.


I agree with thomas.vincent. The main reason we would refer to a 3rd party repository for a plugin is because the registered plugin page points us there.

Of course, we need to remember that there are two parts:

  1. the compression library that is registered and pointed to by the Filters page.
  2. the filter function, which we maintain for the filters listed in the plugins repo (we could add a note to the Filters page); otherwise, the compression maintainer is responsible.
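To make the two parts concrete, here is a minimal C sketch of the filter-function side, shaped after HDF5's `H5Z_func_t` callback (`flags`, `cd_nelmts`, `cd_values`, `nbytes`, `buf_size`, `buf`). The `hypothetical_encode`/`hypothetical_decode` functions are identity stand-ins for calls into the compression library, and `SKETCH_FLAG_REVERSE` mirrors `H5Z_FLAG_REVERSE` from `H5Zpublic.h` so the sketch compiles without HDF5 headers. None of this is the code in hdf5_plugins or in the registered plugin.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors H5Z_FLAG_REVERSE (H5Zpublic.h) so this compiles without HDF5. */
#define SKETCH_FLAG_REVERSE 0x0100u

/* Hypothetical stand-ins for the compression library. Both just copy;
 * a real zstd filter would call ZSTD_compress() / ZSTD_decompress(). */
static size_t hypothetical_encode(const void *src, size_t n, void *dst, size_t cap)
{
    if (n > cap)
        return 0;
    memcpy(dst, src, n);
    return n;
}

static size_t hypothetical_decode(const void *src, size_t n, void *dst, size_t cap)
{
    return hypothetical_encode(src, n, dst, cap);
}

/* Filter callback with the same parameter list as H5Z_func_t. It returns
 * the number of valid bytes in *buf, or 0 on failure (input left intact). */
size_t sketch_filter(unsigned int flags, size_t cd_nelmts,
                     const unsigned int cd_values[], size_t nbytes,
                     size_t *buf_size, void **buf)
{
    (void)cd_nelmts; /* a real filter would read its parameters, e.g. the level */
    (void)cd_values;

    /* Output capacity: a real zstd filter would use ZSTD_compressBound()
     * when compressing and the frame's content size when decompressing. */
    size_t cap = nbytes;
    void *out = malloc(cap);   /* real plugins may use H5allocate_memory() */
    if (out == NULL)
        return 0;

    size_t n = (flags & SKETCH_FLAG_REVERSE)
                   ? hypothetical_decode(*buf, nbytes, out, cap)
                   : hypothetical_encode(*buf, nbytes, out, cap);
    if (n == 0) {
        free(out);
        return 0;
    }

    free(*buf);                /* ... and H5free_memory() */
    *buf = out;
    *buf_size = cap;
    return n;
}
```

This split mirrors the two parts in the list: the callback and its buffer handling belong to the filter, while the encode/decode routines come from the compression library.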

I’m not sure if I follow.

  1. The compression library in question here is https://github.com/facebook/zstd
  2. The filter function is defined in https://github.com/aparamon/HDF5Plugin-Zstandard/blob/d5afdb5f04116d5c2d1a869dc9c7c0c72832b143/zstd_h5plugin.c#L7 according to the registered plugins page

I think we’re about to replace #2 with H5Zzstd.c in the HDF Group Github organization. Should the registered plugins page point to the Facebook repository?

Ah good points!
Yes, I think we need two links - one for the compression library and one for the filter.
Others will depend on individual cases.

Hi,

Yes, the compression library is not enough since the HDF5 filter code is: