Zstandard compression plug-in


#1

Hello HDF5 group!

Zstandard is a real-time compression algorithm providing high compression ratios. It offers a very wide range of compression/speed trade-offs and is backed by a very fast decoder. The Zstandard library is provided as open-source software under a BSD license.
www.zstd.net

Attached you can find an implementation of a Zstd HDF5 filter plug-in. My tests confirm the good properties of Zstd compression, even on small chunks.

I'd like the filter binary format to be registered in
https://support.hdfgroup.org/services/contributions.html#filters

Is anything else needed to get the filter ID registered? I think the filter code is trivial, but if an explicit license is needed, please let me know.
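For reference, a dynamically loaded filter only needs the two standard HDF5 plug-in entry points; a sketch of the registration boilerplate (the filter ID below is a placeholder until The HDF Group assigns one, and zstd_filter stands in for the filter function in the attached source):

```c
#include <H5PLextern.h>

#define ZSTD_H5_FILTER 32000  /* placeholder: use the ID assigned by The HDF Group */

/* The actual compress/decompress callback (implemented in zstd_h5plugin.c). */
static size_t zstd_filter(unsigned flags, size_t cd_nelmts,
                          const unsigned cd_values[], size_t nbytes,
                          size_t *buf_size, void **buf);

static const H5Z_class2_t zstd_H5Filter = {
    H5Z_CLASS_T_VERS,                /* H5Z_class_t version */
    (H5Z_filter_t)ZSTD_H5_FILTER,    /* filter ID */
    1, 1,                            /* encoder and decoder present */
    "Zstandard compression filter",  /* filter name for debugging */
    NULL, NULL,                      /* no can_apply / set_local callbacks */
    (H5Z_func_t)zstd_filter,         /* the filter function itself */
};

/* Entry points looked up by the HDF5 dynamic plugin loader. */
H5PL_type_t H5PLget_plugin_type(void) { return H5PL_TYPE_FILTER; }
const void *H5PLget_plugin_info(void) { return &zstd_H5Filter; }
```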

Best wishes,
Andrey Paramonov

zstd_h5plugin.c (1.66 KB)

zstd_h5plugin.h (456 Bytes)




#2

Hi Andrey,

the zstd plugin has an issue: it does not check the validity of return values. If ZSTD_compress fails, its error code is interpreted as a size and returned to HDF5, which treats the (effectively negative) number as an insanely large chunk size, leading to weird error messages. Here is a corrected version:

        compSize = ZSTD_compress(outbuf, compSize, inbuf, origSize, aggression);

        if (ZSTD_isError(compSize))
        {
                printf("ZSTD-Plugin: (compress %zu bytes) ZSTD ERROR %s!\n",
                       origSize, ZSTD_getErrorName(compSize));
                fflush(stdout);
                if (outbuf)
                        free(outbuf);

                return 0;
        }

Actually, I ran into problems with the zstd library's FSE tablelog, which seems not to have enough memory. Compiling the zstd library with settings such as

-DFSE_MAX_MEMORY_USAGE=18 -DFSE_TABLELOG_ABSOLUTE_MAX=16

cures that problem. Any ideas what the issue may be? It seems weird that the default settings of the zstd library are unable to compress certain datasets.

         Werner