Mixing use of ZLib and SZip compression

When using HDF5, is it possible to mix ZLib and SZip compression? If yes, how?

My understanding is that:
1. ZLib can compress any data (char *, int, double, ...)
2. SZip is dedicated to number compression only (float, double)

According to the HDF5 documentation, we have to:
1. use H5Pset_deflate for ZLib (deflate algorithm)
2. use H5Pset_szip for SZip

My understanding is that, if I use both H5Pset_deflate and H5Pset_szip, then:
1. data (whatever they are) which are NOT numbers will be compressed with ZLib
2. numbers will be compressed with SZip

Is this correct? Or did I get things wrong?

Thanks,

FH

> When using HDF5, is it possible to mix ZLib and SZip compression? If yes, how?

Yes. You can have a single hdf5 file with some datasets that are compressed with zlib and others compressed with szip.

A single dataset compressed with both zlib and szip? I imagine that *might* be possible. I've never tried it, and I'm not sure why you'd want to do it. But I can't think of a reason HDF5 might balk at it, unless they've added logic to explicitly forbid it. For the reasons you mention below, I don't think szip *after* zlib would "work" at all. But zlib *after* szip might.

> My understanding is that:
> 1. ZLib can compress any data (char *, int, double, ...)

Yes, it's a byte-level compressor. It doesn't care whether those bytes comprise an array of floats, doubles, ints, chars, etc.
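To make the "byte-level" point concrete, here is a minimal sketch using Python's standard `zlib` module rather than HDF5 itself. The sample arrays are made up for illustration; the only point is that the compressor sees raw bytes and returns them unchanged on decompression, whatever type they encoded.

```python
import array
import zlib

# Three unrelated kinds of data, all reduced to raw bytes before compression.
text = ("HDF5 " * 200).encode("ascii")
ints = array.array("i", range(1000)).tobytes()
doubles = array.array("d", [0.5 * i for i in range(1000)]).tobytes()

for name, raw in [("text", text), ("ints", ints), ("doubles", doubles)]:
    packed = zlib.compress(raw, 6)
    # Decompression returns exactly the original bytes, whatever they encoded.
    assert zlib.decompress(packed) == raw
    print(f"{name}: {len(raw)} -> {len(packed)} bytes")
```

How well each buffer shrinks depends on the data, which is why the sizes are printed rather than stated here.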

> 2. SZip is dedicated to number compression only (float, double)

I honestly can't recall but that sounds plausible/right.

> According to the HDF5 documentation, we have to:
> 1. use H5Pset_deflate for ZLib (deflate algorithm)
> 2. use H5Pset_szip for SZip

Yes. Though, take care to read the licensing limitations regarding szip and confirm your workflows involving it meet its requirements.

> My understanding is that, if I use both H5Pset_deflate and H5Pset_szip, then:
> 1. data (whatever they are) which are NOT numbers will be compressed with ZLib
> 2. numbers will be compressed with SZip

I don't think it works that way. If you apply *both* filters to a dataset, HDF5 will apply each filter in order. Though, since zlib and szip are sort of built-in compressors, maybe the HDF5 library has some logic to handle them specially? If not, *and* if you want the behavior you describe here, it's easy to implement your own merged zlib/szip filter that does something like…

  1. Check data type. If type is double or float, apply szip, else apply zlib.
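The rule in step 1 can be sketched as a small dispatch function. This is only an illustration of the decision, not an HDF5 filter: `pick_filter`, `FLOAT_TYPECODES`, and the sample arrays are hypothetical names, and the returned strings are labels for whichever filter you would then request when creating that dataset.

```python
import array

# Hypothetical rule taken from the suggestion above: floating-point data
# goes to szip, everything else goes to zlib. The strings are labels only.
FLOAT_TYPECODES = {"f", "d"}  # array-module codes for float and double


def pick_filter(values: array.array) -> str:
    """Return the name of the filter to request for this array."""
    return "szip" if values.typecode in FLOAT_TYPECODES else "zlib"


samples = {
    "temperature": array.array("d", [1.5, 2.5]),
    "counts": array.array("i", [1, 2, 3]),
    "label": array.array("B", b"abc"),
}
plan = {name: pick_filter(values) for name, values in samples.items()}
print(plan)
```

Since filters are configured per dataset, a decision like this can be made once for each dataset instead of inside a custom merged filter.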

Hope that helps. I am 99% certain what I've just written is accurate :wink:

Mark

···

From: Hdf-forum <hdf-forum-bounces@lists.hdfgroup.org> on behalf of houssen <houssen@ipgp.fr>
Reply-To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Date: Monday, January 11, 2016 2:00 AM
To: "hdf-forum@lists.hdfgroup.org" <hdf-forum@lists.hdfgroup.org>
Subject: [Hdf-forum] Mixing use of ZLib and SZip compression

Check data type: if type is double or float, apply szip, else apply zlib. Sounds perfect to me!

Thanks,

Franck

Note: I don't know why, but I thought H5Pset_deflate / H5Pset_szip were supposed to be applied to the whole file (I got this wrong).


Hi all,

Resurrecting this ancient topic because another person asked about this and specifically mentioned this post :slight_smile:

Yes, you can combine szip and zlib. HDF5 doesn’t really have compression per se; instead it has filter pipelines and szip/libaec and zlib are just filters that transform input data to output data, which will be passed to the next filter in the chain. While it may not make semantic sense to chain certain filters together, the library will not stop you.
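As a rough mental model of a filter chain, here is a sketch in plain Python. It is not how the HDF5 library is implemented: `write_chunk`, `read_chunk`, and the byte-wise delta filter are hypothetical stand-ins (szip is not in the standard library), chosen only to show that filters run in order on write, in reverse order on read, and that nothing prevents a chain that makes little semantic sense.

```python
import array
import zlib


def delta_encode(raw: bytes) -> bytes:
    """Stand-in 'numeric' filter: store byte-wise differences (size unchanged)."""
    prev = 0
    out = bytearray()
    for b in raw:
        out.append((b - prev) % 256)
        prev = b
    return bytes(out)


def delta_decode(raw: bytes) -> bytes:
    """Inverse of delta_encode: rebuild bytes from running sums."""
    prev = 0
    out = bytearray()
    for d in raw:
        prev = (prev + d) % 256
        out.append(prev)
    return bytes(out)


# Each filter is an (encode, decode) pair of bytes -> bytes functions.
DELTA = (delta_encode, delta_decode)
ZLIB = (zlib.compress, zlib.decompress)


def write_chunk(raw: bytes, pipeline) -> bytes:
    """Apply every filter's encoder in the order the filters were added."""
    for encode, _ in pipeline:
        raw = encode(raw)
    return raw


def read_chunk(stored: bytes, pipeline) -> bytes:
    """Undo the filters in reverse order."""
    for _, decode in reversed(pipeline):
        stored = decode(stored)
    return stored


chunk = array.array("i", range(2048)).tobytes()
for label, pipeline in [("delta -> zlib", [DELTA, ZLIB]),
                        ("zlib -> delta", [ZLIB, DELTA])]:
    stored = write_chunk(chunk, pipeline)
    assert read_chunk(stored, pipeline) == chunk  # both orders round-trip
    print(f"{label}: {len(chunk)} -> {len(stored)} bytes")
```

Both orders are accepted and both round-trip. In this toy model the second order cannot beat zlib alone, because the delta step never changes the size and only sees already-compressed bytes.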

In szip’s case, there are checks that the dataset’s datatype meets certain criteria (see the H5Pset_szip() docs for a discussion), but there are no checks to see if the data may have been munged into something not recognizable as numerical values by an earlier filter.

As for recommendations about when to use szip vs zlib (or even both), you are going to be best served by actually compressing real-world data to see the effects on file size and I/O speed. Just out of curiosity, I tried compressing a small amount of synthetic data with zlib, szip, szip → zlib, and zlib → szip. Here’s what I got (YMMV):

68K      none_float.h5
68K      none_int.h5

22K      zlib_float.h5
27K      zlib_int.h5

18K      szip_float.h5
11K      szip_int.h5

6.2K     szip_zlib_float.h5
5.8K     szip_zlib_int.h5

22K      zlib_szip_float.h5
27K      zlib_szip_int.h5

Using szip followed by zlib was the clear winner on my contrived data. Keep in mind that this is synthetic data and I was just screwing around (though I did at least verify that the data were present and accurate, so the szip → zlib case isn’t a failure to write data).
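For anyone who wants to run a similar comparison on their own data before touching HDF5, here is a minimal measuring harness in plain Python. It is a sketch under loud assumptions: `measure` and `candidates` are hypothetical names, `bz2` stands in for a second codec because szip is not in the standard library, and `sample` is a placeholder to be replaced with bytes from a real dataset. It does not reproduce the numbers above.

```python
import bz2
import time
import zlib


def measure(raw: bytes, candidates: dict) -> dict:
    """Return {name: (compressed_size, seconds)} for each candidate encoder."""
    results = {}
    for name, encode in candidates.items():
        start = time.perf_counter()
        size = len(encode(raw))
        results[name] = (size, time.perf_counter() - start)
    return results


candidates = {
    "none": lambda raw: raw,
    "zlib": lambda raw: zlib.compress(raw, 6),
    "bz2": lambda raw: bz2.compress(raw),  # stand-in for a second codec
    "bz2 -> zlib": lambda raw: zlib.compress(bz2.compress(raw), 6),
    "zlib -> bz2": lambda raw: bz2.compress(zlib.compress(raw, 6)),
}

# Placeholder input: replace with bytes read from a real dataset.
sample = bytes(i % 251 for i in range(65536))

for name, (size, seconds) in measure(sample, candidates).items():
    print(f"{name:12s} {size:8d} bytes  {seconds * 1000:.2f} ms")
```

Timing a single call like this is noisy, so treat the milliseconds as a rough guide and repeat the run on representative data before drawing conclusions.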