H5repack and GZIP - Version 1.14.6 vs 2.1

Hey all, I found some strange behavior of h5repack that I don't understand in detail. We use OpenFOAM for CFD simulations, and recently the vtkhdf writer was funded. For that we compiled OpenFOAM with HDF5 version 1.14.6 (I am not sure why I used that version). The written data are not compressed. Using h5repack in its simplest form, like:

h5repack -v -f GZIP=1 <input>.vtkhdf <output>.vtkhdf

we achieved a size reduction of around 50%, from 6 GB to 3 GB.
For certain reasons we also added the HDF5 library and tools to the module environment on our system (now using version 2.1.0). Running the same command there results in no compression, and the file size even increases slightly (plausibly due to some overhead).

I checked the output file from version 1.14.6 with h5dump to get an idea of the chunking, compression, etc. Based on that, I reformulated the h5repack command as follows:

h5repack -v -l "/VTKHDF/CellData/U:CHUNK=2796202x3" -f "/VTKHDF/CellData/U:SHUF" -f "/VTKHDF/CellData/U:GZIP=1" \
-l "/VTKHDF/CellData/p:CHUNK=8388608" -f "/VTKHDF/CellData/p:SHUF" -f "/VTKHDF/CellData/p:GZIP=1" \
-l "/VTKHDF/Points:CHUNK=2796202x3" -f "/VTKHDF/Points:SHUF" -f "/VTKHDF/Points:GZIP=1" \
-l "/VTKHDF/Connectivity:CHUNK=8388608" -f "/VTKHDF/Connectivity:GZIP=1" \
-l "/VTKHDF/Offsets:CHUNK=8388608" -f "/VTKHDF/Offsets:UD=GZIP=1" \
-l "/VTKHDF/Types:CHUNK=33554432" -f "/VTKHDF/Types:UD=GZIP=1" \
../vtkhdfVisu/drivAer_00000002/internal.vtkhdf gzip_lvl1.vtkhdf

However, even with this command, no compression takes place.
The chunks are identical to those produced by version 1.14.6, but the data are simply not compressed.
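
One way to confirm what h5repack actually did is to look at the storage properties of a single dataset and to compare file sizes. The following is only a sketch: the helper names `size_reduction` and `show_filters` are made up for illustration, and it assumes `h5dump` from the same HDF5 installation is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: print by how many percent OUT is smaller than IN
# (integer arithmetic on byte counts; negative if OUT grew).
size_reduction() {
    in_bytes=$(wc -c < "$1")
    out_bytes=$(wc -c < "$2")
    echo $(( ($in_bytes - $out_bytes) * 100 / $in_bytes ))
}

# Hypothetical helper: print the header and storage properties
# (layout, chunk size, filters) of one dataset, without its data.
# Usage: show_filters FILE DATASET_PATH
show_filters() {
    h5dump -p -H -d "$2" "$1"
}
```

For example, `show_filters gzip_lvl1.vtkhdf /VTKHDF/CellData/U` prints the dataset's property list; if deflate was applied, the FILTERS section should mention DEFLATE together with its level, and if not, it should show no filters. `size_reduction internal.vtkhdf gzip_lvl1.vtkhdf` gives the size change in percent.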

For example, the output of the command h5repack -v -f GZIP=1 <input>.vtkhdf <output>.vtkhdf for version 1.14.6 looks like this:

Doing the same with version 2.1.0, we get the following:

I would be happy to understand the difference in behavior. I read that the older version implicitly chunked and shuffled the data. However, I was not able to get any compression with version 2.1.0, even when I chunked the data, shuffled the data, or used a higher compression level.

We also added ZSTD support today, which worked as expected (the size reduction with version 2.1.0 was 50%). However, ZSTD is not a native HDF5 compressor, and hence ParaView cannot read the data.

I guess I am doing something totally wrong here with GZIP, aren't I?
Any suggestion is warmly welcome.
Best, Tobi and thanks for the very interesting website, forum and tools you provide.

Hi @tobias.holzmann,

Could you post the output of h5cc -showconfig so we can verify if your HDF5 2.1.0 installation was built with zlib support? Two things should be noted here:

  • In HDF5 1.14.6, support for zlib was enabled by default and silently disabled if not found. In HDF5 2.0.0+, support for zlib is disabled by default and will cause a configuration error if enabled and zlib support isn’t found.
  • In HDF5 2.0.0+, the CMake option to enable zlib support was renamed from HDF5_ENABLE_Z_LIB_SUPPORT to HDF5_ENABLE_ZLIB_SUPPORT to be consistent with other filter options.
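
To check the first point quickly from the command line, something along these lines may help. It is a rough sketch, not an official tool: `has_deflate` is a made-up helper name, and it assumes the `h5cc -showconfig` summary contains an "I/O filters" line that lists deflate when zlib support is compiled in.

```shell
#!/bin/sh
# Hypothetical helper: reads `h5cc -showconfig` text on stdin and prints
# "yes" if an "I/O filters" line mentions deflate, otherwise "no".
# Assumption: the build summary names the zlib filter "deflate" on that line.
has_deflate() {
    if grep -i 'I/O filters' | grep -qi 'deflate'; then
        echo yes
    else
        echo no
    fi
}

# Only query the real installation if h5cc is actually on the PATH.
if command -v h5cc >/dev/null 2>&1; then
    echo "deflate in this build: $(h5cc -showconfig | has_deflate)"
fi
```

If this prints "no" for the 2.1.0 module, the library was most likely configured without zlib, and reconfiguring with -DHDF5_ENABLE_ZLIB_SUPPORT=ON (the renamed option mentioned above) would be the thing to try.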