H5repack not working for ZFP (filter ID 32013)


#1

Hello HDF folks,

I would like to use h5repack to ZFP-compress 4D arrays (one unlimited time dimension plus three spatial dimensions, i.e. (time,z,y,x)) that live in uncompressed netCDF4/HDF5 files, and save them to new netCDF4/HDF5 files with compressed datasets. Currently h5repack fails with a generic “Error occurred while repacking” error and creates a new, readable netCDF4 file in which the 4D array that should have been compressed contains the same huge garbage value (shown below) in every element.

I have built ZFP and H5Z-ZFP successfully, passing all tests. I also built HDF5 on a machine where it was already installed (Frontera supercomputer) with an official distribution accessed via “module load hdf5” just to check - results are identical. My built h5z-zfp plugin works; I have HDF5_PLUGIN_PATH set correctly with the directory that contains the libh5zzfp.so shared library, and when I h5dump a ZFP compressed HDF5 file, it shows uncompressed data correctly. So, the h5z-zfp plugin is working for h5dump (and also for other code I wrote in C/Fortran). But I can’t get it to work with h5repack.

First, I get the proper arguments to pass to h5repack (using the tool from the H5Z-ZFP github distribution):

login1: $ print_h5repack_farg zfpmode=3 acc=2.0

Print cdvals for set of ZFP compression paramaters...
    zfpmode=3  set zfp mode (1=rate,2=prec,3=acc,4=expert,5=rev)
    rate=3.5                    set rate for rate mode of filter
    acc=2               set accuracy for accuracy mode of filter
    prec=0        set precision for precision mode of zfp filter
    minbits=0          set minbits for expert mode of zfp filter
    maxbits=0          set maxbits for expert mode of zfp filter
    maxprec=0          set maxprec for expert mode of zfp filter
    minexp=0            set minexp for expert mode of zfp filter
    help=0                                     this help message

h5repack -f argument...
    -f UD=32013,4,3,0,0,1073741824
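
A note on where those trailing numbers come from. The sketch below is a guess at the encoding, not something stated in this thread: it assumes that in accuracy mode the filter takes four unsigned 32-bit values laid out as [mode, unused, low word, high word], where the two words are the bytes of the IEEE-754 double tolerance on a little-endian machine. The function name zfp_accuracy_cd_values is made up for illustration.

```python
import struct

ZFP_MODE_ACCURACY = 3  # "3=acc" in the print_h5repack_farg help text


def zfp_accuracy_cd_values(accuracy):
    """Return four unsigned 32-bit values for ZFP accuracy mode.

    Assumed layout (little-endian host): [mode, unused, low word, high word],
    where the two words together hold the double-precision tolerance.
    """
    low, high = struct.unpack("<II", struct.pack("<d", accuracy))
    return [ZFP_MODE_ACCURACY, 0, low, high]


cd_values = zfp_accuracy_cd_values(2.0)
print(cd_values)  # [3, 0, 0, 1073741824]
```

Under that assumption, acc=2.0 reproduces the 3,0,0,1073741824 printed by the tool, since 1073741824 is 0x40000000, the upper half of the double 2.0.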

Then I try to repack:

login1:$ h5repack -v -f dbz:UD=32013,4,3,0,0,1073741824 -l dbz:CHUNK=1x20x20x20 small.05400.000000.nc test.nc
No all objects to modify layout
 <dbz> with chunk size 1 20 20 20 
No all objects to apply filter
 <dbz> with UD filter 32013
Opening file. Searching 24 objects to modify ...
 <dbz>...Found
Making new file ...

 Type     Filter (Compression)     Name

 group                       /
  attr                        cm1_lofs_version
  attr                        uniform_mesh
  attr                        _NCProperties
 dset                        /xh
  attr                        CLASS
  attr                        NAME
  attr                        _Netcdf4Dimid
  attr                        long_name
  attr                        units
  attr                        axis
  attr                        REFERENCE_LIST
 dset                        /yh
  attr                        CLASS
  attr                        NAME
  attr                        _Netcdf4Dimid
  attr                        long_name
  attr                        units
  attr                        axis
  attr                        REFERENCE_LIST
 dset                        /zh
  attr                        CLASS
  attr                        NAME
  attr                        _Netcdf4Dimid
  attr                        long_name
  attr                        units
  attr                        axis
  attr                        REFERENCE_LIST
 dset                        /xf
  attr                        CLASS
  attr                        NAME
  attr                        _Netcdf4Dimid
  attr                        long_name
  attr                        axis
  attr                        units
 dset                        /yf
  attr                        CLASS
  attr                        NAME
  attr                        _Netcdf4Dimid
  attr                        long_name
  attr                        axis
  attr                        units
 dset                        /zf
  attr                        CLASS
  attr                        NAME
  attr                        _Netcdf4Dimid
  attr                        long_name
  attr                        axis
  attr                        units
 dset                        /time
  attr                        CLASS
  attr                        NAME
  attr                        _Netcdf4Dimid
  attr                        units
  attr                        axis
  attr                        long_name
  attr                        REFERENCE_LIST
 dset                        /X0
  attr                        long_name
 dset                        /Y0
  attr                        long_name
 dset                        /X1
  attr                        long_name
 dset                        /Y1
  attr                        long_name
 dset                        /Z0
  attr                        long_name
 dset                        /Z1
  attr                        long_name
Error occurred while repacking

login1: $ h5dump -d dbz test.nc | head
HDF5 "test.nc" {
DATASET "dbz" {
   DATATYPE  H5T_IEEE_F32LE
   DATASPACE  SIMPLE { ( 1, 101, 101, 101 ) / ( H5S_UNLIMITED, 101, 101, 101 ) }
   DATA {
   (0,0,0,0): 9.96921e+36, 9.96921e+36, 9.96921e+36, 9.96921e+36,
   (0,0,0,4): 9.96921e+36, 9.96921e+36, 9.96921e+36, 9.96921e+36,
   (0,0,0,8): 9.96921e+36, 9.96921e+36, 9.96921e+36, 9.96921e+36,
   (0,0,0,12): 9.96921e+36, 9.96921e+36, 9.96921e+36, 9.96921e+36,
   (0,0,0,16): 9.96921e+36, 9.96921e+36, 9.96921e+36, 9.96921e+36,

login1: $ h5dump -d dbz small.05400.000000.nc | head
HDF5 "small.05400.000000.nc" {
DATASET "dbz" {
   DATATYPE  H5T_IEEE_F32LE
   DATASPACE  SIMPLE { ( 1, 101, 101, 101 ) / ( H5S_UNLIMITED, 101, 101, 101 ) }
   DATA {
   (0,0,0,0): 14.5957, 14.9746, 15.4629, 16.043, 16.4648, 16.793, 17.0273,
   (0,0,0,7): 17.5449, 18.2129, 19.1152, 20.6895, 21.8193, 22.3174, 23.1123,
   (0,0,0,14): 23.9854, 24.6948, 24.4165, 21.8335, 19.9927, 19.9648, 21.2461,
   (0,0,0,21): 22.6914, 24.2383, 25.623, 25.6191, 25.0137, 24.9941, 25.5234,
   (0,0,0,28): 26.1641, 26.7578, 27.5156, 28.2188, 28.8672, 29.1963, 29.1123,

So the original uncompressed file (small.05400.000000.nc) that I started with clearly has good data, but the ZFP compressed file does not, which is unsurprising since h5repack errored out (and it’s tiny because it’s compressing all of those constant numbers). The “broken” file is correctly identified as holding ZFP-compressed data, but the dataset takes up zero bytes:

login1: $ h5ls -rv test.nc
[skip other variables]
/dbz                     Dataset {1/Inf, 101/101, 101/101, 101/101}
    Location:  1:17728
    Links:     1
    Chunks:    {1, 20, 20, 20} 32000 bytes
    Storage:   4121204 logical bytes, 0 allocated bytes
    Filter-0:  H5Z-ZFP-1.0.1 (ZFP-0.5.5)-32013  {0, 0, 1073741824}
    Type:      native float
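
The Storage line can be checked mechanically. The following is a minimal Python sketch, assuming the h5ls -v line format shown above; parse_storage_line is a hypothetical helper, not part of any HDF5 tool. It confirms that the logical size matches 1 x 101 x 101 x 101 four-byte floats and that nothing was allocated, which is consistent with h5dump showing only one repeated value (9.96921e+36 happens to match netCDF's default float fill value).

```python
import re


def parse_storage_line(line):
    """Extract (logical_bytes, allocated_bytes) from an h5ls -v 'Storage:' line."""
    match = re.search(r"(\d+) logical bytes, (\d+) allocated bytes", line)
    if match is None:
        raise ValueError("not a Storage line: %r" % line)
    return int(match.group(1)), int(match.group(2))


line = "    Storage:   4121204 logical bytes, 0 allocated bytes"
logical, allocated = parse_storage_line(line)

# 1 x 101 x 101 x 101 elements, 4 bytes per native float
expected_logical = 1 * 101 * 101 * 101 * 4
print(logical == expected_logical)  # True
print(allocated == 0)               # True
```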

I noticed a long time ago there was a patch for h5repack that supposedly fixed something that was broken regarding unregistered filters such as ZFP.

The patch is referenced here:

https://h5z-zfp.readthedocs.io/en/latest/h5repack.html

And lives here:

The language at the first link is not very helpful. It states:

Some versions of HDF5’s h5repack utility contain a bug that prevents proper parsing of the -f argument’s option. In order to use h5repack with -f argument as described here, you need to apply the patch from h5repack_parse.patch. To do so, after you’ve downloaded and untar’d HDF5 but before you’ve built it, do something like the following using HDF5-1.8.14 as an example:

gunzip < hdf5-1.8.14.tar.gz | tar xvf -
cd hdf5-1.8.14
patch ./tools/h5repack/h5repack_parse.c /h5repack_parse.patch

This is a very old patch, and it fails to apply to a recent version of h5repack_parse.c, which has changed a lot since the patch came out in Nov 2016. However, when I compare the patch against the latest version of h5repack_parse.c, it doesn’t look like the change the patch makes ever made its way into the current code. So it’s not clear whether h5repack is working for ANYONE right now with ZFP.

Any help much appreciated!

Leigh Orf
Research Scientist
UW-Madison


#2

Filters need to be specified with UD=<filter_number,filter_flag,cd_value_count,value1[,value2,…,valueN]>

It looks like you are missing the filter_flag: 1 is OPTIONAL, 0 is MANDATORY.
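
To see how the original argument gets misread, here is a small Python sketch that splits a UD= string positionally according to the layout above. It is only an illustration of the field order, assuming h5repack reads the fields strictly by position; parse_ud is a made-up name and this is not h5repack's actual parser.

```python
def parse_ud(spec):
    """Split 'UD=id,flag,count,v1,...' by position and run two sanity checks."""
    fields = [int(tok) for tok in spec.split("=", 1)[1].split(",")]
    filter_id, flag, count = fields[:3]
    values = fields[3:]
    return {
        "filter_id": filter_id,
        "flag": flag,
        "count": count,
        "values": values,
        "flag_ok": flag in (0, 1),          # 0 = mandatory, 1 = optional
        "count_ok": count == len(values),   # count must match the values given
    }


broken = parse_ud("UD=32013,4,3,0,0,1073741824")
print(broken["flag"], broken["count"], broken["values"])
# 4 3 [0, 0, 1073741824]

fixed = parse_ud("UD=32013,0,4,3,0,0,1073741824")
print(fixed["flag"], fixed["count"], fixed["values"])
# 0 4 [3, 0, 0, 1073741824]
```

Read this way, the original string yields a flag of 4 and only three filter values, which lines up with the {0, 0, 1073741824} that h5ls reported for Filter-0 in the broken file.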

Allen


#4

Ah!! that print_h5repack_farg utility lied to me then.

This is what ultimately worked:

h5repack -v -f dbz:UD=32013,0,4,3,0,0,1073741824 -l dbz:CHUNK=1x20x20x20 small.05400.000000.nc test.nc

The zero right after the filter ID (32013) is what fixed this mess.

I will let the h5z-zfp folks know to change their routine, because I was assuming it was giving me the correct magic for ZFP.
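
Until the utility is changed, its output can be patched up by hand or with a few lines of script. A minimal Python sketch, assuming the tool keeps printing the flag-less form UD=id,count,v1,...; add_filter_flag is a hypothetical helper, not part of H5Z-ZFP or HDF5.

```python
def add_filter_flag(farg, flag=0):
    """Insert filter_flag right after the filter ID in 'UD=id,count,v1,...'."""
    if flag not in (0, 1):
        raise ValueError("flag must be 0 (mandatory) or 1 (optional)")
    prefix, rest = farg.split("=", 1)
    filter_id, remainder = rest.split(",", 1)
    return "%s=%s,%d,%s" % (prefix, filter_id, flag, remainder)


print(add_filter_flag("UD=32013,4,3,0,0,1073741824"))
# UD=32013,0,4,3,0,0,1073741824
```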


#5

It’s worth noting that https://support.hdfgroup.org/HDF5/doc/RM/Tools.html#Tools-Repack states the following for the -f option:

 UD=filter_id,nfilter_params,value_1[,value_2,....,value_n]
     filter_id is the filter identifier.
     nfilter_params is the number of filter parameters.
     value_1 through value_n are the values of each filter parameter.
             Number of values must match the value of nfilter_params.

-f dbz:UD=32013,0,4,3,0,0,1073741824

It looks like the actual arguments to -f are, for NFILTER_PARAMS=4:

UD=[FILTERID],[FILTER FLAG],[NFILTER_PARAMS],[val1],[val2],[val3],[val4]

This “filter flag” (0:mandatory,1:optional) needs to be included in hdf5’s official documentation!
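
For anyone scripting this, the field order above can be assembled programmatically so the parameter count never drifts from the values. This is a sketch in Python under the assumption that the layout really is id, flag, count, values; h5repack_argv is a made-up helper, and only the -v, -f and -l options already used in this thread appear in it.

```python
def h5repack_argv(dataset, filter_id, flag, cd_values, chunk, src, dst):
    """Build an h5repack argument list using UD=id,flag,count,v1,...,vN."""
    ud = "UD=%d,%d,%d,%s" % (
        filter_id,
        flag,                      # 0 = mandatory, 1 = optional
        len(cd_values),            # count derived from the values themselves
        ",".join(str(v) for v in cd_values),
    )
    layout = "CHUNK=" + "x".join(str(c) for c in chunk)
    return [
        "h5repack", "-v",
        "-f", "%s:%s" % (dataset, ud),
        "-l", "%s:%s" % (dataset, layout),
        src, dst,
    ]


argv = h5repack_argv(
    "dbz", 32013, 0, [3, 0, 0, 1073741824], (1, 20, 20, 20),
    "small.05400.000000.nc", "test.nc",
)
print(" ".join(argv))
```

The resulting list can be handed to subprocess.run(argv, check=True) on a machine where h5repack and the ZFP plugin are available.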


#6

Try h5repack --help. It should be correct. I will investigate updating that link.


#7

The updated link is: https://portal.hdfgroup.org/display/HDF5/h5repack