Hello HDF folks,
I would like to use h5repack to apply ZFP compression to 4D arrays (one unlimited time dimension plus three spatial dimensions, i.e. (time,z,y,x)) that live in uncompressed netCDF4/HDF5 files, and save the result to new netCDF4/HDF5 files with compressed datasets. Currently h5repack fails with a generic “Error occurred while repacking” message and creates a new, readable netCDF4 file in which every element of the 4D array that should have been compressed contains the same huge garbage value (shown below).
I have built ZFP and H5Z-ZFP successfully, passing all tests. Just to check, I also built HDF5 myself on a machine (the Frontera supercomputer) where an official distribution is already available via “module load hdf5”; the results are identical with both. My h5z-zfp plugin build works: I have HDF5_PLUGIN_PATH set to the directory that contains the libh5zzfp.so shared library, and when I h5dump a ZFP compressed HDF5 file, it shows the uncompressed data correctly. So the h5z-zfp plugin is working for h5dump (and also for other code I wrote in C/Fortran), but I can’t get it to work with h5repack.
First, I get the proper arguments to pass to h5repack (using the print_h5repack_farg tool from the H5Z-ZFP GitHub distribution):
login1: $ print_h5repack_farg zfpmode=3 acc=2.0
Print cdvals for set of ZFP compression paramaters...
zfpmode=3 set zfp mode (1=rate,2=prec,3=acc,4=expert,5=rev)
rate=3.5 set rate for rate mode of filter
acc=2 set accuracy for accuracy mode of filter
prec=0 set precision for precision mode of zfp filter
minbits=0 set minbits for expert mode of zfp filter
maxbits=0 set maxbits for expert mode of zfp filter
maxprec=0 set maxprec for expert mode of zfp filter
minexp=0 set minexp for expert mode of zfp filter
help=0 this help message
h5repack -f argument...
-f UD=32013,4,3,0,0,1073741824
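For reference, the numbers in that -f argument can be reproduced by hand. The sketch below assumes that the four cd_values are the mode, an unused slot, and then the 64-bit double accuracy split into two 32-bit little-endian words. That layout is an assumption about how H5Z-ZFP packs accuracy mode, not something stated in the tool output, and the function name accuracy_cd_values is hypothetical.

```python
import struct

def accuracy_cd_values(acc, mode=3):
    """Return the cd_values list for ZFP accuracy mode (assumed layout)."""
    # Reinterpret the 8-byte little-endian double as two unsigned 32-bit words.
    low, high = struct.unpack("<II", struct.pack("<d", acc))
    # Slot 1 is assumed to be unused in accuracy mode.
    return [mode, 0, low, high]

cd = accuracy_cd_values(2.0)
print("UD=32013,%d,%s" % (len(cd), ",".join(str(v) for v in cd)))
# prints: UD=32013,4,3,0,0,1073741824
```

Under that assumption, the 4 after the filter ID is the number of cd_values, and 1073741824 (0x40000000) is the high word of the double 2.0.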
Then I try to repack:
login1:$ h5repack -v -f dbz:UD=32013,4,3,0,0,1073741824 -l dbz:CHUNK=1x20x20x20 small.054000.nc test.nc
No all objects to modify layout
<dbz> with chunk size 1 20 20 20
No all objects to apply filter
<dbz> with UD filter 32013
Opening file. Searching 24 objects to modify ...
<dbz>...Found
Making new file ...
Type Filter (Compression) Name
group /
attr cm1_lofs_version
attr uniform_mesh
attr _NCProperties
dset /xh
attr CLASS
attr NAME
attr _Netcdf4Dimid
attr long_name
attr units
attr axis
attr REFERENCE_LIST
dset /yh
attr CLASS
attr NAME
attr _Netcdf4Dimid
attr long_name
attr units
attr axis
attr REFERENCE_LIST
dset /zh
attr CLASS
attr NAME
attr _Netcdf4Dimid
attr long_name
attr units
attr axis
attr REFERENCE_LIST
dset /xf
attr CLASS
attr NAME
attr _Netcdf4Dimid
attr long_name
attr axis
attr units
dset /yf
attr CLASS
attr NAME
attr _Netcdf4Dimid
attr long_name
attr axis
attr units
dset /zf
attr CLASS
attr NAME
attr _Netcdf4Dimid
attr long_name
attr axis
attr units
dset /time
attr CLASS
attr NAME
attr _Netcdf4Dimid
attr units
attr axis
attr long_name
attr REFERENCE_LIST
dset /X0
attr long_name
dset /Y0
attr long_name
dset /X1
attr long_name
dset /Y1
attr long_name
dset /Z0
attr long_name
dset /Z1
attr long_name
Error occurred while repacking
login1: $ h5dump -d dbz test.nc | head
HDF5 "test.nc" {
DATASET "dbz" {
DATATYPE H5T_IEEE_F32LE
DATASPACE SIMPLE { ( 1, 101, 101, 101 ) / ( H5S_UNLIMITED, 101, 101, 101 ) }
DATA {
(0,0,0,0): 9.96921e+36, 9.96921e+36, 9.96921e+36, 9.96921e+36,
(0,0,0,4): 9.96921e+36, 9.96921e+36, 9.96921e+36, 9.96921e+36,
(0,0,0,8): 9.96921e+36, 9.96921e+36, 9.96921e+36, 9.96921e+36,
(0,0,0,12): 9.96921e+36, 9.96921e+36, 9.96921e+36, 9.96921e+36,
(0,0,0,16): 9.96921e+36, 9.96921e+36, 9.96921e+36, 9.96921e+36,
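The repeated value is probably not random garbage. 9.96921e+36 is what netCDF's default fill value for 32-bit floats (NC_FILL_FLOAT in netcdf.h) looks like when printed with six significant digits, which would suggest the dataset was created but never written. A quick check, as a sketch (the helper names are hypothetical):

```python
import struct

# netCDF's default float fill value, as defined by NC_FILL_FLOAT in netcdf.h.
NC_FILL_FLOAT = 9.9692099683868690e+36

def as_float32(x):
    """Round a Python float to the nearest IEEE 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def six_digit_repr(x):
    """Format a value as a float32 with six significant digits."""
    return "%g" % as_float32(x)

print(six_digit_repr(NC_FILL_FLOAT))  # 9.96921e+36
```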
login1: $ h5dump -d dbz small.05400.000000.nc | head
HDF5 "small.05400.000000.nc" {
DATASET "dbz" {
DATATYPE H5T_IEEE_F32LE
DATASPACE SIMPLE { ( 1, 101, 101, 101 ) / ( H5S_UNLIMITED, 101, 101, 101 ) }
DATA {
(0,0,0,0): 14.5957, 14.9746, 15.4629, 16.043, 16.4648, 16.793, 17.0273,
(0,0,0,7): 17.5449, 18.2129, 19.1152, 20.6895, 21.8193, 22.3174, 23.1123,
(0,0,0,14): 23.9854, 24.6948, 24.4165, 21.8335, 19.9927, 19.9648, 21.2461,
(0,0,0,21): 22.6914, 24.2383, 25.623, 25.6191, 25.0137, 24.9941, 25.5234,
(0,0,0,28): 26.1641, 26.7578, 27.5156, 28.2188, 28.8672, 29.1963, 29.1123,
So the original uncompressed file (small.05400.000000.nc) that I started with clearly has good data, but the ZFP compressed file does not, which is unsurprising given that h5repack errored out (and the file is tiny because it holds nothing but that constant number). The “broken” file correctly identifies the dataset as ZFP compressed, but it takes up zero bytes:
login1: $ h5ls -rv test.nc
[skip other variables]
/dbz Dataset {1/Inf, 101/101, 101/101, 101/101}
Location: 1:17728
Links: 1
Chunks: {1, 20, 20, 20} 32000 bytes
Storage: 4121204 logical bytes, 0 allocated bytes
Filter-0: H5Z-ZFP-1.0.1 (ZFP-0.5.5)-32013 {0, 0, 1073741824}
Type: native float
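One detail in this listing may be worth a second look: the stored filter has three cd_values, {0, 0, 1073741824}, whereas four (3, 0, 0, 1073741824) were requested. That is what would result if this h5repack reads the UD string as UD=<filter_number>,<filter_flag>,<cd_value_count>,<value_1>,..., the layout given in the help text of more recent h5repack releases, rather than the older layout without a flag field. Whether the h5repack used here expects the flag field is an assumption that should be checked against its own help output. The sketch below splits the same string both ways (the helper parse_ud is hypothetical and not part of any HDF5 tool):

```python
def parse_ud(arg, has_flag_field):
    """Split a 'UD=...' string into (filter_id, flag, cd_values)."""
    fields = [int(tok) for tok in arg.split("=", 1)[1].split(",")]
    filter_id = fields[0]
    if has_flag_field:
        flag, count, values = fields[1], fields[2], fields[3:]
    else:
        flag, count, values = None, fields[1], fields[2:]
    # Keep only as many values as the count field announces.
    return filter_id, flag, values[:count]

ud = "UD=32013,4,3,0,0,1073741824"
print(parse_ud(ud, has_flag_field=False))  # (32013, None, [3, 0, 0, 1073741824])
print(parse_ud(ud, has_flag_field=True))   # (32013, 4, [0, 0, 1073741824])
```

If the second reading applies, the argument would need an explicit flag field ahead of the count, for example UD=32013,0,4,3,0,0,1073741824. That has not been tested here.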
I noticed that there is an old patch for h5repack that supposedly fixed something that was broken regarding unregistered filters such as ZFP.
The patch is referenced here:
https://h5z-zfp.readthedocs.io/en/latest/h5repack.html
And lives here:
The language at the first link is not very helpful. It states:
Some versions of HDF5’s h5repack utility contain a bug that prevents proper parsing of the -f argument’s option. In order to use h5repack with -f argument as described here, you need to apply the patch from h5repack_parse.patch. To do so, after you’ve downloaded and untar’d HDF5 but before you’ve built it, do something like the following using HDF5-1.8.14 as an example:
gunzip < hdf5-1.8.14.tar.gz | tar xvf -
cd hdf5-1.8.14
patch ./tools/h5repack/h5repack_parse.c /h5repack_parse.patch
This is a very old patch, and it fails to apply to a recent version of h5repack_parse.c, which has changed a lot since the patch came out in Nov 2016. However, when I compare the patch against the latest version of h5repack_parse.c, it doesn’t look like the change made in the patch ever made its way into the latest code. So it’s not clear whether h5repack is working for ANYONE right now with ZFP.
Any help much appreciated!
Leigh Orf
Research Scientist
UW-Madison