Easy way to get compression ratios for 3D arrays?

I write serial HDF5 files with ZFP compression, thousands of them. I can get compression ratios for 3D arrays by running h5ls -rv on individual files (and piping the output to some shell tools), but I’m looking for a way to get them by calling HDF5 routines from within C or Fortran, which h5ls must do. I want to collect statistics on how much compression ZFP achieves on each array/file, since I am using the accuracy parameter and the amount of compression varies with the data. I’d like to collect this information from the same code (written in C) that reads the ZFP-compressed files. Is there a straightforward way to do this?

Thanks,

Leigh

Leigh, how about H5Dget_storage_size (https://portal.hdfgroup.org/display/HDF5/H5D_GET_STORAGE_SIZE)? This is per dataset and you’d still have to compare the output to H5Sget_simple_extent_npoints * H5Tget_size and do the iteration over all relevant datasets.

G.

Hi Gerd, nice to talk to you again :)
That is exactly what I was looking for. It’s trivial to get the uncompressed size, so this call was the missing link. Thanks!

Leigh

Hey Leigh,

Here is a quick bash script you can run on existing files…

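#!/bin/bash
# Sum the logical and allocated byte counts from every "Storage:" line
# that h5ls prints for the file given as $1.

tbyts=0   # total logical (uncompressed) bytes
zbyts=0   # total allocated (on-disk) bytes

while read -r p; do
    # Skip everything except the per-dataset "Storage:" lines.
    if [[ -z "$(echo $p | grep '^ *Storage:')" ]]; then
        continue
    fi
    # Field 2 is the logical byte count, field 5 the allocated byte count.
    t=$(echo $p | tr -s ' ' | cut -d' ' -f2)
    z=$(echo $p | tr -s ' ' | cut -d' ' -f5)
    (( tbyts = tbyts + t ))
    (( zbyts = zbyts + z ))
done << EOT
$(h5ls -vlr "$1")
EOT
# dc: set precision to 4 digits, then compute tbyts / zbyts.
ratio=$(echo "4@k@$tbyts@$zbyts@/@p" | tr '@' '\n' | dc)

echo "Total logic bytes = $tbyts"
echo "Total actual bytes = $zbyts"
echo "Compression ratio = $ratio"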

And here is an example run…

% ./h5statz.sh test_zfp.h5
Total logic bytes = 133125
Total actual bytes = 61184
Compression ratio = 2.1758