We have a workflow that copies the last 120 of 121 records from one netCDF-4 classic-model file to another. The data are compressed and chunked, with one record per chunk. Profiling shows that the vast majority of the time is spent decompressing the data (on read from the source file) and then recompressing it (on write to the target file). Both deflate and Zstd show the same behavior. A faster approach would avoid decompressing and recompressing in the first place. I can see needing to decompress the data if the user wants to get at the real values, but I just want to copy it from one file to another. I was thinking of a low-level block copy, something like what the 'dd' command does.
Is that possible with HDF5/NetCDF?
Does HDF5 NEED to decompress/recompress in this scenario?
I'm using nco-5.2.4, built with Spack, running on a Zen 2 chip under SLES 15 SP4.
Example usage:
ncrcat -7 -d time,1,120 -L 4 file.in file.out
ncrcat -7 -d time,1,120 --cmp='shf|zst,4' file.in file.out
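
To make concrete what I mean by a 'dd'-style copy, below is a rough, untested sketch of what I imagine at the HDF5 level, using the raw-chunk calls H5Dread_chunk/H5Dwrite_chunk (available since HDF5 1.10.2). The file names, the dataset name "var", the 2-D shape, and the assumption that the destination dataset already exists with identical chunking and filter settings are all placeholders for illustration; error checking is omitted.

/* Hypothetical sketch: copy still-compressed chunks verbatim between files.
 * Assumes src.h5 and dst.h5 each contain a 2-D record variable "var"
 * (time x n) with one record per chunk and the same filter pipeline. */
#include <stdlib.h>
#include <hdf5.h>

int main(void)
{
    hid_t src_file = H5Fopen("src.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dst_file = H5Fopen("dst.h5", H5F_ACC_RDWR,  H5P_DEFAULT);
    hid_t src_dset = H5Dopen2(src_file, "var", H5P_DEFAULT);
    hid_t dst_dset = H5Dopen2(dst_file, "var", H5P_DEFAULT);

    for (hsize_t rec = 1; rec <= 120; rec++) {     /* skip record 0 */
        hsize_t src_off[2] = {rec, 0};             /* chunk origin in source */
        hsize_t dst_off[2] = {rec - 1, 0};         /* shifted origin in target */
        hsize_t nbytes = 0;
        uint32_t filter_mask = 0;

        /* On-disk (compressed) size of this chunk */
        H5Dget_chunk_storage_size(src_dset, src_off, &nbytes);

        void *buf = malloc((size_t)nbytes);

        /* Read the raw chunk bytes without running the filter pipeline... */
        H5Dread_chunk(src_dset, H5P_DEFAULT, src_off, &filter_mask, buf);

        /* ...and write them back, again bypassing decompress/recompress */
        H5Dwrite_chunk(dst_dset, H5P_DEFAULT, filter_mask, dst_off,
                       (size_t)nbytes, buf);
        free(buf);
    }

    H5Dclose(src_dset); H5Dclose(dst_dset);
    H5Fclose(src_file); H5Fclose(dst_file);
    return 0;
}

Is something along these lines achievable through the netCDF/NCO layer, or only by dropping down to HDF5 directly?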