Hi,
I’ve only been studying HDF5 for a few days, so sorry for the possibly stupid questions.
I’m experimenting with the level of image compression that HDF5 can give compared to other containers.
Using OpenCV, I take the raw pixel data (RGB or RGBA) from many images (thousands) and save it as datasets in an H5 file, without compression, using the H5IMmake_image_24bit() function.
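Roughly, my write path looks like this (a minimal sketch; the file and dataset names are placeholders, and H5IMmake_image_24bit() is the HDF5 high-level image API from hdf5_hl.h):

#include <hdf5.h>
#include <hdf5_hl.h>
#include <opencv2/opencv.hpp>

int main() {
    // OpenCV loads images as BGR; convert to RGB before storing.
    cv::Mat bgr = cv::imread("image1.png", cv::IMREAD_COLOR); // placeholder path
    cv::Mat rgb;
    cv::cvtColor(bgr, rgb, cv::COLOR_BGR2RGB);

    hid_t file = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    // One dataset per image, pixel-interlaced, no compression at this stage.
    H5IMmake_image_24bit(file, "image1.png", rgb.cols, rgb.rows,
                         "INTERLACE_PIXEL", rgb.data);
    H5Fclose(file);
    return 0;
}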
The resulting file looks like this:
h5dump file.h5:
HDF5 "file.h5" {
GROUP "/" {
DATASET "image1.png" {
DATATYPE H5T_STD_U8LE
DATASPACE SIMPLE { ( 256, 256, 3 ) / ( 256, 256, 3 ) }
DATA {
(0,0,0): 223, 211, 170,
(0,1,0): 223, 211, 170,
(0,2,0): 223, 211, 170,
…
DATASET "image2.png" {
…
Then I try to compress it using the h5repack utility and the ZSTD filter as shown below:
h5repack -l CHUNK=60x60x3 -f UD=32015,10 file.h5 file_zstd.h5
As a result, I get worse compression than when I pack the same raw RGB files into a tar archive and compress it with zstd:
TAR.ZSTD size / H5 size ~ 0.72
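For reference, my understanding is that applying the filter at dataset-creation time instead of via h5repack would look roughly like this (a sketch, assuming the zstd plugin registered as filter ID 32015 is on HDF5_PLUGIN_PATH; the chunk shape and level mirror the h5repack call):

#include <hdf5.h>

// Sketch: create a chunked U8 dataset with the ZSTD filter applied.
hid_t make_zstd_dataset(hid_t file, const char *name, hsize_t h, hsize_t w) {
    const hsize_t dims[3]  = {h, w, 3};
    const hsize_t chunk[3] = {60, 60, 3};      // matches CHUNK=60x60x3
    const unsigned cd_values[1] = {10};        // zstd level, as in UD=32015,10

    hid_t space = H5Screate_simple(3, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 3, chunk);
    H5Pset_filter(dcpl, 32015, H5Z_FLAG_OPTIONAL, 1, cd_values);

    hid_t dset = H5Dcreate2(file, name, H5T_STD_U8LE, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Pclose(dcpl);
    H5Sclose(space);
    return dset;
}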
I understand that I most likely won't get the same compression ratio that tar+zstd gives, but perhaps there are ways to improve compression in HDF5?
Perhaps the best compression would come from using ZSTD with a trained dictionary.
Is it possible to use a trained dictionary with the zstd plugin, or is such work perhaps already underway?
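To illustrate what I mean, here is how dictionary training and use look with libzstd directly, outside HDF5 (a sketch; the sample buffers are placeholders and error checks are omitted):

#include <zstd.h>
#include <zdict.h>
#include <vector>

// Sketch: train a dictionary on many small, similar samples (e.g. image
// chunks concatenated into `samples`), then compress one chunk with it.
std::vector<char> compress_with_dict(const std::vector<char> &samples,
                                     const std::vector<size_t> &sample_sizes,
                                     const char *chunk, size_t chunk_size) {
    std::vector<char> dict(110 * 1024);  // ~100 KiB is a typical dictionary size
    size_t dict_size = ZDICT_trainFromBuffer(dict.data(), dict.size(),
                                             samples.data(), sample_sizes.data(),
                                             (unsigned)sample_sizes.size());

    std::vector<char> out(ZSTD_compressBound(chunk_size));
    ZSTD_CCtx *cctx = ZSTD_createCCtx();
    size_t csize = ZSTD_compress_usingDict(cctx, out.data(), out.size(),
                                           chunk, chunk_size,
                                           dict.data(), dict_size, 10);
    ZSTD_freeCCtx(cctx);
    out.resize(csize);  // ZDICT_isError/ZSTD_isError checks omitted for brevity
    return out;
}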
I would be glad for any ideas that help me compress my data better inside HDF5.
Thanks!