h5dump a 2D large chunk with GZIP filter is very slow


I have a 1.8.19 environment and an h5 file containing a two-dimensional dataset with the following specification:

h5ls -av myChunkedFile.h5/List/event
Opened "myChunkedFile.h5" with sec2 driver.
event Dataset {61751770/Inf, 5/5}

Location:  1:1832
Links:     1
Chunks:    {8000000, 5} 80000000 bytes
Storage:   617517700 logical bytes, 9243240 allocated bytes, 6680.75% utilization
Filter-0:  deflate-1 OPT {6}
Type:      native short
Address: 2560
       Flags    Bytes     Address          Logical Offset
    ========== ======== ========== ==============================
    0x00000000  1273973       5176 [0, 0, 0]
    0x00000000  1193401    1279149 [8000000, 0, 0]
    0x00000000  1193287    2472550 [16000000, 0, 0]
    0x00000000  1193159    3665837 [24000000, 0, 0]
    0x00000000  1191715    4858996 [32000000, 0, 0]
    0x00000000  1192051    6050711 [40000000, 0, 0]
    0x00000000  1193409    7242762 [48000000, 0, 0]
    0x00000000   812245    8436171 [56000000, 0, 0]

I want to h5dump the data associated with one chunk, for example, to dump the 2nd chunk:

h5dump -d /List/event -s "8000000,0" -c "8000000,5" myChunkedFile.h5

The single-chunk h5dump takes an incredibly long time (e.g., hours), whereas an h5dump of the entire dataset takes only minutes.

It appears to me that an h5dump with a "-s" start offset on a dataset with a GZIP filter results in the chunk being decompressed once for each data element in the chunk. As you can see, my dataset's chunk size is 80 MBytes, and I have read that the default chunk cache size is 1 MByte.

There seems to be no way to specify the chunk cache size in the context of an h5dump.

Any suggestions?
Can anyone confirm that an h5dump of a chunk performs a full chunk decompression for each data element in the chunk?



Hi Mike,

I don’t see a way with h5dump to specify a larger chunk cache size. (I’ll check on that.)
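While h5dump exposes no chunk-cache option, a small program that reads the hyperslab directly can enlarge the cache. Below is a minimal sketch using h5py (an assumption; the thread does not mention Python) with a tiny stand-in dataset, where `rdcc_nbytes` and `rdcc_nslots` control the per-file raw data chunk cache:

```python
import h5py
import numpy as np

# Build a small demo file: a chunked, gzip-compressed 2D dataset of
# native shorts (a tiny stand-in for the 61751770 x 5 dataset).
with h5py.File("demo.h5", "w") as f:
    data = np.arange(25000, dtype=np.int16).reshape(5000, 5)
    f.create_dataset("List/event", data=data,
                     chunks=(1000, 5), compression="gzip",
                     compression_opts=6)

# Re-open with a chunk cache large enough to hold one uncompressed
# chunk, so a hyperslab read decompresses each chunk only once.
# rdcc_nbytes: cache size in bytes; rdcc_nslots: hash-table slots.
with h5py.File("demo.h5", "r",
               rdcc_nbytes=128 * 1024 * 1024,   # would cover an 80 MB chunk
               rdcc_nslots=1_000_003) as f:
    slab = f["List/event"][1000:2000, :]  # rows of the 2nd chunk
print(slab.shape)
```

In the C API the equivalent knob is H5Pset_chunk_cache on the dataset access property list passed to H5Dopen2.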

However, I think it should help to switch your chunk size to 5,8000000.
In C (the language h5dump is written in), data elements are stored in row-major order, meaning the elements of a row are contiguous in memory. Only one read access is needed to read a row, but reading a column requires multiple read accesses.
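That layout can be checked directly from array strides; a small numpy sketch (illustrative, not from the original thread):

```python
import numpy as np

# A C-ordered (row-major) 2D array of native shorts, as HDF5 stores it.
a = np.zeros((8, 5), dtype=np.int16, order="C")

# Strides are in bytes: stepping along a row moves 2 bytes (contiguous),
# stepping down a column skips a whole 5-element row (10 bytes).
print(a.strides)  # (10, 2)
```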

Here is a document with images (Figure 3 and Figure 4 under “Dataset Storage Order”) that describe the issue:


Chunking is a dataset creation property, so you have to re-create the file with a different chunk size to change it. That can be done with the h5repack utility that comes with the HDF5 binary distribution.
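For example, a hypothetical h5repack invocation along those lines (the dataset path matches the thread; the chunk dimensions shown are illustrative, chosen so one uncompressed chunk of shorts fits within the default 1 MB chunk cache):

```shell
# Rewrite /List/event with a smaller chunk shape, keeping GZIP level 6.
# 100000 x 5 shorts = 1,000,000 bytes per uncompressed chunk.
h5repack -l /List/event:CHUNK=100000x5 \
         -f /List/event:GZIP=6 \
         myChunkedFile.h5 myRepackedFile.h5
```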



Hi Mike,

We discussed this and need to look into the issue further. I entered bug HDFFV-10620 for it.