I have an HDF5 1.8.19 environment and an .h5 file that contains a two-dimensional dataset with the following specification:
h5ls -av myChunkedFile.h5/List/event
Opened "myChunkedFile.h5" with sec2 driver.
event                    Dataset {61751770/Inf, 5/5}
    Location:  1:1832
    Links:     1
    Chunks:    {8000000, 5} 80000000 bytes
    Storage:   617517700 logical bytes, 9243240 allocated bytes, 6680.75% utilization
    Filter-0:  deflate-1 OPT {6}
    Type:      native short
    Address:   2560
           Flags    Bytes     Address          Logical Offset
        ========== ======== ========== ==============================
        0x00000000  1273973       5176 [0, 0, 0]
        0x00000000  1193401    1279149 [8000000, 0, 0]
        0x00000000  1193287    2472550 [16000000, 0, 0]
        0x00000000  1193159    3665837 [24000000, 0, 0]
        0x00000000  1191715    4858996 [32000000, 0, 0]
        0x00000000  1192051    6050711 [40000000, 0, 0]
        0x00000000  1193409    7242762 [48000000, 0, 0]
        0x00000000   812245    8436171 [56000000, 0, 0]
I want to h5dump the data associated with a single chunk. For example, to dump the 2nd chunk:
h5dump -d /List/event -s "8000000,0" -c "8000000,5" myChunkedFile.h5
The single-chunk h5dump takes an extraordinarily long time (e.g., hours), whereas an h5dump of the entire dataset takes only minutes.
It appears to me that an h5dump with a "-s" start offset on a dataset with a GZIP filter results in the chunk being decompressed once for each data element in the chunk. As you can see, my chunk size is 80 MB, while I have read that the default chunk cache size is 1 MB, so a decompressed chunk can never stay in the cache between element reads.
There seems to be no way to specify the chunk cache size in the context of an h5dump.
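If writing a small C program is an option, I could presumably work around this by setting a larger raw-data chunk cache on the dataset access property list with H5Pset_chunk_cache (available since 1.8.3, I believe, so present in 1.8.19) and reading just the hyperslab covered by the 2nd chunk. A rough, untested sketch follows; error checking is omitted, and the 256 MB cache size and 12421 slot count are only illustrative values, not recommendations:

#include <stdio.h>
#include <stdlib.h>
#include "hdf5.h"

int main(void)
{
    hid_t   file, dapl, dset, fspace, mspace;
    hsize_t start[2] = {8000000, 0};   /* offset of the 2nd chunk      */
    hsize_t count[2] = {8000000, 5};   /* one full chunk: 8000000 x 5  */
    short  *buf;

    /* Dataset access property list: make the chunk cache large enough
     * to hold one whole 80 MB chunk (here 256 MB; w0 = 1.0 because a
     * chunk is either read completely or not touched at all). */
    dapl = H5Pcreate(H5P_DATASET_ACCESS);
    H5Pset_chunk_cache(dapl, 12421, 256 * 1024 * 1024, 1.0);

    file = H5Fopen("myChunkedFile.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    dset = H5Dopen2(file, "/List/event", dapl);

    /* Select exactly the elements covered by the 2nd chunk. */
    fspace = H5Dget_space(dset);
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
    mspace = H5Screate_simple(2, count, NULL);

    buf = malloc((size_t)count[0] * count[1] * sizeof(short));
    H5Dread(dset, H5T_NATIVE_SHORT, mspace, fspace, H5P_DEFAULT, buf);

    printf("first element of chunk 2: %d\n", buf[0]);

    free(buf);
    H5Sclose(mspace);
    H5Sclose(fspace);
    H5Dclose(dset);
    H5Pclose(dapl);
    H5Fclose(file);
    return 0;
}

The idea is that with the enlarged cache the chunk is decompressed once and reused for the whole H5Dread, rather than being re-decompressed per element. But I would rather not have to write a separate program just to dump one chunk.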
Any suggestions?
Can anyone confirm that h5dump of a chunk does full chunk decompression for each data element in the chunk?
Mike