On my MacBook I created a chunked float array of shape [nz,ny,nx] = [500,1024,1024] (2 GByte) with a chunk shape of [20,32,32] (81920 bytes per chunk). The array is as large as the machine's memory, so that file pages cannot stay in the kernel's file cache and the measurements reflect true I/O times. I'm using HDF5 1.8.3 to make use of the new H5Pset_chunk_cache function.
Creating the array takes about 60 seconds, and reading it chunk by chunk takes about 60 seconds as well. In both cases the cache size is set to 1 chunk (81920 bytes). These times are more or less as expected.
However, reading the data x-vector by x-vector (in chunk order) takes 160 seconds, although the cache is set up to hold 32 chunks (2621440 bytes). Its hash table has 3203 slots (the next prime larger than 100 times the number of chunks in the cache, as advised in the HDF5 documentation). Most of the time appears to be user time (over 100 seconds); the actual I/O time seems to be fine.
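The cache figures above can be checked with a few lines of arithmetic. The following is only a sketch in Python, not part of the test program; the helper names are made up for illustration, and the 4-byte element size assumes single-precision floats.

```python
def is_prime(n):
    """Trial division; adequate for the small numbers involved here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_prime_above(n):
    """Smallest prime strictly larger than n."""
    candidate = n + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


ITEM_SIZE = 4                  # bytes per element (assumed single precision)
shape = (500, 1024, 1024)      # [nz, ny, nx]
chunk = (20, 32, 32)           # chunk shape

chunk_bytes = chunk[0] * chunk[1] * chunk[2] * ITEM_SIZE
chunks_per_x_vector = shape[2] // chunk[2]   # chunks crossed by one x-vector
cache_bytes = chunks_per_x_vector * chunk_bytes
hash_slots = next_prime_above(100 * chunks_per_x_vector)

print(chunk_bytes, chunks_per_x_vector, cache_bytes, hash_slots)
```

The values correspond to the chunk size, the 32-chunk cache size, and the hash table size quoted above.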
By reading x-vectors in chunk order I mean that the vectors are not read in strict y,z order; instead, all y,z indices of a chunk are processed before moving to the next row of chunks. When reading in strict y,z order, the cache needs to be much larger.
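The difference between the two traversal orders can be illustrated with a toy model. The sketch below counts chunk loads for a scaled-down cube using a plain LRU cache; the LRU policy, the small shapes, and all function names are assumptions made for illustration and do not necessarily match what HDF5 does internally.

```python
from collections import OrderedDict


def count_chunk_loads(order, chunk, nx, cache_chunks):
    """Count chunk loads for a sequence of x-vector reads with an LRU cache.

    order: iterable of (z, y) positions, one per x-vector read.
    """
    cz, cy, cx = chunk
    cache = OrderedDict()
    loads = 0
    for z, y in order:
        # One x-vector crosses every chunk along the x axis.
        for xc in range(nx // cx):
            key = (z // cz, y // cy, xc)
            if key in cache:
                cache.move_to_end(key)         # mark as most recently used
            else:
                loads += 1
                cache[key] = True
                if len(cache) > cache_chunks:
                    cache.popitem(last=False)  # evict least recently used
    return loads


def strict_order(nz, ny):
    """y varies fastest, then z, ignoring chunk boundaries."""
    for z in range(nz):
        for y in range(ny):
            yield z, y


def chunk_order(nz, ny, cz, cy):
    """Visit every (z, y) inside one row of chunks before moving on."""
    for z0 in range(0, nz, cz):
        for y0 in range(0, ny, cy):
            for z in range(z0, min(z0 + cz, nz)):
                for y in range(y0, min(y0 + cy, ny)):
                    yield z, y


# Scaled-down cube: 2 x 4 x 4 = 32 chunks in total.
nz, ny, nx = 8, 16, 16
chunk = (4, 4, 4)
row = nx // chunk[2]                     # chunks along x: 4
plane = (ny // chunk[1]) * row           # chunks in one z-layer of chunks: 16

loads_chunk_order = count_chunk_loads(chunk_order(nz, ny, 4, 4), chunk, nx, row)
loads_strict_small = count_chunk_loads(strict_order(nz, ny), chunk, nx, row)
loads_strict_large = count_chunk_loads(strict_order(nz, ny), chunk, nx, plane)

print(loads_chunk_order, loads_strict_small, loads_strict_large)
```

In this model, chunk order loads each of the 32 chunks once with a cache of one row of chunks, while strict order with the same cache reloads chunks repeatedly (128 loads) and only gets down to 32 loads when the cache holds a full layer of chunks. Applied to the real cube, the same reasoning suggests a cache of 32 x 32 = 1024 chunks (80 MiB) for strict y,z order.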
I did another test with a cube of shape [10,512,512] and a chunk shape equal to the cube shape. In this case too, reading by x-vector took much more time than reading by chunk.
So the question is: what is HDF5 doing that takes so much user time for this task? Is it memcpy, hashing, or something else?
This is a very important issue for us. Our astronomical image cubes will be 3-dimensional with axes [freq,dec,ra]. Usually the data are retrieved as [dec,ra] planes, but sometimes as a frequency profile for a specific [dec,ra] point. So efficient access in all directions is important. We thought that chunking would help us here, but that is not so clear now.
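To put rough numbers on the two access patterns, the sketch below counts the chunks touched by one [dec,ra] plane and by one frequency profile for the test cube described above. It assumes 4-byte floats, selections starting at the array origin, and that every touched chunk is read in full; the function name is hypothetical.

```python
ITEM_SIZE = 4                  # bytes per element (assumed single precision)
shape = (500, 1024, 1024)      # [freq, dec, ra]
chunk = (20, 32, 32)           # chunk shape
chunk_bytes = chunk[0] * chunk[1] * chunk[2] * ITEM_SIZE


def access_cost(selection):
    """Return (chunks touched, bytes read, bytes wanted) for a selection
    of the given shape that starts at the array origin."""
    chunks = 1
    elements = 1
    for sel, c in zip(selection, chunk):
        chunks *= -(-sel // c)     # ceiling division per axis
        elements *= sel
    return chunks, chunks * chunk_bytes, elements * ITEM_SIZE


plane = access_cost((1, shape[1], shape[2]))    # one [dec,ra] plane
profile = access_cost((shape[0], 1, 1))         # one frequency profile

for name, (n, read, wanted) in (("plane", plane), ("profile", profile)):
    print(name, n, read, wanted, read // wanted)
```

Under these assumptions a plane touches 1024 chunks and reads 20 times more data than it needs, while a profile touches only 25 chunks but reads 1024 times more than it needs, so the chunk shape trades one access direction against the other.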
Cheers,
Ger van Diepen