Chunk caching not working for reads?


I am working with a 10000-row by 200000-column matrix of 4-byte floats. The matrix is unavoidably written in sequential row-major order, but needs to be read sequentially, column by column. I chunk the matrix into 1000x1000 chunks as a compromise between write performance (row-wise) and read performance (column-wise).

When writing the file, I set the chunk cache to be big enough to hold an entire row’s worth of chunks (i.e. 200000 / 1000 = 200 chunks multiplied by 4e6 bytes per chunk, or 800 MB). My write times per row are of the order of 5 ms, and the algorithm pauses after every 1000 rows. By monitoring I/O at the filesystem level, I see spikes of disk activity during these pauses, with transfer rates approaching the maximum. I conclude that the chunk cache is effectively buffering 1000 rows of the matrix and flushing to disk only when all chunks have been written. So far so good: HDF5 is making my life easy.

However, when reading the file, I reserve enough chunk cache to accommodate a column’s worth of chunks (10000 / 1000 = 10 chunks multiplied by 4e6 bytes per chunk, or 40 MB). My column read time is of the order of 10 ms, but I don’t see pauses or spikes of disk activity as with the write. Instead, I get a steady trickle of disk activity that does not appear to be correlated with the chunk width (i.e. every 1000 columns), as I was expecting. It therefore appears that the chunk cache is not being used.
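To make the sizing explicit, the arithmetic behind both cache sizes can be sketched as below. The constant and function names are my own, chosen for illustration; they are not part of the HDF5 API, and the element size is hard-coded to 4 bytes to match the description above.

```cpp
#include <cstddef>

// Matrix and chunk dimensions from the description above.
// All names here are illustrative only, not HDF5 identifiers.
constexpr std::size_t kRows = 10000;
constexpr std::size_t kCols = 200000;
constexpr std::size_t kChunkRows = 1000;
constexpr std::size_t kChunkCols = 1000;
constexpr std::size_t kElemBytes = 4;  // 4-byte floats

// Bytes occupied by one uncompressed chunk.
constexpr std::size_t chunkBytes() {
    return kChunkRows * kChunkCols * kElemBytes;
}

// Cache needed to hold every chunk that one matrix row crosses (write case).
constexpr std::size_t writeCacheBytes() {
    return (kCols / kChunkCols) * chunkBytes();
}

// Cache needed to hold every chunk that one matrix column crosses (read case).
constexpr std::size_t readCacheBytes() {
    return (kRows / kChunkRows) * chunkBytes();
}

// Compile-time checks of the figures quoted in the text.
static_assert(chunkBytes() == 4000000, "one chunk is 4e6 bytes");
static_assert(writeCacheBytes() == 800000000, "200 chunks per row of chunks");
static_assert(readCacheBytes() == 40000000, "10 chunks per column of chunks");
```

So the write cache is 800 MB and the read cache is 40 MB, which is what I pass as the byte size in each case.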

1) Should I expect this behaviour?
2) Have I set up the chunk cache correctly (code below), and do I have to explicitly tell HDF5 to read data chunk-wise from a chunked-layout file?
3) What is the best way to monitor cache flushing/pre-emption activity?



Sample C++ code:

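// Chunk cache setup for reading.
// file is the already-open H5::H5File; chunkDim holds the chunk dimensions (1000 x 1000).
H5::FileAccPropList fprops = file.getAccessPlist();
int mdc;          // metadata cache element count
size_t ccelems;   // number of chunk cache hash slots
size_t ccnbytes;  // chunk cache size in bytes
double w0;        // chunk pre-emption policy
fprops.getCache(mdc, ccelems, ccnbytes, w0);

// Size the cache to hold one column's worth of chunks.
size_t chunksPerCol = 10000 / 1000;
ccnbytes = chunksPerCol * chunkDim[0] * chunkDim[1] * sizeof(float);

fprops.setCache(mdc, ccelems, ccnbytes, w0);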