I’m hitting a performance issue which seems very similar to this thread from 2012.
I can select 6000 entries along the first dimension, about 750 MB of data, and read it in about 300 ms. But if I select blocks of 3 entries at a stride of 100 (so 3% of the data volume) and read them into a contiguous array, it takes much longer, about 2.4 seconds. However, if I read into a matching selection in memory, it’s fast again (~8 ms). Using h5py, these options look like:
# Read all the data - fast
ds.read_direct(arr, source_sel=np.s_[:6000])
# Read a small subset of data with strides - slow
sel = h5py.MultiBlockSlice(start=0, stride=100, count=60, block=3)
ds.read_direct(arr, source_sel=sel, dest_sel=np.s_[:180])
# Read the small subset to a matching selection in memory - fast
ds.read_direct(arr, source_sel=sel, dest_sel=sel)
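For context, the timings above were measured roughly like this (a sketch only, reusing ds from above and allocating a matching array; exact numbers will of course vary with hardware and the state of the OS page cache):

import time
import numpy as np

# Rough timing of the full read; ds is the dataset opened as above
arr = np.empty((6000, 1, 256, 256), dtype=np.uint16)
t0 = time.perf_counter()
ds.read_direct(arr, source_sel=np.s_[:6000])
print(f"full read: {time.perf_counter() - t0:.3f} s")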
The h5py read_direct method is largely a wrapper around H5Dread, and from looking at the code plus what I can easily profile in Python, I’m pretty confident it’s the call to H5Dread that’s taking the time.
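For what it’s worth, the same strided read can also be expressed through h5py’s low-level bindings, where DatasetID.read calls H5Dread directly, which makes it easy to time that call in isolation. This is only a sketch: the file and dataset names are made up, and the hyperslab parameters are my hand-translation of the MultiBlockSlice above:

import numpy as np
import h5py
from h5py import h5s

with h5py.File("data.h5", "r") as f:      # file/dataset names are made up
    ds = f["images"]

    # File-side selection: 60 blocks of 3 entries at a stride of 100,
    # full extent in the trailing dimensions
    fspace = ds.id.get_space()
    fspace.select_hyperslab(start=(0, 0, 0, 0),
                            count=(60, 1, 1, 1),
                            stride=(100, 1, 1, 1),
                            block=(3, 1, 256, 256))

    # Contiguous memory-side selection of the same size
    arr = np.empty((180, 1, 256, 256), dtype=np.uint16)
    mspace = h5s.create_simple(arr.shape)

    ds.id.read(mspace, fspace, arr)       # this is the H5Dread call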
Background info: dataset shape is (76800, 1, 256, 256), data type is uint16, storage is chunked with chunk shape (32, 1, 256, 256). There should be no data type conversion involved (my array in memory has the same datatype as the dataset). I’m using the default file driver on Linux, with h5py 3.4.0 and HDF5 1.12.1.
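In case anyone wants to reproduce the layout, creating an equivalent dataset looks roughly like this (a sketch; the file and dataset names are made up, and the contents don’t matter for the access pattern):

import numpy as np
import h5py

# Empty dataset with the same shape, dtype and chunking as described above
with h5py.File("repro.h5", "w") as f:
    f.create_dataset(
        "images",
        shape=(76800, 1, 256, 256),
        dtype=np.uint16,
        chunks=(32, 1, 256, 256),
    )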
In the 2012 thread, @epourmal said that this was a known issue, and it had been entered in the bug tracker. Does the same issue still affect the latest version of HDF5? Or has that been fixed and I’m seeing something similar but with a different root cause?
If necessary, I can probably come up with a workaround using matching selections and then copying the data together afterwards, similar to what Chris described in that thread (something like the sketch below). Even from Python, that could be significantly faster. But it’s fiddly, and I’d much rather discover that it’s not necessary.
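For concreteness, that workaround would look roughly like this (a sketch only; the file and dataset names are made up, and I haven’t benchmarked this version):

import numpy as np
import h5py

sel = h5py.MultiBlockSlice(start=0, stride=100, count=60, block=3)

with h5py.File("data.h5", "r") as f:
    ds = f["images"]
    # Scratch array big enough to hold the matching selection
    # (the last selected index along the first axis is 59 * 100 + 2 = 5902)
    scratch = np.empty((5903,) + ds.shape[1:], dtype=ds.dtype)
    # Matching source/dest selections - the fast path from above
    ds.read_direct(scratch, source_sel=sel, dest_sel=sel)

# Gather the 60 blocks of 3 entries into a contiguous (180, 1, 256, 256) array
idx = (np.arange(60)[:, None] * 100 + np.arange(3)).ravel()
compact = scratch[idx]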