H5Dread performance issue with non-contiguous hyperslab selections

Hi,

I think I've come across a performance issue with H5Dread when reading
non-contiguous hyperslab selections. The use case in my software is a bit
complicated, so I put together a small example that shows the same
issue. Please let me know if I'm missing something here; it's possible
that a different approach would work much better.

In my example I write a 2D native int chunked dataset to an HDF5 file
(adapted from the h5_extend example, now writes a 229 MB file). I then
construct a hyperslab selection of the dataset and read it back using a
single call to H5Dread. When I use a stride of 1 (so all elements of the
selection are contiguous) the read is very fast. However, when I set the
stride to 2, the read becomes roughly 15 times slower.
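
To illustrate what "contiguous" means here, below is a minimal sketch in plain C (no HDF5 calls) that counts how many separate runs of adjacent indices a strided selection produces along one dimension. The helper name `count_runs` is hypothetical and is not part of the HDF5 API or the attached benchmark. Its parameters mirror the start/stride/count/block arguments of a hyperslab selection, and it assumes stride >= block so that blocks do not overlap.

```c
#include <stddef.h>

/* Hypothetical helper (not an HDF5 function).
 * Models a 1-D selection made of `count` blocks of `block` indices each,
 * where block i begins at start + i * stride.
 * Returns the number of runs of adjacent indices in that selection.
 * Assumes stride >= block, so blocks never overlap. */
size_t count_runs(size_t start, size_t stride, size_t count, size_t block)
{
    size_t runs = 0;
    size_t prev_end = 0; /* one past the last index of the previous block */

    for (size_t i = 0; i < count; i++) {
        size_t begin = start + i * stride;
        /* A new run starts whenever this block does not directly
         * follow the previous one. */
        if (i == 0 || begin != prev_end)
            runs++;
        prev_end = begin + block;
    }
    return runs;
}
```

For example, `count_runs(0, 1, 2000, 1)` gives 1 (a single contiguous run), while `count_runs(0, 2, 2000, 1)` gives 2000, one run per selected index. This only describes the shape of the selection along the strided dimension; it says nothing by itself about how the library processes it.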

The dataset has a chunk shape of 1000x500, and dimension 0 is the one
being tested with a stride of 1 and 2. Is this slowdown typical for a
stride of 2? With a chunk size of 1000 along that dimension, a stride of 1
and a stride of 2 should both need to read the same amount of data from the
file, so I would expect similar performance.
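
The arithmetic behind that expectation can be sketched in plain C (no HDF5 calls). The helper name `chunks_touched` and the 4000-row extent used in the example are hypothetical, chosen only for illustration; they are not taken from the attached benchmark. The sketch assumes a block size of 1 and a chunk length greater than zero.

```c
#include <stddef.h>

/* Hypothetical helper (not an HDF5 function).
 * Counts how many distinct chunks along one dimension contain at least
 * one index of the selection start, start + stride, ...,
 * start + (count - 1) * stride.
 * Assumes chunk_len > 0 and a block size of 1. */
size_t chunks_touched(size_t start, size_t stride, size_t count,
                      size_t chunk_len)
{
    size_t touched = 0;
    size_t last_chunk = 0;

    for (size_t i = 0; i < count; i++) {
        size_t chunk = (start + i * stride) / chunk_len;
        /* Indices increase monotonically, so a chunk is new exactly
         * when it differs from the previous index's chunk. */
        if (i == 0 || chunk != last_chunk)
            touched++;
        last_chunk = chunk;
    }
    return touched;
}
```

Over a hypothetical extent of 4000 rows with a chunk length of 1000, `chunks_touched(0, 1, 4000, 1000)` and `chunks_touched(0, 2, 2000, 1000)` both return 4: the stride-2 selection picks half as many rows but still touches every chunk. If the count were held fixed instead, as in `chunks_touched(0, 2, 4000, 1000)`, the selection would span twice as many rows and touch 8 chunks.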

I've profiled the stride-2 scenario under Valgrind (using the callgrind
tool), and it shows that 95% of the time is spent in H5S_select_iterate.
That makes this program CPU bound and nowhere near I/O bound. I can share
the callgrind output if it helps. Any ideas on how to optimize this
function or otherwise improve the performance of this use case?

Thanks,
Chris LeBlanc

h5dread_hyperslab_benchmark1.c (5.34 KB)
