Hi Neil,
On Tuesday 31 March 2009, Neil Fortner wrote:
> This is happening because (in the "time for two trailing indices"
> case) the individual chunks are not contiguous in memory, as Ger
> pointed out. Also, because the chunks are larger than the chunk cache
> size (default=1 MB), the library makes a best effort to avoid
> allocating enough memory to hold a whole chunk. Therefore it reads
> directly from the file into the supplied read buffer. Because the
> selection in the read buffer (for each chunk) is a series of small
> non-contiguous blocks, the library must make a large number of small
> reads.
I see. That makes sense.
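To make the "large number of small reads" concrete, here is a rough sketch (plain Python, not HDF5 code) that counts the contiguous runs a hyperslab selection occupies in a C-ordered read buffer. The buffer shapes are assumptions taken from the 1978*1556*2 array used in my NumPy test further down, I am assuming each chunk covers a single index of the sliced dimension, and `contiguous_blocks` is a hypothetical helper, not a library function.

```python
from math import prod

def contiguous_blocks(shape, count):
    """Return (n_blocks, block_len) for a hyperslab that selects count[d]
    consecutive indices along each dimension d of a C-ordered buffer."""
    run = 1
    # Walk from the fastest-varying dimension outwards; runs keep merging
    # only while the selection covers a dimension completely.
    for dim_len, sel_len in zip(reversed(shape), reversed(count)):
        run *= sel_len
        if sel_len != dim_len:
            break
    return prod(count) // run, run

# Read buffer for [0,:,:,0:2] (assumed shape): one chunk fills every other element.
trailing = contiguous_blocks((1978, 1556, 2), (1978, 1556, 1))
# Read buffer for [0:2,:,:,0] (assumed shape): one chunk fills one solid half.
leading = contiguous_blocks((2, 1978, 1556), (1, 1978, 1556))

print("trailing-index case: %d blocks of %d element(s)" % trailing)
print("leading-index case:  %d block(s) of %d elements" % leading)
```

Under these assumptions the trailing-index read needs one tiny block per element of the chunk, while the leading-index read needs a single block per chunk.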
> To improve performance, you can increase the chunk cache size with
> H5Pset_cache (or the new H5Pset_chunk_cache function if you're using
> the latest snapshot). The test runs in about 0.7 seconds with this
> change on my laptop, down from ~30 seconds. This is still more time
> than for the contiguous case, because the library must allocate the
> extra space and scatter each element individually from the cache to
> the read buffer, but now only calls read once for each chunk.
Yes, it works a lot better now! However, after setting the chunk cache
size to 12 MB (a bit larger than my chunk size, which is 11.8 MB), the
performance is still a long way from optimal, IMHO. Look at these
numbers:
For a default chunk cache size (1 MB):
time for [0:2,:,:,0] --> 0.15 s
time for [0,:,:,0:2] --> 7.615 s
With an increased chunk cache size (12 MB):
time for [0:2,:,:,0] --> 0.165 s
time for [0,:,:,0:2] --> 1.312 s
So, although the new time is around 6x better, it is still almost 8x
slower than the contiguous case. To estimate how long it could take
to scatter each element from the cache to the read buffer, I've timed
a similar operation with NumPy:
In [28]: a = np.arange(1978*1556*2, dtype="int32")
In [29]: b = np.empty(1978*1556*2, dtype="int32")
In [30]: timeit b[:b.size//2] = a[::2]
10 loops, best of 3: 40.7 ms per loop
In [31]: timeit b[b.size//2:] = a[1::2]
10 loops, best of 3: 40.1 ms per loop
In [32]: a
Out[32]: array([0, 1, 2, ..., 6155533, 6155534, 6155535])
In [33]: b
Out[33]: array([0, 2, 4, ..., 6155531, 6155533, 6155535])
As can be seen, the scatter process completes in around 40 + 40 = 80 ms
for my chunk size. Hence, I'd expect the second read case to complete
in around 0.17 + 0.08 = 0.25 seconds, while the actual time is still 5x
slower. So, unless I'm missing something, my guess is that the scatter
code in the HDF5 library could be made a lot faster.
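For reference, the arithmetic behind these ratios can be re-derived with a few lines of Python. This is only a back-of-the-envelope check using the timings quoted in this message; the variable names are mine, and it says nothing about where the HDF5 library actually spends its time.

```python
# Timings in seconds, copied from the measurements in this message.
default_trailing = 7.615   # [0,:,:,0:2] with the default 1 MB chunk cache
tuned_leading = 0.165      # [0:2,:,:,0] with a 12 MB chunk cache
tuned_trailing = 1.312     # [0,:,:,0:2] with a 12 MB chunk cache
scatter = 0.0407 + 0.0401  # the two NumPy strided copies

speedup = default_trailing / tuned_trailing   # gain from the larger cache
gap = tuned_trailing / tuned_leading          # remaining gap to the contiguous case
expected = tuned_leading + scatter            # contiguous read plus scatter cost
shortfall = tuned_trailing / expected         # measured time vs. that estimate

print("speedup from larger cache: %.1fx" % speedup)
print("gap to contiguous read:    %.1fx" % gap)
print("expected trailing time:    %.2f s" % expected)
print("measured / expected:       %.1fx" % shortfall)
```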
Thanks,
--
Francesc Alted
"One would expect people to feel threatened by the 'giant
brains or machines that think'. In fact, the frightening
computer becomes less frightening if it is used only to
simulate a familiar noncomputer."
-- Edsger W. Dijkstra
"On the cruelty of really teaching computer science"
----------------------------------------------------------------------
This mailing list is for HDF software users discussion.