reading multiple irregularly spaced hyperslabs

I am reading hyperslabs from a medium-sized gzip-compressed HDF5 file generated
by the 1.6.5 library. The dataset is around 1000 x 20000 x 20, and I am reading
around 1000 slabs of size 1000 x 1 x 20; the slabs are irregularly spaced. The
reads are somewhat slower than I expected based on the performance of a toy
version of the problem. In my read algorithm, I am essentially doing the
following loop:

for each slab {
   H5Sselect_hyperslab(space_id, H5S_SELECT_SET, start, stride, count, block)
   H5Dread( etc )
}

Should I expect better performance if, instead, I construct a union of the
hyperslabs and then do a single read, like so:

initialize the selection as empty?
for each slab {
   H5Sselect_hyperslab(space_id, H5S_SELECT_OR, start, stride, count, block)
}
H5Dread( etc )

Which method is preferred, and why?

thanks,

--sep

[ Steven E. Pav {bikes/bitters/linux} nerd ]
[ a palindrome: stacks ask cats ]

Hi Steven,



  The latter will probably perform somewhat better. Much more important, though, is aligning your hyperslabs with the dataset's chunk boundaries. If you can't align the hyperslabs and chunk boundaries, increase the chunk cache size so several chunks can stay in memory.

  Quincey


On Mar 3, 2010, at 1:12 PM, steven e. pav wrote:


