Slow data access

I've recently started using HDF5 to store ~2.5GB datasets, with the hope of
moving to much, much larger datasets in the future. However, accessing my data
seems to be slower than I had hoped, and I'm wondering if I'm going about
things the wrong way.

The data is a 3D image set of ~3 billion voxels, with dimensions of
1760x1024x1878. The data is accessed along all three dimensions to create 2D
images. For example, the "slice" at index 325 in the Z dimension gives a 2D
image that is 1760x1024, while the slice at 325 in the X direction gives an
image that is 1024x1878.

I'm storing the data in the HDF5 file as a 4D array (the 4th dimension being
the RGB info for each voxel). I'm using a chunk size of 32x32x32x3, and the
data is gzipped. The data was written with PyTables and is being read from
C++. The metadata output of HDFView looks like:

8-bit unsigned character, 1878 x 1024 x 1760 x 3
Number of attributes = 3
   CLASS = CARRAY
   VERSION =1.0
   TITLE =

I'm accessing the data using hyperslabs, much like the example code. I can
provide code snippets if necessary.
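For reference, here is a minimal sketch of the kind of per-plane hyperslab
read I'm doing (file and dataset names, the axis order, and the plane index
are illustrative, and error checking is omitted):

// Read one Z plane (1760 x 1024 x 3 bytes of uint8) out of the 4D dataset
// using a hyperslab selection.
#include <hdf5.h>
#include <vector>

int main() {
    hid_t file = H5Fopen("volume.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset = H5Dopen2(file, "/data", H5P_DEFAULT);
    hid_t filespace = H5Dget_space(dset);

    // File layout (as reported by HDFView): 1878 x 1024 x 1760 x 3.
    hsize_t start[4] = {325, 0, 0, 0};      // Z plane index 325
    hsize_t count[4] = {1, 1024, 1760, 3};  // one full plane, all RGB components
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);

    hid_t memspace = H5Screate_simple(4, count, NULL);
    std::vector<unsigned char> plane(1024ULL * 1760 * 3);
    H5Dread(dset, H5T_NATIVE_UCHAR, memspace, filespace, H5P_DEFAULT,
            plane.data());

    H5Sclose(memspace);
    H5Sclose(filespace);
    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}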

The problem is the time it takes to access the data. It is quite a bit slower
than I had hoped.

X slab: 2.17553 seconds
Y slab: 4.19333 seconds
Z slab: 3.09807 seconds

This is on OS X, with a 2.8 GHz i7 processor and 8GB of memory. I saw
similar results when I ran some tests on an EC2 Linux box.

I didn't find anything on the HDF5 website or elsewhere describing best
practices when it comes to laying out the data. Is there a better way? Is
there some trick that I'm missing? My biggest concern is that data access
will slow down even more as I move to larger datasets.

Thanks for any help!

Eric Reid

On Wednesday 27 January 2010 17:15:16, Eric Reid wrote:

> I didn't find anything on the HDF5 website or elsewhere describing best
> practices when it comes to laying out the data. Is there a better way? Is
> there some trick that I'm missing? My biggest concern is that data access
> will slow down even more as I move to larger datasets.

I've been thinking about your problem, and I think you can try this approach:
set up three different versions of your dataset, each one optimized for walking
the data along one dimension. For example, for your 1760x1024x1878 dataset, you
could have one dataset with a chunkshape of (1, 1024, 1878) for getting your
images along the x axis, another with (1760, 1, 1878) for walking images along
the y axis, and finally one with (1760, 1024, 1) for the z axis.

This way, instead of having to read hundreds of MB per image, you only have to
read a few MB, giving a speed-up of around 100x. Of course, you will need 3x
more space and some level of indirection in your application to pick the
correct array for a requested slice, but if access performance is what you
need, that can be a reasonable price for a roughly 100x improvement in access
speed.
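
A rough sketch of what building such a file could look like (shown with the
HDF5 C API to match your C++ reader; the PyTables chunkshape argument achieves
the same layout, the names are illustrative, and the RGB axis is left out for
brevity):

// Create three copies of the volume, each chunked so that one plane along a
// given axis touches as few chunks as possible.
#include <hdf5.h>

int main() {
    const hsize_t dims[3] = {1760, 1024, 1878};   // x, y, z extents
    const hsize_t chunks[3][3] = {
        {1, 1024, 1878},    // optimized for planes along x
        {1760, 1, 1878},    // optimized for planes along y
        {1760, 1024, 1},    // optimized for planes along z
    };
    const char *names[3] = {"/volume_x", "/volume_y", "/volume_z"};

    hid_t file = H5Fcreate("volume_by_axis.h5", H5F_ACC_TRUNC,
                           H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(3, dims, NULL);

    for (int i = 0; i < 3; ++i) {
        hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_chunk(dcpl, 3, chunks[i]);
        H5Pset_deflate(dcpl, 6);                  // keep gzip compression
        hid_t dset = H5Dcreate2(file, names[i], H5T_NATIVE_UCHAR, space,
                                H5P_DEFAULT, dcpl, H5P_DEFAULT);
        // ... write the same voxel data into each of the three datasets ...
        H5Dclose(dset);
        H5Pclose(dcpl);
    }

    H5Sclose(space);
    H5Fclose(file);
    return 0;
}

Your application then just picks /volume_x, /volume_y, or /volume_z (or
whatever you call them) depending on the axis of the requested slice.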

Cheers,


--
Francesc Alted

I just wanted to follow up, in case anyone has the same problem in the
future. Hopefully it helps someone.

I'm now using 16x16x16x3 chunks, as that was slightly faster with my new
approach.

I've switched my code so that instead of trying to read one slice of data, I
read a whole chunk's worth of data. In other words, if I want the image from
slice 10 in the X direction, I read slices 0-15 into memory, then do a quick
memcpy to get the data that I want. Using this approach, my data access
times to read an image from a new chunk are:

X chunk: 0.62 seconds
Y chunk: 1.08 seconds
Z chunk: 0.75 seconds

Additional image reads from the same chunk are on the order of 1-2
milliseconds.

I'm not sure why it is so much faster to read in a whole chunk and pull my
data out by hand, rather than letting the HDF5 routines just grab the slice
I need, but that's what I've found.
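
In case it's useful, here is a simplified sketch of the chunk-aligned read
(dataset name, axis order, and sizes are hardcoded for illustration and follow
the HDFView layout of 1878 x 1024 x 1760 x 3):

// To get the X plane at index 10, read the whole 16-plane slab that contains
// it (X indices 0-15), then memcpy the wanted plane out of the slab.
#include <hdf5.h>
#include <cstring>
#include <vector>

int main() {
    const hsize_t NZ = 1878, NY = 1024, NC = 3, SLAB = 16;
    const hsize_t x_wanted = 10, x0 = (x_wanted / SLAB) * SLAB;

    hid_t file = H5Fopen("volume.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset = H5Dopen2(file, "/data", H5P_DEFAULT);
    hid_t filespace = H5Dget_space(dset);

    hsize_t start[4] = {0, 0, x0, 0};
    hsize_t count[4] = {NZ, NY, SLAB, NC};
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(4, count, NULL);

    std::vector<unsigned char> slab(NZ * NY * SLAB * NC);
    H5Dread(dset, H5T_NATIVE_UCHAR, memspace, filespace, H5P_DEFAULT,
            slab.data());

    // Copy the single X plane (NZ x NY x NC bytes) out of the slab.
    std::vector<unsigned char> plane(NZ * NY * NC);
    const hsize_t xoff = x_wanted - x0;
    for (hsize_t z = 0; z < NZ; ++z)
        for (hsize_t y = 0; y < NY; ++y)
            std::memcpy(&plane[(z * NY + y) * NC],
                        &slab[((z * NY + y) * SLAB + xoff) * NC], NC);

    H5Sclose(memspace);
    H5Sclose(filespace);
    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}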

Eric


Hi Eric,

Will the access pattern be such that you read a single plane (in X, Y,
or Z) or will there be an iteration through planes?

You use a chunk size of almost 100K, which sounds a bit large to me.
Reading a plane in X requires reading 1024/32 * 1878/32 chunks, which is
about 178 MB. Is 2 seconds what you expect for that?
I assume that the data was not already in the kernel's file cache.
You'll see that if you size the chunk cache correctly, the next plane
will be much faster.
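
For example, something along these lines sizes the per-dataset chunk cache
when opening the dataset (HDF5 1.8.3 or later; the numbers are illustrative
and should be chosen so the cache holds at least one plane's worth of chunks):

// Enlarge the chunk cache for this dataset so the chunks touched by one
// plane are still cached when the next plane is read.
#include <hdf5.h>

int main() {
    hid_t file = H5Fopen("volume.h5", H5F_ACC_RDONLY, H5P_DEFAULT);

    hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
    // ~512 MB of cache, a generous number of hash slots (ideally a prime,
    // much larger than the number of cached chunks), default preemption.
    H5Pset_chunk_cache(dapl, 50021, 512UL * 1024 * 1024, 0.75);

    hid_t dset = H5Dopen2(file, "/data", dapl);
    // ... hyperslab reads as before ...
    H5Dclose(dset);
    H5Pclose(dapl);
    H5Fclose(file);
    return 0;
}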

Using smaller chunks (say 16*16*16*3) requires reading less data, but
gives a bit more overhead. You could try that.

Is access in X, Y, and Z equally important? If so, equal chunk sizes are
the right thing to do. Otherwise you can bias them.

Cheers,
Ger


On Wednesday 27 January 2010 18:18:21, Ger van Diepen wrote:

> Hi Eric,
>
> Will the access pattern be such that you read a single plane (in X, Y,
> or Z) or will there be an iteration through planes?
>
> You use a chunk size of almost 100K, which sounds a bit large to me.
> Reading a plane in X requires reading 1024/32 * 1878/32 chunks, which is
> about 178 MB. Is 2 seconds what you expect for that?

He said he is using zlib, so my guess is that 178 MB in 2 seconds (~90 MB/s) is
about what zlib can deliver on his machine. Which compression ratio are you
achieving with your dataset? You can try the LZO compressor that comes with
PyTables; it should be faster than zlib. You may also want to disable
compression entirely (if your disk subsystem can deliver more than 100 MB/s).

But, as Ger said, reading such big datasets fast along *any* dimension is
certainly difficult with any technology on earth ;-)


--
Francesc Alted

Ger,

Thanks for the thought. I already ran tests on chunk sizes, stepping by 8 from
8 to 96 (i.e. 8x8x8x3, 16x16x16x3, etc.). 32x32x32x3 gave me the best
results. Every dimension is equally important.

I'm generally reading in one plane at a time, but there will certainly be
cases where users will look at sequential images one after another.

Eric
