Performance hints for large dataset

Hi all,

I'm working with floating-point data, building up a very large dataset,
typically >100 GB, with four dimensions (x, y, z, w).
The dimensions are (x, y, z, w) = (601, 482, 61, 1501) in my example.

The aim is to slice (READING ONLY) this dataset in orthogonal directions:
1) (x, *, *, *)
2) (*, y, *, *)
3) (*, *, z, w)

When using a contiguous layout I naturally get good performance in
directions (1) and (2), but it is very poor for (3).
A chunked layout of (8, 8, 8, 8) seems to give the best balance so far,
with reasonable access times in all directions, but still not as fast as I
was hoping for. My tests also show that compression improves read
performance slightly.
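For reference, the tradeoff behind the (8, 8, 8, 8) choice can be sketched with a little arithmetic (a rough sketch, assuming float32 data and that every chunk intersecting a slice must be read in full):

```python
import math

dims  = (601, 482, 61, 1501)   # (x, y, z, w)
chunk = (8, 8, 8, 8)

# Number of chunks along each axis and raw bytes per chunk (float32).
nchunks = [math.ceil(d / c) for d, c in zip(dims, chunk)]
chunk_bytes = 4 * math.prod(chunk)

# Chunks touched by one slice in each direction: every chunk whose
# extent intersects the slice has to be fetched in full.
touched_x  = nchunks[1] * nchunks[2] * nchunks[3]   # (x, *, *, *)
touched_y  = nchunks[0] * nchunks[2] * nchunks[3]   # (*, y, *, *)
touched_zw = nchunks[0] * nchunks[1]                # (*, *, z, w)

print(nchunks, chunk_bytes)   # [76, 61, 8, 188] 16384
print(touched_x, touched_y, touched_zw)
```

So a small cubic chunk keeps all three directions affordable, at the cost of many small reads.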

I'm looking for advice on possible optimization techniques for this
problem, other than what has been mentioned.
Otherwise, is my only option to move to some (expensive?) parallel solution?

Thanks!

Regards,
Martin

Hi,

Unfortunately, this is indeed the worst case you can have. It is
completely normal to see the worst performance when slicing in these
dimensions. Even with a parallel filesystem, you would need to read
EVERYTHING from the dataset, and the library would then pick out the
pieces you need.
One solution would be to agglomerate several z,w slices into extra
dimensions 5 and 6, so that you still get some performance there, but it
will be worse than direction 1 or even 2.

Cheers,

Matthieu


2014-06-12 20:43 GMT+01:00 Martin Sarajærvi <balony@gmail.com>:


_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/

Hi Martin,

Have you set the chunk cache sufficiently large? Otherwise it will
reread the same chunks again and again. Although the system file cache
might hold all those data, I think it's better to size the chunk cache
correctly because of the lookups HDF5 is doing.
E.g. in the case of (*, y, *, *) you'll need a cache of 601*8*61*1501
floats (1.64 GiB). I assume you have sufficient memory; otherwise you
could adjust the chunk size, especially in z and w.
Your chunks are not particularly large (16384 bytes), leading to a lot
of I/O operations and a large B-tree to index the chunks. On the other
hand, when enlarging the chunks, you'll need more memory for the chunk
cache.
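In h5py, for instance, the chunk cache can be set when opening the file via the `rdcc_*` keywords (h5py's handles on HDF5's per-file chunk cache, `H5Pset_cache`). A minimal sketch, using a toy file so it is self-contained; the file and dataset names are made up, while the cache arithmetic uses the real dimensions from this thread:

```python
import os
import tempfile

import h5py
import numpy as np

# Create a small stand-in file with the same (8, 8, 8, 8) chunking.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("dset", data=np.zeros((16, 16, 16, 16), dtype="f4"),
                     chunks=(8, 8, 8, 8))

# Cache big enough to hold every chunk touched by one (*, y, *, *) slab.
cache_bytes = 601 * 8 * 61 * 1501 * 4        # ~1.64 GiB
with h5py.File(path, "r",
               rdcc_nbytes=cache_bytes,      # total chunk-cache size in bytes
               rdcc_nslots=1_000_003,        # hash slots: prime, >> cached-chunk count
               rdcc_w0=0.75) as f:
    plane = f["dset"][:, 4, :, :]            # a (*, y, *, *) read
```

With the C API the same settings go through `H5Pset_cache` / `H5Pset_chunk_cache` on the access property list.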

What is the pattern when accessing the data as (*, *, z, w)? First w, and
thereafter all z? You'll need a much smaller cache when accessing it
like

    for w in 0:nw/ncw    (nw is the length of the w-axis; ncw is the chunk size in w)
      for z in 0:nz/ncz
        for w1 in 0:ncw
          for z1 in 0:ncz

In this way you handle a full z,w chunk before moving to the next one,
so your cache size needs to be only 601*482*8*8 floats (~74 MB).
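In pure Python the loop nest above looks like this (toy axis lengths standing in for the real 61 and 1501, chunk sizes as in the thread):

```python
# Visit every (z, w) element, but finish each (ncz x ncw) chunk before
# moving on, so each chunk only has to be fetched from disk once.
nz, nw = 16, 24          # toy stand-ins for the real axis lengths
ncz, ncw = 8, 8          # chunk sizes along z and w

order = []
for wc in range(nw // ncw):              # outer loops walk the chunk grid ...
    for zc in range(nz // ncz):
        for w1 in range(ncw):            # ... inner loops walk within one chunk
            for z1 in range(ncz):
                order.append((zc * ncz + z1, wc * ncw + w1))

# The first ncz*ncw visits all fall inside the first chunk (z, w < 8).
```

Each (z, w) pair is visited exactly once, grouped chunk by chunk, which is what keeps the required cache down to a single row of chunks.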

I have a program that tests 3D datasets of arbitrary size and chunk
size, using a cache size that depends on the chunk size and access
pattern. If you like, I can send it.

Cheers,
Ger

Matthieu Brucher <matthieu.brucher@gmail.com> 6/12/2014 10:56 PM


Hi Ger,

Thanks for your reply.

I have played a bit with the cache size, but not tried your particular
suggestion.

Sure, I could optimize the chunk sizes for each of the slicing directions,
but this does not really solve my problem, as I can only have one chunk
setting per dataset (or am I missing something?). So if I optimize
for (*, y, *, *), including adjusting the cache settings, the (*, *, z, w)
slicing would still be slow.

I would be interested in checking your program for chunk/cache size testing
(running Linux here).

Best regards,
Martin


On Mon, Jun 16, 2014 at 8:26 AM, Ger van Diepen <diepen@astron.nl> wrote:
