[pHDF5] Reading from or writing to an nD dataset as a 1D array

Hi,

I'm trying to optimize how data is read from our filesystem, which
mainly means having several readers each read chunks of 1 MiB. My
problem is that sometimes the data is not just one element per cell
(1 x nb cells), but may be 2, 3, or 10 elements per cell. This means
that I may not be able to read/write exactly 1 MiB if the data is
stored as a (say) 2D array. Is there a way to read from such an array
as if it were a 1D array? Recomposing the data is not a problem for
me, as I have a mapping from each data id to its location on each
process.
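
For concreteness, here is roughly what I am hoping is possible: pair
an nD selection in the file with a flat 1D memory dataspace of the
same element count. A sketch in C (file name, dataset name, and sizes
are all made up):

#include <stdlib.h>
#include "hdf5.h"

int main(void)
{
    hid_t file   = H5Fopen("data.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset   = H5Dopen2(file, "/cells", H5P_DEFAULT);
    hid_t fspace = H5Dget_space(dset);

    /* the 2D block this reader is responsible for */
    hsize_t start[2] = {0, 0};
    hsize_t count[2] = {65536, 2};   /* nb cells x elements per cell */
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);

    /* describe memory as one flat 1D array with the same element count */
    hsize_t nelems = count[0] * count[1];
    hid_t mspace = H5Screate_simple(1, &nelems, NULL);

    double *buf = malloc(nelems * sizeof *buf);   /* 65536*2*8 = 1 MiB */
    H5Dread(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, buf);

    /* ... recompose using the data id -> location mapping ... */

    free(buf);
    H5Sclose(mspace);
    H5Sclose(fspace);
    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}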

Cheers,

Matthieu Brucher

···

--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/

If you turn on collective I/O, it's more than likely that the
underlying MPI-IO implementation will sort out the uneven distribution
among processes, coalesce small accesses into fewer, larger accesses,
and maybe even designate "aggregator" nodes to assist with scalability.

You'll get all this without needing to recompose the data.
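
In HDF5 terms that's just a data transfer property. A rough sketch,
assuming a parallel HDF5 build and a file opened through the MPI-IO
driver (H5Pset_fapl_mpio); each rank selects its own piece of the file
dataspace, then all ranks make the call together:

/* sketch: dset, mspace, fspace, buf as set up elsewhere */
hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
/* collective: every rank participates in the same H5Dread call */
H5Dread(dset, H5T_NATIVE_DOUBLE, mspace, fspace, dxpl, buf);
H5Pclose(dxpl);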

==rob

···

On Fri, Oct 25, 2013 at 12:22:40PM +0200, Matthieu Brucher wrote:

Hi,

I'm trying to optimize how data is read from our filesystem, which
mainly means having several readers each read chunks of 1 MiB. My
problem is that sometimes the data is not just one element per cell
(1 x nb cells), but may be 2, 3, or 10 elements per cell. This means
that I may not be able to read/write exactly 1 MiB if the data is
stored as a (say) 2D array. Is there a way to read from such an array
as if it were a 1D array? Recomposing the data is not a problem for
me, as I have a mapping from each data id to its location on each
process.

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

Hi,

All my tests with collective I/O indicate that I should not use it.
For reference, my collective I/O tests assumed the data was already
properly sorted before being written, and writing it from different
nodes, or from a specific node after local gathering, was 3 or 4 times
slower than writing raw files. With my new strategy the data is sorted
properly on my side (I don't even know how to tell HDF5 which pieces
go where) and is written on the fly at almost maximum bandwidth.
Also, in my test case the data was almost evenly distributed (1
million values per process)...
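
Roughly, each process now does something like the following (the names
are hypothetical and I assume a 1D dataset here), with the offset and
count coming from my id-to-location mapping:

/* each process writes one contiguous ~1 MiB run of a 1D dataset */
hsize_t offset = my_flat_offset;   /* hypothetical, from the mapping */
hsize_t count  = my_nelems;        /* hypothetical, ~1 MiB of elements */
hid_t fspace = H5Dget_space(dset);
H5Sselect_hyperslab(fspace, H5S_SELECT_SET, &offset, NULL, &count, NULL);
hid_t mspace = H5Screate_simple(1, &count, NULL);
H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, buf);
H5Sclose(mspace);
H5Sclose(fspace);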

Thanks,

Matthieu

···

2013/10/25 Rob Latham <robl@mcs.anl.gov>:

If you turn on collective I/O, it's more than likely that the
underlying MPI-IO implementation will sort out the uneven distribution
among processes, coalesce small accesses into fewer, larger accesses,
and maybe even designate "aggregator" nodes to assist with scalability.

You'll get all this without needing to recompose the data.

==rob

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/