Interleaving data?

Hi All,

If there are two processes, and the rank 0 process has data that looks like:
0 ... 9
20 ... 29
40 ... 49
...

While rank 1 has:

10 ... 19
30 ... 39
...

In this example, I have MPI_CHUNK_SIZE = 10 and the following relevant
configurations:

  /* file dataset and memory dataspace dimensions (just 1D here) */
  hsize_t dimsm[] = {chunk_count * MPI_CHUNK_SIZE};
  hsize_t dimsf[] = {dimsm[0] * mpi_size};

  /* hyperslab offset and size info */
  hsize_t start[] = {mpi_rank * MPI_CHUNK_SIZE * chunk_count};
  hsize_t count[] = {chunk_count * MPI_CHUNK_SIZE};
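
For reference, here is roughly how I apply these (a sketch; dset_id, data, and the other handle names are placeholders, and error checking plus the usual MPI-IO file/dataset setup are omitted):

  hid_t filespace = H5Screate_simple(1, dimsf, NULL);
  hid_t memspace  = H5Screate_simple(1, dimsm, NULL);

  /* one contiguous hyperslab per rank: stride and block default to 1 element */
  H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);

  hid_t plist_id = H5Pcreate(H5P_DATASET_XFER);
  H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_COLLECTIVE);
  H5Dwrite(dset_id, H5T_NATIVE_INT, memspace, filespace, plist_id, data);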

This writes all the data, but with everything from rank 0 placed before
everything from rank 1, rather than interleaved in order:
0 ... 9
10 ... 19
20 ... 29
30 ... 39
40 ... 49
...

Is this possible with pHDF5?

Best,
Brandon

I also want to point out that my data is indeed a 1d array; I simply
printed it along multiple rows to make it more readable (and to
reflect how h5dump would show it).

Mohamad Chaarawi replied:

Note that chunking has nothing to do with hyperslab selections; it just controls how the data is physically stored on disk and allows the dataset to be extended later if you care to do that.
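
If you do want chunked storage, it is set on the dataset creation property list when the dataset is created, independently of any selection. A rough sketch (file_id, filespace, and the dataset name here are just placeholders):

hsize_t chunk_dims[1] = {MPI_CHUNK_SIZE};
hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_chunk(dcpl, 1, chunk_dims);
hid_t dset_id = H5Dcreate(file_id, "data", H5T_NATIVE_INT, filespace,
                          H5P_DEFAULT, dcpl, H5P_DEFAULT);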

So you can accomplish the layout you want with contiguous storage too.

What you are missing is setting the block and stride parameters for hyperslab selections.
So you need something like the following when selecting your hyperslab in the filespace:

#define BLOCK_SIZE 10
#define NUM_BLOCKS x /* set to how many blocks (of 10 integers each) each process has */

start[0] = mpi_rank * BLOCK_SIZE;
count[0] = NUM_BLOCKS;
block[0] = BLOCK_SIZE;
stride[0] = BLOCK_SIZE * mpi_size;
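
Putting it together, the write would look roughly like this (a sketch; filespace, dset_id, plist_id, and data are placeholder names, and I am assuming integer data):

/* NUM_BLOCKS blocks of BLOCK_SIZE elements per rank, spaced
   BLOCK_SIZE * mpi_size elements apart so the ranks interleave in the file */
H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, stride, count, block);

/* the memory side stays contiguous: each rank's buffer just holds its
   NUM_BLOCKS * BLOCK_SIZE elements in order */
hsize_t dimsm[1] = {NUM_BLOCKS * BLOCK_SIZE};
hid_t memspace = H5Screate_simple(1, dimsm, NULL);

H5Dwrite(dset_id, H5T_NATIVE_INT, memspace, filespace, plist_id, data);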

Thanks,
Mohamad


Hi Mohamad,


> Note that chunking has nothing to do with hyperslab selections; it just controls how the data is physically stored on disk and allows the dataset to be extended later if you care to do that.

Thanks for pointing it out - I believe my chunks are unrelated to the
HDF5 chunking (which I didn't know about), so I should probably change
the terminology to "segments" to hopefully avoid confusion.

> So you can accomplish the layout you want with contiguous storage too.

> What you are missing is setting the block and stride parameters for hyperslab selections.

I actually played around with stride and block a bit, but never got
what I was looking for. I think what I fundamentally missed was that
stride should be in units of elements, not units of blocks - thanks
for the clarification!
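
Just to spell out the arithmetic for anyone finding this later (with BLOCK_SIZE = 10 and mpi_size = 2, so stride = 20):

rank 0: start = 0,  selects 0 ... 9, 20 ... 29, 40 ... 49, ...
rank 1: start = 10, selects 10 ... 19, 30 ... 39, 50 ... 59, ...

which tiles the file dataset in exactly the interleaved order I was after.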


--
Brandon E. Barker
http://www.cac.cornell.edu/barker/