Output Array Slices

Hello,

I have a very specific usage question.

I would like to output a part of an array. Other libraries may call this a
slice. By way of an example, say I have a 3-dimensional array with a size
of 5 in each direction, spread over several separate processes. I would
like to write to a file all the data with y coordinate 3.

I currently use parallel file I/O by hyperslab, following the example here:
https://support.hdfgroup.org/HDF5/Tutor/phypecont.html

It seems to me that there might be a very easy way to do this using the
memory and file dataspaces that doesn't require my code to check each
process for participation in the output, etc., but I have failed to find
examples or documentation that outline the constraints. Any help is
appreciated.

···

--
Aaron Friesz

I take it this is not something that HDF5 supports within the API?

···


--
Aaron Friesz
-----------------------------------------------------
University of Southern California
Department of Electrical Engineering
MicroPhotonic Devices Group
213 740 9208
friesz@usc.edu

"Hdf-forum on behalf of Aaron Friesz" wrote:

I take it this is not something that HDF5 supports within the API?


I am pretty sure you should be able to do what you describe (i.e., write to the file all the data with y coordinate 3 from several separate processors).

Once you have *created* the target dataset (and its dataspace) in the file (which *requires* all processors that opened the file to participate in the creation -- there is no way around that), then each processor that *has data* to write can define its own memory and file dataspaces and then, using independent rather than collective I/O, write its piece of the data.

The trick is in defining the size/shape of the memory and file dataspaces. But, I am pretty confident it can be done. Alas, I don't have any examples to send. But, you could play with your code writing bogus data (e.g. 1's from proc 1, 2's from proc 2, etc.) and then examine the results with h5dump or h5ls to confirm you have code that is touching the correct parts of the array.
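
In rough outline, the shape of it would be something like this (an untested sketch, not working code; file_id is a file you opened with an MPI-IO file access property list, and x_start, x_count, y_local and local[][][] are placeholders for whatever your own decomposition looks like):

hsize_t plane_dims[2] = {5, 5};   /* global (x, z) plane at y == 3 */
hid_t filespace = H5Screate_simple(2, plane_dims, NULL);

/* Dataset creation is collective: every processor that opened the file calls it. */
hid_t dset = H5Dcreate(file_id, "y3_plane", H5T_NATIVE_DOUBLE, filespace,
                       H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

if (y_local >= 0)  /* this processor's block contains global y == 3 */
{
    /* File selection: my x-rows of the plane, all z. */
    hsize_t fstart[2] = {x_start, 0}, fcount[2] = {x_count, 5};
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, fstart, NULL, fcount, NULL);

    /* Memory selection: the y == y_local sheet of my local 3D block. */
    hsize_t mdims[3] = {x_count, 5, 5};
    hsize_t mstart[3] = {0, y_local, 0}, mcount[3] = {x_count, 1, 5};
    hid_t memspace = H5Screate_simple(3, mdims, NULL);
    H5Sselect_hyperslab(memspace, H5S_SELECT_SET, mstart, NULL, mcount, NULL);

    /* Independent transfer, so processors with no data simply skip this block. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_INDEPENDENT);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, &local[0][0][0]);

    H5Pclose(dxpl);
    H5Sclose(memspace);
}
H5Sclose(filespace);
H5Dclose(dset);

Note the memory and file selections have different ranks; HDF5 only requires that they select the same number of elements, so the y sheet can be pulled straight out of the local 3D buffer without packing it into a contiguous array first.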

Mark

--
Mark C. Miller, LLNL

Happiness is a sheer act of will, not simply a by-product
of happenstance. - MCM

Hi Mark, Thanks for your response.

In terms of efficiency, since I will likely do output of this type several
hundred or thousand times in a single run, I will end up defining a new MPI
communicator with just the required processes. I was just hoping that HDF5
could do it for me since writing code to determine which processes should
participate, dataset sizes, etc. is going to be tedious. And I'm sure it
would be better code if the HDF5 team had written it.

Regardless, I'll give it a crack and see if I can't come up with a decent
way to get the job done.

···


"Hdf-forum on behalf of Aaron Friesz" wrote:

Hi Mark, Thanks for your response.

In terms of efficiency, since I will likely do output of this type several hundred or thousand times in a single run, I will end up defining a new MPI communicator with just the required processes.

Well, if you do that, *and* if you want to use only those processors to make HDF5 calls to create datasets *and* write data, then you will also need to *open* the HDF5 file with that new communicator.

I was just hoping that HDF5 could do it for me

Nope. Alas, HDF5 isn't going to help in creating MPI communicators for only those processors with data.

since writing code to determine which processes should participate, dataset sizes, etc. is going to be tedious.

int haveData = 0;
MPI_Comm newComm;
/* set haveData to 1 if this proc has data */
MPI_Comm_split(MPI_COMM_WORLD, haveData, haveData, &newComm);

Now, newComm is a communicator which involves *only* the processors with data ordered in rank according to their old ranks.
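
And, per the earlier point, a sketch of what the file open then looks like (again untested; "slices.h5" is just a stand-in for your output file name):

if (haveData)
{
    /* Only the processors with data open/create the file, by putting the
       new communicator in the file access property list. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, newComm, MPI_INFO_NULL);
    hid_t file_id = H5Fcreate("slices.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* ... create datasets and write as before; now every rank in newComm
       participates in the (collective) creates ... */

    H5Fclose(file_id);
    H5Pclose(fapl);
}
MPI_Comm_free(&newComm);

The processors with haveData == 0 also get a communicator from the split; they can just free it and move on.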

···

--
Miller, Mark C.

"Those who would [, even temporarily,] sacrafice
essential liberties in the name of security deserve
neither." BF*