Collective H5Dopen

Hi all,

the HDF5 documentation states that H5Dopen may be called independently
if the target object is not modified. What exactly counts as a
modification in that case? Does writing to the dataset count, or only
resizing, setting attributes, and the like?

Our application has to write in parallel to on the order of 1 million
independent datasets in a single file. We currently do this by creating
the file up-front and closing it again. Each rank then opens only the
datasets it wants to write to and does its thing. Note that at this
point only the actual data values change - everything else is already
in the file.

At least this is what we would like to do - currently we have to call
H5Dopen for all datasets on each rank to get it to work correctly.
Interestingly enough, it also appears to work if each rank opens exactly
the same number of datasets.

Can H5Dopen() work in this way?

If not: can you advise us on the fastest way of writing 1 million
separate data sets in parallel on many cores?

All the best and thanks!

Lion

"Hdf-forum on behalf of Lion Krischer" wrote:

Hi all,

the HDF5 documentation states that H5Dopen may be called independently
if the target object is not modified. What exactly is the definition of
a modification in that case? Does writing to the data set count or is it
only resizing/attribute setting/...

Hmm. Interesting question. I would think *any* operation that changes data (either
metadata or raw data) is a *modification*. But I think I see where you are
going here... If you've created a (non-extendible) dataset with no checksum or compression
filters etc., all you want to do is change the raw data without perturbing any of the HDF5 file's
metadata. I don't think the HDF5 library would treat that as a non-modification, though. I think a write
operation (even on the raw data) can wind up changing how the library caches dataset and file
metadata in memory, thereby creating a situation where two different tasks have a different idea
of the file's cached metadata. When it comes time to close the file, which task's view of the metadata
is correct?

It's conceivable you could do it manually by obtaining the dataset's offset in the file, calling
H5Tconvert on your buffer if any type conversion is needed, and then writing the
buffer yourself to the file via pwrite or something. That would essentially skirt HDF5, though,
and is probably too complex to be worth it.
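
Roughly along these lines (just a sketch, assuming a contiguous, unfiltered dataset whose memory type already matches the file type so no H5Tconvert call is needed; the function and names are placeholders):

#include <hdf5.h>
#include <fcntl.h>
#include <unistd.h>

/* Sketch only: write a contiguous, unfiltered dataset's raw data directly,
 * bypassing the HDF5 library at write time. Assumes space for the dataset
 * has already been allocated in the file. */
void raw_write(const char *filename, const char *dset_name,
               const void *buf, size_t nbytes)
{
    /* Ask HDF5 where the raw data lives. H5Dget_offset() returns
     * HADDR_UNDEF if no space has been allocated yet. */
    hid_t file = H5Fopen(filename, H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset = H5Dopen(file, dset_name, H5P_DEFAULT);
    haddr_t offset = H5Dget_offset(dset);
    H5Dclose(dset);
    H5Fclose(file);

    /* Write the buffer at that byte offset with plain POSIX I/O. */
    int fd = open(filename, O_WRONLY);
    pwrite(fd, buf, nbytes, (off_t)offset);
    close(fd);
}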

It might be worth looking at H5DOwrite_chunk(), though (https://support.hdfgroup.org/HDF5/doc/HL/RM_HDF5Optimized.html),
to see how that works and whether it achieves, or gets close to, the behavior you want.

Mark


Thanks a lot for your input!

We'll try to use MPI I/O directly to write to the file after getting the
offsets from the initially created file. Thanks for the suggestion! As
we use none of the advanced features, a straight mmap or its MPI I/O
equivalent should work.
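
The plan, roughly (a sketch; the byte offsets would be obtained beforehand from the pre-created file, and the function name and arguments are placeholders):

#include <mpi.h>

// Sketch: each rank writes its values directly at a known byte offset,
// bypassing the HDF5 library at write time. Every dataset is written by
// exactly one rank, so plain independent writes are enough.
void mpi_direct_write(MPI_Comm comm, const char *filename,
                      MPI_Offset offset, const double *buf, int count)
{
    MPI_File fh;
    MPI_File_open(comm, filename, MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_write_at(fh, offset, buf, count, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
}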

Our code guarantees that each dataset is only written to from a single
rank, so that should work and metadata synchronization issues do not
really arise. It would be great if something like this could be
supported directly by the HDF5 library - maybe an H5DOwrite_directly()
that does some very basic checks and then writes straight to the
uncompressed, unchunked, ... dataset without even touching the metadata,
so that it would work fine with parallel I/O?

Does H5DOwrite_chunk() work in parallel code? I cannot really tell from
the documentation, and it also appears to address the much more complicated
problem of efficiently writing chunked data.

We'd still be happy to hear additional suggestions :)

Cheers!

Lion


Hi Lion,

I checked with the developers...
H5DOwrite_chunk() does not work with parallel HDF5.

-Barbara
help@hdfgroup.org


Hi Barbara and Mark,

thanks for getting back to me.

While trying to implement the direct MPI I/O approach we discovered that
HDF5 does not, by default, allocate file space for newly created datasets,
so we could not get the offsets needed for the direct writing. Doing

auto plist = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_layout(plist, H5D_CONTIGUOUS);
H5Pset_alloc_time(plist, H5D_ALLOC_TIME_EARLY);

before calling H5Dcreate() and passing plist does allocate the space -
then obtaining the offsets works.

It turns out that closing the serially written file, reopening it in
parallel, and writing independently with H5Dopen() and H5Dwrite() then
works fine. So I guess writing the raw data of a fully pre-allocated
dataset does not count as a modification in this sense and the calls do
not have to be collective. The files close cleanly and the output is
correct.
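
For anyone who wants to reproduce this, the complete flow is roughly as follows (a simplified sketch; the file name, dataset name, and buffer are placeholders):

// Phase 1 (serial): create the file and pre-allocate every dataset.
hid_t plist = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_layout(plist, H5D_CONTIGUOUS);
H5Pset_alloc_time(plist, H5D_ALLOC_TIME_EARLY);
// ... call H5Dcreate() for every dataset with plist, then close everything ...

// Phase 2 (parallel): reopen the file with the MPI-IO driver.
hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
hid_t file = H5Fopen("example.h5", H5F_ACC_RDWR, fapl);

// Each rank opens and writes only the datasets it owns - independent,
// non-collective calls; only the pre-allocated raw data changes.
hid_t dset = H5Dopen(file, "/path/to/my/dataset", H5P_DEFAULT);
H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buffer);
H5Dclose(dset);

H5Fclose(file);
H5Pclose(fapl);
H5Pclose(plist);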

Hope it helps someone who runs into the same issue!

It would also be very helpful to document what exactly counts as a
modification on the collective/independent call page.

Cheers!

Lion


Hi Lion,

My apologies if my guesses about HDF5 behavior were misleading.

I had hoped that if I stated something patently wrong about HDF5, a more cognizant developer might speak up and correct me ;)

Glad you got things working and thanks for this information.

Mark

"Hdf-forum on behalf of Lion Krischer" wrote:

Hi Barbara and Mark,

thanks for getting back to me.

As we tried to implement direct MPI I/O we discovered that HDF5 does not directly allocate created datasets so we could not get the offsets to do the direct writing. Doing

auto plist = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_layout(plist, H5D_CONTIGUOUS);
H5Pset_alloc_time(plist, H5D_ALLOC_TIME_EARLY);

before calling H5Dcreate(); and passing plist does allocate the data - then the offsets works.

Turns out that closing the serially written file, reopening it in parallel and writing independently with H5DOpen() and H5Dwrite() then works fine. So I guess the modification of the dataset does then not count as a modification and it does not have to be collective. The files close cleanly and the output is correct.

Hope it helps someone who runs into the same issue!

It would also be very helpful to document what exactly counts as a modification the collective/independent call page.

Cheers!

Lion

···

On 20/03/2017 18:13, Barbara Jones wrote:
Hi Lion,

I checked with the developers…
H5DOwrite_chunk() does not work with parallel HDF5.

-Barbara
help@hdfgroup.org<mailto:help@hdfgroup.org>

From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Lion Krischer
Sent: Tuesday, March 14, 2017 3:43 PM
To: hdf-forum@lists.hdfgroup.org<mailto:hdf-forum@lists.hdfgroup.org>
Subject: Re: [Hdf-forum] Collective H5Dopen

Thanks a lot for your input!

We'll try to directly use MPI I/O to write to the file after getting the offsets from the initially created file. Thanks for the suggestion! As we use none of the advanced features a straight mmap or its equivalent with MPI I/O should work.

Our code guarantees that each data set is only written to from a single rank so that should work and metadata synchronization issues do not really exist. It would be great if something like this could be supported directly via the hdf5 library. Maybe a H5DOwrite_directly() that does some very basic checks and then writes straight to the uncompressed, unchunked, .. dataset without even touching the metadata so it should work fine with parallel I/O?
Does H5DOwrite_chunk() work for parallel code? I cannot really tell from the documentation and it also appears to solve the way more complicated problem of efficiently writing chunked data.

We'd still be happy to hear additional suggestions :slight_smile:

Cheers!

Lion

Hmm. Interesting question. I would think *any* operation that changes data (either
metadata or raw data) is a *modification*. But, I think I kinda see where you are
going here...If you've created a (non-extendible) dataset with no checksum or compression
filters etc., all you wanna do is change the raw data but not perterb any of the HDF5 file's
metadata. I don't think the HDF5 library would treat that as a non-modification though. I think a write
operation (even on the raw data) can wind up changing how the library caches dataset and file
metadata in memory thereby creating a situation where two different tasks have a different idea
of the file's cached metadata. When it comes time to close the file, which tasks' view of the metadata
is correct?

Its concievable you could manually do it by obtaining the dataset's offset in the file, calling
any necessary H5Tconvert method on your buffer just prior to writing in and then writing the
buffer yourself to the file via pwrite or something. That would essentially skirt HDF5 though
and probably be too complex to be worth it.

It might be worth looking at, H5DOwrite_chunk() though, HDF5 Optimized Functions
to ask how that works and if it achieves or gets close to behavior you want.

Mark

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org<mailto:Hdf-forum@lists.hdfgroup.org>
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

_______________________________________________

Hdf-forum is for HDF software users discussion.

Hdf-forum@lists.hdfgroup.org<mailto:Hdf-forum@lists.hdfgroup.org>

http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org

Twitter: https://twitter.com/hdf5

_______________________________________________

Hdf-forum is for HDF software users discussion.

Hdf-forum@lists.hdfgroup.org<mailto:Hdf-forum@lists.hdfgroup.org>

http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org

Twitter: https://twitter.com/hdf5

Hi Mark,

no need to be sorry - your hint did lead to a working solution after all :)

Cheers!

Lion
