One dataset per process

When attempting to create one dataset per process inside a single HDF5 file opened in parallel, one must use H5Dcreate(...), which is a collective operation.

Is there a way of creating a different dataset on each process and writing these into a single file? I'd like to write a multiblock structure - one block per process - where, in general, each process has no knowledge of what the other processes are to write (making the collective create call a problem).

Is there any way of specifying an access mode that enables datasets to be created independently within the same file? (Some kind of dummy synchronize in place of the H5Dcreate on the processes which are not writing that particular dataset.)

I presume this kind of writing pattern is quite common, but I can't find docs/tutorial references which make use of it. If such examples exist, please direct me to them.

Many thanks

JB

--
John Biddiscombe, email:biddisco @ cscs.ch

CSCS, Swiss National Supercomputing Centre | Tel: +41 (91) 610.82.07
Via Cantonale, 6928 Manno, Switzerland | Fax: +41 (91) 610.82.82

Hi John,

On May 8, 2009, at 3:35 AM, John Biddiscombe wrote:

Is there any way of specifying an access mode that enables datasets to be created independently within the same file? (Some kind of dummy synchronize in place of the H5Dcreate on the processes which are not writing that particular dataset.)

This has been a desired mode of operation for HDF5 metadata operations for a long time, but unfortunately we haven't had any funding toward implementing it. We'd very much like to do so and have several internal white papers with sketches about how we'd like to proceed, but it's a large effort and we'll need resources in order to pursue it.

  Quincey


As far as I know (but could be wrong), H5Dcreate() is a collective operation, but writing hyperslabs is not. So you can create one large global empty dataset, and each processor writes a hyperslab to it, independently of the others. This approach, however, requires additional user-defined metadata to identify what is where, so it depends on the application whether it can be designed that way.

(Quincey, correct me if that is false.)
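
For concreteness, here is a minimal C sketch of the pattern Werner describes, assuming an MPI-enabled HDF5 1.8 build; the file name, dataset name, and uniform per-rank block size are illustrative only:

```c
#include <mpi.h>
#include <hdf5.h>

#define LOCAL_N 1024 /* elements owned by each rank (uniform, for brevity) */

int main(int argc, char *argv[])
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Open one file on all ranks with the MPI-IO driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("blocks.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Collective: every rank takes part in creating the one global dataset. */
    hsize_t global_n = (hsize_t)LOCAL_N * (hsize_t)nprocs;
    hid_t filespace = H5Screate_simple(1, &global_n, NULL);
    hid_t dset = H5Dcreate(file, "energy", H5T_NATIVE_DOUBLE, filespace,
                           H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Independent: each rank selects its own hyperslab and writes it. */
    hsize_t start = (hsize_t)rank * LOCAL_N, count = LOCAL_N;
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &start, NULL, &count, NULL);
    hid_t memspace = H5Screate_simple(1, &count, NULL);

    double data[LOCAL_N];
    for (int i = 0; i < LOCAL_N; i++)
        data[i] = (double)rank;

    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_INDEPENDENT);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, data);

    H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
    H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}
```

The bookkeeping Werner mentions - recording which elements belong to which process - would still have to be stored separately, e.g. in attributes or an index dataset.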

  Werner


--
___________________________________________________________________________
Dr. Werner Benger <werner@cct.lsu.edu> Visualization Research
Laboratory for Creative Arts and Technology (LCAT)
Center for Computation & Technology at Louisiana State University (CCT/LSU)
239 Johnston Hall, Baton Rouge, Louisiana 70803
Tel.: +1 225 578 4809 Fax.: +1 225 578-5362


Hi Werner,

On May 8, 2009, at 8:43 AM, Werner Benger wrote:

As far as I know (but could be wrong), H5Dcreate() is a collective operation, but writing hyperslabs is not. So you can create one large global empty dataset, and each processor writes a hyperslab to it, independently of the others. (Quincey, correct me if that is false.)

  You are correct, and this might work for some applications.

    Quincey


I would highly welcome that implementation, because I have the same problem as John in my code. I haven't found any good solution yet within the abilities of the current HDF5 release.

Sven Reiche

On May 8, 2009, at 2:56 PM, Quincey Koziol wrote:

This has been a desired mode of operation for HDF5 metadata operations for a long time, but unfortunately we haven't had any funding toward implementing it.

Quincey

We'd very much like to do so and have several internal white papers with sketches about how we'd like to proceed, but it's a large effort and we'll need resources in order to pursue it.

I'd be very interested in seeing the white papers if they are available to the general public. We have potentially several man-months of effort that could be diverted into making this work.

JB


Werner

This is what we'll do for now: each process will participate in the create, but only the process with actual data will write anything. It means sending a bunch of messages between processes so that every process passes the same H5Dcreate params - which is a pity (especially since my previous writer had the same problem, though then it was caused by one process having zero data as part of a collective write to a single dataset - same deadlock issues).
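
A rough C sketch of that workaround, assuming an HDF5 1.8 parallel build and a file already opened collectively with the MPI-IO driver; the dataset naming scheme and the MPI_Allgather used to exchange the create parameters are illustrative, not code from this thread:

```c
#include <mpi.h>
#include <hdf5.h>
#include <stdio.h>
#include <stdlib.h>

/* Every rank creates every per-rank dataset (collectively), but each rank
 * writes only its own block, using independent raw-data I/O. */
void write_per_process_blocks(hid_t file, const double *block, hsize_t local_n)
{
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Exchange every rank's block size so all ranks can issue identical
     * H5Dcreate calls - the "bunch of messages" mentioned above. */
    unsigned long long mine = (unsigned long long)local_n;
    unsigned long long *sizes = malloc((size_t)nprocs * sizeof *sizes);
    MPI_Allgather(&mine, 1, MPI_UNSIGNED_LONG_LONG,
                  sizes, 1, MPI_UNSIGNED_LONG_LONG, MPI_COMM_WORLD);

    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_INDEPENDENT);

    for (int r = 0; r < nprocs; r++) {
        char name[32];
        snprintf(name, sizeof name, "block_%04d", r);
        hsize_t dims = (hsize_t)sizes[r];
        hid_t space = H5Screate_simple(1, &dims, NULL);

        /* Collective: executed by every rank with identical parameters. */
        hid_t dset = H5Dcreate(file, name, H5T_NATIVE_DOUBLE, space,
                               H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        /* Independent: only the rank that owns this block writes to it. */
        if (r == rank)
            H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, dxpl, block);

        H5Dclose(dset);
        H5Sclose(space);
    }

    H5Pclose(dxpl);
    free(sizes);
}
```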

JB


It can be done that way, but I think it is completely against the HDF5 philosophy. Let's assume you want to write out, e.g., the kinetic energy of particles. Node 1 has 3 elements and Node 2 has 5, so you would end up with 8 elements in the single dataset. However, if you need to distinguish the elements from the different nodes - which is the case for the problem I need to solve - that information gets lost, and you have to supply additional information to describe your single monolithic dataset. In single-processor mode you would create two datasets and add an attribute to each carrying the additional information (e.g. charge), which is a much more elegant way to do it.
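
For illustration, roughly what that single-processor layout looks like with the HDF5 1.8 API - a sketch with made-up names, where "charge" is the example attribute:

```c
#include <hdf5.h>
#include <stdio.h>

/* Write one node's elements as a dataset of their own, carrying the extra
 * information (e.g. charge) as a self-describing attribute. */
void write_node(hid_t file, int node, const double *energy, hsize_t n,
                double charge)
{
    char name[32];
    snprintf(name, sizeof name, "energy_node_%d", node);

    hid_t space = H5Screate_simple(1, &n, NULL);
    hid_t dset  = H5Dcreate(file, name, H5T_NATIVE_DOUBLE, space,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, energy);

    /* The per-node description lives with the data, not in external notes. */
    hid_t aspace = H5Screate(H5S_SCALAR);
    hid_t attr   = H5Acreate(dset, "charge", H5T_NATIVE_DOUBLE, aspace,
                             H5P_DEFAULT, H5P_DEFAULT);
    H5Awrite(attr, H5T_NATIVE_DOUBLE, &charge);

    H5Aclose(attr); H5Sclose(aspace);
    H5Dclose(dset); H5Sclose(space);
}
```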

Sven


Hi all,

On May 8, 2009, at 8:56 AM, Sven Reiche wrote:

It can be done that way, but I think it is completely against the HDF5 philosophy.

  Yes, I agree with this statement also - it does go against the "self-describing" nature of the library/file format.

  Quincey


Hi John,

On May 8, 2009, at 8:19 AM, John Biddiscombe wrote:

I'd be very interested in seeing the white papers if they are available to the general public. We have potentially several man-months of effort that could be diverted into making this work.

  I don't have anything handy, but here's the URL for an older implementation we tried out:

http://www.hdfgroup.org/Parallel_HDF/PHDF5/FPH5/

It had significant flaws and we eventually abandoned the code. :( However, the basic issues remain the same as described in the documents there, I believe.

  Quincey


I misread Werner's description - what I'd like to do is create separate datasets (collectively), one per process, but only have the one with actual data write anything. I don't want to merge the data into a single dataset, as we wish to keep the (spatial) information about blocks implicit. Using independent IO this should be OK (yes?). I hope this isn't going against HDF5's nature!

JB


Hi John,

On May 8, 2009, at 9:04 AM, John Biddiscombe wrote:

What I'd like to do is create separate datasets (collectively), one per process, but only have the one with actual data write anything. I don't want to merge the data into a single dataset, as we wish to keep the (spatial) information about blocks implicit. Using independent IO this should be OK (yes?).

Yes, this is fine. Raw data I/O can be either independent or collective, but metadata modifications (like creating/deleting objects, creating/deleting/modifying attributes, etc.) must be collective.
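
In code terms, the mode choice applies only to the transfer property list passed to H5Dwrite/H5Dread; a minimal sketch (assuming the dataset and dataspace handles come from the usual collective calls):

```c
#include <mpi.h>
#include <hdf5.h>

/* Raw-data I/O: a per-write choice between independent and collective. */
herr_t write_raw(hid_t dset, hid_t memspace, hid_t filespace,
                 const double *data, int collective)
{
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, collective ? H5FD_MPIO_COLLECTIVE
                                      : H5FD_MPIO_INDEPENDENT);
    herr_t status = H5Dwrite(dset, H5T_NATIVE_DOUBLE,
                             memspace, filespace, dxpl, data);
    H5Pclose(dxpl);
    return status;
}
```

Metadata calls (H5Dcreate, H5Acreate, H5Gcreate, ...) have no such property: every rank must make them.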

  Quincey


I think there's a distinction here: the access patterns you're seeking are perfectly suitable for HDF5, but they are not supported by any parallel I/O middleware layers that I'm aware of. This is certainly a common use case - any tree structure naturally falls into it - but providing it in parallel requires going down to the raw MPI-IO calls and rolling your own. Doing it with PHDF5 requires hyperslabs of uniform dimension, chunked I/O for speed, and compression for size; it is inefficient and cumbersome, and I'm sure it can be improved upon.

-SM-
