Independent datasets for MPI processes

Hello,

I have several MPI processes, each generating an unknown number of values, and I want to write these values to an HDF5 file. Since I don't know in advance how many values each process will generate, I cannot use a single big dataset; instead, I have to use a separate chunked dataset for each process. That is, every process needs access only to its own dataset and doesn't care about the others. Unfortunately, I'm forced to call operations such as H5Dcreate() collectively. Is there a way to create and write a dataset from only one process, if I know that no other process will use it?
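To make the problem concrete, here is a stripped-down sketch of what I believe I'm currently forced to do (the file name, dataset names, and chunk size are placeholders I made up): every rank takes part in creating every rank's dataset, even though only one rank will ever write to each of them. Only the raw-data writes may be independent:

#include <hdf5.h>
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* One shared file, opened with the MPI-IO driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("values.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Chunked, extensible layout, since the final size is unknown. */
    hsize_t dims[1] = {0}, maxdims[1] = {H5S_UNLIMITED}, chunk[1] = {1024};
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);

    /* H5Dcreate is collective: EVERY rank must create EVERY dataset,
       even those it will never touch again. */
    hid_t *dsets = malloc(nprocs * sizeof(hid_t));
    for (int r = 0; r < nprocs; r++) {
        char name[32];
        snprintf(name, sizeof(name), "values_rank_%d", r);
        hid_t space = H5Screate_simple(1, dims, maxdims);
        dsets[r] = H5Dcreate(file, name, H5T_NATIVE_DOUBLE, space,
                             H5P_DEFAULT, dcpl, H5P_DEFAULT);
        H5Sclose(space);
    }

    /* Raw-data writes, at least, may be independent: each rank would
       write only dsets[rank] using this transfer property list... */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_INDEPENDENT);

    /* ...but extending dsets[rank] first requires a collective
       H5Dset_extent() call, which is exactly my problem (see below). */

    for (int r = 0; r < nprocs; r++)
        H5Dclose(dsets[r]);
    free(dsets);
    H5Pclose(dxpl);
    H5Pclose(dcpl);
    H5Pclose(fapl);
    H5Fclose(file);
    MPI_Finalize();
    return 0;
}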

What's much worse, the H5Dset_extent() / H5Dextend() operations must also be called collectively. But each of my processes generates data independently, so when one process needs to extend its own dataset, the other processes don't care and don't even know about it! How can I solve this?
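The only workaround I can see is to give each process its own file, where serial HDF5 applies and H5Dset_extent() involves no other process. A rough sketch of that (again, all names and sizes are made up), though I would much prefer a single shared file:

#include <hdf5.h>
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each rank owns its file exclusively, so plain serial HDF5 is
       enough and nothing below is a collective call. */
    char fname[64];
    snprintf(fname, sizeof(fname), "values_rank_%d.h5", rank);
    hid_t file = H5Fcreate(fname, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    hsize_t dims[1] = {0}, maxdims[1] = {H5S_UNLIMITED}, chunk[1] = {1024};
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);
    hid_t space = H5Screate_simple(1, dims, maxdims);
    hid_t dset = H5Dcreate(file, "values", H5T_NATIVE_DOUBLE, space,
                           H5P_DEFAULT, dcpl, H5P_DEFAULT);

    /* Append a batch of values; in a serial file, H5Dset_extent()
       can be called whenever this rank alone decides to. */
    double batch[100];
    for (int i = 0; i < 100; i++)
        batch[i] = rank + i * 0.01;  /* example data */

    hsize_t new_dims[1] = {100};
    H5Dset_extent(dset, new_dims);
    hid_t fspace = H5Dget_space(dset);
    hsize_t start[1] = {0}, count[1] = {100};
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t mspace = H5Screate_simple(1, count, NULL);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, batch);

    H5Sclose(mspace); H5Sclose(fspace); H5Sclose(space);
    H5Dclose(dset); H5Pclose(dcpl); H5Fclose(file);
    MPI_Finalize();
    return 0;
}

(The per-rank files could presumably be stitched together afterwards, e.g. with external links, but that would be a separate post-processing step.)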

Please help,
Daniel

Hi Daniel,


On Mar 4, 2011, at 8:03 AM, Daniel Langr wrote:

···

  It's a good news/bad news situation:

  Unfortunately, those are some of the limitations of the current parallel HDF5 operating model.

  However, we are just at the beginning of funding that will change how these sorts of metadata operations work, and we should have results that allow an application to perform [more] metadata operations independently within the year.

  Quincey

Daniel,

Have a look here and see if it's any use to you.
https://hpcforge.org/plugins/mediawiki/wiki/libh5mb/index.php/Main_Page

I'd be happy to add a few tweaks if you think it nearly works but needs a few fixes.

JB


-----Original Message-----
From: hdf-forum-bounces@hdfgroup.org [mailto:hdf-forum-bounces@hdfgroup.org] On Behalf Of Daniel Langr
Sent: 04 March 2011 15:03
To: hdf-forum@hdfgroup.org
Subject: [Hdf-forum] Independent datasets for MPI processes

···