I am trying to use the HDF5 parallel feature for extreme-scale computing.
I would like each processor to write out a separate dataset.
This question is actually addressed on the HDF5 website.
Due to the collective-call requirement, every processor has to participate in creating the same dataset.
The dataset creation has to be called on all ranks, not the actual writing of the array data.
So all ranks should call H5Dcreate() for every dataset, but then each rank can write to its corresponding dataset.
Alternatively, you can have one rank create the entire file serially, then close the file, and then all ranks open it and write the raw data in parallel.
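To make the first suggestion concrete, here is a minimal sketch (not from the original post; the file name, dataset names, and sizes are invented for illustration) in which every rank collectively creates one dataset per rank and then writes only its own:

/* Sketch: collective creation of one dataset per rank, independent writes.
 * Build against an MPI-enabled HDF5, e.g. with h5pcc. */
#include <hdf5.h>
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Open one shared file with the MPI-IO driver (collective). */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("per_rank.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
    H5Pclose(fapl);

    /* Every rank creates every dataset: metadata changes must be collective. */
    hsize_t dims[1] = {1024};
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t *dsets = malloc(nranks * sizeof(hid_t));
    for (int r = 0; r < nranks; r++) {
        char name[64];
        snprintf(name, sizeof(name), "data_rank_%d", r);
        dsets[r] = H5Dcreate2(file, name, H5T_NATIVE_DOUBLE, space,
                              H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    }

    /* Each rank writes only its own dataset. */
    double buf[1024];
    for (int i = 0; i < 1024; i++) buf[i] = rank + i * 1e-3;
    H5Dwrite(dsets[rank], H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

    for (int r = 0; r < nranks; r++) H5Dclose(dsets[r]);
    free(dsets);
    H5Sclose(space);
    H5Fclose(file);
    MPI_Finalize();
    return 0;
}

Only the creation loop is collective here; the H5Dwrite calls use the default (independent) data transfer property list, so each rank touches only its own dataset.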
AFAIK, the issue is that if you create a new dataset in the file, all ranks have to know about it (the reason seems obvious: they all share the file's metadata). I don't think there is a file format that can avoid this issue. The best idea is still to use different files if the ranks are not writing to the same dataset!
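If you do go the separate-files route, each rank can use plain serial HDF5 and no collective calls are needed at all. A minimal sketch, with made-up file and dataset names:

/* Sketch: file-per-process output with serial HDF5 (no collective calls). */
#include <hdf5.h>
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each rank creates its own file; no other rank needs to know about it. */
    char fname[64];
    snprintf(fname, sizeof(fname), "output_rank_%06d.h5", rank);
    hid_t file = H5Fcreate(fname, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    hsize_t dims[1] = {1024};
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_DOUBLE, space,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    double buf[1024];
    for (int i = 0; i < 1024; i++) buf[i] = rank + i * 1e-3;
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);
    MPI_Finalize();
    return 0;
}

The trade-off is one file per rank, which at very large scale runs into the file-count and file-system concerns raised later in this thread.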
My recollection is that a developer somewhere in Europe (maybe CERN) developed a convenience API on top of HDF5
that simplified collective dataset creation a bit by providing an interface where processors work independently to _define_
(names, types, sizes, and shapes) the datasets they need to create, and then call a collective _sync_ method where
all the collective dataset creation happens down in HDF5. Datasets from different ranks that have the same attributes
(e.g. name, type, size, and shape) and are marked with a 'tag' wind up being common across the ranks that passed
the same tag. After the collective _sync_ operation, processors can again engage in either independent (or collective)
I/O to the datasets.
I have never used that API and I'll be darned if I can remember the name of it (I spent 20 minutes looking on Google), and
I don't even know if it is still being maintained. But it does provide a much simpler way of interacting with HDF5's
collective dataset creation requirement when that is necessary.
It might be an option if you can find it, or if another user here familiar with what I am talking about can send a link.
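I can't name that API either, but the pattern it describes (define datasets independently, then do the collective creation in one sync step) can be roughed out by hand with an MPI_Allgather. A sketch of that idea only, with invented names and sizes, and not the API being recalled above:

/* Sketch of the "define locally, create collectively" pattern.
 * Each rank decides the size of its own dataset; MPI_Allgather plays the role
 * of the collective "sync", after which every rank can create every dataset. */
#include <hdf5.h>
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Local definition: this rank wants a 1-D double dataset of n_local elements. */
    long long n_local = 100 + 10 * rank;   /* illustrative, varies per rank */
    long long *all_n = malloc(nranks * sizeof(long long));
    MPI_Allgather(&n_local, 1, MPI_LONG_LONG, all_n, 1, MPI_LONG_LONG, MPI_COMM_WORLD);

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("defined_then_synced.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
    H5Pclose(fapl);

    /* Collective step: all ranks create all datasets, sized from the gathered definitions. */
    hid_t my_dset = -1;
    for (int r = 0; r < nranks; r++) {
        char name[64];
        snprintf(name, sizeof(name), "rank_%d", r);
        hsize_t dims[1] = {(hsize_t)all_n[r]};
        hid_t space = H5Screate_simple(1, dims, NULL);
        hid_t dset = H5Dcreate2(file, name, H5T_NATIVE_DOUBLE, space,
                                H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        if (r == rank) my_dset = dset; else H5Dclose(dset);
        H5Sclose(space);
    }

    /* Independent step: each rank writes only its own dataset. */
    double *buf = malloc(n_local * sizeof(double));
    for (long long i = 0; i < n_local; i++) buf[i] = (double)rank;
    H5Dwrite(my_dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

    H5Dclose(my_dset);
    H5Fclose(file);
    free(buf);
    free(all_n);
    MPI_Finalize();
    return 0;
}

This only covers the simple one-dataset-per-rank case; the API described above presumably handles tags and shared definitions more gracefully.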
The collective calls to create datasets are required because the metadata needs to be consistent across all ranks for the datasets to be created correctly. You can write a sample program that violates this rule, and you will see that the datasets get clobbered when the ranks are not coordinated, and you don't get the results you would like.
The same goes for groups, attributes, and anything else that affects the metadata, hence the list of required collective calls.
Mohamad's solution (create the file and all the datasets serially first, then reopen in parallel) should work, since collective calls are only required when the file is opened in parallel across multiple ranks. Once the metadata for all the datasets has been created and written to the file and the file has been closed, you can open the file in parallel and each rank can write into the file without affecting the datasets of the other ranks.
Jarom
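For completeness, here is a minimal sketch of that two-phase approach (rank 0 creates the file and all the dataset metadata serially, then every rank reopens the file with the MPI-IO driver and writes the raw data for its own dataset). The file name, dataset names, and sizes are invented for illustration:

/* Sketch: serial metadata creation on rank 0, then parallel raw-data writes. */
#include <hdf5.h>
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const hsize_t N = 1024;

    /* Phase 1: rank 0 creates the file and one dataset per rank, serially. */
    if (rank == 0) {
        hid_t file = H5Fcreate("two_phase.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        hsize_t dims[1] = {N};
        hid_t space = H5Screate_simple(1, dims, NULL);
        for (int r = 0; r < nranks; r++) {
            char name[64];
            snprintf(name, sizeof(name), "data_rank_%d", r);
            hid_t dset = H5Dcreate2(file, name, H5T_NATIVE_DOUBLE, space,
                                    H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
            H5Dclose(dset);
        }
        H5Sclose(space);
        H5Fclose(file);
    }
    MPI_Barrier(MPI_COMM_WORLD);   /* ensure the file exists before reopening */

    /* Phase 2: all ranks reopen the file in parallel and write raw data only. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fopen("two_phase.h5", H5F_ACC_RDWR, fapl);
    H5Pclose(fapl);

    char name[64];
    snprintf(name, sizeof(name), "data_rank_%d", rank);
    hid_t dset = H5Dopen2(file, name, H5P_DEFAULT);

    double buf[1024];
    for (hsize_t i = 0; i < N; i++) buf[i] = rank + i * 1e-3;
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

    H5Dclose(dset);
    H5Fclose(file);
    MPI_Finalize();
    return 0;
}

Because no dataset, group, or attribute is created or modified in phase 2, no collective metadata operations are needed there; only the raw-data H5Dwrite calls happen in parallel.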
The answer on the HDF5 website is not satisfying for extreme-scale computing, where hundreds of thousands of cores are involved.
If you take those hundreds of thousands of cores and issue I/O to the parallel file system, you will probably break your file system.
So are you imagining, say, 1,000 datasets, with 100 cores writing to each dataset? Or one dataset per core? Can the HDF5 visualization and analysis tools deal reasonably well with 100,000 datasets?
A single shared dataset has a lot of workflow advantages. It also maps nicely onto collective MPI-IO optimizations.
If you really need one dataset per process, then you probably also need the multi-dataset I/O routines (H5Dread_multi() and H5Dwrite_multi() -- are those released yet?)
Not yet... ;-(
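For comparison with the one-dataset-per-rank layouts above, here is a minimal sketch of the single-shared-dataset approach, where each rank writes its own hyperslab of one large dataset with a collective transfer. The contiguous-block decomposition and all sizes are just for illustration:

/* Sketch: one shared dataset, each rank writes its own hyperslab collectively. */
#include <hdf5.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const hsize_t N = 1024;                  /* elements per rank (illustrative) */

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("shared.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
    H5Pclose(fapl);

    /* One dataset sized for all ranks; creating it is still a collective call. */
    hsize_t dims[1] = {N * (hsize_t)nranks};
    hid_t filespace = H5Screate_simple(1, dims, NULL);
    hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_DOUBLE, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Each rank selects its own contiguous block of the file space. */
    hsize_t start[1] = {N * (hsize_t)rank};
    hsize_t count[1] = {N};
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(1, count, NULL);

    /* A collective transfer lets MPI-IO aggregate the writes across ranks. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    double buf[1024];
    for (hsize_t i = 0; i < N; i++) buf[i] = rank + i * 1e-3;
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

    H5Pclose(dxpl);
    H5Sclose(memspace);
    H5Sclose(filespace);
    H5Dclose(dset);
    H5Fclose(file);
    MPI_Finalize();
    return 0;
}

With one shared dataset, the number of objects in the file stays constant as the rank count grows, and downstream tools see a single array rather than hundreds of thousands of small datasets.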