Hi Leigh,
I'm not familiar with the F90 API, but here is an example in C that only writes from rank 0:
-----
if (rank == 0) {
    H5Sselect_all(diskspace);   /* rank 0 selects the whole region */
} else {
    H5Sselect_none(diskspace);  /* everyone else selects nothing */
}
H5Dwrite(dataset, TYPE, memspace, diskspace, dxpl, buffer);
-----
Notice that all tasks call H5Dwrite (as required for a collective write) even though only rank 0 has actually selected a region to write to in the disk space.
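Since you asked about the F90 API: I haven't tested this, but I believe the Fortran equivalent uses the h5sselect_all_f and h5sselect_none_f wrappers, something like:
-----
IF (rank == 0) THEN
    CALL h5sselect_all_f(filespace, error)
ELSE
    CALL h5sselect_none_f(filespace, error)
END IF
CALL h5dwrite_f(dset_id, H5T_NATIVE_INTEGER, data, dimsfi, error, &
                file_space_id = filespace, mem_space_id = memspace, &
                xfer_prp = plist_id)
-----
(Here plist_id is the collective transfer property list from my earlier message below; treat this as a sketch rather than working code.)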
If you have a single integer, you probably want to write it as an attribute. Nearly all calls other than H5Dwrite, including attribute and metadata operations, are collective and expect the same arguments on every task. There is a handy reference here to confirm this for individual calls:
http://www.hdfgroup.org/HDF5/doc/RM/CollectiveCalls.html
So you don't have to manually tell HDF5 to only write an attribute from rank 0, for instance. I believe that all metadata is cached across all ranks, so each rank will need the actual value anyway (otherwise it would have to be broadcast from rank 0 if you only wrote from there).
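For example, to attach a single integer as an attribute, every rank would make the same calls with the same value. An untested F90 sketch (names like aspace_id, attr_id, nsteps, and adims are just placeholders):
-----
CALL h5screate_f(H5S_SCALAR_F, aspace_id, error)
CALL h5acreate_f(dset_id, "nsteps", H5T_NATIVE_INTEGER, aspace_id, &
                 attr_id, error)
CALL h5awrite_f(attr_id, H5T_NATIVE_INTEGER, nsteps, adims, error)
CALL h5aclose_f(attr_id, error)
CALL h5sclose_f(aspace_id, error)
-----
No rank test is needed; HDF5 treats the attribute write as collective and stores the value once.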
The metadata is written to disk as it is evicted from the metadata cache. This used to be done only from rank 0, which holds a copy of the metadata cache identical to every other task's. But we recently collaborated with The HDF Group to round-robin these writes across MPI tasks, which improves performance on parallel file systems that expect many-to-one file access patterns (such as Lustre or GPFS). The eventual goal is a paging mechanism that aggregates metadata into large chunks aligned to file system boundaries, then writes only from rank 0 or a small subset of writers (as in the collective buffering algorithms found in MPI-IO implementations). Quincey knows more about how that will be implemented, but it will probably require MPI communication to maintain cache coherency.
So anyway, the point is that you only have to worry about empty selections for dataset writes.
Hope that helps,
Mark
On Wed, Jan 19, 2011 at 6:36 PM, Leigh Orf <leigh.orf@gmail.com> wrote:
Mark,
Could you give me an example of a call to H5Dwrite (fortran90 api) where an "empty selection" is passed? I don't know which argument you mean.
There are many cases (with metadata, for instance) where I need only one member of a group to write the metadata. I am finding that weird things happen in some of my code as I work with pHDF5, but I think that's because I don't entirely understand what pHDF5 expects.
For instance, if I have a single integer that is common amongst all ranks in a collective group writing to one file, do I just pick the root rank to do the write and have all other ranks pass some dummy variable?
I can understand the paradigm where you are writing data that is different on each rank and you need to specify dims and offsets etc. (the example codes show this) but the "easier" case is throwing me.
Thanks,
Leigh
On Tue, Jan 18, 2011 at 5:15 AM, Mark Howison <mark.howison@gmail.com> wrote:
Hi Leigh,
Yes, it is only a small difference in code between collective and independent mode for the MPI-IO VFD. To enable collective I/O, you pass a dataset transfer property list to H5Dwrite like this:
-----
dxpl_id = H5Pcreate(H5P_DATASET_XFER);
H5Pset_dxpl_mpio(dxpl_id, H5FD_MPIO_COLLECTIVE);
H5Dwrite(dset_id, H5T_NATIVE_FLOAT, memspace, filespace, dxpl_id, somedata0);
-----
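In the F90 API I believe the same setup looks like this (an untested sketch, using the _F named constants from the Fortran interface; dims and the space identifiers are placeholders):
-----
CALL h5pcreate_f(H5P_DATASET_XFER_F, plist_id, error)
CALL h5pset_dxpl_mpio_f(plist_id, H5FD_MPIO_COLLECTIVE_F, error)
CALL h5dwrite_f(dset_id, H5T_NATIVE_REAL, somedata0, dims, error, &
                file_space_id = filespace, mem_space_id = memspace, &
                xfer_prp = plist_id)
-----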
One additional constraint with collective I/O, though, is that all MPI tasks must call H5Dwrite. If not, your program will stall in a barrier. In contrast, with independent I/O you can execute writes with no coordination among MPI tasks.
If you do want only a subset of MPI tasks to write in collective mode, you can pass an empty selection to H5Dwrite for the non-writing tasks.
Mark
On Tue, Jan 18, 2011 at 12:45 AM, Leigh Orf <leigh.orf@gmail.com> wrote:
Elena,
That is good news, indeed this was with 1.8.5-patch1.
Is code written using independent I/O structured significantly differently from code using collective I/O? I would like to get moving with pHDF5, and as I am currently not too familiar with it, I want to make sure I won't have to do a rewrite after the collective code works. It does seem to all happen behind the scenes in the h5dwrite call, so I presume I am safe.
Thanks,
Leigh
On Mon, Jan 17, 2011 at 4:59 PM, Elena Pourmal <epourmal@hdfgroup.org> wrote:
Leigh,
I am writing to confirm that the bug you reported does exist in 1.8.5-patch1, but is fixed in 1.8.6 (coming soon).
Elena
On Jan 16, 2011, at 3:47 PM, Leigh Orf wrote:
I managed to build pHDF5 on blueprint.ncsa.uiuc.edu (IBM AIX Power 6). I compiled the hyperslab_by_chunk.f90 test program found at http://www.hdfgroup.org/HDF5/Tutor/phypechk.html without error. When I run it, however, I get the following output:
ATTENTION: 0031-408 4 tasks allocated by LoadLeveler, continuing...
ERROR: 0032-110 Attempt to free a predefined datatype (2) in MPI_Type_free, task 0
ERROR: 0032-110 Attempt to free a predefined datatype (2) in MPI_Type_free, task 1
ERROR: 0032-110 Attempt to free a predefined datatype (2) in MPI_Type_free, task 2
ERROR: 0032-110 Attempt to free a predefined datatype (2) in MPI_Type_free, task 3
HDF5: infinite loop closing library
D,S,T,D,S,F,D,G,S,T,F,AC,FD,P,FD,P,FD,P,E,E,SL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL
HDF5: infinite loop closing library
The line which causes the grief is:
CALL h5dwrite_f(dset_id, H5T_NATIVE_INTEGER, data, dimsfi, error, &
file_space_id = filespace, mem_space_id = memspace, xfer_prp = plist_id)
If I replace that call with the one that is commented out in the program, it runs without a problem. That line is:
CALL h5dwrite_f(dset_id, H5T_NATIVE_INTEGER, data, dimsfi,error, &
file_space_id = filespace, mem_space_id = memspace)
Any ideas? I definitely want to take advantage of doing collective I/O if possible.
Leigh
--
Leigh Orf
Associate Professor of Atmospheric Science
Department of Geology and Meteorology
Central Michigan University
Currently on sabbatical at the National Center for Atmospheric Research in Boulder, CO
NCAR office phone: (303) 497-8200
_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org