I was wondering if someone could explain what goes on under the hood with
independent vs. collective I/O in parallel HDF5. Specific questions I have:
With independent I/O, does each I/O rank open, write, close, and hand the
file off to the next I/O rank, so that only one rank has access to the file
at a given time (no concurrency)?
With collective I/O, are I/O ranks writing concurrently to one file? If so,
can you control the number of concurrent accesses to a single file?
I have found that with collective I/O, only a small subset of writers is
actually writing concurrently (far fewer than the total number of ranks) at
tens of thousands of cores. What controls this number? Also, how is data
collected to the I/O ranks? MPI_GATHER? It seems you could run the risk of
running out of memory if you are collecting large 3D arrays onto only a few
ranks on a distributed-memory machine.
I ask these questions because, contrary to what I have been told should
work, I cannot get even marginally decent performance out of collective I/O
on Lustre at large core counts (30k cores writing to one file), and I need
to try new approaches. I am hoping that parallel HDF5 can still be of use to
me, rather than having to do my own MPI calls to collect and write, or just
doing tried-and-true one file per core.
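For context, the kind of setup I have been experimenting with looks roughly
like the sketch below: collective transfers via H5Pset_dxpl_mpio, plus
ROMIO/Lustre hints passed through the file-access property list. The hint
values here are placeholders, not recommendations; I would welcome advice on
which of these knobs actually matter at this scale.

```c
#include <mpi.h>
#include <hdf5.h>

/* File-access property list: MPI-IO driver plus hints.
   The numeric values below are illustrative only. */
hid_t make_fapl(void)
{
    MPI_Info info;
    MPI_Info_create(&info);
    /* ROMIO hints: cb_nodes caps how many aggregator ranks actually
       touch the file; striping_* should match the lfs setstripe
       settings of the target directory. */
    MPI_Info_set(info, "cb_nodes", "64");
    MPI_Info_set(info, "striping_factor", "64");    /* # of Lustre OSTs */
    MPI_Info_set(info, "striping_unit", "4194304"); /* 4 MiB stripes   */

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, info);
    MPI_Info_free(&info);
    return fapl;
}

/* Dataset-transfer property list: request collective (two-phase) writes. */
hid_t make_dxpl(void)
{
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    return dxpl;
}
```

(No self-contained test here, since this fragment only configures property
lists and needs an MPI launcher plus a Lustre target to exercise.)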
Associate Professor of Atmospheric Science
Department of Geology and Meteorology
Central Michigan University
Currently on sabbatical at the National Center for Atmospheric Research
in Boulder, CO
NCAR office phone: (303) 497-8200