The HDF5 library does not support asynchronous I/O at this time. We are looking into including async I/O support in a future release, however.
If you'd like to hurry this work along with financial support, there are people here you should talk to.
Dana
From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Greenberg, Naomi
Sent: Tuesday, May 27, 2014 12:40 PM
To: Hdf-forum@lists.hdfgroup.org
Subject: [Hdf-forum] Asynchronous I/O
As a total newbie to HDF, I am interested in knowing whether this format supports asynchronous I/O and has calls to synchronize read/write aheads?
Thank you!
The HDF5 library does not support asynchronous I/O at this time. We
are looking into including async I/O support in a future release, however.
I've attached a document that describes our current ideas in this space.
Good read. Just how compute bound is HDF5, anyway? I'm always living in a land of large datasets, where library overhead is dwarfed by the I/O workload overhead.
You did not mention the multi-dataset I/O approach: it's a half-step towards asynchronism -- or maybe a half-step backwards -- in that instead of decoupling the description of the data from the execution of the data, HDF5's multi-dataset routines will describe more data in a single call.
I don't think the global HDF5 lock precludes an async approach. Probably this async facility should exist on top of HDF5, though, and can provide the caching, read-ahead, coalescing, and other benefits while leaving the bulk of the 300k lines of C code untouched. In my head it's MPI_THREAD_FUNNELED for HDF5.
The various ways one can manage MPI progress are instructive here.
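For readers following along, the layering described above (an async facility sitting on top of the library, with every library call funneled through one thread) might be sketched roughly as follows. This is only an illustration under stated assumptions: `FunneledIO` and `fake_write` are hypothetical names, not part of HDF5, and `fake_write` stands in for a blocking call such as `H5Dwrite` so that the example runs without the library installed.

```python
from concurrent.futures import ThreadPoolExecutor, Future
import threading


class FunneledIO:
    """Run every I/O call on one dedicated worker thread.

    Hypothetical wrapper, not part of HDF5: callers on any thread submit
    work, but only the worker ever calls into the underlying library.
    """

    def __init__(self):
        # max_workers=1 gives a FIFO queue drained by a single thread.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def submit(self, fn, *args, **kwargs) -> Future:
        """Queue fn(*args, **kwargs) and return immediately with a Future."""
        return self._pool.submit(fn, *args, **kwargs)

    def close(self):
        """Wait for all queued operations, then stop the worker."""
        self._pool.shutdown(wait=True)


def fake_write(store, name, data):
    """Stand-in for a blocking dataset write (assumption: no real HDF5 call)."""
    store[name] = (list(data), threading.get_ident())
    return len(data)


store = {}
io = FunneledIO()
futures = [io.submit(fake_write, store, f"dset{i}", range(i + 1)) for i in range(3)]
# The caller is free to compute here while the queued writes proceed.
sizes = [f.result() for f in futures]  # explicit synchronization point
io.close()
worker_ids = {tid for _, tid in store.values()}
```

Because only the worker thread ever enters the library, a global library lock is never contended. Caching, read-ahead, or coalescing could in principle be added inside the wrapper without touching the library itself.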
Good read. Just how compute bound is HDF5, anyway? I'm always living in a land of large datasets, where library overhead is dwarfed by the I/O workload overhead.
Generally speaking, HDF5 is not compute bound. It's only when an application asks for a compute-oriented task (datatype conversion, compression, etc.) that something could be expensive.
You did not mention the multi-dataset I/O approach: it's a half-step towards asynchronism -- or maybe a half-step backwards -- in that instead of decoupling the description of the data from the execution of the data, HDF5's multi-dataset routines will describe more data in a single call.
I think multi-dataset reads/writes are neutral on the asynchrony axis - a multi-dataset I/O operation could be made asynchronous in the same way as any other operation that touches the file.
I don't think the global HDF5 lock precludes an async approach. Probably this async facility should exist on top of HDF5, though, and can provide the caching, read-ahead, coalescing, and other benefits while leaving the bulk of the 300k lines of C code untouched. In my head it's MPI_THREAD_FUNNELED for HDF5.
I think there's actually a good case for pushing a portion of the asynchrony inside the HDF5 library, since it allows existing applications (which aren't using async I/O variants of the API routines) to get the benefit of asynchronous metadata operations (i.e., flushing dirty metadata to the file in the background).
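As a rough picture of what "flushing dirty metadata in the background" could mean, here is a toy write-back cache. It is a sketch under assumptions, not a description of how HDF5's metadata cache works: `MetadataCache` is a hypothetical class and a plain dict stands in for the file. The point is that the application-facing `update` call is unchanged and returns immediately, while a background thread does the slow writes.

```python
import threading


class MetadataCache:
    """Toy write-back cache; dirty entries are flushed by a background thread."""

    def __init__(self, backing):
        self._backing = backing  # stand-in for the file (assumption)
        self._dirty = {}
        self._cond = threading.Condition()
        self._closing = False
        self._flusher = threading.Thread(target=self._run, daemon=True)
        self._flusher.start()

    def update(self, key, value):
        """Called by the application; returns without touching the file."""
        with self._cond:
            self._dirty[key] = value
            self._cond.notify()

    def _run(self):
        while True:
            with self._cond:
                # Sleep until there is something to flush or we are closing.
                while not self._dirty and not self._closing:
                    self._cond.wait()
                if not self._dirty:
                    return  # closing and nothing left to write
                batch, self._dirty = self._dirty, {}
            # The slow "file write" happens outside the lock.
            self._backing.update(batch)

    def close(self):
        """Flush everything still dirty, then stop the background thread."""
        with self._cond:
            self._closing = True
            self._cond.notify()
        self._flusher.join()


file_image = {}
cache = MetadataCache(file_image)
cache.update("/group/attr", 1)
cache.update("/group/attr", 2)  # may coalesce with the pending value
cache.update("/dset/shape", (10, 10))
cache.close()  # drains all dirty entries before returning
```

Whether the two updates to `/group/attr` are coalesced or written separately depends on timing, but the final file contents are the same either way because a single flusher thread applies batches in order.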
The various ways one can manage MPI progress are instructive here.
Indeed.
Thanks for the feedback,
Quincey
On May 28, 2014, at 4:22 PM, Rob Latham <robl@mcs.anl.gov> wrote: