Please join us for a webinar on multithreading concurrency, presented by Quincey Koziol (@koziol) and scheduled for Friday, November 13, at 11:00 a.m.
The HDF5 library is now threadsafe, but not concurrent. This talk will present a plan for making the HDF5 library safe for concurrent access by multiple threads. I will outline the current structure of the code, describe the infrastructure changes needed to enable multi-threaded concurrency within the library, and lay out a plan for putting those infrastructure changes to work by making the H5Dread API routine concurrently threadsafe. Community input for making more of the library concurrent is welcome and greatly desired, and I would like to build a task force for contributions to this effort. Please attend to learn more and see how to contribute to this effort.
I have written an experimental reader for a subset of HDF5 in Rust (which makes things safer in terms of aliasing and concurrency) that supports concurrent, multi-threaded reads. For sequential reads, the performance is on par with or better than the official HDF5 library, and naturally much better for concurrent reads, since no blocking is needed. It also has a streaming reader, which is useful for network applications. Please take a look at: https://github.com/gauteh/hidefix. It is straightforward to create bindings for C/C++/Python, etc.
The approach is inspired by the DMR++ module in Hyrax and requires that the chunks be accessible concurrently, so the chunks are indexed first. Indexing is slow at the moment because of the way chunks are exposed by the HDF5 library, but the changes in this PR make it pretty fast (200x speedup, 140 ms for a 1.4 GB file): https://github.com/HDFGroup/hdf5/pull/6 (applies to 1.12, needs updates for latest master). Also see this thread: Iterate over chunk info.
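To illustrate the general idea of "index first, then read chunks concurrently", here is a minimal sketch. It is not the hidefix API: `ChunkEntry`, `read_chunk`, and `read_chunks_concurrently` are hypothetical names, the index is assumed to have been built already as a list of byte offsets and sizes, and the chunks are assumed to be stored uncompressed. Each thread opens its own file handle, so no lock is shared between readers.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;
use std::thread;

/// Hypothetical index entry: where one chunk lives in the file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ChunkEntry {
    /// Byte address of the chunk within the file.
    pub offset: u64,
    /// Stored size of the chunk in bytes.
    pub size: u64,
}

/// Read a single chunk through a private file handle.
/// Assumption: the chunk is stored as raw bytes (no filters or compression).
fn read_chunk(path: &Path, entry: ChunkEntry) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    file.seek(SeekFrom::Start(entry.offset))?;
    let mut buf = vec![0u8; entry.size as usize];
    file.read_exact(&mut buf)?;
    Ok(buf)
}

/// Read every indexed chunk concurrently, one scoped thread per chunk.
/// The result keeps the same order as `index`.
pub fn read_chunks_concurrently(
    path: &Path,
    index: &[ChunkEntry],
) -> io::Result<Vec<Vec<u8>>> {
    thread::scope(|s| {
        // Spawn all readers before joining any of them, so they overlap.
        let handles: Vec<_> = index
            .iter()
            .map(|&entry| s.spawn(move || read_chunk(path, entry)))
            .collect();

        // Join in index order; the first I/O error aborts the collection.
        handles
            .into_iter()
            .map(|h| h.join().expect("reader thread panicked"))
            .collect()
    })
}
```

Calling `read_chunks_concurrently(path, &index)` returns the chunk bytes in index order. One thread per chunk keeps the sketch short; a real reader would more likely use a bounded thread pool and decode filtered chunks after reading them.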
The project was started to provide fast concurrent reads for DARS, a DAP server written in Rust: https://github.com/gauteh/dars.