The slow part of reading and writing is usually the compression. Since HDF5 1.10 you can use H5Dread_chunk and H5Dwrite_chunk to access the compressed chunk streams directly, bypassing the filter pipeline, and compress/decompress them yourself with the zlib (gzip) library, which can be done in parallel. There is little overhead inside HDF5 for these calls; most of what H5Dread_chunk/H5Dwrite_chunk does is direct disk access. On top of them you can build a reading-decompressing and/or compressing-writing pipeline. If the bottleneck is the multi-threaded compression or decompression (CPU-bound), or if single-threaded reading or writing already saturates the disk (I/O-bound), you would gain nothing more by parallelizing the file accesses themselves. This solution may be simpler and more elegant than dynamically loading multiple copies of the library, and it would also give a boost for a single large file (or a few of them).
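Here is a minimal sketch of the read path, assuming a 2-D float dataset compressed with the deflate filter only; the file name `data.h5` and dataset name `dset` are placeholders, and error checking is omitted for brevity:

```c
#include <stdio.h>
#include <stdlib.h>
#include <hdf5.h>
#include <zlib.h>

int main(void)
{
    /* Placeholder names: adapt to your file/dataset. */
    hid_t file = H5Fopen("data.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset = H5Dopen2(file, "dset", H5P_DEFAULT);

    /* Uncompressed chunk size = chunk dims x element size (float assumed). */
    hid_t dcpl = H5Dget_create_plist(dset);
    hsize_t cdims[2];
    H5Pget_chunk(dcpl, 2, cdims);
    uLongf raw_len = (uLongf)(cdims[0] * cdims[1] * sizeof(float));

    /* Size of the chunk as stored on disk (compressed). */
    hsize_t offset[2] = {0, 0};        /* logical offset of the chunk */
    hsize_t comp_size = 0;
    H5Dget_chunk_storage_size(dset, offset, &comp_size);

    /* Raw read: bypasses the filter pipeline and datatype conversion. */
    unsigned char *comp = malloc(comp_size);
    uint32_t filter_mask = 0;
    H5Dread_chunk(dset, H5P_DEFAULT, offset, &filter_mask, comp);

    /* Decompress ourselves: this is the step a thread pool can run in
     * parallel, one chunk per task, while another thread keeps reading. */
    unsigned char *raw = malloc(raw_len);
    if (uncompress(raw, &raw_len, comp, (uLong)comp_size) != Z_OK)
        fprintf(stderr, "inflate failed\n");

    free(raw);
    free(comp);
    H5Pclose(dcpl);
    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}
```

The write path is symmetric: compress chunks in worker threads with zlib's compress2(), then hand the finished buffers to a single writer thread that calls H5Dwrite_chunk(), keeping all HDF5 calls on one thread.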
