There are a few applications now that have implemented thread-parallel compression and decompression.
One trick is to use H5Dread_chunk or H5Dwrite_chunk. These let you read or write a chunk directly in its compressed form, so you can then set up the thread-parallel compression or decompression yourself.
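As a rough sketch (not taken from any particular application), reading one chunk in its raw, still-compressed form looks something like this. The file name, dataset name, and 2-D chunk offset are made up for illustration, and error checking is omitted:

```c
#include <hdf5.h>
#include <stdint.h>
#include <stdlib.h>

int main(void)
{
    /* Hypothetical file and dataset names, for illustration only. */
    hid_t file = H5Fopen("data.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset = H5Dopen2(file, "dset", H5P_DEFAULT);

    hsize_t offset[2] = {0, 0};   /* logical offset of the chunk to read */
    hsize_t nbytes    = 0;

    /* Size of the chunk as stored on disk, i.e. after compression. */
    H5Dget_chunk_storage_size(dset, offset, &nbytes);

    void    *buf         = malloc(nbytes);
    uint32_t filter_mask = 0;

    /* Bypasses the filter pipeline: buf receives the compressed bytes. */
    H5Dread_chunk(dset, H5P_DEFAULT, offset, &filter_mask, buf);

    /* ... hand buf to your own (thread-parallel) decompressor here ... */

    free(buf);
    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}
```

H5Dwrite_chunk works the same way in reverse: you compress the buffer yourself and hand HDF5 the already-compressed bytes.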
Another approach is to use H5Dget_chunk_info to query the location of a chunk within the file. H5Dchunk_iter provides a faster way to do this, particularly if you want this information for all of the chunks, but it is a relatively new API function.
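As a sketch, assuming a 2-D dataset, iterating over every chunk in one pass looks roughly like this (H5Dchunk_iter appears in recent releases, e.g. HDF5 1.14):

```c
#include <hdf5.h>
#include <stdio.h>

/* Called once per allocated chunk with its logical offset, file address,
 * and stored (compressed) size. Assumes a 2-D dataset for the printf. */
static int chunk_cb(const hsize_t *offset, unsigned filter_mask,
                    haddr_t addr, hsize_t size, void *op_data)
{
    (void)filter_mask;
    (void)op_data;
    printf("chunk [%llu, %llu]: file address %llu, %llu stored bytes\n",
           (unsigned long long)offset[0], (unsigned long long)offset[1],
           (unsigned long long)addr, (unsigned long long)size);
    return H5_ITER_CONT;   /* H5_ITER_STOP would end the iteration early */
}

/* Given an open dataset, list every chunk's location and size. */
void list_chunks(hid_t dset)
{
    H5Dchunk_iter(dset, H5P_DEFAULT, chunk_cb, NULL);
}
```

Each chunk's address and stored size is exactly what you need to schedule per-chunk decompression work across threads.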
The source for many of the filters is located in the HDFGroup/hdf5_plugins repository, which includes the code for the Zstd filter.
From the source code there, you can see it simply uses ZSTD_decompress or ZSTD_compress.
It would be pretty easy to swap those out for ZSTD_compressCCtx or ZSTD_decompressDCtx and set the parameter ZSTD_c_nbWorkers to use multiple threads per chunk. However, I suspect that having multiple threads each deal with individual chunks may be more efficient; that depends on your chunking scheme.
The relevant multi-threading parameters are documented in the zstd manual: http://facebook.github.io/zstd/zstd_manual.html#Chapter4
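Here is a sketch of what that swap might look like, with two caveats. First, per the manual, ZSTD_compressCCtx() compresses at the requested level and ignores other advanced parameters, so ZSTD_c_nbWorkers has to be set through ZSTD_CCtx_setParameter() and the chunk compressed with ZSTD_compress2(). Second, ZSTD_c_nbWorkers only affects compression; zstd decodes a frame single-threaded, which is another argument for parallelizing across chunks on the read side. The helper names are hypothetical, and libzstd must be built with multithreading support:

```c
#include <zstd.h>
#include <stddef.h>

/* Hypothetical helper: compress one chunk using n_threads worker threads. */
size_t compress_chunk_mt(void *dst, size_t dst_cap,
                         const void *src, size_t src_size,
                         int level, int n_threads)
{
    ZSTD_CCtx *cctx = ZSTD_createCCtx();
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, level);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, n_threads);
    size_t n = ZSTD_compress2(cctx, dst, dst_cap, src, src_size);
    ZSTD_freeCCtx(cctx);
    return n;   /* compressed size, or an error code: check ZSTD_isError(n) */
}

/* Hypothetical helper: decompress one chunk. A DCtx can be reused across
 * chunks, but each frame decodes on a single thread. */
size_t decompress_chunk(void *dst, size_t dst_cap,
                        const void *src, size_t src_size)
{
    ZSTD_DCtx *dctx = ZSTD_createDCtx();
    size_t n = ZSTD_decompressDCtx(dctx, dst, dst_cap, src, src_size);
    ZSTD_freeDCtx(dctx);
    return n;
}
```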