In many modern applications the parallelisation is of the form MPI + X (with X = OpenMP, pthreads, …), typically with one MPI rank per compute node.
When using parallel HDF5 in this setup, only one core per node writes whilst the rest idle. This is normally not a problem, as the writes are I/O or communication bound. However, if I try to use gzip compression, then using only one core per node is very much a waste.
Is there a plan to make use of a thread-parallel implementation of gzip to speed up exactly this step? At the moment, compression in parallel is prohibitively expensive unless one MPI rank per compute core is used, which goes against modern HPC design and against future plans to scale things up.
If there are no such plans, would you see an option to do this with a custom filter? And possibly abuse the system so that the data looks as if the regular gzip filter had been applied, since the result of the compression is identical in serial and in parallel?
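To make the idea concrete, here is a minimal sketch in plain Python of what I have in mind. It relies on the fact that the deflate filter is applied to each chunk independently, so compressing different chunks on different threads should produce exactly the same bytes as compressing them one after another. The function names (`split_into_chunks`, `compress_serial`, `compress_threaded`) and the chunk size are made up for illustration and are not part of any HDF5 API. Whether the pre-compressed chunks could then be handed to the library (for instance through the direct chunk write call, `H5Dwrite_chunk`) in a parallel HDF5 file is an assumption I have not tested.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(buf, chunk_bytes):
    """Cut a byte buffer into fixed-size pieces (hypothetical stand-in for HDF5 chunks)."""
    return [buf[i:i + chunk_bytes] for i in range(0, len(buf), chunk_bytes)]


def compress_serial(chunks, level=6):
    """Compress each chunk one after another, as a single writing core would."""
    return [zlib.compress(c, level) for c in chunks]


def compress_threaded(chunks, level=6, n_threads=4):
    """Compress the chunks concurrently; each chunk is still one independent zlib stream."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map preserves the input order, so chunk i stays at position i.
        return list(pool.map(lambda c: zlib.compress(c, level), chunks))


if __name__ == "__main__":
    data = bytes(range(256)) * 4096            # 1 MiB of sample data
    chunks = split_into_chunks(data, 65536)    # 64 KiB chunks, arbitrary choice

    serial = compress_serial(chunks)
    threaded = compress_threaded(chunks)

    # Same input and same level give the same bytes, whichever thread did the work.
    print("identical output:", serial == threaded)
    # A standard inflate recovers the original data from the threaded result.
    print("round trip ok:", b"".join(zlib.decompress(c) for c in threaded) == data)
```

This only parallelises across chunks, not within a single chunk, so it would help most when each rank writes many chunks per call. As far as I know, CPython's `zlib` module releases the interpreter lock while compressing, which is why plain threads are enough for the sketch; in C the same pattern would map onto OpenMP or pthreads calling zlib directly.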
Thanks! Any ideas and suggestions welcome.