I am using HDF5 to write and read multi-GB files for checkpointing long-running simulations. I notice that the resident set size of my process is considerably larger after recovering from a checkpoint file, even though the amount of allocated memory is what I would expect, so I suspect the malloc heap has become heavily fragmented.

I used the igprof malloc profiling tool and first checked that memory allocated by HDF5 was freed; this mostly appears to be the case, so there is no obvious memory leak. However, for a checkpoint file of a few GB, it reports that H5Z_filter_deflate (the decompression filter) allocated (and presumably freed) a total of 20 GB across roughly 11000 allocations. My suspicion is that this is causing significant heap fragmentation, which leads to the large observed RSS.

It occurs to me that if HDF5 could be told to use a separate malloc heap, or some sort of memory pool, then that memory could be released entirely once recovery is finished, avoiding the fragmentation. I can find references to HDF5 memory pools, but no documentation on how they could actually be used.
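For illustration, the recovery read path is essentially of the following shape (the dataset name "/state", element type, and rank are placeholders, not my actual checkpoint layout):

```c
/* Sketch of the recovery read that triggers H5Z_filter_deflate.
 * Dataset name, type, and rank are placeholders for illustration. */
#include <hdf5.h>
#include <stdlib.h>

int restore_state(const char *path, double **out, hsize_t *n)
{
    hid_t file = H5Fopen(path, H5F_ACC_RDONLY, H5P_DEFAULT);
    if (file < 0) return -1;

    hid_t dset  = H5Dopen2(file, "/state", H5P_DEFAULT);
    hid_t space = H5Dget_space(dset);
    H5Sget_simple_extent_dims(space, n, NULL);

    *out = malloc(*n * sizeof(double));

    /* Each compressed chunk is inflated into a temporary buffer by the
     * deflate filter before landing in *out, so a multi-GB read produces
     * thousands of short-lived allocations on the process heap. */
    herr_t status = H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL,
                            H5P_DEFAULT, *out);

    H5Sclose(space);
    H5Dclose(dset);
    H5Fclose(file);
    return status < 0 ? -1 : 0;
}
```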
Is there some way to reduce memory fragmentation caused by the decompression filter, other than disabling compression entirely?