HDF5 Memory Usage High for Writes


#1

I am developing a multi-threaded C++ application on Red Hat Enterprise Linux 7.7, using the C API of HDF5 1.10.2. The application is long-running and uses a lot of memory (~2 GB while writing ~100 MB HDF5 files). Valgrind does not show any memory leaks.

The main thread spawns up to 16 threads to gather the data from other processes. Fifteen of those threads complete, and the remaining thread spawns another thread to write the data to an HDF5 file. The data is 1-D, by the way. The application consumes more and more system memory, reaching ~2 GB after about 2 hours. On another run with more points, system memory fills up entirely and the application starts paging to disk.

The previous version of this application, which was also multi-threaded, did not use HDF5 and did not use as much memory. I ran massif with page-level allocation tracking enabled (--pages-as-heap=yes), and the largest allocation was not from HDF5 but from another library used in the application. That other library is also used in the previous version of my application, but there it did not use as much memory.

I experimented with various chunk sizes. The original chunk size was 1x10 items (usually 40 B); I then increased it to 1x1000 items (4 kB), but the memory usage stayed the same.
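For reference, a simplified sketch of the kind of dataset creation involved (placeholder file and dataset names, not my exact code):

```cpp
#include <hdf5.h>
#include <vector>

// Simplified sketch: a 1-D chunked dataset of 4-byte values with a 1000-item
// chunk (the larger size mentioned above). Names are placeholders.
void write_dataset(const std::vector<float>& data)
{
    hid_t file = H5Fcreate("output.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    hsize_t dims[1]    = {data.size()};
    hsize_t maxdims[1] = {H5S_UNLIMITED};
    hid_t   space      = H5Screate_simple(1, dims, maxdims);

    hsize_t chunk[1] = {1000};
    hid_t   dcpl     = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);

    hid_t dset = H5Dcreate2(file, "/data", H5T_NATIVE_FLOAT, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data.data());

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
    H5Fclose(file);
}
```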


#2

Hi Glenn,

To start, I’d update to a more recent HDF5 release than 1.10.2. The early releases in the 1.10 series had performance issues that were fixed in later versions. Also, larger chunk sizes will probably help, as they result in a smaller chunk index, more efficient reads, and better performance overall.
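One knob that directly affects per-dataset memory during I/O is the chunk cache. Something like this caps it via the dataset access property list (the values here are illustrative, not a tuned recommendation):

```cpp
#include <hdf5.h>

// Hedged sketch: limit the per-dataset chunk cache when opening a dataset.
// 521 hash slots / 1 MiB cache are example numbers only.
hid_t open_with_small_chunk_cache(hid_t file, const char* name)
{
    hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
    H5Pset_chunk_cache(dapl, 521, 1024 * 1024, H5D_CHUNK_CACHE_W0_DEFAULT);
    hid_t dset = H5Dopen2(file, name, dapl);
    H5Pclose(dapl);
    return dset;
}
```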

Also, are you using the thread-safe version of the library?
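If you’re not sure, you can ask the library at run time whether it was built with --enable-threadsafe (available in 1.10 and later):

```cpp
#include <hdf5.h>
#include <cstdio>

// Minimal check that the linked HDF5 library is a thread-safe build.
int main()
{
    hbool_t ts = 0;
    H5is_library_threadsafe(&ts);
    std::printf("thread-safe HDF5 build: %s\n", ts ? "yes" : "no");
    return 0;
}
```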

I have seen applications whose memory footprint grows steadily even though valgrind reports no leaks. Usually this turns out to be an effect of how the OS assigns and reclaims memory, not something inherent to HDF5. The kernel may decide that pages allocated to your application are unreclaimable and never recycle them, so the footprint keeps growing, or it may simply be lazy about reclaiming memory that has already been freed.

You can find more info on how the Linux kernel allocates and recycles pages here:
https://www.kernel.org/doc/html/latest/admin-guide/mm/concepts.html
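If you’re on glibc, one way to separate allocator caching from a real leak is to ask the allocator to hand free pages back to the kernel and see whether RSS drops. This is only a diagnostic sketch using glibc-specific calls, not a fix:

```cpp
#include <malloc.h>   // glibc-specific

// Diagnostic sketch (glibc only): after the write threads finish, dump the
// allocator's stats, release free heap memory back to the OS, and dump again.
// If RSS drops noticeably after malloc_trim(), the growth was the allocator
// caching freed memory rather than an HDF5 leak.
void report_allocator_state()
{
    malloc_stats();   // prints arena usage to stderr
    malloc_trim(0);   // release free memory at the top of the heap to the OS
    malloc_stats();
}
```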

Does the amount of memory growth vary with the size of the I/O? If so, what’s the life cycle of the buffers you are allocating for I/O? Are you inserting buffers into a data structure that holds references to the buffers until the end of the program?
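For example, a pattern like the following (hypothetical names) grows without ever looking like a leak to valgrind, because every buffer stays reachable:

```cpp
#include <memory>
#include <vector>

// Hypothetical pattern: every I/O buffer is pushed into a container that is
// never drained, so nothing is "lost", but the footprint climbs with each write.
std::vector<std::unique_ptr<float[]>> pending;

void queue_write(std::size_t npoints)
{
    auto buf = std::make_unique<float[]>(npoints);
    // ... fill buf, hand it to the writer thread ...
    pending.push_back(std::move(buf));   // never popped -> footprint grows
}
```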

Also, what is the life cycle of HDF5 objects in your application? How many files and datasets do you have open at any given time? And are you closing them when you are done with them?
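If you want to check programmatically, H5Fget_obj_count() reports how many handles are still open against a file; a minimal sketch:

```cpp
#include <hdf5.h>
#include <cstdio>

// Count the objects (datasets, groups, datatypes, attributes, ...) still open
// in a file. If this number climbs over the run, handles are not being closed.
void report_open_objects(hid_t file)
{
    ssize_t n = H5Fget_obj_count(file, H5F_OBJ_ALL);
    std::printf("open HDF5 objects in this file: %ld\n", (long)n);
}
```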