Hi all,
I am running an MPI-enabled C++ program on 4 processes, where each process writes to its own h5 file. The files are each about 10GB in size and contain 10,000 datasets of roughly equal size (so about 1MB each). The datasets are written with H5LTmake_dataset_char and read back with H5LTread_dataset_char (HDF5 1.8.20) within the same execution. I make about 50,000 calls to each function per process, and the code spends a total of about 1 minute in each (which is consistent with HDD read/write speeds).
The problem arises when I run multiple instances of this program at the same time. For example, I ran 3 instances (so 3x4 processes, on a 24-core machine with 256GB of RAM) performing identical operations, except that each instance writes/reads its h5 files in a different directory. No two processes touch the same h5 file. In this case the write time degrades only modestly, to perhaps 2 minutes of wallclock time instead of 1. However, the wallclock time for reading jumps to 2 hours!
The usual advice for slow read speeds (e.g. tuning the chunk size or number of datasets) seems to be inapplicable because the behavior of a single instance of this program is completely sensible. Does anyone have any insight into what could be causing the massive slowdown in this case?
Thanks for your help!