I am using the HighFive library on the NERSC Perlmutter system. I have an application that has to do parallel reads/writes, and I was following the parallel collective I/O example from the HighFive repo (https://github.com/highfive-devs/highfive).
Here is my code that does the parallel reading:
using namespace HighFive;

try {
    // File access property list: open each file with MPI-IO on the row
    // communicator and read metadata collectively.
    FileAccessProps fapl;
    fapl.add(MPIOFileAccess { comm.get_row_comm(), MPI_INFO_NULL });
    fapl.add(MPIOCollectiveMetadata {});

    // Transfer property list: request collective MPI-IO for the raw data reads.
    auto xfer_props = DataTransferProps {};
    xfer_props.add(UseCollectiveIO {});

    size_t row_start
        = Utils::get_start_index(glob_num_rows, comm.get_row_color(), comm.get_proc_rows());

    std::vector<double> vec(block_size * num_cols);
    for (int r = 0; r < num_rows; r++) {
        // One HDF5 file per global row index (zero-padded to 6 digits).
        std::string zero_pad_vec_str = Utils::zero_pad(r + row_start, 6);
        std::string vec_filename = dirname + zero_pad_vec_str + ".h5";

        File file(vec_filename, File::ReadOnly, fapl);
        auto dataset = file.getDataSet("vec");

        int reindex, n_blocks, steps;
        dataset.getAttribute("reindex").read<int>(reindex);
        dataset.getAttribute("n_param").read<int>(n_blocks);
        dataset.getAttribute("param_steps").read<int>(steps);
        // do some checks on the attributes ...

        // Each rank reads its contiguous slice of the 1D dataset.
        size_t col_start
            = Utils::get_start_index(glob_num_cols, comm.get_col_color(), comm.get_proc_cols());
        dataset.select({ col_start * block_size }, { (size_t)num_cols * block_size })
            .read(vec, xfer_props);

        Utils::check_collective_io(xfer_props);
    }
} catch (const std::exception& e) {
    if (comm.get_world_rank() == 0)
        fprintf(stderr, "Error reading matrix from file: %s\n", e.what());
    MPICHECK(MPI_Abort(comm.get_global_comm(), 1));
}
I use similar code to write the data. I profiled this code with Darshan/Drishti, and the report flagged several issues:
Application issues a high number (9961627) of small read requests (i.e., < 1MB) which represents 100.00% of all read requests
Application issues a high number (85283) of small write requests (i.e., < 1MB) which represents 100.00% of all write requests
Detected write imbalance when accessing 11 individual files
Detected read imbalance when accessing 1223 individual files.
There are also a lot of read/write load imbalances. I’m not quite sure how to go about fixing this. Can you provide any insight?
Here is a link to the profiling reports. The ones named *-2.pdf are for a slightly simpler application, so it may be more useful to start there.
For reference, num_rows is ~600, and each row corresponds to a different file. Within each file, each process reads a contiguous chunk of num_cols * block_size elements; in these examples, that was ~2.15 million double-precision floating point numbers.
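That is, the selection each rank reads from a single file works out to roughly 2.15e6 doubles x 8 bytes ~= 17 MB.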
Thanks