When to use the MPI version of the HDF5 library?

Hi. It seems that on my system a serial version of HDF5 is installed in

/usr/lib/x86_64-linux-gnu/hdf5/serial

and an MPI version of the library is installed in

/usr/lib/x86_64-linux-gnu/hdf5/mpich

Now, my question is: when I'm compiling a program which uses MPI and HDF5 but whose I/O is NOT done in parallel (i.e. one single process does all I/O), which one should I use?

The serial version.

When a single process accesses a single HDF5 file, the serial version does the job, regardless of the environment. To scale this approach, you can use a separate file for each I/O process and then merge the results. This pattern suits embarrassingly parallel problems, at the cost of added complexity when merging the results.
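As a rough illustration of the file-per-process pattern, here is a minimal sketch in Python using h5py built against serial HDF5. The function names, the `part_NNNN.h5` file naming and the `values` dataset name are my own assumptions for the example, and the "ranks" are simulated with a loop so it runs without MPI. In a real MPI program each rank would call `write_rank_file` with its own rank.

```python
import os
import tempfile

import h5py
import numpy as np


def write_rank_file(rank, data, directory="."):
    """Write one process's results to its own file (serial HDF5 only)."""
    # Hypothetical naming scheme: one file per rank, e.g. part_0003.h5.
    path = os.path.join(directory, f"part_{rank:04d}.h5")
    with h5py.File(path, "w") as f:
        f.create_dataset("values", data=data)
    return path


def merge_rank_files(paths, out_path):
    """Concatenate the 'values' dataset of every per-rank file into one file.

    Returns the number of elements in the merged dataset.
    """
    parts = []
    for path in paths:
        with h5py.File(path, "r") as f:
            parts.append(f["values"][...])
    merged = np.concatenate(parts)
    with h5py.File(out_path, "w") as f:
        f.create_dataset("values", data=merged)
    return merged.shape[0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        # Stand-in for three MPI ranks, each producing three values.
        paths = [
            write_rank_file(rank, np.full(3, rank, dtype="f8"), workdir)
            for rank in range(3)
        ]
        total = merge_rank_files(paths, os.path.join(workdir, "merged.h5"))
        print("merged elements:", total)
```

The merge step is where the added complexity shows up: it has to know the file naming, the order of the parts and how they fit together, and this sketch assumes everything fits in memory at once.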

When multiple processes access the same file (and the same dataset), you need a mechanism to synchronise HDF5's internal state among the processes. This is what parallel HDF5 does for you.
