When to use the MPI version of the HDF5 library?

arham.amouei · November 29, 2020, 10:27pm

Hi. It seems that on my system a serial version of HDF5 is installed in

/usr/lib/x86_64-linux-gnu/hdf5/serial

and an MPI version of the library is installed in

/usr/lib/x86_64-linux-gnu/hdf5/mpich

Now, my question: when I compile a program that uses MPI and HDF5 but does NOT do its I/O in parallel (i.e. a single process does all the I/O), which one should I use?

steven · November 30, 2020, 12:54am

The serial version.

When a single process accesses a single HDF5 file, the serial version does the job regardless of the environment. To scale this approach, you can write a separate file for each I/O process and then merge the results. This pattern suits embarrassingly parallel problems, at the cost of added complexity when merging the results.
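
To make the file-per-process pattern concrete, here is a rough sketch in Python using h5py built against serial HDF5. Everything in it is an assumption for illustration, not something from your program: the file names (`part_0000.h5`, `merged.h5`), the dataset name `values`, and the helper functions are hypothetical. The ranks are simulated with a plain loop so the sketch runs without MPI; in a real program each process would call `write_rank_file` once with its own rank (e.g. from `MPI_Comm_rank`), and a single process would call `merge_rank_files` after all writers have finished.

```python
import os
import tempfile

import h5py
import numpy as np


def write_rank_file(directory, rank, data):
    """Write one rank's results to its own file using plain (serial) HDF5."""
    # Hypothetical naming scheme: one file per rank, zero-padded so paths sort by rank.
    path = os.path.join(directory, f"part_{rank:04d}.h5")
    with h5py.File(path, "w") as f:
        f.create_dataset("values", data=data)
    return path


def merge_rank_files(paths, merged_path):
    """Concatenate the 'values' dataset of every per-rank file, in the given order."""
    parts = []
    for path in paths:
        with h5py.File(path, "r") as f:
            parts.append(f["values"][...])  # read the whole dataset into memory
    merged = np.concatenate(parts)
    with h5py.File(merged_path, "w") as f:
        f.create_dataset("values", data=merged)
    return merged


def demo(n_ranks=4, n_per_rank=3):
    """Simulate n_ranks writers with a loop, merge their files, and read the result back."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for rank in range(n_ranks):
            # Each simulated rank owns a contiguous slice of the global index range.
            data = np.arange(n_per_rank, dtype=np.int64) + rank * n_per_rank
            paths.append(write_rank_file(tmp, rank, data))
        merged_path = os.path.join(tmp, "merged.h5")
        merge_rank_files(paths, merged_path)
        with h5py.File(merged_path, "r") as f:
            return f["values"][...].tolist()


if __name__ == "__main__":
    print(demo())
```

No file is ever opened by more than one process at a time, which is why the serial library is enough here. The merge step in this sketch loads every part into memory, so for large outputs you would probably want to copy chunk by chunk instead.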

When multiple processes access the same file (and the same dataset), you need a mechanism to synchronise the HDF5 internal state among the processes. This is what parallel HDF5 does for you.
