Exclusive parallel read-only access to a dataset

daniel.langr · March 19, 2011, 6:18pm

Hello,

I have an HDF5 file with one big dataset that needs to be read by many MPI processes exclusively (that is, every process reads its own non-overlapping portion of data). Should I

I. use parallel HDF5 file access (H5FD_MPIO driver) with non-collective dataset reading, or

II. use non-parallel HDF5 file access (default driver; simply open the file multiple times without caring about MPI)?

Is there any significant performance difference between these two options?

Thanks for help,
Daniel

Mark_Moll1 · March 19, 2011, 6:48pm

In my limited experience the answer is “it depends”. If your code is doing mostly computation with sporadic I/O, then non-parallel HDF5 might be the way to go. However, if all processes are hammering the same shared file system at the same time, you’d probably want to use parallel HDF5 to coordinate the disk access.

--
Mark Moll
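
For either option, each process first has to work out which non-overlapping portion of the dataset is its own. A minimal sketch of one way to do that, assuming the dataset is split into contiguous blocks of rows along its first dimension; the function name `block_partition` and the row-wise split are illustrative assumptions, not something stated in the thread:

```python
def block_partition(n_rows, n_procs, rank):
    """Return (start, count) of the contiguous row block owned by `rank`.

    Hypothetical helper: rows are split as evenly as possible, and the
    first (n_rows % n_procs) ranks each take one extra row.
    """
    if n_procs <= 0:
        raise ValueError("n_procs must be positive")
    if not 0 <= rank < n_procs:
        raise ValueError("rank out of range")
    base, extra = divmod(n_rows, n_procs)
    # Ranks below `extra` own base + 1 rows, the rest own base rows.
    count = base + (1 if rank < extra else 0)
    # Every earlier rank contributed `base` rows, plus one extra row
    # for each earlier rank that is below `extra`.
    start = rank * base + min(rank, extra)
    return start, count


if __name__ == "__main__":
    # Example sizes only; in an MPI program `rank` and `n_procs` would
    # come from the communicator rather than a loop.
    n_rows, n_procs = 10, 4
    for rank in range(n_procs):
        start, count = block_partition(n_rows, n_procs, rank)
        print(f"rank {rank}: rows [{start}, {start + count})")
```

The resulting (start, count) pair is what a process would pass to a hyperslab selection (for example `H5Sselect_hyperslab` in the C API) before reading, whether the file was opened with the MPI-IO driver or with the default driver.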
