[extendable datasets in parallel]

maxim.abalenkov · January 20, 2020, 2:02pm

I have a 2D extendable dataset that I would like to create and expand in parallel. I also have a set of metadata describing the dataset (e.g. dataset dimensions and timestep counter). Currently I declare the metadata as attributes of the dataset, and a single root process writes and modifies them in sequential mode. My current approach (in pseudocode) is the following:

Initialise Fortran interface
Create property list for parallel file access
Attempt to open file collectively

if (file does not exist) then

  Create new file collectively
  Write initial extendable dataset into file collectively

! file exists
else

  Close file collectively

  if (root) then
    Open file sequentially
    Acquire attribute value(s)
    Close file sequentially
  end if

  ! broadcast is a collective call, so every process takes part
  Broadcast attribute value(s) from root to other MPI processes

  Open file collectively
  Append data to main dataset
  Close file collectively

  if (root) then
    Open file sequentially
    Update attribute values
    Close file sequentially
  end if

end if

Close Fortran interface
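
For the "Append data to main dataset" step, each process needs the stored dimensions (from the attributes) to work out where its share of the new data lands in the extended dataset. Below is a minimal sketch of that arithmetic only, in plain Python with no HDF5 or MPI calls. The function name `append_selection`, its parameters, and the row-wise block decomposition are assumptions made for illustration; the real partitioning may differ.

```python
def append_selection(current_rows, new_rows, nprocs, rank):
    """Return (extended_rows, start_row, row_count) for one rank.

    Assumption: the appended rows are split into contiguous blocks, and
    the first (new_rows % nprocs) ranks each take one extra row.
    """
    base, extra = divmod(new_rows, nprocs)
    row_count = base + (1 if rank < extra else 0)
    # Rows owned by lower-numbered ranks precede this rank's block.
    offset_in_step = rank * base + min(rank, extra)
    start_row = current_rows + offset_in_step
    return current_rows + new_rows, start_row, row_count


if __name__ == "__main__":
    # Hypothetical values: 100 rows already stored, 10 rows appended, 4 ranks.
    for rank in range(4):
        print(rank, append_selection(100, 10, 4, rank))
```

Here `extended_rows` would be the new size passed to the dataset-extend call by every process, while `start_row` and `row_count` would define each process's hyperslab along the extendable dimension.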

Opening and closing a file multiple times is suboptimal. Is there a way to read and write attribute values collectively? Or should I declare them as separate datasets, so that I can access them in parallel? If you could correct my pseudocode and direct me to a practical example, it would be most appreciated. I couldn't find anything on the HDF5 website apart from a (very basic) standard parallel I/O example.
