Writing a different number of blocks per MPI task to a dataset


#1

I am using parallel HDF5 version 1.12.1 built on a Linux platform running Ubuntu 20.04.

The tl;dr summary: each MPI task owns a different number of blocks that I want to write out as one contiguous 1-D vector in the file, and I do not see how to write a different number of blocks per MPI task. My idea of writing zero-length vectors on the tasks with fewer blocks doesn't seem to work.

So the main question is: Is it possible for different tasks to make a different number of write calls to a dataset? Or is there a better way to do this?

I can provide a simple example code if that helps.

Below is a more detailed discussion:

I am working with something similar to overset grids. In my case the solution result is divided into multiple 5-D arrays on each MPI task. The number of these blocks varies across tasks: there might be one big block on one MPI task and several smaller blocks on another.

The blocks can be of different sizes and are NOT contiguous in memory. Copying the separate blocks into one contiguous memory block is problematic because the memory used by the blocks could be half the total memory available on an MPI task.

What does work: I can write the blocks in a loop over the blocks, treating each block as a one-dimensional vector.

The core loop looks like this (yes, in Fortran 90):

do iblk = 1, N
  starts(1) = blockDataArray(iblk) % qOffset

  call H5Screate_simple_f(nDim, gsz, filespace, ierr)
  call H5Dget_space_f(dset_id, filespace, ierr)
  call H5Sselect_hyperslab_f(filespace, H5S_SELECT_SET_F, starts, sz, ierr)

  print *, "A: myProc: ", myProc, ", iblk: ", iblk, ", sz: ", sz(1), &
       ", gsz: ", gsz(1), ", starts: ", starts(1)

  call H5Dwrite_f(dset_id, H5T_NATIVE_DOUBLE, blockDataArray(iblk) % qVec, gsz, ierr, &
       file_space_id = filespace, mem_space_id = memspace, xfer_prp = plist_id)
  call H5Sclose_f(filespace, ierr)

enddo

If the number of blocks (N) is the same on every task, this works fine: I can write to the same dataset N times. But if the number of blocks differs, the code hangs. Say processor 0 has 2 blocks and processor 1 has 3 blocks in a two-processor run; then processor 0 does not participate in the third iteration of the loop. O.K., that makes sense.

So I added a loop that tries to write zero-sized blocks, where N is the number of blocks on this task and maxN is the maximum number of blocks on any task:

do iblk = N+1, maxN
  starts(1) = 0
  sz(1) = 0

  call H5Screate_simple_f(nDim, gsz, filespace, ierr)
  call H5Dget_space_f(dset_id, filespace, ierr)
  call H5Sselect_hyperslab_f(filespace, H5S_SELECT_SET_F, starts, sz, ierr)

  print *, "B: myProc: ", myProc, ", iblk: ", iblk, ", sz: ", sz(1), &
       ", gsz: ", gsz(1), ", starts: ", starts(1)

  call H5Dwrite_f(dset_id, H5T_NATIVE_DOUBLE, blockDataArray(1) % qVec, gsz, ierr, &
       file_space_id = filespace, mem_space_id = memspace, xfer_prp = plist_id)
  call H5Sclose_f(filespace, ierr); ASSERT(ierr == 0, "H5Sclose_f")

enddo

This fails in the call to H5Dwrite_f() with the following error message:

HDF5-DIAG: Error detected in HDF5 (1.12.1) MPI-process 0:
  #000: H5Dio.c line 291 in H5Dwrite(): can't write data
    major: Dataset
    minor: Write failed
  #001: H5VLcallback.c line 2113 in H5VL_dataset_write(): dataset write failed
    major: Virtual Object Layer
    minor: Write failed
  #002: H5VLcallback.c line 2080 in H5VL__dataset_write(): dataset write failed
    major: Virtual Object Layer
    minor: Write failed
  #003: H5VLnative_dataset.c line 207 in H5VL__native_dataset_write(): can't write data
    major: Dataset
    minor: Write failed
  #004: H5Dio.c line 661 in H5D__write(): src and dest dataspaces have different number of elements selected
    major: Invalid arguments to routine
    minor: Bad value


#2

I’m Fortran-illiterate and can’t comment on the snippets.

Writing different numbers of blocks per MPI task shouldn't be a problem. The error message is unambiguous:

#004: H5Dio.c line 661 in H5D__write(): src and dest dataspaces have different number of elements selected

For a sanity check, after making your selections, run H5Sget_select_npoints() on the in-memory and in-file selections and ensure that they match (in total). At the moment, they don’t (that’s what the error message says).
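In the Fortran API, that check might look like this (a hedged sketch; `filespace` and `memspace` are the dataspace handles from the snippets above):

```fortran
! Hedged sketch: compare the number of selected elements on the
! file side and the memory side before calling H5Dwrite_f.
integer(HSSIZE_T) :: n_file, n_mem

call H5Sget_select_npoints_f(filespace, n_file, ierr)
call H5Sget_select_npoints_f(memspace,  n_mem,  ierr)
if (n_file /= n_mem) then
   print *, "selection mismatch: file =", n_file, ", mem =", n_mem
end if
```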

If a rank doesn’t write anything, use H5Sselect_none().
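Applied to the padding loop in the original post, that could look roughly like this (a hedged sketch; the key point is that the memory-side selection must also be emptied, otherwise the element counts still disagree):

```fortran
! Hedged sketch: ranks that have run out of blocks still make the
! collective H5Dwrite_f call, but select zero elements on BOTH the
! file side and the memory side.
do iblk = N+1, maxN
   call H5Dget_space_f(dset_id, filespace, ierr)
   call H5Sselect_none_f(filespace, ierr)   ! empty file-side selection
   call H5Sselect_none_f(memspace,  ierr)   ! empty memory-side selection

   call H5Dwrite_f(dset_id, H5T_NATIVE_DOUBLE, blockDataArray(1) % qVec, gsz, ierr, &
        file_space_id = filespace, mem_space_id = memspace, xfer_prp = plist_id)
   call H5Sclose_f(filespace, ierr)
enddo
```

With both selections empty, the write is a no-op for that rank, but the rank still participates in the collective call, so nothing hangs.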

Best, G.


#3

Yes, like @gheber wrote, you can definitely write a different number of blocks per MPI task (i.e., rank or process).

To illustrate this, let's imagine a scenario where a dataset named dset, of data type double and one dimension (size 6), is created in parallel (using MPI). The dataset is then written in parallel by four MPI processes, where each process is responsible for writing the following using a point selection:

  • MPI process #0 writes values 10 and 20 in positions #0 and #1 (of dataset dset)
  • MPI process #1 writes values 30, 40 and 50 in positions #2, #3 and #4 (of dataset dset)
  • MPI process #2 writes value 60 in position #5 (of dataset dset)
  • MPI process #3 doesn’t write anything

This scenario could be implemented in Fortran using HDFql as follows:

PROGRAM Test

    ! use HDFql module (make sure it can be found by the Fortran compiler)
    USE HDFql

    ! declare variables
    INTEGER :: state
    INTEGER :: rank

    ! create an HDF5 file named 'test.h5' and use (i.e. open) it in parallel
    state = hdfql_execute("CREATE AND USE FILE test.h5 IN PARALLEL")

    ! create a dataset named 'dset' of data type double of one dimension (size 6)
    state = hdfql_execute("CREATE DATASET dset AS DOUBLE(6)")

    ! get MPI rank
    rank = hdfql_mpi_get_rank()

    IF (rank == 0) THEN
        ! MPI process #0 writes values 10 and 20 in positions #0 and #1 (of dataset 'dset')
        state = hdfql_execute("INSERT INTO dset(0; 1) IN PARALLEL VALUES(10, 20)")
    ELSE IF (rank == 1) THEN
        ! MPI process #1 writes values 30, 40 and 50 in positions #2, #3 and #4 (of dataset 'dset')
        state = hdfql_execute("INSERT INTO dset(2; 3; 4) IN PARALLEL VALUES(30, 40, 50)")
    ELSE IF (rank == 2) THEN
        ! MPI process #2 writes value 60 in position #5 (of dataset 'dset')
        state = hdfql_execute("INSERT INTO dset(5) IN PARALLEL VALUES(60)")
    ELSE
        ! MPI process #3 doesn't write anything
        state = hdfql_execute("INSERT INTO dset IN PARALLEL NO VALUES")
    ENDIF

END PROGRAM

Hope it helps!
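For comparison, the same four-rank scenario can be sketched with the plain parallel HDF5 Fortran API. This is a hedged, untested sketch: it uses hyperslab selections instead of point selections (the positions happen to be contiguous per rank) and H5Sselect_none_f for the rank that writes nothing:

```fortran
! Hedged sketch: four MPI ranks write 2, 3, 1 and 0 values into a
! 1-D dataset of size 6, mirroring the HDFql example above.
program test_plain
   use mpi
   use hdf5
   implicit none

   integer :: ierr, mpierr, myrank
   integer(HID_T) :: fapl, dxpl, file_id, dset_id, filespace, memspace
   integer(HSIZE_T) :: dims(1), start(1), count(1)
   real(kind=8) :: buf(3)

   call MPI_Init(mpierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, myrank, mpierr)
   call h5open_f(ierr)

   ! create the file for parallel access via MPI-IO
   call h5pcreate_f(H5P_FILE_ACCESS_F, fapl, ierr)
   call h5pset_fapl_mpio_f(fapl, MPI_COMM_WORLD, MPI_INFO_NULL, ierr)
   call h5fcreate_f("test.h5", H5F_ACC_TRUNC_F, file_id, ierr, access_prp = fapl)

   ! create the 1-D dataset of size 6 (collective)
   dims(1) = 6
   call h5screate_simple_f(1, dims, filespace, ierr)
   call h5dcreate_f(file_id, "dset", H5T_NATIVE_DOUBLE, filespace, dset_id, ierr)

   ! per-rank portion of the dataset
   select case (myrank)
   case (0);     start(1) = 0; count(1) = 2; buf(1:2) = (/ 10.d0, 20.d0 /)
   case (1);     start(1) = 2; count(1) = 3; buf(1:3) = (/ 30.d0, 40.d0, 50.d0 /)
   case (2);     start(1) = 5; count(1) = 1; buf(1:1) = (/ 60.d0 /)
   case default; start(1) = 0; count(1) = 0
   end select

   call h5screate_simple_f(1, count, memspace, ierr)
   if (count(1) > 0) then
      call h5sselect_hyperslab_f(filespace, H5S_SELECT_SET_F, start, count, ierr)
   else
      call h5sselect_none_f(filespace, ierr)   ! this rank writes nothing
      call h5sselect_none_f(memspace, ierr)
   end if

   ! collective write: every rank calls h5dwrite_f
   call h5pcreate_f(H5P_DATASET_XFER_F, dxpl, ierr)
   call h5pset_dxpl_mpio_f(dxpl, H5FD_MPIO_COLLECTIVE_F, ierr)
   call h5dwrite_f(dset_id, H5T_NATIVE_DOUBLE, buf, count, ierr, &
        file_space_id = filespace, mem_space_id = memspace, xfer_prp = dxpl)

   call h5pclose_f(dxpl, ierr)
   call h5sclose_f(memspace, ierr)
   call h5sclose_f(filespace, ierr)
   call h5dclose_f(dset_id, ierr)
   call h5pclose_f(fapl, ierr)
   call h5fclose_f(file_id, ierr)
   call h5close_f(ierr)
   call MPI_Finalize(mpierr)
end program test_plain
```

The structure is the same as in the original poster's loop; the only difference for the idle rank is the pair of H5Sselect_none_f calls before the (still collective) h5dwrite_f.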