Slow or buggy using H5Sselect_elements

Hi Frederic,

Yes, writing to a dataset in parallel with point selections in independent mode is going to be slow; this is expected.
However, doing it collectively should work and should not be slow.
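For reference, a collective point-selection write would look roughly like this (just a sketch, reusing the variable names from your snippet), and you can query afterwards which I/O path HDF5 actually took:

    hid_t transfer = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio( transfer, H5FD_MPIO_COLLECTIVE );   // request collective I/O
    H5Sselect_elements( file_space, H5S_SELECT_SET, local_nElements, coords );
    H5Dwrite( did, H5T_NATIVE_UINT, mem_space, file_space, transfer, data );
    // check what HDF5 actually did for this write
    H5D_mpio_actual_io_mode_t mode;
    H5Pget_mpio_actual_io_mode( transfer, &mode );        // e.g. H5D_MPIO_CHUNK_COLLECTIVE vs H5D_MPIO_NO_COLLECTIVE
    H5Pclose(transfer);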

May I bother you to write a program that reproduces the failures you see with collective I/O? The code you pasted will not compile, since it is missing some variables (coords, local_nElements, etc.).

Thanks,
Mohamad


From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Frederic Perez
Sent: Thursday, September 03, 2015 8:52 AM
To: hdf-forum@lists.hdfgroup.org
Subject: [Hdf-forum] Slow or buggy using H5Sselect_elements

Hi, HDF users,
I am trying to write a 1D array from several processes in an unstructured way. Each proc has a subset of the array to write, but its elements are neither contiguous nor sorted. Each proc knows the position where each of its elements must be written.
With some help from the thread
    http://hdf-forum.184993.n3.nabble.com/HDF5-Parallel-write-selection-using-hyperslabs-slow-write-tp3935966.html
I tried to implement it.
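(The snippets below omit how coords and local_nElements are built; roughly, it looks like the following sketch, where myPositions is a placeholder for the application-specific list of target indices.)

    // local_nElements : number of elements owned by this proc
    // myPositions[i]  : global index where the i-th local element goes (placeholder name)
    hsize_t *coords = new hsize_t[ 2 * local_nElements ]; // 2 coordinates per point, since the dataspace is 2D
    for( unsigned int i = 0; i < local_nElements; i++ ) {
        coords[2*i  ] = 0;               // row 0 (the dataset has shape {1, global_nElements})
        coords[2*i+1] = myPositions[i];  // column = target position in the 1D array
    }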

First, the master proc (only that one) creates the file:

        // create file
        hid_t fid = H5Fcreate( name.c_str(), H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        // prepare file space
        hsize_t dims[2] = {1, global_nElements};
        hsize_t max_dims[2] = {H5S_UNLIMITED, global_nElements}; // not really needed but for future use
        hid_t file_space = H5Screate_simple(2, dims, max_dims);
        // prepare dataset
        hid_t plist = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_layout(plist, H5D_CHUNKED);
        hsize_t chunk_dims[2] = {1, global_nElements};
        H5Pset_chunk(plist, 2, chunk_dims);
        // create dataset
        hid_t did = H5Dcreate(fid, "Id", H5T_NATIVE_UINT, file_space, H5P_DEFAULT, plist, H5P_DEFAULT);

        H5Dclose(did);
        H5Pclose(plist);
        H5Sclose(file_space);
        H5Fclose( fid );
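For this to be safe, the other procs have to wait for the master to finish creating the file before they open it; a minimal sketch of that synchronization:

        MPI_Barrier( MPI_COMM_WORLD ); // no proc proceeds until the master has created the file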
Then, all procs open the file and write their subset:

    // define MPI file access
    hid_t file_access = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio( file_access, MPI_COMM_WORLD, MPI_INFO_NULL );
    // define MPI transfer mode
    hid_t transfer = H5Pcreate(H5P_DATASET_XFER);
    // Open the file
    hid_t fid = H5Fopen( name.c_str(), H5F_ACC_RDWR, file_access);
    // Open the existing dataset
    hid_t did = H5Dopen( fid, dataset.c_str(), H5P_DEFAULT );
    // Get the file space
    hid_t file_space = H5Dget_space(did);
    // Define the memory space for this proc
    hsize_t count[2] = {1, (hsize_t) local_nElements};
    hid_t mem_space = H5Screate_simple(2, count, NULL);
    // Select the elements for this particular proc (the `coords` array has been created before)
    H5Sselect_elements( file_space, H5S_SELECT_SET, local_nElements, coords );
    // Write the previously generated `data` array
    H5Dwrite( did, H5T_NATIVE_UINT, mem_space , file_space , transfer, data );
    // Close everything
    H5Sclose(mem_space);
    H5Sclose(file_space);
    H5Dclose(did);
    H5Pclose(transfer);
    H5Pclose(file_access);
    H5Fclose( fid );
This version works, but it is VERY SLOW: more than 10 times slower than writing from a single proc without H5Sselect_elements.
Is this to be expected? Is there a way to make it faster?

Using H5Pget_mpio_actual_io_mode, I realized that it was not using collective transfer, so I tried to force it using the following:

    H5Pset_dxpl_mpio( transfer, H5FD_MPIO_COLLECTIVE);
But unfortunately, I get tons of the following error:

HDF5-DIAG: Error detected in HDF5 (1.8.14) MPI-process 0:
  #000: H5Dio.c line 271 in H5Dwrite(): can't prepare for writing data
    major: Dataset
    minor: Write failed
  #001: H5Dio.c line 352 in H5D__pre_write(): can't write data
    major: Dataset
    minor: Write failed
  #002: H5Dio.c line 788 in H5D__write(): can't write data
    major: Dataset
    minor: Write failed
  #003: H5Dmpio.c line 757 in H5D__chunk_collective_write(): write error
    major: Dataspace
    minor: Write failed
  #004: H5Dmpio.c line 685 in H5D__chunk_collective_io(): couldn't finish linked chunk MPI-IO
    major: Low-level I/O
    minor: Can't get value
  #005: H5Dmpio.c line 881 in H5D__link_chunk_collective_io(): couldn't finish shared collective MPI-IO
    major: Data storage
    minor: Can't get value
  #006: H5Dmpio.c line 1401 in H5D__inter_collective_io(): couldn't finish collective MPI-IO
    major: Low-level I/O
    minor: Can't get value
  #007: H5Dmpio.c line 1445 in H5D__final_collective_io(): optimized write failed
    major: Dataset
    minor: Write failed
  #008: H5Dmpio.c line 297 in H5D__mpio_select_write(): can't finish collective parallel write
    major: Low-level I/O
    minor: Write failed
  #009: H5Fio.c line 171 in H5F_block_write(): write through metadata accumulator failed
    major: Low-level I/O
    minor: Write failed
  #010: H5Faccum.c line 825 in H5F__accum_write(): file write failed
    major: Low-level I/O
    minor: Write failed
  #011: H5FDint.c line 246 in H5FD_write(): driver write request failed
    major: Virtual File Layer
    minor: Write failed
  #012: H5FDmpio.c line 1802 in H5FD_mpio_write(): MPI_File_set_view failed
    major: Internal error (too specific to document in detail)
    minor: Some MPI function failed
  #013: H5FDmpio.c line 1802 in H5FD_mpio_write(): MPI_ERR_ARG: invalid argument of some other kind
    major: Internal error (too specific to document in detail)
    minor: MPI Error String
The same happens with both HDF5 1.8.14 and 1.8.15.
Any idea how to fix this?
Thank you,
Fred