Hi, HDF users,
I am trying to write a 1D array from several processes in an arbitrary,
scattered order. Each proc has a subset of the array to write, but its
elements are neither contiguous nor sorted. Each proc knows the position
where each of its elements should be written.
With some help from the thread
I tried to implement it.
First, the master proc (only that one) creates the file:
// create file
hid_t fid = H5Fcreate( name.c_str(), H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT ); // last argument assumed: default file access
// prepare file space
hsize_t dims[2] = {1, global_nElements};
hsize_t max_dims[2] = {H5S_UNLIMITED, global_nElements}; // not really needed, but kept for future use
hid_t file_space = H5Screate_simple(2, dims, max_dims);
// prepare dataset
hid_t plist = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_layout(plist, H5D_CHUNKED);
hsize_t chunk_dims[2] = {1, global_nElements};
H5Pset_chunk(plist, 2, chunk_dims);
// create dataset
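hid_t did = H5Dcreate(fid, "Id", H5T_NATIVE_UINT, file_space, H5P_DEFAULT, plist, H5P_DEFAULT); // trailing arguments assumed: default link/access lists, `plist` as creation list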
H5Fclose( fid );
Then, all procs open the file and write their subset:
// define MPI file access
hid_t file_access = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_fapl_mpio( file_access, MPI_COMM_WORLD, MPI_INFO_NULL );
// define MPI transfer mode
hid_t transfer = H5Pcreate(H5P_DATASET_XFER);
// Open the file
hid_t fid = H5Fopen( name.c_str(), H5F_ACC_RDWR, file_access);
// Open the existing dataset
hid_t did = H5Dopen( fid, dataset.c_str(), H5P_DEFAULT );
// Get the file space
hid_t file_space = H5Dget_space(did);
// Define the memory space for this proc
hsize_t count[2] = {1, (hsize_t) local_nElements};
hid_t mem_space = H5Screate_simple(2, count, NULL);
// Select the elements for this particular proc (the `coords` array has been created before)
H5Sselect_elements( file_space, H5S_SELECT_SET, local_nElements, coords );
// Write the previously generated `data` array
H5Dwrite( did, H5T_NATIVE_UINT, mem_space , file_space , transfer, data );
// Close stuff
H5Fclose( fid );
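For reference, H5Sselect_elements expects the coordinates as a flat array of num_elements x rank values, so for this rank-2 dataset every point is a (row, column) pair. A minimal sketch of how such a `coords` array could be laid out, assuming a hypothetical `positions` vector that holds the 1D target index of each local element (`build_coords` and `hsize_t_sketch` are illustrative names, the latter standing in for `hsize_t` so the snippet compiles without the HDF5 headers):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for HDF5's hsize_t (assumption: an unsigned 64-bit integer type).
typedef unsigned long long hsize_t_sketch;

// Build the flat coordinate list for a rank-2 dataspace with a single row:
// point i occupies coords[2*i] (row) and coords[2*i + 1] (column).
std::vector<hsize_t_sketch> build_coords(const std::vector<std::size_t>& positions,
                                         std::size_t global_nElements)
{
    std::vector<hsize_t_sketch> coords;
    coords.reserve(2 * positions.size());
    for (std::size_t i = 0; i < positions.size(); ++i) {
        // Every target index must lie inside the row.
        assert(positions[i] < global_nElements);
        coords.push_back(0);             // row index: the dataset has one row
        coords.push_back(positions[i]);  // column index: where element i goes
    }
    return coords;
}
```

The resulting buffer would then be passed as the last argument of H5Sselect_elements, with `positions.size()` as the number of elements.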
This version works but is VERY SLOW: more than 10 times slower than writing
with 1 proc without H5Sselect_elements.
Is this to be expected? Is there a way to make it faster?
Using H5Pget_mpio_actual_io_mode, I realized that it was not using
collective transfer, so I tried to force it using the following:
H5Pset_dxpl_mpio( transfer, H5FD_MPIO_COLLECTIVE);
But unfortunately, I get tons of the following error:
HDF5-DIAG: Error detected in HDF5 (1.8.14) MPI-process 0:
#000: H5Dio.c line 271 in H5Dwrite(): can't prepare for writing data
major: Dataset
minor: Write failed
#001: H5Dio.c line 352 in H5D__pre_write(): can't write data
major: Dataset
minor: Write failed
#002: H5Dio.c line 788 in H5D__write(): can't write data
major: Dataset
minor: Write failed
#003: H5Dmpio.c line 757 in H5D__chunk_collective_write(): write error
major: Dataspace
minor: Write failed
#004: H5Dmpio.c line 685 in H5D__chunk_collective_io(): couldn't finish
linked chunk MPI-IO
major: Low-level I/O
minor: Can't get value
#005: H5Dmpio.c line 881 in H5D__link_chunk_collective_io(): couldn't
finish shared collective MPI-IO
major: Data storage
minor: Can't get value
#006: H5Dmpio.c line 1401 in H5D__inter_collective_io(): couldn't finish
collective MPI-IO
major: Low-level I/O
minor: Can't get value
#007: H5Dmpio.c line 1445 in H5D__final_collective_io(): optimized write
major: Dataset
minor: Write failed
#008: H5Dmpio.c line 297 in H5D__mpio_select_write(): can't finish
collective parallel write
major: Low-level I/O
minor: Write failed
#009: H5Fio.c line 171 in H5F_block_write(): write through metadata
accumulator failed
major: Low-level I/O
minor: Write failed
#010: H5Faccum.c line 825 in H5F__accum_write(): file write failed
major: Low-level I/O
minor: Write failed
#011: H5FDint.c line 246 in H5FD_write(): driver write request failed
major: Virtual File Layer
minor: Write failed
#012: H5FDmpio.c line 1802 in H5FD_mpio_write(): MPI_File_set_view failed
major: Internal error (too specific to document in detail)
minor: Some MPI function failed
#013: H5FDmpio.c line 1802 in H5FD_mpio_write(): MPI_ERR_ARG: invalid
argument of some other kind
major: Internal error (too specific to document in detail)
minor: MPI Error String
The same happens with both HDF5 1.8.14 and 1.8.15
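One experiment that may be worth trying, given that the failure surfaces in MPI_File_set_view: the MPI standard requires the displacements of a file view's filetype to be monotonically nondecreasing, and an unsorted point selection might be translated into a filetype that violates this. Whether that is the actual cause here is an assumption and has not been confirmed. A minimal sketch that reorders the target indices and the values together before the coordinates are built (`sort_by_position` and `positions` are hypothetical names, not part of the code above):

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Sort `positions` into increasing order and apply the same permutation to
// `data`, so that data[i] is still the value destined for positions[i].
// Assumes both vectors have the same length.
void sort_by_position(std::vector<std::size_t>& positions,
                      std::vector<unsigned>& data)
{
    std::vector<std::pair<std::size_t, unsigned> > paired(positions.size());
    for (std::size_t i = 0; i < positions.size(); ++i) {
        paired[i] = std::make_pair(positions[i], data[i]);
    }
    // Pairs compare by position first, which is the only key that matters here.
    std::sort(paired.begin(), paired.end());
    for (std::size_t i = 0; i < paired.size(); ++i) {
        positions[i] = paired[i].first;
        data[i] = paired[i].second;
    }
}
```

If the collective write still fails with sorted coordinates, the ordering can be ruled out as the cause.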
Any ideas on how to fix this?
Thank you