I am using parallel HDF5 and have been testing an example from the HDF5 Group, changing only the dataset dimensions and the datatype (to double). The original example is at:
https://support.hdfgroup.org/ftp/HDF5/examples/parallel/Hyperslab_by_row.c
The code I use is:
#include "hdf5.h"
#include "stdlib.h"
#define H5FILE_NAME "SDS_row.h5"
#define DATASETNAME "IntArray"
#define NX 800 /* dataset dimensions */
#define NY 6554
#define RANK 2
int main (int argc, char **argv)
{
    /*
     * HDF5 API definitions
     */
    hid_t   file_id, dset_id;    /* file and dataset identifiers */
    hid_t   filespace, memspace; /* file and memory dataspace identifiers */
    hsize_t dimsf[2];            /* dataset dimensions */
    double *data;                /* pointer to data buffer to write */
    hsize_t count[2];            /* hyperslab selection parameters */
    hsize_t offset[2];
    hid_t   plist_id;            /* property list identifier */
    herr_t  status;

    /*
     * MPI variables
     */
    int      mpi_size, mpi_rank;
    MPI_Comm comm = MPI_COMM_WORLD;
    MPI_Info info = MPI_INFO_NULL;

    /*
     * Initialize MPI
     */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(comm, &mpi_size);
    MPI_Comm_rank(comm, &mpi_rank);

    /*
     * Set up file access property list with parallel I/O access
     */
    plist_id = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(plist_id, comm, info);
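    /* Note: with the MPI-IO driver set, H5Fcreate below is a collective
     * call; all ranks must make it with the same communicator. */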
    /*
     * Create a new file collectively and release property list identifier.
     */
    file_id = H5Fcreate(H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, plist_id);
    H5Pclose(plist_id);
    /*
     * Create the dataspace for the dataset.
     */
    dimsf[0] = NX;
    dimsf[1] = NY;
    filespace = H5Screate_simple(RANK, dimsf, NULL);

    /*
     * Create the dataset with default properties and close filespace.
     */
    dset_id = H5Dcreate(file_id, DATASETNAME, H5T_NATIVE_DOUBLE, filespace,
                        H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Sclose(filespace);
    /*
     * Each process defines its block of the dataset in memory and writes it
     * to a hyperslab in the file.
     */
    count[0] = dimsf[0] / mpi_size;
    count[1] = dimsf[1];
    offset[0] = mpi_rank * count[0];
    offset[1] = 0;
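    /*
     * Note: integer division. With NX=800 and 12 ranks, count[0] = 66, so
     * only 66*12 = 792 of the 800 rows are selected; the last 8 rows of
     * the dataset are never written.
     */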
    memspace = H5Screate_simple(RANK, count, NULL);

    /*
     * Select hyperslab in the file.
     */
    filespace = H5Dget_space(dset_id);
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);

    /*
     * Initialize data buffer
     */
    data = (double *) malloc(sizeof(double) * count[0] * count[1]);
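    /* With 12 ranks this is 66 * 6554 doubles, about 3.3 MiB per rank. */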
    for (hsize_t i = 0; i < count[0] * count[1]; i++) {
        data[i] = mpi_rank + 10;
    }
    /*
     * Create property list for collective dataset write.
     */
    plist_id = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_COLLECTIVE);
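    /* Collective transfer: every rank must participate in this H5Dwrite. */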
    status = H5Dwrite(dset_id, H5T_NATIVE_DOUBLE, memspace, filespace,
                      plist_id, data);
    free(data);

    /*
     * Close/release resources.
     */
    H5Dclose(dset_id);
    H5Sclose(filespace);
    H5Sclose(memspace);
    H5Pclose(plist_id);
    H5Fclose(file_id);

    MPI_Finalize();

    return 0;
}
If I compile this against parallel HDF5 and run it with
mpirun -np 12 ./test
the program hangs. If, however, I use NX=500, it runs fine, and if I run with 4 processes instead of 12, it also works. I have searched online for a whole afternoon and could not find a solution. Could someone tell me how to fix this, or what is wrong with this code? I am on macOS, compiling with Open MPI and GCC 9. My HDF5 version is, I believe, 1.10.
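For completeness, the build and run steps look essentially like this (assuming the parallel HDF5 compiler wrapper h5pcc is on the PATH; mpicc with an explicit -lhdf5 is equivalent, and the exact paths depend on the install):
h5pcc test.c -o test
mpicc test.c -o test -lhdf5
mpirun -np 12 ./test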