OK, I have created a minimal example that still shows the errors. The error only appears when creating a file with async. Reading a regular HDF5 file (created with the synchronous methods) using `read_async` works without issue.

It seems that something within the event set cannot be closed, but this happens after `H5ESwait()` has already returned without issue. The relevant line is `Finish waiting for async, num in progess: 0, failed: 0, status: 0`, which is just a `printf()` of `op_failed`, `num_in_progress` and the `status` that `H5ESwait()` returns.
```
Using the async VOL connector
Succeed with dset write
Succeed waiting for event set operations
Succeed with closing async_fapl
Succeed with closing dset_id
Succeed with closing file_id
HDF5-DIAG: Error detected in HDF5 (1.14.5) MPI-process 0:
  #000: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5ES.c line 634 in H5ESclose(): unable to decrement ref count on event set
    major: Event Set
    minor: Unable to decrement reference count
  #001: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5Iint.c line 1087 in H5I_dec_app_ref(): can't decrement ID ref count
    major: Object ID
    minor: Unable to decrement reference count
  #002: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5Iint.c line 1042 in H5I__dec_app_ref(): can't decrement ID ref count
    major: Object ID
    minor: Unable to decrement reference count
  #003: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5ESint.c line 194 in H5ES__close_cb(): unable to close event set
    major: Event Set
    minor: Close failed
  #004: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5ESint.c line 989 in H5ES__close(): can't close event set while unfinished operations are present (i.e. wait on event set first)
    major: Event Set
    minor: Can't close object
Error with closing es_id
Finish waiting for async, num in progess: 0, failed: 0, status: 0
```
After that, the following error stack repeats in a loop and the program eventually deadlocks:
```
HDF5-DIAG: Error detected in HDF5 (1.14.5) MPI-process 0:
  #000: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5VL.c line 892 in H5VLfree_lib_state(): can't free library state
    major: Virtual Object Layer
    minor: Unable to release object
  #001: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5VLint.c line 2201 in H5VL_free_lib_state(): can't free API context state
    major: Virtual Object Layer
    minor: Unable to release object
  #002: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5CX.c line 1122 in H5CX_free_state(): can't decrement refcount on DCPL
    major: API Context
    minor: Unable to decrement reference count
  #003: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5Iint.c line 1010 in H5I_dec_ref(): can't decrement ID ref count
    major: Object ID
    minor: Unable to decrement reference count
  #004: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5Iint.c line 948 in H5I__dec_ref(): can't locate ID
    major: Object ID
    minor: Unable to find ID information (already closed?)
[ ABT ERROR] async_dataset_close_fn H5VLfree_lib_state failed
```
I ran this with `mpiexec -n 1 ./a.out`. Unfortunately, I have not figured out the event set error handling yet.
```c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <hdf5.h>

/* Defined elsewhere in my program (not shown); signature assumed here. */
void check_vol_async_present(void);

void create_hdf5_async(int argc, char **argv)
{
    hid_t   async_fapl = 0;
    hid_t   dxpl_id    = 0;
    hid_t   file_id    = 0;
    hid_t   filespace  = 0;
    hid_t   memspace   = 0;
    hid_t   dset_id    = 0;
    hid_t   es_id      = 0;
    herr_t  status     = -1;
    hsize_t dims[1];
    hsize_t count[1];
    hsize_t offset[1];
    int     size = 20;

    /*
     * Initialize MPI
     */
    int      mpi_size, mpi_rank;
    MPI_Comm comm                = MPI_COMM_WORLD;
    MPI_Info info                = MPI_INFO_NULL;
    int      mpi_thread_required = MPI_THREAD_MULTIPLE;
    int      mpi_thread_provided = 0;

    /* Initialize MPI with threading support */
    MPI_Init_thread(&argc, &argv, mpi_thread_required, &mpi_thread_provided);
    MPI_Comm_size(comm, &mpi_size);
    MPI_Comm_rank(comm, &mpi_rank);

    es_id = H5EScreate();
    if (es_id < 0)
    {
        fprintf(stderr, "Error with first event set create\n");
    }

    /*
     * Set up file access property list with parallel I/O access
     */
    async_fapl = H5Pcreate(H5P_FILE_ACCESS);
    status     = H5Pset_fapl_mpio(async_fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    check_vol_async_present();

    /*
     * Create a new file collectively.
     */
    file_id = H5Fcreate_async("data/datasets/test_dataset_hdf5-c_async.h5", H5F_ACC_TRUNC, H5P_DEFAULT,
                              async_fapl, es_id);
    if (file_id < 0)
    {
        fprintf(stderr, "Error with file create\n");
    }

    dims[0]   = size;
    filespace = H5Screate_simple(1, dims, NULL);
    memspace  = H5Screate_simple(1, dims, NULL);

    /*
     * Create the dataset with default properties.
     */
    dset_id = H5Dcreate_async(file_id, "/X", H5T_IEEE_F64LE, filespace, H5P_DEFAULT, H5P_DEFAULT,
                              H5P_DEFAULT, es_id);
    if (dset_id < 0)
    {
        fprintf(stderr, "Error with dset create\n");
    }

    /*
     * Initialize data buffer (a stack array, so no allocation check is needed)
     */
    float wbuf[] = {1.0,  2.0,  3.0,  4.0,  5.0,  6.0,  7.0,  8.0,  9.0,  10.0,
                    11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0};

    dxpl_id = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl_id, H5FD_MPIO_COLLECTIVE);

    offset[0] = 0;
    count[0]  = size;
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);

    status = H5Dwrite_async(dset_id, H5T_NATIVE_FLOAT, H5S_BLOCK, filespace, dxpl_id, wbuf, es_id);
    if (status < 0)
    {
        fprintf(stderr, "Error with dset write\n");
    }
    else
        fprintf(stderr, "Succeed with dset write\n");

    /*
     * Close/release resources.
     */
    size_t  num_in_progress;
    hbool_t op_failed;
    status = H5ESwait(es_id, H5ES_WAIT_FOREVER, &num_in_progress, &op_failed);
    if (status < 0)
    {
        fprintf(stderr, "Error waiting for event set operations\n");
    }
    else
        fprintf(stderr, "Succeed waiting for event set operations\n");
    printf("Finish waiting for async, num in progess: %zu, failed: %d, status: %d \n", num_in_progress,
           (int)op_failed, status);

    status = H5Pclose(async_fapl);
    if (status < 0)
    {
        fprintf(stderr, "Error with closing async_fapl\n");
    }
    else
        fprintf(stderr, "Succeed with closing async_fapl\n");

    status = H5Pclose(dxpl_id);

    /* These two closes are queued on es_id after the H5ESwait() above has returned. */
    status = H5Dclose_async(dset_id, es_id);
    if (status < 0)
    {
        fprintf(stderr, "Error with closing dset_id\n");
    }
    else
        fprintf(stderr, "Succeed with closing dset_id\n");

    status = H5Fclose_async(file_id, es_id);
    if (status < 0)
    {
        fprintf(stderr, "Error with closing file_id\n");
    }
    else
        fprintf(stderr, "Succeed with closing file_id\n");

    status = H5ESclose(es_id);
    if (status < 0)
    {
        fprintf(stderr, "Error with closing es_id\n");
    }
    else
        fprintf(stderr, "Succeed with closing es_id\n");
}
```
Full stack trace:
```
HDF5-DIAG: Error detected in HDF5 (1.14.5) MPI-process 0:
  #000: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5VL.c line 892 in H5VLfree_lib_state(): can't free library state
    major: Virtual Object Layer
    minor: Unable to release object
  #001: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5VLint.c line 2201 in H5VL_free_lib_state(): can't free API context state
    major: Virtual Object Layer
    minor: Unable to release object
  #002: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5CX.c line 1122 in H5CX_free_state(): can't decrement refcount on DCPL
    major: API Context
    minor: Unable to decrement reference count
  #003: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5Iint.c line 1010 in H5I_dec_ref(): can't decrement ID ref count
    major: Object ID
    minor: Unable to decrement reference count
  #004: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5Iint.c line 948 in H5I__dec_ref(): can't locate ID
    major: Object ID
    minor: Unable to find ID information (already closed?)
[ ABT ERROR] async_dataset_close_fn H5VLfree_lib_state failed
HDF5-DIAG: Error detected in HDF5 (1.14.5) MPI-process 0:
  #000: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5P.c line 1468 in H5Pclose(): can't close
    major: Property lists
    minor: Unable to free object
  #001: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5Iint.c line 1087 in H5I_dec_app_ref(): can't decrement ID ref count
    major: Object ID
    minor: Unable to decrement reference count
  #002: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5Iint.c line 1042 in H5I__dec_app_ref(): can't decrement ID ref count
    major: Object ID
    minor: Unable to decrement reference count
  #003: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5Iint.c line 948 in H5I__dec_ref(): can't locate ID
    major: Object ID
    minor: Unable to find ID information (already closed?)
HDF5-DIAG: Error detected in HDF5 (1.14.5) MPI-process 0:
  #000: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5VL.c line 892 in H5VLfree_lib_state(): can't free library state
    major: Virtual Object Layer
    minor: Unable to release object
  #001: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5VLint.c line 2201 in H5VL_free_lib_state(): can't free API context state
    major: Virtual Object Layer
    minor: Unable to release object
  #002: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5CX.c line 1122 in H5CX_free_state(): can't decrement refcount on DCPL
    major: API Context
    minor: Unable to decrement reference count
  #003: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5Iint.c line 1010 in H5I_dec_ref(): can't decrement ID ref count
    major: Object ID
    minor: Unable to decrement reference count
  #004: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5Iint.c line 948 in H5I__dec_ref(): can't locate ID
    major: Object ID
    minor: Unable to find ID information (already closed?)
[ ABT ERROR] async_file_close_fn H5VLfree_lib_state failed
HDF5-DIAG: Error detected in HDF5 (1.14.5) MPI-process 0:
  #000: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5P.c line 1468 in H5Pclose(): can't close
    major: Property lists
    minor: Unable to free object
  #001: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5Iint.c line 1087 in H5I_dec_app_ref(): can't decrement ID ref count
    major: Object ID
    minor: Unable to decrement reference count
  #002: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5Iint.c line 1042 in H5I__dec_app_ref(): can't decrement ID ref count
    major: Object ID
    minor: Unable to decrement reference count
  #003: /tmp/dev/spack-stage/spack-stage-hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/spack-src/src/H5Iint.c line 948 in H5I__dec_ref(): can't locate ID
    major: Object ID
    minor: Unable to find ID information (already closed?)
```