Hi,
I have had sporadic crashes with parallel HDF5, and when I checked my code with valgrind it seems that the crash is due to a bug in H5Smpio.c. I am using HDF5 version 1.8.8.
In routine H5S_obtain_datatype, starting near line 568 of H5Smpio.c, memory is reallocated if larger buffers are necessary:
/* Check if we need to increase the size of the buffers */
if(outercount >= alloc_count) {
MPI_Aint *tmp_disp; /* Temporary pointer to new displacement buffer */
int *tmp_blocklen; /* Temporary pointer to new block length buffer */
MPI_Datatype *tmp_inner_type; /* Temporary pointer to inner MPI datatype buffer */
/* Double the allocation count */
alloc_count *= 2;
/* Re-allocate the buffers */
if(NULL == (tmp_disp = (MPI_Aint *)H5MM_realloc(disp, alloc_count * sizeof(MPI_Aint))))
HGOTO_ERROR(H5E_DATASPACE, H5E_CANTALLOC, FAIL, "can't allocate array of displacements")
disp = tmp_disp;
if(NULL == (tmp_blocklen = (int *)H5MM_realloc(blocklen, alloc_count * sizeof(int))))
HGOTO_ERROR(H5E_DATASPACE, H5E_CANTALLOC, FAIL, "can't allocate array of block lengths")
blocklen = tmp_blocklen;
if(NULL == (tmp_inner_type = (MPI_Datatype *)H5MM_realloc(inner_type, alloc_count * sizeof(MPI_Datatype))))
HGOTO_ERROR(H5E_DATASPACE, H5E_CANTALLOC, FAIL, "can't allocate array of inner MPI datatypes")
} /* end if */
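However, unlike the "disp" and "blocklen" buffers, inner_type is never updated to point to the new tmp_inner_type buffer. Whenever the reallocation moves the block, inner_type is left pointing at freed memory, and the newly allocated block is leaked and never freed. A reallocation can also grow the block in place and return the same address, which would explain why the crashes are sporadic rather than consistent.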
The fix is to just add a line:
inner_type = tmp_inner_type;
after the third call to H5MM_realloc, just as is done for the "disp" and "blocklen" buffers. I have attached a patch for this. With this fix, parallel HDF5 works very well for me, but without the fix I get many crashes. I hope this can be fixed for the 1.8.9 release.
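For anyone who wants to see the pattern in isolation, here is a minimal sketch of the same "realloc into a temporary, then assign back" idiom in plain C. It is not the HDF5 code: the int_vec type and the int_vec_push and int_vec_free functions are hypothetical names made up for illustration, and plain realloc stands in for H5MM_realloc.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable array of ints; names are illustrative, not HDF5's. */
typedef struct {
    int    *data;      /* current buffer (may move when it grows) */
    size_t  count;     /* number of elements in use */
    size_t  capacity;  /* number of elements allocated */
} int_vec;

/* Append one value, doubling the capacity when the buffer is full.
 * Returns 0 on success, -1 on allocation failure (vector left unchanged). */
int int_vec_push(int_vec *v, int value)
{
    if (v->count >= v->capacity) {
        size_t new_cap = v->capacity ? v->capacity * 2 : 4;
        int *tmp = realloc(v->data, new_cap * sizeof *tmp);

        if (tmp == NULL)
            return -1;      /* old buffer is still valid and still owned by v */

        v->data = tmp;      /* the assignment that is missing for inner_type */
        v->capacity = new_cap;
    }
    v->data[v->count++] = value;
    return 0;
}

/* Release the buffer and reset the vector to its empty state. */
void int_vec_free(int_vec *v)
{
    free(v->data);
    v->data = NULL;
    v->count = 0;
    v->capacity = 0;
}
```

If the "v->data = tmp;" line is dropped, the next write through v->data may hit freed memory whenever realloc has moved the block, which is the same failure mode valgrind reports for inner_type.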
Martin J. Otte
Atmospheric Modeling and Analysis Division
U.S. Environmental Protection Agency
109 T.W. Alexander Drive, Mail Drop E243-03
Research Triangle Park, NC 27711 USA
Fax: 919-541-1379
Voice: 919-541-0147
hdf5-H5Smpio_realloc.patch (580 Bytes)