Fix for a memory error in H5Smpio.c

Hi,

I have had sporadic crashes with parallel HDF5, and when I checked my code with valgrind it seems that the crash is due to a bug in H5Smpio.c. I am using hdf5 version 1.8.8.

In routine H5S_obtain_datatype, starting near line 568 of H5Smpio.c, memory is being realloced if larger buffers are necessary:

/* Check if we need to increase the size of the buffers */
if(outercount >= alloc_count) {
    MPI_Aint *tmp_disp; /* Temporary pointer to new displacement buffer */
    int *tmp_blocklen; /* Temporary pointer to new block length buffer */
    MPI_Datatype *tmp_inner_type; /* Temporary pointer to inner MPI datatype buffer */

    /* Double the allocation count */
    alloc_count *= 2;

    /* Re-allocate the buffers */
    if(NULL == (tmp_disp = (MPI_Aint *)H5MM_realloc(disp, alloc_count * sizeof(MPI_Aint))))
        HGOTO_ERROR(H5E_DATASPACE, H5E_CANTALLOC, FAIL, "can't allocate array of displacements")
    disp = tmp_disp;
    if(NULL == (tmp_blocklen = (int *)H5MM_realloc(blocklen, alloc_count * sizeof(int))))
        HGOTO_ERROR(H5E_DATASPACE, H5E_CANTALLOC, FAIL, "can't allocate array of block lengths")
    blocklen = tmp_blocklen;
    if(NULL == (tmp_inner_type = (MPI_Datatype *)H5MM_realloc(inner_type, alloc_count * sizeof(MPI_Datatype))))
        HGOTO_ERROR(H5E_DATASPACE, H5E_CANTALLOC, FAIL, "can't allocate array of inner MPI datatypes")
} /* end if */

However, unlike the "disp" and "blocklen" buffers, inner_type is never updated to point to the new tmp_inner_type buffer. Whenever the realloc moves the block, inner_type is left pointing at freed memory, and the reallocated buffer is leaked and will never be freed.

The fix is to just add a line:

inner_type = tmp_inner_type;

after the call to H5MM_realloc, as is done for the "disp" and "blocklen" buffers. I have attached a patch for this. With this fix, parallel HDF5 works very well for me, but without the fix I get many crashes. I hope this can be fixed for the 1.8.9 release.
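
For anyone who wants to see the pattern in isolation, here is a minimal sketch of the grow-and-reassign idiom outside of HDF5. It is not the library code: it assumes plain realloc in place of H5MM_realloc and an int array in place of the MPI_Datatype buffer, and the names grow_buffer and append_value are hypothetical helpers made up for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: double the capacity of an int array.
 * On success, *buf points to the (possibly moved) storage and *count is doubled.
 * On failure, *buf and *count are unchanged and -1 is returned. */
static int grow_buffer(int **buf, size_t *count)
{
    size_t new_count;
    int *tmp;

    assert(buf != NULL && count != NULL && *count > 0);

    new_count = *count * 2;
    tmp = (int *)realloc(*buf, new_count * sizeof(int));
    if (tmp == NULL)
        return -1;      /* the old buffer is still valid and still owned by the caller */

    *buf = tmp;         /* the assignment that was missing for inner_type */
    *count = new_count;
    return 0;
}

/* Hypothetical helper: append one value, growing the buffer when it is full. */
static int append_value(int **buf, size_t *used, size_t *count, int value)
{
    if (*used >= *count && grow_buffer(buf, count) != 0)
        return -1;

    (*buf)[(*used)++] = value;
    return 0;
}
```

A caller would start from a small malloc'ed array and call append_value in a loop. Because the temporary pointer is always copied back, the caller's pointer stays valid even when realloc moves the block, which is exactly what the one-line patch restores for inner_type.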

Martin J. Otte
Atmospheric Modeling and Analysis Division
U.S. Environmental Protection Agency
109 T.W. Alexander Drive, Mail Drop E243-03
Research Triangle Park, NC 27711 USA

Fax: 919-541-1379
Voice: 919-541-0147

hdf5-H5Smpio_realloc.patch (580 Bytes)


Hi Martin,
  Thanks for finding this! I've applied the patch to the trunk and will be merging it back for the 1.8.9 release.

  Quincey


On Apr 20, 2012, at 3:10 PM, Martin Otte wrote:
