Repeated H5Dwrite calls increase memory usage

Hi all,
I'm writing data-logging software that stores its output in an HDF5 file
and is meant to run for months at a time. I intend to use a chunked
dataset and append a row every time a measurement is made, using H5Dwrite
with a hyperslab selection on the filespace. However, I've noticed that
the application's memory usage increases with every subsequent H5Dwrite
call.

I've simplified it down to the minimal example below. My understanding is
that, since the dimensions are fixed at creation time, memory use should
not grow during the loop. Any suggestions as to why it keeps accumulating?
Commenting out the line start[0] = i; results in no memory increase. This
example doesn't accumulate much memory, but my actual application runs out
of memory and crashes after about a week.

Am I doing something stupid? Is this expected behaviour? I'm running HDF5
1.8.11 on Win7 32-bit.

Cheers,
Martijn

#include <hdf5.h>
#include <stdlib.h> /* calloc, free */

const int NITEMS = 10000;
const int NREPS = 100000;

int main()
{
    int i;
    hid_t fp, sp, dset, memsp, props;
    float *arr = ( float* )calloc( NITEMS, sizeof( float ));
    hsize_t dims[] = { NREPS, NITEMS };
    hsize_t start[] = { 0, 0 };
    hsize_t count[] = { 1, NITEMS };

    /* invent data */
    for ( i = 0; i < NITEMS; ++i ) arr[i] = i;

    fp = H5Fcreate( "test.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT );
    if ( fp <= 0 ) return 1;

    /* create chunked dataset, one chunk per row */
    props = H5Pcreate( H5P_DATASET_CREATE );
    H5Pset_chunk( props, 2, count );
    sp = H5Screate_simple( 2, dims, NULL );
    dset = H5Dcreate2( fp, "test", H5T_NATIVE_FLOAT, sp,
                       H5P_DEFAULT, props, H5P_DEFAULT );
    if ( dset <= 0 ) return 1;
    H5Pclose( props );

    /* write row by row */
    memsp = H5Screate_simple( 2, count, NULL );
    for ( i = 0; i < NREPS; ++i )
    {
        start[0] = i;
        H5Sselect_hyperslab( sp, H5S_SELECT_SET, start, NULL, count, NULL );
        if ( H5Dwrite( dset, H5T_NATIVE_FLOAT, memsp, sp, H5P_DEFAULT, arr ) < 0 )
            break;
    }

    H5Sclose( memsp );
    H5Sclose( sp );
    H5Dclose( dset );
    H5Fclose( fp );
    free( arr );
    return 0;
}

Hi Martijn,

I'm not so sure that your problem is in HDF5. I ran your program on
both 32-bit Linux and 64-bit Windows 7 w/ VS 2010 (I don't have a
32-bit Win7 VM) and I'm not seeing a memory leak (via Valgrind on
Linux and _CrtDumpMemoryLeaks() on Windows). In all cases, I used
HDF5 1.8.11.
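
(In case it's useful, the Windows check is just the stock debug-CRT heap
dump; a minimal harness looks roughly like the sketch below. It is
Windows-only and only reports anything in a Debug build.)

-------------------------------------
/* Sketch: MSVC debug-CRT leak check (Windows-only, Debug build) */
#include <crtdbg.h>

int main( void )
{
    /* ... run the HDF5 test code here ... */

    /* report any heap blocks still allocated to the debugger output */
    _CrtDumpMemoryLeaks();
    return 0;
}
-------------------------------------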

As for memory usage in your example program, I see a quick heap growth
up to around 16 MB, after which it levels off and never grows again, no
matter how much data I write. This is the expected behavior of the
HDF5 library, which maintains internal metadata and chunk caches as well
as free lists. These are all pretty conservative and shouldn't cause
your system to run out of memory.
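
(If you ever do want to bound those caches, the knobs are exposed. Here
is an untested sketch, reusing the identifiers from your example, with
purely illustrative sizes.)

-------------------------------------
    /* Bound the per-dataset chunk cache via a dataset access property
       list: 521 hash slots, 1 MB of cache, evict written chunks first */
    hid_t dapl = H5Pcreate( H5P_DATASET_ACCESS );
    H5Pset_chunk_cache( dapl, 521, 1024 * 1024, 1.0 );
    dset = H5Dcreate2( fp, "test", H5T_NATIVE_FLOAT, sp,
                       H5P_DEFAULT, props, dapl );
    H5Pclose( dapl );

    /* Cap the library-wide free lists (byte limits, -1 = unlimited) */
    H5set_free_list_limits( 1024 * 1024, 65536, 1024 * 1024, 65536,
                            1024 * 1024, 65536 );
    /* ...or ask the library to release what they currently hold */
    H5garbage_collect();
-------------------------------------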

If you have any other information that implicates the HDF5 library,
I'd be happy to take a look at it, but I'm pretty sure that the
resource leak you see is not coming from us.

Thanks for the bug report,

Dana


Hi Dana,
Thanks for the explanation - I jumped at a false positive! My "minimal
example" was perhaps a bit too minimal; not enough coffee leads to bad
conclusions. I'll continue to probe the rest of my code; no doubt it's a
missing H5*close somewhere.

Thanks again.

Cheers,
Martijn


If you need help tracking down those unclosed identifiers, here is some
code. You probably need to change 'fp' to whatever identifier you are
using for your HDF5 file.


-------------------------------------
    ssize_t norphans;
    norphans = H5Fget_obj_count(fp, H5F_OBJ_ALL);
    if (norphans > 1) { /* expect 1 for the file we have not closed */
        int i;
        H5O_info_t info;
        char name[64];
        hid_t * objects = calloc(norphans, sizeof(hid_t));
        H5Fget_obj_ids(fp, H5F_OBJ_ALL, -1, objects);
        for (i=0; i<norphans; i++) {
            H5Oget_info(objects[i], &info);
            H5Iget_name(objects[i], name, 64);
            printf("%d of %zd things still open: %d with name %s of type %d",
                  i, norphans, objects[i], name, info.type);
        }
        free(objects);
    }
-------------------------------------
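
(A convenient place to call this is right before H5Fclose; anything
beyond the one identifier you expect, the file itself, points at
something that was never closed.)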

==rob

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

Hi Rob,
Thanks for the tip! That might save me some trouble later on.
Cheers, Martijn


Hi Martijn,

Which compiler are you using?
