Growing memory usage in small HDF program

Hello,

I am writing an application which writes large data sets to HDF5 files in
fixed-size blocks, using the HDF5 C++ API (version 1.8.15 patch 1, built with
MSVC 2013 x64).

My application seems to quickly consume all the available memory on my
system (win32, around 5.9 GB), and then crashes whenever the system becomes
stressed (Windows kills it because there is no memory left).

I have also tested the application on a Linux machine, where I saw similar
results.

I was under the impression that, by using HDF5, the file would be brought in
and out of memory in such a way that the library would only use a small
working set. Is this not true?

I have experimented with HDF5 features such as flushing to disk, regularly
closing and reopening, garbage collection, and tuning chunking and caching
settings, but haven't managed to get a stable working set.

I've attached a minimal example. Can anyone point out my mistake?

Thanks,
- Jorj

hdf_test.cpp (1.99 KB)

Hmm. Well I have no experience with HDF5's C++ interface.

My first thought when reading your description was... I've seen that before. It happens when I forget to H5Xclose() all the objects I H5Xopen()ed (groups, datasets, types, dataspaces, etc.).

However, with C++, I presume the interface is designed to close objects when they fall out of scope (i.e. when the destructor is called). So, looking at your code, even though I don't see any explicit calls to close previously opened objects, I assume that *should* be happening when the objects fall out of scope. But are you *certain* that it *is* happening? Just before exiting main, you might want to call H5Fget_obj_count() to get some idea of how many objects the HDF5 library thinks are still open in the file. If you get a large number, that would suggest the C++ interface is somehow not closing objects as they fall out of scope.

That's all I can think of. Sorry if it's no help.

Mark
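
To make the check above concrete, here is a minimal sketch of the idea using a toy handle table rather than HDF5 itself. HandleTable, ScopedHandle and open_after_loop are hypothetical names invented for this illustration; in a real program the count would come from H5Fget_obj_count(file_id, H5F_OBJ_ALL) instead of the toy open_count().

```cpp
#include <cstddef>
#include <set>

// Hypothetical stand-in for the library's table of open identifiers.
// In real HDF5 code the count would come from
// H5Fget_obj_count(file_id, H5F_OBJ_ALL) rather than open_count().
class HandleTable {
public:
    int open() { ids_.insert(next_); return next_++; }
    void close(int id) { ids_.erase(id); }
    std::size_t open_count() const { return ids_.size(); }
private:
    std::set<int> ids_;
    int next_ = 1;
};

// RAII guard: opens a handle on construction, closes it on destruction.
class ScopedHandle {
public:
    explicit ScopedHandle(HandleTable& t) : table_(t), id_(t.open()) {}
    ~ScopedHandle() { table_.close(id_); }
    ScopedHandle(const ScopedHandle&) = delete;
    ScopedHandle& operator=(const ScopedHandle&) = delete;
private:
    HandleTable& table_;
    int id_;
};

// Simulates writing `blocks` blocks, opening one handle per block.
// With leak == false the per-block handle is scoped and closed each time;
// with leak == true it is opened raw and never closed.
std::size_t open_after_loop(int blocks, bool leak) {
    HandleTable table;
    ScopedHandle file(table);            // the "file" stays open throughout
    for (int i = 0; i < blocks; ++i) {
        if (leak) {
            table.open();                // never closed: count grows per block
        } else {
            ScopedHandle space(table);   // closed at the end of the iteration
        }
    }
    return table.open_count();           // evaluated while `file` is still open
}
```

If scoping works as intended, the count stays at one (the file) however many blocks are written; a count that grows with the number of blocks points at handles that are never closed.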

Hi Jorj

I had a similar problem with exactly your compiler configuration and I switched back to 1.8.14. The problem went away. Perhaps you could try that?

I don't want to say that there is definitely a problem in 1.8.15, but that was just my experience.

Kind regards, Kevin

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

I found an "orphaned object snippet" elsewhere on the forum and ran it each
time round the loop. It didn't print anything; the only open object was the
file itself, which seems to make sense. I'll try adding some explicit closes
too, just in case.

Thanks,
- Jorj

Bug found (in C++ api as usual)

The C++ API *should* take care of incrementing and decrementing reference
counts appropriately, but it does this in each object class (sometimes
higher up in a class hierarchy, as with datatypes, but otherwise more or
less at the leaf classes) rather than through inheritance from IdComponent.
That strategy works, but it has left a few bugs that I've found or run into,
both leaks and decrements of references that were never incremented. As of
1.8.15 the ones I was aware of had been dealt with, but given past burnings
this concern is always warranted. This would be the third time I've noticed
something that a shared_ptr-like class/wrapper around HDF resources
(IdComponent...?) would completely eliminate.

dataset.getSpace() leaks a reference:

   // create dataspace object using the existing id then return the object
   // <-- default constructor makes a valid HDF dataspace for H5S_SCALAR
   DataSpace data_space;
   // <-- evil line: why didn't we just use the ctor that takes the id
   //     parameter?
   f_DataSpace_setId(&data_space, dataspace_id);
   return( data_space );

//--------------------------------------------------------------------------
// Function: f_DataSpace_setId - friend
// Purpose: This function is friend to class H5::DataSpace so that it can
// set DataSpace::id in order to work around a problem
// described in the JIRA issue HDFFV-7947.
// Applications shouldn't need to use it.
// param dspace - IN/OUT: DataSpace object to be changed
// param new_id - IN: New id to set
// Programmer Binh-Minh Ribler - 2015
//--------------------------------------------------------------------------
// <-- evil function that shouldn't exist (as a friend, no less!)
void f_DataSpace_setId(DataSpace* dspace, hid_t new_id)
{
    // <-- why not dspace->p_setId(new_id)? Just make it public already as
    //     "reset" and get rid of the friend. Follow shared_ptr semantics
    //     and bring all this stuff inside IdComponent.
    dspace->id = new_id;
}

-Jason
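
The leak described above can be reproduced in a few lines with a toy reference-count table; this is a sketch under stated assumptions, not the HDF5 source. g_refs, Space, Adopt, get_space_leaky and get_space_fixed are all hypothetical names, and the only behaviour taken from the post is that the default constructor allocates a real id and that the id member is then overwritten without releasing it.

```cpp
#include <cstddef>
#include <map>

// Hypothetical id table standing in for the library: id -> reference count.
static std::map<int, int> g_refs;
static int g_next_id = 1;

int create_id() { g_refs[g_next_id] = 1; return g_next_id++; }
void inc_ref(int id) { ++g_refs[id]; }
void dec_ref(int id) {
    auto it = g_refs.find(id);
    if (it != g_refs.end() && --it->second == 0) g_refs.erase(it);
}

struct Adopt {};  // tag: take over an existing reference without incrementing

// Toy wrapper modelled on the behaviour described in the post.
struct Space {
    int id;
    Space() : id(create_id()) {}                   // default ctor allocates a real id
    Space(Adopt, int existing) : id(existing) {}   // takes over the caller's reference
    Space(const Space& other) : id(other.id) { inc_ref(id); }
    Space& operator=(const Space&) = delete;
    ~Space() { dec_ref(id); }
};

// Mirrors the pattern in the post: default-construct, then overwrite the id.
// The id created by the default constructor is never released.
Space get_space_leaky() {
    Space s;
    s.id = create_id();
    return s;
}

// Constructs the wrapper directly from the new id, so nothing is orphaned.
Space get_space_fixed() {
    return Space(Adopt{}, create_id());
}

// Number of ids still alive after `calls` round trips.
std::size_t leaked_ids(bool leaky, int calls) {
    g_refs.clear();
    for (int i = 0; i < calls; ++i) {
        Space s = leaky ? get_space_leaky() : get_space_fixed();
    }
    return g_refs.size();
}
```

In this model every call to the leaky variant strands one id, so a loop that asks for the dataspace once per block would grow the table by one entry per block, which matches the steadily rising memory use reported at the top of the thread.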

Thanks Jason,

I've just tried applying that to my local copy of HDF5 (using the id
constructor of DataSpace). I can't say for sure: the memory usage of the
program still rose to 5 GB+, but it seemed better...

If I get a chance today I'll rewrite using the C API and see what that
changes.

- Jorj

Hello Jason,

________________________________
From: Hdf-forum <hdf-forum-bounces@lists.hdfgroup.org> on behalf of Jason Newton <nevion@gmail.com>
Sent: Thursday, August 13, 2015 10:39 PM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] Growing memory usage in small HDF program

> Bug found (in C++ api as usual)

Thank you for your efforts in tracking down the problem and your suggestions.

> The C++ API *should* take care of incrementing and decrementing reference counts appropriately, but it does this in each object class (sometimes higher up in a class hierarchy, as with datatypes, but otherwise more or less at the leaf classes) rather than through inheritance from IdComponent. That strategy works, but it has left a few bugs that I've found or run into, both leaks and decrements of references that were never incremented. As of 1.8.15 the ones I was aware of had been dealt with, but given past burnings this concern is always warranted. This would be the third time I've noticed something that a shared_ptr-like class/wrapper around HDF resources (IdComponent...?) would completely eliminate.
>
> dataset.getSpace() leaks a reference:
>
>    //create dataspace object using the existing id then return the object
>    DataSpace data_space;  // <-- default constructor makes a valid HDF dataspace for H5S_SCALAR
>    f_DataSpace_setId(&data_space, dataspace_id);  // <-- evil line: why didn't we just use the ctor that takes the id parameter?
>    return( data_space );

In 1.8.14, before it was changed to use the friend function in 1.8.15, this block of code looked like this:

//create dataspace object using the existing id then return the object
DataSpace data_space(dataspace_id);
return(data_space);

As you can see in the comments you included below, the friend function was a work-around for a problem reported by some other users. In that problem, the id was closed prematurely because of the behind-the-scenes copy-constructor/destructor calls made when an object is returned from a function. To fix that problem, the copy constructor and the constructor that takes an existing id, in those classes that are associated with an HDF5 id, need to increment the ref counter.

However, incrementing the ref count left some objects open at the end of the program, perhaps because of some compilers' optimizations when returning an object to the caller. In these situations, a destructor for a temporary object didn't seem to be invoked, so the id ref of the temporary object was never closed. I could never figure out why. Hence, the work-around was to use p_setId instead, which required the use of the friend function. If anyone has a different suggestion, please let us know.
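
The premature-close problem described above can be sketched with a toy reference-count table. Everything here (g_refs, NaiveHandle, CountingHandle, original_survives_copy) is hypothetical and only models the first half of the explanation: a copy whose constructor does not take its own reference closes the id when it is destroyed. It makes no attempt to reproduce the compiler-dependent case where a destructor appears not to run.

```cpp
#include <map>

// Hypothetical id table standing in for the library: id -> reference count.
static std::map<int, int> g_refs;

int open_id(int id) { g_refs[id] = 1; return id; }
void inc_ref(int id) { ++g_refs[id]; }
void dec_ref(int id) {
    auto it = g_refs.find(id);
    if (it != g_refs.end() && --it->second == 0) g_refs.erase(it);
}
bool is_open(int id) { return g_refs.count(id) != 0; }

// Copy constructor copies the id but does not take its own reference,
// so the copy's destructor releases the reference the original relies on.
struct NaiveHandle {
    int id;
    explicit NaiveHandle(int i) : id(i) {}
    NaiveHandle(const NaiveHandle& other) : id(other.id) {}
    ~NaiveHandle() { dec_ref(id); }
};

// Copy constructor takes a reference, so every destructor call is paired
// with an earlier increment.
struct CountingHandle {
    int id;
    explicit CountingHandle(int i) : id(i) {}
    CountingHandle(const CountingHandle& other) : id(other.id) { inc_ref(id); }
    ~CountingHandle() { dec_ref(id); }
};

// Makes an explicit copy (so the result does not depend on copy elision),
// destroys it, and reports whether the original's id is still open.
template <class Handle>
bool original_survives_copy() {
    g_refs.clear();
    Handle original(open_id(7));
    { Handle temporary(original); }   // temporary's destructor runs here
    return is_open(original.id);
}
```

The explicit copy is a deliberate design choice: whether a returned object is copied or elided varies with compiler and language version, so a test that relies on return-by-value would not be deterministic.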

> //--------------------------------------------------------------------------
> // Function: f_DataSpace_setId - friend
> // Purpose: This function is friend to class H5::DataSpace so that it can
> // set DataSpace::id in order to work around a problem
> // described in the JIRA issue HDFFV-7947.
> // Applications shouldn't need to use it.
> // param dspace - IN/OUT: DataSpace object to be changed
> // param new_id - IN: New id to set
> // Programmer Binh-Minh Ribler - 2015
> //--------------------------------------------------------------------------
> // <-- evil function that shouldn't exist (as a friend, no less!)
> void f_DataSpace_setId(DataSpace* dspace, hid_t new_id)
> {
>     // <-- why not dspace->p_setId(new_id)? Just make it public already as "reset" and get rid of the friend. Follow shared_ptr semantics and bring all this stuff inside IdComponent.
>     dspace->id = new_id;
> }

The difference between the public "setId" and the private p_setId is that "setId" also increments the ref count and is intended for applications to use on the C++ object id. The private p_setId doesn't increment the id ref count and is not intended for application use. The difference is explained in the function's header.
Thank you,
Binh-Minh
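
The distinction drawn above between the two setters can be illustrated with a toy model. The names adopt_id and share_id are hypothetical; they are modelled only on the description in this message (one stores the new id without incrementing its ref count, the other also increments it) and are not the library's implementation.

```cpp
#include <map>

// Hypothetical id table standing in for the library: id -> reference count.
static std::map<int, int> g_refs;
static int g_next_id = 1;

int create_id() { g_refs[g_next_id] = 1; return g_next_id++; }
void inc_ref(int id) { ++g_refs[id]; }
void dec_ref(int id) {
    auto it = g_refs.find(id);
    if (it != g_refs.end() && --it->second == 0) g_refs.erase(it);
}
int ref_count(int id) {
    auto it = g_refs.find(id);
    return it == g_refs.end() ? 0 : it->second;
}

class Wrapper {
public:
    Wrapper() : id_(0) {}   // 0 means "holds nothing" in this toy model
    Wrapper(const Wrapper&) = delete;
    Wrapper& operator=(const Wrapper&) = delete;
    ~Wrapper() { release(); }

    // Modelled on the private setter described above: drop the old id and
    // store the new one WITHOUT incrementing its reference count.
    void adopt_id(int new_id) { release(); id_ = new_id; }

    // Modelled on the public setter described above: same as adopt_id,
    // but the wrapper also takes its own reference.
    void share_id(int new_id) { adopt_id(new_id); inc_ref(new_id); }

private:
    void release() { if (id_ != 0) { dec_ref(id_); id_ = 0; } }
    int id_;
};

// The caller creates an id (one reference), hands it to a wrapper, lets the
// wrapper go out of scope, and then asks how many references remain.
int caller_refs_after(bool share) {
    g_refs.clear();
    int id = create_id();
    {
        Wrapper w;
        if (share) w.share_id(id); else w.adopt_id(id);
    }   // the wrapper releases whatever it held
    return ref_count(id);
}
```

In this model the sharing setter leaves the caller's reference intact, which suits application code that keeps using the id, while the adopting setter consumes it, which only makes sense when the caller is handing over an id it created for the wrapper.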

I haven't checked all the changes Binh-Minh has put forth, but you should
patch the 2 spots because that friend function f_DataSpace_setId still
leaks a reference as is, no matter(?) where it's called.

-Jason

Hmm, I did make a mistake about how p_setId works (since it decrefs but does
not incref its new reference), and I had figured setId wasn't defined,
given the naming.

I've never seen shared_ptr or OpenCL's C++ wrapper (which is almost a
mirror image in terms of the library complexity it maps) foul up or leak
references in any case with the same fundamental operations. It's unclear
to me why this library can't do the same. The amount of code dedicated to
those purposes in those libraries is also much less than what's going on
here...

-Jason
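
As a rough illustration of the shared_ptr-style approach mentioned above, here is a minimal sketch that puts all ownership logic in one small class built on std::shared_ptr with a custom deleter. SharedId, toy_open, toy_close and make_space are hypothetical names, and the id table is a stand-in; in real HDF5 code the deleter might call something like H5Idec_ref on the stored hid_t, which is an assumption about how one could wire it up rather than how the library does it.

```cpp
#include <cstddef>
#include <memory>
#include <set>

// Hypothetical table of open ids standing in for the library.
static std::set<int> g_open;
static int g_next_id = 1;

int toy_open() { g_open.insert(g_next_id); return g_next_id++; }
void toy_close(int id) { g_open.erase(id); }

// Shared ownership of one id. Copies share the same control block, and the
// custom deleter closes the id exactly once, when the last copy goes away.
class SharedId {
public:
    SharedId() = default;   // empty: owns nothing, so there is nothing to leak
    explicit SharedId(int id)
        : p_(new int(id), [](int* p) { toy_close(*p); delete p; }) {}

    int get() const { return p_ ? *p_ : -1; }
    long use_count() const { return p_.use_count(); }

    // Replace the held id; the previous one is released if this was its
    // last owner.
    void reset(int id) { *this = SharedId(id); }

private:
    std::shared_ptr<int> p_;
};

// Same shape as the getSpace() pattern discussed in the thread:
// default-construct, set the id, return by value.
SharedId make_space() {
    SharedId s;
    s.reset(toy_open());
    return s;
}

// Number of ids still open after `calls` round trips with an extra copy each.
std::size_t open_after_calls(int calls) {
    g_open.clear();
    for (int i = 0; i < calls; ++i) {
        SharedId a = make_space();
        SharedId b = a;   // both copies refer to the same id
    }
    return g_open.size();
}
```

Because the default-constructed wrapper is empty and copying is handled by the standard library, neither the orphaned default id nor the unpaired copy can occur in this model.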

···

On Fri, Aug 14, 2015 at 12:59 AM, Binh-Minh Ribler <bmribler@hdfgroup.org> wrote:

Hello Jason,

------------------------------

*From:* Hdf-forum <hdf-forum-bounces@lists.hdfgroup.org> on behalf of
Jason Newton <nevion@gmail.com>
*Sent:* Thursday, August 13, 2015 10:39 PM
*To:* HDF Users Discussion List
*Subject:* Re: [Hdf-forum] Growing memory usage in small HDF program

Bug found (in C++ api as usual)

Thank you for your efforts in tracking down the problem and your
suggestions.

The C++ API *should* take care of inc/dec ref appropriately although they
do this in each object class (may be higher in some class hierarchies like
datatypes) but something of a leaf otherwise, rather than through
inheritance of IdComponent. That strategy while working has left a few bugs
I've found / encountered both as leaks and dec'reffing references not
incref'd. As of 1.8.15, all that I was aware of though but this concern
should be warranted all the time based on past-burnings (this would be the
third time noticing something a shared_ptr like class/wrapper around HDF
resources (IdComponent...?) would completely eliminate.

dataset.getSpace() leaks a reference:

   //create dataspace object using the existing id then return the object
   DataSpace data_space; <--default constructor makes a valid hdf
dataspace for H5S_SCALAR
   f_DataSpace_setId(&data_space, dataspace_id); <-- evil line, why didn't
we just use the ctor that takes the id parameter?
   return( data_space );

In 1.8.14, this block of code is like this, before it was changed into
using the friend function in 1.8.15.

//create dataspace object using the existing id then return the object
DataSpace data_space(dataspace_id);
return(data_space);

As you can see in the comments you included below, the friend function was
a work-around for a problem reported by some other users. In that problem,
the id was prematurely closed, due to the behind-the-scenes
copy-constructor/destructor calls when an object was returned from a
function. In order to fix that problem, the copy constructor and the
constructor that takes an existing id, in those classes that are associated
with an HDF5 id, need to increment the ref counter.

However, incrementing the ref count left some objects open at the end of
the program, perhaps due to some compiler optimization when returning an
object to the caller. In these situations, a destructor for a temporary
object didn't seem to be invoked, so the id ref of the temporary object was
never closed. I could never figure out why. Hence, the work-around was to
use p_setId instead, which required the use of the friend function. If
anyone has a different suggestion, please let us know.

//--------------------------------------------------------------------------
// Function: f_DataSpace_setId - friend
// Purpose: This function is friend to class H5::DataSpace so that it can
// set DataSpace::id in order to work around a problem
// described in the JIRA issue HDFFV-7947.
// Applications shouldn't need to use it.
// param dspace - IN/OUT: DataSpace object to be changed
// param new_id - IN: New id to set
// Programmer Binh-Minh Ribler - 2015
//--------------------------------------------------------------------------
void f_DataSpace_setId(DataSpace* dspace, hid_t new_id) <--evil function that shouldn't exist (as a friend no-less!)
{
    dspace->id = new_id; <-- why not dspace->p_setId(new_id);? Just make it public already as "reset" and get rid of the friend. Follow shared_ptr semantics.. and bring all this stuff inside IdComponent.
}

The difference between the public "setId" and the private p_setId is that
"setId" also increments the ref count and is intended for applications to
use on the C++ object id. The private p_setId doesn't increment the id ref
count and is not intended for application use. The difference is
explained in the function's header.
Thank you,
Binh-Minh
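
To make the setId / p_setId distinction above concrete, here is a minimal sketch that runs without HDF5. Everything in it is hypothetical: the `mock` id table, the `Handle` class, and the `adopt` / `share` names are stand-ins invented for illustration and are not the library's code. It assumes only what the message above states, namely that one way of attaching an id increments its reference count and the other does not.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for an id table with reference counts.
// None of these names belong to the HDF5 API.
namespace mock {
using Id = int;
inline std::map<Id, int>& table() { static std::map<Id, int> t; return t; }
inline Id open_id() {
    static Id next = 1;
    table()[next] = 1;            // a freshly opened id holds one reference
    return next++;
}
inline void inc_ref(Id id) { ++table().at(id); }
inline void dec_ref(Id id) {
    if (--table().at(id) == 0) table().erase(id);  // last reference closes it
}
inline int open_count() { return static_cast<int>(table().size()); }
}  // namespace mock

class Handle {
public:
    // Take over the caller's reference without incrementing
    // (similar in spirit to the private p_setId described above).
    static Handle adopt(mock::Id id) { return Handle(id); }
    // Add a reference of our own; the caller still owns theirs
    // (similar in spirit to the public setId).
    static Handle share(mock::Id id) { mock::inc_ref(id); return Handle(id); }

    Handle(const Handle& other) : id_(other.id_) { mock::inc_ref(id_); }
    Handle& operator=(const Handle& other) {
        if (this != &other) {
            mock::inc_ref(other.id_);  // increment first so self-sharing is safe
            mock::dec_ref(id_);
            id_ = other.id_;
        }
        return *this;
    }
    ~Handle() { mock::dec_ref(id_); }

private:
    explicit Handle(mock::Id id) : id_(id) {}
    mock::Id id_;
};

// Returns a handle by value, the way a getSpace()-style accessor would.
Handle get_space_like() {
    mock::Id raw = mock::open_id();   // one reference, owned by this function
    return Handle::adopt(raw);        // ownership moves into the Handle
}

// Number of ids left open by a loop that adopts, copies, and destroys.
int leak_with_adopt(int iterations) {
    int before = mock::open_count();
    for (int i = 0; i < iterations; ++i) {
        Handle a = get_space_like();
        Handle b = a;                 // copy increments, destructor decrements
        (void)b;
    }
    return mock::open_count() - before;
}

// Number of ids left open when the id is shared but the original
// reference from open_id() is never released.
int leak_with_share_unreleased(int iterations) {
    int before = mock::open_count();
    for (int i = 0; i < iterations; ++i) {
        mock::Id raw = mock::open_id();
        Handle h = Handle::share(raw);  // count is now 2
        (void)h;                        // nobody calls mock::dec_ref(raw)
    }
    return mock::open_count() - before;
}
```

With `adopt`, the count stays balanced whether or not the compiler elides the copy on return, because every increment in a copy is matched by a decrement in a destructor. With `share`, the reference handed back by the open call must still be released by someone; if nobody does, each iteration leaves one id open. This is one way such a leak can arise, not a diagnosis of the 1.8.15 code.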

-Jason

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Hi Jorj,

If you could rebuild the library, please try these two changed files in the directory hdf5/c++/src and see if that helps.

Thanks,

Binh-Minh

H5Location.cpp (39.8 KB)

H5CommonFG.cpp (51.9 KB)

···

________________________________
From: Hdf-forum <hdf-forum-bounces@lists.hdfgroup.org> on behalf of Jorj Pimm <jorjpimm@gmail.com>
Sent: Friday, August 14, 2015 2:56 AM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] Growing memory usage in small HDF program

Thanks Jason,

I've just tried applying that to my local copy of HDF (using the id constructor of DataSpace). I can't say for sure: the memory usage of the program still rose to 5GB+, but it seemed better...

If I get a chance today I'll rewrite using the C API and see what that changes.

- Jorj

I've applied Binh-Minh's files to my local copy of HDF and this does
significantly reduce memory usage for my example.

It now uses under 100MB of memory, which seems pretty reasonable. I'll
continue testing against these changes.

Thanks all for the help,
- George

···

On Fri, 14 Aug 2015 at 09:10 Jason Newton <nevion@gmail.com> wrote:

Hmm, I did make a mistake in how p_setId works (since it dec-refs but does
not inc-ref its new reference), and I figured setId wasn't defined, given
the naming.

I've never seen shared_ptr or OpenCL's C++ wrapper (which is almost a
mirror image in terms of the library complexity they map) foul up or leak
references in any case with the same fundamental operations. It's unclear
to me why this library can't do the same. The amount of code dedicated to
those purposes in those libraries is also much less than what's going on
here...

-Jason
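
For readers unfamiliar with the shared_ptr semantics Jason refers to, a minimal sketch follows. It is not HDF5 code and not a proposed patch: `fake::open_resource`, `fake::close_resource`, and the `SharedId` class are hypothetical names that only count open/close calls so the example runs on its own. The idea shown is that all copies share one control block, and the close call is made exactly once, by the deleter, when the last copy goes away.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for an open/close pair; they only count calls.
namespace fake {
inline int& open_handles() { static int n = 0; return n; }
inline int open_resource() { ++open_handles(); return 42; }  // dummy id value
inline void close_resource(int /*id*/) { --open_handles(); }
}  // namespace fake

// A shared_ptr-backed id holder. Copying shares ownership; the custom
// deleter closes the underlying id when the last owner is destroyed.
class SharedId {
public:
    SharedId() = default;  // empty holder, owns nothing
    explicit SharedId(int id)
        : id_(new int(id), [](int* p) {
              fake::close_resource(*p);
              delete p;
          }) {}

    int get() const { return id_ ? *id_ : -1; }
    long use_count() const { return id_.use_count(); }
    void reset() { id_.reset(); }  // drop this owner's share

private:
    std::shared_ptr<int> id_;
};

// Returning by value needs no special handling: the shared_ptr member
// takes care of copies and temporaries.
SharedId make_space_like() {
    return SharedId(fake::open_resource());
}

// Opens one resource, copies the holder around, and reports how many
// resources remain open afterwards.
int open_after_copies() {
    {
        SharedId a = make_space_like();
        SharedId b = a;   // second owner of the same id
        SharedId c;
        c = b;            // third owner, via assignment
        assert(a.use_count() == 3);
    }  // all owners destroyed here; close_resource runs once
    return fake::open_handles();
}
```

The wrapper class itself contains no reference-counting logic, which is the point being made above: copy construction, assignment, and return-by-value are all handled by `std::shared_ptr`, so there is no per-class inc/dec code to get wrong.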

That's good. Thank you for applying the files and letting us know, George!

Binh-Minh

Hmm. I just wanted to ask THG guys a quick follow-up question here.

I didn't follow this whole thread, but was this growth due to the C++ interface failing to close or dec-ref some objects?

If so, why didn't H5Fget_obj_count help to deduce that? My understanding is that Jorj tried that but it yielded no indication of an object handle leak. Is there a bug there?

Mark

···

From: Hdf-forum <hdf-forum-bounces@lists.hdfgroup.org<mailto:hdf-forum-bounces@lists.hdfgroup.org>> on behalf of Binh-Minh Ribler <bmribler@hdfgroup.org<mailto:bmribler@hdfgroup.org>>
Reply-To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org<mailto:hdf-forum@lists.hdfgroup.org>>
Date: Friday, August 14, 2015 10:03 AM
To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org<mailto:hdf-forum@lists.hdfgroup.org>>
Subject: Re: [Hdf-forum] Growing memory usage in small HDF program

That's good. Thank you for applying the files and letting us know, George!

Binh-Minh

________________________________
From: Hdf-forum <hdf-forum-bounces@lists.hdfgroup.org<mailto:hdf-forum-bounces@lists.hdfgroup.org>> on behalf of Jorj Pimm <jorjpimm@gmail.com<mailto:jorjpimm@gmail.com>>
Sent: Friday, August 14, 2015 4:47 AM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] Growing memory usage in small HDF program

I've applied Binh-Minh's files to my local copy of HDF and this does significantly reduce memory usage for my example.

It now uses under 100MB of memory, which seems pretty reasonable, I'll continue testing against these changes.

Thanks all for the help,
- George

On Fri, 14 Aug 2015 at 09:10 Jason Newton <nevion@gmail.com<mailto:nevion@gmail.com>> wrote:
Hmm, I did make a mistake in how p_setID works (since it decref's but does not incref it's new reference) - and I figured setId wasn't defined provided the naming.

I've never seen shared_ptr or OpenCL's C++ wrapper (which is almost a mirror image in terms of the library complexity they map) foul up or leak references in any cases with the same fundamental operations. It's unclear to me why this library can't do the same. The amount of code dedicated to those purposes in those libraries is much less than what's going on here too...

-Jason

On Fri, Aug 14, 2015 at 12:59 AM, Binh-Minh Ribler <bmribler@hdfgroup.org<mailto:bmribler@hdfgroup.org>> wrote:

Hello Jason,

________________________________
From: Hdf-forum <hdf-forum-bounces@lists.hdfgroup.org<mailto:hdf-forum-bounces@lists.hdfgroup.org>> on behalf of Jason Newton <nevion@gmail.com<mailto:nevion@gmail.com>>
Sent: Thursday, August 13, 2015 10:39 PM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] Growing memory usage in small HDF program

Bug found (in C++ api as usual)

Thank you for your efforts in tracking down the problem and your suggestions.

The C++ API *should* take care of inc/dec ref appropriately although they do this in each object class (may be higher in some class hierarchies like datatypes) but something of a leaf otherwise, rather than through inheritance of IdComponent. That strategy while working has left a few bugs I've found / encountered both as leaks and dec'reffing references not incref'd. As of 1.8.15, all that I was aware of though but this concern should be warranted all the time based on past-burnings (this would be the third time noticing something a shared_ptr like class/wrapper around HDF resources (IdComponent...?) would completely eliminate.

dataset.getSpace() leaks a reference:

   //create dataspace object using the existing id then return the object
   DataSpace data_space; <--default constructor makes a valid hdf dataspace for H5S_SCALAR
   f_DataSpace_setId(&data_space, dataspace_id); <-- evil line, why didn't we just use the ctor that takes the id parameter?
   return( data_space );

In 1.8.14, this block of code is like this, before it was changed into using the friend function in 1.8.15.

//create dataspace object using the existing id then return the object
DataSpace data_space(dataspace_id);

return(data_space);

As you can see in the comments you included below, the friend function was a work-around of a problem reported by some other users. In that problem, the id was prematurely closed, due to the behind-the-scene copy-constructor/destructor when an object was returned from a function. In order to fix that problem, the copy constructor and the constructor that takes an existing id of those classes that associate with an HDF5 id needs to increment the ref counter.

However, incrementing ref count left some objects opened at the end of the program, perhaps, due to some compiler's optimization when returning an object to the caller. In these situations, a destructor for a temporary object didn't seem to be invoked, so the id ref of the temporary object was never closed. I could never figure out why. Hence, the work-around was to use p_setId instead, which required the use of the friend function. If anyone has a different suggestion, please let us know.

//--------------------------------------------------------------------------
// Function: f_DataSpace_setId - friend
// Purpose: This function is friend to class H5::DataSpace so that it
// can set DataSpace::id in order to work around a problem
// described in the JIRA issue HDFFV-7947.
// Applications shouldn't need to use it.
// param dspace - IN/OUT: DataSpace object to be changed
// param new_id - IN: New id to set
// Programmer Binh-Minh Ribler - 2015
//--------------------------------------------------------------------------
void f_DataSpace_setId(DataSpace* dspace, hid_t new_id)  // <-- evil function that shouldn't exist (as a friend, no less!)
{
    dspace->id = new_id;  // <-- why not dspace->p_setId(new_id);? Just make it public already as "reset" and get rid of the friend. Follow shared_ptr semantics, and bring all this stuff inside IdComponent.
}

The difference between the public "setId" and the private p_setId is that "setId" also increments the ref count and is intended for applications to use on the C++ object id. The private p_setId doesn't increment the id ref count and is not intended for application use. The difference is explained in the function's header.
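
To make the reference-counting discussion above concrete, here is a minimal sketch of the shared_ptr-like semantics being suggested. It is a toy model only: the id table, the Handle class and every function name in it are hypothetical stand-ins, not HDF5 code, and it assumes the id passed in already carries one reference owned by the caller (as an id freshly returned by the library would). It contrasts overwriting the stored id directly with releasing the old id first and then adopting the new one.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Toy stand-in for a library identifier table: id -> reference count.
// Every name in this sketch is hypothetical; none of it is HDF5 code.
using toy_id = int;
static std::map<toy_id, int> g_refs;
static toy_id g_next = 1;

toy_id toy_create() { g_refs[g_next] = 1; return g_next++; }

void toy_inc_ref(toy_id id) {
    auto it = g_refs.find(id);
    assert(it != g_refs.end());
    ++it->second;
}

void toy_dec_ref(toy_id id) {
    auto it = g_refs.find(id);
    assert(it != g_refs.end());
    if (--it->second == 0) g_refs.erase(it);  // last reference gone: id is closed
}

std::size_t toy_open_count() { return g_refs.size(); }

class Handle {
public:
    Handle() : id_(toy_create()) {}           // owns a freshly created id
    explicit Handle(toy_id id) : id_(id) {}   // adopts one existing reference, no inc-ref
    Handle(const Handle& o) : id_(o.id_) { toy_inc_ref(id_); }
    Handle& operator=(const Handle& o) {
        if (this != &o) { toy_inc_ref(o.id_); toy_dec_ref(id_); id_ = o.id_; }
        return *this;
    }
    ~Handle() { toy_dec_ref(id_); }

    // Release the current reference, then adopt new_id without incrementing it.
    void reset(toy_id new_id) { toy_dec_ref(id_); id_ = new_id; }

    // Overwrite the stored id without releasing the old one.
    void raw_overwrite(toy_id new_id) { id_ = new_id; }

private:
    toy_id id_;
};

// Default-construct, then overwrite: the default-constructed id is orphaned.
Handle get_space_leaky(toy_id existing) { Handle h; h.raw_overwrite(existing); return h; }

// Default-construct, then reset: the default-constructed id is released first.
Handle get_space_fixed(toy_id existing) { Handle h; h.reset(existing); return h; }

// How many toy ids are still open after n calls through the given path.
std::size_t open_after(Handle (*get)(toy_id), int n) {
    std::size_t before = toy_open_count();
    for (int i = 0; i < n; ++i) { Handle h = get(toy_create()); }
    return toy_open_count() - before;
}
```

In this model, open_after(get_space_leaky, n) grows by one id per call, because nothing ever releases the default-constructed id, while open_after(get_space_fixed, n) stays at zero. Whether copies are elided on return makes no difference to either count, since the copy constructor and destructor are balanced.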
Thank you,
Binh-Minh

-Jason

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org<mailto:Hdf-forum@lists.hdfgroup.org>
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

I used the snippet in this thread to check for open objects, but could I
have used it wrong?

http://hdf-forum.184993.n3.nabble.com/Repeated-H5Dwrite-calls-increase-memory-usage-td4026367.html#a4026375

- Jorj

On Fri, 14 Aug 2015 at 18:52 Miller, Mark C. <miller86@llnl.gov> wrote:

Hmm. I just wanted to ask THG guys a quick follow-up question here.

I didn't follow this whole thread but was this growth due to the C++
interface failing to close or dec-ref some objects?

If so, why didn't H5Fget_obj_count help to deduce that? My understanding
is that Jorj tried that but it yielded no indication of an object handle
leak. Is there a bug there?

Mark

From: Hdf-forum <hdf-forum-bounces@lists.hdfgroup.org> on behalf of
Binh-Minh Ribler <bmribler@hdfgroup.org>

Reply-To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Date: Friday, August 14, 2015 10:03 AM
To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>

Subject: Re: [Hdf-forum] Growing memory usage in small HDF program

That's good. Thank you for applying the files and letting us know, George!

Binh-Minh

------------------------------
*From:* Hdf-forum <hdf-forum-bounces@lists.hdfgroup.org> on behalf of
Jorj Pimm <jorjpimm@gmail.com>
*Sent:* Friday, August 14, 2015 4:47 AM
*To:* HDF Users Discussion List
*Subject:* Re: [Hdf-forum] Growing memory usage in small HDF program

I've applied Binh-Minh's files to my local copy of HDF and this does
significantly reduce memory usage for my example.

It now uses under 100MB of memory, which seems pretty reasonable, I'll
continue testing against these changes.

Thanks all for the help,
- George

On Fri, 14 Aug 2015 at 09:10 Jason Newton <nevion@gmail.com> wrote:

Hmm, I did make a mistake about how p_setId works (since it decrefs but
does not incref its new reference) - and I figured setId wasn't defined,
given the naming.

I've never seen shared_ptr or OpenCL's C++ wrapper (which is almost a
mirror image in terms of the library complexity they map) foul up or leak
references in any cases with the same fundamental operations. It's unclear
to me why this library can't do the same. The amount of code dedicated to
those purposes in those libraries is much less than what's going on here
too...

-Jason

Hi Mark,

H5Fget_obj_count() only works with file objects, which include files, groups, datasets, and named datatypes. HDF5 currently does not offer a function that gives such information for other IDs. We have a feature request in our database and, if we have time, we'll try to add it to a future release.

In the meantime, an application can use H5Inmembers

https://www.hdfgroup.org/HDF5/doc/RM/RM_H5I.html#Identify-NMembers

to check how many identifiers of a specific type are currently open.
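
As a rough illustration of why a count of file objects can look healthy while dataspace ids pile up, here is a toy model of the two kinds of count. It is only a sketch: ToyRegistry, Kind and simulate_leaky_writes are hypothetical names, not the HDF5 API, and the loop simply assumes one dataspace id is orphaned per written block.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical identifier categories for this toy model; this is not the
// HDF5 H5I_type_t enum.
enum class Kind { File, Group, Dataset, NamedDatatype, Dataspace, Count };

class ToyRegistry {
public:
    void open(Kind k) { ++open_[idx(k)]; }
    void close(Kind k) {
        assert(open_[idx(k)] > 0);
        --open_[idx(k)];
    }

    // Analogue of a per-file object count: only file objects are included.
    std::size_t file_object_count() const {
        return open_[idx(Kind::File)] + open_[idx(Kind::Group)] +
               open_[idx(Kind::Dataset)] + open_[idx(Kind::NamedDatatype)];
    }

    // Analogue of a per-type member count.
    std::size_t members(Kind k) const { return open_[idx(k)]; }

private:
    static std::size_t idx(Kind k) { return static_cast<std::size_t>(k); }
    std::array<std::size_t, static_cast<std::size_t>(Kind::Count)> open_{};
};

// One file and one dataset stay open; each block orphans one dataspace id.
ToyRegistry simulate_leaky_writes(int blocks) {
    ToyRegistry r;
    r.open(Kind::File);
    r.open(Kind::Dataset);
    for (int i = 0; i < blocks; ++i) {
        r.open(Kind::Dataspace);   // default-constructed dataspace, never closed
        r.open(Kind::Dataspace);   // dataspace obtained from the dataset
        r.close(Kind::Dataspace);  // only the second one is closed
    }
    return r;
}
```

In this model the file-object count stays at 2 however many blocks are written, while the per-type dataspace count grows with the number of blocks, which is the pattern a per-type query would expose.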

Thanks,

Binh-Minh
