Internal memory leftovers of H5Dread()

Hi,

  I'm reading a file of about 83MB consisting of 6294 datasets,
converting the single-precision data in the file to double precision in
memory, so the loaded data itself should occupy about 145MB. However,
after reading a dataset with H5Dread() I leave the dataset identifier
open for further use, in case the same dataset needs to be read again
later. This leads to a memory overhead of 388MB, more than twice the
size of the data actually held in memory. When calling H5Dclose() on
the dataset identifier right after the H5Dread() call, this overhead
does not appear.

Is there a way to free this evidently HDF5-internal memory that is
allocated when reading a dataset? I'd like to keep the dataset identifier
available for further use, but this per-read memory overhead is
killing memory performance. Calling

H5garbage_collect()

doesn't help. I'm using HDF5 version 1.8.4-snap17; was there any
change to this memory-management behavior in more recent versions?
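
For reference, here is a minimal sketch of the access pattern I mean
(error checking omitted; "data.h5" and "/dset0" are placeholder names):

#include "hdf5.h"
#include <stdlib.h>

int main(void)
{
    hid_t file = H5Fopen("data.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset = H5Dopen2(file, "/dset0", H5P_DEFAULT);

    hid_t space = H5Dget_space(dset);
    hssize_t npoints = H5Sget_simple_extent_npoints(space);
    double *buf = malloc((size_t)npoints * sizeof(double));

    /* Single precision in the file, double precision in memory:
       HDF5 converts on the fly during the read. */
    H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

    H5Sclose(space);
    /* Intentionally no H5Dclose(dset) here: the identifier stays open
       for possible re-reads, and this is the case that shows the
       388MB overhead across all 6294 datasets. */

    free(buf);
    H5Fclose(file);
    return 0;
}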

  Werner


--
___________________________________________________________________________
Dr. Werner Benger Visualization Research
Laboratory for Creative Arts and Technology (LCAT)
Center for Computation & Technology at Louisiana State University (CCT/LSU)
211 Johnston Hall, Baton Rouge, Louisiana 70803
Tel.: +1 225 578 4809 Fax.: +1 225 578-5362

Hi Werner,

It looks like a bug. I've just entered the issue into our database.

When you have a chance, could you please try the latest 1.8.8 to confirm that the issue is still there?

Thank you!
Elena


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal The HDF Group http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Hi Elena,

  I've tested it with version 1.8.9-snap10, and the behavior is still
the same: a difference of about 386MB in RAM usage in my application
scenario, depending on whether an H5Dclose() is done after H5Dread()
or not, based on the aforementioned 83MB file.
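
Until the bug is fixed, the workaround implied by this observation is
to close the dataset right after each read and reopen it by name on
demand; a minimal sketch (the function name is just illustrative):

#include "hdf5.h"

/* Workaround sketch: close the dataset immediately after H5Dread()
   and reopen it by name for any later read. In my tests, releasing
   the identifier is what frees the internal memory. The caller must
   supply a buffer large enough for the whole dataset. */
static herr_t read_as_double(hid_t file, const char *path, double *buf)
{
    hid_t dset = H5Dopen2(file, path, H5P_DEFAULT);
    if (dset < 0)
        return -1;
    herr_t status = H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL,
                            H5P_DEFAULT, buf);
    H5Dclose(dset);  /* give up the identifier to avoid the overhead */
    return status;
}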

      Werner


Hi Werner,

Thank you for testing. The bug is in the queue.

Elena
