Cache memory usage

I am trying to get a handle on how much memory is being used by HDF5 for
caching, and have a couple of questions:

Do the cache limits apply globally (per process), per file, per dataset, or
in some other way? Specifically, when computing the total memory usage,
should I just add the memory for the raw data chunk cache and the metadata
cache, or do I need to multiply one or both by the number of
files/datasets/other?

Is there any good way to measure actual cache memory usage, or am I limited
to using top to check process memory usage and estimating values from the
cache parameters?

Thank you,
Ethan

···

Hi Ethan,

On Jul 19, 2010, at 8:41 PM, Ethan Dreyfuss wrote:

> Do the cache limits apply globally (per process), per file, per dataset, or in some other way? Specifically, when computing the total memory usage, should I just add the memory for the raw data chunk cache and the metadata cache, or do I need to multiply one or both by the number of files/datasets/other?

  The metadata cache is per file and the raw data chunk cache is per dataset.

> Is there any good way to measure actual cache memory usage, or am I limited to using top to check process memory usage and estimating values from the cache parameters?

  Hmm, you can check the metadata cache, but I don't think there's a query function for the chunk cache currently. You can also manually garbage collect the internal HDF5 library memory allocations with H5garbage_collect(), but we don't have a way to query that usage right now either. For now, valgrind or top is probably still the most reasonable approach.

  Quincey
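
[Editor's note: to make Quincey's arithmetic concrete, total cache memory is roughly (number of open files) x (metadata cache size) + (number of open datasets) x (chunk cache size). Below is a minimal sketch of an upper-bound estimate based only on the configured limits; estimate_chunk_cache_bound is a hypothetical helper, not part of the HDF5 API, and it assumes every dataset inherits the file-level default chunk cache size (1 MiB by default in HDF5 1.8).]

#include "hdf5.h"

/* Hypothetical helper: a back-of-the-envelope upper bound on raw data
 * chunk cache memory for one file, assuming all `ndatasets` open
 * datasets use the file-level default chunk cache size. Datasets with
 * a per-dataset override (H5Pset_chunk_cache) would need to be counted
 * with their own rdcc_nbytes instead. Actual usage may be lower. */
static size_t estimate_chunk_cache_bound(hid_t file_id, size_t ndatasets)
{
    hid_t  fapl        = H5Fget_access_plist(file_id);
    int    mdc_nelmts  = 0;   /* ignored by the 1.8 metadata cache */
    size_t rdcc_nslots = 0;
    size_t rdcc_nbytes = 0;   /* default chunk cache size, per dataset */
    double rdcc_w0     = 0.0;

    H5Pget_cache(fapl, &mdc_nelmts, &rdcc_nslots, &rdcc_nbytes, &rdcc_w0);
    H5Pclose(fapl);

    /* One chunk cache per open dataset; the per-file metadata cache is
     * bounded separately (see H5Fget_mdc_size() below). */
    return ndatasets * rdcc_nbytes;
}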


···

Hi Ethan,

   I believe I can add a bit to the above:

   As Quincey indicated, HDF5 creates one metadata cache per open file.
You can use H5Fget_mdc_size() to get the current cache size for a given
file, but note that the cache's footprint in memory will typically be two
to three times its current size.
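
[Editor's note: a minimal sketch of that query, assuming a file handle from a prior H5Fopen()/H5Fcreate() call.]

#include <stdio.h>
#include "hdf5.h"

/* Report the metadata cache state of an open file. */
void report_mdc_size(hid_t file_id)
{
    size_t max_size = 0, min_clean_size = 0, cur_size = 0;
    int    cur_num_entries = 0;

    if (H5Fget_mdc_size(file_id, &max_size, &min_clean_size,
                        &cur_size, &cur_num_entries) < 0) {
        fprintf(stderr, "H5Fget_mdc_size failed\n");
        return;
    }

    /* Per the note above, the in-memory footprint may be two to three
     * times cur_size. */
    printf("mdc: cur=%zu bytes in %d entries (max=%zu, min_clean=%zu)\n",
           cur_size, cur_num_entries, max_size, min_clean_size);
}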

   Note also that unless configured otherwise, the metadata cache will
attempt to resize itself so as to be big enough to contain the current
working set -- and no larger. Thus, depending on your access pattern,
you may see the metadata cache size grow and shrink. That said, you
have to work pretty hard to get it to grow beyond several MB.

   For further information on the metadata cache, please see the portion
of the special topics section of the Users Guide that addresses the
metadata cache. Reading and understanding this portion of the
documentation is pretty much essential if you want to take direct
control of the metadata cache without shooting yourself in the foot.
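
[Editor's note: should you decide to take that direct control, the entry point is a get-modify-set pattern on H5AC_cache_config_t via H5Pget_mdc_config()/H5Pset_mdc_config(). A hedged sketch follows; it only caps the cache's maximum size and leaves every other field at its current value. Read the Users Guide section first, as John advises.]

#include "hdf5.h"

/* Sketch: cap the metadata cache's maximum size on a file access
 * property list before the file is opened. (H5Fset_mdc_config() does
 * the same for an already-open file.) */
void cap_mdc_size(hid_t fapl_id, size_t max_bytes)
{
    H5AC_cache_config_t config;

    /* The version field must be set before the "get" call. */
    config.version = H5AC__CURR_CACHE_CONFIG_VERSION;
    H5Pget_mdc_config(fapl_id, &config);

    /* Cap adaptive resizing at max_bytes; everything else unchanged.
     * max_size must remain >= min_size or the set call will fail. */
    config.max_size = max_bytes;
    H5Pset_mdc_config(fapl_id, &config);
}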

   The chunk cache is not my specialty, so I will not attempt to add to
Quincey's comments.

   I hope this helps.

                                       Best regards,

                                       John Mainzer

···


Thank you Quincey and John for the helpful information. I will keep the
metadata cache options in mind, but right now my biggest concern is the raw
data cache. I currently have one file with many (up to ~1000) datasets,
which I have open simultaneously. Since there is a separate raw data cache
for each, I think this explains the significant cache memory usage I am
seeing.

I don't suppose there is a way to share a raw data cache across multiple
datasets, is there?

When does memory allocation happen for the raw data cache? Does it get
allocated at dataset open/create time, or only as values are written to and
read from the dataset?

Are there any other options for affecting the behavior of the raw data
cache? All I can find are H5Pget/set_cache and H5Pget/set_chunk_cache.

Thanks,
Ethan

···

Hi Ethan,

On Jul 20, 2010, at 4:46 PM, Ethan Dreyfuss wrote:

> I don't suppose there is a way to share a raw data cache across multiple datasets, is there?

  No.

> When does memory allocation happen for the raw data cache? Does it get allocated at dataset open/create time, or only as values are written to and read from the dataset?

  Memory allocation occurs as data is accessed with H5Dread/H5Dwrite, not when a dataset is opened or created.

> Are there any other options for affecting the behavior of the raw data cache? All I can find are H5Pget/set_cache and H5Pget/set_chunk_cache.

  No, sorry. :-/

    Quincey
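
[Editor's note: since H5Pset_cache and H5Pset_chunk_cache are indeed the only knobs, the practical lever for the ~1000-dataset case is shrinking the per-dataset chunk cache. A hedged sketch follows; the sizes are illustrative, not recommendations. Alternatively, H5Pset_cache on the file access property list changes the default for every dataset in the file at once.]

#include "hdf5.h"

/* Open a dataset with a reduced per-dataset chunk cache. With the
 * default 1 MiB chunk cache per dataset, ~1000 open datasets can pin
 * on the order of a gigabyte; shrinking rdcc_nbytes bounds that. */
hid_t open_dataset_small_cache(hid_t file_id, const char *name)
{
    hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);

    /* 521 hash slots (a prime), 64 KiB cache, default preemption. */
    H5Pset_chunk_cache(dapl, 521, 64 * 1024, H5D_CHUNK_CACHE_W0_DEFAULT);

    hid_t dset = H5Dopen2(file_id, name, dapl);
    H5Pclose(dapl);
    return dset;
}

Setting rdcc_nbytes to 0 disables the chunk cache for a dataset entirely, at the cost of re-reading a chunk from disk on every partial access.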
