Disabling Metadata Cache (UNCLASSIFIED)

UNCLASSIFIED
Hello,

I have an application that reads many datasets spread across many HDF5 files. In my current implementation I do not know in advance whether I will read from a given file more than once, so I open and close the file each time I read a dataset.
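Concretely, each read looks roughly like this (a sketch - the file name, dataset path, and buffer handling are placeholders):

#include "hdf5.h"

/* Sketch of the access pattern described above: the file is opened and
 * closed around every single dataset read. */
static int read_one(const char *filename, const char *dset_path, double *buf)
{
    hid_t file = H5Fopen(filename, H5F_ACC_RDONLY, H5P_DEFAULT);
    if (file < 0)
        return -1;

    hid_t dset = H5Dopen(file, dset_path, H5P_DEFAULT);
    if (dset < 0) {
        H5Fclose(file);
        return -1;
    }

    /* Read the entire dataset into buf (caller must size it correctly). */
    herr_t status = H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL,
                            H5P_DEFAULT, buf);

    H5Dclose(dset);
    H5Fclose(file); /* metadata cache is created in H5Fopen, torn down here */
    return (status < 0) ? -1 : 0;
}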

I profiled a run that reads many thousands of datasets from a few hundred files, and creation and destruction of the metadata cache take up a significant portion of the runtime:

H5AC_create = 10.3%
H5AC_dest = 35%

H5AC_dest spends essentially all of its time in H5C_flush_invalidate_cache.

For comparison, H5Dopen and H5Dread together account for 9.2% of the runtime.

I know ahead of time the location of each dataset I want to read within its HDF5 file, and to my knowledge I don't require any metadata in order to read the dataset. I thought that disabling creation of the metadata cache might help, but I don't see a way to do it (I am using 1.8.5-patch1). I tried setting the minimum and maximum metadata cache sizes to 1024 bytes (the minimum allowed) but saw no performance improvement. Does anyone know a way around this problem other than avoiding the repeated file opens and closes?
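For reference, I set the cache sizes along these lines (a sketch; everything except the size-related fields is left at the values returned by H5Pget_mdc_config):

#include "hdf5.h"

/* Sketch: shrink the metadata cache to its 1024-byte minimum through the
 * file access property list, leaving all other fields at the values
 * returned by H5Pget_mdc_config. */
static hid_t make_small_cache_fapl(void)
{
    H5AC_cache_config_t config;
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);

    config.version = H5AC__CURR_VERSION; /* must be set before the get call */
    H5Pget_mdc_config(fapl, &config);

    config.set_initial_size = 1;    /* force the initial size below */
    config.initial_size     = 1024;
    config.min_size         = 1024;
    config.max_size         = 1024;
    config.incr_mode        = H5C_incr__off; /* turn off automatic resizing */
    config.decr_mode        = H5C_decr__off;

    H5Pset_mdc_config(fapl, &config);
    return fapl; /* pass as the third argument to H5Fopen */
}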

Thanks,
Ken Leiter
UNCLASSIFIED

Hi Ken,

  You will need to read metadata in order to read your dataset - the dataset's object header needs to be looked up from the dataset's name, etc. I doubt that reducing the size of the metadata cache will help, and since all metadata access goes through the cache, it can't really be disabled in a meaningful way without significant changes to the library. Can you push the profiling into the H5AC_create and H5AC_dest (H5C_flush_invalidate_cache) calls a bit further and see if there are some algorithmic issues that are slowing things down for you?

  Quincey