I have an application that reads many datasets spread across many HDF5 files. With my current implementation I do not know in advance whether I will read from a given file more than once, so I open and close the file each time I read a dataset.
I profiled my code that reads many thousands of datasets in a few hundred files and see that creation and destruction of the metadata cache takes up a significant portion of runtime:
H5AC_create = 10.3%
H5AC_dest = 35%
H5AC_dest spends essentially all of its time in H5C_flush_invalidate_cache.
For comparison, H5Dopen and H5Dread together take 9.2% of the runtime.
I know ahead of time the location in the HDF5 file of each dataset I want to read, and to my knowledge I don't need any metadata in order to read it. I thought that disabling metadata cache creation might help, but I don't see a way to do that (I am using 1.8.5-patch1). I also tried setting the minimum and maximum metadata cache sizes to 1024 bytes (the smallest allowed) but saw no improvement in performance. Does anyone know a way around this problem other than avoiding the repeated file opens and closes?