opening/closing groups, performance

I remember reading somewhere it is best to avoid opening and closing files, but is there a similar performance cost to repeatedly opening and closing groups or datasets?

Thanks,
Ken

···

------------------------------------
Kenneth Sullivan, Ph.D.
Research Staff Member
Mayachitra, Inc.
sullivan@mayachitra.com

Hi Kenneth

Sparing the details, I will give you my experience: opening and closing a
file engages the OS, which is expensive. Opening and closing groups and
datasets are merely operations at the metadata level, which is read and
cached, so I expect the latter to introduce much lower overhead than the
former.
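
In code, the pattern looks roughly like this (a minimal sketch of my own,
not from any benchmark; the file name "data.h5", the group "/results" and
the dataset "step0" are made up for illustration). The expensive
H5Fopen/H5Fclose pair runs once, while the group/dataset open/close pairs
can sit inside the loop:

#include "hdf5.h"

int main(void)
{
    /* expensive: opening the file goes through the OS */
    hid_t file = H5Fopen("data.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    if (file < 0) return 1;

    for (int i = 0; i < 1000; i++) {
        /* cheap: resolved from metadata that is read and cached
           once the file is open */
        hid_t grp  = H5Gopen2(file, "/results", H5P_DEFAULT);
        hid_t dset = H5Dopen2(grp, "step0", H5P_DEFAULT);
        /* ... H5Dread / process the data ... */
        H5Dclose(dset);
        H5Gclose(grp);
    }

    H5Fclose(file);   /* close once at the end */
    return 0;
}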

A somewhat related example: moving from a set of plain binary files of
serialized data to a single HDF5 file, I noticed that the per-file
performance of a complete serialize/deserialize is more or less the same,
with HDF5 being faster under some circumstances. But reading from one HDF5
file instead of multiple binary files gives a performance gain on the order
of the number of binary files: reading the same information from one HDF5
file rather than 10 binary files is roughly an order of magnitude faster,
and the gain seems to grow linearly with the number of files.

HTH

-- dimitris

···

2009/5/28 Kenneth Sullivan <sullivan@mayachitra.com>

I remember reading somewhere it is best to avoid opening and closing files,
but is there a similar performance cost to repeatedly opening and closing
groups or datasets?

Thanks,
Ken

------------------------------------
Kenneth Sullivan, Ph.D.
Research Staff Member
Mayachitra, Inc.
sullivan@mayachitra.com

Thanks, good to know!
-Ken

···

On Wed, May 27, 2009 at 11:31 PM, Dimitris Servis <servisster@gmail.com> wrote:

Hi Kenneth

Sparing the details, I will give you my experience: opening and closing a
file engages the OS, which is expensive. Opening and closing groups and
datasets are merely operations at the metadata level, which is read and
cached, so I expect the latter to introduce much lower overhead than the
former.

A somewhat related example: moving from a set of plain binary files of
serialized data to a single HDF5 file, I noticed that the per-file
performance of a complete serialize/deserialize is more or less the same,
with HDF5 being faster under some circumstances. But reading from one HDF5
file instead of multiple binary files gives a performance gain on the order
of the number of binary files: reading the same information from one HDF5
file rather than 10 binary files is roughly an order of magnitude faster,
and the gain seems to grow linearly with the number of files.

HTH

-- dimitris

2009/5/28 Kenneth Sullivan <sullivan@mayachitra.com>

I remember reading somewhere it is best to avoid opening and closing
files, but is there a similar performance cost to repeatedly opening and
closing groups or datasets?

Thanks,
Ken

------------------------------------
Kenneth Sullivan, Ph.D.
Research Staff Member
Mayachitra, Inc.
sullivan@mayachitra.com