opening/closing groups, performance

I remember reading somewhere it is best to avoid opening and closing files, but is there a similar performance cost to repeatedly opening and closing groups or datasets?

Thanks,
Ken

···

------------------------------------
Kenneth Sullivan, Ph.D.
Research Staff Member
Mayachitra, Inc.
sullivan@mayachitra.com

Hi Kenneth

Sparing the details, I will give you my experience: opening and closing a
file engages the OS, which is expensive. Opening and closing groups and
datasets are merely operations at the metadata level, which is read and
cached, so I expect the latter to introduce much lower overhead than the
former.
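
In code, the pattern looks roughly like this (a minimal sketch of my own,
not from any benchmark; the file name "data.h5", the group "/results" and
the dataset "step0" are made up for illustration). The expensive
H5Fopen/H5Fclose pair runs once, while the group/dataset open/close pairs
can sit inside the loop:

#include "hdf5.h"

int main(void)
{
    /* expensive: opening the file goes through the OS */
    hid_t file = H5Fopen("data.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    if (file < 0) return 1;

    for (int i = 0; i < 1000; i++) {
        /* cheap: resolved from metadata that is read and cached
           once the file is open */
        hid_t grp  = H5Gopen2(file, "/results", H5P_DEFAULT);
        hid_t dset = H5Dopen2(grp, "step0", H5P_DEFAULT);
        /* ... H5Dread / process the data ... */
        H5Dclose(dset);
        H5Gclose(grp);
    }

    H5Fclose(file);   /* close once at the end */
    return 0;
}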

A somewhat related example: moving from a set of plain binary files of
serialized data to a single HDF5 file, I noticed that the per-file
performance of a complete serialize/deserialize is more or less the same,
with HDF5 being faster under some circumstances. But reading from one HDF5
file instead of multiple binary files gives a performance gain on the order
of the number of binary files: reading the same information from one HDF5
file rather than 10 binary files is roughly an order of magnitude faster,
and the gain seems to grow linearly with the number of files.

HTH

-- dimitris

···

2009/5/28 Kenneth Sullivan <sullivan@mayachitra.com>

I remember reading somewhere it is best to avoid opening and closing files,
but is there a similar performance cost to repeatedly opening and closing
groups or datasets?

Thanks,
Ken

------------------------------------
Kenneth Sullivan, Ph.D.
Research Staff Member
Mayachitra, Inc.
sullivan@mayachitra.com

Thanks, good to know!
-Ken

···

On Wed, May 27, 2009 at 11:31 PM, Dimitris Servis <servisster@gmail.com> wrote:

Hi Kenneth

Sparing the details, I will give you my experience: opening and closing a
file engages the OS, which is expensive. Opening and closing groups and
datasets are merely operations at the metadata level, which is read and
cached, so I expect the latter to introduce much lower overhead than the
former.

A somewhat related example: moving from a set of plain binary files of
serialized data to a single HDF5 file, I noticed that the per-file
performance of a complete serialize/deserialize is more or less the same,
with HDF5 being faster under some circumstances. But reading from one HDF5
file instead of multiple binary files gives a performance gain on the order
of the number of binary files: reading the same information from one HDF5
file rather than 10 binary files is roughly an order of magnitude faster,
and the gain seems to grow linearly with the number of files.

HTH

-- dimitris

2009/5/28 Kenneth Sullivan <sullivan@mayachitra.com>

I remember reading somewhere it is best to avoid opening and closing
files, but is there a similar performance cost to repeatedly opening and
closing groups or datasets?

Thanks,
Ken

------------------------------------
Kenneth Sullivan, Ph.D.
Research Staff Member
Mayachitra, Inc.
sullivan@mayachitra.com