Performance reading attributes attached to a large number of groups

I have a performance problem reading attributes from an HDF5 file:
it takes almost 1.5 minutes to read the attributes (10 per object) from about
18,000 groups/datasets. I'm hoping somebody can tell me whether this is a
reasonable time for such a structure.
The HDF5 file has some 300 groups under the root; each of these 300 groups (T)
has about 60 subgroups (V), and each of those 60 subgroups has 1 or 2
datasets (D):

Root
----- T1
      ----- V1
            ----- D
      ...
      ----- V60
            ----- D
...
----- T300
             
At each level I am reading at most 10 tiny attributes.

Does reading each group mostly mean a new disk seek? Any suggestions for
improving performance?


--
View this message in context: http://hdf-forum.184993.n3.nabble.com/performance-reading-attributes-attached-to-large-number-of-group-tp4025432.html
Sent from the hdf-forum mailing list archive at Nabble.com.

Hi Manish,


On Sep 26, 2012, at 11:27 AM, khm wrote:

I have a performance problem reading attributes from an HDF5 file:
it takes almost 1.5 minutes to read the attributes (10 per object) from about
18,000 groups/datasets. I'm hoping somebody can tell me whether this is a
reasonable time for such a structure.
The HDF5 file has some 300 groups under the root; each of these 300 groups (T)
has about 60 subgroups (V), and each of those 60 subgroups has 1 or 2
datasets (D):

Root
----- T1
      ----- V1
            ----- D
      ...
      ----- V60
            ----- D
...
----- T300

At each level I am reading at most 10 tiny attributes.

Does reading each group mostly mean a new disk seek? Any suggestions for
improving performance?

  Hmm, are you keeping the object open when reading all the attributes on it?

  Quincey

Yes, I open each level, go to full depth, and then close everything.
Is this inefficient? Should I use absolute paths instead?

What I do is:

for each T in T1 to T300:
    idT = open(T)
    for each V in V1 to V60:
        idV = open(idT, relative path from T to V)
        readAttribute(idV, attribute name)
        idD = open(idV, relative path from V to D)
        readAttribute(idD, attribute name)
        close(idD)
        close(idV)
    end for
    close(idT)
end for



Yes, I open each level, go to full depth, and then close everything.
Is this inefficient? Should I use absolute paths instead?

  No, this is the preferred access pattern (holding an object open while you are accessing its attributes). You are probably just experiencing a lot of seeks for each object, since you are reading small bits of data from widely scattered locations in the file. You may get better performance if you increase the metadata block size with H5Pset_meta_block_size() when creating the file, so that metadata is aggregated into larger contiguous blocks. Try something like 64 KiB (instead of the default 4 KiB).

  Quincey


On Sep 27, 2012, at 8:43 AM, khm wrote:

What I do is:

for each T in T1 to T300:
    idT = open(T)
    for each V in V1 to V60:
        idV = open(idT, relative path from T to V)
        readAttribute(idV, attribute name)
        idD = open(idV, relative path from V to D)
        readAttribute(idD, attribute name)
        close(idD)
        close(idV)
    end for
    close(idT)
end for


_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org