Hello,
Our software has made use of the HDF5 library for a long time without any
issues. Recently we have started to run into datasets far larger than what
we previously used, and some scalability issues appear to be showing.
The HDF5 file in question contains a single group with many datasets. A
specific piece of code opens every dataset one at a time and reads from it
via H5Dread.
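In outline the loop looks something like this (a minimal sketch with
hypothetical file and dataset names; the real code sizes the buffer from the
dataspace and does more with the data):

    #include "hdf5.h"
    #include <stdio.h>

    int main(void)
    {
        /* Open, read, close, one dataset at a time. */
        hid_t file = H5Fopen("data.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
        char name[64];
        double buf[1024]; /* real code allocates based on the dataspace */

        for (int i = 0; i < 500000; i++) {
            snprintf(name, sizeof(name), "/group/dset%06d", i);
            hid_t dset = H5Dopen2(file, name, H5P_DEFAULT);
            if (dset < 0)
                break;
            H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
            H5Dclose(dset); /* each dataset is closed before the next is opened */
        }

        H5Fclose(file);
        return 0;
    }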
Previously it was rare to have more than ~90000 datasets here, so this was
never noticed. But after H5Dread has been called about ~60000 times,
subsequent calls become increasingly slow; by about ~80000 calls it slows to
a crawl (instead of processing thousands of datasets a second, it processes
only two or three per second).
I have tried upgrading from 1.8.8 to 1.8.9, and this seems to have helped
slightly: it now becomes unbearable at around ~100000 calls instead of ~80000.
Some observations:
1) This does not appear to be due to a seek delay, larger datasets in the
middle, or anything like that. I have tried, for example, starting at the
back of a group of ~500000 datasets instead of the front, and the same thing
happens. I have also tried starting at various spots towards the middle, and
the same behaviour is observed.
2) If I cancel the loop, allow the software to idle for a while, and then
give it another go, the same thing happens (it is fast again until a certain
number of reads). So it appears that HDF5 may be doing something in the
background once it is no longer busy that allows reads to be fast again?
I would greatly appreciate any thoughts on this, or ideas as to what might
be going on.
Regards,
Malcolm MacLeod
Malcolm,
Please try using the latest file format when you create a file; it should be more efficient at handling groups with a large number of objects.
See the H5Pset_libver_bounds function (http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetLibverBounds); use H5F_LIBVER_LATEST for the last two parameters.
You may also repack an existing file into the latest format with h5repack, using the -L flag.
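For example, when creating the file (a minimal sketch; error checking omitted):

    #include "hdf5.h"

    int main(void)
    {
        /* Ask the library to use the latest file format for everything
           created through this file access property list. */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_libver_bounds(fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);

        hid_t file = H5Fcreate("data.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
        /* ... create the group and datasets as usual ... */

        H5Fclose(file);
        H5Pclose(fapl);
        return 0;
    }

And to convert an existing file:

    h5repack -L old.h5 new.h5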
Elena
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal The HDF Group http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Hello Elena,
Sorry, I should have mentioned that: I am already setting H5F_LIBVER_LATEST
and have recreated the file (which is what gave the slight speed boost I
mentioned when upgrading), but the same issue is unfortunately still present.
- Malcolm
Hi Malcolm,
Doesn't sound good ;-). Would it be possible to submit a program that demonstrates the issue to help@hdfgroup.org, so we can take a look?
Thank you!
Elena
Hello Elena,
> Just occurred to me... Did you check that the program closes unused
> identifiers? This may cause performance degradation.

Yes, I have checked for this and did not see anything.
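For what it's worth, the check was along these lines (a sketch; "file" here
stands for our open file identifier), and the count did not grow across
iterations:

    #include "hdf5.h"
    #include <stdio.h>

    /* Report how many identifiers are still open against the file; if this
       number keeps growing across loop iterations, something is leaking. */
    static void report_open_ids(hid_t file)
    {
        ssize_t nopen = H5Fget_obj_count(file, H5F_OBJ_ALL);
        printf("open identifiers: %zd\n", nopen);
    }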
Some further information that I have uncovered: on Windows 7 this
degradation does not occur, but on Windows XP it does, so seemingly whatever
is going on here is XP-specific?
I will see if I can find some time to make a test case.
Thanks,
Malcolm
Hi Malcolm,
> I will see if I can find some time to make a test case.
That would be great! Please send it to help@hdfgroup.org.
Thank you!
Elena