slow hyperslab selection, H5Sselect_hyperslab, H5Scombine_hyperslab, and NEW_HYPERSLAB_API

Ken_Sullivan · May 25, 2010, 10:36pm

Hi, I'm running into slow performance when selecting several
(>1000) non-consecutive rows from a 2-dimensional matrix, typically
~500,000 X 100. The bottleneck is the for loop where each row vector index
is OR'ed into the hyperslab, i.e.:

  LOG4CXX_INFO(logger, "TIME begin hyperslab building"); // print out with time stamp
  // select file buffer hyperslabs
  H5Sselect_hyperslab(fileSpaceId, H5S_SELECT_SET, (const hsize_t*) fileOffset, NULL, (const hsize_t*) fileBlockCount, selectionDims);
  for (hsize_t id = 1; id < numVecsToRead; ++id) {
    LOG4CXX_INFO(logger, id << "/" << numVecsToRead);
    fileOffset[0] = fileLocs1Dim[id];
    H5Sselect_hyperslab(fileSpaceId, H5S_SELECT_OR, (const hsize_t*) fileOffset, NULL, (const hsize_t*) fileBlockCount, selectionDims);
  }
  LOG4CXX_INFO(logger, "TIME end hyperslab building");

One interesting thing is that the time per iteration grows as the loop
proceeds, e.g. no time at all between 1-2-3-4-5, but seconds between
1000-1001-1002. So the time to select the hyperslab is worse than linear,
and can become amazingly time consuming, e.g. >10 minutes (!) for a few
thousand rows. The read itself is very quick.

My current workaround is to check whether the number of vectors to select
exceeds a heuristically determined threshold, above which reading the
entire file (half a million row vectors) and copying the requested vectors
takes less time than running the hyperslab selection. Generally the
threshold works out to ~500 vectors (~0.5 seconds).
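
The read-all-and-copy fallback described above could look roughly like the following sketch. The names `gatherRows` and `shouldReadAll` are hypothetical, the buffer is assumed to be a row-major float matrix that has already been read in full, and the default threshold of 500 is simply the figure quoted above, not a recommended value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy the requested rows out of a row-major matrix
// that is already fully in memory (e.g. after reading the whole dataset).
std::vector<float> gatherRows(const std::vector<float>& all,
                              std::size_t rowLength,
                              const std::vector<std::size_t>& rows) {
  assert(rowLength > 0 && all.size() % rowLength == 0);
  const std::size_t numRows = all.size() / rowLength;
  std::vector<float> out;
  out.reserve(rows.size() * rowLength);
  for (std::size_t i = 0; i < rows.size(); ++i) {
    assert(rows[i] < numRows);  // requested row must exist
    const float* src = all.data() + rows[i] * rowLength;
    out.insert(out.end(), src, src + rowLength);
  }
  return out;
}

// Hypothetical policy: above the threshold, prefer read-all-and-copy
// over building a hyperslab selection row by row.
bool shouldReadAll(std::size_t numRowsWanted, std::size_t threshold = 500) {
  return numRowsWanted > threshold;
}
```

The cost of this path is the temporary full-size buffer, which is the ~250MB allocation mentioned further down in this post.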

While poking around the code, I found a similar function,
H5Scombine_hyperslab(), that is only compiled if NEW_HYPERSLAB_API is
defined. Using this significantly reduced the selection time; in
particular the time for each OR-ing seemed constant, so 2000 vectors took
twice as long as 1000, not many times as long as with H5Sselect_hyperslab().
However, it still takes tens of seconds to select a few thousand vectors,
so it's still much quicker to read all and copy (~1/2 second).

Reading all and copying is not an ideal solution, as it requires a
malloc/free of ~250MB unnecessarily. If I use H5Scombine_hyperslab() the
crossover number goes up, i.e. more than 500, so the fallback is less
likely to be needed. I'm a bit nervous, however, about using this
undocumented code.

So...am I doing something wrong? Is there a speedy way to select a
hyperslab consisting of 100s or 1000s of non-consecutive vectors?
Is NEW_HYPERSLAB_API safe?

Thanks,
Ken
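
One way to cut the number of H5S_SELECT_OR calls, sketched here under the assumption that at least some of the requested rows are adjacent, is to merge consecutive row indices into runs first and issue one selection per run. The `RowRun` type and `coalesceRows` function are hypothetical names; for rows that are all isolated this gives no benefit.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical run of adjacent rows: [start, start + length).
struct RowRun {
  unsigned long long start;
  unsigned long long length;
};

// Sort and de-duplicate the row indices, then merge adjacent ones into runs.
std::vector<RowRun> coalesceRows(std::vector<unsigned long long> rows) {
  std::sort(rows.begin(), rows.end());
  rows.erase(std::unique(rows.begin(), rows.end()), rows.end());
  std::vector<RowRun> runs;
  for (std::size_t i = 0; i < rows.size(); ++i) {
    if (!runs.empty() && runs.back().start + runs.back().length == rows[i]) {
      ++runs.back().length;  // extends the current run
    } else {
      RowRun r = {rows[i], 1};  // starts a new run
      runs.push_back(r);
    }
  }
  return runs;
}

// Each run could then be selected with a single call, for example:
//   hsize_t start[2] = {run.start, 0};
//   hsize_t count[2] = {1, 1};
//   hsize_t block[2] = {run.length, rowLength};
//   H5Sselect_hyperslab(fileSpaceId, op, start, NULL, count, block);
// with op = H5S_SELECT_SET for the first run and H5S_SELECT_OR afterwards.
```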

Quincey_Koziol · May 26, 2010, 3:17pm

Hi Ken,

> So, the time to select the hyperslab is worse than linear, and can become amazingly time consuming, e.g. >10 minutes (!) for a few thousand. The read itself is very quick.

Drat! Sounds like we've got an O(n^2) algorithm (or worse) somewhere in the code that combines two selections. Can you send us a standalone program that demonstrates the problem, so we can file an issue for this, and get it fixed?
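
To make the suspected growth concrete, here is a toy cost model. It is only an illustration and not a description of the actual HDF5 implementation: it assumes, hypothetically, that OR-ing in the k-th row first scans the k rows already in the selection, and contrasts that with a constant cost per added row.

```cpp
// Toy model (an assumption, not HDF5's real algorithm): the k-th OR
// scans the k rows already selected, so it costs k units.
unsigned long long modelQuadraticCost(unsigned long long numRows) {
  unsigned long long total = 0;
  for (unsigned long long k = 1; k < numRows; ++k) {
    total += k;
  }
  return total;  // equals numRows * (numRows - 1) / 2
}

// Contrast: every OR after the initial SET costs one unit.
unsigned long long modelLinearCost(unsigned long long numRows) {
  return numRows > 0 ? numRows - 1 : 0;
}
```

Under this model 1000 rows cost 499,500 units and 2000 rows cost 1,999,000 units, roughly four times as much, while the linear model only doubles. That matches the qualitative difference reported between the two functions in the original post.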

> So...am I doing something wrong? Is there a speedy way to select a hyperslab consisting of 100s or 1000s of non-consecutive vectors? Is NEW_HYPERSLAB_API safe?

Currently, the NEW_HYPERSLAB_API is not tested or supported, so I wouldn't use it.

Quincey

Ken_Sullivan · May 27, 2010, 5:20pm

Hi, sorry not to get back sooner. I've found a couple of other interesting
things. The speed issue doesn't seem to exist on Linux; the same code runs
in a blink. On Windows, the really, really slow runs (several minutes) only
seem to happen when running from within Visual Studio. When run from the
command line it's slow, e.g. 15 seconds for 4000 vectors, but not minutes
slow, and the time doesn't seem to grow as I saw before with Visual Studio.

#ifdef __cplusplus
extern "C" {
#endif
#include <hdf5.h>
#ifdef __cplusplus
}
#endif
#include <vector>
#include <iostream>
#include <stdlib.h>
#include <math.h>

using namespace std;

int main() {
  unsigned long long totalNumVecs = 500000;
  unsigned long long vecLength = 128;
  hid_t baseType = H5T_NATIVE_FLOAT;  // unused in this selection-only demo

  // Pick roughly 4000 evenly spaced row indices.
  unsigned long long roughNumVecsToGet = 4000;
  unsigned long long skipRate = (unsigned long long)ceilf((float)totalNumVecs / (float)roughNumVecsToGet);
  vector<unsigned long long> vecInds;
  for (unsigned long long rowInd = 0; rowInd < totalNumVecs; rowInd += skipRate) {
    vecInds.push_back(rowInd);
  }

  // Dataspace for a totalNumVecs x vecLength matrix.
  int rank = 2;
  hsize_t dims[2];
  dims[0] = totalNumVecs;
  dims[1] = vecLength;
  hid_t fileSpaceId = H5Screate_simple(rank, dims, NULL);

  hsize_t fileBlockCount[2];
  hsize_t fileOffset[2];

  // Each selection is one block covering a single full row.
  hsize_t selectionDims[2];
  selectionDims[0] = 1;
  fileBlockCount[0] = 1;
  fileOffset[0] = vecInds[0];
  for (int ir = 1; ir < rank; ++ir) {
    selectionDims[ir] = dims[ir];
    fileBlockCount[ir] = 1;
    fileOffset[ir] = 0;
  }

  cout << "begin hyperslab building" << endl;
  H5Sselect_hyperslab(fileSpaceId, H5S_SELECT_SET, (const hsize_t*) fileOffset, NULL, (const hsize_t*) fileBlockCount, selectionDims);
  unsigned long long numVecsToRead = vecInds.size();
  // OR the remaining rows into the selection one at a time.
  for (hsize_t id = 1; id < numVecsToRead; ++id) {
    if ((id % 50) == 0) {
      cout << id << "/" << numVecsToRead << endl;
    }
    fileOffset[0] = vecInds[id];
    H5Sselect_hyperslab(fileSpaceId, H5S_SELECT_OR, (const hsize_t*) fileOffset, NULL, (const hsize_t*) fileBlockCount, selectionDims);
  }
  cout << "end hyperslab building" << endl;

  return 0;
}

Thanks,
Ken
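
In this particular demo the selected rows are evenly spaced, so the same selection could in principle be expressed as one strided hyperslab instead of thousands of OR operations. The sketch below only detects that case; `StridedRows` and `detectStride` are hypothetical names, and it does not help when the real row indices are irregular.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of a regularly spaced row selection.
struct StridedRows {
  bool regular;               // true if all gaps between rows are equal
  unsigned long long start;   // first row index
  unsigned long long stride;  // gap between consecutive selected rows
  unsigned long long count;   // number of selected rows
};

// Checks whether strictly increasing row indices share one constant gap.
StridedRows detectStride(const std::vector<unsigned long long>& rows) {
  StridedRows r = {false, 0, 1, rows.size()};
  if (rows.empty()) return r;
  r.start = rows[0];
  if (rows.size() == 1) {
    r.regular = true;
    return r;
  }
  if (rows[1] <= rows[0]) return r;
  r.stride = rows[1] - rows[0];
  for (std::size_t i = 2; i < rows.size(); ++i) {
    if (rows[i] <= rows[i - 1] || rows[i] - rows[i - 1] != r.stride) return r;
  }
  r.regular = true;
  return r;
}

// If s.regular is true, a single call could replace the whole loop:
//   hsize_t start[2]  = {s.start, 0};
//   hsize_t stride[2] = {s.stride, 1};
//   hsize_t count[2]  = {s.count, 1};
//   hsize_t block[2]  = {1, vecLength};
//   H5Sselect_hyperslab(fileSpaceId, H5S_SELECT_SET, start, stride, count, block);
```

For the demo's indices (0, 125, 250, ...) this reports a stride of 125 and a count of 4000.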

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

Quincey_Koziol · May 27, 2010, 6:56pm

Hi Ken,

> The speed issue doesn't seem to exist running in Linux, the same code runs in a blink. In Windows, the really, really slow runs (several minutes) only seem to happen when running from within Visual Studio.

Hmm, OK, I'll note that in our bug report. Sounds pretty Windows specific though...

Quincey

Kirk_Harrison · May 27, 2010, 8:17pm

FYI... I had thought that I was experiencing a very similar problem under
Linux. As my loop progressed, my performance writing via hyperslab grew
worse and worse. After further troubleshooting and profiling I discovered
that I had missed some H5Dclose() and H5Aclose() calls. Fixing that made a
HUGE difference in my test case (from 52 secs to approximately 2.5 secs for
about 8Mb of data).

Kirk
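
A common way to avoid missed close calls of this kind is a small scope guard that closes the handle when it goes out of scope. The sketch below is generic C++ and not part of the HDF5 API; `ScopedHandle` is a hypothetical name, and the usage shown in the trailing comment assumes the HDF5 1.8 functions H5Dopen2() and H5Dclose().

```cpp
// Generic scope guard: calls closer(id) exactly once on destruction.
template <typename Id, typename Closer>
class ScopedHandle {
 public:
  ScopedHandle(Id id, Closer closer) : id_(id), closer_(closer) {}
  ~ScopedHandle() { closer_(id_); }

  // Non-copyable, so the handle cannot be closed twice by accident.
  ScopedHandle(const ScopedHandle&) = delete;
  ScopedHandle& operator=(const ScopedHandle&) = delete;

  Id get() const { return id_; }

 private:
  Id id_;
  Closer closer_;
};

// Possible use with HDF5 (file and dataset name are placeholders):
//   ScopedHandle<hid_t, herr_t (*)(hid_t)> dset(
//       H5Dopen2(fileId, "/data", H5P_DEFAULT), H5Dclose);
//   ... use dset.get() ...
//   // H5Dclose runs automatically at the end of the scope.
```

The guard does not check whether the open call succeeded; a real program would test for a negative identifier before wrapping it.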

Ken_Sullivan · May 28, 2010, 11:27pm

Yeah, we did try to scrutinize our opens/closes, and actually ended up
finding a missing close which was causing a small leak. Unfortunately even
the code above runs slowly, and I think the only open handle there is
fileSpaceId (which I've since made sure to close in the demo program for
good measure).
