H5TBappend_records much slower than std::vector::push_back?

Hello,

I am trying to use an HDF5 table as an in-memory structure, but it
seems incredibly slow. I am attaching a copy of my test code. It
opens a file in memory and never writes it to disk. Appending 10000
records takes about 2.5 seconds; doing the same with a std::vector is
two orders of magnitude faster.

So my question is, is this expected? Is there a way to make
H5TBappend_records competitive with std::vector::push_back?

Thanks,
Walter Landry

p.s. FYI, I compiled my test code with

  mpicxx -Ofast hdf5_make_table.c -o hdf5_make_table -lhdf5 -lhdf5_hl

and I am using HDF5 1.8.8.

Here is the missing attachment.

Walter Landry

hdf5_make_table.c (3.89 KB)
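
In case the attachment does not come through for everyone, the pattern it
exercises looks roughly like this. This is a minimal sketch, not the
attachment itself: the Record struct and field names are illustrative, and
it assumes the core file driver with backing_store=0 for the in-memory
file.

  #include <hdf5.h>
  #include <hdf5_hl.h>

  typedef struct { double x; int i; } Record;

  int main(void)
  {
    /* Core driver with no backing store: the file lives in memory and
       is never written to disk. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_core(fapl, 64 * 1024 * 1024, 0);
    hid_t file = H5Fcreate("in_memory.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    const char *field_names[2] = { "x", "i" };
    size_t offsets[2] = { HOFFSET(Record, x), HOFFSET(Record, i) };
    size_t field_sizes[2] = { sizeof(double), sizeof(int) };
    hid_t field_types[2] = { H5T_NATIVE_DOUBLE, H5T_NATIVE_INT };

    /* Empty table, chunk size 1024 records, no compression. */
    H5TBmake_table("table", file, "table", 2, 0, sizeof(Record),
                   field_names, offsets, field_types, 1024, NULL, 0, NULL);

    /* One H5TBappend_records call per record: the slow pattern. */
    for (int n = 0; n < 10000; ++n) {
      Record r = { (double)n, n };
      H5TBappend_records(file, "table", 1, sizeof(Record),
                         offsets, field_sizes, &r);
    }

    H5Fclose(file);
    H5Pclose(fapl);
    return 0;
  }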


Yes, I've noticed this kind of thing too. HDF5 can be fast for these
purposes, but you need to know three things:

- There is a global lock around (all?) HDF5 routines, which may involve
  additional syscalls and hence context switches.
- There is a lot of resolution overhead for handling all the conversions
  and so on that HDF5 is responsible for, even if you are using native
  types.
- There is additional overhead in the H5TB bookkeeping (I forget the
  exact details).

For the first two issues, do bulk updates to amortize their cost, if
that is possible and acceptable for you, and minimize thread contention
(see the sketch below).
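
For example, something like this (a rough sketch, reusing the Record
struct, offsets, and field_sizes from the test sketch earlier in the
thread) buffers records locally and pays the library overhead once per
batch instead of once per record:

  enum { BATCH = 1024 };
  Record buf[BATCH];
  size_t n = 0;
  for (int k = 0; k < 10000; ++k) {
    buf[n].x = (double)k;
    buf[n].i = k;
    if (++n == BATCH) {
      /* One library call per 1024 records. */
      H5TBappend_records(file, "table", n, sizeof(Record),
                         offsets, field_sizes, buf);
      n = 0;
    }
  }
  if (n > 0) {
    /* Flush the partial final batch. */
    H5TBappend_records(file, "table", n, sizeof(Record),
                       offsets, field_sizes, buf);
  }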

For the second and third issues, take a look at the packet table API
for some reduction in the overhead of extending a table. Boeing made it
to handle the data acquisition rates they needed. It is fairly similar
to H5TB, but it lives in another library in the distribution and is not
quite a first-class citizen, although IMO it needs to get there, along
with stronger guarantees about atomicity. It builds on the private
table API backing H5TB but optimizes for the data acquisition use case
(packet I/O).
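
The fixed-length packet table route looks something like this (again a
sketch only, assuming the same Record struct and in-memory file handle
as in the earlier example):

  /* Compound type matching the Record struct. */
  hid_t rec_type = H5Tcreate(H5T_COMPOUND, sizeof(Record));
  H5Tinsert(rec_type, "x", HOFFSET(Record, x), H5T_NATIVE_DOUBLE);
  H5Tinsert(rec_type, "i", HOFFSET(Record, i), H5T_NATIVE_INT);

  /* Fixed-length packet table, chunk size 1024, no compression (-1). */
  hid_t pt = H5PTcreate_fl(file, "packets", rec_type, 1024, -1);
  for (int k = 0; k < 10000; ++k) {
    Record r = { (double)k, k };
    H5PTappend(pt, 1, &r);  /* batching many packets per call helps here too */
  }
  H5PTclose(pt);
  H5Tclose(rec_type);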


I just tried the packet table, and it was also terribly slow. As for
batching writes, my hope was to use H5TB as the in-memory
representation. If I have to create a separate data structure to hold
the records before handing them to H5TB, I will just use that data
structure instead.
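
For reference, the workaround in question would look something like the
sketch below (reusing the Record struct, offsets, and field_sizes from
my earlier test): fill a plain buffer first, then hand the whole thing
to H5TB in one call.

  /* Needs <stdlib.h> for malloc/free. */
  size_t count = 10000;
  Record *buf = malloc(count * sizeof(Record));
  for (size_t k = 0; k < count; ++k) {
    buf[k].x = (double)k;
    buf[k].i = (int)k;
  }
  /* A single append for the entire table. */
  H5TBappend_records(file, "table", count, sizeof(Record),
                     offsets, field_sizes, buf);
  free(buf);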

Thanks,
Walter Landry
