Performance between table and dataset

wenlu.yang · January 29, 2021, 8:05am

Hi all,

I have built a framework that relies heavily on the high-level library. It is really convenient and easy to use, so I could focus on the basic coding logic without caring too much about low-level details.

However, the performance of my code is rather poor, and I would like to find ways to improve it. I read some of the source code and used the Intel profiling tool to collect statistics. My impression is that the TABLE is a “heavy” data container compared with a basic “dataset”. Will the performance be better if I replace all of the tables with datasets (with field names)?

Could you please give me some guidance on this topic?

Best regards,
Wenlu

steven · January 29, 2021, 8:58am

In H5CPP, the libhdf5hl.so high-level interface is replaced by the h5::append operator, which uses an internal caching mechanism to mitigate the mismatch between tiny fragments and HDF5 chunks. The code base is pure C++17 headers; you could re-export it in C, if that is your thing.
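To illustrate the general idea of caching tiny fragments until a full chunk is available, here is a minimal sketch in plain C++17. It is not the H5CPP implementation: `ChunkBuffer`, `count_writes`, and the `sink` callback are hypothetical names made up for this example, and the sink merely stands in for wherever a real program would write one chunk to a dataset.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper, not part of H5CPP or the HDF5 API: it accumulates
// small records and hands them to `sink` one full chunk at a time.
template <typename Record>
class ChunkBuffer {
public:
    using Sink = std::function<void(const Record*, std::size_t)>;

    ChunkBuffer(std::size_t records_per_chunk, Sink sink)
        : capacity_(records_per_chunk), sink_(std::move(sink)) {
        assert(capacity_ > 0);
        buffer_.reserve(capacity_);
    }

    // Store one record; write out only when a whole chunk has accumulated.
    void append(const Record& r) {
        buffer_.push_back(r);
        if (buffer_.size() == capacity_) flush();
    }

    // Write out whatever is buffered (call once more at the end of the stream).
    void flush() {
        if (buffer_.empty()) return;
        sink_(buffer_.data(), buffer_.size());
        buffer_.clear();
    }

private:
    std::size_t capacity_;
    Sink sink_;
    std::vector<Record> buffer_;
};

// Demo: counts how many writes reach the sink for `n` appended records.
inline std::size_t count_writes(std::size_t n, std::size_t records_per_chunk) {
    std::size_t writes = 0;
    ChunkBuffer<int> buf(records_per_chunk,
                         [&writes](const int*, std::size_t) { ++writes; });
    for (std::size_t i = 0; i < n; ++i) buf.append(static_cast<int>(i));
    buf.flush();  // push the final partial chunk
    return writes;
}
```

With this sketch, appending 10 records into chunks of 4 reaches the sink three times (two full chunks and one partial), rather than ten times, which is the effect the caching is meant to achieve.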

On this presentation slide you can find some performance metrics measured on a Lenovo X250. The code was designed for high-frequency trading systems, but it can also be used for sensor networks or for recording image frames.

packet size	transferred data	events/sec	throughput (MB/s)
12KB	12GB	42’132	510.305
64B	13GB	8’432’170	539.659
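As a rough sanity check on how the columns relate, throughput should be about packet size times event rate. The sketch below assumes decimal megabytes (1 MB = 10^6 bytes), which is an assumption and not stated in the table; the function names are made up for this example. Under that assumption the 64 B row reproduces the reported figure, while the 12KB row implies an average of roughly 12.1 kB per event, so "12KB" is best read as a nominal size.

```cpp
#include <cassert>
#include <cmath>

// Throughput in decimal megabytes per second (1 MB = 1e6 bytes, assumed).
inline double throughput_mb_per_s(double bytes_per_event, double events_per_s) {
    return bytes_per_event * events_per_s / 1e6;
}

// Average bytes per event implied by a reported throughput and event rate.
inline double implied_bytes_per_event(double mb_per_s, double events_per_s) {
    return mb_per_s * 1e6 / events_per_s;
}
```

For example, `throughput_mb_per_s(64, 8432170)` evaluates to about 539.659, and `implied_bytes_per_event(510.305, 42132)` to a little over 12,112 bytes.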

wenlu.yang · February 1, 2021, 5:07am

Hi @steven,

Thanks for your tip. From your suggestion, I gather that you want me to use H5CPP instead of the raw libraries and APIs from The HDF Group, right?

However, it is not up to me whether to use this library. I am a team member and must work in the same environment as the other programmers.

Anyway, I would also like to thank you for your comment.

Have a nice day.

Best regards,
Wenlu

steven · February 1, 2021, 2:16pm

H5CPP is a thin header-only layer on top of the raw HDF5 C API. With the exception of h5::pt_t, all h5cpp handles are binary compatible with the C API, meaning any C API calls can be made without modification to the code…

The MIT license is a permissive license, meaning I have agreed that you MAY use h5cpp for any purpose. It is up to you and your teammates to decide whether it is suitable for you; there is nothing in it for me whichever way you choose.
Best
