At first it seemed like a bad idea; then again, according to this note there is no difference between the two in terms of functionality. Yet I am not quite certain that statement holds across all versions and the cross product of parallel and serial HDF5. According to this page, attribute operations must be collective calls in parallel HDF5 (pHDF5). So I wonder what others with more experience have to say on this matter.
To break down your question, I see the following points:
most recent HDF5
any performance difference between attributes and datasets
any functional difference across the cross product of: serial, parallel, attribute, dataset
historical versions of HDF5
any difference in functionality from 1.6 to 1.10; you might care about this when your software is linked against HDF5 libraries shipped by a somewhat outdated OS distribution
A possible approach is to write a quick test case for both, ballpark/measure the difference (if any), and post your results back here for review.
Interesting observation… I made some modifications to your example – replaced the Qt container with std::vector – then recompiled it on Linux using HighFive and then H5CPP.
I got similar results…high5.cpp (1.1 KB)
P.S.: I corrected arma::vec to arma::fvec so the attribute is a 32-bit float.
The first section contains the dataset- and attribute-creation experiment, run only once on a Lenovo X250 running Ubuntu 18.04; the posted time values are in microseconds.
A single run, with 100 iterations:
Please take it with a grain of salt: at this microsecond granularity, a single run of the experiment is not convincing. If you are interested in taking this further, you might want to add a batch file – not quite sure what it is called in the Windows world – to execute the tests many times and produce a histogram (or even normalise it to a probability distribution).
Best: Steve. Note: I also reduced the attribute size to 8000 elements; for some reason it failed with higher numbers on my system (possibly the 64 KiB attribute-size limit that applies unless dense attribute storage is enabled).
There is definitely a difference in functionality: there is no partial I/O for attribute values. That is, in the 1M-float case, you'll have to read/write the full four (or eight) MB even if you care only about element 15. With a dataset, you'd just read/write what you need. G.