Property Lists, Dimension Scales, and other potential limitations of the C++ API

jasteph · July 17, 2018, 7:42pm

Hello,

We have begun writing some initial, experimental code to help us learn the HDF5 API. The code into which we eventually want to incorporate HDF5 functionality is written in C++, so we are trying to use the C++ API. We quickly discovered some possible limitations or gaps in it relative to the C API, and I wanted to make sure that they really are gaps and not just our misunderstanding.

First, it appears that creating and attaching dimension scales is only possible with C functions. There is no parallel capability in the C++ API. Is that right? We were able to use the C functions and object ids to create dimension scales, but I wanted to make sure we weren’t making needless work for ourselves.

Second, I’ve been trying to find a “C++ way” to set the character encoding of object and attribute names to UTF-8. I have not met with any success here, either. The C++ API appears not to provide access to many of the property list manipulation functions, such as H5Pset_char_encoding. On top of that, the createGroup method, unlike createDataSet and createAttribute, does not accept property lists as arguments. So, even if I created a property list using the C API, I don’t know how to use it to create a Group. Is this just a limitation of the C++ API?

Is there a document or other description of what’s “missing” from the C++ API?

Thank you!

Adam

Edit: Much of the above was based on my reading of the C++ API doxygen documentation located here. While poking around in the actual headers, I noticed that H5Location::createGroup can accept a LinkCreatPropList object. The LinkCreatPropList class has a member function setCharEncoding to set the encoding of object names.

That version of createGroup does not appear to be documented, and neither does the LinkCreatPropList class. Is that just an oversight, a “get ’round to it” kind of thing? Or is that documentation considered to be the authoritative source of information about the public C++ API, so that we should avoid using undocumented parts?

Thanks again.

steven · July 17, 2018, 9:37pm

Hello Adam,

h5cpp is a novel approach to HDF5; while it is still under heavy development, it is functional. H5cpp targets data scientists and engineers, it can be seamlessly integrated with existing C API calls, and it comes with an LLVM-based source code transformation tool that compiles arbitrary POD types into HDF5 compound type descriptors without your having to get your hands dirty.

You can find the documentation on the website, and the library can be freely downloaded from my GitHub page.

h5cpp property descriptors come with peer-reviewed, sensible default settings, such as the Unicode encoding you asked about. In addition, all function calls are Pythonically simple and, for the most part, evaluated at compile time.
While the runtime error system is not in place yet (it will be ready in two weeks), the read, write, and append operations have been profiled and show good performance characteristics. The packet table interface is not only simpler than the high-level API but also 300 times faster, making it suitable for the most stringent real-time applications.
H5CPP supports seven linear algebra libraries and std::vector, and within a reasonable time frame it will support MPI.

If you are interested, please join us for the upcoming presentation of H5CPP at the Chicago C++ User Group on July 31.

Best wishes
steven

jasteph · July 17, 2018, 11:09pm

Hi Steven,

Thanks for the reply. Your project looks very interesting. Unfortunately, we’re a little behind the times. Our code base is about 25 years old, and we only recently made the changes needed to build with C++11. I shudder to think what would happen if we tried to build with C++17 or asked our customers to use compilers new enough to support that standard.

I see you support a pretty big variety of linear algebra libraries, but alas not the one that we use, which is Teuchos. If C++17 were not an obstacle, how time-consuming do you think it would be for us to add Teuchos support?

Adam

steven · July 17, 2018, 11:44pm

Hello Adam,

If you check out h5cpp and revert to its state from two weeks ago, you should find the original C++11 version. That code base requires only C++11, provides the create, read, write, and append functionality, and has the same performance and memory-footprint profile. The compiler (the source code transformation tool), used with static compilation, will still provide seamless serialization the same way the modern H5CPP does.
The new H5CPP API will be backported to C++14; would that be sufficient for your needs? The backport is rather simple, since I relied on only a few C++17 features, just to make development faster.

I will take a look at Teuchos over the weekend, but to give you a picture: adding support for a library I don’t know takes about an hour; if the author is available and points out how to access the raw memory behind the objects for reading and writing, it takes only a few minutes.
H5CPP was designed to interoperate with all modern STL-like and linear algebra libraries, as long as the package author’s creativity is …hmmm… within bounds.

hope it helps,
steve

bmribler · July 19, 2018, 3:25pm

Hello Adam,

The HDF5 C++ API has had limited support and hence does not yet provide wrappers for all of the C functions. Using C functions and object ids is the appropriate workaround for any missing function wrappers.

A wrapper for H5Pset_char_encoding is available, namely setCharEncoding. Which version of HDF5 are you using? The createGroup overload that takes a LinkCreatPropList was added in 1.8.21 and 1.10.2.

Thank you,

bmribler · July 19, 2018, 3:33pm

Also, there is a table that lists C functions and their C++ wrappers on the main page of the C++ documentation. It is a work in progress and will be updated with every release.

jasteph · July 19, 2018, 4:01pm

@bmribler, thanks for your reply. I did eventually find the LinkCreatPropList class and setCharEncoding function by reading the C++ API source. I overlooked them initially because they are not mentioned in the doxygen documentation on the HDF5 website, as far as I can tell.

Is the table you are talking about the one that lists “HDF5 C APIs” in one column and “C++ Classes” in another? I have found that helpful, but the kinds of limitations in the C++ API that we are encountering are finer grained than that. I discovered yesterday, for example, that H5Location::createDataSet can only take a DSetCreatPropList and not a LinkCreatPropList (or a dataset access property list). Our work-around was essentially what you suggested. We just wrote a wrapper function that takes C++ objects and somewhat simplifies access to H5Dcreate2.