Half-precision floating-point format H5CPP C++

Recently I was asked to add the half-precision floating-point format to H5CPP, based on H5Tcopy( float ) and readjusting the layout to match half float. Now I am wondering whether anyone has observations on the topic, for example from using a different library, etc.

best wishes:
steven
example output:

HDF5 "example.h5" {
GROUP "/" {
   DATASET "type" {
      DATATYPE  16-bit little-endian floating-point
      DATASPACE  SIMPLE { ( 20 ) / ( 20 ) }
      DATA {
      (0): 55.7812, 89.375, 90.125, -29.3906, -90.3125, -48.4688, -1.17871,
      (7): 55.1875, -21.6719, 12.7578, -63.7188, 9.22656, 50.625, 37.0938,
      (14): -81.8125, -32.7812, 89.5, -28.3906, -44.125, -41.125
      }
   }
}
}

H5CPP Git commit 829f06b96cbed40d8b57e3271b8698f72249c867 adds the IEEE 754 half-precision binary floating-point format (binary16), based on the header-only half 2.1 library by Christian Rau.


I recently implemented this as well for a project I work on. I took the definitions from h5py, and they look identical to what @steven has. Is any consideration being given to standardizing this definition as a predefined type upstream?

Looks like my post crossed with @dguest

You can find the IEEE-compatible definitions here:

Thank you for the replies! It looks like we all did the same thing, so the datasets are interchangeable… except that two of us didn’t lock the type. Is that cosmetic, or does locking types solve an actual real-life problem (“someone had a bad feeling about this… (ben)”)?

@piyush.agram @dguest Would it be possible to iron out std::complex as well, and possibly work on time support? (BTW: I am also working on a thorough treatment of sparse linear algebra support, which is a bit delayed…)
It would be great to have a consistent implementation across platforms.

Hi Steven, it’s great to have more half-float support, though it would be even better to have it as a piece of C code added to HDF5 on the same level as the existing standard types, even if just as an extension C library, so that it becomes available to all users.

I have been using OpenEXR’s half library (https://www.openexr.com/) for a CPU implementation of half precision. Do you have any experience comparing it with the half library that you use? Being a standalone header-only library looks like an advantage.

It seems the hurdle to making half-precision support part of the HDF5 library itself is that it’s not a standard data type defined in C99. I’d still put it there, though, because it is a standard data type on GPUs.

Concerning your question on time support, there’s a paper that discusses some aspects of how to specify time in HDF5 (http://sciviz.cct.lsu.edu/papers/2007/F5TimeSemantics.pdf). It’s not necessarily “the” solution, but it may serve as inspiration for addressing some issues about handling time in HDF5. Certainly an actual implementation should be compatible with the C++2a standard by now and allow distinguishing between UTC and GPS time, among other issues. The topic seems complex enough to be placed in its own add-on library, and I am not sure it can even be done reasonably well in C, as that would not benefit from the C++2a support for time.

Concerning support for std::complex and linear algebra: that raises the question of how to implement data types in a consistent scheme that also scales to higher-dimensional tensor types. In particular, complex numbers are just the special 2D case of what is known in Geometric Algebra as a “rotor”. Their 3D equivalent is known as a quaternion, and quantum mechanics calls them “spinors” in the 4D case. The general treatment for arbitrary dimensions is via Geometric Algebra, so it would be good to implement complex numbers as the special 2D case of the nD case, rather than as a special case specific to complex numbers only. Such a scheme should then also be compatible with the specification of tensor and non-tensor data types. A while ago I wrote a paper on this topic that reviews Geometric Algebra and discusses some aspects and approaches, also specifically with regard to storing those data types in HDF5: http://sciviz.cct.lsu.edu/papers/2009/GraVisMa09.pdf

It would be great to have a systematic, scalable approach here, but it’s not easy, since not even the mathematical community agrees on a consistent notation for tensors. However, HDF5 could at least provide a reference and working model.

Hello Werner, thank you for the detailed reply, and the links/pointers/material!

The added H5CPP example is here. To activate OpenEXR half-float support, define the WITH_OPENEXR_HALF macro (-DWITH_OPENEXR_HALF), then include and link against the library.

Caveats:
The OpenEXR half-float implementation is somewhat less conventional:

  • no namespace: a bare class half{}
  • an unusual include guard: _HALF_H
  • a compiled library rather than header-only

This is addressed by making inclusion explicit: support is only enabled when the user defines the WITH_OPENEXR_HALF macro, and optional namespace embedding is provided in case the user has patched and recompiled the original library.
My email to the listed authors bounced, so I placed a comment on their published mailing list. If, being short of time, I missed or overlooked anything, please feel free to contact me so I can rectify it.

I read your papers. We should open a separate discussion on both time and tensor support, taking into account that C++ differs from C in having well-defined objects/classes. Note that H5CPP’s approach is to take already implemented libraries and add persistence support to them, aiming for a unified, simple, platform-independent API based on the HDF5 philosophy, with the added expressiveness of modern C++.

A year or so ago, Gerd Heber (@gheber) and I discussed this topic; I recommended Howard Hinnant’s work and began to explore its impact on various statistical platforms.
The remnants of this are at http://clock.vargaconsulting.ca, and if someone has done something similar, possibly more elaborate, I would be pleased to read about their work.

best wishes:
steven

Hi Steven,
thanks for elaborating on the differences between OpenEXR’s half and the half library. I can actually switch to the half library; it does sound like the better choice and makes the code independent of OpenEXR, which is indeed a bit antiquated with respect to the evolving C++ standards. It just works well, and why change a running system? It’s also interesting to compare HDF5 and OpenEXR: HDF5 basically already does everything OpenEXR is still working to achieve, but OpenEXR has a few more features, such as multithreaded I/O, that HDF5 does not yet have and that the OpenEXR community insists on.

I might have come across Hinnant’s C++ extensions for date management before; it is indeed a gap in C++ to consider only time but not dates. But to support it, we should probably wait (as awful as waiting is) for it to become part of the C++ standard, as it is still evolving and will get there sooner or later. C++20 recently added support for UTC vs. GPS time, which is a non-trivial conversion and actually important for us. For example, when investigating flight trajectories, a difference of 18 seconds (UTC vs. GPS) amounts to a 500 m difference, which leads to significant errors in geospatial data analysis.

One more thought about implementing I/O support for complex numbers: when you read HDF5 files, it shouldn’t matter how they were written, whether by “some” C++ library, by FORTRAN, or by any other programming language. So I would prefer the file data layout to be as independent of existing C++ libraries as possible, ideally based on and inspired by the mathematical properties of a data type. It would be great if complex numbers were written to HDF5 in a way that fits automatically into a bigger context, even if the C++ standard implementation of complex numbers only supports the 2D case.

Complex numbers also raise the issue of multiple representations of the same quantity: a complex number can be given in Cartesian representation (x,y) as well as in polar representation (r,phi). Those are of course different numerical values, even in different units, but mathematically it is still the same number, just represented differently. That raises the question whether automatic data conversion between representations can, or should, be supported via HDF5 or some HDF5 filters. Conversion from (x,y) to (y,x) is already intrinsically possible, provided a complex data type is stored as a compound data type.

It also touches on the issue of specifying physical quantities. So far CGNS seems to have the most elaborate scheme for physical quantities in HDF5, but it looks quite engineered and not as elegantly systematic as it could be. At some point, support for physical quantities will also come to C++; time support is only a first step. For HDF5 itself, it has so far been too much of a “hot topic” to be supported as part of the library, but interest in it certainly persists.

And sure, we can open a separate discussion on the topics of time, tensors, etc.

Cheers,
Werner