HDF5 RFC: Adding support for 16-bit floating point and Complex number datatypes to HDF5


Thanks! I opened the referenced github issue, and I’m happy to see this RFC. I read the PDF and skimmed this discussion, though I haven’t watched the video.

While the RFC implements 16-bit floats and complex numbers, it doesn’t appear to implement them in combination, i.e. complex numbers with 16-bit float components. I’m not sure about C standard compliance, but GCC at least lets me write _Float16 _Complex (example). C++23 has a 16-bit float type, and it seems to work with the complex type, so I can write std::complex<std::float16_t>.

I also have a use for even weirder types, like struct { uint16_t r, i; } for storing instrument data, but I don’t expect those to get a corresponding native HDF5 type. As with quaternions, I think it makes sense to draw the line at types that have support in language standards.

Overall, I’m quite happy with the RFC since it offers a fast code path for float16-to-float32 conversions, which was the biggest pain point for me. I’m also glad to see HDF5 types for the most common complex types.
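For anyone wondering what a float16-to-float32 conversion involves, here is a minimal sketch in plain C that widens an IEEE 754 binary16 bit pattern to a float. It is not the RFC's or the library's implementation. The function name half_bits_to_float is made up for illustration, and the code assumes float is IEEE 754 binary32.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not an HDF5 function): widen an IEEE 754 binary16
 * bit pattern to a float. Assumes float is IEEE 754 binary32. */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16; /* move sign to bit 31 */
    uint32_t exp  = (h >> 10) & 0x1Fu;             /* 5-bit exponent */
    uint32_t man  = h & 0x3FFu;                    /* 10-bit mantissa */
    uint32_t bits;
    float    f;

    if (exp == 0x1Fu) {
        /* Inf or NaN: all-ones exponent, mantissa payload shifted up. */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {
        /* Signed zero. */
        bits = sign;
    } else {
        /* Subnormal half (man * 2^-24): normalize into a float32 normal. */
        uint32_t e = 113u; /* biased float32 exponent for 2^-14 */
        while ((man & 0x400u) == 0) {
            man <<= 1;
            e--;
        }
        bits = sign | (e << 23) | ((man & 0x3FFu) << 13);
    }

    memcpy(&f, &bits, sizeof f); /* bit copy, avoids aliasing problems */
    return f;
}
```

Every binary16 value is exactly representable in binary32, so widening in this direction never rounds.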

P.S. I should also mention that, for the moment, I’ve worked around the float16 bottleneck by storing data in float32 and zeroing out the least significant mantissa bits. Then I write the dataset with a gzip compression filter and get pretty compact storage that is efficient to read.
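The mantissa-zeroing step of that workaround can be sketched as below. The function name truncate_mantissa and the choice of 13 dropped bits (which keeps 10 explicit mantissa bits, the same as float16) are assumptions for illustration, not necessarily what I use in production. It also assumes float is IEEE 754 binary32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero the low `drop_bits` bits of a float32 mantissa.
 * This truncates toward zero (no rounding) and leaves the sign and the
 * exponent untouched, so the float32 range is kept.
 * Caveat: a NaN whose payload lies entirely in the dropped bits becomes Inf. */
static float truncate_mantissa(float x, unsigned drop_bits)
{
    uint32_t bits;

    assert(drop_bits <= 23); /* a float32 has 23 explicit mantissa bits */

    memcpy(&bits, &x, sizeof bits);             /* reinterpret as integer */
    bits &= ~((UINT32_C(1) << drop_bits) - 1u); /* clear the low bits */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Calling truncate_mantissa(v, 13) on every element before writing leaves long runs of zero bits in the buffer, which is what the gzip filter then compresses.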

While the RFC implements 16-bit floats and complex numbers, it doesn’t appear to implement them in combination.

It should be fairly straightforward to support this once support for 16-bit floats is done. That said, it adds to the testing matrix and makes things a bit messy: the _Float16 type isn’t part of the main C standard, so we have to use the type conditionally in the library. I’m open to the idea, though.

Overall, I’m quite happy with the RFC since it offers a fast code path for float16-to-float32 conversions, which was the biggest pain point for me.

With these conversion paths now in place, the conversion time appears to be a bit less than half of what it was. It’s still around 8x slower than the other parts of your C example, though, because conversions on compound datatypes end up being slow due to repeated ID lookups. Conversion between flat 16-bit and 32-bit floating point types is fast. I’m hoping to optimize these ID lookups as part of implementing support for complex numbers.