Thanks! I opened the referenced GitHub issue, and I'm happy to see this RFC. I read the PDF and skimmed this discussion, though I haven't watched the video.
While the RFC implements 16-bit floats and complex numbers, it doesn't appear to implement them in combination. I'm not sure about C standard compliance, but GCC at least lets me write _Float16 _Complex (example). C++23 has a 16-bit float type, and it seems to play nice with the complex type, so I can write std::complex<std::float16_t>.
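For reference, a minimal sketch of the kind of C code I mean (assuming a GCC version and target that support _Float16; this is a compiler extension, not something from the RFC):

#include <complex.h>
#include <stdio.h>

int main(void)
{
    /* _Float16 _Complex is accepted by GCC on targets with _Float16 support;
     * __real__/__imag__ are GNU extensions used here to build the value. */
    _Float16 _Complex z;
    __real__ z = (_Float16)1.5f;
    __imag__ z = (_Float16)0.25f;

    _Float16 _Complex w = z + z; /* arithmetic works on the combined type */

    printf("%g%+gi\n", (double)__real__ w, (double)__imag__ w);
    return 0;
}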
I also have use for even weirder types like struct { uint16_t r, i; } to store instrument data that I don't expect to have a corresponding native HDF5 type. Like with quaternions, I think it makes sense to draw the line at types that have support in language standards.
Overall, I'm quite happy with the RFC since it offers a fast code path for float16-to-float32 conversions, which was the biggest pain point for me. I'm also glad to see HDF5 types for the most common complex types.
P.S. I should also mention that, for the moment, I've worked around the float16 bottleneck by storing data in float32 and zeroing out the least significant mantissa bits. Then I write the dataset with a gzip compression filter and get pretty compact storage that is efficient to read.
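In case it's useful to anyone, here is a minimal sketch of that masking step (the choice of 13 bits is mine, based on binary32 having 23 explicit mantissa bits versus 10 for binary16; the right number of bits to drop depends on how much precision you actually need):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Clear the low mantissa bits of each float so the gzip (deflate) filter can
 * compress the runs of zero bits well. drop_bits = 13 roughly mimics binary16
 * precision while keeping the data as ordinary float32. */
static void truncate_mantissa(float *data, size_t n, unsigned drop_bits)
{
    const uint32_t mask = ~((UINT32_C(1) << drop_bits) - 1u);
    for (size_t i = 0; i < n; i++) {
        uint32_t bits;
        memcpy(&bits, &data[i], sizeof(bits)); /* bit-level view of the float */
        bits &= mask;
        memcpy(&data[i], &bits, sizeof(bits));
    }
}

I call this on the buffer before H5Dwrite, with the deflate filter (H5Pset_deflate) enabled on the dataset creation property list.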
While the RFC implements 16-bit floats and complex numbers, it doesn't appear to implement them in combination.
It should be fairly straightforward to support this after support for 16-bit floats is done, but it does add to the testing matrix and makes things a bit messy since the _Float16 type isn't part of the main C standard, so we have to use the type conditionally in the library. I'm open to the idea though.
Overall, I'm quite happy with the RFC since it offers a fast code path for float16-to-float32 conversions, which was the biggest pain point for me.
With these conversion paths now in place, the conversion time appears to be a bit less than half of what it was, but it's still around 8x slower than the other parts of your C example, because conversions on compound datatypes end up being slow due to repeated ID lookups. Conversion between flat 16-bit and 32-bit floating-point types is fast, though. I'm hoping to be able to optimize these ID lookups as part of implementing support for complex numbers.
Hello, I'm checking back in to ask: what is the status of this RFC? I've just come up with a use case for 16-bit floating point, so I'm now looking forward to both features.
Oh I missed the announcement. Thanks! So float16 is there but not complex? Is there a schedule for adding complex support?
Hi @tobias,
I actually plan to have a PR out for adding complex number support to the develop branch by the end of this week, or maybe a few days after if I run into issues. We're currently discussing what the release of the feature will look like, since it will need to go into a major release of HDF5 due to changes in the datatype encoding version number. While not exactly a file format change, the issue is that if the feature went into a 1.14 release, there would be an awkward situation where users could accidentally create complex number datasets that can't be read by an earlier 1.14 release. The library version bounds "high" setting also wouldn't be able to prevent the application from creating an object that's unreadable with even older versions of the library. We plan to try to make the next major release of HDF5 as easy as possible to upgrade to from the 1.14 releases, with very little in the way of major changes outside of complex numbers.
That sounds great to me. I look forward to 1.15 then!
Amazing news! Could you outline what this support will look like? It sounds like you are implementing it as a new datatype. Will there be built-in facilities for converting to/from existing conventions (like the {.r, .i} compound type used by h5py)?
Hi @peter.hill,
Indeed, after a lot of internal discussion it made sense to implement support as a new datatype class. While the suggestion above, to add support for attributes on in-memory datatypes and keep representing complex numbers as compound datatypes with attributes, makes a lot of sense, I believe that approach would have been a fair bit more work than this one and it didn't quite align with the timeline and goals for implementing complex number support. I'm hoping to work on improving the performance of compound datatype conversions in the near future to address some of the concerns around the compound datatype approach. At that point, some specific custom conversion routines should be able to help with conversions between complex number representations, until we can perhaps look into the compound datatype approach more in the future.
I'm currently working out some last issues surrounding the datatype version encoding change I mentioned previously. I've added macros mapping to predefined HDF5 datatypes for the 3 native C complex number types (float/double/long double _Complex), as well as 6 macros for complex number types based on the IEEE float formats: F16LE/BE, F32LE/BE and F64LE/BE. Support has been added to h5dump, h5ls and h5diff/ph5diff, which currently print values in "a+bi" format; this can be expanded on later, after the main code is merged. The values below are just test data, so they aren't very interesting, but here's an example:
HDF5 "tcomplex.h5" {
DATASET "/DatasetFloatComplex" {
DATATYPE H5T_CPLX_IEEE_F32LE
DATASPACE SIMPLE { ( 10, 10 ) / ( 10, 10 ) }
STORAGE_LAYOUT {
CONTIGUOUS
SIZE 800
OFFSET 2048
}
FILTERS {
NONE
}
FILLVALUE {
FILL_TIME H5D_FILL_TIME_IFSET
VALUE -1+1i
}
ALLOCATION_TIME {
H5D_ALLOC_TIME_LATE
}
DATA {
(0,0): 10+0i, 1+1i, 2+2i, 3+3i, 4+4i, 5+5i, 6+6i, 7+7i, 8+8i, 9+9i,
(1,0): 9+0i, 1.1+1.1i, 2.1+2.1i, 3.1+3.1i, 4.1+4.1i, 5.1+5.1i, 6.1+6.1i,
(1,7): 7.1+7.1i, 8.1+8.1i, 9.1+9.1i,
(2,0): 8+0i, 1.2+1.2i, 2.2+2.2i, 3.2+3.2i, 4.2+4.2i, 5.2+5.2i, 6.2+6.2i,
(2,7): 7.2+7.2i, 8.2+8.2i, 9.2+9.2i,
(3,0): 7+0i, 1.3+1.3i, 2.3+2.3i, 3.3+3.3i, 4.3+4.3i, 5.3+5.3i, 6.3+6.3i,
(3,7): 7.3+7.3i, 8.3+8.3i, 9.3+9.3i,
(4,0): 6+0i, 1.4+1.4i, 2.4+2.4i, 3.4+3.4i, 4.4+4.4i, 5.4+5.4i, 6.4+6.4i,
(4,7): 7.4+7.4i, 8.4+8.4i, 9.4+9.4i,
(5,0): 5+0i, 1.5+1.5i, 2.5+2.5i, 3.5+3.5i, 4.5+4.5i, 5.5+5.5i, 6.5+6.5i,
(5,7): 7.5+7.5i, 8.5+8.5i, 9.5+9.5i,
(6,0): 4+0i, 1.6+1.6i, 2.6+2.6i, 3.6+3.6i, 4.6+4.6i, 5.6+5.6i, 6.6+6.6i,
(6,7): 7.6+7.6i, 8.6+8.6i, 9.6+9.6i,
(7,0): 3+0i, 1.7+1.7i, 2.7+2.7i, 3.7+3.7i, 4.7+4.7i, 5.7+5.7i, 6.7+6.7i,
(7,7): 7.7+7.7i, 8.7+8.7i, 9.7+9.7i,
(8,0): 2+0i, 1.8+1.8i, 2.8+2.8i, 3.8+3.8i, 4.8+4.8i, 5.8+5.8i, 6.8+6.8i,
(8,7): 7.8+7.8i, 8.8+8.8i, 9.8+9.8i,
(9,0): 1+0i, 1.9+1.9i, 2.9+2.9i, 3.9+3.9i, 4.9+4.9i, 5.9+5.9i, 6.9+6.9i,
(9,7): 7.9+7.9i, 8.9+8.9i, 9.9+9.9i
}
ATTRIBUTE "AttributeFloatComplex" {
DATATYPE H5T_CPLX_IEEE_F32LE
DATASPACE SIMPLE { ( 1, 1 ) / ( 1, 1 ) }
DATA {
(0,0): -1+1i
}
}
}
}
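To give a rough idea of what using the new types looks like from C, here is a small sketch of creating such a dataset. The memory-type macro name used here (H5T_NATIVE_FLOAT_COMPLEX) is assumed for illustration and, like everything else, may still change before release:

#include <complex.h>
#include "hdf5.h"

/* Sketch only: writes a 10x10 float _Complex buffer to a dataset whose file
 * datatype is the little-endian IEEE F32 complex type shown in the dump. */
int write_complex_example(void)
{
    float _Complex buf[10][10];
    for (int i = 0; i < 10; i++)
        for (int j = 0; j < 10; j++)
            buf[i][j] = (float)j + (float)j * I; /* placeholder values */

    hsize_t dims[2] = {10, 10};
    hid_t   file    = H5Fcreate("tcomplex.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t   space   = H5Screate_simple(2, dims, NULL);
    hid_t   dset    = H5Dcreate2(file, "DatasetFloatComplex", H5T_CPLX_IEEE_F32LE,
                                 space, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* H5T_NATIVE_FLOAT_COMPLEX is the assumed native counterpart of the
     * float _Complex buffer in memory. */
    herr_t status = H5Dwrite(dset, H5T_NATIVE_FLOAT_COMPLEX, H5S_ALL, H5S_ALL,
                             H5P_DEFAULT, buf);

    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);
    return (int)status;
}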
Datatype conversions between all the usual C types (int, long, float, long double, etc.) have been added, including _Float16 when support for it is available in the library (though note that those conversions may be a bit slower, since there's currently no standard C float16 complex number type). For the conversions between existing conventions, I've implemented no-op conversions as long as the data follows these rules (which can also be expanded upon as needed):
- An array datatype must consist of exactly two elements, where each element is of the same floating-point datatype as the complex number datatype's base floating-point datatype.
- A compound datatype must consist of two fields, where each field is of the same floating-point datatype as the complex number datatype's base floating-point datatype. The compound datatype must not have any leading or trailing structure padding or any padding between its two fields. The fields must also have compatible names, must have compatible offsets within the datatype and must be in the order "real" part → "imaginary" part, such that the compound datatype matches the following representation:

  H5T_COMPOUND {
      <float_type> "r(e)(a)(l)";                OFFSET 0
      <float_type> "i(m)(a)(g)(i)(n)(a)(r)(y)"; OFFSET SIZEOF("r(e)(a)(l)")
  }

  where "r(e)(a)(l)" means the field may be named any substring of "real", such as "r" or "re", and "i(m)(a)(g)(i)(n)(a)(r)(y)" means the field may be named any substring of "imaginary", such as "im" or "imag". A compound type built to satisfy these rules is sketched just after this list.
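As an illustration, here is a sketch of a memory compound type that should satisfy the rules above and therefore take the no-op path when paired with a double-precision complex number datatype (the struct and field names are just an example of the h5py-style convention, not something the library defines):

#include <stddef.h>
#include "hdf5.h"

/* Two doubles, real part first, no padding anywhere in the struct. */
typedef struct {
    double r;   /* any substring of "real" works as the field name      */
    double i;   /* any substring of "imaginary" works as the field name */
} complex_pair_t;

static hid_t make_noop_compatible_compound(void)
{
    hid_t dtype = H5Tcreate(H5T_COMPOUND, sizeof(complex_pair_t));

    /* Field order and offsets must match the rules: "r" at offset 0 and
     * "i" immediately after it, with no padding between or around them. */
    H5Tinsert(dtype, "r", HOFFSET(complex_pair_t, r), H5T_NATIVE_DOUBLE);
    H5Tinsert(dtype, "i", HOFFSET(complex_pair_t, i), H5T_NATIVE_DOUBLE);

    return dtype;
}

Reading a dataset stored with this convention into a double _Complex buffer (or the reverse) should then go through the no-op conversion path, since the two memory layouts are identical.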
I've confirmed the conversions work as expected with test data, but I'm also looking for any real h5py-written data files, just to make sure I'm not overlooking anything. Let me know if you can point me to some!
While I don't expect much to change, note that all of this is subject to change during review, and also if there's something about the complex type support that doesn't work well for an application or is awkward to use.
Really incredible stuff, thank you so much!
Here's a real-world file generated using h5py:
scotty_output.nc (436.7 KB)
The variable /analysis/H_1_Cardano is complex.
Thanks for the file! I verified that the data can be read directly into a double _Complex buffer (on my machine) through the no-op conversion path, and the output matches the data in the file. Here's the simple C program I used, as an example of what this looks like after the changes are merged:
read_scotty_output.c (790 Bytes)
which gives the output:
DATA: [
(1+0i),
(0.953434+4.33681e-19i),
(0.909362+8.67362e-19i),
(0.867738+0i),
(0.828514-1.73472e-18i),
(0.791641+0i),
(0.757062-6.93889e-18i),
(0.724719+3.46945e-18i),
(0.694546+0i),
(0.666468+3.46945e-18i),
(0.640409-3.46945e-18i),
(0.616281+6.93889e-18i),
(0.593994+0i),
(0.573452-6.93889e-18i),
(0.554554+6.93889e-18i),
(0.537199+6.93889e-18i),
(0.521284+0i),
(0.506711+1.38778e-17i),
(0.493378+1.38778e-17i),
(0.481193+0i),
(0.470062+2.77556e-17i),
(0.459902+0i),
(0.450631+0i),
(0.442174+1.38778e-17i),
(0.434462+1.38778e-17i),
(0.427432+0i),
(0.421025+4.16334e-17i),
(0.415189+1.38778e-17i),
(0.409875-1.38778e-17i),
(0.405041+1.38778e-17i),
(0.400647+0i),
(0.396658+2.77556e-17i),
(0.393042-4.16334e-17i),
(0.38977+0i),
(0.386818-2.77556e-17i),
(0.384161-2.77556e-17i),
(0.381781+0i),
(0.379658-2.77556e-17i),
(0.377775+2.77556e-17i),
(0.37612+5.55112e-17i),
(0.374679+0i),
(0.373441+0i),
(0.372396-2.77556e-17i),
(0.371536+0i),
(0.370853+0i),
(0.37034+0i),
(0.369992+0i),
(0.369806-2.77556e-17i),
(0.369776+0i),
(0.369901+5.55112e-17i),
(0.370023+2.77556e-17i),
(0.370178+0i),
(0.370607+2.77556e-17i),
(0.371187-2.77556e-17i),
(0.371918+0i),
(0.372802-2.77556e-17i),
(0.373841+0i),
(0.375038+0i),
(0.376397-2.77556e-17i),
(0.377922-2.77556e-17i),
(0.379619-5.55112e-17i),
(0.381496+0i),
(0.38356-2.77556e-17i),
(0.38582+0i),
(0.388287+0i),
(0.390973-2.77556e-17i),
(0.393891+0i),
(0.397058+0i),
(0.400492+0i),
(0.404213+0i),
(0.408243+0i),
(0.412609+0i),
(0.417338+0i),
(0.422463+4.16334e-17i),
(0.428021+0i),
(0.434052-4.16334e-17i),
(0.440602+1.38778e-17i),
(0.447722+0i),
(0.45547+1.38778e-17i),
(0.46391+1.38778e-17i),
(0.473114+2.77556e-17i),
(0.483162+1.38778e-17i),
(0.494143+0i),
(0.506153+4.16334e-17i),
(0.519302-1.38778e-17i),
(0.533707+0i),
(0.549494-2.77556e-17i),
(0.566803+1.38778e-17i),
(0.585777+0i),
(0.606571-1.38778e-17i),
(0.629344-2.08167e-17i),
(0.654258-1.38778e-17i),
(0.681478-6.93889e-18i),
(0.711165+0i),
(0.743478+0i),
(0.778569+3.46945e-18i),
(0.81658+1.73472e-18i),
(0.857645-1.73472e-18i),
(0.901887+0i),
(0.949421+0i)
]
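For anyone wanting to try something similar once the changes are merged, a sketch of how such a read might look (not the attached program itself; the native memory-type macro name H5T_NATIVE_DOUBLE_COMPLEX is assumed and may differ in the final release):

#include <complex.h>
#include <stdio.h>
#include <stdlib.h>
#include "hdf5.h"

int main(void)
{
    hid_t file  = H5Fopen("scotty_output.nc", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset  = H5Dopen2(file, "/analysis/H_1_Cardano", H5P_DEFAULT);
    hid_t space = H5Sget_space(dset);

    /* Read the whole dataset into a native double _Complex buffer; with the
     * h5py {r, i} compound file type this should hit the no-op conversion. */
    hssize_t         npoints = H5Sget_simple_extent_npoints(space);
    double _Complex *buf     = malloc((size_t)npoints * sizeof(*buf));
    H5Dread(dset, H5T_NATIVE_DOUBLE_COMPLEX, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

    printf("DATA: [\n");
    for (hssize_t i = 0; i < npoints; i++)
        printf("(%g%+gi),\n", creal(buf[i]), cimag(buf[i]));
    printf("]\n");

    free(buf);
    H5Sclose(space);
    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}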