No, it’s not too long, and it’s shorter than my comments to come.
Firstly, that single vs multi option is not overkill, but actually quite essential to support different memory layouts of the same semantics. I have the same capability in my F5 library, where it’s called a “separated compound” versus a “contiguous” data field: https://www.fiberbundle.net/doc/group__F5F.html#gaecac106db677394fe48c187d6007fa9a
One of the use cases for such a separated compound / h5::multi layout is also partial data, like when only the real or only the imaginary part is stored in a file, while the other part is constant for that particular dataset. Doing that with a contiguous / h5::single layout is not so easy, and it becomes even more relevant for higher-dimensional datatypes, of course.
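For concreteness, here is a minimal sketch of the two layouts in plain HDF5 C API calls; the struct, dataset and member names are purely illustrative and not the actual F5 or h5:: spellings:

```cpp
#include <hdf5.h>
#include <vector>

struct Complex64 { double re, im; };   // assumed in-memory layout

void write_both_layouts(hid_t file, const Complex64* buf, hsize_t n)
{
    hid_t space = H5Screate_simple(1, &n, nullptr);

    // Contiguous / h5::single layout: one dataset of a two-member compound.
    hid_t ctype = H5Tcreate(H5T_COMPOUND, sizeof(Complex64));
    H5Tinsert(ctype, "real",      HOFFSET(Complex64, re), H5T_NATIVE_DOUBLE);
    H5Tinsert(ctype, "imaginary", HOFFSET(Complex64, im), H5T_NATIVE_DOUBLE);
    hid_t dz = H5Dcreate2(file, "z", ctype, space,
                          H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(dz, ctype, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
    H5Dclose(dz);
    H5Tclose(ctype);

    // Separated compound / h5::multi layout: one plain dataset per component.
    // Writing only "z_x" and skipping "z_y" is exactly the partial-data case
    // where the other component is constant for this dataset.
    std::vector<double> xs(static_cast<size_t>(n));
    for (hsize_t i = 0; i < n; ++i) xs[i] = buf[i].re;
    hid_t dx = H5Dcreate2(file, "z_x", H5T_NATIVE_DOUBLE, space,
                          H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(dx, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, xs.data());
    H5Dclose(dx);

    H5Sclose(space);
}
```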
Btw., talking about higher-dimensional data types: naming the components of a complex number “real” and “imaginary” is a poor choice, because that notation does not scale to higher dimensions. In 3D there are three imaginary numbers, so an “imaginary” component wouldn’t be unique anymore. Within the context of quaternion algebra, those three imaginary numbers are called i, j, k. But that’s not a scalable notation either: in 4D there are six imaginary numbers. And actually those imaginary numbers just correspond to the number of possible planes that can be constructed in an n-dimensional space: in 2D there is only one possible plane, the XY plane. In 3D there are three possibilities, the XY, YZ and ZX planes. In 4D, using T as a 4th coordinate, there are XY, YZ, ZX, XT, YT and ZT, i.e. six planes, which have the algebraic properties of the imaginary unit within the framework of Geometric Algebra. The 4D case corresponds to bi-quaternions or complex quaternions. The scheme easily extends to arbitrary higher dimensions.

So, if you name the components of a complex number not {“real”,“imaginary”} but {“x”,“y”} instead, or {“x1”,“x2”}, or {“e1”,“e2”} or a similar scheme, then they are automatically part of a bigger scheme: any quaternion or bi-quaternion (or GA) library could just read complex numbers as a quaternion (or higher-dimensional spinor), with the higher-dimensional components simply being zero for this dataset, but interpretable without any special consideration, just as part of the scheme. The other way round, a quaternion algorithm can write its data, and a library that only knows complex numbers in 2D can read the 2D subset right away.
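On the HDF5 level that is just a matter of which member names go into the compound type; a tiny sketch (struct and member names purely illustrative):

```cpp
#include <hdf5.h>

// A complex value with dimension-agnostic component names: a quaternion or
// GA library would continue the same scheme with more members, instead of
// needing a special case for "real"/"imaginary".
struct C2 { double x, y; };

hid_t make_complex_type()
{
    hid_t t = H5Tcreate(H5T_COMPOUND, sizeof(C2));
    H5Tinsert(t, "x", HOFFSET(C2, x), H5T_NATIVE_DOUBLE);
    H5Tinsert(t, "y", HOFFSET(C2, y), H5T_NATIVE_DOUBLE);
    return t;
}
```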
A unified object model is certainly a very desirable goal to achieve - so unifying data even on a mathematical level first makes total sense here. That is just the objective of Geometric Algebra, which I’d like to advocate at this opportunity. For instance, see http://geocalc.clas.asu.edu/html/Overview.html ; especially Hestenes’ Oersted Medal lecture - as referenced from that page - is a good, inspiring first read. Using a scalable notation that places mathematical objects into a broader context has no performance impact - HDF5 doesn’t care how objects are named - but a scalable scheme opens the door wider to more applications on the semantic level.
However, there is an aspect that does impact performance considerations: reading data with mixed precisions. If you just use complex<double>, then everything is easy, as it’s just one data type. But the complexity comes in if some dataset is stored as complex<float> - which saves a factor of 2 in storage space, after all. Now if you want an application that is able to read both kinds of complex datasets, then a binary operation - e.g. just “+” - needs to support complex<float> + complex<float>, complex<float> + complex<double>, complex<double> + complex<float> and complex<double> + complex<double>, leading to quite the type explosion, lengthy compile times and larger binaries if all binary operators are to support all possible combinations at runtime, e.g. implemented by iterating over type lists. Even more fun when also supporting long double and all the diverse integer numerical types here… see also section 3.2.3 in https://www.researchgate.net/publication/315881612_Massive_Geometric_Algebra_Visions_for_C_Implementations_of_Geometric_Algebra_to_Scale_into_the_Big_Data_Era on this matter. Even more, HDF5 could store the real part of a complex number in double precision and the imaginary part in single precision. That makes total sense to save storage space where possible, even though it’s not supported by C++ complex<> numbers. So in the end, writing data is rather easy, but reading data is the hard part.
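Just to illustrate the combinatorics, here is a toy sketch (not anyone’s actual library code): even when a single generic function handles the mixed-precision arithmetic via std::common_type, every (T, U) pair that can come out of a file is still a separate instantiation that has to be reachable at runtime:

```cpp
#include <complex>
#include <type_traits>

// One generic "add" covers the float/double grid, but each (T, U) pair is a
// distinct instantiation; with N scalar types there are N*N of them per
// binary operator, and the grid grows quickly once long double and the
// integer types join in.
template <class T, class U>
auto add(const std::complex<T>& a, const std::complex<U>& b)
{
    using R = std::common_type_t<T, U>;
    return std::complex<R>(static_cast<R>(a.real()) + static_cast<R>(b.real()),
                           static_cast<R>(a.imag()) + static_cast<R>(b.imag()));
}
```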
So to make life simpler, it’s just easier to read all floating-point data into double precision, i.e. reading complex<float> as complex<double>, so the application only needs to deal with one single type instead of suffering the type explosion of handling all possible combinations. HDF5 can already do that.
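A minimal sketch of that, assuming a 1-D float dataset named “psi” in the file: requesting H5T_NATIVE_DOUBLE as the memory type makes HDF5 do the float-to-double conversion during the read, so the application only ever sees doubles:

```cpp
#include <hdf5.h>
#include <vector>

std::vector<double> read_as_double(hid_t file)
{
    hid_t dset  = H5Dopen2(file, "psi", H5P_DEFAULT);
    hid_t space = H5Dget_space(dset);
    hssize_t n  = H5Sget_simple_extent_npoints(space);

    std::vector<double> data(static_cast<size_t>(n));
    // File data stored as float is widened to double by HDF5 here.
    H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, data.data());

    H5Sclose(space);
    H5Dclose(dset);
    return data;
}
```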
But actually I ran into trouble here: reading a struct { float x,y; } compound type from file into a struct { double x,y; } turned out to be extremely slow, about 100x slower than reading floats as floats or doubles as doubles. It is fast for non-compound datasets (the separated compound / h5::multi data layout), but converting members of compound datatypes was so slow that I ended up implementing the conversion myself rather than letting HDF5 do it. Have you encountered similar issues? Maybe it’s fixed by now; that was my observation with HDF5 1.8.17. Just saying, there may be performance issues in areas where you don’t expect them…
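A sketch of such a do-it-yourself conversion (struct layouts and member names assumed): the widening happens in a plain loop in application code, while HDF5 reads the compound with a memory type that matches the file layout 1:1 and thus converts nothing:

```cpp
#include <hdf5.h>
#include <vector>

struct C32 { float  x, y; };   // layout as stored in the file
struct C64 { double x, y; };   // layout the application wants

std::vector<C64> read_complex_widened(hid_t dset, hsize_t n)
{
    // Memory type identical to the file layout, so no member conversion.
    hid_t mtype = H5Tcreate(H5T_COMPOUND, sizeof(C32));
    H5Tinsert(mtype, "x", HOFFSET(C32, x), H5T_NATIVE_FLOAT);
    H5Tinsert(mtype, "y", HOFFSET(C32, y), H5T_NATIVE_FLOAT);

    std::vector<C32> raw(static_cast<size_t>(n));
    H5Dread(dset, mtype, H5S_ALL, H5S_ALL, H5P_DEFAULT, raw.data());
    H5Tclose(mtype);

    // Widen float -> double by hand instead of via HDF5's compound conversion.
    std::vector<C64> out(static_cast<size_t>(n));
    for (hsize_t i = 0; i < n; ++i)
        out[i] = { static_cast<double>(raw[i].x), static_cast<double>(raw[i].y) };
    return out;
}
```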
Werner