A problem when saving NATIVE_LDOUBLE variables


#1

Hi everyone,
I’m new to HDF5. I have been using it to save the results of some simulations. Everything worked fine until I wanted to save “long double” variables. Saving “double” causes no problem, but with “long double” the numbers stored in the hdf5 file fluctuate between very large and very small values. They are certainly not the correct results of the simulation.

My question: how do I properly save “long double” variables, given that the results are wrong when mapped to NATIVE_LDOUBLE?

Examples are welcome since I’m new to hdf5.

Here is the code:

void createHDF5_2DProjectionFile(char* file_name,
CarGrid1D3V& ph_grid,
std::string first_dim,
std::string second_dim,
long double *x1, int size_x1,
long double *x2, int size_x2)
{
try
{
        /*
            define the size of the datasets containing the coordinates x1
            and x2
        */

        PredType h5Int = PredType::NATIVE_INT;
        PredType h5DoubleL = PredType::NATIVE_LDOUBLE;
        PredType h5Double = PredType::NATIVE_DOUBLE;        

        /* Define the parameters of grid space 
            DS --> Data Space
        */
        hsize_t x1_dims[1], x2_dims[1];
        x1_dims[0] = size_x1;
        x2_dims[0] = size_x2;
        
        
        H5File *file_id = new H5File(file_name, H5F_ACC_TRUNC);
        
        /* Saving string attribute
            Create dataspace with H5S_SCALAR
            Create string datatype of specific length of characters
            Create attribute and write to it
        */

        DataSpace attr_stringDS = DataSpace(H5S_SCALAR);
        StrType strdatatype(PredType::C_S1, 64);

        Attribute original_DistFun = file_id->createAttribute("/OriginalDistFun", 
                                                    strdatatype, attr_stringDS);
        original_DistFun.write(strdatatype, "1D3V");

        Attribute projection = file_id->createAttribute("/Projection", 
                                                    strdatatype, attr_stringDS);
        projection.write(strdatatype, first_dim + " - " + second_dim);


        /* Create the data spaces for grid points along each direction */
        DataSpace* first_dimDS_id = new DataSpace(1, x1_dims, NULL);
        DataSpace* second_dimDS_id = new DataSpace(1, x2_dims, NULL);
        
        /* Create and fill the datasets for grid points along each direction */
        DataSet *data_dim1 = new DataSet(file_id->createDataSet(first_dim, 
                                            h5DoubleL, *first_dimDS_id));

        data_dim1->write(x1, h5DoubleL);

        DataSet *data_dim2 = new DataSet(file_id->createDataSet(second_dim, 
                                            h5DoubleL, *second_dimDS_id));
        data_dim2->write(x2, h5DoubleL);
        
        /* Important attributes added to the file */
        long double x_minmax[2], px_minmax[2], 
                    py_minmax[2], pz_minmax[2], mom_steps[3],
                    ph_vols[3], spatial_steps[1];

        x_minmax[0] = ph_grid.x_min_;
        x_minmax[1] = ph_grid.x_max_;
        px_minmax[0] = ph_grid.px_min_;
        px_minmax[1] = ph_grid.px_max_;
        py_minmax[0] = ph_grid.py_min_;
        py_minmax[1] = ph_grid.py_max_;
        pz_minmax[0] = ph_grid.pz_min_;
        pz_minmax[1] = ph_grid.pz_max_;
        mom_steps[0] = ph_grid.dpx_;
        mom_steps[1] = ph_grid.dpy_;
        mom_steps[2] = ph_grid.dpz_;
        ph_vols[0] = ph_grid.dvs_;
        ph_vols[1] = ph_grid.dvp_;
        ph_vols[2] = ph_grid.dv_; 
        spatial_steps[0] = ph_grid.dx_;
        ph_grid.print_characteristics();
        std::cout << x_minmax[0] << " , " << x_minmax[1] << "\n";
        /* define attributes configuration */
        hsize_t space_1[1];
        space_1[0] = 1;

        hsize_t space_2[1];
        space_2[0] = 2; 
        
        hsize_t space_3[1];
        space_3[0] = 3; 
        
        DataSpace attr_space_1 = DataSpace(1, space_1);
        DataSpace attr_space_2 = DataSpace(1, space_2);
        DataSpace attr_space_3 = DataSpace(1, space_3);

        Attribute x_interval = file_id->createAttribute("[x_min,x_max]",
                                                    h5DoubleL, attr_space_2);
        x_interval.write(h5DoubleL, x_minmax);

        Attribute px_interval = file_id->createAttribute("[px_min,px_max]",
                                                    h5DoubleL, attr_space_2);
        px_interval.write(h5DoubleL, px_minmax);

        Attribute py_interval = file_id->createAttribute("[py_min,py_max]",
                                                    h5DoubleL, attr_space_2);
        py_interval.write(h5DoubleL, py_minmax);

        Attribute pz_interval = file_id->createAttribute("[pz_min,pz_max]",
                                                    h5DoubleL, attr_space_2);
        pz_interval.write(h5DoubleL, pz_minmax);

        Attribute MomVolumes = file_id->createAttribute("[dpx,dpy,dpz]", 
                                                        h5DoubleL, attr_space_3);
        MomVolumes.write(h5DoubleL, mom_steps);

        Attribute PhVolumes = file_id->createAttribute("[dv_s, dv_m, dv_t]",
                                                        h5DoubleL, attr_space_3);
        PhVolumes.write(h5DoubleL, ph_vols);

        Attribute SpatialVolumes = file_id->createAttribute("[dx]", h5DoubleL,
                                                            attr_space_1);
        SpatialVolumes.write(h5DoubleL, spatial_steps);

        /* Free memory */
        delete data_dim1;
        delete data_dim2;
        delete first_dimDS_id;
        delete second_dimDS_id;
        delete file_id;
    }
    catch (const DataSetIException& error)
    {
        error.printErrorStack();
    }
    catch (const DataSpaceIException& error)
    {
        error.printErrorStack();
    }
    catch (const FileIException& error)
    {
        error.printErrorStack();
    }
}

#2

Can you tell us something about your platform? Processor architecture? OS? Compiler? HDF5 version?

Did you build the HDF5 library from source (and run the tests)?

Thanks, G.


#3

Thank you for your reply.
Processor: intel xeon 2640 V4 10 cores 25MB
OS: Ubuntu 18.04
Compiler: I have tried two compilers g++ 7.5.0 and intel icpc (ICC) 2021.5.0
HDF5 version: 1.10.6

I compiled the code with a makefile, using both g++ and c++, and also with h5c++. For double precision, everything is OK.

thank you again


#4

Having recently done some work with long doubles, note that “NATIVE” types are memory datatypes, not file datatypes. There is no definition of a long double file datatype in hdf5. That doesn’t mean you can’t save it, just that the conversion might cause some issues.
Check the H5Tget_fields function for the information about floats.


#5

H5CPP supports long double datatypes and will read back the correct values; however, during development I noticed that h5dump didn’t follow suit. On my github page you can find some variations on the problem: long double in a rank-1 dataset, and long double as a field of a struct.

correct read back from H5CPP

./longdouble
homogenios data: -0.42 -0.42 -0.42 -0.42 -0.42 -0.42 -0.42 -0.42 -0.42 -0.42 -0.42 -0.42 -0.42 -0.42 -0.42 
struct data (only temp) :  0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

h5dump -d /homogenious test.h5

HDF5 "test.h5" {
DATASET "/homogenious" {
   DATATYPE  H5T_NATIVE_LDOUBLE
   DATASPACE  SIMPLE { ( 5, 3 ) / ( 5, 3 ) }
   DATA {
   (0,0): 4.94066e-324, 4.94066e-324, 4.94066e-324,
   (1,0): 4.94066e-324, 4.94066e-324, 4.94066e-324,
   (2,0): 4.94066e-324, 4.94066e-324, 4.94066e-324,
   (3,0): 4.94066e-324, 4.94066e-324, 4.94066e-324,
   (4,0): 4.94066e-324, 4.94066e-324, 4.94066e-324
   }
}

And a brief implementation:

#include <iostream>
#include <vector>
#include "struct.h"
#include <h5cpp/core>
	// generated file must be sandwiched between core and io 
	// to satisfy template dependencies in <h5cpp/io>  
	#include "generated.h"
#include <h5cpp/io>


int main(){
	h5::fd_t fd = h5::create("test.h5", H5F_ACC_TRUNC);
	{ // this is to create the dataset
		h5::create<long double>(fd,"homogenious", h5::current_dims{5,3}, h5::chunk{1,3} | h5::fill_value<long double>(-.42));
		//reading data back, then iterating through the temporary vector of LONG DOUBLES
		for (auto v : h5::read<std::vector<long double>>(fd, "homogenious"))
			std::cerr << v << " ";
		std::cerr << std::endl;
	}

	{ // an example for streams/packet table
		h5::pt_t pt = h5::create<sn::record_t>( fd, 
        	"stream-of-records", h5::max_dims{H5S_UNLIMITED}, h5::chunk{512} | h5::gzip{9} );
		sn::record_t record;	
		for(int i=0; i<10; i++ ) // this is your HPC loop
			record.temp = static_cast<long double>(i/100.0), h5::append(pt, record);
	}
	{ // read entire dataset back from stream
		using T = std::vector<sn::record_t>;
		// for partial read be certain dataset is chunked, see documentation @ sandbox.h5cpp.org
		auto dataset = h5::read<T>(fd,"stream-of-records");

		for( auto rec:dataset ) // this is your HPC loop
			std::cerr << rec.temp <<" ";
		std::cerr << std::endl;
	}
}

and the matching generated.h type descriptor

/* Copyright (c) 2022 vargaconsulting, Toronto,ON Canada */
#ifndef H5CPP_GUARD_NKohX
#define H5CPP_GUARD_NKohX

namespace h5{
    //template specialization of sn::record_t to create HDF5 COMPOUND type
    template<> hid_t inline register_struct<sn::record_t>(){
        hsize_t at_00_[] ={3};            hid_t at_00 = H5Tarray_create(H5T_NATIVE_LDOUBLE,1,at_00_);
        hsize_t at_01_[] ={20};            hid_t at_01 = H5Tarray_create(H5T_NATIVE_LDOUBLE,1,at_01_);
        hsize_t at_02_[] ={9};            hid_t at_02 = H5Tarray_create(H5T_NATIVE_LDOUBLE,1,at_02_);

        hid_t ct_00 = H5Tcreate(H5T_COMPOUND, sizeof (sn::record_t));
        H5Tinsert(ct_00, "temp",	HOFFSET(sn::record_t,temp),H5T_NATIVE_LDOUBLE);
        H5Tinsert(ct_00, "density",	HOFFSET(sn::record_t,density),H5T_NATIVE_LDOUBLE);
        H5Tinsert(ct_00, "B",	HOFFSET(sn::record_t,B),at_00);
        H5Tinsert(ct_00, "V",	HOFFSET(sn::record_t,V),at_00);
        H5Tinsert(ct_00, "dm",	HOFFSET(sn::record_t,dm),at_01);
        H5Tinsert(ct_00, "jkq",	HOFFSET(sn::record_t,jkq),at_02);

        //closing all hid_t allocations to prevent resource leakage
        H5Tclose(at_00); H5Tclose(at_01); H5Tclose(at_02); 

        //if not used with h5cpp framework, but as a standalone code generator then
        //the returned 'hid_t ct_00' must be closed: H5Tclose(ct_00);
        return ct_00;
    };
}
H5CPP_REGISTER_STRUCT(sn::record_t);

#endif

Learn more about H5CPP by reading documentation and presentation slides
best wishes: steve


#6

Since you are at the mercy of the compiler/processor, the safest way, I believe, would be to create a user-defined floating-point datatype as your in-file type plus the necessary conversion functions. See section 6.4.3.1. User-defined Atomic Datatypes here.

G.


#7

Long double with opaque and custom datatype
Because I couldn’t resist @gheber 's suggestion of using a custom datatype, I quickly wrote an example using the H5T_OPAQUE class, where we encapsulate the data with the given length. Then I doubled down and provided a custom long double implementation of the NATIVE type, which you NEED to adjust for different architectures, as the 80-bit float is not a standard.


Instead of copy pasting the full content I attach the files, and as usual you can find the solution on this github page.

Here is the h5dump of the result; notice the incorrect printouts for the custom datatype. In my understanding this is not an implementation problem in H5CPP but an undocumented feature of h5dump. (Then again, I could be terribly wrong, and if someone unravelled this it might lead to something significant…)

HDF5 "example.h5" {
GROUP "/" {
   DATASET "custom" {
      DATATYPE  H5T_NATIVE_LDOUBLE
      DATASPACE  SIMPLE { ( 20 ) / ( 20 ) }
      DATA {
      (0) : 4.94066e-324, 4.94066e-324, 4.94066e-324, 4.94066e-324,
      (4) : 4.94066e-324, 4.94066e-324, 4.94066e-324, 4.94066e-324,
      (8) : 4.94066e-324, 4.94066e-324, 4.94066e-324, 4.94066e-324,
      (12): 4.94066e-324, 4.94066e-324, 4.94066e-324, 4.94066e-324,
      (16): 4.94066e-324, 4.94066e-324, 4.94066e-324, 4.94066e-324
      }
   }
   DATASET "opaque" {
      DATATYPE  H5T_OPAQUE {
         OPAQUE_TAG "";
      }
      DATASPACE  SIMPLE { ( 20 ) / ( 20 ) }
      DATA {
      (0) : 59:16:f5:f3:bb:e2:28:b8:01:40:00:00:00:00:00:00,
      (1) : 21:93:2c:c5:cc:f5:5b:90:00:40:00:00:00:00:00:00,
      (2) : 73:bb:b4:43:02:7a:0a:83:02:40:00:00:00:00:00:00,
      (3) : 44:d4:b4:6a:e7:6b:63:cf:fd:3f:00:00:00:00:00:00,
         ...
      (17): e0:b0:69:ef:df:4b:c9:c6:01:40:00:00:00:00:00:00,
      (18): b5:40:ab:14:42:57:65:f6:01:40:00:00:00:00:00:00,
      (19): a8:c8:78:67:ec:d4:b3:ca:fc:3f:00:00:00:00:00:00
      }
   }
}
}

The correct dump from H5CPP for both cases:

g++ -I/usr/local/include -I/usr/include  -I./ -o custom.o  -std=c++17 -c custom.cpp
g++ custom.o -lhdf5  -lz -ldl -lm  -o custom	
g++ -I/usr/local/include -I/usr/include  -I./ -o opaque.o  -std=c++17 -c opaque.cpp
g++ opaque.o -lhdf5  -lz -ldl -lm  -o opaque	
./custom
data type: custom::ldouble_t value valid
	ebias: 16383 norm: 2 offset: 0 precision: 80 size: 16
	spos:79 epos:64 esize:15 mpos:0 msize:64

4.31447 8.66964 4.83255 4.69823 4.82813 2.50527 7.00822 9.40886 2.26285 3.16655 2.23003 6.73968 2.16731 9.04254 2.01402 8.32589 9.46123 5.68618 0.147904 9.74185 
4.31447 8.66964 4.83255 4.69823 4.82813 2.50527 7.00822 9.40886 2.26285 3.16655 2.23003 6.73968 2.16731 9.04254 2.01402 8.32589 9.46123 5.68618 0.147904 9.74185 


computing difference ||saved - read|| expecting norm to be zero:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

size of long double: 16 sizeof ldouble_t: 16
./opaque
/usr/local/include/h5cpp/H5Fcreate.hpp line#  55 : couldn't create file...
data type: opaque::ldouble_t value valid
3.30507 7.25485 4.81472 7.96879 2.31025 0.025276 0.444642 7.8098 0.00546243 2.27146 1.85951 5.70785 1.2793 2.46513 2.671 0.188472 2.51656 3.62382 8.91158 5.47687 
3.30507 7.25485 4.81472 7.96879 2.31025 0.025276 0.444642 7.8098 0.00546243 2.27146 1.85951 5.70785 1.2793 2.46513 2.671 0.188472 2.51656 3.62382 8.91158 5.47687 


computing difference ||saved - read|| expecting norm to be zero:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

size of long double: 16 sizeof ldouble_t: 1
custom.hpp (1.8 KB)
custom.cpp (1.8 KB)
Makefile (718 Bytes)
opaque.hpp (1.5 KB)
opaque.cpp (1.8 KB)
custom (279.1 KB)

#8

Thanks.
I will look carefully at your solution and give you feedback.


#9

Thanks a lot,
I will read it and come back to this topic with the feedback.


#10

Which version of hdf5/h5dump?


#11

@byrn Hi Allen, I noticed this with HDF5 v1.10.x; so far I haven’t had the chance to check more recent versions.
best wishes: steve

h5dump: Version 1.10.4 compiled with gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0 configured with GNU configure (as opposed to cmake) automake (GNU automake) 1.16.1


#12

I did some work in that area of h5dump but I think it was for the last release (1.10.8).


#13

@byrn From SPACK I installed HDF5@1.12.0, then recompiled this example for LONG DOUBLE with the custom and opaque types, and got the same result with h5dump: Version 1.12.0

HDF5 "example.h5" {
GROUP "/" {
   DATASET "custom" {
      DATATYPE  H5T_NATIVE_LDOUBLE
      DATASPACE  SIMPLE { ( 20 ) / ( 20 ) }
      DATA {
      (0) : 4.94066e-324, 4.94066e-324, 4.94066e-324, 4.94066e-324,
      (4) : 4.94066e-324, 4.94066e-324, 4.94066e-324, 4.94066e-324,
      (8) : 4.94066e-324, 4.94066e-324, 4.94066e-324, 4.94066e-324,
      (12): 4.94066e-324, 4.94066e-324, 4.94066e-324, 4.94066e-324,
      (16): 4.94066e-324, 4.94066e-324, 4.94066e-324, 4.94066e-324
      }
   }
   DATASET "opaque" {
      DATATYPE  H5T_OPAQUE {
         OPAQUE_TAG "";
      }
      DATASPACE  SIMPLE { ( 20 ) / ( 20 ) }
      DATA {
      (0) : 04:60:24:10:00:ca:41:b8:01:40:00:00:00:00:00:00,
      (1) : 34:56:ba:d3:14:dd:c0:ac:fd:3f:00:00:00:00:00:00,
      (2) : 55:b0:4e:03:27:01:05:df:01:40:00:00:00:00:00:00,
      ... omitted ...
      (18): 39:18:1b:5c:56:16:f3:91:02:40:00:00:00:00:00:00,
      (19): b8:7e:05:ef:11:c5:e5:b0:00:40:00:00:00:00:00:00
      }
   }
}
}

Note on how the 80-bit value is stored in 16 bytes, as opposed to 10 bytes. From the AMD64 ABI (attached):

The long double type uses a 15-bit exponent, a 64-bit mantissa with an explicit high-order significant bit, and an exponent bias of 16383. Although a long double requires 16 bytes of storage, only the first 10 bytes are significant. The remaining six bytes are tail padding, and the contents of these bytes are undefined.

x86_64-abi-0.99.pdf (557.0 KB)

steve


#14

But if the failure is in the write, h5dump will only show what it knows. Again, the conversion from memory type to file type could be incorrect. The hdf5 tools/testfiles directory has two examples of long double (tldouble*.h5/ddl).
They were generated on a known system with long double support. The code that generates them is in the tools/test/h5dump/h5dumpgentest.c file.


#15

@byrn I got the file, thanks! Using h5dump@v1.12.0 (gcc-10.2.0, from SPACK) I got the same behaviour. In my understanding the conversion function is y = f(x) with a well-defined inverse x = F(y), triggered by the datatype H5T_NATIVE_LDOUBLE; running two programs on the same system should therefore yield identical results. Am I wrong?

steven@io:~/src/hdf5/tools/test/h5dump$ /opt/spack/opt/spack/gcc-10.2.0/hdf5/1.12.0-akcsj/bin/h5dump tldouble.h5 
HDF5 "tldouble.h5" {
GROUP "/" {
   DATASET "dset" {
      DATATYPE  H5T_NATIVE_LDOUBLE
      DATASPACE  SIMPLE { ( 3 ) / ( 3 ) }
      DATA {
      (0): 4.94066e-324, 4.94066e-324, 4.94066e-324
      }
   }
}
}

tldouble.h5 (2.0 KB)
h5cppldouble.h5 (4.6 KB)


#16

And the supplied test reference tldouble.ddl shows:
HDF5 "tldouble.h5" {
GROUP "/" {
DATASET "dset" {
DATATYPE 128-bit little-endian floating-point 80-bit precision
DATASPACE SIMPLE { ( 3 ) / ( 3 ) }
DATA {
(0): 1, 2, 3
}
}
}
}


#17

So where is the failure, read or write? Good question, right?
Note the difference in the type! That was the change I needed to make; maybe 1.12.1 is better?
If my memory is to be trusted, long double was added to the code to support a customer request, but file support was never added beyond the memory type (at least, that is my theory). I am not quite sure how it was added, as I have only worked on it recently, from the cross-platform/cross-compile testing point of view.


#18

@byrn thanks for looking into this! I reattached the previously attached x86_64-abi-0.99.pdf (557.0 KB).
On pg. 12 you will find long double defined as 80-bit extended (IEEE 754) for the 64-bit AMD64 architecture; at the top of pg. 13:

The long double type uses a 15-bit exponent, a 64-bit mantissa with an explicit high-order significant bit, and an exponent bias of 16383. Although a long double requires 16 bytes of storage, only the first 10 bytes are significant. The remaining six bytes are tail padding, and the contents of these bytes are undefined.

As you can see from the printout, the H5CPP long double is in accordance with the AMD64 binary layout. I am wondering what the HDF Group’s interpretation of LONG DOUBLE is:

  1. a type with length between quad float and double float 80bit storage length, in memory length is arch dependent
  2. quad float even if no machine support for it
  3. LONG DOUBLE doesn’t exist (instead delegate the interpretation to library users through custom type)

To blur the landscape further, the 32-bit Arm procedure call standard, aapcs32.pdf (409.5 KB), provides on pg. 29 a chart mapping C/C++ elementary types to machine types; it maps long double to double-precision IEEE 754, i.e. 8 bytes in length. And in the n4296 paper, pg. 76 §3.9.1 ¶8, C++ gives only a lower bound on what long double should be: “the type long double provides at least as much precision as double”, leaving the details to compiler vendors.

What is your opinion on this?

best: steven


#19

So, I think that is why I changed h5dump to display the actual file type in the ldouble.h5 file, in terms of bits used and bits of precision. The idea is that the file was generated on a system where long double is 128 bits, and that the test only verifies the read conversion of a type specified according to HDF5’s custom float parameters. (Previous versions of h5dump could misinterpret native long double types.)


#20

Note also that “half-precision” floats are likewise missing from HDF5’s predefined file datatypes.