Reading of compound type overwrites other data in destination

Hello everybody,

I haven't read through many threads of the archive, so it may be that this has already been reported. I am sorry if this is the case.

I have encountered a strange behaviour when writing a compound datatype with the C++ API, and then reading data back into objects of the same class.

It seems that the read process overwrites data that was (supposedly) not even written to the file. That is, if you put only some members of a class into the compound type, write a dataset of this type to file, and then read everything back from that file into objects of the same class, other members of the class (ones that were never inserted into the compound datatype) are overwritten in the destination objects. If, on the other hand, you read the written data into objects of a different class (with a different data layout), everything is fine and only the requested members are overwritten.

In my real-life example I allocate an array of objects that are initialised with pointers to other newly allocated objects, which, of course, should not be overwritten. The program segfaulted because of the overwritten pointers, and after a lot of debugging I realized that the pointers are not overwritten if the last member of the written compound type is not read. So, for the moment, I have just added a one-byte member to the output compound type that is ignored when reading.

I tested the real-life case against v1.8.11 and v1.8.12 on GNU/Linux and OS X 10.8, using g++-4.8 and clang++-5.0.

To demonstrate the (let's call it a) bug, I have written a little program that shows that padding data seems to be written to file, and then when reading it gets set again in the destination object. As already mentioned, it does not happen if the destination object has a different data layout from the objects used for writing.
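
To make the layout argument concrete, here is a minimal sketch that is separate from the demo and does not use HDF5 at all. The struct `AClassLike` and the helper functions are hypothetical names introduced only for illustration; the struct simply mirrors the member order of AClass. It works out how many bytes of one object are not described by the three compound members, which is where the two probe members and any compiler padding live. The exact numbers depend on platform and compiler, so the checks only assert relations that should hold for any standard-layout struct.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t

// Hypothetical plain-struct mirror of AClass (same members, same order).
struct AClassLike
{
    long a;
    unsigned char probe1;
    double c;
    unsigned char probe2;
    unsigned long last;
};

// Bytes of one object that belong to the three members
// inserted into the compound type (a, c, last).
constexpr std::size_t coveredBytes()
{
    return sizeof(long) + sizeof(double) + sizeof(unsigned long);
}

// Bytes of one object that the compound type does not describe:
// probe1, probe2 and whatever padding the compiler inserts.
constexpr std::size_t uncoveredBytes()
{
    return sizeof(AClassLike) - coveredBytes();
}

// Gap between the end of member 'a' and the start of member 'c';
// probe1 lives somewhere inside this gap.
constexpr std::size_t gapBetweenAandC()
{
    return offsetof(AClassLike, c) - (offsetof(AClassLike, a) + sizeof(long));
}

// Runtime sanity check of the relations described above.
inline void checkLayout()
{
    assert(offsetof(AClassLike, probe1) >= offsetof(AClassLike, a) + sizeof(long));
    assert(offsetof(AClassLike, probe1) < offsetof(AClassLike, c));
    assert(uncoveredBytes() >= 2);   // at least the two probe bytes
}
```

Calling `checkLayout()` and printing `uncoveredBytes()` on a given machine shows how many bytes per object could be affected if a read touches more than the three named members.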

The weird thing is that this program exhibits the bug even when only two members of the compound type are read and they are separated by a variable that is not written or read, i.e. said variable will also be overwritten. I verified, though, that in my real-life case the padding data is only written if the last member of the compound type in the written dataset was read in!

The demo creates an object of class AClass which has its members set to the numbers 1, 2, 3, 4, 5, respectively. It writes three class member variables in a compound to an hdf5 file, which are separated by one variable each (1, 3, 5 written, 2 and 4 omitted).
The data is then first read into an object of class BClass, which has the same member variables in a different order (the constructor sets all of them to zero). The omitted variables are not overwritten; this is the behaviour I would expect in every case.
Afterwards, two of the three written variables (1 and 3) are read back into an object of class AClass. The constructor had set all of the members to zero, but member variable 2 is overwritten, although it is not a member of the compound type!
Reading in the compound in the same way as it was written to the file yields an object identical to the object from which the data was written, despite the fact that only three member variables were members of the compound type!
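
One possible explanation for the pattern described above is that, when the layout of the memory type matches the one used for writing, each element is copied as a single block rather than member by member. This is an assumption for illustration and is not confirmed here against the library. The sketch below only mimics that idea in plain C++ without any HDF5 calls; `Record`, `copyWholeElement`, `copyMembersOnly` and `runSketch` are made-up names.

```cpp
#include <cassert>
#include <cstring>   // std::memcpy

// Hypothetical stand-in for AClass: same members, same order.
struct Record
{
    long a;
    unsigned char probe1;
    double c;
    unsigned char probe2;
    unsigned long last;
};

// Block copy of the whole element: every byte of the source object,
// including probe1, probe2 and padding, lands in the destination.
inline void copyWholeElement(const Record & src, Record & dst)
{
    std::memcpy(&dst, &src, sizeof(Record));
}

// Member-wise copy: only the three members that were inserted
// into the compound type are touched in the destination.
inline void copyMembersOnly(const Record & src, Record & dst)
{
    dst.a = src.a;
    dst.c = src.c;
    dst.last = src.last;
}

// Source object with the values 1..5, as in the demo.
inline Record makeSource()
{
    Record r = {1, 2, 3.0, 4, 5};
    return r;
}

// Runs both variants against zero-initialised destinations.
inline void runSketch()
{
    const Record src = makeSource();

    Record whole = Record();          // all members zero
    copyWholeElement(src, whole);
    assert(whole.probe1 == 2 && whole.probe2 == 4);      // probes overwritten

    Record members = Record();        // all members zero
    copyMembersOnly(src, members);
    assert(members.probe1 == 0 && members.probe2 == 0);  // probes untouched
}
```

The block copy reproduces the probe values I see when reading into AClass, while the member-wise copy matches what I see when reading into BClass. Whether the library really takes such a shortcut would need to be checked by someone who knows its internals.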

I would argue that all of this is unexpected behaviour. Any comments?

Here's the code of my demo:

================================== 8< ==================================

#include "H5Cpp.h"

#include <iostream>
#include <string>

#define RWFILE "test.h5"

class AClass
{
public:
    AClass() : a(0), probe1(0), c(0), probe2(0), last(0) {};

    long a;
    unsigned char probe1;
    double c;
    unsigned char probe2;
    unsigned long last;
};

class BClass
{
public:
    BClass() : c(0), probe1(0), a(0), probe2(0), last(0) {};

    double c;
    unsigned char probe1;
    long a;
    unsigned char probe2;
    unsigned long last;
};

void writeHDF5( AClass * object, H5::CompType & type, const std::string & filename );
void readHDF5( AClass * object, H5::CompType & type, const std::string & filename );
void readHDF5( BClass * object, H5::CompType & type, const std::string & filename );

int
main()
{
    AClass writeObject;

    BClass readObjectB;
    AClass readObjectA;
    AClass readObjectA2;

    writeObject.a = 1;
    writeObject.probe1 = 2;
    writeObject.c = 3;
    writeObject.probe2 = 4;
    writeObject.last = 5;

    H5::CompType writeType( sizeof(AClass) );

    writeType.insertMember("a", HOFFSET(AClass, a), H5::PredType::NATIVE_LONG );
    writeType.insertMember("c", HOFFSET(AClass, c), H5::PredType::NATIVE_DOUBLE );
    writeType.insertMember("last", HOFFSET(AClass, last), H5::PredType::NATIVE_ULONG );

    writeHDF5( &writeObject, writeType, "test.h5" );

    // read in all data into a different data layout
    H5::CompType readTypeB( sizeof(BClass) );

    readTypeB.insertMember( "a", HOFFSET(BClass, a), H5::PredType::NATIVE_LONG );
    readTypeB.insertMember( "c", HOFFSET(BClass, c), H5::PredType::NATIVE_DOUBLE );
    readTypeB.insertMember( "last", HOFFSET(BClass, last), H5::PredType::NATIVE_ULONG );

    readHDF5( &readObjectB, readTypeB, "test.h5" );

    std::cout << "reading all into object of different class: probe1 = " << (unsigned int) readObjectB.probe1
              << " probe2 = " << (unsigned int) readObjectB.probe2 << std::endl;
    // both probes report zero, as they should

    // read in data into original data layout, without crossing member 'probe2'
    H5::CompType readTypeA( sizeof(AClass) );

    readTypeA.insertMember( "a", HOFFSET(AClass, a), H5::PredType::NATIVE_LONG );
    readTypeA.insertMember( "c", HOFFSET(AClass, c), H5::PredType::NATIVE_DOUBLE );

    readHDF5( &readObjectA, readTypeA, "test.h5" );
    std::cout << "reading 2 of 3 members into object of class AClass: probe1 = " << (unsigned int) readObjectA.probe1
              << " probe2 = " << (unsigned int) readObjectA.probe2 << std::endl;
    // probe1 reports 2, so it has been overwritten, probe2 reports zero, as it should

    // read in all data in the original layout
    readHDF5( &readObjectA2, writeType, "test.h5" );
    std::cout << "reading all members into object of class AClass: probe1 = " << (unsigned int) readObjectA2.probe1
              << " probe2 = " << (unsigned int) readObjectA2.probe2 << std::endl;
    // probe1 reports 2, probe2 reports 4, as in the original writeObject!!

    return 0;
}

void writeHDF5(AClass * object, H5::CompType & type, const std::string & filename )
{
    H5::H5File outputFile( filename, H5F_ACC_TRUNC );

    hsize_t maxdims[] = {1};

    H5::DataSpace mspace(1, maxdims, NULL);
    H5::DataSpace dspace(1, maxdims, NULL);

    H5::DataSet dataset( outputFile.createDataSet( "/dset", type, dspace ) );
    dataset.write( object, type, mspace, dspace );

    outputFile.flush(H5F_SCOPE_GLOBAL);
}

void readHDF5(AClass * object, H5::CompType & type, const std::string & filename )
{
    H5::H5File inputFile( filename, H5F_ACC_RDONLY );

    H5::DataSet dset = inputFile.openDataSet("/dset");

    dset.read( object, type );
}

void readHDF5(BClass * object, H5::CompType & type, const std::string & filename )
{
    H5::H5File inputFile( filename, H5F_ACC_RDONLY );

    H5::DataSet dset = inputFile.openDataSet("/dset");

    dset.read( object, type );
}

================================== >8 ==================================

Sorry for the length of my post, but I thought I should report this.

  cheers,

    Tim