bug with compound types, uint64_t and packet table / appending to tables?

Hi,

I have structs like the following:

struct record{
    uint32_t count;
    uint32_t event;
    uint64_t timestamp_nsec;
    uint64_t time_of_receipt_nsec;
    uint32_t some_other_value;
    uint8_t is_chunked;
};

I use the C++ API with 1.8.15 to make an H5::CompType containing those
fields, using the H5::PredType NATIVE types. The H5::CompType is constructed
as H5::CompType(sizeof(record)), and I use HOFFSET to get the field offsets
for H5::CompType::insertMember. When I write this to an append table in
real, non-trivial programs (two completely different programs so far), the
data is fairly deterministically corrupted. __attribute__((packed)) changes
how the corruption shows up: without it, I've seen count and is_chunked end
up with the same values; with it, the corruption tends to appear only after
the uint64_t fields. It's not easy to tell exactly what's wrong, but
corrupted fields take on the values of fields elsewhere in the structure's
memory, which strongly suggests an incorrect offset is being used somewhere.
I believe the H5::CompType describes the structure correctly but that the
append is not being performed correctly. I use the same kinds of structs in
programs that write raw packet logs to plain Unix files, parse them (in
Python) with these same memory types, and convert them to HDF5 datasets
with h5py, and that works fine.
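
For concreteness, here's a boiled-down sketch of the setup (this is not one
of the failing programs; the file name, table name, chunk size, and sample
values are made up for illustration):

#include <cstdint>
#include <cstddef>
#include "H5Cpp.h"
#include "H5PacketTable.h"   // FL_PacketTable (high-level packet table API)

struct record {
    uint32_t count;
    uint32_t event;
    uint64_t timestamp_nsec;
    uint64_t time_of_receipt_nsec;
    uint32_t some_other_value;
    uint8_t  is_chunked;
};

// Memory type: sized with sizeof(record), members placed with HOFFSET, so it
// should describe the compiler's in-memory layout, padding included.
H5::CompType make_record_type()
{
    H5::CompType t(sizeof(record));
    t.insertMember("count",                HOFFSET(record, count),                H5::PredType::NATIVE_UINT32);
    t.insertMember("event",                HOFFSET(record, event),                H5::PredType::NATIVE_UINT32);
    t.insertMember("timestamp_nsec",       HOFFSET(record, timestamp_nsec),       H5::PredType::NATIVE_UINT64);
    t.insertMember("time_of_receipt_nsec", HOFFSET(record, time_of_receipt_nsec), H5::PredType::NATIVE_UINT64);
    t.insertMember("some_other_value",     HOFFSET(record, some_other_value),     H5::PredType::NATIVE_UINT32);
    t.insertMember("is_chunked",           HOFFSET(record, is_chunked),           H5::PredType::NATIVE_UINT8);
    return t;
}

int main()
{
    H5::H5File file("records.h5", H5F_ACC_TRUNC);         // made-up file name
    H5::CompType rec_type = make_record_type();

    char table_name[] = "/records";                        // made-up table name
    FL_PacketTable table(file.getId(), table_name, rec_type.getId(), 512 /* chunk */);

    record r = {1, 2, 1000000000ULL, 1000000100ULL, 7, 1}; // sample values
    table.AppendPacket(&r);   // this append is where the corruption shows up
    return 0;
}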

I see this problem with HDF5 1.8.13 and 1.8.15, with gcc 4.8.2 and clang
4.5, on 64-bit (the 64-bit part may be very relevant here; I suspect the
uint64_t's are being treated as uint32_t's or ints somewhere).

Any thoughts? I've tried reproducing it in some boiled-down programs but so
far haven't managed to get a minimal case.

-Jason

As a clarification, to hopefully avoid some knee-jerk reactions:

If I use uint32_t instead of uint64_t, everything works. I'll have to try
double.

For the aforementioned raw Unix-file packet logs, I commit an H5::CompType
computed at runtime to a separate .h5 file so I can safely parse the binary
file later while keeping most of the advantages of HDF5. When I read the
data back in with Python and convert it to HDF5, I use this named type as
both the source (from binary) and destination (to HDF5) type. I hope this
increases confidence that I'm not making a trivial mistake in the CompType
construction.
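
Roughly what that commit step looks like (file and type names here are
invented; rec_type is the same HOFFSET-based H5::CompType as in the sketch
above):

// write the runtime-computed compound type to its own file as a named type
H5::H5File type_file("record_type.h5", H5F_ACC_TRUNC);   // hypothetical file name
H5::CompType rec_type = make_record_type();               // helper from the earlier sketch
rec_type.commit(type_file, "record");                      // commit as a named datatype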

I do intend to look at the field offsets at some point today to verify that
they are correct in the context where I have the problem.
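
For reference, a check along those lines might look like this (reusing
struct record and make_record_type() from the sketch in my first message;
getNmembers/getMemberName/getMemberOffset are the CompType introspection
calls):

#include <cstdio>
// assumes struct record and make_record_type() from the earlier sketch

void dump_offsets()
{
    H5::CompType t = make_record_type();
    std::printf("sizeof(record) = %zu, HDF type size = %zu\n",
                sizeof(record), t.getSize());
    for (int i = 0; i < t.getNmembers(); ++i)
        std::printf("  %-22s HDF offset = %zu\n",
                    t.getMemberName(i).c_str(), t.getMemberOffset(i));
    // and compare against the compiler's view, e.g.:
    std::printf("  HOFFSET(record, is_chunked) = %zu\n",
                (size_t)HOFFSET(record, is_chunked));
}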

-Jason


Switching to doubles doesn't seem to change much, but I've collected some
working/broken cases for the same conceptual struct with different layouts.
For each case I dump the HDF type definition (in the program, one line
before the table is instantiated), show ground truth for selected fields
just before the append, repeat that ground truth just after the append to
the packet table (remember, this goes through the common TB append function
internally, so the bug is most likely there or below), and show the
corresponding value recorded in the HDF file.

This issue seems to depend on the alignment of the types.

I've attached a text file of these runs.

-Jason

hdf_debug.txt (11.1 KB)


I'm not 100% to the bottom of this, nor do I entirely understand why I only
ran into it recently, but it is padding/alignment sensitive. Particularly
baffling is that some packed structs I had still didn't work despite careful
design; that must just have been aliased in with all the other issues I
suddenly encountered.

I believe most applications (hdfview, h5py) do not support or try to handle
unpacked structures and expect the data in the HDF file to be packed.
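
To make the padding concrete, here's a tiny standalone check (not taken from
either affected program; the numbers in the comments assume default x86-64
alignment):

#include <cstdio>
#include <cstddef>
#include <cstdint>

// same struct record as in the first message (no __attribute__((packed)))
struct record {
    uint32_t count;
    uint32_t event;
    uint64_t timestamp_nsec;
    uint64_t time_of_receipt_nsec;
    uint32_t some_other_value;
    uint8_t  is_chunked;
};

int main()
{
    // With default alignment the offsets are 0, 4, 8, 16, 24, 28 and
    // sizeof(record) == 32, i.e. 3 bytes of tail padding; the packed size
    // (sum of the field sizes) is only 29.
    std::printf("sizeof(record)      = %zu\n", sizeof(record));
    std::printf("offsetof is_chunked = %zu\n", offsetof(record, is_chunked));
    return 0;
}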

So to deal with this, in Python, I used the following snippet, which seems
to convert the data. It basically boils down to doing the conversion with
H5Tconvert to the packed compound type, but after the packet table data is
read back in. If you want compatibility with other applications, you must
postprocess like this Python snippet (and the program it came from) does.

        # h5dtype is the committed (unpacked) type; src is the raw binary log
        packed_dtype_id = h5dtype.id.copy()
        packed_dtype_id.pack()                 # packed file-style layout
        packed_dtype = packed_dtype_id.dtype
        records = np.array([], dtype=packed_dtype)
        with open(src, 'rb') as f:
            # read each record as an opaque blob of the unpacked (in-memory) size
            dtype = np.dtype((np.uint8, h5dtype.id.get_size()))
            dbytes = np.fromfile(f, dtype=dtype)
            # convert in place from the unpacked layout to the packed one
            h5py.h5t.convert(h5dtype.id, packed_dtype_id, dbytes.shape[0], dbytes)
            records = dbytes.ravel()[:packed_dtype.itemsize * dbytes.shape[0]].view(packed_dtype).copy()

Note that numpy's itemsize does not return the size of the record in memory
but the sum of the sizes of its fields, which is not a valid "sizeof" for
unpacked structures. h5py also did not respect offsets, which makes the
fromfile approach not work correctly here; I'm submitting a patch to them
that fixes this.

-Jason
