C, C#, HDF5DotNet and nested structs

Dear All,

I am writing a packet table from C with a compound dataset in the form of:

// C
struct A
{
  int x;
  int y;
};

struct Container
{
  int offset;
  struct A element;
};

I am later reading the same dataset into .NET using HDF5DotNet. However, I
cannot read the elements if the .NET structs have the same form (with a
nested struct). Reading only works correctly if I flatten the .NET struct:

// C# - works
public struct Container
{
  public int offset;
  public int x;
  public int y;
}

// C# - does not work
public struct A
{
  public int x;
  public int y;
}

public struct Container
{
  public int offset;
  public A element;
}

Does anybody know any way to use the same form in C# (nested structs) as
with C?

Many thanks.

Martin

Hi Martin,

> I am writing a packet table from C with a compound dataset in the form of:
>
> // C
> struct A
> {
>   int x;
>   int y;
> };
>
> struct Container
> {
>   int offset;
>   struct A element;
> };

Am I correct in assuming that you write that structure out as a
"blob", i.e. not each element of the structure (and then the
elements of the substructure) on its own, but everything at once,
probably using the sizeof() of the structure? If not, please just
forget about all of the rest ;-)

If that's the case then you're in trouble for several reasons
already when you just use C and don't even throw C# into the
mix. First of all, even an 'int' can have different sizes on
different machines, so the structure may be unreadable e.g. on
a 64-bit machine when written on a 32-bit machine. The same
holds for other types. Moreover, some machines store an int
with the most-significant byte first, others with the
least-significant byte first etc. (that's often called the
"endian-ness" of an architecture).
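
To make the byte-order point concrete, here is a small sketch in plain
C. The function names is_little_endian() and raw_bytes() are made up
for this illustration and have nothing to do with HDF5. It only shows
that the bytes of the same 32-bit value come out in a different order
depending on the machine:

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if this machine stores the least-significant byte of an
   integer first (little-endian), 0 otherwise. */
int is_little_endian( void )
{
    uint32_t value = 1;
    unsigned char first_byte;

    memcpy( &first_byte, &value, 1 );
    return first_byte == 1;
}

/* Copies the raw in-memory bytes of a 32-bit value into 'out'. The
   order of the bytes in 'out' depends on the machine this runs on. */
void raw_bytes( uint32_t value, unsigned char out[ 4 ] )
{
    memcpy( out, &value, 4 );
}
```

Calling raw_bytes( 0x01020304, buf ) leaves 0x04 in buf[ 0 ] on a
little-endian machine and 0x01 on a big-endian one, which is exactly
why a "blob" written on one kind of machine is misread on the other.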

HDF5 goes a long way to keep you out of trouble when you write
and read back normal types like ints or doubles: it stores in
the file the exact format the ints and doubles were written in
and, when necessary, converts them to the format expected on
the machine the data are read on.

And then there is "structure padding". The compiler is free
to insert "padding bytes" between the elements of the structure
(and after the last element). Those are needed to ensure
that the members are "properly aligned" in memory: at least
on some architectures e.g. ints can't start at arbitrary
addresses, so padding bytes can be required to get a member
aligned to a valid address. Therefore the sizeof() of a
structure is often larger than the sum of the sizeof()'s of
its members.
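
A quick way to see padding for yourself is to compare sizeof() with
the sum of the member sizes. The sketch below uses a made-up structure
(struct Padded is not from this thread) with a char in front of an
int, which provokes padding on most compilers. The helper function
names are hypothetical as well:

```c
#include <stddef.h>

/* Hypothetical structure: a 1-byte member followed by an int usually
   forces the compiler to insert padding before 'value'. */
struct Padded
{
    char tag;
    int  value;
};

/* Size of the structure as the compiler lays it out in memory */
size_t padded_size( void )
{
    return sizeof( struct Padded );
}

/* Sum of the sizes of the members, ignoring any padding */
size_t members_size( void )
{
    return sizeof( char ) + sizeof( int );
}

/* Offset of 'value' from the start of the structure; anything larger
   than sizeof( char ) means padding bytes were inserted before it. */
size_t value_offset( void )
{
    return offsetof( struct Padded, value );
}
```

On many common platforms with a 4-byte int, padded_size() is 8 while
members_size() is only 5, but the exact numbers are up to the compiler
and the architecture.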

The amount of padding needed can be different on different
architectures, but even on the same machine different compilers
are free to insert different amounts. So already a structure,
written out as a "blob" by a program compiled with compiler A,
may not be readable by a program compiled using compiler B (or,
theoretically, even a different version of compiler A).

> I am later reading the same dataset into .NET using HDF5DotNet. However, I
> cannot read the elements if the .NET structs have the same form (with a
> nested struct). However, it works correctly if I define a .NET struct as:
>
> // C# - works
> public struct Container
> {
>   public int offset;
>   public int x;
>   public int y;
> }

By pure chance the layout of the elements in the C structure,
the sizeof()'s of the elements and the endian-ness of the data
seem to be identical to what's used in this C# structure.
If you copied the HDF file to another architecture (e.g. a
32-bit machine instead of a 64-bit machine, or a little-endian
instead of a big-endian machine) and read it there, you would
rather likely find that it doesn't work at all anymore.

> // C# - does not work
> public struct A
> {
>   public int x;
>   public int y;
> }
>
> public struct Container
> {
>   public int offset;
>   public A element;
> }
>
> Does anybody know any way to use the same form in C# (nested structs) as
> with C?

It's impossible to know since the layout of a C structure isn't
well defined. What you have to do is "serialize" the structure,
i.e. split it up into its components and write each of them out
on its own. When reading it back in you then have to re-assemble
the components into a structure on the target machine.
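
Just to illustrate what serializing by hand could look like, here is a
minimal sketch in plain C. Everything in it is an assumption made for
the example: the 12-byte little-endian format, the function names and
the premise that each int value fits into 32 bits. It is not what HDF5
does internally.

```c
#include <stdint.h>

struct A         { int x; int y; };
struct Container { int offset; struct A element; };

/* Hypothetical on-disk format: three 32-bit little-endian integers */
#define CONTAINER_WIRE_SIZE 12

/* Store a 32-bit value byte by byte, least-significant byte first,
   independent of the byte order of the machine */
void put_i32_le( unsigned char *buf, int32_t v )
{
    uint32_t u = ( uint32_t ) v;
    int i;

    for ( i = 0; i < 4; i++ )
        buf[ i ] = ( unsigned char ) ( ( u >> ( 8 * i ) ) & 0xFF );
}

/* Re-assemble a 32-bit value from four little-endian bytes */
int32_t get_i32_le( const unsigned char *buf )
{
    uint32_t u =   ( uint32_t ) buf[ 0 ]
                 | ( uint32_t ) buf[ 1 ] <<  8
                 | ( uint32_t ) buf[ 2 ] << 16
                 | ( uint32_t ) buf[ 3 ] << 24;

    /* Map the upper half of the unsigned range to negative values
       without relying on implementation-defined conversions */
    if ( u & 0x80000000u )
        return ( int32_t ) ( u - 0x80000000u ) + INT32_MIN;
    return ( int32_t ) u;
}

/* Write the members out one by one; no padding ends up in 'buf'.
   Assumes every member value fits into 32 bits. */
void serialize_container( const struct Container *c,
                          unsigned char buf[ CONTAINER_WIRE_SIZE ] )
{
    put_i32_le( buf,     ( int32_t ) c->offset );
    put_i32_le( buf + 4, ( int32_t ) c->element.x );
    put_i32_le( buf + 8, ( int32_t ) c->element.y );
}

/* Read the members back in the same order */
void deserialize_container( const unsigned char buf[ CONTAINER_WIRE_SIZE ],
                            struct Container *c )
{
    c->offset    = get_i32_le( buf );
    c->element.x = get_i32_le( buf + 4 );
    c->element.y = get_i32_le( buf + 8 );
}
```

The point of doing it member by member is that the bytes in the buffer
no longer depend on the compiler's padding or the machine's byte order,
so a reader in any language only needs to know the format.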

But HDF5 can also help you with this when you use the "compound"
data type (H5T_COMPOUND). You have to tell it about the exact
layout and contents of your structure. That allows HDF5 to
"serialize" the structure for you (and re-assemble it when
reading it in). An introduction can be found e.g. at

  http://www.hdfgroup.org/HDF5/Tutor/compound.html

And since a compound data type can itself contain a compound
data type, even writing out a structure that contains a
structure should work.
                           Best regards, Jens

On Fri, Apr 08, 2011 at 10:09:27AM +0100, Martin Galpin wrote:
--
  \ Jens Thoms Toerring ________ jt@toerring.de
   \_______________________________ http://toerring.de

Hi Jens,

Thanks for the time.

> Am I correct in assuming that you write that structure out as a
> "blob", i.e. not each element of the structure (and then the
> elements of the substructure), probably using the sizeof()
> the structure? If not please just forget about all of the
> rest;-)

I am not using a "blob"; I am indeed using an H5T_COMPOUND type that
is built from the constituents of the structure.

So, in the case of:

// C
struct A
{
  int x;
  int y;
};

struct Container
{
  int offset;
  struct A element;
};

It would be:

H5insert... H5T_NATIVE_INT32, H5T_NATIVE_INT32, H5T_NATIVE_INT32 (offset, x, y).

With this approach I can H5PTappend a Container* packet directly to
the compound type. Should I be creating nested compound types for the
nested structures?

Thanks again.

Hi Martin.

> I am not using a "blob"; I am indeed using an H5T_COMPOUND type that
> is built from the constituents of the structure.
>
> So, in the case of:
>
> // C
> struct A
> {
>   int x;
>   int y;
> };
>
> struct Container
> {
>   int offset;
>   struct A element;
> };
>
> It would be:
>
> H5insert... H5T_NATIVE_INT32, H5T_NATIVE_INT32, H5T_NATIVE_INT32 (offset, x, y).

Not really sure what this means;-) If this is the in-memory layout
then it might just be working because in this case the in-memory
layout of the structure is really that simple, i.e. just three
ints, one after another. I wouldn't like to bet that it is that
simple in all possible cases... BTW, if this is really about the
in-memory layout, why are you using 'H5T_NATIVE_INT32' instead of
'H5T_NATIVE_INT'?
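
Whether the nested structure really is "just three ints, one after
another" can be checked instead of guessed. The sketch below compares
the member offsets of the nested structure with those of a flat one
using the standard offsetof() macro (as far as I know HDF5's HOFFSET
is just a wrapper around it). struct Flat and the two function names
are made up for this example:

```c
#include <stddef.h>
#include <stdint.h>

struct A         { int x; int y; };
struct Container { int offset; struct A element; };

/* Hypothetical flat counterpart of the nested structure */
struct Flat      { int offset; int x; int y; };

/* Returns 1 if, with this compiler on this machine, the nested
   structure has the same size and member offsets as the flat one */
int nested_matches_flat( void )
{
    size_t elem = offsetof( struct Container, element );

    return    sizeof( struct Container ) == sizeof( struct Flat )
           && offsetof( struct Container, offset )
                                      == offsetof( struct Flat, offset )
           && elem + offsetof( struct A, x ) == offsetof( struct Flat, x )
           && elem + offsetof( struct A, y ) == offsetof( struct Flat, y );
}

/* Returns 1 if 'int' has the same size as 'int32_t' here, i.e. if
   describing an 'int' member as a 32-bit integer happens to fit */
int int_is_32_bits( void )
{
    return sizeof( int ) == sizeof( int32_t );
}
```

If both functions return 1 the flat description matches the memory
layout on that particular machine and compiler, but that is a property
of the platform and not something the C standard promises.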

> With this approach I can H5PTappend a Container* packet directly to
> the compound type. Should I be creating nested compound types for the
> nested structures?

If you want to retain the nested-ness of the structure in the
HDF file then I would say yes, definitely. To simplify the
discussion I wrote a short program that just writes out a single
instance of your nested structure and then reads it back in:

--------8<----------------------------------
#include <stdio.h>
#include <hdf5.h>

typedef struct A
{
    int x;
    int y;
} A_t;

typedef struct Container
{
    int offset;
    A_t element;
} Container_t;

int
main( void )
{
    Container_t s = { 16, { 19, -21 } };
    hid_t file,
          space,
          dset,
          A_filetype,
          A_memtype,
          C_filetype,
          C_memtype;
    hsize_t dims[ 1 ] = { 1 };
    int ndims;

    /* Print out the original contents of the structure */

    printf( "Before: { %d, { %d, %d } }\n",
            s.offset, s.element.x, s.element.y );

    /* Create data type with in-memory layout of the A structure */

    A_memtype = H5Tcreate( H5T_COMPOUND, sizeof( A_t ) );
    H5Tinsert( A_memtype, "x", HOFFSET( A_t, x ), H5T_NATIVE_INT );
    H5Tinsert( A_memtype, "y", HOFFSET( A_t, y ), H5T_NATIVE_INT );

    /* Create data type with in-memory layout of the Container structure */

    C_memtype = H5Tcreate( H5T_COMPOUND, sizeof( Container_t ) );
    H5Tinsert( C_memtype, "offset", HOFFSET( Container_t, offset ),
               H5T_NATIVE_INT );
    H5Tinsert( C_memtype, "element", HOFFSET( Container_t, element ),
               A_memtype );

    /* Create data type with in-file layout of the A structure */

    A_filetype = H5Tcreate( H5T_COMPOUND, 2 * H5Tget_size( H5T_NATIVE_INT32 ) );
    H5Tinsert( A_filetype, "x", 0, H5T_NATIVE_INT32 );
    H5Tinsert( A_filetype, "y", H5Tget_size( H5T_NATIVE_INT32 ),
               H5T_NATIVE_INT32 );

    /* Create data type with in-file layout of the Container structure */

    C_filetype = H5Tcreate( H5T_COMPOUND,
                              H5Tget_size( H5T_NATIVE_INT32 )
                            + H5Tget_size( A_filetype ) );
    H5Tinsert( C_filetype, "offset", 0, H5T_NATIVE_INT32 );
    H5Tinsert( C_filetype, "element", H5Tget_size( H5T_NATIVE_INT32 ),
               A_filetype );

    /* Open a file for writing */

    file = H5Fcreate( "ctest.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT );

    /* Create a dataspace and the dataset */

    space = H5Screate_simple( 1, dims, NULL );
    dset = H5Dcreate2( file, "test set", C_filetype, space,
                       H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT );

    /* Write the structure out */

    H5Dwrite( dset, C_memtype, H5S_ALL, H5S_ALL, H5P_DEFAULT, &s );

    /* Get rid of resources and close the file */

    H5Dclose( dset );
    H5Sclose( space );
    H5Tclose( C_filetype );
    H5Tclose( A_filetype );
    H5Fclose( file );

    /* Clear out the structure and print it to show it's all 0 now */

    s.offset = s.element.x = s.element.y = 0;
    printf( "After clear: { %d, { %d, %d } }\n",
            s.offset, s.element.x, s.element.y );

    /* Now the reverse (re-using the in-memory layout of the structure,
       the in-file layout is in the file, so it doesn't need to be re-
       created) */

    file = H5Fopen( "ctest.h5", H5F_ACC_RDONLY, H5P_DEFAULT );
    dset = H5Dopen2( file, "test set", H5P_DEFAULT );
    space = H5Dget_space( dset );
    ndims = H5Sget_simple_extent_dims( space, dims, NULL );
    H5Dread( dset, C_memtype, H5S_ALL, H5S_ALL, H5P_DEFAULT, &s );

    H5Dclose( dset );
    H5Sclose( space );
    H5Tclose( C_memtype );
    H5Tclose( A_memtype );
    H5Fclose( file );

    /* Show that the structure again contains the original data */

    printf( "Finally: { %d, { %d, %d } }\n",
            s.offset, s.element.x, s.element.y );

    return 0;
}
--------8<----------------------------------

Please keep in mind that this was my first attempt at trying to
deal with compounds containing compounds, so there might be some
bugs lurking...
                            Best regards, Jens

On Fri, Apr 08, 2011 at 02:59:59PM +0100, Martin Galpin wrote:
--
  \ Jens Thoms Toerring ________ jt@toerring.de
   \_______________________________ http://toerring.de