Structure Packing

I am writing an array of structs to an HDF5 dataset. The definition of
the struct is

struct X
{
    int64_t index;
    int32_t code;
};

I have been a Win32 programmer, and the code base (in use for 5 years)
has run exclusively on MS Windows.

I am now working in Linux, as well, and have a problem with structure
packing.

In MS, the sizeof for the struct is 16, with 4 wasted bytes between the
end of index and beginning of code.
In gcc, the sizeof is 12.

All of the data written to files includes the wasted space. Upon
reading the data in Linux, however, I overrun the end of the buffer,
since I am only allocating 12 bytes per item and reading 16 into it.
This, of course, causes "unexpected results".

In my Linux setup, I could set the packing to 8 with -fpack-struct, but
I am wondering if this is the right approach. I need to be able to
retrieve data already written to files.

Here is the definition of the HDF5 datatype, including a previous
comment about gcc.

        hid_t ti_type = H5Tcreate(H5T_COMPOUND, sizeof( BHTagItem ) );
        
        // gcc will NOT allow this offsetof without a warning. I don't
        // know why. So I'll try a different tack.
        //size_t index_offset = offsetof( HDF_Boeing::BHTagItem, index_ );
        //size_t code_offset = offsetof( HDF_Boeing::BHTagItem, code_ );
        size_t index_offset = 0;
        size_t code_offset = sizeof(uint64_t);
        H5Tinsert( ti_type, "index", index_offset, H5T_NATIVE_UINT64 );
        H5Tinsert( ti_type, "code" , code_offset, H5T_NATIVE_UINT32 );
        return ti_type;

Any ideas?

George Lewandowski
(314)777-7890
Mail Code S270-2204
Building 270-E Level 2E Room 20E
P-8A


----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.

Hi George,

On Thursday 07 February 2008, Lewandowski, George wrote:

I am writing an array of structs to an HDF5 dataset. The definition
of the struct is

struct X
{
    int64_t index;
    int32_t code;
};

I have been a Win32 programmer, and the code base (in use for 5
years) has run exclusively on MS Windows.

I am now working in Linux, as well, and have a problem with structure
packing.

In MS, the sizeof for the struct is 16, with 4 wasted bytes between
the end of index and beginning of code.
In gcc, the sizeof is 12.

All of the data written to files includes the wasted space. Upon
reading the data in Linux, however, I overrun the end of the buffer,
since I am only allocating 12 bytes per item and reading 16 into it.
This, of course, causes "unexpected results".

Mmmm, not necessarily. HDF5 is pretty smart about converting data on
disk to memory structures. The only thing is that you should make sure
your memory structures are correctly defined, and perhaps this is
the tricky part in your case.

In my Linux setup, I could set the packing to 8 with -fpack-struct,
but I am wondering if this is the right approach. I need to be able
to retrieve data already written to files.

Here is the definition of the HDF5 datatype, including a previous
comment about gcc.

        hid_t ti_type = H5Tcreate(H5T_COMPOUND, sizeof( BHTagItem ) );

        // gcc will NOT allow this offsetof without a warning. I don't
        // know why. So I'll try a different tack.
        //size_t index_offset = offsetof( HDF_Boeing::BHTagItem, index_ );
        //size_t code_offset = offsetof( HDF_Boeing::BHTagItem, code_ );
        size_t index_offset = 0;
        size_t code_offset = sizeof(uint64_t);
        H5Tinsert( ti_type, "index", index_offset, H5T_NATIVE_UINT64 );
        H5Tinsert( ti_type, "code" , code_offset, H5T_NATIVE_UINT32 );
        return ti_type;

I think this should work, provided that BHTagItem is the 12-byte
structure where "index" and "code" are kept. Does sizeof(BHTagItem)
return 12 on your Linux box?


--
Francesc Altet      http://www.carabos.com/
Cárabos Coop. V.    "Enjoy Data"


Hello, George

I am writing an array of structs to an HDF5 dataset. The definition of
the struct is

struct X
{
   int64_t index;
   int32_t code;
};

I have been a Win32 programmer, and the code base (in use for 5 years)
has run exclusively on MS Windows.

I am now working in Linux, as well, and have a problem with structure
packing.

In MS, the sizeof for the struct is 16, with 4 wasted bytes between the
end of index and beginning of code.
In gcc, the sizeof is 12.

All of the data written to files includes the wasted space. Upon
reading the data in Linux, however, I overrun the end of the buffer,
since I am only allocating 12 bytes per item and reading 16 into it.
This, of course, causes "unexpected results".

In my Linux setup, I could set the packing to 8 with -fpack-struct, but
I am wondering if this is the right approach. I need to be able to
retrieve data already written to files.

Here is the definition of the HDF5 datatype, including a previous
comment about gcc.

       hid_t ti_type = H5Tcreate(H5T_COMPOUND, sizeof( BHTagItem ) );
       
       // gcc will NOT allow this offsetof without a warning. I don't
       // know why. So I'll try a different tack.
       //size_t index_offset = offsetof( HDF_Boeing::BHTagItem, index_ );
       //size_t code_offset = offsetof( HDF_Boeing::BHTagItem, code_ );
       size_t index_offset = 0;
       size_t code_offset = sizeof(uint64_t);
       H5Tinsert( ti_type, "index", index_offset, H5T_NATIVE_UINT64 );
       H5Tinsert( ti_type, "code" , code_offset, H5T_NATIVE_UINT32 );
       return ti_type;

Any ideas?

George Lewandowski
(314)777-7890
Mail Code S270-2204
Building 270-E Level 2E Room 20E
P-8A


I ran into a similar issue when I implemented the HDF5 Table API

The Table API also uses the H5T_COMPOUND HDF5 type

This page has a section called "Alignment of Structure Fields" that briefly explains the issue

http://hdfgroup.org/HDF5/Tutor/h5table.html

If you want to take a look at how the Table API deals with this, check the code of

/hl/examples/ex_table_01.c

And further the code of the functions H5TBmake_table and H5TBread_table in

/hl/src/H5TA.c

This is the 1.6.7 HDF5 version

http://www.hdfgroup.org/HDF5/doc/HL/RM_H5TB.html

All the examples on the Table API use this structure

typedef struct Particle
{
  char name[16];
  int lati;
  int longi;
  float pressure;
  double temperature;
} Particle;

Here is how you should define your HDF5 code so that your structure is portable:

get the size of the structure in memory

size_t dst_size = sizeof( Particle );

the offsets

size_t dst_offset[NFIELDS] = { HOFFSET( Particle, name ),
                                HOFFSET( Particle, lati ),
                                HOFFSET( Particle, longi ),
                                HOFFSET( Particle, pressure ),
                                HOFFSET( Particle, temperature )};

Define an array of HDF5 types, in this case

hid_t field_type[NFIELDS];
hid_t string_type = H5Tcopy( H5T_C_S1 );
H5Tset_size( string_type, 16 );
field_type[0] = string_type;
field_type[1] = H5T_NATIVE_INT;
field_type[2] = H5T_NATIVE_INT;
field_type[3] = H5T_NATIVE_FLOAT;
field_type[4] = H5T_NATIVE_DOUBLE;

This is what you need to define your structure and save it to disk. This is done inside the H5TBmake_table function

You define an HDF5 type with the above memory size

/* Create the memory data type. */
if ((mem_type_id = H5Tcreate (H5T_COMPOUND, dst_size )) < 0 )
  return -1;

Call H5Tinsert on this type with the above arrays (you need an array of names too)

/* Insert fields. */
for ( i = 0; i < nfields; i++)
{
  if ( H5Tinsert(mem_type_id, field_names[i], field_offset[i], field_types[i] ) < 0 )
   return -1;
}

Then call H5Dcreate and H5Dwrite using this type

I also use mostly Windows for development. In Visual Studio 6, the size of that structure is 40 bytes in memory (instead of the expected 36 bytes if you sum the type sizes). 4 bytes of padding are inserted after "pressure" so that "temperature" (an 8-byte double) starts on an 8-byte boundary in memory.

This layout is what H5TBmake_table then saves. Optionally one could pack this structure using the HDF5 function H5Tpack(), but this is not done by the Table API

When reading, as Francesc mentioned before, you should recreate the in-memory datatypes

This is done inside H5TBread_table and more specifically inside H5TB_create_type, which creates a "native" datatype on the reading machine

For this native type you must get the sizes of your members in the reading machine's memory. I use an array

size_t dst_sizes[NFIELDS] = { sizeof( dst_buf[0].name),
                               sizeof( dst_buf[0].lati),
                               sizeof( dst_buf[0].longi),
                               sizeof( dst_buf[0].pressure),
                               sizeof( dst_buf[0].temperature)};

that is passed to H5TBread_table

Inside H5TBread_table there is a call to a private function called H5TB_create_type, where a call is made to the HDF5 function H5Tget_native_type to check whether the size needs to be adjusted for the reading machine

you need to do something like this

/* get each field ID and adjust its size, if necessary */
for ( i=0; i<nfields; i++)
{
  if ((mtype_id=H5Tget_member_type(ftype_id,i))<0)
   goto out;
  if ((nmtype_id=H5Tget_native_type(mtype_id,H5T_DIR_DEFAULT))<0)
   goto out;
  size_native=H5Tget_size(nmtype_id);
  if (dst_sizes[i]!=size_native)
  {
   if (H5Tset_size(nmtype_id,dst_sizes[i])<0)
    goto out;
  }
  if (H5Tinsert(mem_type_id,fnames[i],dst_offset[i],nmtype_id) < 0 )
   goto out;
  if (H5Tclose(mtype_id)<0)
   goto out;
  if (H5Tclose(nmtype_id)<0)
   goto out;
}

So, basically, use the offsets to write, use the sizeof and H5Tget_native_type to read

so, in your code, don't use this

size_t code_offset = sizeof(uint64_t);

but rather

size_t code_offset = HOFFSET( X, code );

hope that helps; let us know if you have more specific code questions!


At 12:59 PM 2/7/2008, Lewandowski, George wrote:

-----------------------------------------------------------
Pedro


On Friday 08 February 2008, you wrote:

On the Windows box, a file created shows this H5dump result on the
datatype:

   DATATYPE H5T_COMPOUND {
      H5T_STD_U64LE "index";
      H5T_STD_U32LE "code";
   }

And H5Tget_size gives a size of 16.

In Windows the sizeof is 16 and on Linux the sizeof is 12.

If I had used H5Tpack on the compound type, I am sure that 12
bytes/struct would be stored, but on Windows my raw buffer would have
16/struct and on Linux it would have 12, and how would HDF know the
difference?

I guess I will have to check the size in HDF with H5Tget_size, and
the size in memory with sizeof() and do conversions as necessary.
But I would like to know, for future reference how HDF handles these
things.

In principle, you shouldn't need to do conversions manually. I think
the only problem is defining a correct datatype on the reading
machine. You have told us that you create this type like this:

        hid_t ti_type = H5Tcreate(H5T_COMPOUND, sizeof( BHTagItem ) );
        size_t index_offset = 0;
        size_t code_offset = sizeof(uint64_t);
        H5Tinsert( ti_type, "index", index_offset, H5T_NATIVE_UINT64 );
        H5Tinsert( ti_type, "code" , code_offset, H5T_NATIVE_UINT32 );
        return ti_type;

but you have not specified whether you do it the same way on the
writing machine and the reading machine. My guess is that, on the
reading machine, you are probably passing H5Tget_size to H5Tcreate
instead of sizeof(BHTagItem), and this is why you are creating a
16-byte datatype in memory instead of the correct 12-byte one. If this
is the case, my advice is to use sizeof(BHTagItem) on the reading
machine as well. This should be enough to allow HDF5 to figure out
where it has to place the read data.

If this does not work for you, then you are facing some other problem I
can't figure out right now.

At any rate, I regularly map compound types on-disk with padding and
read them in structures in-memory with no padding using the technique
above (i.e. always specifying the size of your in-memory datatype) with
no problem.

HTH,


--
Francesc Altet      http://www.carabos.com/
Cárabos Coop. V.    "Enjoy Data"


On Friday 08 February 2008, Lewandowski, George wrote:

Ah, this is very interesting. I checked, and I am using sizeof() to
create the datatype. However, I am committing the datatype to the
file, and using that committed type to read in the data. I was under
the impression that this was the safer way to do it. Perhaps I have
to rethink this.

Yes, absolutely. It is inadvisable to reuse on-disk datatypes for
your in-memory ones. If you want maximum portability, you should
recreate the in-memory datatypes adapted to the reading platform.

So, just use:

        hid_t ti_type = H5Tcreate(H5T_COMPOUND, sizeof( BHTagItem ) );
        size_t index_offset = 0;
        size_t code_offset = sizeof(uint64_t);
        H5Tinsert( ti_type, "index", index_offset, H5T_NATIVE_UINT64 );
        H5Tinsert( ti_type, "code" , code_offset, H5T_NATIVE_UINT32 );

for ti_type in your Linux box, and you should be able to read your data
without problems.

Regards,


--
Francesc Altet      http://www.carabos.com/
Cárabos Coop. V.    "Enjoy Data"
