incorrect endianness when writing big-endian data on little-endian systems

I believe that we've encountered a bug in HDF5.

Our application receives data from a socket and writes it to a file using packet tables. The incoming data is in network byte order (big-endian), and all of the datatypes we specify for the packet tables are the corresponding big-endian types (e.g., H5T_STD_U32BE). To reduce overhead, we do not do any byte swapping before writing the buffer data.
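A stripped-down sketch of the pattern (identifiers and the chunk size here are illustrative, not our production code):

#include <stdint.h>
#include "hdf5.h"
#include "hdf5_hl.h"

/* Illustrative only: append 32-bit words exactly as they arrived from the
   socket, declaring them big-endian so the library should leave them alone. */
static void write_raw_words(hid_t loc_id, const uint32_t *net_buf, size_t nwords)
{
    hid_t pt = H5PTcreate_fl(loc_id, "raw_words", H5T_STD_U32BE, 512, -1);
    if (pt >= 0) {
        H5PTappend(pt, nwords, net_buf);  /* no byte swap on our side */
        H5PTclose(pt);
    }
}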

When we were using HDF5 1.8.14, this produced correct files when the application ran on a little-endian system. We've updated to 1.8.16, and now the files are incorrect: specifying big-endian datatypes causes the data to get byte-swapped (even though it's already big-endian), while specifying little-endian datatypes does not cause any byte swapping. I have also reproduced the problem with 1.8.17 and 1.10.0 (patch 1), on both Windows and Linux.

I can't find anything in the release notes about this change in behavior. We can revert to 1.8.14 for now, but we've moved to Visual Studio 2015 for our Windows builds, which means we have to patch the HDF5 source before we can build it.

Is there any way to indicate that the buffer being passed to AppendPackets (we're using the C++ API; the corresponding C function is H5PTappend) is already big-endian? We cannot afford the overhead of two byte-swap operations when the incoming data is already in the correct byte order.
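For reference, our append path looks roughly like this (simplified, with illustrative names; packet_type is the compound type built from H5T_STD_*BE members and net_buf is the untouched socket buffer):

// Roughly what we call today via the C++ packet table API (sketch only;
// older headers take a non-const char* name, hence the const_cast).
FL_PacketTable table(group_id, const_cast<char *>("meas_int32"), packet_type, 10);
if (table.IsValid())
    table.AppendPackets(1, net_buf);  // raw network-order data, unswapped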

Barbara Jones
Software Engineer
5425 Warner Rd. | Suite 13 | Valley View, OH 44125 | http://www.vtiinstruments.com
P: +1.216.447.8950 x2011 | F: +1.216.447.8951 | barbara.jones@ametek.com

Hi Barbara,

Are you using a packed struct? We are aware of an issue where using a packed struct alters the layout of the data, and we are working on a fix for it. We think it should work if the struct is *not* packed.
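To illustrate the kind of mismatch we mean (a minimal sketch, not HDF5-specific): packing removes the padding the compiler would otherwise insert, so member offsets and the overall size no longer match what an unpacked layout, and any code computing offsets against it, expects.

#include <cstdint>
#include <cstdio>

// Natural layout: the compiler pads after 'info' to align 'stream_id'.
struct HeaderUnpacked {
    std::uint16_t info;
    std::uint32_t stream_id;
};

// Packed layout: no padding, so every offset after 'info' shifts by two bytes.
#pragma pack(push, 1)
struct HeaderPacked {
    std::uint16_t info;
    std::uint32_t stream_id;
};
#pragma pack(pop)

int main() {
    // Typically prints 8 and 6: offsets computed against one layout
    // read the wrong bytes when applied to the other.
    std::printf("unpacked: %zu, packed: %zu\n",
                sizeof(HeaderUnpacked), sizeof(HeaderPacked));
}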

If this is not the issue, can you send us a sample application that we can use to reproduce it? You can send it to: help@hdfgroup.org

Thanks!
-Barbara
help@hdfgroup.org


Hi Barbara,

I'm not terribly familiar with the HDF terminology, since a coworker wrote our original implementation.

It will take a day or two for us to put together a sample application that reproduces the problem, since we'll need to include a dump of the raw data from our instruments. Until then, here's the code that we're using to define the packet table:
/* Insert one field into the compound type 'complex' at the running offset,
   then advance the offset by the size of the (big-endian) field type. */
#define INSERT_PACKET(NAME, DATA_TYPE)                 \
  do {                                                 \
    err = H5Tinsert(complex, NAME, offset, DATA_TYPE); \
    if (err < 0) {                                     \
      return err;                                      \
    }                                                  \
    offset += H5Tget_size(DATA_TYPE);                  \
  } while (0)

/* Builds the fixed packet-header portion of the compound type 'complex';
   returns the total header size in bytes (28 bytes, i.e. seven 32-bit words). */
inline size_t FramePacketHeader(hid_t complex)
{
    size_t offset = 0;
    herr_t err;

    INSERT_PACKET(HDFSTREAMING_PKT_INFO_STR, H5T_STD_U16BE);
    INSERT_PACKET(HDFSTREAMING_PKT_SIZE_STR, H5T_STD_U16BE);
    INSERT_PACKET(HDFSTREAMING_PKT_STREAM_ID_STR, H5T_STD_U32BE);
    INSERT_PACKET(HDFSTREAMING_PKT_OUI_STR, H5T_STD_U32BE);

    INSERT_PACKET(HDFSTREAMING_PKT_INFO_CLASS_CODE_STR, H5T_STD_U16BE);
    INSERT_PACKET(HDFSTREAMING_PKT_CLASS_CODE_STR, H5T_STD_U16BE);
    INSERT_PACKET(HDFSTREAMING_PKT_TIME_SEC_STR, H5T_STD_U32BE);
    INSERT_PACKET(HDFSTREAMING_PKT_TIME_FRAC_UPPER_STR, H5T_STD_U32BE);
    INSERT_PACKET(HDFSTREAMING_PKT_TIME_FRAC_LOWER_STR, H5T_STD_U32BE);
    return offset;
}

/* Builds the compound type for an int32 measurement packet of 'packetsize'
   32-bit words: the fixed header, an array of samples, and a trailer word. */
inline hid_t cdt_packet_class_meas_int32(uint32_t packetsize)
{
  herr_t err;
  size_t totsize = sizeof(uint32_t) * packetsize;
  hid_t complex = H5Tcreate(H5T_COMPOUND, totsize);

  size_t offset = FramePacketHeader(complex);
  hsize_t EU_size[] = { packetsize - PACKET_HEADER_SIZE };
  hid_t arr = H5Tarray_create(H5T_STD_U32BE, 1, EU_size);
  err = H5Tinsert(complex, HDFSTREAMING_SAMPLES_STR, offset, arr);
  H5Tclose(arr); /* the compound type keeps its own copy of the array type */
  if (err < 0) {
    return err;
  }

  err = H5Tinsert(complex, HDFSTREAMING_TRAILER_STR, (packetsize - 1) * sizeof(uint32_t), H5T_STD_U32BE);
  if (err < 0) {
    return err;
  }
  return complex;
}

And here's the code we're using to create the packet table and append data to it:
if (xff->channels[ch].ptable_ids[ptype] < 0) {
    /* Lazily create the packet table for this packet class, using the
       big-endian compound type built above (chunk size 10, no compression). */
    xff->channels[ch].ptable_ids[ptype] = H5PTcreate_fl(xff->channels[ch].group_id, packet_class_names[ptype], type, 10, -1);
}
/* Append one packet straight from the network buffer -- no byte swap. */
H5PTappend(xff->channels[ch].ptable_ids[ptype], 1, &buffer);
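In the sample application we'll also check the return codes; a sketch of the guarded append (H5PT calls return a negative value on failure):

/* Sketch only: treat a negative return from H5PTappend as a failure. */
if (H5PTappend(xff->channels[ch].ptable_ids[ptype], 1, &buffer) < 0) {
    fprintf(stderr, "H5PTappend failed for packet type %d\n", ptype);
}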

Regards,
Barbie
(Using my nickname here to avoid confusion due to the name collision.)
