fill values, compound types and padding

I made the unfortunate mistake of not accounting for compiler padding in the structures I use for HDF5 compound types. The even more unfortunate result is that elements holding the fill value do not compare equal to the fill value, because the pad bytes have different values. This seems to occur only when a dataset has fill values explicitly written to disk as part of an output buffer.

My investigation suggests that the pad bytes are being stored in the HDF5 file, which surprises me a bit. Is that the case? Should I pay more attention to laying out my structures to minimize padding?
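For anyone who wants to see the padding in question, here is a minimal sketch in plain C (no HDF5 calls). The record_t structure and the record_* helpers are hypothetical names made up for illustration; they are not the structures from my code. It prints the member offsets and zeroes the whole structure before assigning members, so the pad bytes start out with a known value. One caveat: the C standard leaves pad bytes unspecified after a member store, so this relies on what common compilers do in practice rather than on a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record, not the structure from this thread. */
typedef struct {
    char   flag;   /* 1 byte, usually followed by alignment padding */
    double value;  /* usually aligned to 4 or 8 bytes */
    int    count;  /* may be followed by tail padding */
} record_t;

/* Number of bytes in record_t that belong to no member. */
static size_t record_pad_bytes(void)
{
    return sizeof(record_t) - (sizeof(char) + sizeof(double) + sizeof(int));
}

/* Print each member's offset so the gaps between members are visible. */
static void record_print_layout(void)
{
    printf("flag  @ %zu\n", offsetof(record_t, flag));
    printf("value @ %zu\n", offsetof(record_t, value));
    printf("count @ %zu\n", offsetof(record_t, count));
    printf("sizeof = %zu, padding = %zu\n",
           sizeof(record_t), record_pad_bytes());
}

/* Zero every byte first (pad bytes included), then assign the members.
 * In practice this leaves the pad bytes at zero; the C standard does
 * not strictly promise that. */
static void record_init(record_t *r, char flag, double value, int count)
{
    assert(r != NULL);
    memset(r, 0, sizeof *r);
    r->flag  = flag;
    r->value = value;
    r->count = count;
}
```

Calling record_print_layout() on a given platform shows where the gaps fall, and two records set up through record_init() with the same member values should then also match byte for byte with common compilers.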

As a follow-up to my original message (I'm really hoping for an answer at this point, as this is proving to be quite a pain): I've found that if I write a file on a little-endian host and read it on a big-endian host, the fill value comes back with random values in the pad bytes. Here's an example:

Fill value buffer just prior to the call to H5Pset_fill_value when creating the dataset on the little-endian host:
   0000: 9f 86 01 00 00 00 00 00 00 00 00 00 00 00 f8 ff ................
   0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
   0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
   0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
   0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
   0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f8 ff ................
   0060: 00 00 00 00 00 00 f8 ff 00 00 00 00 00 00 f8 ff ................
   0070: 00 00 00 00 00 00 f8 ff 00 00 00 00 00 00 f8 ff ................
   0080: 00 00 00 00 00 00 f8 ff 00 00 00 00 00 00 f8 ff ................
   0090: 00 00 00 00 00 00 f8 ff 00 00 c0 ff 00 00 00 00 ................

Fill value as read using H5Pget_fill_value on the big-endian host (note that the fill value buffer was all 0x00 before the function call):
   0000: 00 01 86 9f 00 00 00 00 ff f8 00 00 00 00 00 00 ................
   0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
   0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
   0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
   0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0 ................
   0050: 00 00 00 00 00 00 00 50 ff f8 00 00 00 00 00 00 .......P........
   0060: ff f8 00 00 00 00 00 00 ff f8 00 00 00 00 00 00 ................
   0070: ff f8 00 00 00 00 00 00 ff f8 00 00 00 00 00 00 ................
   0080: ff f8 00 00 00 00 00 00 ff f8 00 00 00 00 00 00 ................
   0090: ff f8 00 00 00 00 00 00 ff c0 00 00 00 17 90 38 ...............8

The pad bytes are at offsets 0x4f, 0x54-0x57, and 0x9c-0x9f. Those bytes were all zeroes when the dataset was created (first hex dump) and again just prior to retrieving the fill value, so it looks like somewhere deep in the bowels of the HDF5 library these pad bytes are being filled with garbage.

How can I get the fill value and element values out of a dataset without these pad bytes being filled with random data?
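One workaround that avoids depending on the pad bytes at all is to compare member by member instead of using memcmp over the whole structure. The following is only a sketch under assumptions: sample_t and the sample_* helpers are hypothetical, and the fill values (99999 and NaN) are chosen because the dumps above appear to contain them. Since NaN never compares equal to itself with ==, the floating-point members are compared by their bit patterns; this assumes the NaN bits survive the round trip unchanged.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical element type, not the structure from this thread. */
typedef struct {
    int    id;
    double reading;
    float  quality;
} sample_t;

/* Bitwise comparison of a single member: NaN matches an identical NaN. */
static bool same_bits(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n) == 0;
}

/* Compare only the members; pad bytes are never inspected. */
static bool sample_equals(const sample_t *a, const sample_t *b)
{
    assert(a != NULL && b != NULL);
    return a->id == b->id
        && same_bits(&a->reading, &b->reading, sizeof a->reading)
        && same_bits(&a->quality, &b->quality, sizeof a->quality);
}

/* Assign an illustrative fill value; pad bytes are left untouched. */
static void sample_set_fill(sample_t *s)
{
    assert(s != NULL);
    s->id      = 99999;
    s->reading = NAN;
    s->quality = NAN;
}
```

With this, an element read back from the file can be tested with sample_equals(&element, &fill) regardless of what ended up in the padding.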

On 05/30/2012 05:27 PM, John K wrote:

I've made the unfortunate mistake of failing to take compiler padding into account in the structures I've been using for HDF5 compound types. This has had the even more unfortunate result of making fill value elements not actually compare with the fill value (due to the pad bytes having different values, though this seems to occur only when a dataset has fill values explicitly written to disk as part of an output buffer).

My investigation has given indications that the pad bytes are being stored in the HDF5 file, which surprises me a bit. Is that the case? Should I pay rather more attention to optimizing my structures to mitigate padding?

In the event anyone else runs into the issue I've been posting about, here's a patch. Basically, it clears (memsets all bytes to zero) a buffer that is allocated for type conversion when getting the fill value of a dataset. Without that, any compiler-inserted pad bytes end up being whatever random bits happen to be in memory when bkg is allocated.

diff -rupN orig/src/H5Pdcpl.c hdf5-1.8.9/src/H5Pdcpl.c
--- orig/src/H5Pdcpl.c 2012-05-09 10:05:58.000000000 -0500
+++ hdf5-1.8.9/src/H5Pdcpl.c 2012-06-01 18:03:14.156605097 -0500
@@ -1557,8 +1557,11 @@ H5P_get_fill_value(H5P_genplist_t *plist
       */
      if(H5T_get_size(type) >= H5T_get_size(fill.type)) {
          buf = value;
-         if(H5T_path_bkg(tpath) && NULL == (bkg = H5MM_malloc(H5T_get_size(type))))
-             HGOTO_ERROR(H5E_PLIST, H5E_CANTALLOC, FAIL, "memory allocation failed for type conversion")
+         if(H5T_path_bkg(tpath)) {
+             if(NULL == (bkg = H5MM_malloc(H5T_get_size(type))))
+                 HGOTO_ERROR(H5E_PLIST, H5E_CANTALLOC, FAIL, "memory allocation failed for type conversion")
+             HDmemset(bkg, 0, H5T_get_size(type));
+         }
      } /* end if */
      else {
          if(NULL == (buf = H5MM_malloc(H5T_get_size(fill.type))))