Issues with H5T_NATIVE_LDOUBLE

Andrea_Bedini · September 3, 2013, 5:30am

Hi,

I am experiencing the following issue with HDF5 and gcc 4.8.0.

Consider this very simple test:

#include <stdio.h>  /* printf */
#include <hdf5.h>

int main(void) {
  switch (H5Tget_order(H5T_NATIVE_LDOUBLE)) {
  case H5T_ORDER_LE:
    printf("H5Tget_order(H5T_NATIVE_LDOUBLE) = H5T_ORDER_LE\n");
    break;
  case H5T_ORDER_BE:
    printf("H5Tget_order(H5T_NATIVE_LDOUBLE) = H5T_ORDER_BE\n");
    break;
  case H5T_ORDER_VAX:
    printf("H5Tget_order(H5T_NATIVE_LDOUBLE) = H5T_ORDER_VAX\n");
    break;
  case H5T_ORDER_MIXED:
    printf("H5Tget_order(H5T_NATIVE_LDOUBLE) = H5T_ORDER_MIXED\n");
    break;
  case H5T_ORDER_NONE:
    printf("H5Tget_order(H5T_NATIVE_LDOUBLE) = H5T_ORDER_NONE\n");
    break;
  default:
    printf("here are dragons\n");
  }
  return 0;
}

On the same x86_64 GNU/Linux machine I get:

$ hdf5-1.8.11-gcc-4.7.0/my_test # compiled with gcc 4.7.0
H5Tget_order(H5T_NATIVE_LDOUBLE) = H5T_ORDER_LE

$ hdf5-1.8.11-gcc-4.8.0/my_test # compiled with gcc 4.8.0
H5Tget_order(H5T_NATIVE_LDOUBLE) = H5T_ORDER_VAX

So H5T_NATIVE_LDOUBLE is mis-detected. I tried to dig deeper, and the fault
must be in src/H5detect.c, which is used to generate the definitions in
src/H5Tinit.c.
I could not figure out what H5detect.c does wrong (it is not very readable,
given its extensive use of macros), but the compiler does emit a lot of
warnings (see https://gist.github.com/andreabedini/6419975).

I think this must be related to the failure of the dt_arith long double test
observed recently.

Any suggestions on how to fix this?

Best wishes,
Andrea

--
Andrea Bedini <andrea.bedini@gmail.com>

Andrea_Bedini · September 3, 2013, 8:32am

Hi there,

I think I have found the problem. The issue is in H5detect.c.
The macros DETECT_F and DETECT_I do not properly initialize the perm field in
the detected_t struct. As a result, the routine fix_order is passed
uninitialized memory, which makes it fail. I have a small patch against
H5detect.c that fixes the problem by simply initializing the perm field
with zeros. Valgrind's memcheck tool would have exposed the problem.

Best wishes,
Andrea

hdf5_uninitialized.patch (1.28 KB)

Raymond_Lu · September 3, 2013, 7:43pm

Andrea,

We've verified that your solution is correct. We're putting your fix into the library. Thanks for helping us.

Ray

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org

Andrea_Bedini · September 3, 2013, 10:00pm

Hi Ray,

thanks for taking a look. Antonio pointed out to me that something else
might be at work, since the macro DETECT_F already zeroes the structure
before anything else:

memset(&INFO, 0, sizeof(INFO)); #L299

so I don't understand why the perm field needs to be zeroed again around
line #L308. This is still treating the "Byte Order" loop as a black box.

As a side question: isn't there a more portable way of doing this? I
suspect H5detect.c invokes a fair amount of undefined behaviour, given
the number of warnings the compiler generates and the amount of bit trickery.

Best wishes,
Andrea

Andrea_Bedini · September 4, 2013, 12:50am

Hi,

I found something else (I know, I should stop :)). I am not entirely sure,
but it seems that when H5detect fails it writes "unable to calculate
alignment for long double" to stderr, so this message should be visible
in build logs (although buried among other warnings). The Debian packages
in sid and testing for both i386 and x86-64 seem to be affected:

https://buildd.debian.org/status/fetch.php?pkg=hdf5&arch=amd64&ver=1.8.11-3%2Bb1&stamp=1377024563
https://buildd.debian.org/status/fetch.php?pkg=hdf5&arch=i386&ver=1.8.11-3%2Bb1&stamp=1377025110

But here's the exciting part: look what I found

http://www.unidata.ucar.edu/mailing_lists/archives/gembud/2010/msg00052.html

It's a build log from 2010 for HDF5 v1.6.5 and gcc-4.4.3 that says "unable
to calculate alignment for long double".

If my understanding is correct, neither 1.8.11 nor gcc 4.8.0 is the
problem; rather, that piece of code just doesn't work properly.

Best wishes,
Andrea

gnwiii · September 4, 2013, 12:58pm

Another historical reference to the obscurity of this code is bug 118777,
"sci-libs/hdf5-1.6.2 (and 1.6.4) cannot build because of Bus error in H5detect".

I've been building HDF5 libraries for use with NASA SeaDAS, and recently
have started using HDF5 with R and GDAL. The SeaDAS builds are static, and
I don't find the "unable to calculate alignment for long double" message in
my SeaDAS build logs on Linux and OS X. For R and GDAL, however, I need
dynamic libraries, and those build logs do have the "unable to calculate
alignment for long double" message on both Linux and OS X.

--
George N. White III <aa056@chebucto.ns.ca>
Head of St. Margarets Bay, Nova Scotia

Andrea_Bedini · September 5, 2013, 1:40am

Thanks George.

For anyone interested in debugging this problem, Debian has an extensive
collection of hdf5 build logs covering many architectures
(going back 12 years!).

As far as I know, the corruption is limited to the H5T_NATIVE_LDOUBLE type.
You can check your particular build with the following test:

#include <hdf5.h>
int main() {
  return !(H5Tget_order(H5T_NATIVE_LDOUBLE) == H5Tget_order(H5T_NATIVE_DOUBLE));
}

It exits with code 1 if long double has a different byte order than
double (which is technically possible, but highly suspicious).

Otherwise the patch I sent earlier in this thread seems to do the trick,
although exactly what is going wrong is still beyond my understanding.

Third option: you can define an equivalent of H5T_NATIVE_LDOUBLE yourself.
The following creates a datatype representing a long double as implemented
by gcc on x86 architectures (see the Wikipedia article on long double for
details):

hid_t ldouble_datatype = H5Tcopy(H5T_NATIVE_DOUBLE);
H5Tset_size(ldouble_datatype, sizeof(long double));
H5Tset_precision(ldouble_datatype, 80);
H5Tset_fields (ldouble_datatype, 79, 64, 15, 0, 64);
H5Tset_pad(ldouble_datatype, H5T_PAD_ZERO, H5T_PAD_ZERO);
H5Tset_inpad(ldouble_datatype, H5T_PAD_ZERO);
H5Tset_ebias(ldouble_datatype, 16383);
H5Tset_norm(ldouble_datatype, H5T_NORM_NONE);

Best wishes,
Andrea

Raymond_Lu · September 5, 2013, 8:40pm

I isolated part of DETECT_F into a C program, attached as detect.c. It contains only the algorithm for detecting the byte order of long double. When I compile it with gcc -g, -O0, or no flags, it reports little-endian. When I compile it with -O1, -O2, or -O3, it reports VAX order. I don't know what goes wrong yet, but I suspect GCC's optimization has bugs. Maybe you can help me.

I haven't tried the other algorithms in DETECT_F yet. The alignment problem you mentioned is handled by one of them.

Ray

detect.c (3.73 KB)

···

On Sep 4, 2013, at 8:40 PM, Andrea Bedini wrote:

Thanks George.

For anyone interested in debugging this problem, debian has an extensive collection of build logs over many architectures Build logs for hdf5 (going back 12 years!)

As far as I know, the corruption is limited to the H5T_NATIVE_LDOUBLE type. You can check your particular build with the following test

#include <hdf5.h>
int main() {
  return !(H5Tget_order(H5T_NATIVE_LDOUBLE) == H5Tget_order(H5T_NATIVE_DOUBLE));
}

It exits with code 1 if the long double has different byte ordering than double (which is technically possible, but highly suspicious).

Otherwise the patch I sent earlier in this thread seems to do the trick, although what exactly is going wrong is still beyond my understanding.

Third option: you can define an equivalent of H5T_NATIVE_LDOUBLE yourself. The following creates a data type representing a long double as implemented by gcc on x86 architectures (see long double - Wikipedia for details)

hid_t ldouble_datatype = H5Tcopy(H5T_NATIVE_DOUBLE);
H5Tset_size(ldouble_datatype, sizeof(long double));
H5Tset_precision(ldouble_datatype, 80);
H5Tset_fields (ldouble_datatype, 79, 64, 15, 0, 64);
H5Tset_pad(ldouble_datatype, H5T_PAD_ZERO, H5T_PAD_ZERO);
H5Tset_inpad(ldouble_datatype, H5T_PAD_ZERO);
H5Tset_ebias(ldouble_datatype, 16383);
H5Tset_norm(ldouble_datatype, H5T_NORM_NONE);

Best wishes,
Andrea

On 4 September 2013 22:58, George N. White III <gnwiii@gmail.com> wrote:
Another historical reference to the obscurity of this code is: <118777 – sci-libs/hdf5-1.6.2 (and 1.6.4) cannot build because of Bus error in H5detect.

I've been building HDF5 libraries for use with NASA SeaDAS, and recently have started using HDF5 with R and GDAL. The SeaDAS builds are static, and I don't find the "unable to calculate alignment for long double" message in my SeaDAS build logs on linux and OS X. For R and GDAL, however, I need dynamic libraries and those build logs do have the "unable to calculate alignment for long double" message on both linux and OS X.

On Tue, Sep 3, 2013 at 9:50 PM, Andrea Bedini <andrea.bedini@gmail.com> wrote:
Hi,

I found something else (I know, I should stop :)). I am not entirely sure but it seems that when H5detect fails it writes "unable to calculate alignment for long double" on stderr so this message should be observable on build logs (although buried by other warnings). The packages on debian sid and testing for both i386 and x86-64 seem to be affected:

Build log for hdf5 (1.8.11-3+b1) on amd64
Build log for hdf5 (1.8.11-3+b1) on i386

But here's the exciting part: look what I found

[gembud] Problems compiling Gempak 5.11.4 on Fedora 12

It's a build log from 2010 for HDF5 v1.6.5 and gcc-4.4.3 that says "unable to calculate alignment for long double".

If my understanding is correct, nor 1.8.11 or gcc 4.8.0 would be the problem and it would be that piece of code just doesn't work properly.

Best wishes,
Andrea

On 4 September 2013 08:00, Andrea Bedini <andrea.bedini@gmail.com> wrote:
Hi Ray,

thanks for giving it a look. Antonio made me notice that something else might be at work since the macro DETECT_F already zeroes the structure right before anything else:

memset(&INFO, 0, sizeof(INFO)); #L299

so I don't understand how the perm fields need to be zeroed again around line #L308. This still considering the "Byte Order" loop as a black box.

As a side question: isn't there a more portable way of doing this? I am pretty sure H5detect.c might invoke a bunch of undefined behaviours given the amount of warning the compiler generates and of bit trickery.

Best wishes,
Andrea

On 4 September 2013 05:43, Raymond Lu <songyulu@hdfgroup.org> wrote:
Andrea,

We've verified that your solution is correct. We're putting your fix into the library. Thanks for helping us.

Ray

Andrea_Bedini · September 6, 2013, 6:49am

Hi Ray,

thanks for that, it really helped. I checked thoroughly and the memset of
the temporary variables disappears randomly.
It doesn't depend only on the optimization level, though: on my machine, putting a
printf("%Lf\n", value2); just before the loop changes the result.
I'm not sure who is to blame here; poking into the padding bits of a
long double might just be unspecified or undefined behaviour.

Andrea

···

On 6 September 2013 06:40, Raymond Lu <songyulu@hdfgroup.org> wrote:

I isolated part of DETECT_F into a C program, attached as detect.c.
It contains only the algorithm for detecting the byte order of long
double. When I compile it with gcc -g, -O0, or no flag, it reports
little-endian. When I compile it with -O1, -O2, or -O3, it reports VAX
order. I don't know where it goes wrong yet, but I suspect a bug in GCC's
optimizer. Maybe you can help me.

I haven't tried the algorithms for other parts in DETECT_F yet. The
alignment problem you talked about is one of the other algorithms.

Ray
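
For comparison, here is a minimal sketch of a byte-order check for plain double. It is not the DETECT_F algorithm from detect.c: it swaps in a simpler technique that looks at where the known bytes of the constant 1.0 land in memory. The names byte_order and double_byte_order are hypothetical, and the code assumes double is IEEE binary64. An 8-byte binary64 double has no padding bytes, so this check should not be exposed to the padding issue discussed in this thread.

```c
#include <string.h>

enum byte_order { ORDER_UNKNOWN = -1, ORDER_LE = 0, ORDER_BE = 1 };

/* Classify the byte order of double from the bytes of the constant 1.0.
 * Assumption: double is IEEE binary64, so 1.0 has the bit pattern
 * 0x3FF0000000000000 and only its two most significant bytes are nonzero. */
static enum byte_order double_byte_order(void)
{
    const double one = 1.0;
    unsigned char b[8] = {0};

    if (sizeof one != sizeof b)
        return ORDER_UNKNOWN;            /* not an 8-byte double: refuse to guess */

    memcpy(b, &one, sizeof b);           /* inspect the object representation */

    if (b[7] == 0x3F && b[6] == 0xF0)
        return ORDER_LE;                 /* most significant byte at the highest address */
    if (b[0] == 0x3F && b[1] == 0xF0)
        return ORDER_BE;                 /* most significant byte at the lowest address */
    return ORDER_UNKNOWN;
}
```

Calling double_byte_order() on an x86 machine is expected to return ORDER_LE; any other layout that is not plain big-endian falls through to ORDER_UNKNOWN instead of being mislabelled.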

On Sep 4, 2013, at 8:40 PM, Andrea Bedini wrote:

Thanks George.

For anyone interested in debugging this problem, Debian has an extensive
collection of build logs over many architectures:
Build logs for hdf5 (going back 12 years!)

As far as I know, the corruption is limited to the H5T_NATIVE_LDOUBLE
type. You can check your particular build with the following test

#include <hdf5.h>

int main(void) {
  /* Exit status 0 if long double and double report the same byte order, 1 otherwise. */
  return !(H5Tget_order(H5T_NATIVE_LDOUBLE) == H5Tget_order(H5T_NATIVE_DOUBLE));
}

It exits with code 1 if the long double has different byte ordering than
double (which is technically possible, but highly suspicious).

Otherwise the patch I sent earlier in this thread seems to do the trick,
although what exactly is going wrong is still beyond my understanding.

Third option: you can define an equivalent of H5T_NATIVE_LDOUBLE yourself.
The following creates a data type representing a long double as implemented
by gcc on x86 architectures (see the Wikipedia article "long double" for details):

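/* Start from a copy of the native double and reshape it. */
hid_t ldouble_datatype = H5Tcopy(H5T_NATIVE_DOUBLE);
H5Tset_size(ldouble_datatype, sizeof(long double));  /* storage size, padding included */
H5Tset_precision(ldouble_datatype, 80);              /* 80 significant bits */
/* sign at bit 79, 15-bit exponent starting at bit 64, 64-bit mantissa starting at bit 0 */
H5Tset_fields(ldouble_datatype, 79, 64, 15, 0, 64);
H5Tset_pad(ldouble_datatype, H5T_PAD_ZERO, H5T_PAD_ZERO);
H5Tset_inpad(ldouble_datatype, H5T_PAD_ZERO);
H5Tset_ebias(ldouble_datatype, 16383);               /* exponent bias */
H5Tset_norm(ldouble_datatype, H5T_NORM_NONE);        /* leading mantissa bit is stored explicitly */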

Best wishes,
Andrea

On 4 September 2013 22:58, George N. White III <gnwiii@gmail.com> wrote:

Another historical reference to the obscurity of this code is:
118777 – sci-libs/hdf5-1.6.2 (and 1.6.4) cannot build because of Bus error in H5detect.

I've been building HDF5 libraries for use with NASA SeaDAS, and recently
have started using HDF5 with R and GDAL. The SeaDAS builds are static, and
I don't find the "unable to calculate alignment for long double" message in
my SeaDAS build logs on linux and OS X. For R and GDAL, however, I need
dynamic libraries and those build logs do have the "unable to calculate
alignment for long double" message on both linux and OS X.

Andrea_Bedini · September 6, 2013, 7:20am

Some links on memset (in the reverse order in which I found them)

Raymond_Lu · September 6, 2013, 10:24pm

Andrea,

My coworker Neil helped me this afternoon to find out that when the GCC 4.8 compiler assigns constant values to variables (value1 and value2) like this,

   for(i = 0, value1 = 0.0, value2 = 1.0; i < (int)sizeof(long double); i++) {
      value3 = value1;
      value1 += value2;
      value2 /= 256.0;
    :
    :

it introduces some garbage into the two padding bytes of value1 and value2. The garbage then confuses our algorithm; in particular, the value of "last_mbyte" comes out wrong. A simple fix is to use an intermediate variable, like this:

long double tmp_value, divisor;

   tmp_value = 0.0;
   value1 = tmp_value;
   tmp_value = 1.0;
   value2 = tmp_value;
   tmp_value = 256.0;
   divisor = tmp_value;

   for(i = 0; i < (int)sizeof(long double); i++) {
      value3 = value1;
      value1 += value2;
      value2 /= divisor;
    :
    :

What do you think?

Ray
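
As an illustration of the padding problem described above, and not as the fix that went into H5detect.c, the following sketch shows one way to make a byte-level comparison of two long doubles independent of whatever ends up in the padding: copy only the bytes that carry value bits into a zeroed buffer. The helper names ldouble_value_bytes and ldouble_canonical_bytes are hypothetical. The code assumes that LDBL_MANT_DIG == 64 identifies the x87 80-bit format on little-endian x86, where the 10 value bytes sit at the lowest addresses; every other format is treated as having no padding.

```c
#include <float.h>
#include <stddef.h>
#include <string.h>

/* Number of leading bytes of a long double that carry value bits.
 * Assumption: LDBL_MANT_DIG == 64 means the x87 80-bit extended format on
 * little-endian x86 (10 value bytes first, padding after). Any other
 * format is treated as having no padding at all. */
static size_t ldouble_value_bytes(void)
{
#if LDBL_MANT_DIG == 64
    return 10;
#else
    return sizeof(long double);
#endif
}

/* Write the bytes of v into out (sizeof(long double) bytes long),
 * forcing every padding byte to zero. */
static void ldouble_canonical_bytes(long double v, unsigned char *out)
{
    memset(out, 0, sizeof(long double));      /* padding becomes zero */
    memcpy(out, &v, ldouble_value_bytes());   /* copy value bytes only */
}
```

Two long doubles holding the same value then compare equal with memcmp on their canonical buffers, however the compiler happened to fill the padding of the originals.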

Andrea_Bedini · September 7, 2013, 12:04am

Hi Ray,

yes, this is compatible with what I observed in my tests. Honestly, I am not
sure the problem has a simple solution at all. If the standard doesn't
guarantee that the padding bits have a consistent value, there's no way we can
expect this to work in a portable way: at some point in the future a
compiler will be smart enough to see through all the obstacles we throw in its
way.

If I may, I would suggest we replace the entire mechanism with something
simpler. There are not that many floating point formats floating around
(pardon the pun), even considering the many architectures. Can't we just
hardcode them? The configuration system surely has enough information to
determine the native floating point format without bit fiddling.

What are the supported architectures?

Andrea

PS: it seems the disappearing memset problem can be solved by asking gcc to
not replace memset with its builtin version, i.e. passing the option
-fno-builtin-memset, but this of course won't work with other compilers.
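
To make the hardcoding idea concrete, here is a minimal sketch, under stated assumptions, of how the native long double format could be identified from the standard <float.h> macros without inspecting any bytes. The function names fp_format_name and native_ldouble_format are hypothetical, the small table is illustrative and is not HDF5's list of supported formats, and FLT_RADIX is assumed to be 2.

```c
#include <float.h>
#include <stddef.h>

/* Map (mantissa digits, maximum exponent) to a format name.
 * Illustrative table only; returns NULL for anything it does not know,
 * so the caller can refuse to guess. Assumes FLT_RADIX == 2. */
static const char *fp_format_name(int mant_dig, int max_exp)
{
    if (mant_dig == 24  && max_exp == 128)   return "IEEE binary32";
    if (mant_dig == 53  && max_exp == 1024)  return "IEEE binary64";
    if (mant_dig == 64  && max_exp == 16384) return "x87 80-bit extended";
    if (mant_dig == 113 && max_exp == 16384) return "IEEE binary128";
    return NULL;
}

/* Name of the compiler's long double format, taken from <float.h>. */
static const char *native_ldouble_format(void)
{
    return fp_format_name(LDBL_MANT_DIG, LDBL_MAX_EXP);
}
```

A lookup like this says nothing about byte order or padding layout, which would still have to come from the configuration system, but it cannot be disturbed by what the optimizer does to padding bytes.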

···

Third option: you can define an equivalent of H5T_NATIVE_LDOUBLE
yourself. The following creates a data type representing a long double as
implemented by gcc on x86 architectures (see
long double - Wikipedia for details)

hid_t ldouble_datatype = H5Tcopy(H5T_NATIVE_DOUBLE);
H5Tset_size(ldouble_datatype, sizeof(long double));
H5Tset_precision(ldouble_datatype, 80);
H5Tset_fields (ldouble_datatype, 79, 64, 15, 0, 64);
H5Tset_pad(ldouble_datatype, H5T_PAD_ZERO, H5T_PAD_ZERO);
H5Tset_inpad(ldouble_datatype, H5T_PAD_ZERO);
H5Tset_ebias(ldouble_datatype, 16383);
H5Tset_norm(ldouble_datatype, H5T_NORM_NONE);

Best wishes,
Andrea

On 4 September 2013 22:58, George N. White III <gnwiii@gmail.com> wrote:

Another historical reference to the obscurity of this code is: <
118777 – sci-libs/hdf5-1.6.2 (and 1.6.4) cannot build because of Bus error in H5detect.

I've been building HDF5 libraries for use with NASA SeaDAS, and recently
have started using HDF5 with R and GDAL. The SeaDAS builds are static, and
I don't find the "unable to calculate alignment for long double" message in
my SeaDAS build logs on linux and OS X. For R and GDAL, however, I need
dynamic libraries and those build logs do have the "unable to calculate
alignment for long double" message on both linux and OS X.

On Tue, Sep 3, 2013 at 9:50 PM, Andrea Bedini <andrea.bedini@gmail.com>wrote:

Hi,

I found something else (I know, I should stop :)). I am not entirely
sure but it seems that when H5detect fails it writes "unable to calculate
alignment for long double" on stderr so this message should be observable
on build logs (although buried by other warnings). The packages on debian
sid and testing for both i386 and x86-64 seem to be affected:

Build log for hdf5 (1.8.11-3+b1) on amd64

Build log for hdf5 (1.8.11-3+b1) on i386

But here's the exciting part: look what I found

[gembud] Problems compiling Gempak 5.11.4 on Fedora 12

It's a build log from 2010 for HDF5 v1.6.5 and gcc-4.4.3 that says "unable
to calculate alignment for long double".

If my understanding is correct, nor 1.8.11 or gcc 4.8.0 would be the
problem and it would be that piece of code just doesn't work properly.

Best wishes,
Andrea

On 4 September 2013 08:00, Andrea Bedini <andrea.bedini@gmail.com>wrote:

Hi Ray,

thanks for giving it a look. Antonio made me notice that something
else might be at work since the macro DETECT_F already zeroes the structure
right before anything else:

memset(&INFO, 0, sizeof(INFO)); #L299

so I don't understand how the perm fields need to be zeroed again
around line #L308. This still considering the "Byte Order" loop as a black
box.

As a side question: isn't there a more portable way of doing this? I
am pretty sure H5detect.c might invoke a bunch of undefined behaviours
given the amount of warning the compiler generates and of bit trickery.

Best wishes,
Andrea

On 4 September 2013 05:43, Raymond Lu <songyulu@hdfgroup.org> wrote:

Andrea,

We've verified that your solution is correct. We're putting your fix
into the library. Thanks for helping us.

Ray

On Sep 3, 2013, at 3:32 AM, Andrea Bedini wrote:

Hi there,

I think I have found the problem. The issue is in H5detect.c.
Macros DETECT_F and DETECT_I do not initialize properly the perm field in
the detected_t struct. As a result the routine fix_order is passed some
uninitialized memory which makes it fail. I have a small patch against
H5detect.c which fixes the problem by simply initializing the perm field
with zeros. Valgrind's tool memcheck would have exposed the problem.

Best wishes,
Andrea

On 3 September 2013 15:30, Andrea Bedini <andrea.bedini@gmail.com>wrote:

Hi,

I am experiencing the following issue with hdf5 and gcc 4.8.0

Consider this very simple test

#include <hdf5.h>

int main() {
  switch (H5Tget_order(H5T_NATIVE_LDOUBLE)) {
  case H5T_ORDER_LE:
    printf("H5Tget_order(H5T_NATIVE_LDOUBLE) = H5T_ORDER_LE\n");
    break;
  case H5T_ORDER_BE:
    printf("H5Tget_order(H5T_NATIVE_LDOUBLE) = H5T_ORDER_BE\n");
    break;
  case H5T_ORDER_VAX:
    printf("H5Tget_order(H5T_NATIVE_LDOUBLE) = H5T_ORDER_VAX\n");
    break;
  case H5T_ORDER_MIXED:
    printf("H5Tget_order(H5T_NATIVE_LDOUBLE) = H5T_ORDER_MIXED\n");
    break;
  case H5T_ORDER_NONE:
    printf("H5Tget_order(H5T_NATIVE_LDOUBLE) = H5T_ORDER_NONE\n");
    break;
  default:
    printf("here are dragons\n");
  }
  return 0;
}

on the same x86_64 GNU/Linux machine I get

$ hdf5-1.8.11-gcc-4.7.0/my_test # compiled with gcc 4.7.0
H5Tget_order(H5T_NATIVE_LDOUBLE) = H5T_ORDER_LE

$ hdf5-1.8.11-gcc-4.8.0/my_test # compiled with gcc 4.8.0
H5Tget_order(H5T_NATIVE_LDOUBLE) = H5T_ORDER_VAX

So H5T_NATIVE_LDOUBLE is mis-detected. I tried to dig deeper and
basically the fault must be in src/H5detect.c which is used to generate the
definitions in src/H5Tinit.c
I could not figure out what H5detect.c does wrong (it is not very
readable, given its extensive use of macros) but the compiler does emit a
lot of warnings (see https://gist.github.com/andreabedini/6419975\).

I think this must be related to the failure of dt_arith long double
test observed recently.

Any suggestion on how to fix this ?

Best wishes,
Andrea

--
Andrea Bedini <andrea.bedini@gmail.com>

--
Andrea Bedini <andrea.bedini@gmail.com>
<hdf5_uninitialized.patch>
_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org

http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org

http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org

--
Andrea Bedini <andrea.bedini@gmail.com>

--
Andrea Bedini <andrea.bedini@gmail.com>

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org

http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org

--
George N. White III <aa056@chebucto.ns.ca>
Head of St. Margarets Bay, Nova Scotia

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org

http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org

--
Andrea Bedini <andrea.bedini@gmail.com>
_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org

http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org

http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org

--
Andrea Bedini <andrea.bedini@gmail.com>
_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org

http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org

http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org

--
Andrea Bedini <andrea.bedini@gmail.com>

Andrea_Bedini · September 8, 2013, 11:58pm

FYI, I asked this question on Stack Overflow: "accessing long double bit representation".

gnwiii · September 9, 2013, 11:18am

The answers to the Stack Overflow question "accessing long double bit
representation" indicate that it is not safe to rely on the contents of
padding bits.

The underlying problem here is to have a robust and standards-compliant way
to determine the alignment of various data types. This is a sufficiently
important use case that there really should be a way to determine this
without relying on the bits used for padding. Maybe there is a gap in the
current standards that could be fixed in the future, but this problem
affects a large number of existing workflows, so it needs to be addressed in a
way that is compatible with legacy compilers and OSes. There is some merit to
the idea of providing the information via a table, but such tables have a
way of getting out of sync with reality. Would it be feasible to enumerate
the most common alignments and then apply a test that doesn't rely on the
contents of the padding bits to select the appropriate entry from the table
of possibilities?
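
One test of that kind, as a minimal sketch: measure the offset of a member that follows a single char in a struct. It relies only on offsetof from <stddef.h>, which has been available since C89, and never looks at object bytes. The struct and function names are made up for illustration, and the sketch assumes (as is true in practice, though padding is implementation-defined) that the compiler places the member at the first suitably aligned offset. C11 also offers _Alignof for compilers that support it.

```c
#include <assert.h>
#include <stddef.h>

/* Probe structs: `value` follows a single char, so its offset is
 * the alignment the compiler applies to that member type. */
struct probe_double  { char c; double value; };
struct probe_ldouble { char c; long double value; };

/* Alignment of double as a struct member. */
size_t double_alignment(void)
{
    return offsetof(struct probe_double, value);
}

/* Alignment of long double as a struct member. */
size_t ldouble_alignment(void)
{
    size_t a = offsetof(struct probe_ldouble, value);

    /* The member cannot share storage with the leading char. */
    assert(a > 0);
    return a;
}
```

The result could then be used to select (or sanity-check) an entry in a table of known alignments, which keeps the table from silently drifting out of sync with the compiler.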

nfortne2 · September 27, 2013, 3:48pm

This issue has been fixed in 1.8.12. I fixed it by, before doing any other analysis, flipping each bit in the variable and checking to see if the value changed (with ==). Bits that do not change the value are then ignored in subsequent steps. Here is the relevant code for those interested:

    volatile TYPE _v1, _v2;
    unsigned char _buf1[sizeof(TYPE)];
    unsigned char _pad_mask[sizeof(TYPE)];
    unsigned char _byte_mask;
    int _i;

<...>

    _v1 = 4.0;
    HDmemcpy(_buf1, (const void *)&_v1, sizeof(TYPE));
    for(_i = 0; _i < (int)sizeof(TYPE); _i++)
        for(_byte_mask = (unsigned char)1; _byte_mask; _byte_mask <<= 1) {
            _buf1[_i] ^= _byte_mask;
            HDmemcpy((void *)&_v2, (const void *)_buf1, sizeof(TYPE));
            if(_v1 != _v2)
                _pad_mask[_i] |= _byte_mask;
            _buf1[_i] ^= _byte_mask;
        } /* end for */

<...>

_pad_mask is then used to determine which bits to ignore when comparing variables in the subsequent analysis.

-Neil

···

On 09/09/2013 06:19 AM, George N. White III wrote:
The answers to <http://stackoverflow.com/questions/18668871/accessing-long-double-bit-representation> indicate that it is not safe to rely on the contents of padding bits.

The underlying problem here is to have a robust and standards-compliant way to determine the alignment of various data types. This is a sufficiently important use case that there really should be a way to determine this without relying on the bits used for padding. Maybe there is a gap in the current standards that could be fixed in the future, but this problem affects a large number of existing workflows, so it needs to be addressed in a way that is compatible with legacy compilers and OSes. There is some merit to the idea of providing the information via a table, but such tables have a way of getting out of sync with reality. Would it be feasible to enumerate the most common alignments and then apply a test that doesn't rely on the contents of the padding bits to select the appropriate entry from the table of possibilities?

On Sun, Sep 8, 2013 at 8:58 PM, Andrea Bedini <andrea.bedini@gmail.com> wrote:
FYI I asked this question on stackoverflow.

On 7 September 2013 10:04, Andrea Bedini <andrea.bedini@gmail.com> wrote:
Hi Ray,

yes, this is compatible with what I observed in my tests. Honestly I am not sure the problem has a simple solution at all. If the standard doesn't guarantee that the padding bits have a consistent value, there's no way we can expect this to work in a portable way: at some point in the future a compiler will be smart enough to defeat all the obstacles we throw in its way.

If I may, I would suggest we replace the entire mechanism with something simpler. There are not that many floating point formats floating around (pardon the pun), even considering the many architectures. Can't we just hardcode them? The configuration system has surely enough information to determine the native floating point format without bit fiddling.

What are the supported architectures?

Andrea

PS: it seems the disappearing memset problem can be solved by asking gcc to not replace memset with its builtin version, i.e. passing the option -fno-builtin-memset, but this of course won't work with other compilers.

On 7 September 2013 08:24, Raymond Lu <songyulu@hdfgroup.org> wrote:
Andrea,

My coworker Neil helped me this afternoon to find out that when the GCC 4.8 compiler assigns constant values to variables (value1 and value2) like this,

   for(i = 0, value1 = 0.0, value2 = 1.0; i < (int)sizeof(long double); i++) {
      value3 = value1;
      value1 += value2;
      value2 /= 256.0;
:
:

it introduces some garbage into the two padding bytes of value1 and value2. The garbage then confuses our algorithm; in particular, the value of "last_mbyte" comes out wrong. A simple fix is to use an intermediate variable like this:

long double tmp_value, divisor;

   tmp_value = 0.0;
   value1 = tmp_value;
   tmp_value = 1.0;
   value2 = tmp_value;
   tmp_value = 256.0;
   divisor = tmp_value;

   for(i = 0; i < (int)sizeof(long double); i++) {
      value3 = value1;
      value1 += value2;
      value2 /= divisor;
:
:

What do you think about it?

Ray

On Sep 6, 2013, at 1:49 AM, Andrea Bedini wrote:

Hi Ray,

thanks for that, it really helped. I checked thoroughly and the memset of the temporary variables disappears randomly.
It doesn't depend only on optimization though: on my machine, putting a printf("%Lf\n", value2); just before the loop changes the result.
I'm not sure who gets the blame here; poking into the padding bits of a long double might just be unspecified or undefined behaviour.

Andrea

On 6 September 2013 06:40, Raymond Lu <songyulu@hdfgroup.org> wrote:
I isolated part of DETECT_F into a C program, attached as detect.c. It only contains the algorithm for detecting the byte order of long double. When I compile it with gcc -g, -O0, or no flag, it reports little-endian. When I compile it with -O1, -O2, or -O3, it reports VAX order. I don't know where it goes wrong yet, but I suspect GCC's optimization has bugs. Maybe you can help me.

I haven't tried the algorithms for other parts in DETECT_F yet. The alignment problem you talked about is one of the other algorithms.

Ray

On Sep 4, 2013, at 8:40 PM, Andrea Bedini wrote:

Thanks George.

For anyone interested in debugging this problem, Debian has an extensive collection of build logs over many architectures https://buildd.debian.org/status/logs.php?pkg=hdf5 (going back 12 years!)

As far as I know, the corruption is limited to the H5T_NATIVE_LDOUBLE type. You can check your particular build with the following test

#include <hdf5.h>

int main(void) {
  /* exit status 0 if both types report the same byte order, 1 otherwise */
  return H5Tget_order(H5T_NATIVE_LDOUBLE) != H5Tget_order(H5T_NATIVE_DOUBLE);
}

It exits with code 1 if the long double has different byte ordering than double (which is technically possible, but highly suspicious).

Otherwise the patch I sent earlier in this thread seems to do the trick, although what exactly is going wrong is still beyond my understanding.

Third option: you can define an equivalent of H5T_NATIVE_LDOUBLE yourself. The following creates a data type representing a long double as implemented by gcc on x86 architectures (see http://en.wikipedia.org/wiki/Long_double#Implementations for details)

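/* start from the native double and widen it to the size of long double */
hid_t ldouble_datatype = H5Tcopy(H5T_NATIVE_DOUBLE);
H5Tset_size(ldouble_datatype, sizeof(long double));
/* only 80 of those bits carry the value */
H5Tset_precision(ldouble_datatype, 80);
/* sign at bit 79, 15-bit exponent from bit 64, 64-bit mantissa from bit 0 */
H5Tset_fields (ldouble_datatype, 79, 64, 15, 0, 64);
/* padding outside and inside the value is zero-filled */
H5Tset_pad(ldouble_datatype, H5T_PAD_ZERO, H5T_PAD_ZERO);
H5Tset_inpad(ldouble_datatype, H5T_PAD_ZERO);
H5Tset_ebias(ldouble_datatype, 16383);
/* the leading mantissa bit is stored explicitly, so nothing is implied */
H5Tset_norm(ldouble_datatype, H5T_NORM_NONE);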

Best wishes,
Andrea

On 4 September 2013 22:58, George N. White III <gnwiii@gmail.com> wrote:
Another historical reference to the obscurity of this code is: <https://bugs.gentoo.org/show_bug.cgi?id=118777>.

I've been building HDF5 libraries for use with NASA SeaDAS, and recently have started using HDF5 with R and GDAL. The SeaDAS builds are static, and I don't find the "unable to calculate alignment for long double" message in my SeaDAS build logs on Linux and OS X. For R and GDAL, however, I need dynamic libraries, and those build logs do have the "unable to calculate alignment for long double" message on both Linux and OS X.

On Tue, Sep 3, 2013 at 9:50 PM, Andrea Bedini <andrea.bedini@gmail.com> wrote:
Hi,

I found something else (I know, I should stop :)). I am not entirely sure, but it seems that when H5detect fails it writes "unable to calculate alignment for long double" to stderr, so this message should be observable in build logs (although buried by other warnings). The packages in Debian sid and testing for both i386 and x86-64 seem to be affected:

https://buildd.debian.org/status/fetch.php?pkg=hdf5&arch=amd64&ver=1.8.11-3%2Bb1&stamp=1377024563
https://buildd.debian.org/status/fetch.php?pkg=hdf5&arch=i386&ver=1.8.11-3%2Bb1&stamp=1377025110

But here's the exciting part: look what I found

http://www.unidata.ucar.edu/mailing_lists/archives/gembud/2010/msg00052.html

It's a build log from 2010 for HDF5 v1.6.5 and gcc-4.4.3 that says "unable to calculate alignment for long double".

If my understanding is correct, neither 1.8.11 nor gcc 4.8.0 would be the problem; rather, that piece of code just doesn't work properly.

Best wishes,
Andrea

On 4 September 2013 08:00, Andrea Bedini <andrea.bedini@gmail.com> wrote:
Hi Ray,

thanks for giving it a look. Antonio pointed out to me that something else might be at work, since the macro DETECT_F already zeroes the structure right before anything else:

memset(&INFO, 0, sizeof(INFO)); #L299

so I don't understand why the perm field needs to be zeroed again around line #L308. This is still treating the "Byte Order" loop as a black box.

As a side question: isn't there a more portable way of doing this? I am pretty sure H5detect.c might invoke a bunch of undefined behaviours, given the number of warnings the compiler generates and the amount of bit trickery.

Best wishes,
Andrea

On 4 September 2013 05:43, Raymond Lu <songyulu@hdfgroup.org> wrote:
Andrea,

We've verified that your solution is correct. We're putting your fix into the library. Thanks for helping us.

Ray

On Sep 3, 2013, at 3:32 AM, Andrea Bedini wrote:

Hi there,

I think I have found the problem. The issue is in H5detect.c. The macros DETECT_F and DETECT_I do not properly initialize the perm field in the detected_t struct. As a result, the routine fix_order is passed some uninitialized memory, which makes it fail. I have a small patch against H5detect.c which fixes the problem by simply initializing the perm field with zeros. Valgrind's memcheck tool would have exposed the problem.

Best wishes,
Andrea

On 3 September 2013 15:30, Andrea Bedini <andrea.bedini@gmail.com> wrote:
Hi,

I am experiencing the following issue with hdf5 and gcc 4.8.0

Consider this very simple test

#include <stdio.h>
#include <hdf5.h>

int main() {
  switch (H5Tget_order(H5T_NATIVE_LDOUBLE)) {
  case H5T_ORDER_LE:
    printf("H5Tget_order(H5T_NATIVE_LDOUBLE) = H5T_ORDER_LE\n");
    break;
  case H5T_ORDER_BE:
    printf("H5Tget_order(H5T_NATIVE_LDOUBLE) = H5T_ORDER_BE\n");
    break;
  case H5T_ORDER_VAX:
    printf("H5Tget_order(H5T_NATIVE_LDOUBLE) = H5T_ORDER_VAX\n");
    break;
  case H5T_ORDER_MIXED:
    printf("H5Tget_order(H5T_NATIVE_LDOUBLE) = H5T_ORDER_MIXED\n");
    break;
  case H5T_ORDER_NONE:
    printf("H5Tget_order(H5T_NATIVE_LDOUBLE) = H5T_ORDER_NONE\n");
    break;
  default:
    printf("here are dragons\n");
  }
  return 0;
}

on the same x86_64 GNU/Linux machine I get

$ hdf5-1.8.11-gcc-4.7.0/my_test # compiled with gcc 4.7.0
H5Tget_order(H5T_NATIVE_LDOUBLE) = H5T_ORDER_LE

$ hdf5-1.8.11-gcc-4.8.0/my_test # compiled with gcc 4.8.0
H5Tget_order(H5T_NATIVE_LDOUBLE) = H5T_ORDER_VAX

So H5T_NATIVE_LDOUBLE is mis-detected. I tried to dig deeper and basically the fault must be in src/H5detect.c which is used to generate the definitions in src/H5Tinit.c
I could not figure out what H5detect.c does wrong (it is not very readable, given its extensive use of macros) but the compiler does emit a lot of warnings (see https://gist.github.com/andreabedini/6419975).

I think this must be related to the failure of dt_arith long double test observed recently.

Any suggestions on how to fix this?

Best wishes,
Andrea

--
Andrea Bedini <andrea.bedini@gmail.com>
