Issue building HDF5 1.10.6 with NAG Fortran 7.0


#1

So, this is a fun one. A colleague and I believe we have found a bug in the configure step of HDF5 when building with NAG Fortran 7.0, but it only manifests during the build phase (configure “succeeds”). The configure line I’m using is:

./configure \
 --prefix=/Users/mathomp4/installed/MPI/nag-7.0_7009/openmpi-4.0.2/Baselibs/6.0.4/Darwin \
 --includedir=/Users/mathomp4/installed/MPI/nag-7.0_7009/openmpi-4.0.2/Baselibs/6.0.4/Darwin/include/hdf5 \
 --with-szlib=/Users/mathomp4/installed/MPI/nag-7.0_7009/openmpi-4.0.2/Baselibs/6.0.4/Darwin/include/szlib,/Users/mathomp4/installed/MPI/nag-7.0_7009/openmpi-4.0.2/Baselibs/6.0.4/Darwin/lib \
 --with-zlib=/Users/mathomp4/installed/MPI/nag-7.0_7009/openmpi-4.0.2/Baselibs/6.0.4/Darwin/include/zlib,/Users/mathomp4/installed/MPI/nag-7.0_7009/openmpi-4.0.2/Baselibs/6.0.4/Darwin/lib \
 --disable-shared --disable-cxx --enable-hl --enable-fortran --disable-sharedlib-rpath \
 --enable-parallel --disable-fortran2003 \
 CFLAGS= FCFLAGS="-fpp -mismatch_all" CC=mpicc FC=mpifort CXX=mpic++ F77=mpifort

(Note: This configure line has been passed down for a long time. It’s full of cruft, but it still works, so I don’t try to touch it. We only pass --disable-fortran2003 when NAG Fortran is used. For now (at least) we don’t use the HDF5 F2003 interfaces in our code, so disabling them isn’t a big deal.)

And I’m building on macOS Mojave with clang and nagfor:

❯ gcc --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple clang version 11.0.0 (clang-1100.0.33.17)
Target: x86_64-apple-darwin18.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
❯ nagfor -V
NAG Fortran Compiler Release 7.0(Yurakucho) Build 7009
Product NPMI670NA for Apple Intel Mac OSX 64-bit
Copyright 1990-2020 The Numerical Algorithms Group Ltd., Oxford, U.K.

The eventual error is:

  CC       H5f90kit.lo
In file included from H5f90kit.c:27:
In file included from ./H5f90.h:20:
In file included from ./H5f90i.h:22:
./H5f90i_gen.h:38:9: error: unknown type name 'c_float_1'; did you mean 'float_t'?
typedef c_float_1 real_f;
        ^~~~~~~~~
        float_t
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/math.h:44:19: note: 'float_t' declared here
    typedef float float_t;
                  ^
In file included from H5f90kit.c:27:
In file included from ./H5f90.h:20:
In file included from ./H5f90i.h:22:
./H5f90i_gen.h:39:9: error: unknown type name 'c_float_2'; did you mean 'float_t'?
typedef c_float_2 double_f;
        ^~~~~~~~~
        float_t
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/math.h:44:19: note: 'float_t' declared here
    typedef float float_t;
                  ^
2 errors generated.
make[3]: *** [Makefile:1021: H5f90kit.lo] Error 1
make[3]: Leaving directory '/Users/mathomp4/Baselibs/ESMA-Baselibs-6.0.4/src/hdf5/fortran/src'
make[2]: *** [Makefile:1271: install] Error 2
make[2]: Leaving directory '/Users/mathomp4/Baselibs/ESMA-Baselibs-6.0.4/src/hdf5/fortran/src'
make[1]: *** [Makefile:820: install-recursive] Error 1
make[1]: Leaving directory '/Users/mathomp4/Baselibs/ESMA-Baselibs-6.0.4/src/hdf5/fortran'
make: *** [Makefile:660: install-recursive] Error 1

Now, my colleague knows Fortran a lot better than I do and was able to track this issue down in the config.log output, where they saw:

configure:8138: checking for Number of Fortran INTEGER KINDs
configure:8140: result: 4
configure:8142: checking for Fortran INTEGER KINDs
configure:8144: result: {1,2,3,4}
configure:8146: checking for Fortran REAL KINDs
configure:8148: result: {16}
configure:8150: checking for Fortran REALs maximum decimal precision
configure:8152: result: 31

and:

| #define H5CONFIG_F_NUM_RKIND INTEGER, PARAMETER :: num_rkinds = 1
| #define H5CONFIG_F_RKIND INTEGER, DIMENSION(1:num_rkinds) :: rkind = (/16/)
| #define H5CONFIG_F_RKIND_SIZEOF INTEGER, DIMENSION(1:num_rkinds) :: rkind_sizeof = (/2/)

Essentially, configure thinks NAG Fortran has only one real kind, 16. So, what is causing this? NAG Fortran 7.0 added support for 16-bit (half-precision) reals, proposed for inclusion in Fortran 202x as REAL16. From man nagfor:

DATA TYPES
       The  table below lists the intrinsic data types provided by the NAG Fortran Compiler together with their kind numbers.  There are
       three possible schemes for the intrinsic kind type parameters: the default mode of operation (which may be  specified  explicitly
       by  the  -kind=sequential  option),  the  "byte" numbering scheme (specified by the -kind=byte option) and the "unique" numbering
       scheme (specified by the -kind=unique).

       Type   KIND Numbers            Name       Description
       Name  (sequential,byte,unique)
       -------------------------------------------------------------------------
       REAL        1      4     301   REAL32(*)  Single precision floating-point
       REAL        2      8     302   REAL64(*)  Double precision floating-point
       REAL        3     16     303   REAL128(*) Quad precision floating-point
       REAL       16      2     304   REAL16(*)  Half precision floating-point

The HDF5 Fortran kind detection in configure seems to assume that KIND numbers increase with precision. In NAG Fortran’s default numbering, however, the kinds in “precision order” are 16, 1, 2, 3, so KIND=16 is a LOT less precise than KIND=3. We don’t want to pass -kind=byte (which might work), because then everything built afterwards would also have to be compiled with -kind=byte. We are hoping to avoid that, and are instead changing our internal Fortran code to use ISO_FORTRAN_ENV kinds such as REAL32 rather than REAL*4 and the like, so that it becomes NAG-compatible.
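To make the suspected ordering assumption concrete, here is a small Python model. It is not HDF5’s actual configure test (that is a Fortran program not reproduced here); it assumes the test keeps a kind returned by SELECTED_REAL_KIND only if its number is larger than the last kind it kept. The decimal precisions are also assumptions: 3, 6 and 15 for IEEE half, single and double precision, and 31 for quad, matching the maximum precision reported in config.log.

```python
# Assumed model of nagfor REAL kinds: kind number -> decimal precision.
NAG_SEQUENTIAL = {16: 3, 1: 6, 2: 15, 3: 31}  # default (-kind=sequential)
NAG_BYTE = {2: 3, 4: 6, 8: 15, 16: 31}        # -kind=byte


def selected_real_kind(p, kinds):
    """Mimic SELECTED_REAL_KIND(P): the kind with the smallest precision >= p, else -1."""
    candidates = [(prec, k) for k, prec in kinds.items() if prec >= p]
    if not candidates:
        return -1
    return min(candidates)[1]


def scan_ordered(kinds, max_p=36):
    """Keep a kind only if its number exceeds the last one kept (assumes ordering)."""
    found = [selected_real_kind(1, kinds)]
    for p in range(2, max_p + 1):
        k = selected_real_kind(p, kinds)
        if k < 0:
            break
        if k > found[-1]:
            found.append(k)
    return found


def scan_unordered(kinds, max_p=36):
    """Keep every distinct kind, however the kind numbers happen to be ordered."""
    found = [selected_real_kind(1, kinds)]
    for p in range(2, max_p + 1):
        k = selected_real_kind(p, kinds)
        if k < 0:
            break
        if k not in found:
            found.append(k)
    return found


print(scan_ordered(NAG_SEQUENTIAL))    # [16]
print(scan_unordered(NAG_SEQUENTIAL))  # [16, 1, 2, 3]
print(scan_ordered(NAG_BYTE))          # [2, 4, 8, 16]
```

Under these assumptions, the order-dependent scan reproduces the {16} result from config.log: SELECTED_REAL_KIND(1) returns the half-precision kind 16, and every later kind (1, 2, 3) is numerically smaller, so it is skipped. With -kind=byte numbering, where kind numbers do grow with precision, the same scan finds all four kinds, while a scan that only skips duplicates finds all four under either scheme.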

My colleague was able to hard-code a workaround into the configure script itself:

diff --git a/configure b/configure
index 778a005631..0830f31b21 100755
--- a/configure
+++ b/configure
@@ -7925,7 +7925,8 @@ if ac_fn_fc_try_run "$LINENO"; then :


         pac_validIntKinds="`sed -n '1p' pac_fconftest.out`"
-	pac_validRealKinds="`sed -n '2p' pac_fconftest.out`"
+	#pac_validRealKinds="`sed -n '2p' pac_fconftest.out`"
+	pac_validRealKinds="1,2,3"
         PAC_FC_MAX_REAL_PRECISION="`sed -n '3p' pac_fconftest.out`"

 cat >>confdefs.h <<_ACEOF
@@ -8097,7 +8098,8 @@ if ac_fn_fc_try_run "$LINENO"; then :


         pac_validIntKinds="`sed -n '1p' pac_fconftest.out`"
-	pac_validRealKinds="`sed -n '2p' pac_fconftest.out`"
+	#pac_validRealKinds="`sed -n '2p' pac_fconftest.out`"
+	pac_validRealKinds="1,2,3"
         PAC_FC_MAX_REAL_PRECISION="`sed -n '3p' pac_fconftest.out`"

 cat >>confdefs.h <<_ACEOF

which did change the generated definitions:

| #define H5CONFIG_F_NUM_RKIND INTEGER, PARAMETER :: num_rkinds = 3
| #define H5CONFIG_F_RKIND INTEGER, DIMENSION(1:num_rkinds) :: rkind = (/1,2,3/)
| #define H5CONFIG_F_RKIND_SIZEOF INTEGER, DIMENSION(1:num_rkinds) :: rkind_sizeof = (/4,8,16/)

and the build succeeded (make check failed, but that looked like an MPI issue).

But we aren’t autotools masters by any means, so we don’t quite know how to fix things underneath “correctly”.


#2

Hi Matthew,

I entered HDF5 bug HDFFV-11033 for this issue.
Thank you for reporting it!

-Barbara


#3

Thanks for your excellent, detailed report. One correction: --enable/disable-fortran2003 is not an option in HDF5 1.10.6, so you are building the Fortran 2003 interfaces.