Can we retire the Autotools?

Maintaining two build systems is a huge pain. Either we duplicate effort, which is wasteful, or we have to come up with build-system-independent work-arounds like our common feature test code and our compiler flags.

Since the Autotools don’t really work on Windows, that leaves us with CMake as the only viable single build system. I know that this would be a big change for people, so I wouldn’t implement it until HDF5 2.0 (it would go into develop after we create the hdf5_1_14 branch) and it would never go into any of the 1.x maintenance branches. As a part of this change, we’d create an Autotools --> CMake matrix so you could easily map Autotools build options onto CMake build options.

If this is something that would impact you in a major, negative way, we’d like to hear from you, so drop a comment in this post. We’d also like to hear from you if there is anything you think is done poorly in CMake but well in the Autotools or if you have any suggestions for improving the CMake experience overall.

5 Likes

Another reason to ditch the Autotools is that managing the generated files (configure, Makefile.in, etc.) is a pain. We don’t check the generated files into maintenance branches like hdf5_1_10 or develop, so we have to generate those and check them in when we create a release branch. It’d be nice to ditch that manual step.

Dana Robinson @derobins wrote a blog post with his thoughts on retiring autotools. You can read the whole post here:

Please chime in and let us know what you think.

I would recommend trying to move towards a static configuration that does not rely on a build system at all, i.e., having the ability to just compile a set of C files without any external software needed for configuration (such as bash or cmake). That makes it much easier to integrate HDF5 into other software packages with their own build systems.

This would pretty much require some compiler-specific “#ifdef”-battle in a pre-defined, “standard” H5configure.h header file that comes right with the HDF5 distribution instead of one generated dynamically by the build system, be it autoconf or CMake. With #ifdefs, a lot of cases can already be identified by the compiler, without relying on external software to create a header file.

Particularly, modern C preprocessors offer the __has_include directive:

https://gcc.gnu.org/onlinedocs/cpp/_005f_005fhas_005finclude.html#g_t_005f_005fhas_005finclude

So we can probably do away with all kinds of external checks for the existence of header files or particular OS features. The vision is to be able to take a set of .c and .h files, drag them into some application-specific build system, and compile them.

This would also have the advantage that exactly the same source code directory can be used for different platforms, like compiling a Windows and a Linux version from the same source directory (provided of course that the build system places binaries in platform-specific directories instead of the source code directory, but that should be done anyway).

Maybe there would be a “minimal” configure for user-defined settings, like the options on the current configure scripts, such as the mentioned H5feature.h. But those options could reside in a minimal user-config.h header file, and possibly just a handful of such defines are actually needed. The benefit is that this single, platform-independent user-config.h header file would then be valid for all platform-specific compilations from the same source tree, and re-compilation is automatically triggered when this one header file is modified.

Currently such a use case is not possible, as the dynamically created, platform-specific configuration header is placed in the HDF5 source code directory.

Besides the configuration header files, H5libsettings and H5detect are two components that hinder static configuration. In particular, H5detect is a binary that is compiled and run during the build process to generate source code. Here the question is whether we can get away without this mechanism, either by also using compiler-specific #ifdef definitions, or by providing platform-specific pre-configurations for known platforms.

For unknown or overly complex platforms, the current configure/cmake mechanisms would still be available. But it would be good to at least try to move in the direction of a static configuration as much as possible, so as to cover the most frequently used platforms.

1 Like

I for one would NOT be a fan of this kind of static configuration. My projects are large enough as it is, adding in the compile time for HDF5 on top of our compile time isn’t wanted.

The argument that you should be able to compile HDF5 without “external” tools is slightly invalid in my opinion. If you are in fact compiling source code you will, by definition, already have external tools on your system. CMake is the least common denominator in this scenario (full disclosure, I helped bring CMake support to HDF5 way back in the 1.6/1.8 days). Bash is already on nearly all *NIX systems, Windows being the obvious exception.

And going with this static configuration but then falling back to CMake/AutoTools for “unknown or complex” systems puts the HDF maintainers right back where they are today: keeping two configuration systems going. The idea is to lessen the maintenance burden on the HDF5 maintainers so they can concentrate on bug fixes and feature implementations.

3 Likes

First, thanks @derobins for writing the blog post and taking the time to ask the community for opinions.

I maintain a distribution of the HDF5 library in an R package (https://github.com/grimbough/Rhdf5lib) and currently that relies on the AutoTools configure script etc to manage the compilation in a way that’s compatible with R. This is mostly because R itself uses AutoTools and you can almost guarantee that anyone who is trying to compile an R package from source will already have the appropriate tools on their system and be able to run a configure script etc.

CMake on the other hand is not particularly popular within R packages. Only 10 of the 24,000 packages on CRAN declare that CMake is a system requirement for installing the package. I suspect that’s because many package authors don’t declare it despite requiring it, but it indicates the relative lack of adoption. The whole reason I distribute this package is to make installing HDF5 easy for R users, and I’d be reluctant to ask them to install something else outside of R.

So, on the face of it, I’d really prefer the AutoTools option to remain! That said, the “Writing R Extensions” guide for package developers does mention CMake (https://cran.r-project.org/doc/manuals/R-exts.html#Using-cmake), so maybe this wouldn’t be as disruptive as I fear. Rhdf5lib is also stuck on version 1.10.6 of the library due to some R toolchain incompatibilities on Windows, and unless I sort those out this is also a moot point.

1 Like

@derobins do you know how bleeding edge HDF5’s CMake requirement is? Reason I ask is that for those package maintainers out there with a large legacy investment in AutoTools, there are sort of two issues. One is just using CMake instead of AutoTools to install HDF5. The other is depending on having an available CMake install. The latter issue is mitigated somewhat by ensuring HDF5 is not on CMake’s bleeding edge…having CMake logic that works as needed on older versions of CMake which are likely to already be installed and available on most systems.

Do we have a good sense of what older version(s) of CMake that might be and whether HDF5’s CMake logic is compatible with those older versions?

1 Like

@miller86, great point!

For example, only ubuntu_latest worked for my aarch64 testing, due to the CMake 3.18 requirement on the develop branch.

cmake_minimum_required (VERSION 3.18)

3.18 is pretty recent…too recent, I think, for it to be ubiquitously available on various Linux distros.

I wonder how problematic it is or would be for THG to adjust their CMake logic to target an older version? This might be too problematic, however: older versions of CMake also suffered from more buggy behavior that THG would have to code work-arounds for.

Not sure what the right answer is.

FWIW, IMHO it makes no sense for THG to maintain both AutoTools and CMake.

I am not a CMake expert by any stretch. But, I am wondering if CMake offers a solution for maintainers to effectively embed it in their releases so that the command cmake -DSOME_VAR:BOOL=TRUE ../src/. has the effect of first attempting to compile and use a version of CMake in the source dir ../src/. before using whatever its current local version is. That would at least address the dependency problem.

1 Like

Previously, and in the existing 1.8, 1.10, and 1.12 branches, the CMake minimum is 3.12, with appropriate workarounds and restrictions. Those will not move. Develop was moved to CMake 3.18 to eliminate those issues and get all the supported platforms to the same level, plus add new compilers (e.g., VS2022 support). A cursory inspection of various distros/machines indicated that CMake 3.18 was available.

2 Likes

I support the idea of retiring Autotools. Having two build systems is not only extra work for the HDF5 team but also requires everyone with a project relying on HDF5 to be aware of the differences.

My recent pull request (https://github.com/HDFGroup/hdf5/pull/2219) was an attempt to make the Autotools installation behave like a CMake installation.

I feel like there’s a constant tension between “stable CMake” and “useful CMake features”. Compared to the Autotools, CMake evolves pretty quickly and that will be a challenge, moving forward.

As for Visual Studio support, does that require updating the minimum version? If so, can we use a conditional to set that statement? Requiring very recent CMake might be less of an issue on Windows compared to Linux.

Yes, we use version checking to use workarounds or check minimum for new compiler/platforms. At some point changing the minimum is the better solution. Once every 5 years isn’t too bad of a cadence.
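Worth noting in this context: since CMake 3.12, `cmake_minimum_required` accepts a version range, which declares the supported floor while opting into the policies of a newer, tested version when one is available. A sketch (the specific versions are illustrative):

```cmake
# Accept CMake 3.12 or newer; when a newer CMake is present, behave as
# if the newest tested version (here 3.18) had been requested.
cmake_minimum_required (VERSION 3.12...3.18)
```

This does not remove the need for version-conditional workarounds, but it lets one line document both the floor and the tested ceiling.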

Got it. Yeah, my only concern is that important, older machines can often be in use long after most people have moved on. I feel like this is especially true in the sciences. I recall being at a nuclear physics experiment where the magnet control program ran on a 486 running Windows 3.1 because the developer wrote low-level DOS code for that platform and then the source code was lost.

1 Like

If you must switch to CMake, please don’t require a version more recent than that in Debian stable (or even oldstable), else you’ll require people to compile it which, believe me, is not a pleasant experience.

We don’t check the generated files into maintenance branches

Why not? You can mark them as generated in .gitattributes so they don’t clog up PRs, you can put them in LFS if they’re largish (and configure is what, a few hundred KB?)
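For what it’s worth, the `.gitattributes` approach would be a one-line-per-file change; `linguist-generated` is the attribute GitHub uses to collapse files in pull-request diffs (file list illustrative):

```
# .gitattributes: collapse generated Autotools output in GitHub diffs
configure    linguist-generated=true
Makefile.in  linguist-generated=true
```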

+1 for retiring autotools in favour of CMake.

+1 for @w.benger’s suggestions too. It’s much better not to do configuration-time introspection whenever possible (though of course sometimes there may be a need to fall back to that as a last resort). It’s just much friendlier for cross-compilation in general and Universal Binaries on macOS specifically. I’m talking about things like introspecting the size of int, whether the system is big/little endian, whether some header or function exists. It’s much better to use __has_include like he said, or things like #ifdef __APPLE__.

@mike.jackson if he’s saying what I’m saying, it’s not that we don’t still need CMake, it’s just that we shouldn’t use it for some of what it’s being used for currently.

1 Like

At NOAA we use the autotools build, but if HDF5 switches to CMake exclusively, we can deal with it. (How does the HDF5 spack build work?)

I understand the tension of maintaining two build systems - it’s what’s done on netcdf-c, netcdf-fortran, and PIO, all of which I contribute to. If only one build system were used, that would be a huge win.

I don’t understand how a package like HDF5 or netCDF could build without configuration-time introspection. Nor do I think that platform-dependent pre-processor symbols should be (re-)introduced into our codebases. We got away from that once, and should not go back.

Furthermore, shared library builds are very common and need to be supported, as well as static builds. Autotools is good at automatically supporting both, CMake can support both but needs a little more effort. Both are essential.

I should have started by thanking you, Dana Robinson @derobins, for this idea and your efforts! :wink:

HDF5 is vital software for so many systems! Keep up the good work!

2 Likes

The spack hdf5 package was switched to use CMake at least a year ago.

1 Like