I’m trying to build HDF5 with the --enable-parallel option. I’ve installed zlib (ZDIR=/home/chris/zlib). Using the MPICH libraries (built from source), I get as far as:
CC=/home/chris/mpich/install/bin/mpicc FC=/home/chris/mpich/install/bin/mpifort ./configure --with-zlib=${ZDIR} --prefix=${H5DIR} --enable-hl --enable-fortran --enable-parallel
make check
…and then it just hangs. If I kill the job and rerun make check, it thinks it has already run the tests:
No need to test t_cache again.
make[4]: Leaving directory ‘/home/nemo/hdf5/hdf5-1.10.5/testpar’
make[4]: Entering directory ‘/home/nemo/hdf5/hdf5-1.10.5/testpar’
Testing t_cache_image
but this all needs to run from a script, so manually killing and rerunning isn't an acceptable solution!
In case anyone is wondering, I’m using MPICH because Open MPI doesn’t even get that far:
Hi, I can’t speak for MPICH, but I have compiled several versions of Parallel HDF5 with OpenMPI 4.0.1. In fact I’ve compiled one commit for each day from 2003 - 2019 and linked against the IOR performance measurement tool with good results.
Speaking from personal experience: Ubuntu 18.04 LTS is my preferred OS to build clusters on AWS EC2, and it should work fine. The only glitch I usually have is compiling software from a shared/attached drive: for some reason it messes up the timestamps. I usually resolve this by doing all system compiles on instances with ephemeral/local drives. However, this problem is not OpenMPI-specific; PHDF5 compiles fine on a parallel FS.
I did manage to compile PHDF5 from 2003 - 2019 against OpenMPI 4.0.1 on an Ubuntu 18.04 LTS-based custom cluster running on AWS EC2.
This message basically states that you do not have enough hardware resources to run the 6 processes you requested (OMPI assumes you are running for performance and refuses by default to oversubscribe your hardware resources). You can find more information in our FAQ.
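To make the oversubscription point concrete: as I understand it, Open MPI (since the 3.x series) refuses by default to start more ranks than there are cores, and --oversubscribe lifts that limit. A hedged sketch, where the rank count and test binary name are illustrative assumptions:

```shell
# Hedged sketch: only add --oversubscribe when more ranks are requested
# than the machine has cores. NP and ./t_cache are assumptions here.
NP=6
CORES=$(nproc)
EXTRA=""
if [ "$NP" -gt "$CORES" ]; then
    EXTRA="--oversubscribe"     # allow more ranks than physical cores
fi
echo mpirun $EXTRA -np "$NP" ./t_cache
```

On a small EC2 instance with fewer than 6 cores, this is exactly the situation the error message describes.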
$ /opt/openmpi-4.0.1/bin/mpicc --version
gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
I had missed running make clean before this (so it was still trying to build against a version of MPICH I’d also installed). However, once I realised this and cleaned up, I still had issues:
h5repack tests failed with 1 errors.
Makefile:1451: recipe for target 'h5repack.sh.chkexe_' failed
The “H5repack error - make check” and “HDF5 make test in error” threads suggest that these tests are unreliable and recommend the -i flag. This does work, but it would be good to have some confirmation from an HDF5 developer on whether there are plans to fix the tests. I’m really not comfortable with hiding errors rather than fixing them, particularly for the community I’m supporting.
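For anyone following along, my reading is that “the -i flag” here means GNU make’s -i (--ignore-errors), which keeps going past a failing recipe instead of aborting, so a flaky h5repack.sh failure doesn’t stop the rest of the suite. Since that hides the failure from make’s exit status, a sketch that greps the log afterwards instead:

```shell
# Hedged sketch: run the full suite despite individual failures, then
# scan the captured log rather than trusting make's exit status.
# 'make -i' is GNU make's --ignore-errors; the grep pattern is an
# assumption based on the failure text quoted above.
make -i check 2>&1 | tee check.log
if grep -q 'tests failed' check.log; then
    echo "some tests failed; inspect check.log" >&2
fi
```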
Let’s break the problem into smaller pieces, factoring out HDF5’s make check:
Did OpenMPI 4.0.2 compile?
Is a suitable parallel file system, such as Lustre, BeeGFS, or OrangeFS, running?
Did pHDF5 compile?
What job scheduler is in place: SLURM, GridEngine, …?
Are there enough resources available to run the job?
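The checklist above can be walked through quickly from the shell. A hedged sketch, where the tool names are assumptions about your particular setup (e.g. sinfo only exists if SLURM is the scheduler):

```shell
# Hedged sketch: check each prerequisite tool is on the PATH. The list
# of tool names is an assumption about your cluster; adjust as needed.
for tool in mpicc mpirun h5pcc sinfo; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found: $tool"
    else
        echo "MISSING: $tool" >&2
    fi
done
# Then confirm the build itself, e.g.:
#   h5pcc -showconfig | grep -i parallel    # parallel HDF5 enabled?
#   mpirun -np 2 hostname                   # can two ranks launch?
```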
The make check stage can be controlled in various ways, and indeed can be finicky. Setting the correct number of processes is suggested; by default I think it is set to 4 (not certain).
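On controlling the process count: my reading of HDF5’s parallel install notes (verify against the INSTALL_parallel file in your release) is that configure defaults RUNPARALLEL to something like `mpirun -np $${NPROCS:=6}`, so exporting NPROCS before make check changes how many ranks the parallel tests use:

```shell
# Hedged sketch: HDF5's parallel tests appear to honor $NPROCS via the
# RUNPARALLEL configure variable (assumption from INSTALL_parallel;
# verify for your release). The ${var:-default} expansion below mirrors
# the fallback behavior when nothing is exported.
NPROCS=${NPROCS:-6}     # falls back to 6 when the variable is unset
echo "parallel tests would use ${NPROCS} ranks"
# export NPROCS=2; make check     # run inside the HDF5 build tree
```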
If your company is interested in quality IaaS on-demand clusters matching the setups used on supercomputers, my consulting company provides such services directly or through The HDF Group.