Test MPI_TEST_H5DIFF-h5diff_601 fails

Hello,

I’m trying to build and test HDF5 on an NVIDIA Grace Hopper node (aarch64), and the test MPI_TEST_H5DIFF-h5diff_601 fails, although the serial H5DIFF-h5diff_601 passes. Can anyone advise, please?

Thanks,

Mark

OS: Rocky 9
Compiler: nvhpc 24.9
MPI: openmpi 4.1.6
HDF5 configure line: cmake -DHDF5_ENABLE_PARALLEL=ON -DCMAKE_INSTALL_PREFIX=$prefix -DBUILD_SHARED_LIBS=ON -DZLIB_USE_EXTERNAL=ON -DHDF5_BUILD_FORTRAN=ON …/hdf5

Test output:

Start testing: Nov 14 16:47 GMT

224/2824 Testing: H5DIFF-h5diff_601
224/2824 Test: H5DIFF-h5diff_601
Command: “/usr/bin/cmake” “-D” “TEST_EMULATOR=” “-D” “TEST_PROGRAM=/nobackup/users/thingamy/h/build/bin/h5diff” “-D” “TEST_ARGS:STRING=h5diff_basic1.h5;h5diff_basic1.h5;nono_obj” “-D” “TEST_FOLDER=/nobackup/users/thingamy/h/build/tools/test/h5diff/testfiles” “-D” “TEST_OUTPUT=h5diff_601.out” “-D” “TEST_EXPECT=2” “-D” “TEST_REFERENCE=h5diff_601.txt” “-D” “TEST_ERRREF=Object could not be found” “-D” “TEST_APPEND=EXIT CODE:” “-P” “/nobackup/users/thingamy/h/hdf5/config/cmake/grepTest.cmake”
Directory: /nobackup/users/thingamy/h/build/tools/test/h5diff/testfiles
“H5DIFF-h5diff_601” start time: Nov 14 16:47 GMT
Output:

– Optional TEST_FILTER to be defined
– COMMAND: /nobackup/users/thingamy/h/build/bin/h5diff h5diff_basic1.h5;h5diff_basic1.h5;nono_obj
– COMMAND Result: 2
– COMMAND Error:
– COMPARE Result: 0
– Passed: The output of /nobackup/users/thingamy/h/build/bin/h5diff matched

Test time = 0.06 sec

Test Passed.
“H5DIFF-h5diff_601” end time: Nov 14 16:47 GMT
“H5DIFF-h5diff_601” time elapsed: 00:00:00

225/2824 Testing: MPI_TEST_H5DIFF-h5diff_601
225/2824 Test: MPI_TEST_H5DIFF-h5diff_601
Command: “/usr/bin/cmake” “-D” “TEST_PROGRAM=/opt/software/builder/developers/libraries/openmpi/4.1.6/1/nvhpc-24.9/bin/mpiexec” “-D” “TEST_ARGS:STRING=-n;1;;/nobackup/users/thingamy/h/build/bin/ph5diff;;h5diff_basic1.h5;h5diff_basic1.h5;nono_obj” “-D” “TEST_FOLDER=/nobackup/users/thingamy/h/build/tools/test/h5diff/PAR/testfiles” “-D” “TEST_OUTPUT=h5diff_601.out” “-D” “TEST_EXPECT=0” “-D” “TEST_REFERENCE=h5diff_601.txt” “-D” “TEST_ERRREF=Object could not be found” “-D” “TEST_APPEND=EXIT CODE:” “-D” “TEST_REF_APPEND=EXIT CODE: [0-9]” “-D” “TEST_REF_FILTER=EXIT CODE: 0” “-D” “TEST_SORT_COMPARE=TRUE” “-P” “/nobackup/users/thingamy/h/hdf5/config/cmake/grepTest.cmake”
Directory: /nobackup/users/thingamy/h/build/tools/test/h5diff/PAR/testfiles
“MPI_TEST_H5DIFF-h5diff_601” start time: Nov 14 16:47 GMT
Output:

– Optional TEST_FILTER to be defined
– COMMAND: /opt/software/builder/developers/libraries/openmpi/4.1.6/1/nvhpc-24.9/bin/mpiexec -n;1;;/nobackup/users/thingamy/h/build/bin/ph5diff;;h5diff_basic1.h5;h5diff_basic1.h5;nono_obj
– COMMAND Result: 0
– COMMAND Error:
Only 1 task available…doing serial diff

CMake Error at /nobackup/users/thingamy/h/hdf5/config/cmake/grepTest.cmake:119 (message):
Failed: The error output of
/opt/software/builder/developers/libraries/openmpi/4.1.6/1/nvhpc-24.9/bin/mpiexec
did not contain ‘Object could not be found’. Error output was: 'Only 1
task available…doing serial diff

Test time = 0.46 sec
----------------------------------------------------------
Test Failed.
"MPI_TEST_H5DIFF-h5diff_601" end time: Nov 14 16:47 GMT
"MPI_TEST_H5DIFF-h5diff_601" time elapsed: 00:00:00
----------------------------------------------------------

End testing: Nov 14 16:47 GMT

The 601 test expects the error output to contain the message ‘Object could not be found’, and the test fails because that message is not there:

CMake Error at /nobackup/users/thingamy/h/hdf5/config/cmake/grepTest.cmake:119 (message):
Failed: The error output of
/opt/software/builder/developers/libraries/openmpi/4.1.6/1/nvhpc-24.9/bin/mpiexec
did not contain ‘Object could not be found’. Error output was: 'Only 1
task available…doing serial diff

You can check the output files the test produced in the test directory (TEST_FOLDER in the log above).
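
For anyone following along, here is a minimal sketch of the kind of check the log describes: the test passes only if the captured error output contains the TEST_ERRREF string. This is a simplified stand-in and not the actual logic in grepTest.cmake. The function name `check_errref` and the second sample string are made up for illustration.

```shell
# Simplified stand-in for the failing check (NOT the real grepTest.cmake logic):
# pass only if the captured error output contains the expected string.
check_errref() {
  # $1 = captured error output, $2 = expected substring (TEST_ERRREF)
  case "$1" in
    *"$2"*) echo "PASS" ;;
    *)      echo "FAIL" ;;
  esac
}

ERRREF="Object could not be found"

# Error output as reported in the log above.
check_errref "Only 1 task available...doing serial diff" "$ERRREF"

# Made-up error output that does contain the expected string.
check_errref "example: Object could not be found" "$ERRREF"
```

The first call prints FAIL and the second prints PASS, which matches what the log shows: the only text in the error output was the serial-diff notice, so the expected string was not found there.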

Hello and many thanks for the reply, I really appreciate it.

Looking at the files, the expected output was sent to standard out; however, in addition, the string “Only 1 task available…doing serial diff” was sent to standard error. In fact, all of the MPI tests run with one rank, which seems wrong.

Is the problem:

  1. The test(s) should be run with more than one MPI rank (in which case, why doesn’t the test do this, and how do I change it)?

  2. The output of ph5diff has changed since the test was written (in which case could the test be modified so that it expects this, please)?

I’m using the ctest suite (make test after cmake).

We use automated scripts to generate a tailored hdf5 for every compiler and MPI for each of our HPC machines, so figuring this out would help us remove a lot of eyeballing.

Thanks!

Mark

Replying to myself…

I found the MPIEXEC_MAX_NUMPROCS CMake option and am changing it to 4, because that is what I remember the HDF5 1.8 build doing.
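
In case it helps anyone scripting the same thing, this is roughly how the option can be passed at configure time. It is only a sketch: `SRC_DIR` and `BUILD_DIR` are hypothetical placeholders for your own source and build directories, the other options from the original configure line still need to be added, and the thread does not confirm that this alone makes the test pass. By default the script only prints the configure command; set `RUN=1` to execute it and rerun the parallel h5diff tests.

```shell
# Hypothetical placeholders: point these at your own HDF5 source and build dirs.
SRC_DIR="${SRC_DIR:-/path/to/hdf5}"
BUILD_DIR="${BUILD_DIR:-/path/to/build}"

# Number of MPI ranks to request for the parallel tests (4, as chosen above).
NPROCS=4
MPI_FLAG="-DMPIEXEC_MAX_NUMPROCS=${NPROCS}"

if [ "${RUN:-0}" = "1" ]; then
  # Configure, then rerun only the tests whose names match MPI_TEST_H5DIFF.
  cmake -S "$SRC_DIR" -B "$BUILD_DIR" -DHDF5_ENABLE_PARALLEL=ON "$MPI_FLAG" &&
    (cd "$BUILD_DIR" && ctest -R MPI_TEST_H5DIFF --output-on-failure)
else
  # Dry run: show the configure command without executing anything.
  echo "cmake -S $SRC_DIR -B $BUILD_DIR -DHDF5_ENABLE_PARALLEL=ON $MPI_FLAG"
fi
```

Keeping the rank count in one variable makes it easy for an automated build script to override it per machine.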

Perhaps the default for MPIEXEC_MAX_NUMPROCS should be changed?

Best,

Mark