Parallel test t_pshutdown failure HDF5 1.10.5 + Open MPI 3.1.4

Hi,

I am building an MPI-enabled HDF5 1.10.5 on Linux RHEL 6.5 with the following:

  • Intel Composer XE 2019u1
  • Open MPI 3.1.4
  • Mellanox MXM 3.5.3093-1.34100

All this is being done on an NFSv3 mount.

make check runs fine for every test except t_pshutdown, which hangs. I left it running overnight with 6 MPI ranks, and the processes were still running when I checked 8 hours later.

Manually running t_pshutdown with a single rank was fine:

$ mpirun -n 1 ./t_pshutdown
Testing proper shutdown of HDF5 library                                PASSED

Running with more than one rank hangs:

$ mpirun -n 2 ./t_pshutdown
Testing proper shutdown of HDF5 library                               
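
To keep a hang like this from tying up the node overnight, the run can be bounded with a time limit. This is only a minimal sketch: it assumes GNU coreutils timeout is available (which exits with status 124 when the limit is hit), and run_bounded is a hypothetical helper name, not part of HDF5 or Open MPI.

```shell
#!/bin/sh
# Run a command under a time limit and report whether it finished or hung.
# Assumption: GNU coreutils `timeout` is installed; it exits with 124 on timeout.
# run_bounded is a hypothetical helper name used only for illustration.
run_bounded() {
    limit=$1          # time limit in seconds
    shift             # remaining arguments are the command to run
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "HUNG after ${limit}s: $*"
    elif [ "$status" -eq 0 ]; then
        echo "OK: $*"
    else
        echo "FAILED (exit $status): $*"
    fi
    return "$status"
}
```

For example, run_bounded 300 mpirun -n 2 ./t_pshutdown would report HUNG if the test is still running after five minutes. Whether every rank is cleaned up after the timeout depends on how the launcher handles the signal, so it is worth checking for leftover processes afterwards.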

Any hints appreciated.

Hi,

I am working on a parallel test suite as an independent researcher and have been noticing minor anomalies similar to yours (sometimes all of the tests pass, sometimes some of them fail). Here is my CMake configuration:
-DCMAKE_INSTALL_PREFIX=/home/steven/.local/tmp-002 -DHDF5_ENABLE_PARALLEL:BOOL=ON -DHDF5_ENABLE_THREADSAFE=OFF -DHDF5_BUILD_CPP_LIB:BOOL=OFF -DMPIEXEC_EXECUTABLE:STRING='srun' -DMPIEXEC_NUMPROC_FLAG:STRING=-n -DMPIEXEC_MAX_NUMPROCS:STRING= /home/steven/scratch/tmp-002/hdf5-src
MPI: Open MPI 4.1.0a1, launched with srun -n 4 or a similarly small number of ranks (typically 4-10).

Because of the nature of my assignment, I have to test against all versions of the parallel HDF5 library, so we could share results and experience. To do that, I need the git commit of the HDF5 library you are working with, so that we can match our results.

Unfortunately, for now I can't change either my Open MPI version or my compiler to match yours.

Let me know.
steve