MPI portion of tests fails


#1

I had a complaint from a user that his MPI code would hang. We have a new cluster, so we are still in the test phase. I downloaded the latest version of HDF5 (1.12.2) and built it, but the parallel tests fail. I am using Intel's oneAPI compilers.

===Parallel tests in testpar begin Mon Aug 8 14:35:35 PDT 2022===
**** Hint ****
Parallel test files reside in the current directory by default.
Set HDF5_PARAPREFIX to use another directory. e.g.,
HDF5_PARAPREFIX=/PFS/user/me
export HDF5_PARAPREFIX
make check
**** end of Hint ****
make[4]: Entering directory '/users/ramos/hdf5-1.12.2/testpar'

Testing: t_mpi

Test log for t_mpi

*** Hint ***
You can use environment variable HDF5_PARAPREFIX to run parallel test files in a
different directory or to add file type prefix. e.g.,
HDF5_PARAPREFIX=pfs:/PFS/user/me
export HDF5_PARAPREFIX
*** End of Hint ***

MPI functionality tests


Proc 0: *** MPIO 1 write Many read test...

Proc 0: hostname=node0001
Proc 2: hostname=node0001
Proc 4: hostname=node0001
Proc 1: hostname=node0001
Proc 3: hostname=node0001
Proc 5: hostname=node0001

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 3151539 RUNNING AT node0001.cm.cluster
= KILLED BY SIGNAL: 14 (Alarm clock)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 3151540 RUNNING AT node0001.cm.cluster
= KILLED BY SIGNAL: 14 (Alarm clock)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 2 PID 3151541 RUNNING AT node0001.cm.cluster
= KILLED BY SIGNAL: 14 (Alarm clock)
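
For reference, signal 14 in the output above is SIGALRM on Linux: a timer set with alarm() expired. That suggests the test hit a timeout after hanging rather than crashing outright. A quick way to decode a signal number in the shell is sketched below; `signal_name` is a hypothetical helper written for this post, not part of HDF5 or Intel MPI.

```shell
# Hypothetical helper: translate a signal number into its short name
# using the shell's built-in `kill -l`.
signal_name() {
    kill -l "$1"
}

# Signal number taken from the "KILLED BY SIGNAL: 14" lines in the log.
signal_name 14   # prints ALRM on Linux
```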


#2

Do you know exactly which versions of the Intel compiler and the Intel MPI library were used?
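
If it helps, something like the following would collect that information. It is only a sketch: the tool names (`icx`, `icc`, `ifort`, `mpirun`) are my assumption about what a oneAPI installation puts on PATH, and `report_tool_version` is a hypothetical helper, so adjust the list to whatever was actually used for the build.

```shell
# Hypothetical helper: print the first line of `<tool> --version`
# for a tool if it is on PATH, or say that it is missing.
report_tool_version() {
    tool=$1
    if command -v "$tool" >/dev/null 2>&1; then
        printf '%s: %s\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)"
    else
        printf '%s: not found on PATH\n' "$tool"
    fi
}

# Assumed tool names; edit to match the compilers and MPI launcher in use.
for tool in icx icc ifort mpirun; do
    report_tool_version "$tool"
done
```

Run it in the same shell environment that was used for the build, so that the same modules and paths are in effect.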

In the past I also had some trouble getting Intel MPI to work. In my case it turned out to be a bug in the ROMIO implementation in their MPI library.