Parallel HDF5 1.10.4 make check fails on t_bigio


#1

Hi,

I’ve installed HDF5 1.10.4 on Debian Jessie with zlib, configured with --enable-parallel --enable-fortran.

Tests run fine up until:

Testing t_bigio

t_bigio Test Log

Testing Dataset1 write by ROW

Testing Dataset2 write by COL

Testing Dataset3 write select ALL proc 0, NONE others

Testing Dataset4 write point selection

mpiexec noticed that process rank 2 with PID 13639 on node 39514e5ff14e exited on signal 11 (Segmentation fault).

Makefile:1436: recipe for target ‘t_bigio.chkexe_’ failed
make[4]: *** [t_bigio.chkexe_] Error 1
make[4]: Leaving directory ‘/tmp/hdf5-fortran/testpar’
Makefile:1545: recipe for target ‘build-check-p’ failed
make[3]: *** [build-check-p] Error 1
make[3]: Leaving directory ‘/tmp/hdf5-fortran/testpar’
Makefile:1416: recipe for target ‘test’ failed
make[2]: *** [test] Error 2

I’m running openmpi 1.6.5-9.1 with gcc 4.9.2-2.

(As a new user I can’t yet upload my config.log)

Any help much appreciated.

Simon


#2

Hello!

There is a known issue in the OpenMPI v1.x MPI datatype code that causes the HDF5 t_bigio test to fail. My understanding is that this has been fixed in current releases of OpenMPI 2.1.x, 3.0.x, 3.1.x, and 4.0.x.
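
If it helps, here is a rough sketch in Python of how the version ranges above could be turned into a quick check. The helper name classify_openmpi and the three labels are made up for illustration and are not part of HDF5 or OpenMPI. It only encodes what is stated in this reply: 1.x is affected, and the 2.1.x, 3.0.x, 3.1.x and 4.0.x series are believed fixed. Any other series is reported as "unknown" rather than guessed.

```python
import re

# Assumption: taken only from the reply above. OpenMPI 1.x is affected;
# the 2.1.x, 3.0.x, 3.1.x and 4.0.x series are believed to carry the fix.
# Series not mentioned in the thread (for example 2.0.x) are "unknown".
FIXED_SERIES = {(2, 1), (3, 0), (3, 1), (4, 0)}


def classify_openmpi(version):
    """Return 'affected', 'fixed', or 'unknown' for an OpenMPI version string.

    Hypothetical helper for illustration; only the leading "major.minor"
    part of the string is inspected, so distro suffixes are ignored.
    """
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        return "unknown"
    major, minor = int(match.group(1)), int(match.group(2))
    if major == 1:
        return "affected"
    if (major, minor) in FIXED_SERIES:
        return "fixed"
    return "unknown"


if __name__ == "__main__":
    # Print the classification for a few sample version strings.
    for v in ("1.6.5-9.1", "2.1.6", "4.0.0", "2.0.4"):
        print(v, classify_openmpi(v))
```

The version string itself would have to come from your package manager or from the MPI installation; the sketch deliberately takes it as a plain argument so it makes no assumptions about how it is obtained.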

-Barbara


#3

Hi Barbara,

Many thanks for the reply.

I will upgrade MPI and retry.

Best wishes,

Simon


#4

Having experimented with MPICH 3.1-5, I can confirm that the t_bigio check continues to fail at the same point:
Testing Dataset4 write point selection
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 4
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 5
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2

================================================================================

= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 24144 RUNNING AT 2a5df2ded2e5
= EXIT CODE: 1
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

===
Makefile:1436: recipe for target ‘t_bigio.chkexe_’ failed

Any more ideas would be welcomed.

Thanks,

Simon


#5

I can confirm that the issue is resolved with mpich-3.2.1.