Parallel HDF5 1.8.21 and OpenMPI 4.0.1

Hi,

I've built parallel HDF5 1.8.21 against OpenMPI 4.0.1 on CentOS 7 with a Lustre 2.12 filesystem, using the OS-provided GCC 4.8.5, and I'm trying to run the test suite. The testphdf5 test is failing; could anyone help, please?

I've successfully used the same method to pass the tests when building HDF5 1.8.21 against other MPI implementations: MVAPICH2 2.3.1 and Intel MPI 2019.4.243.

I've set the following MCA parameter to try to force ROMIO:

export OMPI_MCA_io=romio321
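
For what it's worth, I believe the same selection can be made per run on the mpirun command line, and ompi_info should list which io components were built; roughly (the testphdf5 invocation below is just illustrative):

ompi_info | grep "MCA io:"                    # should list ompio and romio321
mpirun --mca io romio321 -np 6 ./testphdf5    # illustrative direct invocation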

I've built OpenMPI 4.0.1 with these configure options:

./configure --prefix=$prefix \
  --with-sge \
  --with-io-romio-flags="--with-file-system=lustre+ufs" \
  --enable-mpi-cxx \
  --with-cma \
  --enable-mpi1-compatibility \
  --with-ucx=$prefix --without-verbs \
  --enable-mca-no-build=btl-uct

For OpenMPI 4.0.1, I’m getting this failure:

testphdf5 Test Log

===================================
PHDF5 TESTS START

MPI-process 5. hostname=login2.arc4.leeds.ac.uk

For help use: /nobackup/issmcd/login2.arc4.leeds.ac.uk.u4q9A9ALkN/hdf5-1.8.21/testpar/.libs/testphdf5 -help
Linked with hdf5 version 1.8 release 21
MPI-process 3. hostname=login2.arc4.leeds.ac.uk

For help use: /nobackup/issmcd/login2.arc4.leeds.ac.uk.u4q9A9ALkN/hdf5-1.8.21/testpar/.libs/testphdf5 -help
Linked with hdf5 version 1.8 release 21
MPI-process 1. hostname=login2.arc4.leeds.ac.uk

For help use: /nobackup/issmcd/login2.arc4.leeds.ac.uk.u4q9A9ALkN/hdf5-1.8.21/testpar/.libs/testphdf5 -help
Linked with hdf5 version 1.8 release 21
MPI-process 2. hostname=login2.arc4.leeds.ac.uk

For help use: /nobackup/issmcd/login2.arc4.leeds.ac.uk.u4q9A9ALkN/hdf5-1.8.21/testpar/.libs/testphdf5 -help
Linked with hdf5 version 1.8 release 21
MPI-process 4. hostname=login2.arc4.leeds.ac.uk
MPI-process 0. hostname=login2.arc4.leeds.ac.uk

For help use: /nobackup/issmcd/login2.arc4.leeds.ac.uk.u4q9A9ALkN/hdf5-1.8.21/testpar/.libs/testphdf5 -help
Linked with hdf5 version 1.8 release 21

For help use: /nobackup/issmcd/login2.arc4.leeds.ac.uk.u4q9A9ALkN/hdf5-1.8.21/testpar/.libs/testphdf5 -help
Linked with hdf5 version 1.8 release 21
Test filenames are:
ParaTest.h5
Testing -- fapl_mpio duplicate (mpiodup)
Test filenames are:
ParaTest.h5
Testing -- fapl_mpio duplicate (mpiodup)
Test filenames are:
ParaTest.h5
Testing -- fapl_mpio duplicate (mpiodup)
*** Hint ***
You can use environment variable HDF5_PARAPREFIX to run parallel test files in a
different directory or to add file type prefix. E.g.,
HDF5_PARAPREFIX=pfs:/PFS/user/me
export HDF5_PARAPREFIX
*** End of Hint ***
Test filenames are:
ParaTest.h5
Testing -- fapl_mpio duplicate (mpiodup)
Test filenames are:
ParaTest.h5
Testing -- fapl_mpio duplicate (mpiodup)
Test filenames are:
ParaTest.h5
Testing -- fapl_mpio duplicate (mpiodup)
Testing -- dataset using split communicators (split)
Testing -- dataset using split communicators (split)
Testing -- dataset using split communicators (split)
Testing -- dataset using split communicators (split)
Testing -- dataset using split communicators (split)
Testing -- dataset using split communicators (split)
Testing -- dataset independent write (idsetw)
Testing -- dataset independent write (idsetw)
Testing -- dataset independent write (idsetw)
Testing -- dataset independent write (idsetw)
Testing -- dataset independent write (idsetw)
Testing -- dataset independent write (idsetw)
Testing -- dataset independent read (idsetr)
Testing -- dataset independent read (idsetr)
Testing -- dataset independent read (idsetr)
Testing -- dataset independent read (idsetr)
Testing -- dataset independent read (idsetr)
Testing -- dataset independent read (idsetr)
Testing -- dataset collective write (cdsetw)
Testing -- dataset collective write (cdsetw)
Testing -- dataset collective write (cdsetw)
Testing -- dataset collective write (cdsetw)
Testing -- dataset collective write (cdsetw)
Testing -- dataset collective write (cdsetw)
Testing -- dataset collective read (cdsetr)
Testing -- dataset collective read (cdsetr)
Testing -- dataset collective read (cdsetr)
Testing -- dataset collective read (cdsetr)
Testing -- dataset collective read (cdsetr)
Testing -- dataset collective read (cdsetr)
Testing -- extendible dataset independent write (eidsetw)
Testing -- extendible dataset independent write (eidsetw)
Testing -- extendible dataset independent write (eidsetw)
Testing -- extendible dataset independent write (eidsetw)
Testing -- extendible dataset independent write (eidsetw)
Testing -- extendible dataset independent write (eidsetw)
Testing -- extendible dataset independent read (eidsetr)
Testing -- extendible dataset independent read (eidsetr)
Testing -- extendible dataset independent read (eidsetr)
Testing -- extendible dataset independent read (eidsetr)
Testing -- extendible dataset independent read (eidsetr)
Testing -- extendible dataset independent read (eidsetr)
Testing -- extendible dataset collective write (ecdsetw)
Testing -- extendible dataset collective write (ecdsetw)
Testing -- extendible dataset collective write (ecdsetw)
Testing -- extendible dataset collective write (ecdsetw)
Testing -- extendible dataset collective write (ecdsetw)
Testing -- extendible dataset collective write (ecdsetw)
Testing -- extendible dataset collective read (ecdsetr)
Testing -- extendible dataset collective read (ecdsetr)
Testing -- extendible dataset collective read (ecdsetr)
Testing -- extendible dataset collective read (ecdsetr)
Testing -- extendible dataset collective read (ecdsetr)
Testing -- extendible dataset collective read (ecdsetr)
Testing -- extendible dataset independent write #2 (eidsetw2)
Testing -- extendible dataset independent write #2 (eidsetw2)
Testing -- extendible dataset independent write #2 (eidsetw2)
Testing -- extendible dataset independent write #2 (eidsetw2)
Testing -- extendible dataset independent write #2 (eidsetw2)
Testing -- extendible dataset independent write #2 (eidsetw2)
Testing -- chunked dataset with none-selection (selnone)
Testing -- chunked dataset with none-selection (selnone)
Testing -- chunked dataset with none-selection (selnone)
Testing -- chunked dataset with none-selection (selnone)
Testing -- chunked dataset with none-selection (selnone)
Testing -- chunked dataset with none-selection (selnone)
Testing -- parallel extend Chunked allocation on serial file (calloc)
Testing -- parallel extend Chunked allocation on serial file (calloc)
Testing -- parallel extend Chunked allocation on serial file (calloc)
Testing -- parallel extend Chunked allocation on serial file (calloc)
Testing -- parallel extend Chunked allocation on serial file (calloc)
Testing -- parallel extend Chunked allocation on serial file (calloc)
Testing -- parallel read of dataset written serially with filters (fltread)
Testing -- parallel read of dataset written serially with filters (fltread)
Testing -- parallel read of dataset written serially with filters (fltread)
Testing -- parallel read of dataset written serially with filters (fltread)
Testing -- parallel read of dataset written serially with filters (fltread)
Testing -- parallel read of dataset written serially with filters (fltread)
Testing -- compressed dataset collective read (cmpdsetr)
Testing -- compressed dataset collective read (cmpdsetr)
Testing -- compressed dataset collective read (cmpdsetr)
Testing -- compressed dataset collective read (cmpdsetr)
Testing -- compressed dataset collective read (cmpdsetr)
Testing -- compressed dataset collective read (cmpdsetr)
Testing -- zero dim dset (zerodsetr)
Testing -- zero dim dset (zerodsetr)
Testing -- zero dim dset (zerodsetr)
Testing -- zero dim dset (zerodsetr)
Testing -- zero dim dset (zerodsetr)
Testing -- zero dim dset (zerodsetr)
Testing -- multiple datasets write (ndsetw)
Testing -- multiple datasets write (ndsetw)
Testing -- multiple datasets write (ndsetw)
Testing -- multiple datasets write (ndsetw)
Testing -- multiple datasets write (ndsetw)
Testing -- multiple datasets write (ndsetw)
Testing -- multiple groups write (ngrpw)
Testing -- multiple groups write (ngrpw)
Testing -- multiple groups write (ngrpw)
Testing -- multiple groups write (ngrpw)
Testing -- multiple groups write (ngrpw)
Testing -- multiple groups write (ngrpw)
Testing -- multiple groups read (ngrpr)
Testing -- multiple groups read (ngrpr)
Testing -- multiple groups read (ngrpr)
Testing -- multiple groups read (ngrpr)
Testing -- multiple groups read (ngrpr)
Testing -- multiple groups read (ngrpr)
Testing -- compact dataset test (compact)
Testing -- compact dataset test (compact)
Testing -- compact dataset test (compact)
Testing -- compact dataset test (compact)
Testing -- compact dataset test (compact)
Testing -- compact dataset test (compact)
Testing -- collective group and dataset write (cngrpw)
Testing -- collective group and dataset write (cngrpw)
Testing -- collective group and dataset write (cngrpw)
Testing -- collective group and dataset write (cngrpw)
Testing -- collective group and dataset write (cngrpw)
Testing -- collective group and dataset write (cngrpw)
Testing -- independent group and dataset read (ingrpr)
Testing -- independent group and dataset read (ingrpr)
Testing -- independent group and dataset read (ingrpr)
Testing -- independent group and dataset read (ingrpr)
Testing -- independent group and dataset read (ingrpr)
Testing -- independent group and dataset read (ingrpr)
Testing -- big dataset test (bigdset)
Testing -- big dataset test (bigdset)
Testing -- big dataset test (bigdset)
Testing -- big dataset test (bigdset)
Testing -- big dataset test (bigdset)
Testing -- big dataset test (bigdset)
Testing -- dataset fill value (fill)
Testing -- dataset fill value (fill)
Testing -- dataset fill value (fill)
Testing -- dataset fill value (fill)
Testing -- dataset fill value (fill)
Testing -- dataset fill value (fill)
Testing -- simple collective chunk io (cchunk1)
Testing -- simple collective chunk io (cchunk1)
Testing -- simple collective chunk io (cchunk1)
Testing -- simple collective chunk io (cchunk1)
Testing -- simple collective chunk io (cchunk1)
Testing -- simple collective chunk io (cchunk1)
Testing -- noncontiguous collective chunk io (cchunk2)
Testing -- noncontiguous collective chunk io (cchunk2)
Testing -- noncontiguous collective chunk io (cchunk2)
Testing -- noncontiguous collective chunk io (cchunk2)
Testing -- noncontiguous collective chunk io (cchunk2)
Testing -- noncontiguous collective chunk io (cchunk2)
Testing -- multi-chunk collective chunk io (cchunk3)
Testing -- multi-chunk collective chunk io (cchunk3)
Testing -- multi-chunk collective chunk io (cchunk3)
Testing -- multi-chunk collective chunk io (cchunk3)
Testing -- multi-chunk collective chunk io (cchunk3)
Testing -- multi-chunk collective chunk io (cchunk3)

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.


mpirun noticed that process rank 0 with PID 0 on node login2 exited on signal 14 (Alarm clock).

4198.74user 2989.71system 20:01.90elapsed 598%CPU (0avgtext+0avgdata 119908maxresident)k
2744inputs+67648outputs (9major+97788minor)pagefaults 0swaps

Hello,

The "Alarm clock" message at the end indicates that a test ran longer than the 20 minutes allowed by default. The time allowed can be increased by setting the environment variable HDF5_ALARM_SECONDS to a larger value, in seconds: 2400 to double the time, 3600 to triple it, and so on.
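
For example, something like this before re-running the tests:

export HDF5_ALARM_SECONDS=3600    # allow each test 3600 s instead of the default 20 minutes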

Larry

Hi Larry,

Thanks for the suggestion. I tried raising HDF5_ALARM_SECONDS to 3600 seconds, with no change in the result: as far as the log shows, it reaches exactly the same point.

If I rerun the tests and strace each of the 6 testphdf5 processes, all I’m seeing is the following line printed again and again, with nothing else:

read(35, “”, 0) = 0
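
The tracing was along these lines (a sketch; the exact PID lookup and options may differ from what I actually ran):

for pid in $(pgrep -f testphdf5); do
    strace -p "$pid" -o "strace.$pid" &    # one trace file per MPI rank
done
wait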

Given that this is an otherwise idle Lustre filesystem, and that the MVAPICH2 and Intel MPI builds passed the tests within the default 20 minutes while the OpenMPI build hasn't finished within 60, I assume it has got stuck in some loop that never terminates.

Does anyone have any clues on how to debug this, please?

Thanks,

Mark

Hi Mark,

I can't tell for certain. Can you check which version of PMIx that OpenMPI shipped with? You might also want to try an external PMIx 3.1.3 and compile OpenMPI against that; note that the external PMIx must be built against a matching libevent:

./autogen.pl && ./configure --prefix=/usr/local --with-slurm --with-pmix=/usr/local --enable-mpi1-compatibility --with-libevent=/usr/local --with-hwloc=/usr/local
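
As a quick check of what your current build bundles, I'd expect something like this to show the PMIx component it carries (the exact output format varies between releases):

ompi_info | grep -i pmix    # e.g. a pmix3x component indicates the bundled PMIx 3.x series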

This setup works for Slurm + OpenMPI 4.0.1 + PVFS2 + parallel HDF5; see this presentation slide for a performance regression test of parallel HDF5 covering 1997 to 2019.

steve

Hi Steve,

Thanks for this. I've tried rebuilding everything on top of PMIx 3.1.3 and libevent 2.1.11-stable, and I've also dropped various non-essential configure options to simplify things. Unfortunately, I get the same result.

I do see an improvement if I move the HDF5 test directory off the Lustre filesystem and onto a local one: the test then passes.
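
An alternative to physically moving the build, as the hint in the test log notes, would be to redirect just the parallel test files with HDF5_PARAPREFIX, for example (the path here is hypothetical):

export HDF5_PARAPREFIX=/tmp/$USER/paratest    # hypothetical local scratch directory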

To be clear, I’m running the tests on a single machine outside of any batch queue system.

Built libevent with: ./configure --prefix=$prefix

Built pmix with: ./configure --prefix=$prefix --with-libevent=$prefix

Built openmpi with: ./configure --prefix=$prefix --with-libevent=$prefix --with-pmix=$prefix --enable-mpi1-compatibility

Built hdf5 with: ./configure --prefix="$prefix" --enable-parallel --with-zlib
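
The parallel tests were then run in the usual way, roughly (the launcher HDF5's makefiles use is whatever configure detected):

cd testpar && make check    # runs testphdf5 via the detected mpirun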

Mark,
Can you send me the git version of HDF5? I can then run a check on that version against the stack we use here: h5cluster with OpenMPI 4.0.1, custom-compiled with the most recent external PMIx 3 series (the 4 series doesn't work with Slurm yet).

This will not necessarily solve your exact problem, but it can give you upper or lower bounds on which tests pass or fail.
steve

Hi Steve,

I'm just using the HDF5 1.8.21 release. Are you running on top of Lustre like us, or on OrangeFS (as suggested by the presentation you pointed me at)? I suspect this might be the significant difference between our findings.

All the best,

Mark

Yes. As of now I can only test against OrangeFS, which means OMPIO pulls in the OrangeFS-specific MCA component rather than ROMIO. In a few weeks I will be able to run it against Lustre, but for now I am limited to PVFS.

steve