Hello,
I am trying to build parallel HDF5 1.8.12 on RHEL 6.4 using the Intel compilers (icc for C and ifort for Fortran) and Intel MPI 4.1.0 (both included in Intel Cluster Studio XE 2013). Everything configured and compiled fine, but I get failures when running "make check", specifically in "testpar/testphdf5" (a rough sketch of how I configured the build follows the error output below). The error message is:
...
Testing -- test cause for broken collective io (nocolcause)
Testing -- test cause for broken collective io (nocolcause)
Testing -- test cause for broken collective io (nocolcause)
Testing -- test cause for broken collective io (nocolcause)
Testing -- test cause for broken collective io (nocolcause)
Testing -- test cause for broken collective io (nocolcause)
Fatal error in PMPI_Barrier: Invalid communicator, error stack:
PMPI_Barrier(949): MPI_Barrier(comm=0x0) failed
PMPI_Barrier(903): Invalid communicator
Fatal error in PMPI_Barrier: Invalid communicator, error stack:
PMPI_Barrier(949): MPI_Barrier(comm=0x0) failed
PMPI_Barrier(903): Invalid communicator
Fatal error in PMPI_Barrier: Invalid communicator, error stack:
PMPI_Barrier(949): MPI_Barrier(comm=0x0) failed
PMPI_Barrier(903): Invalid communicator
Fatal error in PMPI_Barrier: Invalid communicator, error stack:
PMPI_Barrier(949): MPI_Barrier(comm=0x0) failed
PMPI_Barrier(903): Invalid communicator
Fatal error in PMPI_Barrier: Invalid communicator, error stack:
PMPI_Barrier(949): MPI_Barrier(comm=0x0) failed
PMPI_Barrier(903): Invalid communicator
Fatal error in PMPI_Barrier: Invalid communicator, error stack:
PMPI_Barrier(949): MPI_Barrier(comm=0x0) failed
PMPI_Barrier(903): Invalid communicator
...
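For reference, the build was configured roughly as follows (reconstructed from memory, so treat the exact options as approximate):

CC=mpiicc FC=mpiifort ./configure --enable-parallel --enable-fortran
make
make check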
I have managed to narrow it down: the error is generated when the MPIPOSIX driver is used (I think it happens in the call to H5Pset_fapl_mpiposix() itself). The "nocolcause" test is run by the function "no_collective_cause_tests" in "testpar/t_dset.c". When I comment out the two lines in that function that set the "TEST_SET_MPIPOSIX" flag, thereby skipping the MPIPOSIX case, everything else passes.
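To illustrate, the failing path boils down to something like the minimal sketch below (not the actual test code; I am assuming the 1.8.x three-argument form of H5Pset_fapl_mpiposix with the use_gpfs flag):

#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Create a file-access property list and select the MPIPOSIX driver;
       as far as I can tell, the failure originates around this call. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpiposix(fapl, MPI_COMM_WORLD, 0 /* use_gpfs */);

    /* Create a file through that driver, then clean up. */
    hid_t fid = H5Fcreate("mpiposix_test.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
    if (fid >= 0)
        H5Fclose(fid);
    H5Pclose(fapl);

    MPI_Finalize();
    return 0;
}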
I am very new to HDF5, and to parallel I/O in general, so any help with this issue would be much appreciated.
Thanks,
Boyan
PS: While poking around in it, I found an inconsequential bug in testpar/t_dset.c on line 3681 (in the same function): the line is
MPI_Comm_size(MPI_COMM_WORLD, &mpi_rank);
while it should be
MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);
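For context, the usual prologue in these tests obtains both values, something along these lines (paraphrased, not the exact code from t_dset.c):

int mpi_rank, mpi_size;
MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);  /* rank of this process */
MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);  /* total number of processes */

which is presumably why the slip is harmless in practice.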