I am seeing failures when running testphdf5, one of the hdf5-1.10.0-patch1 parallel tests. The t_mpi test passes with no issues.
Many of the failures have PMPI_File_set_view in the call stack, called from H5FDWrite. I am using gcc-4.7.2 and openmpi-1.6.4 on a RHEL6 system. I am also getting failures on OS X El Capitan with gcc-4.9.4 and openmpi.
On RHEL6, eidsetw2 is one of the failing tests. The backtrace is:
I’m not really asking for anyone to debug this for me, just wondering if anyone else is having issues running the parallel tests with hdf5-1.10.0-patch1.
Thanks,
..Greg
--
"A supercomputer is a device for turning compute-bound problems into I/O-bound problems"
I think the issue is related to using an older openmpi (or maybe just using openmpi). In hdf5-1.8.16, H5Dchunk.c, there is a comment about working around a bug affecting MPI_Type_create_hindexed_block(). The comment says that “should not have a special case for blocks == 0, but ompi (as of 1.8.1) has a bug in file_set_view when a zero size datatype is create with hindexed or hvector.”
This workaround is not in hdf5-1.10.0-patch1. My cases are failing (with openmpi-1.6.4 and openmpi-1.8.1) on processors where blocks == 0, and they fail with MPI_File_set_view in the backtrace. If I pull the workaround from H5Dchunk.c in 1.8.16 into 1.10.0-patch1, the code makes it past this point (but then fails an assert later in the test).
..Greg
Good hunch about ompi. OpenMPI fixed this bug a couple years back.
==rob
On 10/25/2016 06:41 PM, Sjaardema, Gregory D wrote:
> I think the issue is related to using an older openmpi (or maybe just
> using openmpi). In hdf5-1.8.16, H5Dchunk.c, there is a comment about
> working around a bug for MPI_Type_create_hindexed_block(). The comment
> says that “should not have a special case for blocks == 0, but ompi (as
> of 1.8.1) has a bug in file_set_view when a zero size datatype is create
> with hindexed or hvector.”
>
> This fix is not in hdf5-1.10.0-patch1. My cases are failing (with
> openmpi-1.6.4 and openmpi-1.8.1) on processors where blocks == 0 and
> they are failing with MPI_File_set_view in the backtrace. If I pull the
> workaround from 1.8.16 in H5Dchunk.c into 1.10.0-patch1, then the code
> makes it past this point (but then fails an assert at a later point in
> the test).