HDF5 and OpenMPI


#1

Dear HDF5 Community Members,

In the pursuit of furthering our mission of ensuring the accessibility of HDF-stored data as it relates to open-source technologies, The HDF Group has recently been testing HDF5 with various versions of OpenMPI. As a result of the efforts of both The HDF Group and our dedicated users, we have discovered the following issues to be aware of:

With OpenMPI 2.x through 3.1.3, as well as 4.0.0, potential data corruption and crashes have been observed due to bugs within OMPIO. These problems have generally been resolved by switching to the ROMIO I/O backend for the time being by supplying the command-line option "--mca io ^ompio" to mpirun. For more information, refer to https://github.com/openPMD/openPMD-api/issues/446 (thanks to Axel Huebl for the initial report on the HDF Group Forum).

With OpenMPI 1.10 and the 3.0 series using the ROMIO I/O backend, crashes related to datatype flattening have been observed in the "t_filters_parallel" test on various Linux machines. Switching to the OMPIO I/O backend by adding "--mca io ompio" to mpirun has been sufficient to resolve these crashes. However, for OpenMPI 1.10, test failures still occur in "t_filters_parallel" due to a bug in MPI_Mprobe.
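For anyone launching jobs from a script, the two backend switches above could be wrapped roughly as follows. This is only a sketch: the function names, the default process count, and the program name "./my_hdf5_app" are made up for illustration. It also assumes the usual Open MPI convention that an MCA parameter can be set through an environment variable named OMPI_MCA_<param> instead of on the mpirun command line.

```python
import os
import shlex


def io_selector(io_backend):
    """Map a backend name to the value passed to the 'io' MCA parameter."""
    if io_backend == "romio":
        # '^ompio' excludes the OMPIO component, so ROMIO gets selected.
        return "^ompio"
    if io_backend == "ompio":
        return "ompio"
    raise ValueError("io_backend must be 'romio' or 'ompio'")


def mpirun_command(program, nprocs=4, io_backend="romio"):
    """Build an mpirun argument list with the I/O backend chosen explicitly.

    'program' is a list such as ["./my_hdf5_app", "input.h5"] (hypothetical).
    """
    return ["mpirun", "--mca", "io", io_selector(io_backend),
            "-n", str(nprocs), *program]


def mca_environment(io_backend="romio", base=None):
    """Return a copy of the environment with OMPI_MCA_io set.

    Alternative to the command-line switch; relies on the OMPI_MCA_<param>
    environment variable convention.
    """
    env = dict(os.environ if base is None else base)
    env["OMPI_MCA_io"] = io_selector(io_backend)
    return env


if __name__ == "__main__":
    cmd = mpirun_command(["./my_hdf5_app"], nprocs=4, io_backend="romio")
    # shlex.join quotes '^ompio' so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

The argument list can be handed to subprocess.run(cmd) directly, which avoids having to quote the "^" for the shell.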

With OpenMPI 1.10 and 2.0.0 through 2.1.4, test failures have been observed in 'testphdf5' and 't_bigio' due to an MPIO file driver write failure. As of OpenMPI 2.1.5 and 3.1.0, these tests appear to pass without problems.

With OpenMPI 3.0.0 through 3.0.2, test failures have been observed in 'testphdf5' and 't_shapesame' due to an MPIO file driver write failure. As of OpenMPI 3.0.3, these tests appear to pass without problems.
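The four observations above can be hard to keep apart, so here is a rough lookup that encodes only the version ranges stated in this post. The function names are hypothetical, and the sketch assumes plain numeric version strings such as "3.0.2" or "1.10". An empty result means only that this post mentions nothing for that version, not that the version is known to be problem-free.

```python
def parse_version(text):
    """Turn '3.0.2' into (3, 0, 2); missing components count as 0."""
    parts = [int(p) for p in text.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def known_issues(version):
    """List the issues from this post that apply to an OpenMPI version."""
    v = parse_version(version)
    series = v[:2]
    notes = []

    # OMPIO bugs: 2.x through 3.1.3, as well as 4.0.0.
    if (2, 0, 0) <= v <= (3, 1, 3) or v == (4, 0, 0):
        notes.append("OMPIO: potential data corruption and crashes; "
                     "workaround: --mca io ^ompio (use ROMIO)")

    # ROMIO datatype-flattening crashes: 1.10 and the 3.0 series.
    if series in ((1, 10), (3, 0)):
        notes.append("ROMIO: crashes in t_filters_parallel; "
                     "workaround: --mca io ompio (use OMPIO)")

    # MPI_Mprobe bug: 1.10 only, not fixed by switching backends.
    if series == (1, 10):
        notes.append("t_filters_parallel still fails due to an MPI_Mprobe bug")

    # MPIO write failures in testphdf5 and t_bigio: 1.10, 2.0.0 through 2.1.4.
    if series == (1, 10) or (2, 0, 0) <= v <= (2, 1, 4):
        notes.append("MPIO write failure in testphdf5 and t_bigio")

    # MPIO write failures in testphdf5 and t_shapesame: 3.0.0 through 3.0.2.
    if (3, 0, 0) <= v <= (3, 0, 2):
        notes.append("MPIO write failure in testphdf5 and t_shapesame")

    return notes


if __name__ == "__main__":
    for ver in ("1.10", "2.1.4", "3.0.2", "3.1.3", "4.0.0"):
        print(ver, known_issues(ver))
```

Note that for the 3.0 series both the OMPIO and the ROMIO notes are returned, and their workarounds point to opposite backends. The lookup reports both rather than picking one.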

Where possible, we recommend that users update to the latest stable version of OpenMPI within a given series and ideally to the latest series available. Specifically (and as one might expect), the previous points show that we have found the best compatibility with HDF5 using OpenMPI 2.1.5 (2.1.6 has not yet been tested), 3.0.3, 3.1.3 and 4.0.0. While the data corruption issues discovered with OpenMPI 3.1.3 and 4.0.0 are serious enough to potentially warrant holding off on such an upgrade, the OpenMPI team has been made aware of the issues and they can be worked around in the meantime by switching to the ROMIO I/O backend.

Please test the HDF5 1.10.5 release candidate that was made available earlier today and report any problems you find.

Thank you!

Elena on behalf of The HDF Group HDF5 developers


#2

Thanks for your comprehensive testing! It is also nice to see that the issues with OpenMPI are being looked at and should be fixed in the future. :)


#3

Thank you for the extensive testing and the heads-up to the community.

Special thanks to René Widera for co-investigating these issues with me and a big thank you to the OpenMPI community for the prompt support fixing the reported issues (ref: ompi/ompi #6285).