PHDF5, testpar/t_mpi fails

I am attempting to install HDF5 (latest stable release, v1.14.4-3) with --enable-parallel, but the testpar/t_mpi test in ‘make check’ fails.
This is with Intel oneAPI v2021.9.0 on an AWS instance. My configure line is:
--with-zlib=$ZLIB --enable-fortran --enable-hl --enable-parallel --disable-shared
where $ZLIB points to my zlib installation.
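For a parallel Autotools build, configure is normally invoked through the MPI compiler wrappers; a sketch along those lines (the mpiicc/mpiifort wrapper names are Intel MPI’s, shown for illustration rather than as my exact command):

   CC=mpiicc FC=mpiifort ./configure --with-zlib=$ZLIB \
       --enable-fortran --enable-hl --enable-parallel --disable-shared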

I currently don’t have the option of updating the Intel oneAPI version.

I would love if someone could help me. Thanks!

Hi, @gregory.thompson !

What’s your OS - windows / mac / linux?

It’s Amazon Linux 2 on AWS.

Would building a more recent MPI version make any difference? I have read that ROMIO in an older version of Intel’s MPI may be to blame. But if I install a newer MPI in a local directory (I’m unable to override system-level installs), does it matter?

Probably.

This is one of our daily test results that is the closest to your setting:

Test: MPI_TEST_t_mpi (Passed)
Build: 1.14.6-openmpi/4.1.2–intel-classic/2021.10.0-magic-Linux-4.18.0-553.27.1.1toss.t4.x86_64-x86_64 (corona211) on 2025-02-05 15:14:12

The Build line means:

HDF5: 1.14.6 (coming soon)
OpenMPI: 4.1.2
oneAPI: 2021.10.0

Thanks for that info. Please excuse my possible blunders, but I am a scientist trying to build an application that can greatly benefit from HDF5’s parallel I/O. My knowledge of “system administration” or software stack installation is seriously low.

For clarification, if I build MPICH v4.3.0 from scratch using the Intel icc/ifort compilers, then put the resulting include and lib folders in CPPFLAGS and LDFLAGS when building HDF5, does that effectively replace the pre-installed oneAPI MPI?
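Concretely, something along these lines is what I have in mind (MPICH_DIR is just a placeholder for my local install directory):

   MPICH_DIR=$HOME/local/mpich-4.3.0
   CPPFLAGS="-I$MPICH_DIR/include" LDFLAGS="-L$MPICH_DIR/lib" \
       ./configure --with-zlib=$ZLIB --enable-fortran --enable-hl \
       --enable-parallel --disable-shared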

Hi, @gregory.thompson !

Gemini (Google AI) says, yes:

Yes, you can replace Intel’s default MPI with MPICH built from scratch using the oneAPI compiler. Here’s a breakdown of how to do it and key considerations:

Why Replace Intel MPI?

  • Flexibility and Customization: Building MPICH from scratch allows you to fine-tune it for your specific hardware and workload. You can enable or disable features, optimize for particular interconnects, and potentially achieve better performance.
  • Specific Features: MPICH might offer features or versions that are not yet available or prioritized in Intel’s MPI implementation.
  • Open Source: MPICH is open-source, which can be a preference for some users.

Steps to Replace Intel MPI with MPICH

  1. Install oneAPI Base Toolkit: Ensure you have the Intel oneAPI Base Toolkit installed, as it provides the necessary compilers (icx, icpx) and libraries.
  2. Download MPICH Source Code: Download the latest MPICH source code from the official website or a trusted mirror.
  3. Configure MPICH: This is a crucial step. You’ll need to configure MPICH to use the oneAPI compilers. Here’s a general example:

     ./configure --prefix=/path/to/your/mpich/installation \
         CC=icx CXX=icpx FC=ifort \
         --with-device=ch4:ofi
  • Replace /path/to/your/mpich/installation with your desired installation directory.
  • CC=icx, CXX=icpx, and FC=ifort tell MPICH to use the oneAPI C, C++, and Fortran compilers, respectively.
  • --with-device=ch4:ofi enables support for the OpenFabrics Interface (OFI), which is commonly used for high-performance interconnects. Adjust this based on your network.
  4. Build and Install MPICH:

     make
     make install
  5. Set Environment Variables: After installation, set the necessary environment variables to point to your MPICH installation:

     export PATH=/path/to/your/mpich/installation/bin:$PATH
     export LD_LIBRARY_PATH=/path/to/your/mpich/installation/lib:$LD_LIBRARY_PATH
  6. Test Your Installation: Compile and run a simple MPI program to verify that MPICH is working correctly.

Important Considerations

  • Compatibility: Ensure that the MPICH version you build is compatible with the other libraries and tools you are using.
  • Performance: While MPICH can be highly optimized, achieving the best performance might require careful configuration and tuning.
  • Support: If you encounter issues, you’ll need to rely on the MPICH community for support, as Intel might not provide direct support for MPICH.
  • Intel MPI Features: Be aware that some features specific to Intel MPI might not be available in MPICH.

Additional Tips

  • Compiler Flags: You might need to add specific compiler flags during the configuration step to optimize for your target architecture.
  • Documentation: Refer to the official MPICH documentation for detailed instructions and advanced configuration options.
  • Community Support: The MPICH community is a valuable resource for troubleshooting and getting help with your installation.

By following these steps and considering the important points, you can successfully replace Intel’s default MPI with MPICH built from scratch using the oneAPI compiler.
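To go with step 6 above, a few quick sanity checks might look like this (a minimal sketch, assuming the new MPICH bin directory is first on PATH):

   which mpicc mpiexec      # should resolve into the new MPICH installation
   mpichversion             # prints the version of the MPICH build
   mpiexec -n 4 hostname    # quick check that process launching works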

What is the actual error that the test returns? This most likely indicates an MPI set-up/configuration issue since this test is, for the most part, checking MPI functionality.

I successfully installed a newer MPICH and then rebuilt HDF5.

Now when I switch to the testpar directory and run ‘make check’, the t_mpi test succeeds! But the next one, “t_bigio”, fails with this (abbreviated) error message:

Single Rank Independent I/O

HDF5-DIAG: Error detected in HDF5 (1.14.4-3) MPI-process 0:
  #000: H5D.c line 1371 in H5Dwrite(): can't synchronously write data
    major: Dataset
    minor: Write failed

Hi, @gregory.thompson !

Congratulations!

As the test name “big” suggests, it creates a big file.
Does your AWS instance have enough disk space (e.g., EBS/EFS)?
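One quick way to check is, for example, to run this from the directory where the tests write their files:

   df -h .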

In addition, this is a well-known issue: t_bigio test failure with 1.14.0 and mpich on Fedora rawhide · Issue #2510 · HDFGroup/hdf5.

Yes, the AWS instance has enough available space. From some further searching, I just tried setting this:

export HDF5_DO_MPI_FILE_SYNC=0

and the test now PASSES! So maybe the rest of the testpar checks will work as well.
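If it helps anyone else, scoping the variable to the test run alone should also work, e.g.:

   HDF5_DO_MPI_FILE_SYNC=0 make check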


One more update, however: the t_cache_image test hangs. I had to Control-C it. Here is the output snippet from when I aborted…

Test log for t_cache_image 
============================
===================================
Parallel metadata cache image tests
        mpi_size     = 6
===================================
Constructing test files: 
   writing t_cache_image_00 ... done.
   writing t_cache_image_01 ... done.
Test file construction complete.
Testing parallel CI load test -- proc0 md write -- R/O                 PASSED
Testing parallel CI load test -- dist md write -- R/O                  PASSED
Testing parallel CI load test -- proc0 md write -- R/W                [mpiexec@ip-10-33-14-103] Sending Ctrl-C to processes as requested

It’s broken:

hdf5/testpar/CMakeTests.cmake at develop · HDFGroup/hdf5

I don’t understand what you are telling me. It’s broken? Then why is it part of the test suite? I see the same line in my CMakeTests.cmake file. Should it be commented out? How do I take action to resolve so that make check inside testpar subdirectory works?

You’re using Autotools, which will be deprecated soon. Please test with CMake.

See also Remove Autotools support by byrnHDF · Pull Request #5241 · HDFGroup/hdf5.
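In case it helps, a minimal sketch of a CMake-based parallel build and test run might look like this (the source directory name is a placeholder, and the option names and MPI_TEST label should be checked against the release you build):

   CC=mpicc FC=mpif90 cmake -S hdf5-1.14.4-3 -B build \
       -DHDF5_ENABLE_PARALLEL=ON \
       -DHDF5_BUILD_FORTRAN=ON
   cmake --build build -j
   cd build && ctest -R MPI_TEST --output-on-failure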

Hi @gregory.thompson,

I don’t understand what you are telling me. It’s broken? Then why is it part of the test suite? I see the same line in my CMakeTests.cmake file. Should it be commented out? How do I take action to resolve so that make check inside testpar subdirectory works?

The test isn’t broken, but for certain combinations of platforms and MPI implementations/versions it may hang. In general, we have no issues with recent MPI versions on the platforms we support, but there are various known issues on platforms we don’t support, or with specific versions of MPICH or OpenMPI, that we either document or work around.

OK, thanks for the replies. I think - and hope - that I’m now able to build the remaining software stack (netCDF) to accomplish the end goal. I appreciate the help that got it to work!