Writing large data using parallel HDF5

soorena.parthian · September 4, 2021, 9:34pm

Hi,

I have a problem writing large data using parallel HDF5.
I watched “Parallel HDF5 quick start and tuning knobs” by Quincey Koziol on YouTube and followed his examples, which are available at https://github.com/HDFGroup/Tutorial/tree/main/Parallel-hands-on-tutorial
The tutorial is very instructive, and I managed to run all of the examples successfully on my machine. However, I run into a problem when I increase the dataset size. For example, in the “h5par_ex1d.c” file from the tutorial (linked above), I increased DIM0 and CHUNK_DIM0 from 230,000 to 300,000,000. Here is the error I get:

$ mpirun -np 4 ./h5par_ex1d
[msi:60674] *** An error occurred in MPI_Type_create_hindexed
[msi:60674] *** reported by process [1942749185,3]
[msi:60674] *** on communicator MPI_COMM_WORLD
[msi:60674] *** MPI_ERR_ARG: invalid argument of some other kind
[msi:60674] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[msi:60674] *** and potentially your MPI job)
[msi:60667] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2198
[msi:60667] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[msi:60667] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

However, the code runs fine with a data size of 200,000,000.

My workstation has 64 GB memory, and nearly 50 GB is free. My disk also has 133 GB free space.
I use Open MPI 4.1.1 and HDF5 1.10.7 on 64-bit Ubuntu 20.04.3 (kernel 5.11.0-27-generic, x86_64 architecture) with an Intel® Core™ i7-7700K CPU @ 4.20GHz.

I would appreciate it if someone could help me solve this problem.
