Parallel file writing crashes when using 4 procs

Hello,
The test code is here:
henry / mpi_test_perf · GitLab.
With a 1D array of 2^32 elements everything works. With 2^33 elements it works with 1 and 2 procs, but it fails with 4 procs. Using gdb as described here (FAQ: Debugging applications in parallel), I got the following messages:

writeH5compressed:46 dims: 8589934592 
[New Thread 0x7ffff5591740 (LWP 1715479)]

Thread 1 "ph5_dataset" received signal SIGSEGV, Segmentation fau
__memmove_avx_unaligned_erms ()
    at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
262     ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.Sr directory.
(gdb) where
#0  __memmove_avx_unaligned_erms ()
    at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
#1  0x00007ffff77b01c5 in opal_convertor_unpack ()
   from /lib/x86_64-linux-gnu/libopen-pal.so.40
#2  0x00007ffff5c365df in mca_pml_ob1_recv_request_progress_frag
   from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_o
#3  0x00007ffff5c08a87 in ?? ()
   from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_s
#4  0x00007ffff5c389a8 in mca_pml_ob1_send_request_schedule_once
   from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_o
#5  0x00007ffff5c31429 in mca_pml_ob1_recv_frag_callback_ack ()
   from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_o
#6  0x00007ffff5c08a87 in ?? ()
   from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_s
#7  0x00007ffff5c32ca8 in mca_pml_ob1_recv_request_ack_send_btl 
   from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_o
#8  0x00007ffff5c3348c in ?? ()
   from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_o
#9  0x00007ffff5c35b50 in mca_pml_ob1_recv_request_progress_rndv
   from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_o
#10 0x00007ffff5c2f84e in ?? ()
   from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_o
#11 0x00007ffff5c2faa0 in ?? ()
   from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_o
#12 0x00007ffff5c08a87 in ?? ()
   from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_s
#13 0x00007ffff5c3b3ae in mca_pml_ob1_send_request_start_rndv ()
   from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_o
#14 0x00007ffff5c2b9e1 in mca_pml_ob1_isend ()
   from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_o
#15 0x00007ffff5bd119e in ?? ()
   from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll
#16 0x00007ffff5bd377b in mca_fcoll_vulcan_file_write_all ()
   from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll
#17 0x00007ffff5bbddc0 in mca_common_ompio_file_write_at_all ()
   from /lib/x86_64-linux-gnu/libmca_common_ompio.so.41
#18 0x00007ffff5c1924b in mca_io_ompio_file_write_at_all ()
   from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_om
#19 0x00007ffff7ed7d80 in PMPI_File_write_at_all ()
   from /lib/x86_64-linux-gnu/libmpi.so.40
#20 0x00005555557e0f0b in H5FD__mpio_write ()
#21 0x0000555555608fe9 in H5FD_write ()
#22 0x0000555555813284 in H5F__accum_write ()
#23 0x00005555556b828b in H5PB_write ()
#24 0x00005555555f950a in H5F_shared_block_write ()
#25 0x00005555557de51c in H5D__mpio_select_write ()
#26 0x00005555557d3676 in H5D__final_collective_io ()
#27 0x00005555557deaf3 in H5D__contig_collective_write ()
#28 0x00005555555c736c in H5D__write ()
#29 0x00005555557a4651 in H5VL__native_dataset_write ()
#30 0x000055555578f617 in H5VL_dataset_write ()
#31 0x00005555555c5fea in H5Dwrite ()
#32 0x0000555555572fff in writeH5compressed (str=0x7fffffffd870 
    data=0x7ff7e7fff010, dimsf=0x7fffffffd858, 
    fichier=0x55555583c260 "SDScomp1d.h5", compressed=false)
    at /home/henry/projets/mpi_test_perf/ph5_file_utils.c:154
#33 0x000055555556db06 in main ()

Can anybody help?

Thanks in advance,
Gérard

Hi @gerard.henry,

can you share information such as the versions of HDF5 and OpenMPI, as well as the command used to run the test program? I just built your example with the latest develop branch of HDF5 and OpenMPI 5.0.5 and then ran the test program as: mpirun -np 4 ./ph5_dataset 33 ph5_dataset_33_4.h5 without a crash. Among different versions there are sometimes issues with HDF5 and sometimes issues with OpenMPI, so, if possible, it would be good to try with more recent versions of each.

I tested on two platforms:
a sequential machine with Ubuntu 20.04.6 LTS, Open MPI 4.0.3, and HDF5 1.12.3
and
a cluster running CentOS 7.9 with OpenMPI 4.0.5 and HDF5 1.12.3

The command to run the test program is exactly what you ran, although I use Slurm on the machines.

On the cluster, it’s very difficult to recompile OpenMPI, and we have no support to help. But I will try the latest HDF5, as you did.

Thanks for your reply,
Gérard

Hi Jordan,

Could you confirm that you built the code without modification?
On Ubuntu 20, with HDF5-1.14.4.3 (the latest release on GitHub), I got the following errors:

In file included from /home/henry/projets/mpi_test_perf/ph5_file_utils.c:11:
/home/henry/projets/mpi_test_perf/ph5_file_utils.h:16:28: error: conflicting types for ‘hsize_t’
   16 | typedef unsigned long long hsize_t;
      |                            ^~~~~~~
In file included from /home/henry/LIBRARY_PARA/LIBRARIES/hdf5-1.14.4.3/include/hdf5.h:21,
                 from /home/henry/projets/mpi_test_perf/ph5_file_utils.c:6:
/home/henry/LIBRARY_PARA/LIBRARIES/hdf5-1.14.4.3/include/H5public.h:297:18: note: previous declaration of ‘hsize_t’ was here
  297 | typedef uint64_t hsize_t;
      |                  ^~~~~~~
/home/henry/projets/mpi_test_perf/ph5_file_utils.c: In function ‘writeH5compressed’:
/home/henry/projets/mpi_test_perf/ph5_file_utils.c:86:37: warning: passing argument 2 of ‘H5Screate_simple’ from incompatible pointer type [-Wincompatible-pointer-types]
   86 |     filespace = H5Screate_simple(1, dimsf, NULL);
      |                                     ^~~~~
      |                                     |
      |                                     const hsize_t * {aka const long long unsigned int *}
In file included from /home/henry/LIBRARY_PARA/LIBRARIES/hdf5-1.14.4.3/include/H5Ppublic.h:29,
                 from /home/henry/LIBRARY_PARA/LIBRARIES/hdf5-1.14.4.3/include/hdf5.h:35,
                 from /home/henry/projets/mpi_test_perf/ph5_file_utils.c:6:
/home/henry/LIBRARY_PARA/LIBRARIES/hdf5-1.14.4.3/include/H5Spublic.h:323:55: note: expected ‘const hsize_t *’ {aka ‘const long unsigned int *’} but argument is of type ‘const hsize_t *’ {aka ‘const long long unsigned int *’}
  323 | H5_DLL hid_t H5Screate_simple(int rank, const hsize_t dims[], const hsize_t maxdims[]);
      |                                         ~~~~~~~~~~~~~~^~~~~~
/home/henry/projets/mpi_test_perf/ph5_file_utils.c:104:44: warning: passing argument 3 of ‘H5Pset_chunk’ from incompatible pointer type [-Wincompatible-pointer-types]
  104 |         status = H5Pset_chunk(plist_id, 1, &chunk_dim);
      |                                            ^~~~~~~~~~
      |                                            |
      |                                            hsize_t * {aka long long unsigned int *}
In file included from /home/henry/LIBRARY_PARA/LIBRARIES/hdf5-1.14.4.3/include/hdf5.h:35,
                 from /home/henry/projets/mpi_test_perf/ph5_file_utils.c:6:
/home/henry/LIBRARY_PARA/LIBRARIES/hdf5-1.14.4.3/include/H5Ppublic.h:6425:69: note: expected ‘const hsize_t *’ {aka ‘const long unsigned int *’} but argument is of type ‘hsize_t *’ {aka ‘long long unsigned int *’}
 6425 | H5_DLL herr_t H5Pset_chunk(hid_t plist_id, int ndims, const hsize_t dim[/*ndims*/]);
      |                                                       ~~~~~~~~~~~~~~^~~~~~~~~~~~~~
/home/henry/projets/mpi_test_perf/ph5_file_utils.c:145:40: warning: passing argument 2 of ‘H5Screate_simple’ from incompatible pointer type [-Wincompatible-pointer-types]
  145 |     hid_t mspace = H5Screate_simple(1, &count, NULL);
      |                                        ^~~~~~
      |                                        |
      |                                        hsize_t * {aka long long unsigned int *}
In file included from /home/henry/LIBRARY_PARA/LIBRARIES/hdf5-1.14.4.3/include/H5Ppublic.h:29,
                 from /home/henry/LIBRARY_PARA/LIBRARIES/hdf5-1.14.4.3/include/hdf5.h:35,
                 from /home/henry/projets/mpi_test_perf/ph5_file_utils.c:6:
/home/henry/LIBRARY_PARA/LIBRARIES/hdf5-1.14.4.3/include/H5Spublic.h:323:55: note: expected ‘const hsize_t *’ {aka ‘const long unsigned int *’} but argument is of type ‘hsize_t *’ {aka ‘long long unsigned int *’}
  323 | H5_DLL hid_t H5Screate_simple(int rank, const hsize_t dims[], const hsize_t maxdims[]);
      |                                         ~~~~~~~~~~~~~~^~~~~~
/home/henry/projets/mpi_test_perf/ph5_file_utils.c:146:49: warning: passing argument 3 of ‘H5Sselect_hyperslab’ from incompatible pointer type [-Wincompatible-pointer-types]
  146 |     H5Sselect_hyperslab(wspace, H5S_SELECT_SET, &offset, NULL, &count, NULL);
      |                                                 ^~~~~~~
      |                                                 |
      |                                                 hsize_t * {aka long long unsigned int *}
In file included from /home/henry/LIBRARY_PARA/LIBRARIES/hdf5-1.14.4.3/include/H5Ppublic.h:29,
                 from /home/henry/LIBRARY_PARA/LIBRARIES/hdf5-1.14.4.3/include/hdf5.h:35,
                 from /home/henry/projets/mpi_test_perf/ph5_file_utils.c:6:
/home/henry/LIBRARY_PARA/LIBRARIES/hdf5-1.14.4.3/include/H5Spublic.h:1213:83: note: expected ‘const hsize_t *’ {aka ‘const long unsigned int *’} but argument is of type ‘hsize_t *’ {aka ‘long long unsigned int *’}
 1213 | L herr_t H5Sselect_hyperslab(hid_t space_id, H5S_seloper_t op, const hsize_t start[],
      |                                                                ~~~~~~~~~~~~~~^~~~~~~

/home/henry/projets/mpi_test_perf/ph5_file_utils.c:146:64: warning: passing argument 5 of ‘H5Sselect_hyperslab’ from incompatible pointer type [-Wincompatible-pointer-types]
  146 |     H5Sselect_hyperslab(wspace, H5S_SELECT_SET, &offset, NULL, &count, NULL);
      |                                                                ^~~~~~
      |                                                                |
      |                                                                hsize_t * {aka long long unsigned int *}
In file included from /home/henry/LIBRARY_PARA/LIBRARIES/hdf5-1.14.4.3/include/H5Ppublic.h:29,
                 from /home/henry/LIBRARY_PARA/LIBRARIES/hdf5-1.14.4.3/include/hdf5.h:35,
                 from /home/henry/projets/mpi_test_perf/ph5_file_utils.c:6:
/home/henry/LIBRARY_PARA/LIBRARIES/hdf5-1.14.4.3/include/H5Spublic.h:1214:73: note: expected ‘const hsize_t *’ {aka ‘const long unsigned int *’} but argument is of type ‘hsize_t *’ {aka ‘long long unsigned int *’}
 1214 |                                   const hsize_t stride[], const hsize_t count[], const hsize_t block[]);
      |                                                           ~~~~~~~~~~~~~~^~~~~~~
make[3]: *** [CMakeFiles/ph5_dataset.dir/build.make:90: CMakeFiles/ph5_dataset.dir/ph5_file_utils.c.o] Error 1

thanks

Indeed, I did have to make just a few modifications for compilation, which could potentially affect results:
changes.patch (1.2 KB). Since the typedefs are already in hdf5.h, I just removed them.
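
For anyone hitting the same compile error, here is a minimal sketch of why the duplicate typedef is rejected. It assumes only what GCC's notes in the log above show: HDF5 1.14.4.3 declares hsize_t as uint64_t (reported there as "long unsigned int"), while the test header declares it as unsigned long long. The two types have the same width but can still be distinct C types, and C only allows a typedef to be repeated with the same type. The macro and function names below are hypothetical helpers for illustration, not part of HDF5 or the test program.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same width on mainstream platforms, yet possibly different C types. */
static_assert(sizeof(uint64_t) == sizeof(unsigned long long),
              "uint64_t and unsigned long long are expected to be 64 bits wide");

/* Hypothetical helper: names the type of an expression.
   Only the two candidates relevant here are recognized. */
#define TYPE_NAME(x) _Generic((x),                    \
    unsigned long: "unsigned long",                   \
    unsigned long long: "unsigned long long",         \
    default: "other")

/* Returns the name of the type that uint64_t aliases on this platform. */
const char *uint64_underlying_type(void)
{
    uint64_t probe = 0;
    return TYPE_NAME(probe);
}

/* Returns 1 if `typedef unsigned long long hsize_t;` would be a legal
   repeat of `typedef uint64_t hsize_t;` on this platform, 0 if it would
   be reported as a conflicting type. */
int redeclaration_is_compatible(void)
{
    return strcmp(uint64_underlying_type(), "unsigned long long") == 0;
}
```

On a platform where uint64_underlying_type() reports "unsigned long", as the log suggests for that Ubuntu machine, redeclaration_is_compatible() returns 0. Relying on the hsize_t that hdf5.h already provides avoids the question entirely.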

OK, thanks. I made the same changes and it works on the sequential machine, but not on the cluster, where it fails with these errors:

[skylake077:109320] *** Process received signal ***
[skylake077:109320] Signal: Segmentation fault (11)
[skylake077:109320] Signal code:  (-6)
[skylake077:109320] Failing at address: 0x6d20001ab08
[skylake077:109320] [ 0] /lib64/libpthread.so.0(+0xf630)[0x2aaaabd3d630]
[skylake077:109320] [ 1] /lib64/libc.so.6(+0x154e1b)[0x2aaaac09ee1b]
[skylake077:109320] [ 2] /trinity/shared/apps/tr17.10/x86_64/openmpi-gcc112-psm2-4.0.5/lib/libopen-pal.so.40(opal_generic_simple_unpack+0x6e7)[0x2aaaac621097]
[skylake077:109320] [ 3] /trinity/shared/apps/tr17.10/x86_64/openmpi-gcc112-psm2-4.0.5/lib/libmpi.so.40(ompi_datatype_sndrcv+0x1df)[0x2aaaab149d0f]
[skylake077:109320] [ 4] /trinity/shared/apps/tr17.10/x86_64/openmpi-gcc112-psm2-4.0.5/lib/openmpi/mca_coll_basic.so(mca_coll_basic_scatterv_intra+0x14b)[0x2aaabf80a99b]
[skylake077:109320] [ 5] /trinity/shared/apps/tr17.10/x86_64/openmpi-gcc112-psm2-4.0.5/lib/libmpi.so.40(PMPI_Scatterv+0x170)[0x2aaaab1709e0]
[skylake077:109320] [ 6] /home/ceyraud/projets/mpi_test_perf/build/ph5_dataset[0x634903]
[skylake077:109320] [ 7] /home/ceyraud/projets/mpi_test_perf/build/ph5_dataset[0x639ada]
[skylake077:109320] [ 8] /home/ceyraud/projets/mpi_test_perf/build/ph5_dataset[0x63b93b]
[skylake077:109320] [ 9] /home/ceyraud/projets/mpi_test_perf/build/ph5_dataset[0x63c7c9]
[skylake077:109320] [10] /home/ceyraud/projets/mpi_test_perf/build/ph5_dataset[0x44f1f0]
[skylake077:109320] [11] /home/ceyraud/projets/mpi_test_perf/build/ph5_dataset[0x6079b2]
[skylake077:109320] [12] /home/ceyraud/projets/mpi_test_perf/build/ph5_dataset[0x5f57d7]
[skylake077:109320] [13] /home/ceyraud/projets/mpi_test_perf/build/ph5_dataset[0x43b9ed]
[skylake077:109320] [14] /home/ceyraud/projets/mpi_test_perf/build/ph5_dataset[0x43eaf6]
[skylake077:109320] [15] /home/ceyraud/projets/mpi_test_perf/build/ph5_dataset[0x4077ca]
[skylake077:109320] [16] /home/ceyraud/projets/mpi_test_perf/build/ph5_dataset[0x40728b]
[skylake077:109320] [17] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2aaaabf6c555]
[skylake077:109320] [18] /home/ceyraud/projets/mpi_test_perf/build/ph5_dataset[0x406ea3]
[skylake077:109320] *** End of error message ***

On your side, have you tested it on a sequential machine or on a cluster?

Thanks for your help

So far I’ve only tested this on my local machine. It’s possible there may be an issue with MPI_Scatterv on that cluster’s version of OpenMPI. The algorithm for dealing with compressed data in parallel switches over to MPI_Scatterv as the number of chunks involved increases, so that’s likely why you only see the issue as you increase the I/O size.
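
One thing that may be worth checking while narrowing this down: in the MPI-3 C binding, MPI_Scatterv takes its send counts and displacements as int arrays, so sizes above INT_MAX cannot be passed directly. The sketch below only does the arithmetic for an even split of the 2^33-element array from the first post. The even split and the helper names are assumptions for illustration. It does not show what HDF5 actually passes to MPI, and nothing in this thread establishes that this limit is the cause of the crash.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: number of elements owned by `rank` when `total`
   elements are divided as evenly as possible among `nprocs` ranks.
   The first (total % nprocs) ranks receive one extra element. */
uint64_t elements_for_rank(uint64_t total, int nprocs, int rank)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    uint64_t base = total / (uint64_t)nprocs;
    uint64_t rem = total % (uint64_t)nprocs;
    return base + ((uint64_t)rank < rem ? 1u : 0u);
}

/* Returns 1 if `n` can be represented by an `int` count argument,
   0 if it is larger than INT_MAX. */
int fits_in_int_count(uint64_t n)
{
    return n <= (uint64_t)INT_MAX;
}
```

For example, elements_for_rank(1ULL << 33, 4, 0) is 2^31 elements, which is one more than INT_MAX on platforms with a 32-bit int. The per-rank count is also above INT_MAX with 1 and 2 ranks, so this arithmetic alone does not explain why only the 4-process run fails.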