Data chunking parallel I/O problem with Lustre 2.10 and HDF5 1.10.x

Hi,
I am facing a problem with data chunking on a Lustre 2.10 (and 2.6) filesystem with HDF5 1.10.1 in parallel mode.
I attached to my mail a simple C program that reproduces the crash immediately; it fails at Line 94, when trying
to create the dataset collectively.
I observe the crash when I simply set the chunk size to be the same as the dataset size. I know that this is one
of the "non-recommended" setups according to your documentation ("Pitfalls"):
https://support.hdfgroup.org/HDF5/doc1.8/Advanced/Chunking/index.html
But leaving the performance penalty aside, it should not cause a complete crash of the program.
Furthermore, testing the same program with the older HDF5 version 1.8.16 does NOT cause any crash on the same
Lustre 2.10 (or 2.6) version. So it seems that something has changed in the data chunking implementation
between the two major HDF5 versions, 1.8.x and 1.10.x.
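
In essence, the failing setup looks like the following (a simplified sketch, not the attached mytest.c itself; the dataset name, element type, and the omitted error handling are assumptions, and the 176000-element size is taken from the log below):

#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Open the file collectively through the MPI-IO driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("mytest.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* A single chunk covering the whole dataset: the "pitfall"
       layout described above (176000 elements, as in the log). */
    hsize_t dims[1] = {176000};
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, dims); /* chunk size == dataset size */

    /* Dataset creation is collective in parallel HDF5; this is
       where the segfault appears with more than one process. */
    hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_DOUBLE, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
    H5Pclose(fapl);
    H5Fclose(file);
    MPI_Finalize();
    return 0;
}

Built with the parallel compiler wrapper (h5pcc) and run with two or more MPI processes.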

Could you please tell me how the data chunk size in the program should be changed when using the new HDF5 version 1.10.x?

Thanks in advance,
Denis Bertini

PS:
Here is the core dump that I observed as soon as I use more than one MPI process:
H5Pcreate access succeed
H5Pcreate access succeed
-I- Chunk size 176000:
-I- Chunk size 176000:
[lxbk0341:39368] *** Process received signal ***
[lxbk0341:39368] Signal: Segmentation fault (11)
[lxbk0341:39368] Signal code: Address not mapped (1)
[lxbk0341:39368] Failing at address: (nil)
[lxbk0341:39368] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f7742122890]
[lxbk0341:39368] [ 1] /lustre/hebe/rz/dbertini/plasma/softw/lib/openmpi/mca_io_romio314.so(ADIOI_Flatten+0x1577)[0x7f772e8ac657]
[lxbk0341:39368] [ 2] /lustre/hebe/rz/dbertini/plasma/softw/lib/openmpi/mca_io_romio314.so(ADIOI_Flatten_datatype+0xe3)[0x7f772e8ad363]
[lxbk0341:39368] [ 3] /lustre/hebe/rz/dbertini/plasma/softw/lib/openmpi/mca_io_romio314.so(ADIO_Set_view+0x1fd)[0x7f772e8a2f5d]
[lxbk0341:39368] [ 4] /lustre/hebe/rz/dbertini/plasma/softw/lib/openmpi/mca_io_romio314.so(mca_io_romio314_dist_MPI_File_set_view+0x2f6)[0x7f772e889e06]
[lxbk0341:39368] [ 5] /lustre/hebe/rz/dbertini/plasma/softw/lib/openmpi/mca_io_romio314.so(mca_io_romio314_file_set_view+0x22)[0x7f772e883802]
[lxbk0341:39368] [ 6] /lustre/hebe/rz/dbertini/plasma/softw/lib/libmpi.so.40(MPI_File_set_view+0xdd)[0x7f77423bfb2d]

mytest.c (2.58 KB)

My apologies for taking so long to respond.

We do have a reported JIRA issue with HDF5 1.10.x and OpenMPI, and your issue is probably related. We do not think this is a Lustre issue. From a technical lead at Intel:

This isn’t a Lustre issue. From the trace, this definitely seems like a bug in openmpi / romio.

The romio in openmpi is pretty outdated, and we had bugs fixed in the mpich repo that haven't carried over to the ompi repo.

Meanwhile, depending on the ompi version he is using, he can try to use --mca io ompio, or report this to the openmpi mailing list.
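
For example (the executable name here is just a placeholder):

mpirun --mca io ompio -np 2 ./mytest

This selects OpenMPI's native OMPIO component for MPI-IO instead of the bundled ROMIO, which sidesteps the ADIOI_Flatten path shown in the backtrace.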

I tried your test program using Cray MPI on Lustre (2.6), with both HDF5 1.10.1 and the develop branch, and I did not have any issues. So if you need to use 1.10.x, then for now you will need to use an alternate MPI implementation.

Scot
