We can work around the crash mentioned above by setting the OpenMPI backend to ompio, and we can avoid the crash described in "Crash when freeing user-provided buffer on filter callback" by using a release build of HDF5. With both workarounds in place, we found that the minimal test also crashes if we set
#define _COMPRESS
#define CHUNK1 256
#define NCHUNK1 8192
This happens on the 1.12 branch with the patch from @jhenderson applied, in both the debug and the release build:
rank=1 writing dataset2
rank=3 writing dataset2
rank=2 writing dataset2
rank=0 writing dataset2
[cdr1042:149555:0:149555] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x1211e95c)
==== backtrace ====
0 0x0000000000033280 killpg() ???:0
1 0x0000000000145c24 __memcpy_avx512_unaligned_erms() ???:0
2 0x000000000006ac5c opal_generic_simple_pack() ???:0
3 0x00000000000040cf ompi_mtl_psm2_isend() ???:0
4 0x00000000001c772b mca_pml_cm_isend() pml_cm.c:0
5 0x000000000011a1ef shuffle_init.isra.1() fcoll_dynamic_gen2_file_write_all.c:0
6 0x000000000011c11b mca_fcoll_dynamic_gen2_file_write_all() ???:0
7 0x00000000000bcd7e mca_common_ompio_file_write_at_all() ???:0
8 0x0000000000159b96 mca_io_ompio_file_write_at_all() ???:0
9 0x00000000000958a8 PMPI_File_write_at_all() ???:0
10 0x000000000073d5b6 H5FD__mpio_write() /scratch/rickn/hdf5/src/H5FDmpio.c:1466
11 0x00000000004f2424 H5FD_write() /scratch/rickn/hdf5/src/H5FDint.c:248
12 0x000000000077eeb6 H5F__accum_write() /scratch/rickn/hdf5/src/H5Faccum.c:826
13 0x00000000005ef5c8 H5PB_write() /scratch/rickn/hdf5/src/H5PB.c:1031
14 0x00000000004d92d0 H5F_block_write() /scratch/rickn/hdf5/src/H5Fio.c:251
15 0x000000000044d0ea H5C__flush_single_entry() /scratch/rickn/hdf5/src/H5C.c:6109
16 0x000000000072d01b H5C__flush_candidates_in_ring() /scratch/rickn/hdf5/src/H5Cmpio.c:1372
17 0x000000000072d989 H5C__flush_candidate_entries() /scratch/rickn/hdf5/src/H5Cmpio.c:1193
18 0x000000000072f603 H5C_apply_candidate_list() /scratch/rickn/hdf5/src/H5Cmpio.c:386
19 0x000000000072ace3 H5AC__propagate_and_apply_candidate_list() /scratch/rickn/hdf5/src/H5ACmpio.c:1276
20 0x000000000072af40 H5AC__rsp__dist_md_write__flush_to_min_clean() /scratch/rickn/hdf5/src/H5ACmpio.c:1835
21 0x000000000072cc0c H5AC__run_sync_point() /scratch/rickn/hdf5/src/H5ACmpio.c:2157
22 0x0000000000422a89 H5AC_unprotect() /scratch/rickn/hdf5/src/H5AC.c:1568
23 0x000000000075006b H5B__insert_helper() /scratch/rickn/hdf5/src/H5B.c:1101
24 0x00000000007507fc H5B__insert_helper() /scratch/rickn/hdf5/src/H5B.c:998
25 0x0000000000750e1f H5B_insert() /scratch/rickn/hdf5/src/H5B.c:596
26 0x0000000000753dde H5D__btree_idx_insert() /scratch/rickn/hdf5/src/H5Dbtree.c:1009
27 0x0000000000735772 H5D__link_chunk_filtered_collective_io() /scratch/rickn/hdf5/src/H5Dmpio.c:1462
28 0x0000000000739abe H5D__chunk_collective_io() /scratch/rickn/hdf5/src/H5Dmpio.c:878
29 0x000000000073a52a H5D__chunk_collective_write() /scratch/rickn/hdf5/src/H5Dmpio.c:960
30 0x00000000004955bd H5D__write() /scratch/rickn/hdf5/src/H5Dio.c:780
31 0x00000000007038e9 H5VL__native_dataset_write() /scratch/rickn/hdf5/src/H5VLnative_dataset.c:206
32 0x00000000006e36f3 H5VL__dataset_write() /scratch/rickn/hdf5/src/H5VLcallback.c:2151
33 0x00000000006ecab6 H5VL_dataset_write() /scratch/rickn/hdf5/src/H5VLcallback.c:2185
34 0x0000000000493da0 H5Dwrite() /scratch/rickn/hdf5/src/H5Dio.c:313
35 0x0000000000404183 main() /scratch/rickn/test_hdf5/test_orig.c:111
36 0x00000000000202e0 __libc_start_main() ???:0
37 0x0000000000403c5a _start() /tmp/nix-build-glibc-2.24.drv-0/glibc-2.24/csu/../sysdeps/x86_64/start.S:120
===================
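
For anyone trying to reproduce this, here is a minimal sketch of one way to select the ompio backend, assuming Open MPI's standard MCA parameter mechanism (an `OMPI_MCA_<name>` environment variable). The launch line in the comment is hypothetical: the binary name is taken from `test_orig.c` in the backtrace and the rank count from the four ranks in the log, and neither is confirmed as the exact command that was used.

```shell
# Select the ompio component of Open MPI's "io" framework for this shell session.
# Equivalent to passing "--mca io ompio" on the mpirun command line.
export OMPI_MCA_io=ompio

# Then launch the test as usual, for example (hypothetical command line):
#   mpirun -np 4 ./test_orig
```

Setting the variable in the environment rather than on the command line keeps the choice in effect for every run started from the same shell.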