I’m using HDF5 1.10.6 on an HPE Cray system, and when running at scale I get H5Dwrite failures with the following signature:
HDF5-DIAG: Error detected in HDF5 (1.10.6) MPI-process 16407:
#000: H5Dio.c line 336 in H5Dwrite(): can’t write data
major: Dataset
minor: Write failed
#001: H5Dio.c line 820 in H5D__write(): can’t write data
major: Dataset
minor: Write failed
#002: H5Dchunk.c line 2398 in H5D__chunk_write(): error looking up chunk address
major: Dataset
minor: Can’t get value
#003: H5Dchunk.c line 2985 in H5D__chunk_lookup(): can’t query chunk address
major: Dataset
minor: Can’t get value
#004: H5Dbtree.c line 1049 in H5D__btree_idx_get_addr(): can’t get chunk info
major: Dataset
minor: Can’t get value
#005: H5B.c line 335 in H5B_find(): unable to load B-tree node
major: B-Tree node
minor: Unable to protect metadata
#006: H5AC.c line 1352 in H5AC_protect(): H5C_protect() failed
major: Object cache
minor: Unable to protect metadata
#007: H5C.c line 2345 in H5C_protect(): can’t load entry
major: Object cache
#008: H5B.c line 335 in H5B_find(): unable to load B-tree node
major: B-Tree node
minor: Unable to protect metadata
#009: H5AC.c line 1352 in H5AC_protect(): H5C_protect() failed
major: Object cache
minor: Unable to protect metadata
#010: H5C.c line 2345 in H5C_protect(): can’t load entry
major: Object cache
minor: Unable to load metadata into cache
#011: H5C.c line 6699 in H5C_load_entry(): Can’t deserialize image
major: Object cache
minor: Unable to load metadata into cache
#012: H5Bcache.c line 181 in H5B__cache_deserialize(): wrong B-tree signature
major: B-Tree node
minor: Bad value
minor: Unable to load metadata into cache
#008: H5C.c line 6594 in H5C_load_entry(): Can’t read image
major: Object cache
minor: Read failed
#009: H5Fio.c line 118 in H5F_block_read(): read through page buffer failed
major: Low-level I/O
minor: Read failed
#010: H5PB.c line 732 in H5PB_read(): read through metadata accumulator failed
major: Page Buffering
minor: Read failed
#011: H5Faccum.c line 260 in H5F__accum_read(): driver read request failed
major: Low-level I/O
minor: Read failed
#012: H5FDint.c line 205 in H5FD_read(): driver read request failed
major: Virtual File Layer
minor: Read failed
#013: H5FDmpio.c line 1557 in H5FD_mpio_read(): MPI_File_read_at failed
major: Internal error (too specific to document in detail)
minor: Some MPI function failed
#014: H5FDmpio.c line 1557 in H5FD_mpio_read(): Other I/O error , error stack:
ADIOI_CRAY_READCONTIG(258): Other I/O error Input/output error
This is using Cray MPICH 8.1.23.
Any idea what would cause this kind of error?