Crash when writing parallel compressed chunks

@jhenderson I’ve just revisited this one, as we’ve been trying to get our code running with your patch, and found that with slight changes to the chunking parameters it still crashes. I’ve updated the test code I used above very slightly to change the chunk parameters (https://gist.github.com/jrs65/97e36592785d3db2729a8ed20521eaa6).

The various sets of parameters used, and their behaviour in three versions of HDF5 (1.10.5, 1.10.5 with your final patch above, and 1.10.6), are documented in the comments in the gist. The salient points are that the original test case still fails on 1.10.6 (hanging rather than crashing), and that with a slight change to the chunking parameters (a small chunk size but more chunks, which increases the total axis size) all versions, including your patched version, crash.
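
To make the "small chunk size but more chunks" change concrete, here is a minimal sketch of how a test like this might lay out the chunked axis. It is an assumption, not the gist's actual code: the names `compute_layout`, `chunks_per_rank` and `chunk_len` are illustrative, and it assumes each rank writes one contiguous block of whole chunks along the first axis.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout (assumption, not taken from the gist): the first axis
 * holds n_ranks * chunks_per_rank chunks of chunk_len elements each, and
 * rank r writes one contiguous block of whole chunks. */
typedef struct {
    size_t total_len; /* length of the chunked axis for the whole dataset */
    size_t start;     /* first element this rank writes (hyperslab start) */
    size_t count;     /* elements this rank writes (hyperslab count) */
} axis_layout;

/* Compute the dataset's axis length and this rank's selection along it.
 * Raising chunks_per_rank while keeping chunk_len small grows total_len. */
axis_layout compute_layout(size_t n_ranks, size_t rank,
                           size_t chunks_per_rank, size_t chunk_len)
{
    axis_layout l;
    assert(rank < n_ranks); /* every rank must be inside the communicator */
    l.count = chunks_per_rank * chunk_len;
    l.total_len = n_ranks * l.count;
    l.start = rank * l.count; /* aligned to a chunk boundary by construction */
    return l;
}
```

For example, `compute_layout(4, 2, 8, 16)` gives rank 2 a start of 256 and a count of 128 on a 512-element axis. In a real test, `chunk_len` would be the chunk dimension passed to `H5Pset_chunk`, and `start`/`count` would go to `H5Sselect_hyperslab`.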

I get slightly different messages depending on whether or not I enable compression, but they are pretty much the same across all versions. The debug error messages I get without compression are below. MPI processes 2 and 3 print identical traces from H5Dcreate2(), which came out interleaved, so the trace is shown once:

HDF5-DIAG: Error detected in HDF5 (1.10.6) MPI-process 2:
  #000: H5D.c line 151 in H5Dcreate2(): unable to create dataset
    major: Dataset
    minor: Unable to initialize object
  #001: H5Dint.c line 337 in H5D__create_named(): unable to create and link to dataset
    major: Dataset
    minor: Unable to initialize object
  #002: H5L.c line 1592 in H5L_link_object(): unable to create new link to object
    major: Links
    minor: Unable to initialize object
  #003: H5L.c line 1833 in H5L__create_real(): can't insert link
    major: Links
    minor: Unable to insert object
  #004: H5Gtraverse.c line 851 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #005: H5Gtraverse.c line 582 in H5G__traverse_real(): can't look up component
    major: Symbol table
    minor: Object not found
  #006: H5Gobj.c line 1126 in H5G__obj_lookup(): can't check for link info message
    major: Symbol table
    minor: Can't get value
  #007: H5Gobj.c line 327 in H5G__obj_get_linfo(): unable to read object header
    major: Symbol table
    minor: Can't get value
  #008: H5Omessage.c line 883 in H5O_msg_exists(): unable to protect object header
    major: Object header
    minor: Unable to protect metadata
  #009: H5Oint.c line 1066 in H5O_protect(): unable to load object header
    major: Object header
    minor: Unable to protect metadata
  #010: H5AC.c line 1352 in H5AC_protect(): H5C_protect() failed
    major: Object cache
    minor: Unable to protect metadata
  #011: H5C.c line 2298 in H5C_protect(): MPI_Bcast failed
    major: Internal error (too specific to document in detail)
    minor: Some MPI function failed
  #012: H5C.c line 2298 in H5C_protect(): MPI_ERR_TRUNCATE: message truncated
    major: Internal error (too specific to document in detail)
    minor: MPI Error String
rank=2 writing dataset2
rank=3 writing dataset2
HDF5-DIAG: Error detected in HDF5 (1.10.6) MPI-process 3:
  #000: H5Dio.c line 314 in H5Dwrite(): dset_id is not a dataset ID
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.6) MPI-process 3:
  #000: H5D.c line 337 in H5Dclose(): not a dataset
    major: Invalid arguments to routine
    minor: Inappropriate type
rank=3 closing everything
HDF5-DIAG: Error detected in HDF5 (1.10.6) MPI-process 2:
  #000: H5Dio.c line 314 in H5Dwrite(): dset_id is not a dataset ID
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.6) MPI-process 2:
  #000: H5D.c line 337 in H5Dclose(): not a dataset
    major: Invalid arguments to routine
    minor: Inappropriate type
rank=2 closing everything

Any ideas what’s going wrong here?

Thanks!