SWMR flush dependency error


#1

Hi,

I’m developing a process that writes in SWMR mode, but I’m seeing the following error:

HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
  #000: …/…/src/H5Dio.c line 336 in H5Dwrite(): can't write data
    major: Dataset
    minor: Write failed
  #001: …/…/src/H5Dio.c line 798 in H5D__write(): unable to initialize storage
    major: Dataset
    minor: Unable to initialize object
  #002: …/…/src/H5Dint.c line 2262 in H5D__alloc_storage(): unable to initialize chunked storage
    major: Low-level I/O
    minor: Unable to initialize object
  #003: …/…/src/H5Dchunk.c line 2805 in H5D__chunk_create(): can't create chunk index
    major: Dataset
    minor: Unable to initialize object
  #004: …/…/src/H5Dearray.c line 967 in H5D__earray_idx_create(): can't create extensible array
    major: Dataset
    minor: Unable to initialize object
  #005: …/…/src/H5EA.c line 216 in H5EA_create(): can't create extensible array header
    major: Extensible Array
    minor: Unable to initialize object
  #006: …/…/src/H5EAhdr.c line 444 in H5EA__hdr_create(): unable to remove extensible array header from cache
    major: Extensible Array
    minor: Unable to remove object
  #007: …/…/src/H5AC.c line 2648 in H5AC_remove_entry(): can't remove entry
    major: Object cache
    minor: Unable to remove object
  #008: …/…/src/H5C.c line 8721 in H5C_remove_entry(): can't notify client about entry to evict
    major: Object cache
    minor: Unable to notify object about action
  #009: …/…/src/H5EAcache.c line 577 in H5EA__cache_hdr_notify(): unable to destroy flush dependency between header and extensible array 'top' proxy
    major: Extensible Array
    minor: Unable to destroy a flush dependency
  #010: …/…/src/H5ACproxy_entry.c line 391 in H5AC_proxy_entry_remove_child(): unable to remove flush dependency on proxy entry
    major: Object cache
    minor: Unable to destroy a flush dependency
  #011: …/…/src/H5AC.c line 1498 in H5AC_destroy_flush_dependency(): H5C_destroy_flush_dependency() failed
    major: Object cache
    minor: Unable to destroy a flush dependency
  #012: …/…/src/H5C.c line 3707 in H5C_destroy_flush_dependency(): Child entry doesn't have a flush dependency parent array
    major: Object cache
    minor: Unable to destroy a flush dependency
  #013: …/…/src/H5EAhdr.c line 433 in H5EA__hdr_create(): unable to add extensible array entry as child of array proxy
    major: Extensible Array
    minor: Can't set value
  #014: …/…/src/H5ACproxy_entry.c line 306 in H5AC_proxy_entry_add_child(): unable to cache proxy entry
    major: Object cache
    minor: Unable to insert object
  #015: …/…/src/H5AC.c line 851 in H5AC_insert_entry(): H5C_insert_entry() failed
    major: Object cache
    minor: Unable to insert metadata into cache
  #016: …/…/src/H5C.c line 1458 in H5C_insert_entry(): H5C__make_space_in_cache failed
    major: Object cache
    minor: Unable to insert metadata into cache
  #017: …/…/src/H5C.c line 6975 in H5C__make_space_in_cache(): unable to flush entry
    major: Object cache
    minor: Unable to flush data from cache
  #018: …/…/src/H5C.c line 6163 in H5C__flush_single_entry(): can't notify client about entry to evict
    major: Object cache
    minor: Unable to notify object about action
  #019: …/…/src/H5EAcache.c line 577 in H5EA__cache_hdr_notify(): unable to destroy flush dependency between header and extensible array 'top' proxy
    major: Extensible Array
    minor: Unable to destroy a flush dependency
  #020: …/…/src/H5ACproxy_entry.c line 391 in H5AC_proxy_entry_remove_child(): unable to remove flush dependency on proxy entry
    major: Object cache
    minor: Unable to destroy a flush dependency
  #021: …/…/src/H5AC.c line 1498 in H5AC_destroy_flush_dependency(): H5C_destroy_flush_dependency() failed
    major: Object cache
    minor: Unable to destroy a flush dependency
  #022: …/…/src/H5C.c line 3707 in H5C_destroy_flush_dependency(): Child entry doesn't have a flush dependency parent array
    major: Object cache
    minor: Unable to destroy a flush dependency

I have a one-dimensional, unlimited-length dataset.
The process successfully writes several full chunks as it appends them to the dataset.
The write appears to fail as soon as I try to write a partial chunk to the dataset.
From the SWMR documentation I’ve seen, I expected this to be allowed.
I can also confirm that when I remove the H5Fstart_swmr_write(fid) call, the process completes successfully.
This seems to point to a bug in SWMR mode, but some of my assumptions may be wrong.
Any help debugging would be appreciated.
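
For anyone trying to reproduce this, the difference between a "full" and a "partial" chunk append described above can be pinned down with a little bookkeeping. The following is only a sketch in plain Python with no HDF5 calls. The chunk length of 1024, the append sizes, and the function name classify_appends are assumptions for illustration, not values from the failing process.

```python
def classify_appends(chunk_len, append_lens):
    """Label each append to a 1-D chunked dataset as 'full' or 'partial'.

    An append counts as 'full' when it starts and ends on a chunk
    boundary, so every chunk it touches is written completely.
    Any other append leaves at least one chunk partly filled.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")

    labels = []
    offset = 0  # current dataset length, in elements
    for n in append_lens:
        if n <= 0:
            raise ValueError("each append must add at least one element")
        end = offset + n
        aligned = offset % chunk_len == 0 and end % chunk_len == 0
        labels.append((offset, end, "full" if aligned else "partial"))
        offset = end
    return labels


# Hypothetical run: three full chunks of 1024 elements, then 100 more.
for start, end, kind in classify_appends(1024, [1024, 1024, 1024, 100]):
    print(start, end, kind)
```

Logging this next to each write makes it easy to check whether the first failing H5Dwrite() really coincides with the first append that is not chunk-aligned.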

Thanks,
Mike


#2

For anyone else seeing the same issue: the workaround seems to be to commit the datatype (so that it can be shared), reopen it right before calling append, and then close it immediately.


#3

I have an almost identical issue. My writer can write all of the full chunks, but at the end it crashes after writing some of the partial chunks. It always crashes after writing about 5900 partial chunks. Could you please share your solution if you have fixed it?

Thanks,
Rodger