Dear all,
I have been trying to fix the following problem for more than 3 months,
without success. I hope some of you gurus can help me out.
I am using HDF5 to store the results from a plasma turbulence code (basically
6-D and 3-D data, plus a table that stores several scalar values). In a
single-CPU run, HDF5 (and parallel HDF5) works fine, but with a larger number
of CPUs (and a large number of data output steps) I get the following error
message at the end of the simulation, when I try to close the HDF5 file:
********* snip ****
HDF5-DIAG: Error detected in HDF5 (1.8.4-patch1) MPI-process 24:
  #000: H5F.c line 1956 in H5Fclose(): decrementing file ID failed
    major: Object atom
    minor: Unable to close file
  #001: H5F.c line 1756 in H5F_close(): can't close file
    major: File accessability
    minor: Unable to close file
  #002: H5F.c line 1902 in H5F_try_close(): unable to flush cache
    major: Object cache
    minor: Unable to flush data from cache
  #003: H5F.c line 1681 in H5F_flush(): unable to flush metadata cache
    major: Object cache
    minor: Unable to flush data from cache
  #004: H5AC.c line 950 in H5AC_flush(): Can't flush.
    major: Object cache
    minor: Unable to flush data from cache
  #005: H5AC.c line 4695 in H5AC_flush_entries(): Can't propagate clean entries list.
    major: Object cache
    minor: Unable to flush data from cache
  #006: H5AC.c line 4450 in H5AC_propagate_flushed_and_still_clean_entries_list(): Can't receive and/or process clean slist broadcast.
    major: Object cache
    minor: Internal error detected
  #007: H5AC.c line 4595 in H5AC_receive_and_apply_clean_list(): Can't mark entries clean.
    major: Object cache
    minor: Internal error detected
  #008: H5C.c line 5150 in H5C_mark_entries_as_clean(): Listed entry not in cache?!?!?.
    major: Object cache
    minor: Internal error detected
HDF5: infinite loop closing library
D,G,A,S,T,F,F,AC,FD,P,FD,P,FD,P,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD
****** snap ***
I get this error message deterministically if I increase the data output
frequency (or the number of CPUs). Afterwards I cannot open the file anymore,
because HDF5 complains that it is corrupted (no surprise, since it was not
properly closed).
I get the same error on different computers (with different environments,
e.g. compiler, OpenMPI library, distribution).
Any idea how to fix this problem would be highly appreciated.
Thanks for your help & time
Paul