Infinite closing loop with (parallel) HDF-1.8.4-1

Dear all,

I have been trying to fix the following problem for more than three months but still have not succeeded; I hope
some of you gurus can help me out.

I am using HDF5 to store the results from a plasma turbulence code (basically
6-D and 3-D data,
plus a table to store several scalar values). In a single-CPU run, HDF5 (and
parallel HDF5) works fine,
but for larger CPU counts (and a large number of data output steps) I get
the following error message
at the end of the simulation, when I want to close the HDF5 file:

********* snip ****

HDF5-DIAG: Error detected in HDF5 (1.8.4-patch1) MPI-process 24:
  #000: H5F.c line 1956 in H5Fclose(): decrementing file ID failed
    major: Object atom
    minor: Unable to close file
  #001: H5F.c line 1756 in H5F_close(): can't close file
    major: File accessability
    minor: Unable to close file
  #002: H5F.c line 1902 in H5F_try_close(): unable to flush cache
    major: Object cache
    minor: Unable to flush data from cache
  #003: H5F.c line 1681 in H5F_flush(): unable to flush metadata cache
    major: Object cache
    minor: Unable to flush data from cache
  #004: H5AC.c line 950 in H5AC_flush(): Can't flush.
    major: Object cache
    minor: Unable to flush data from cache
  #005: H5AC.c line 4695 in H5AC_flush_entries(): Can't propagate clean entries list.
    major: Object cache
    minor: Unable to flush data from cache
  #006: H5AC.c line 4450 in H5AC_propagate_flushed_and_still_clean_entries_list(): Can't receive and/or process clean slist broadcast.
    major: Object cache
    minor: Internal error detected
  #007: H5AC.c line 4595 in H5AC_receive_and_apply_clean_list(): Can't mark entries clean.
    major: Object cache
    minor: Internal error detected
  #008: H5C.c line 5150 in H5C_mark_entries_as_clean(): Listed entry not in cache?!?!?.
    major: Object cache
    minor: Internal error detected
HDF5: infinite loop closing library

D,G,A,S,T,F,F,AC,FD,P,FD,P,FD,P,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD

****** snap ***

I get this error message deterministically if I increase the data output
frequency (or the CPU count). Afterwards I cannot open
the file anymore, because HDF5 complains it is corrupted (understandably, since it
was not properly closed).
I get the same error on different computers (with different environments,
e.g. compiler, OpenMPI library, distribution).
Any idea how to fix this problem is highly appreciated.

Thanks for your help & time

Paul

Paul,

Any chance you can provide us with the example code that demonstrates the problem? If so, could you please mail it to help@hdfgroup.org? We will enter a bug report and will take a look. It will also help if you can indicate OS, compiler version and MPI I/O version.

Thank you!

Elena


On Apr 20, 2010, at 8:29 AM, Paul Hilscher wrote:

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org