File state after flush and crash

Hi,

I want to save the program state of a parallel program in an HDF5 file.
For performance reasons I do not want to open/close the file each time I write the program state, but flush it instead.

Thus my main loop looks basically like this:
while more_work() {
  do_work()
  update_hdf5_attributes()
  update_hdf5_dataset()
  flush_hdf5()
}
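
Spelled out with the actual HDF5 calls, the loop would look roughly like this (just a sketch; more_work/do_work and the update_* helpers stand for my application code):

/* Sketch of the checkpoint loop; more_work/do_work and the update_*
 * helpers are placeholders for my application code. */
#include <hdf5.h>

extern int  more_work(void);
extern void do_work(void);                       /* application work */
extern void update_hdf5_attributes(hid_t file);
extern void update_hdf5_dataset(hid_t file);

void checkpoint_loop(const char *path)
{
    /* Open the state file once ... */
    hid_t file = H5Fopen(path, H5F_ACC_RDWR, H5P_DEFAULT);

    while (more_work()) {
        do_work();
        update_hdf5_attributes(file);
        update_hdf5_dataset(file);
        /* ... and only flush between iterations instead of closing. */
        H5Fflush(file, H5F_SCOPE_GLOBAL);
    }

    H5Fclose(file);
}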

However, even when the program crashes during do_work(), I end up with a corrupt HDF5 file.

I found a short conversation regarding this but it is already 5 years old:
http://lists.hdfgroup.org/pipermail/hdf-forum_lists.hdfgroup.org/2010-February/002543.html

They mentioned that this might be a problem with the metadata cache.
Is this still true? Is there a way around it?
I also have some other HDF5 files open. Might this be a problem?

Best regards,
Sebastian


--
Sebastian Rettenberger, M.Sc.
Technische Universität München
Department of Informatics
Chair of Scientific Computing
Boltzmannstrasse 3, 85748 Garching, Germany
http://www5.in.tum.de/

Hi Sebastian,

What happens in do_work()? Are you modifying the file in question?
If yes, then corruption can be expected.
If not, then the file should not be corrupted, and if it is, then we have a bug in the library.

If you can send a replicator for this problem, we can investigate further.

Thanks,
Mohamad


Hi,

No, I do not modify the file that gets corrupted in do_work(). I only access other HDF5 files.

I figured out that this problem only occurs when creating large files (> 3 TB) in parallel (> 1500 MPI tasks). With much smaller files, I did not run into it.

I will try to figure out the critical file size and create a replicator, but this might take some time since I have to wait for the compute resources.

I am not sure if this is helpful, but here is the error I get when I try to access the corrupt file with h5debug:

HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 0:
  #000: H5F.c line 1582 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5F.c line 1373 in H5F_open(): unable to read superblock
    major: File accessibilty
    minor: Read failed
  #002: H5Fsuper.c line 351 in H5F_super_read(): unable to load superblock
    major: Object cache
    minor: Unable to protect metadata
  #003: H5AC.c line 1329 in H5AC_protect(): H5C_protect() failed.
    major: Object cache
    minor: Unable to protect metadata
  #004: H5C.c line 3570 in H5C_protect(): can't load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #005: H5C.c line 7950 in H5C_load_entry(): unable to load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #006: H5Fsuper_cache.c line 471 in H5F_sblock_load(): truncated file: eof = 3968572377152, sblock->base_addr = 0, stored_eoa = 3968574947328
    major: File accessibilty
    minor: File has been truncated
cannot open file

Best regards,
Sebastian


Hi,

I think I figured out the problem:

The file size was not the real problem, but rather the slightly different implementation I use for large files.
To get good performance on the parallel file system, I pad the dataset so that every task starts writing its data at a multiple of the file system block size. This introduces "gaps" in the dataset with uninitialized values.
That by itself is not a problem; however, the last task also added a gap at the end of the dataset which is never written. Thus, the file on disk is smaller than expected.
An H5Fclose() seems to fix either the header or the file size, while a simple H5Fflush() does not.
Removing the gap from the last task solves the problem for me.
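
To make the layout concrete, here is roughly how I compute the offsets (a conceptual sketch; task_size and the helper names are placeholders, not my actual code):

#include <hdf5.h>

/* Round n up to the next multiple of block. */
static hsize_t align_up(hsize_t n, hsize_t block)
{
    return (n + block - 1) / block * block;
}

/* Start offset of a given task: every preceding task's contribution is
 * padded to a file system block boundary. */
static hsize_t task_offset(const hsize_t *task_size, int rank, hsize_t block)
{
    hsize_t offset = 0;
    for (int t = 0; t < rank; ++t)
        offset += align_up(task_size[t], block);
    return offset;
}

/* Total size: the last task now contributes its real size instead of a
 * padded one, so the dataset no longer ends in a never-written gap. */
static hsize_t total_size(const hsize_t *task_size, int ntasks, hsize_t block)
{
    return task_offset(task_size, ntasks - 1, block) + task_size[ntasks - 1];
}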

To reproduce this:
- Create a new file with a single dataset.
- Write parts of the dataset (make sure that some values at the end of the dataset are not initialized)
- Flush the file
- Crash the program
- Try to open the h5 file with h5dump or h5debug

I am using the MPI-IO backend, and I set these two properties for the dataset:
H5Pset_layout(h5plist, H5D_CONTIGUOUS);
H5Pset_alloc_time(h5plist, H5D_ALLOC_TIME_EARLY);
Not sure if this is important.
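
Putting the steps above together, a minimal reproducer could look roughly like this (an untested, serial simplification; my real code uses the MPI-IO driver, and names like "crashtest.h5" are placeholders):

#include <hdf5.h>
#include <stdlib.h>

int main(void)
{
    hsize_t dims[1]  = {1024 * 1024};   /* dataset size (example value) */
    hsize_t count[1] = {1024};          /* only the first part gets written */
    hsize_t start[1] = {0};

    hid_t file = H5Fcreate("crashtest.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /* Same dataset creation properties as in my application. */
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_layout(dcpl, H5D_CONTIGUOUS);
    H5Pset_alloc_time(dcpl, H5D_ALLOC_TIME_EARLY);

    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dset  = H5Dcreate2(file, "state", H5T_NATIVE_FLOAT, space,
                             H5P_DEFAULT, dcpl, H5P_DEFAULT);

    /* Write only the beginning of the dataset; the tail stays unwritten. */
    hid_t memspace = H5Screate_simple(1, count, NULL);
    H5Sselect_hyperslab(space, H5S_SELECT_SET, start, NULL, count, NULL);
    float *buf = calloc(count[0], sizeof(float));
    H5Dwrite(dset, H5T_NATIVE_FLOAT, memspace, space, H5P_DEFAULT, buf);

    /* Flush, then "crash" without ever closing the file. */
    H5Fflush(file, H5F_SCOPE_GLOBAL);
    abort();
}

If this matches what I see in the parallel case, trying to open crashtest.h5 afterwards should fail with the same "truncated file" error as in my previous mail.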

Let me know if you still need a full reproducer.

Best regards,
Sebastian


Hi Sebastian,

Thank you for the use case to replicate the problem. I managed to replicate it, and it is indeed a bug in the library: a while ago, the truncation of the file to its allocated EOA was moved from the flush call to the close call. I have entered Jira bug HDFFV-9418 for this.

Thanks,
Mohamad
