Preventing file corruption from power loss

Hi Ethan,

   Just a small correction to Quincey's comment on what journaling will
do for you -- see comment below.

                                         Best regards,

                                         John Mainzer

On Jun 8, 2010, at 10:27 PM, Quincey Koziol <koziol@hdfgroup.org> wrote:

Hi Ethan,

Hi,

I am using HDF5 very happily except for occasional issues with file
corruption. I would like to be as robust as possible to power loss at
arbitrary times. I don't mind losing the last several seconds or even
minutes of data, but I don't want to corrupt the file in some way that
means I lose access to older data I have already written out. After
working on the issue for a bit I have several ideas, and would like some
feedback from the community on which to pursue.

1. Maybe this will just go away with the metadata journaling feature
in 1.10? Or if it is not completely gone, I can at least run a tool to
repair the metadata when the file is not properly closed. Does anyone
have any experience with the current state of this feature? Is there
anything outside of the metadata that won't be handled by this
journaling?

Yes, metadata journaling should address these file corruption
issues, at least returning the file to the state of the last API
operation before the application aborted.

   Actually, journaling will return the HDF5 file metadata to the state
indicated by the last API operation all of whose associated metadata
journal file entries have hit disk at the time of the crash.

   Depending on how you configure the journal entry buffers, and what
sort of operations you are doing, this could be anywhere from thousands
of API operations to just a few. As always, there is a trade-off -- the
more up to date you require the journal file to be, the more journaling
will cost you in terms of execution time.

   That said, we have done some work using AIO for journal writes, which
seems to reduce the journaling overhead quite nicely. Syncing the file
at important points in your run may also help, and with journaling there
shouldn't be any metadata corruption if you crash during the sync.
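
   Absent journaling, "syncing the file" from the application side
typically means an explicit flush. A minimal sketch, assuming file_id
is an open HDF5 file identifier:

/* Flush HDF5's cached metadata and raw data for this file out to the
 * file. This hands the data to the operating system; it does not by
 * itself guarantee that the OS has committed it to physical storage. */
H5Fflush(file_id, H5F_SCOPE_GLOBAL);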

It will not help with updates to raw data (i.e. H5Dwrite) that haven't
hit disk yet, though.

   Very true, and a point to be considered carefully. We have discussed
journaling raw data writes as well, but so far it is just talk.

···

On Jun 7, 2010, at 8:42 PM, Ethan Dreyfuss wrote:

2. Maybe the behavior of H5FD_STDIO would be better than H5FD_SEC2.
The corrupt files return "Invalid file size or file size less than
superblock eoa. Validation stopped." when h5check is run on them. I
found a reference to what I think is this particular issue here:
http://www.hdfgroup.org/HDF5/doc/TechNotes/VFL.html#SEC21
Alternatively maybe I can just repair my files by writing a new EOF
marker or changing the EOA marker. But then again, this may be the first
problem h5check finds but not the only problem with the file.

3. Use H5FD_SPLIT, and make periodic backups of the metadata portion
of the file. I started experimenting with this option but I got some
odd results. Before I spend too much more time on this I'd like to know
that this actually does make sense given what gets stored in which file.
My datasets are only expanding, so I'm hoping that an older metadata
file would still provide correct information for accessing objects in a
data file that has some additional data (possibly partially) written to
it. What I saw, though, was that while I could still open the file, some
datasets seemed to be missing. Does the layout of the data portion
change over time if I never delete data? I do overwrite data, so maybe
chunks get shuffled around whenever they are actually stored. Also, is
there a way to use h5repack or a similar utility to put split files back
into a single file that can be opened with the SEC2 VFD?

Using another file driver probably won't help, since the state of the
metadata structures on disk could still be inconsistent.

Other suggestions are of course welcome. Thanks for all the great work
on HDF5.

If you have memory to spare, you could "cork the cache" until you reach
a suitable point to update the metadata in the file, call H5Fflush(),
then continue with your application. Here's a code snippet to cork the
cache:

H5AC_cache_config_t mdc_config;
hid_t fapl;

fapl = H5Pcreate(H5P_FILE_ACCESS);

/* Start from the current metadata cache configuration */
mdc_config.version = H5AC__CURR_CACHE_CONFIG_VERSION;
H5Pget_mdc_config(fapl, &mdc_config);

/* Disable evictions and automatic cache resizing ("cork the cache") */
mdc_config.evictions_enabled = FALSE;
mdc_config.incr_mode = H5C_incr__off;
mdc_config.decr_mode = H5C_decr__off;

H5Pset_mdc_config(fapl, &mdc_config);

<other calls to modify the fapl>

<H5Fopen or H5Fcreate with this fapl>

But it is possible that the application could fail in the middle of
flushing the cache to the file, so this may not help in every case.
Generally speaking, journaling will solve the problem entirely, but
it's not quite here yet.
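
For illustration, a sketch of how the corked fapl above might be used;
have_more_data(), write_next_records(), and reached_checkpoint() are
placeholders for application code, not HDF5 calls, and "data.h5" is
just an example name:

hid_t file = H5Fcreate("data.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

while (have_more_data()) {
    write_next_records(file);       /* placeholder for H5Dwrite() calls */

    if (reached_checkpoint()) {
        /* With evictions disabled, dirty metadata accumulates in memory
         * until this call writes it (and any cached raw data) out to
         * the file. */
        H5Fflush(file, H5F_SCOPE_GLOBAL);
    }
}

H5Fclose(file);
H5Pclose(fapl);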

Quincey


Quincey and John,

Thank you for the helpful responses. What will happen if you end up with
consistent metadata (as will be the case post-journaling), but raw data
which is not consistent with this metadata? It would seem like this problem
could be made worse by the raw data caching layer, since my understanding is
that raw data doesn't get written to disk until it is evicted from the raw
data cache (or there is a flush / the file is closed). Is this true?

When does the metadata relating to a given chunk of raw data get written out
relative to the raw data itself? In what size pieces are raw data written
to disk? For concreteness, what I have is a chunked dataset compressed with
the scale-offset filter, and I'm using the SEC2 VFD.

Thanks,
Ethan


Hi Ethan,

Quincey and John,

Thank you for the helpful responses. What will happen if you end up with consistent metadata (as will be the case post-journaling), but raw data which is not consistent with this metadata? It would seem like this problem could be made worse by the raw data caching layer, since my understanding is that raw data doesn't get written to disk until it is evicted from the raw data cache (or there is a flush / the file is closed). Is this true?

  Yes, this could happen. We don't have a super option on the table for addressing inconsistencies in the raw data, at the moment. It doesn't seem like a smart idea to journal gigabytes of raw data being written to the file... :-/

When does the metadata relating to a given chunk of raw data get written out relative to the raw data itself?

  Given the parameters John mentions above, the metadata should be journaled before the raw data is written to the file. If the metadata describing the raw data is still in the journal buffer in memory (and not in the journal file on disk), it is possible for the raw data to be on disk but unreachable from the metadata. That should be harmless and invisible to an application reading the dataset.

In what size pieces are raw data written to disk? For concreteness, what I have is a chunked dataset compressed with the scale-offset filter, and I'm using the SEC2 VFD.

  Chunked datasets are written to disk atomically in entire chunks.

    Quincey
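
For concreteness, a sketch of the kind of dataset described above --
chunked, scale-offset filtered, and accessed through the default (SEC2)
driver. The file name, dataset name, element type, and chunk size are
illustrative only:

hsize_t dims[1]       = {0};               /* start empty ...            */
hsize_t maxdims[1]    = {H5S_UNLIMITED};   /* ... and grow as data comes */
hsize_t chunk_dims[1] = {1024};            /* one chunk = 1024 elements  */

hid_t space = H5Screate_simple(1, dims, maxdims);
hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_chunk(dcpl, 1, chunk_dims);
H5Pset_scaleoffset(dcpl, H5Z_SO_INT, H5Z_SO_INT_MINBITS_DEFAULT);

hid_t file = H5Fcreate("data.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
hid_t dset = H5Dcreate2(file, "/samples", H5T_NATIVE_INT, space,
                        H5P_DEFAULT, dcpl, H5P_DEFAULT);

/* Whenever a dirty chunk is evicted from the chunk cache, or the file is
 * flushed or closed, the entire filtered chunk is written out in one
 * piece -- this is what "atomically in entire chunks" refers to above. */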


Corking the cache has gotten me a good chunk of the way there.

Is there any way to do the equivalent of "corking the cache" but for the raw
data cache? Can you give me any insights into what I would expect to see
when I read back a file where the metadata is consistent but the raw data is
not (as will be the case when some of the raw data cache has been written
out but the metadata cache is corked)?

Thanks,
Ethan


Hi Ethan,

Corking the cache has gotten me a good chunk of the way there.

  Good. :-)

Is there any way to do the equivalent of "corking the cache" but for the raw data cache?

  Hmm, not currently, no.

Can you give me any insights into what I would expect to see when I read back a file where the metadata is consistent but the raw data is not (as will be the case when some of the raw data cache has been written out but the metadata cache is corked)?

  Hmm, you will appear to have "data from the future" if you update a dataset that existed before the metadata cache was corked. We've talked about journaling all the raw data changes also, but that can be very expensive when the raw data is large, so we haven't gone that route yet.

  Quincey
