Recovering from power loss and timeline for journaling

Hi,

I'm working on a project where we are recording data directly from our
sensor platform to a set of HDF5 files. We recently ran into an issue where
the system lost power and now I cannot open the HDF5 files. The data
appears to be there, but I'm guessing some of the metadata is corrupt/out
of date. We are currently using v1.8.x of the HDF5 library, mostly with the
C++ API, but with some C API calls as well.

A few questions:
1) What can I do to recover this data? I've started looking at the HDF5
file format, but any pointers would be appreciated.

2) Is there anything I can do in the future to avoid file corruption from
power loss/crashing processes? We already periodically call the flush()
method.

3) I saw that metadata journaling is on the roadmap at v1.10, but I also
saw mentions that that release was targeted for 2009 - is there an update
on when this will be available?
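
To make question 2 concrete: a periodic flush only helps if the bytes actually reach the storage device. The following is a minimal sketch of that distinction using a plain binary file, not HDF5 code. The record layout, the function names and the `sync_every` parameter are all hypothetical, and whether the HDF5 flush call also forces the operating system to write through is not something this sketch establishes.

```python
import os
import struct
import tempfile

# Hypothetical record layout: float64 timestamp + uint32 reading (12 bytes).
RECORD = struct.Struct("<dI")

def append_records(path, records, sync_every=100):
    """Append fixed-size records, forcing them to disk every `sync_every` records."""
    written = 0
    with open(path, "ab") as f:
        for timestamp, reading in records:
            f.write(RECORD.pack(timestamp, reading))
            written += 1
            if written % sync_every == 0:
                f.flush()             # Python buffer -> OS page cache
                os.fsync(f.fileno())  # OS page cache -> storage device
        f.flush()
        os.fsync(f.fileno())
    return written

def read_records(path):
    """Read back every complete record, ignoring a truncated tail."""
    with open(path, "rb") as f:
        data = f.read()
    usable = len(data) - len(data) % RECORD.size
    return [RECORD.unpack_from(data, off) for off in range(0, usable, RECORD.size)]
```

With this pattern a power loss costs at most the records written since the last sync, and a partially written final record is simply skipped on read. A file with internal metadata, such as HDF5, does not get the second property for free, which is the reason for asking about journaling.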

Many thanks - I'm a great fan of HDF5 and I don't want to go back to
writing flat binary files!

-John

-----------------
John Kua
Research Scientist
Applied Nuclear Physics
Lawrence Berkeley National Laboratory

Hi,

About 2, you could try the development SWMR version (
ftp://hdfgroup.uiuc.edu/pub/outgoing/SWMR/src/), which, in theory, should
not suffer from the problem.
I have no idea about v1.10 but I would also be very much interested.

Cheers,
Filipe

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org

http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Thanks for the pointer. However, I'm having problems building - I'm using
cmake on Ubuntu 12.04 and I'm getting:

[ 8%] Building C object src/CMakeFiles/hdf5.dir/H5Dproxy.c.o
/home/jkua/build/hdf5-1.9.178-swmr_chksum5/src/H5Dproxy.c:73:40: error:
unknown type name ‘H5D_chunk_proxy_t’

among other assorted errors.

Aside from that, is there a tool for rebuilding an HDF5 file from the
journal? I looked around in that distribution a little and didn't see
anything obvious.

Or do you think the SWMR additions prevent corruption due to abnormal
program termination? From what I understand, the SWMR additions make it
possible to have multiple readers by adding a header checksum. If a reader
attempts a read and the checksum fails, it knows it has to retry because
the writer was still accessing the file. You still have a problem if the
writer fails in mid-write.
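
The retry idea described above can be sketched roughly as follows. This is only an illustration of a checksum-validated read under assumed names and an assumed block layout (a length and CRC32 header in front of the payload). It is not the SWMR implementation or the HDF5 file format.

```python
import struct
import zlib

# Hypothetical block layout: uint32 payload length, uint32 CRC32, then the payload.
HEADER = struct.Struct("<II")

def pack_block(payload: bytes) -> bytes:
    """Prefix the payload with its length and checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload

def try_read_block(raw: bytes):
    """Return the payload if the checksum matches, otherwise None."""
    if len(raw) < HEADER.size:
        return None
    length, expected = HEADER.unpack_from(raw)
    payload = raw[HEADER.size:HEADER.size + length]
    if len(payload) != length or zlib.crc32(payload) != expected:
        return None
    return payload

def read_with_retry(fetch, attempts=3):
    """Call fetch() up to `attempts` times until a block validates."""
    for _ in range(attempts):
        payload = try_read_block(fetch())
        if payload is not None:
            return payload
    return None
```

If the writer is merely slow, a later `fetch()` sees the completed block and the read succeeds. If the writer died mid-write, the block stays torn, every retry fails, and `read_with_retry` returns `None`, which is the case the checksum alone does not solve.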

-John

Hi John,

I think you are using CMake, right?
We haven't had a chance yet to port the SWMR branch to build with CMake. Could you please try "configure, make, make check" instead?

As for journaling: yes, there will be a tool for rebuilding an HDF5 file using a journal file. We have it in a branch, and we are in the process of merging the SWMR and journaling features into the HDF5 trunk. At this point we are on schedule to have the features in the trunk by the end of the year. There will be announcements on this mailing list as soon as the new features become available in the snapshots, or if we are slipping our deadline (hopefully NOT! :-).

Elena

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal The HDF Group http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
