Crashproofing HDF5

Interested in crash-proofing HDF5?

As part of our ongoing efforts to improve HDF5, 2024 marks the restart of earlier efforts to crash-proof HDF5. When HDF5 crashes or fails, the result can be data loss or a corrupted file. The HDF Group is starting work on solutions to these problems. We are currently investigating metadata journaling, write-ahead logging (WAL), non-VFD SWMR, VFD SWMR, checkpointing, and other potential solutions.

We would love to hear from you! What might this work mean to you? Would your organization be interested in helping fund this work or collaborating with us on a proposal regarding this work?

If you have thoughts, ideas, concerns, or are simply interested in this work, please let us know below!

Exploring solutions such as metadata journaling, write-ahead logging (WAL), the various forms of SWMR, and checkpointing would mark a significant evolution of HDF5 from a plain file format toward database-like robustness.

I have been pursuing a similar path by integrating the Parallax key-value (KV) store with HDF5 as a VOL connector (GitHub: gesalous/parallax_hdf5_vol_connector, "HDF5 VOL plugin for Parallax key value store"). This approach provides crash recovery by relying on the KV store's built-in resilience features. However, it introduces backward-compatibility challenges.
I am very interested in where HDF5 is heading and in its potential to transform data storage practices. We could discuss possible collaboration if you are interested.

This sounds very good and is much anticipated. We have an interactive application that lets users edit data mapped to HDF5, and in the unlikely but not impossible event that the application crashes, none of those edits should be lost. Currently, we catch the exception signal and close the library from the handler; this prevents the worst cases, and the files usually remain intact. Sometimes we have to run the h5clear utility, which unfortunately is not available as a library call and therefore cannot be built into the application itself. A more "official" solution would be very useful.

Thank you for your response. I have sent you an email to discuss collaboration plans further.

Dear THG colleagues,

We at Lifeboat (www.lifeboat.llc) are very excited to see that you are reviving the effort to address one of HDF5's biggest deficiencies: file corruption. Our company has been actively seeking funding to address this problem and, if funded, to contribute the solution to HDF5.

Our solution is based on the metadata journaling that was implemented as part of the full SWMR feature. (Community members interested in the metadata journaling approach can see the RFC and the full SWMR implementation: GitHub - HDFGroup/hdf5 at feature/vfd_swmr_beta_2.)

Our technical proposal covers the work required to integrate the full SWMR feature into HDF5, to implement a recovery tool, and to make other enhancements to the library so that this approach is performant and works with parallel applications.

We would be happy to discuss possible collaboration with you if there is interest, and there are resources on your side, to pursue the development.

Thank you!

Elena Pourmal
Lifeboat, LLC CEO