Avoiding a corrupted HDF5 file, or being able to recover it


#1

Hi,

I’m using SWMR to write a file, based on swmr_addrem_writer.c from the hdf5 tests.
It is important for me to end up with a non-corrupted hdf5 file, or to be able to recover it, when the writer crashes (segmentation fault) or is killed with “kill -9”, since I run long simulations (e.g. 2 days). I’m willing to lose some data, e.g. the last 10 minutes.

I did an experiment:
1. Wrote some records
2. Flushed the whole hdf5 file with the H5Fflush API
3. Put the application to sleep (for 60 seconds)
4. Killed the application with “kill -9” during that sleep
Unfortunately, the h5stat and h5dump utilities reported the hdf5 file as corrupted. Interestingly, the h5watch utility read the file correctly while the writer was running, before it was killed.

Does anyone have suggestions or recommendations on how to avoid a corrupted hdf5 file, or how to recover it afterward?

I read on the web that the hdf5 developers worked on a journaling feature for hdf5 file recovery several years ago. Is there any update on it? Was it completed? Dropped?

Thanks,
Maoz


#2

Hi Maoz,
If you have a file open for SWMR writing and the writer crashes (or some other failure occurs), you should run the h5recover tool on the file. h5recover will reset the file format flag that indicates that the file is open for writing. (h5dump, h5stat, etc. won’t read from a file that’s open for writing, but h5watch will.)

Quincey

#3

Where can I find the h5recover utility?
I couldn’t find it in hdf5-1.10.4.

Thanks,
Maoz


#4

Hi Maoz,
My apologies, the name is ‘h5clear’, not ‘h5recover’.

Quincey

#5

Thanks. “h5clear -s <*.h5>” did the trick.
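
For anyone who wants to see the flag Quincey mentions without going through the HDF5 library, here is a rough sketch in plain C that reads it straight from the file header. It rests on assumptions that are not from this thread: my reading of the HDF5 file format specification is that, for superblock versions 2 and 3, the byte at offset 11 holds the "file consistency flags", with bit 0 meaning "opened for write access" and bit 2 meaning "opened for SWMR write access". It also assumes the superblock sits at offset 0 (no user block). The function and macro names are made up for illustration; treat this as a diagnostic aid, not a replacement for h5clear.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed bit masks for the "file consistency flags" byte
 * (superblock versions 2 and 3, per my reading of the format spec). */
#define H5_FLAG_WRITE_ACCESS      0x01  /* file was opened for writing      */
#define H5_FLAG_SWMR_WRITE_ACCESS 0x04  /* file was opened for SWMR writing */

/* Hypothetical helper: inspect the first bytes of an HDF5 file.
 * Returns the flags byte (0..255), or -1 if the buffer is too short,
 * the signature does not match, or the superblock version is not 2 or 3. */
int h5_flags_from_header(const unsigned char *buf, size_t len)
{
    static const unsigned char sig[8] =
        {0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'};

    if (len < 12)
        return -1;
    if (memcmp(buf, sig, sizeof sig) != 0)
        return -1;
    if (buf[8] != 2 && buf[8] != 3)   /* other versions use another layout */
        return -1;
    return buf[11];
}

/* Hypothetical helper: same check, reading the header from a file.
 * Assumes the superblock starts at offset 0 (no user block). */
int h5_flags_from_file(const char *path)
{
    unsigned char buf[12];
    FILE *fp = fopen(path, "rb");
    size_t n;

    if (fp == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf, fp);
    fclose(fp);
    return h5_flags_from_header(buf, n);
}
```

If the assumptions hold, calling h5_flags_from_file() on a file left behind by a killed SWMR writer should return a value with those bits still set, and the same call after “h5clear -s” should return 0. I have not verified this against the experiment described above.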