Data corruption detection and/or correction


#1

So, the HDF5 library can checksum any data passing between the client and the file via H5Dwrite/H5Dread, correct? This can be used to detect data corruption. Just curious, but has there ever been any discussion of extending this functionality to attribute data, HDF5 library metadata, etc.?

As I understand it, checksumming currently works like this: data is checksummed on write, and the checksum is stored with each chunk of a dataset. Detection of possible corruption happens only on read, by comparing the checksum of the chunk as read against the one stored during write, correct? If so, it means we can only detect data corruption upon read back.
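To make the write/read flow above concrete, here is a minimal sketch in plain Python. It assumes a textbook Fletcher-32 over little-endian 16-bit words and is not claimed to be bit-compatible with what the HDF5 fletcher32 filter stores. The helper names `write_chunk` and `read_chunk` are hypothetical, not HDF5 API.

```python
def fletcher32(data: bytes) -> int:
    """Textbook Fletcher-32 over little-endian 16-bit words (zero-padded)."""
    if len(data) % 2:
        data += b"\x00"
    sum1 = sum2 = 0
    for i in range(0, len(data), 2):
        word = data[i] | (data[i + 1] << 8)
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1


def write_chunk(payload: bytes) -> bytes:
    """Write side (hypothetical helper): append a 4-byte checksum to the chunk."""
    return payload + fletcher32(payload).to_bytes(4, "little")


def read_chunk(stored: bytes) -> bytes:
    """Read side (hypothetical helper): recompute and compare with the stored value."""
    payload, saved = stored[:-4], int.from_bytes(stored[-4:], "little")
    if fletcher32(payload) != saved:
        raise IOError("checksum mismatch: chunk is corrupted")
    return payload
```

Note that nothing is checked until `read_chunk` runs, which is the point of the question: corruption is only noticed at read-back time, and the checksum alone says nothing about how to repair it.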

Has anyone considered extending this idea to parity and data correction? I mean, storing parity bits so that corruption could be corrected, not just detected, upon read?
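As an illustration of what parity-based correction could look like, here is a small sketch. It is not an HDF5 feature. It assumes equal-sized chunks, one extra XOR parity chunk per group, a CRC-32 per chunk to identify which chunk is bad, and an intact parity chunk. The names `protect` and `repair` are made up for this example.

```python
import zlib
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))


def protect(chunks):
    """Write side: return (per-chunk CRCs, XOR parity chunk)."""
    crcs = [zlib.crc32(c) for c in chunks]
    parity = reduce(xor_bytes, chunks)
    return crcs, parity


def repair(chunks, crcs, parity):
    """Read side: rebuild at most one corrupted chunk from the parity chunk."""
    bad = [i for i, c in enumerate(chunks) if zlib.crc32(c) != crcs[i]]
    if not bad:
        return list(chunks)
    if len(bad) > 1:
        raise IOError("more than one corrupted chunk; single parity cannot recover")
    good = [c for i, c in enumerate(chunks) if i != bad[0]]
    fixed = list(chunks)
    # XOR of the parity with all good chunks yields the missing chunk.
    fixed[bad[0]] = reduce(xor_bytes, good, parity)
    return fixed
```

The checksum locates the damage and the parity supplies the missing bytes, so a single parity chunk can repair one bad chunk per group. Anything beyond that would need a stronger erasure code.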

Also, do any of the HDF5 library tools (maybe h5unjam) have some minimal ability to correct corrupted HDF5 metadata, much like fsck can repair a corrupt file system?


#2

Hi Mark,
If you use the “latest format” option in the 1.10 or later releases, all the metadata (which includes user metadata like attributes) is checksummed automatically. So, errors could be detected, but not corrected. Storing a duplicate of checksummed metadata would be a great way to both detect and (probably) correct errors, but it hasn’t been explored beyond some casual whiteboarding…
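The "duplicate of checksummed metadata" idea could be sketched roughly as follows. This is only an illustration of the whiteboard-level concept, not anything the HDF5 library does. It assumes each metadata block is stored twice, each copy carrying its own CRC-32, and the names `seal`, `is_valid`, and `read_with_fallback` are hypothetical.

```python
import zlib


def seal(block: bytes) -> bytes:
    """Append a 4-byte CRC-32 so the block can be validated later."""
    return block + zlib.crc32(block).to_bytes(4, "little")


def is_valid(sealed: bytes) -> bool:
    """True if the trailing CRC-32 matches the block contents."""
    if len(sealed) < 4:
        return False
    return zlib.crc32(sealed[:-4]) == int.from_bytes(sealed[-4:], "little")


def read_with_fallback(primary: bytes, duplicate: bytes) -> bytes:
    """Return the first copy that passes its checksum, stripped of the CRC."""
    for copy in (primary, duplicate):
        if is_valid(copy):
            return copy[:-4]
    raise IOError("both copies fail their checksum")
```

This is why the correction is only "probable": if both copies are damaged, the checksums still detect the problem but there is nothing left to recover from.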

The ‘h5check’ tool can validate that an HDF5 file is not corrupted, but can’t correct errors it finds. Again, we’ve kicked around ideas for correcting corrupted files, but haven’t gone anywhere serious with it.

	Quincey

#3

Ok, interesting. Thanks. So, in 1.10, metadata is automatically checksummed (and that includes attributes) and this is independent of any, optional, checksumming on application raw data which is still handled via fletcher32 filter (or perhaps some user-defined filter)? Regarding h5check…is that checking only HDF5 metadata? Or, does it optionally check application raw data when such data was written with checksums? And, a final curiosity there…any support to do that check faster by running it in parallel, maybe even up to just single node parallel?


#4

Hi Mark,

miller86 (May 4) wrote:

Ok, interesting. Thanks. So, in 1.10, metadata is automatically checksummed (and that includes attributes) and this is independent of any, optional, checksumming on application raw data which is still handled via fletcher32 filter (or perhaps some user-defined filter)?

Yes. Raw data isn’t checksummed unless the dataset is chunked and a filter (like fletcher32) is applied that does that.
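For anyone who wants to see both mechanisms side by side, here is a short sketch using h5py (assuming h5py and NumPy are installed; the file and dataset names are arbitrary). `libver="latest"` requests the latest file format, and `fletcher32=True` adds the checksum filter to a chunked dataset.

```python
import h5py
import numpy as np

data = np.arange(1000, dtype="i4")

# driver="core" with backing_store=False keeps the file in memory only,
# so this example leaves nothing on disk.
with h5py.File("checksum_demo.h5", "w", libver="latest",
               driver="core", backing_store=False) as f:
    # Raw data: checksummed per chunk because the fletcher32 filter is enabled.
    dset = f.create_dataset("values", data=data, chunks=(100,), fletcher32=True)
    # Attribute: part of the file metadata, no filter involved.
    dset.attrs["units"] = "counts"
    has_checksum = dset.fletcher32
    roundtrip = dset[...]
```

The two settings are independent: dropping `fletcher32=True` leaves the raw data unchecksummed regardless of the file format version.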

Regarding h5check…is that checking only HDF5 metadata? Or, does it optionally check application raw data when such data was written with checksums?

Currently, it’s only checking the metadata in the file.

And, a final curiosity there…any support to do that check faster by running it in parallel, maybe even up to just single node parallel?

h5check is only serial right now. I’m not certain that it would pay off to check metadata in parallel (at least to most use cases), but it would be a nice option for checking checksums on the raw data (if that was first added).
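Purely as a sketch of what single-node parallel checking of raw-data checksums might look like (h5check does not do this today), the following assumes the chunks and their stored CRC-32 values have already been read into memory. `verify_chunks` is a hypothetical helper, and a thread pool is just one possible choice.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def verify_chunks(chunks, stored_crcs, workers=4):
    """Return the indices of chunks whose CRC-32 no longer matches."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with stored_crcs.
        computed = list(pool.map(zlib.crc32, chunks))
    return [i for i, (got, want) in enumerate(zip(computed, stored_crcs))
            if got != want]
```

Each chunk is verified independently of the others, which is what makes the raw-data case a better fit for parallelism than walking the metadata.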

Quincey