Recover corrupted datasets

I have some HDF5 files with corrupted datasets (only one dataset out of thousands is corrupted in each file). I’m not sure how the corruption happened. Here’s the error I get from h5dump:

HDF5 "file.h5" {
DATASET "/group/dataset" {
   DATATYPE  H5T_STD_U64LE
   DATASPACE  SIMPLE { ( 29691 ) / ( 29691 ) }
HDF5-DIAG: Error detected in HDF5 (1.10.7) thread 1:
  #000: ../../../src/H5Dio.c line 198 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: ../../../src/H5Dio.c line 599 in H5D__read(): can't read data
    major: Dataset
    minor: Read failed
  #002: ../../../src/H5Dchunk.c line 2589 in H5D__chunk_read(): unable to read raw data chunk
    major: Low-level I/O
    minor: Read failed
  #003: ../../../src/H5Dchunk.c line 3952 in H5D__chunk_lock(): data pipeline read failed
    major: Dataset
    minor: Filter operation failed
  #004: ../../../src/H5Z.c line 1419 in H5Z_pipeline(): filter returned failure during read
    major: Data filters
    minor: Read failed
  #005: ../../../src/H5Zdeflate.c line 121 in H5Z__filter_deflate(): inflate() failed
    major: Data filters
    minor: Unable to initialize object
   DATA {h5dump error: unable to print data

   }
}
}
H5tools-DIAG: Error detected in HDF5:tools (1.10.7) thread 1:
  #000: ../../../../tools/lib/h5tools_dump.c line 1769 in h5tools_dump_simple_dset(): H5Dread failed
    major: Failure in tools library
    minor: error in function

Is there any way to salvage some data from this dataset? I understand that I probably can’t get all the values back, but if I could get some, that would already be helpful.

For example, could I get the values from all the chunks that are still good and just skip the chunks that are bad?
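To illustrate what I mean, here is a pure-Python toy of the recovery I’m hoping for (plain zlib streams, not real HDF5 chunks; the chunk layout is made up for illustration): inflate each chunk independently and keep whatever decompresses cleanly.

```python
# Toy illustration (not HDF5): each "chunk" is an independent deflate
# stream, so one corrupted chunk should not stop us inflating the others.
import zlib

chunks = [zlib.compress(bytes([i]) * 64) for i in range(5)]
chunks[2] = chunks[2][:8]  # truncate chunk 2 to simulate corruption

recovered = {}
for i, raw in enumerate(chunks):
    try:
        recovered[i] = zlib.decompress(raw)
    except zlib.error:
        print(f"chunk {i} is corrupt, skipping")

print(sorted(recovered))  # prints [0, 1, 3, 4] - chunk 2 lost, rest salvaged
```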

Any help would be highly appreciated!

What do

h5dump -pBH file.h5

or

h5ls -vr file.h5

or

h5stat file.h5

return?

How did you produce the file in the first place?

G.

Thank you for the response. I figured out how to read individual chunks with the C++ API, and when passing them through libdeflate, there is indeed one chunk that gives an error when trying to inflate. So I can now inflate the good chunks.
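For anyone hitting the same issue, here is roughly the same recovery in Python (a sketch, not my actual C++ code: it assumes h5py ≥ 2.10 for `read_direct_chunk`, a 1-D dataset with gzip as the only filter, and uses zlib where I used libdeflate; `salvage_chunks` and the demo file are made up for illustration):

```python
# Sketch: salvage the readable chunks of a gzip-compressed 1-D dataset by
# reading the raw (still-compressed) chunks and inflating them manually,
# skipping any chunk whose deflate stream is corrupt.
import zlib
import numpy as np
import h5py

def salvage_chunks(path, dset_name):
    """Return {chunk_offset: ndarray} for every chunk that inflates cleanly."""
    good = {}
    with h5py.File(path, "r") as f:
        dset = f[dset_name]
        (chunk_len,) = dset.chunks          # 1-D chunk shape
        for start in range(0, dset.shape[0], chunk_len):
            try:
                # Bypass the filter pipeline: fetch the raw compressed bytes.
                # (Edge chunks are stored padded to the full chunk size, so a
                # partial last chunk would need trimming after decompression.)
                _filter_mask, raw = dset.id.read_direct_chunk((start,))
                good[start] = np.frombuffer(zlib.decompress(raw),
                                            dtype=dset.dtype)
            except (OSError, zlib.error):
                print(f"skipping corrupt chunk at offset {start}")
    return good

# Build a small demo file so the sketch is runnable end to end.
with h5py.File("demo.h5", "w") as f:
    f.create_dataset("x", data=np.arange(1000, dtype="u8"),
                     chunks=(100,), compression="gzip")

recovered = salvage_chunks("demo.h5", "x")
print(len(recovered), "chunks recovered")   # no corruption in the demo file
```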

I think I used h5py to produce the file quite a while ago. I’m not sure why some chunks in some datasets didn’t compress properly.

It might be possible to recover the corrupted data by going back to the same versions of h5py and HDF5 that were used to create the files. I know of at least one other case (netcdf-3, not HDF5) where a trick like this was viable.

Also consider asking your question on the h5py mailing list.
