h5 Corruption by external SSD


#1

Hi all,

I have a 60 GB h5 file which I believe was corrupted by an external SSD. When re-copied from the source computer on a different SSD, the same file is openable in Python, HDFView 3.14, and MATLAB, but the corrupt files cannot be opened by any of these. Unfortunately, only some of the data is still saved on the source computer, and I fear I may have lost the rest for good, but want to check if there’s any way I might salvage them (such as resetting a closing flag or whatnot; my own research has been a bit confusing as I’m new to this). I get the following errors:

  • Python h5py: OSError: Unable to open file (file signature not found)

  • HDFView: java.io.IOException: Unsupported fileformat - /Volumes/ShawnP/FixedMaybe/229N4AT11-08_07_48_23.h5

  • MATLAB: Error using h5infoc
    Unable to open the file because of HDF5 Library error. Reason:

    H5Fget_obj_count not a file id

I’m wondering if there would be any way to reset the file, or even clone meta data from a similar file to just force this to open? Or if there’s any low level processing that can be done? The more I’ve learned about h5s in the past week, the less I feel this is an option, but I thought it would be worth checking if y’all had any ideas, as this SSD systematically corrupted each file, and I wonder if there might be a systematic solution to save all of them.

Thanks for your time!


#2

Hi,

I'm sorry to hear that you lost valuable data during the copy.
(It sounds like we need a data protection plan from State Farm or Allstate nowadays. :slight_smile: )

I want to understand your situation better.

A 60 GB HDF5 file: is it one 60 GB file, or multiple files (for example, 60 × 1 GB HDF5 files)?

only some of the data is still saved on the source computer, and I fear I may have lost the rest for good:
If there are multiple files, did you use a move command (or select all the files and drag them to the SSD) to transfer them from the source computer to the SSD?
If it is one HDF5 file, how is some of the data still available and accessible?


#3

Hi!

So actually it's a handful of h5 files, each 60 GB. I moved them all at the same time using copy and paste (I did it overnight due to my project's time constraints, so the paste took about 15 hours, I believe), and then deleted the originals from the computer to make room for acquiring more files (those were also corrupted on the SSD, but I kept the originals and was able to transfer them just fine with a different SSD, thereby recovering them).

Thanks for your concern and questions


#4

Thank you for the details!

Regarding file recovery, please search “corruption” in this forum.
You’ll find that you’re not alone and you may find some answers.

We used to have h5check for older library versions, so that's something you can try if your file was created with HDF5 1.6 or 1.8.
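
Before trying any repair tool, it may help to check what the h5py error quoted earlier ("file signature not found") is actually reporting. The HDF5 library looks for an 8-byte signature at byte offset 0, then at 512, 1024, 2048, and so on, doubling each time. The sketch below repeats that search in plain Python and also checks whether the start of the file is all zero bytes. The function names `find_hdf5_signature` and `head_is_all_zero` are made up for this example, and the all-zero check is only a rough heuristic I am assuming is useful here, not a definitive diagnosis.

```python
import os

# The 8-byte signature that starts every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_hdf5_signature(path, max_offset=None):
    """Return the byte offset of the HDF5 signature, or None if it is absent.

    Checks offset 0, then 512, 1024, 2048, ... (doubling), which are the
    locations where the library looks for the superblock.
    """
    size = os.path.getsize(path)
    limit = size if max_offset is None else min(size, max_offset)
    with open(path, "rb") as f:
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= limit:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return offset
            offset = 512 if offset == 0 else offset * 2
    return None


def head_is_all_zero(path, nbytes=4096):
    """Return True if the first nbytes of the file are all zero bytes.

    Hypothetical helper: an all-zero head suggests the data never reached
    the drive, rather than a flag that only needs resetting.
    """
    with open(path, "rb") as f:
        head = f.read(nbytes)
    return len(head) > 0 and not any(head)
```

If `find_hdf5_signature` returns `None` for a corrupt copy but `0` for a good copy of a similar file, the superblock itself is missing or overwritten. Comparing the first few kilobytes of a good and a bad file in a hex viewer is a reasonable next step.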

Finally, can you disclose the brand and model of your SSD?
I want to avoid the memory chips used in your SSD. :grinning:


#5

Thanks for the heads up, I will try searching for “corruption”. I’ve been searching for a long time though, so my hopes are not too high.

Yes, I’ve noticed that h5check and h5clear were options that people kept bringing up. I was confused for a while because I was only using the h5py wrapper and these aren’t available there. I don’t code in C, so will need to take time to figure out how to use the HDF5 package elsewhere (though if these functions were to be added to h5py I feel it would be quite useful!). Always a learning opportunity :slight_smile:
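
For readers in the same position: `h5clear` is a command-line program that ships with the HDF5 tools (1.10 and later), so it can be called from Python without writing any C. A minimal sketch follows, assuming the tools are installed and on the `PATH`. The wrapper name `run_h5clear_status` is made up for this example. The `-s` option clears the status flags in the superblock, and it edits the file in place, so it should only be run on a copy. It is unlikely to help when the file signature itself is missing, as in the h5py error above.

```python
import shutil
import subprocess


def run_h5clear_status(path, tool="h5clear"):
    """Run `h5clear -s` on path and return the CompletedProcess.

    Returns None if the tool is not found on PATH. The file is modified
    in place, so pass a copy rather than the only surviving file.
    """
    exe = shutil.which(tool)
    if exe is None:
        return None
    # capture_output keeps the tool's messages so they can be inspected.
    return subprocess.run([exe, "-s", path], capture_output=True, text=True)
```

A non-zero `returncode` on the returned object, together with the text in `stderr`, shows why the tool could not process the file.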

The drive is a 5 TB "zzbbkkzz" generic drive. I bought it on Amazon and it is no longer available! Not worth buying a cheap SSD; I’m sticking with name brands from now on. It turns out others have had similar issues with large files.


#6

When you mentioned “zzbbkkzz”, I thought you were hiding the brand but it actually exists on Amazon! :laughing:

No wonder it has a 1.5-star rating. Thank you so much for sharing everything! It’s very useful information for holiday shopping.