Is there a way to recover datasets deleted from HDF5 files with h5py?
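I deleted large datasets from HDF5 files using the following method:
import h5py

with h5py.File('datafile.h5', 'r+') as h5file:
    del h5file['/path/to/dataset']  # hypothetical path: the real dataset path is not given here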
My intention was to reduce the on-disk size of these HDF5 files, but I noticed that the file size did not change. An explanation I found is that the data is not actually deleted; only the “link” to the dataset is removed. The space can be reclaimed by rewriting the HDF5 file with the h5repack command-line tool.
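The size behaviour described above can be reproduced in a few lines. This is a minimal sketch, assuming h5py and NumPy are installed; the file name and the dataset names ("demo.h5", "big", "trailer") are hypothetical. It writes an ~8 MB dataset followed by a smaller one, deletes the large one, and compares the file sizes.

```python
import os
import tempfile

import h5py
import numpy as np

N = 1_000_000  # float64 elements, about 8 MB of raw data


def size_before_and_after_delete(path):
    """Return the file size in bytes before and after deleting a large dataset."""
    with h5py.File(path, "w") as f:
        f.create_dataset("big", data=np.zeros(N))         # contiguous layout by default
        f.create_dataset("trailer", data=np.ones(1000))   # written after "big", so "big" is not the last block
    size_before = os.path.getsize(path)

    # h5py 3.x opens files read-only by default, so 'r+' is needed to delete.
    with h5py.File(path, "r+") as f:
        del f["big"]  # removes the link named "big"; the file is not rewritten
    size_after = os.path.getsize(path)
    return size_before, size_after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        before, after = size_before_and_after_delete(os.path.join(tmp, "demo.h5"))
        print(before, after)  # both sizes still include the ~8 MB of deleted data
```

The trailing dataset is there so the deleted block does not sit at the very end of the file; HDF5 may be able to trim free space located exactly at the end of a file, which would hide the effect.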
Just before repacking the data to remove the “dead” space, I discovered I had mistakenly deleted something valuable. Is there a way to recover the deleted dataset from the HDF5 file, given that it may still be there, just no longer accessible?
It seems that h5py deletes a dataset by removing its link (the HDF5 API function H5Ldelete), rather than by erasing the data itself.
Does anyone with more knowledge of HDF5 know whether this means the data is actually lost in the process?
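In h5py, `del group[name]` removes a name (a link) from a group. The sketch below, assuming a scratch file with hypothetical dataset names ("original", "alias"), creates a second hard link to a dataset before deleting the first one: the data stays readable through the remaining link, which illustrates that deletion operates on links rather than directly on the stored bytes.

```python
import os
import tempfile

import h5py
import numpy as np


def unlink_keeps_other_links(path):
    """Delete one of two hard links to a dataset and read it through the other."""
    with h5py.File(path, "w") as f:
        f["original"] = np.arange(5)
        f["alias"] = f["original"]  # assigning an existing dataset creates a second hard link
        del f["original"]           # removes only the link named "original"
        return "original" in f, f["alias"][()].tolist()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(unlink_keeps_other_links(os.path.join(tmp, "links.h5")))
```

Once the last link to a dataset is removed, the object becomes unreachable through the file's group structure, even if its bytes have not yet been overwritten.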
If it is still there, perhaps it could be accessed by other means, for example by reading it directly at its byte offset in the file. The HDF5 files from which I’d like to recover the dataset deleted at a specific path were all created by the same process: the structure and order of the datasets are identical, although their contents vary between files. Perhaps I could use one of the remaining original files to find the location (some sort of index or offset) of the dataset, and then use it to read the deleted dataset from the modified files?
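The offset idea could be sketched as follows, under strong assumptions: the dataset uses contiguous storage (no chunking or compression, so h5py's `Dataset.id.get_offset()` returns a byte offset rather than None), the intact and modified files have exactly the same layout, and the freed bytes have not been overwritten. `make_demo_file`, the file names and the dataset names are hypothetical stand-ins for the real files.

```python
import os
import tempfile

import h5py
import numpy as np


def make_demo_file(path, scale):
    """Hypothetical stand-in for the real files: same layout, different contents."""
    with h5py.File(path, "w") as f:
        f.create_dataset("first", data=np.zeros(1000))
        f.create_dataset("target", data=scale * np.arange(1000, dtype="f8"))
        f.create_dataset("last", data=np.ones(1000))  # keeps "target" away from the end of the file


def locate_contiguous_dataset(reference_path, name):
    """Return (byte offset, shape, dtype) of `name` in an intact reference file."""
    with h5py.File(reference_path, "r") as f:
        dset = f[name]
        # get_offset() is None for chunked or compact datasets and for unallocated data.
        offset = dset.id.get_offset()
        if offset is None:
            raise ValueError(f"{name!r} has no contiguous byte offset")
        return offset, dset.shape, dset.dtype


def read_raw_bytes(modified_path, offset, shape, dtype):
    """Interpret the bytes at `offset` in another file as an array of `shape` and `dtype`."""
    count = int(np.prod(shape))
    nbytes = count * np.dtype(dtype).itemsize
    with open(modified_path, "rb") as fh:
        fh.seek(offset)
        raw = fh.read(nbytes)
    if len(raw) != nbytes:
        raise ValueError("reached end of file before reading the whole dataset")
    return np.frombuffer(raw, dtype=dtype).reshape(shape).copy()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        reference = os.path.join(tmp, "reference.h5")
        modified = os.path.join(tmp, "modified.h5")
        make_demo_file(reference, 1.0)
        make_demo_file(modified, 2.0)
        with h5py.File(modified, "r+") as f:
            del f["target"]  # simulate the accidental deletion

        offset, shape, dtype = locate_contiguous_dataset(reference, "target")
        recovered = read_raw_bytes(modified, offset, shape, dtype)
        print(recovered[:5])
```

If the real datasets are chunked or compressed, get_offset() returns None and each chunk would have to be located separately, so this shortcut would not apply as written.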
Do you want to access the deleted dataset (which is a bad idea, since the space could be partially or fully reused), or recover the space from it?
I want to access the deleted dataset, as I deleted something I should not have deleted.
Once I’ve recovered and processed that dataset, I would proceed to delete it and repack the file to shrink its size on disk.
Ah, that’s going to be very difficult. There’s no standard way to do that without intimate knowledge of the file format. I might be able to help; can you email me at email@example.com to discuss the details?