Hi Matt,
No one has explained a simple method to repair a file that has been
corrupted by HDFView2.5 with HDF5 1.8.2 or to even detect the
corruption: none of the h5 utilities detected it. It is pretty clear
to me that not much testing happens for file compatibility
across the **supported** API versions. I apologize
for sounding unappreciative of the work done, and I feel there is no
better alternative to HDF5, but is this really considered excellent?

Maybe I missed this in your earlier messages, but was the file actually corrupted by
HDFView? Is it no longer readable? If so, could you provide access to a copy of it, so we
can work with you to address the issue?

Yes. HDFView2.5 corrupts HDF5 data files (whether created with h5py
and v1.8.* or IDL7.0) in such a way that they cannot be read in IDL
at all.
In IDL7.0, h5f_open() on a file that has been opened by HDFViewer fails.
In IDL 6.3, h5f_open() on such a file crashes IDL.
In IDL7.0, "x = h5_browser()" on such a file crashes IDL.The file is altered in "the header" (sorry, I am new to hdf5 so do not
know the layout of the file), between bytes 160 and 192. More details
and small files prior to and after reading with HDFView2.5 to
demonstrate the problem are at
http://cars9.uchicago.edu/pybeamline/DataFormats/H5UsageNotes/HDF5AndIDL
Ah, yes, the problem is that the bug in the HDF5 library [incorrectly] re-wrote the superblock for the HDF5 file (when opened with HDFView, or any other application that opened the file for read-write access) with a later version of the superblock than was previously there. So, the file was not corrupted (although I can certainly see that it appears so to a user), but was inadvertently upgraded to a later version of the file format, which IDL can't currently read. In any case, it's fixed in the latest release of the HDF5 library (1.8.4).
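For anyone who wants to see whether a file has been silently upgraded this way, here is a minimal sketch (pure Python, not an official HDF5 tool) that reads the superblock version byte directly. It assumes the standard 8-byte HDF5 signature; per the file format spec the superblock starts at offset 0, or at a power-of-two offset of 512 or greater when the file has a user block.

```python
# Minimal sketch: report the HDF5 superblock version of a file by
# scanning for the format signature at the allowed offsets and
# reading the byte immediately after it. Illustrative only; for
# real work use the HDF5 library itself.

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

def superblock_version(data: bytes):
    """Return the superblock version number (an int), or None if
    no HDF5 signature is found in the given bytes."""
    offset = 0
    while offset + len(HDF5_SIGNATURE) + 1 <= len(data):
        if data[offset:offset + len(HDF5_SIGNATURE)] == HDF5_SIGNATURE:
            # The byte right after the signature is the superblock
            # version number (0, 1, 2, ...).
            return data[offset + len(HDF5_SIGNATURE)]
        # Signature may only appear at offset 0, 512, 1024, 2048, ...
        offset = 512 if offset == 0 else offset * 2
    return None
```

Running this over the "before" and "after" files on the page above should show the version byte changing, which is exactly the inadvertent upgrade described.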
The version information is contained in the files, that's not a problem.
Where? I thought that yesterday you said that not being able to
detect version numbers of a file was a feature, as multiple
applications may write to a single file. Perhaps I misunderstood you.
I don't see it, and none of the utilities report file versions or
object versions....
Each data structure (and often parts of data structures) in the HDF5 file format has its own version number. Each version number is updated independently of changes to other data structures. So, it's difficult to say that a particular HDF5 file conforms to a particular version of the HDF5 library - parts of it may be readable by very early library versions and parts of it may be readable by only the latest library version. This aligns with what I mentioned earlier about the library choosing the earliest possible version to write out - the granularity is very fine.
What I think you want is a utility that will check the format of each file to verify the version
of the objects within it, which I mentioned is on our development path. Is there something
else you'd like to see?

Rather than a separate utility, I would prefer that, as each
object is opened for reading (by h5*_open), a version number or
unique id tagging the version of that object be read. If the
object ID is not recognized, it would be detectable as a "future
feature" that the library may not be able to use, so that the h5*_open
could fail gracefully. Perhaps this exists already? If different
APIs and formats are expected to be interchangeable, it seems like
you'd need some sort of check like this, no?
Yes, each API routine does this already and will return with an error if the library encounters a format version that it doesn't understand. Was there a particular instance which didn't do this?
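As a sketch, the per-object check described above might look something like this (an illustrative model only, not the HDF5 library's actual code; the structure names and version table are hypothetical):

```python
# Illustrative model of per-object version checking: each on-disk
# structure carries its own version number, and the reader fails
# gracefully when it encounters one newer than it understands,
# rather than misreading the bytes or crashing the caller.

# Hypothetical table: structure name -> highest version this
# reader knows how to decode.
KNOWN_VERSIONS = {
    "superblock": 1,
    "object_header": 1,
}

class UnsupportedVersionError(Exception):
    """Raised for 'future feature' versions this reader can't decode."""

def open_structure(name: str, version_on_disk: int) -> str:
    max_known = KNOWN_VERSIONS.get(name)
    if max_known is None or version_on_disk > max_known:
        # Fail gracefully with a clear error, as the h5*_open
        # routines do when they meet an unknown format version.
        raise UnsupportedVersionError(
            f"{name} version {version_on_disk} is newer than this "
            f"reader supports (max {max_known})")
    return f"{name} v{version_on_disk} opened"
```

The point is only that the version check happens at open time, per structure, so an old reader rejects a newer file cleanly instead of crashing the way IDL did above.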
Quincey
···
On Dec 2, 2009, at 12:48 PM, Matt Newville wrote: