Repairing data written with bad types

Someone else just discovered a mess that I've made in our data, where I was using double (64-bit IEEE floating point) for in-memory storage for certain fields in compound data types/structs, but on disk was using IEEE-NATIVE-FLOAT (i.e. 32-bit).

To be more specific, I had a C++ double that was being mapped to the HDF5 type H5T_NATIVE_FLOAT rather than H5T_NATIVE_DOUBLE. The end result hasn't been pretty - the files are usable only in the code that retained that incorrect mapping.

Since repairing the type mapping to double -> H5T_NATIVE_DOUBLE, the previously written data files have invalid values for those fields. I was hoping HDF5's internal type conversion would resolve the translation correctly, but it hasn't.

I'm assuming at this point that I'm going to have to rewrite these old files with the bad mapping in them. Does anyone have suggestions as to how best to accomplish this task? The original files contain:

1 dataset with the dodgy type mappings
1 dataset consisting of references to the above (essentially the same data but from a different indexing)
several dimension scales for the above two datasets
a slew of datatypes

What would have to change are the datatypes and the one dataset with the dodgy type mappings. Is there a quick and easy way to create a new file with the change in types (I can back-fill the correct values for the broken fields later) or do I just need to rewrite the whole thing from scratch?

It sounds like you specified the wrong in-memory type when writing the
data to disk, e.g. declaring the in-memory buffer as float while it actually
was a double*? Since HDF5 takes a void pointer, it can't check that, of course.

If the disk type was the same as the declared memory type, then
I would expect the data on disk to be just a binary copy of what you
had in memory. In that case, you might be able to simply read it back as
float into memory (into a buffer you then reinterpret as a double array),
and write it out again, now correctly, as double.

If the disk type was not the same as the declared memory type
(H5T_NATIVE_FLOAT), then HDF5 would have applied a type conversion when
writing, and I'm not sure whether that is reversible on reading. It might
be worth a try if re-creating the dataset is too much effort.

          Werner


On Thu, 02 May 2013 10:47:50 -0500, John K <jkml@arlut.utexas.edu> wrote:


_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

--
___________________________________________________________________________
Dr. Werner Benger Visualization Research
Laboratory for Creative Arts and Technology (LCAT)
Center for Computation & Technology at Louisiana State University (CCT/LSU)
211 Johnston Hall, Baton Rouge, Louisiana 70803
Tel.: +1 225 578 4809 Fax.: +1 225 578-5362

Please be aware that a double is twice as long as a float. So if you wanted to store 100 doubles and specified H5T_NATIVE_FLOAT as the memory type, you have only written the bytes of the first 50 doubles to disk. Unless you actually calculated the number of elements from the _size_ of the array in bytes, you will have to recreate your data, because half of it will be missing - and there is no way of fixing missing data.

Cheers,
Nathanael Hübbe


On 05/02/2013 05:47:50 PM, John K wrote:

