Writing data with 1.10.5 and reading with 1.4.4

Looking at the documentation, HDF5 1.6 introduced several new APIs related to allocation time and fill-value handling. For the contiguous layout, could you try setting (via the dataset creation property list) the allocation time to H5D_ALLOC_TIME_EARLY with H5Pset_alloc_time and the fill time to H5D_FILL_TIME_ALLOC with H5Pset_fill_time? Also, could you try compact layout for this small dataset? Thanks, G.
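For anyone who wants to try the same settings from Python, here is a minimal sketch using h5py's low-level API. It assumes h5py and NumPy are installed; the function name write_test_dataset and the file names are made up for illustration, and the original writer in this thread was C# with HDF.PInvoke, not h5py.

```python
import numpy as np
import h5py
from h5py import h5d, h5p, h5s, h5t


def write_test_dataset(path, compact=False):
    """Write a 10-element float32 dataset "test" with early allocation.

    Hypothetical helper: mirrors the suggested dataset creation
    property list (DCPL) settings, nothing more.
    """
    data = np.arange(10, dtype=np.float32)

    # Dataset creation property list with the suggested settings.
    dcpl = h5p.create(h5p.DATASET_CREATE)
    dcpl.set_layout(h5d.COMPACT if compact else h5d.CONTIGUOUS)
    dcpl.set_alloc_time(h5d.ALLOC_TIME_EARLY)
    dcpl.set_fill_time(h5d.FILL_TIME_ALLOC)

    with h5py.File(path, "w") as f:
        space = h5s.create_simple(data.shape)
        dset = h5d.create(f.id, b"test", h5t.IEEE_F32LE, space, dcpl=dcpl)
        dset.write(h5s.ALL, h5s.ALL, data)


if __name__ == "__main__":
    write_test_dataset("contiguous_early.h5")
    write_test_dataset("compact_early.h5", compact=True)
```

Each output file can then be checked with h5dump-1.4.4 in the same way as the other test files in this thread.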

G,
Thanks for the suggestion. The older HLHDF code still does not see the dataset after setting the fill time and alloc time.
-joe

I can reproduce the problem with a local build of 1.4.4. This isolates the problem away from HLHDF and toward a deficiency in the writer (C#, HDF.PInvoke, HDF5 1.10.5).

mac56:~/hdf5/issues/1115.1.4-compat/sample1 64> h5dump-1.4.4 -B TestOLD5.h5
HDF5 "TestOLD5.h5" {
BOOT_BLOCK { boot block not yet implemented }
GROUP "/" {
   DATASET "test" {
      DATATYPE  H5T_IEEE_F32LE  
      DATASPACE  SIMPLE { ( 10 ) / ( 10 ) } 
      DATA {
         0, 1, 2, 3, 4, 5, 6, 7, 8, 9
      } 
   } 
} 
} 
mac56:~/hdf5/issues/1115.1.4-compat/sample1 65> h5dump-1.4.4 -B TestHDF5.h5 
HDF5 "TestHDF5.h5" {
BOOT_BLOCK { boot block not yet implemented }
GROUP "/" {
h5dump error: unknown object "test"
} 
} 

I too have some ignorance in this territory. Can someone please help me turn on the HDF5 stack trace? I have wanted to know how to do that for several years. :wink:

Can you run this snippet of Python and test the resulting file h5py.h5 w/ 1.4.4?

import numpy as np, h5py

with h5py.File("h5py.h5", "w") as f:  # explicit write mode; recent h5py defaults to read-only
    f["test"] = np.arange(10, dtype=np.float32)

For debugging this might help (instead of h5debug):

Gerd, as requested:

mac56:~/hdf5/issues/1115.1.4-compat/python1 78> python prog1.py

mac56:~/hdf5/issues/1115.1.4-compat/python1 79> ll h5py.h5 
-rw-r--r--  1   2088 Nov 18 13:59:16 2019 h5py.h5

mac56:~/hdf5/issues/1115.1.4-compat/python1 80> h5dump-1.4.4 -B h5py.h5
HDF5 "h5py.h5" {
BOOT_BLOCK { boot block not yet implemented }
GROUP "/" {
h5dump error: unknown object "test"
} 
} 

Here is new data. I used our normal h5debug in 1.10.5 to examine the internal details of object headers of dataset “test” in Joseph’s before and after test files. This is right where h5dump-1.4.4 caught an error, so a good place to start. The two object headers were refreshingly similar in many ways. However there were interesting differences as well.

  • The “old” 1.4.4-compatible file has nothing resembling a “fill” message. The new file has a “fill_new” message.
  • Old has an “mtime” message, new has “mtime_new”.
  • The “layout” message is old version 0, vs. new version 3.

The file from Gerd’s python test contains these same new feature versions, thus the expected failure.

Each of these three items appears to be a versioned feature in the 1.10 RFC document (above) for format compatibility, and 1.4 does not support the latest versions. So the question is no longer “what is wrong with the new file”, but rather “how to tell a modern library such as HDF.PInvoke or h5py to select the earliest possible format”. Or maybe, “what did not work in the thing that we already tried”.

A good test would be a simple program in pure C that tries to use H5Pset_libver_bounds in the desired way. This would distinguish behavior in higher-level libraries from a possible bug in the 1.10.5 core. If the core works correctly, then presumably the higher-level library is in current maintenance and can be fixed.

I am out of time for today, so maybe someone else could try that.

In the end, HDF.PInvoke and h5py just call the C library. They can expose only what’s there in C, and I don’t see a way to coerce the C library into using “the earliest possible format.” Apart from engineering challenges, I think the main reason why the double H5F_LIBVER_EARLIEST combo is not supported in H5Pset_libver_bounds is performance. As simple as it sounds, “the earliest possible format” is a volatile thing that can change with every object creation. It’s a bit like “today’s weather in Champaign, IL”. To ask at every object creation, “What’s today’s earliest version?”, would be a drag on performance. I think it might be interesting to explore the cost of creating a standalone tool with the express purpose of downgrading a file snapshot to its “current earliest version.”
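For reference, h5py passes the version bounds through to the C library via the libver argument of h5py.File. The sketch below assumes h5py and NumPy are installed; write_with_bounds and the file name are hypothetical. It only shows how the low bound is requested, and does not imply the resulting file is readable by 1.4.4.

```python
import numpy as np
import h5py


def write_with_bounds(path, libver):
    """Create a file with the given libver bounds and one small dataset.

    Hypothetical helper; `libver` is forwarded unchanged to h5py.File.
    """
    with h5py.File(path, "w", libver=libver) as f:
        f["test"] = np.arange(10, dtype=np.float32)
        # Report the bounds the library actually accepted.
        return f.libver


if __name__ == "__main__":
    # "earliest" requests the lowest format versions the library will write.
    print(write_with_bounds("libver_earliest.h5", "earliest"))
```

The returned tuple is the (low, high) pair as h5py reports it. The high bound is printed rather than assumed, because its name depends on the installed h5py and HDF5 versions.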

Forensics found some hard information. The bottom line is, library versions 1.8 and 1.10 cannot make files that can be read by 1.4 and earlier, regardless of compatibility settings. There is an exception. If you write scalars only, then earlier libraries can apparently read them. This was Joseph’s lead-in, and it got me hooked. My bad!

A section of the old FAQ says this: With the release of HDF5-1.8, the decision was made to only write version 3 layout messages. The consequence is that HDF5-1.6.2 and previous releases cannot read files created with HDF5-1.8.

Unfortunately this is not well documented elsewhere, such as the other format change documents, the library compatibility function, or release notes.

Dave, G, you two are quite the detectives.
Thanks so much for your determination on this topic.
Dave’s last message confirms my fears.
Thankfully we have a back-up plan. It is not pretty: it will involve another file format to bridge the gap between the two versions of our software that write the two different HDF5 files.

I must say my experience on this forum was amazing; I didn’t think anyone would care about my problem.
Thanks!
-joe

Hi Joe,
the hack that I proposed at the beginning may work:

  1. Check out HDF5 v1.4.4, compile it as a static library, and link it against the application that saves the data. Use RPC for data exchange; extra points for a shared-memory approach.
  2. Check out the latest HDF5, compile it as a static library, and link it against your application.
  3. Wire them up.
    • This is the tricky part. The simplest option is to have two separate, statically linked applications and exchange data with ZeroMQ or similar.
    • Alternatively, create two static libraries and use symbol manipulation to hide the HDF5 calls. This is tricky but is the fastest option.

Of course you can save it into an intermediate file format and read it back: this is the slowest and maybe the simplest to do.
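To illustrate the intermediate-file idea, here is a minimal sketch of a throwaway bridge format using only the Python standard library. The layout (a 4-byte tag, a little-endian element count, then float32 values) and the names MAGIC, write_bridge and read_bridge are invented for this example; it is not the format anyone in this thread actually used.

```python
import struct

MAGIC = b"F32A"  # hypothetical 4-byte tag identifying the bridge file


def write_bridge(path, values):
    """Write a sequence of floats as little-endian float32."""
    with open(path, "wb") as fh:
        fh.write(MAGIC)
        fh.write(struct.pack("<I", len(values)))
        fh.write(struct.pack("<%df" % len(values), *values))


def read_bridge(path):
    """Read the values back as a list of Python floats."""
    with open(path, "rb") as fh:
        if fh.read(4) != MAGIC:
            raise ValueError("not a bridge file")
        (count,) = struct.unpack("<I", fh.read(4))
        return list(struct.unpack("<%df" % count, fh.read(4 * count)))


if __name__ == "__main__":
    write_bridge("bridge.bin", [float(i) for i in range(10)])
    print(read_bridge("bridge.bin"))
```

The writer side would be linked against the new HDF5 and the reader side against 1.4.4 (or the reverse), with each side converting between its HDF5 file and the bridge file.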
best: steve

According to the RFC and the function docs, the earliest/earliest combination is not supported.