I’m working on covering more of the HDF5 specification in the pure-Python reader https://github.com/jjhelmus/pyfive (which we plan to use to recover data from corrupt files).
Files that store groups while tracking link creation order (indexed groups) use a version 2 B-tree together with a fractal heap. The spec is unclear about what a “heap ID” is (or at least I don’t understand it):
Version 2 B-tree, Type 5 Record Layout (Link Name for Indexed Group), field “ID”: “This is a 7-byte sequence of bytes and is the heap ID for the link record in the group’s fractal heap.”
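For reference, this is how I currently read the Type 5 record and the heap ID, as a minimal Python sketch. The layout is my assumption from the “Fractal Heap ID” part of the spec: a 4-byte name hash followed by the 7-byte ID in the record; inside the ID, one version/type byte, then an offset whose width follows the heap header’s “Maximum Heap Size” (a bit count), then a length sized to hold min(Maximum Direct Block Size, Maximum Size of Managed Objects). The function names are hypothetical (not pyfive’s API), and only managed objects are handled.

```python
import struct

# First heap-ID byte (assumed): bits 6-7 version (0), bits 4-5 type, bits 0-3 reserved.
HEAP_ID_TYPES = {0: "managed", 1: "huge", 2: "tiny"}


def min_bytes_for_bits(nbits):
    """Smallest number of bytes that can hold an nbits-wide unsigned integer."""
    return (nbits + 7) // 8


def managed_id_field_sizes(max_heap_size_bits, max_direct_block_size, max_managed_object_size):
    """Assumed widths (offset_size, length_size) of the variable-size heap-ID fields."""
    offset_size = min_bytes_for_bits(max_heap_size_bits)
    length_limit = min(max_direct_block_size, max_managed_object_size)
    length_size = min_bytes_for_bits(length_limit.bit_length())
    return offset_size, length_size


def parse_type5_record(record):
    """Split an 11-byte Type 5 record into (name_hash, heap_id)."""
    (name_hash,) = struct.unpack_from("<I", record, 0)
    return name_hash, bytes(record[4:11])


def decode_heap_id(heap_id, offset_size, length_size):
    """Decode a managed-object heap ID into (offset, length) in the heap's address space."""
    flags = heap_id[0]
    version = flags >> 6
    id_type = HEAP_ID_TYPES.get((flags >> 4) & 0x3, "unknown")
    if version != 0 or id_type != "managed":
        raise NotImplementedError(f"heap ID version {version}, type {id_type}")
    pos = 1
    offset = int.from_bytes(heap_id[pos:pos + offset_size], "little")
    pos += offset_size
    length = int.from_bytes(heap_id[pos:pos + length_size], "little")
    return offset, length


# Example with assumed header values: 32-bit heap address space,
# 64 KiB max direct block, 4 KiB max managed object.
off_size, len_size = managed_id_field_sizes(32, 65536, 4096)
fake_id = bytes([0x00]) + (1234).to_bytes(off_size, "little") + (56).to_bytes(len_size, "little")
print(len(fake_id), decode_heap_id(fake_id, off_size, len_size))  # 7 (1234, 56)
```

With those assumed header values the ID comes out at 1 + 4 + 2 = 7 bytes, which would be consistent with the 7-byte ID in the B-tree record, but I have not confirmed that these are the values libhdf5 uses.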
Then, in the fractal heap section:
Fractal Heap Direct Block, field “Object Data”: “This section of the direct block stores the actual data for objects in the heap. The size of this section is determined by the direct block’s size minus the size of the other fields stored in the direct block (for example, the Signature, Version, and others including the Checksum if it is present).”
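The size arithmetic in that paragraph, as I understand it, sketched in Python. The field widths are assumptions taken from the direct block layout: a 4-byte signature, a 1-byte version, a heap header address of “Size of Offsets” bytes (from the superblock), a Block Offset of ceil(Maximum Heap Size / 8) bytes, and a 4-byte checksum only when the heap header’s Flags mark direct blocks as checksummed (bit 1, if I read it right). The function names are hypothetical.

```python
def direct_block_overhead(size_of_offsets, max_heap_size_bits, checksummed):
    """Bytes in a direct block that are not object data (assumed field widths)."""
    signature = 4                              # "FHDB"
    version = 1
    heap_header_address = size_of_offsets      # "Size of Offsets" from the superblock
    block_offset = (max_heap_size_bits + 7) // 8
    checksum = 4 if checksummed else 0         # assumed: heap header Flags bit 1
    return signature + version + heap_header_address + block_offset + checksum


def object_data_size(block_size, size_of_offsets, max_heap_size_bits, checksummed):
    """Room left for objects in a direct block of block_size bytes."""
    return block_size - direct_block_overhead(size_of_offsets, max_heap_size_bits, checksummed)


# A 512-byte block, 8-byte offsets, 32-bit heap address space, checksummed:
print(object_data_size(512, 8, 32, True))  # 491
```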
How is the “heap ID” from the B-tree record related to the direct block data of the fractal heap? How do I know which direct block holds the object, and what the offset and size of the object are within that block’s data?
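To make the question concrete, here is my tentative (unverified) reading of the doubling-table layout as a Python sketch. It assumes the heap-ID offset is a position in the heap’s linear address space, that rows 0 and 1 hold blocks of “Starting Block Size” with each later row doubling, that blocks are laid out row by row with “Table Width” blocks per row, and that the in-block offset counts from the first byte of the direct block (header included). `locate_managed_object` is a hypothetical helper, and it ignores rows that would be indirect blocks.

```python
def locate_managed_object(heap_offset, table_width, starting_block_size):
    """Map a heap-ID offset to (row, column, block_offset, offset_in_block).

    Assumed layout: rows 0 and 1 hold blocks of starting_block_size, each
    later row doubles the block size, and blocks are laid out row by row.
    Only meaningful for rows that are direct blocks.
    """
    row0_span = table_width * starting_block_size  # bytes covered by row 0
    if heap_offset < row0_span:
        row, row_start, block_size = 0, 0, starting_block_size
    else:
        # Row r >= 1 starts at row0_span * 2**(r - 1) and spans the same amount again.
        row = (heap_offset // row0_span).bit_length()
        row_start = row0_span << (row - 1)
        block_size = starting_block_size << (row - 1)
    column = (heap_offset - row_start) // block_size
    block_offset = row_start + column * block_size
    # Assumption: measured from the block's first byte, not from Object Data.
    return row, column, block_offset, heap_offset - block_offset


# Table Width 4, Starting Block Size 512 (example values):
for off in (100, 2100, 6000):
    print(off, locate_managed_object(off, 4, 512))
```

If this reading is right, the child entry in the root indirect block would be number row * table_width + column, and that direct block’s own Block Offset field should equal block_offset, which would give a cross-check when the indirect block itself is damaged.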
As a more general question about fractal heaps: how is the “object data” in a direct block structured? Can objects be extracted from this binary blob without any other information?