A simple blockchain in HDF5 and a preview of h5explain, our first “pickled” tool - Gerd Heber on Call the Doctor 5/5/26
In this episode, The HDF Group’s Gerd Heber (@gheber) will show you how to implement a simple blockchain-like, tamper-evident structure in HDF5. He will discuss relevant design choices and show how to use it in your code. Gerd will then examine a sample file using our first tool, h5explain, which is built on the GNU poke pickles in our machine-readable HDF5 file format specification. He will conclude with a preview of how to create a human-friendly file format specification from the machine-readable one.
To join, just jump on the Zoom:
Launch Meeting - Zoom
May 5, 2026, 12:20 p.m. Central Time (US and Canada)
The example can be found here.
h5explain is part of hdf5-pickles. Starting it on chain.h5 prints a command summary and drops you at the superblock:
```text
~/projects/hdf5-pickles$ ./tools/h5explain chain.h5
h5explain: interactive HDF5 byte-level explorer
mapping mode: strict (@)
Navigation:
  root                      jump to the root group object header
  h5super                   jump to the HDF5 superblock
  cd ("NAME")               follow a hard link from the current group header
  go (OFF#B)                parse a primitive at an absolute file address
  go (OFF#B, "PATH")        parse with an explicit path label
  gos ("0xADDR")            parse an address supplied as text
  gos ("0xADDR", "PATH")    parse with an explicit path label
  back                      return to the previous location
  pwd                       show current label, offset, and kind
Inspection:
  info                      explain the current primitive
  msgs                      decode object header messages (OHDR only)
  cur                       return the current mapped value (raw poke struct)
  ls or links               list hard links when current primitive is a group header
  traverse                  fully traverse a chunk index (may be slow for large datasets)
  dump                      hex-dump from the current primitive offset
  h5dump                    hex-dump the current primitive extent
Type help to show this message again.
current: HDF5 superblock at 0UL#B [superblock]
(h5explain)
```
Enjoy! G.
In this session of Call the Doctor, Gerd Heber covered the intersection of data integrity and low-level file exploration. He began with a practical demonstration of a tamper-evident blockchain implemented natively in HDF5 files to secure provenance data. He then turned to a major milestone in the HDF5 SHINES project: machine-readable file format specifications. Using new tools such as h5explain and h5markers, he walked through HDF5’s internal structures byte by byte, showing how developers can perform deep file forensics and metadata analysis without relying on the HDF5 library.
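The hash-chain idea behind the demo (interlinking datasets via SHA-256 hashes and keeping the latest hash, the “head,” as a group attribute) can be illustrated without HDF5 at all. The Python below is a minimal, standard-library-only model, not Gerd's example code: `Block`, `Chain`, and `GENESIS` are hypothetical names, each block's `payload` stands in for a dataset's contents, and the 32-byte digests are the kind of value one might store using an HDF5 opaque type. Verification walks the chain from the start and recomputes every link.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical "previous hash" for the first block: 32 zero bytes.
GENESIS = bytes(32)


@dataclass(frozen=True)
class Block:
    payload: bytes    # stands in for a dataset's raw data (assumption)
    prev_hash: bytes  # 32-byte SHA-256 digest of the previous block

    def digest(self) -> bytes:
        # Hash the predecessor's digest together with this block's payload,
        # so altering any earlier block changes every later digest.
        return hashlib.sha256(self.prev_hash + self.payload).digest()


@dataclass
class Chain:
    blocks: list = field(default_factory=list)
    head: bytes = GENESIS  # in HDF5 this would be an attribute on the group

    def append(self, payload: bytes) -> bytes:
        # Link the new block to the current head, then advance the head.
        block = Block(payload, self.head)
        self.blocks.append(block)
        self.head = block.digest()
        return self.head

    def verify(self) -> bool:
        # Recompute every link starting from GENESIS; the final digest
        # must match the stored head.
        expected = GENESIS
        for block in self.blocks:
            if block.prev_hash != expected:
                return False
            expected = block.digest()
        return expected == self.head
```

Replacing any block's payload breaks the next link (or, for the last block, the match with `head`), which is what makes tampering evident. Someone able to rewrite the whole file could recompute every hash, though, so in a sketch like this the head digest is only as trustworthy as the place a copy of it is kept.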
Today I Learned
- HDF5 Blockchain: You can implement native tamper-evidence by interlinking datasets via SHA-256 hashes and storing the “head” as a group attribute.
- Interactive Forensics: h5explain and h5markers let you navigate and scan the raw bytes of an HDF5 file without needing the library to interpret the file for you, which makes them well suited to forensic investigations.
- The “Golden Copy”: We are experimenting with using machine-readable GNU poke pickles as the primary specification source, generating human-readable Markdown via YAML sidecars to ensure 100% accuracy.
- Cloud Check: A Superblock Version 2 or 3 is a quick indicator of a possibly cloud-optimized (page-allocated) file: paged allocation requires one of these versions, though not every version 2 or 3 file uses it.
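The cloud check starts from the superblock, which is also where h5explain opens. As a minimal standard-library sketch (the helper names `find_superblock` and `may_be_paged` are hypothetical), the Python below looks for the 8-byte HDF5 signature at offset 0 and then at 512, 1024, 2048, and so on (the positions allowed when a user block precedes the superblock) and reads the version byte that immediately follows it. It does not parse the superblock extension where the paging settings themselves are recorded, so a version 2 or 3 result is treated as a candidate rather than proof.

```python
from typing import BinaryIO, Optional, Tuple

# The 8-byte format signature that begins every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_superblock(f: BinaryIO) -> Optional[Tuple[int, int]]:
    """Return (offset, superblock version), or None if no signature is found."""
    f.seek(0, 2)  # seek to the end to learn the file size
    size = f.tell()
    offset = 0
    while offset + len(HDF5_SIGNATURE) + 1 <= size:
        f.seek(offset)
        head = f.read(len(HDF5_SIGNATURE) + 1)
        if head[:8] == HDF5_SIGNATURE:
            # The superblock version number is the byte after the signature.
            return offset, head[8]
        # Candidate offsets: 0, 512, 1024, 2048, ...
        offset = 512 if offset == 0 else offset * 2
    return None


def may_be_paged(f: BinaryIO) -> bool:
    # Necessary, not sufficient: paged allocation needs a version 2 or 3
    # superblock, but this sketch does not inspect the extension itself.
    found = find_superblock(f)
    return found is not None and found[1] in (2, 3)
```

On a real file, something like `with open("chain.h5", "rb") as f: print(find_superblock(f))` reports where the superblock starts and which version it is.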
Gerd Heber walks through the technical implementation of these features, from SHA-256 opaque types to the new SHINES documentation strategy. For the full deep dive, technical summary, and a cleaned transcript of the session, check out the full blog post: A Simple Blockchain in HDF5 and a Preview of h5explain.
This material is based upon work supported by the U.S. National Science Foundation under Federal Award No. 2534078. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.