Inside HDF5: A Hands-On Clinic with GNU poke and HDF5 Pickles - Gerd Heber on Call the Doctor 3/31/26
HDF5 files are usually accessed through high-level APIs, but sometimes the clearest way to understand a file is to inspect its on-disk structures directly. In this HDF Clinic, we will walk through a hands-on tutorial using HDF5 pickles with GNU poke to explore HDF5 metadata at the byte level. Starting from a sample file, we will launch poke, load the pickles, map the superblock, locate the root object header, decode object header messages, verify checksums, and follow links to dataset metadata. Along the way, we will examine how dataspaces, datatypes, and layout information are represented in the file, and how GNU poke turns raw bytes into readable, queryable structures.
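To give a feel for what "mapping the superblock" means at the byte level, here is a minimal Python sketch (not the pickles themselves, which are written in Poke). It assumes a version 2 or 3 superblock as laid out in the HDF5 file format specification; the function names `find_superblock` and `parse_superblock` are made up for this illustration, and older version 0/1 superblocks are deliberately rejected.

```python
import struct

# Eight-byte format signature that opens every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_superblock(data: bytes) -> int:
    """Return the offset of the HDF5 signature, or -1 if none is found.

    The format permits the superblock at offset 0, 512, 1024, 2048, ...
    (doubling each time), so only those offsets are probed.
    """
    offset = 0
    while offset + len(HDF5_SIGNATURE) <= len(data):
        if data.startswith(HDF5_SIGNATURE, offset):
            return offset
        offset = 512 if offset == 0 else offset * 2
    return -1


def parse_superblock(data: bytes, start: int) -> dict:
    """Decode a version 2 or 3 superblock that begins at `start`."""
    version, size_of_offsets, size_of_lengths, flags = struct.unpack_from(
        "<4B", data, start + len(HDF5_SIGNATURE)
    )
    if version not in (2, 3):
        raise ValueError(f"superblock version {version} is not handled by this sketch")

    fields = {
        "version": version,
        "size_of_offsets": size_of_offsets,
        "size_of_lengths": size_of_lengths,
        "flags": flags,
    }
    # Four addresses follow, each `size_of_offsets` bytes, little-endian.
    pos = start + 12
    for name in (
        "base_address",
        "extension_address",
        "end_of_file_address",
        "root_object_header_address",
    ):
        fields[name] = int.from_bytes(data[pos:pos + size_of_offsets], "little")
        pos += size_of_offsets

    # The superblock ends with a 4-byte checksum of everything before it.
    fields["checksum"] = struct.unpack_from("<I", data, pos)[0]
    return fields
```

Reading a file into `data` with `open(path, "rb").read()` and calling `parse_superblock(data, find_superblock(data))` yields the root object header address, which is the next stop in the walk-through described above.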
The session will also show how these pickles provide an executable, machine-readable view of HDF5 that is useful for inspection, validation, debugging, and learning the format itself. We will finish by demonstrating a small write-through edit on a disposable copy of an HDF5 file and discussing the care needed when changing metadata directly, including dependent fields such as checksums. This clinic is aimed at developers, advanced users, and anyone who wants a practical introduction to working with HDF5 files from the inside out using GNU poke.
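The "disposable copy plus dependent checksum" discipline can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the clinic's procedure: `patch_copy` and all of its parameters are hypothetical, the checksum is assumed to be a 4-byte little-endian value stored immediately after the bytes it covers (as for version 2 superblocks and object headers), and the checksum routine is passed in as `checksum_fn` rather than hard-coded.

```python
import shutil
from typing import Callable


def patch_copy(
    source: str,
    scratch: str,
    offset: int,
    new_bytes: bytes,
    checksum_start: int,
    checksum_length: int,
    checksum_fn: Callable[[bytes], int],
) -> None:
    """Patch a scratch copy of `source` and refresh the dependent checksum.

    The original file is never opened for writing. The checksum is assumed
    to sit directly after the `checksum_length` bytes it protects.
    """
    shutil.copyfile(source, scratch)  # work on a disposable copy only
    with open(scratch, "r+b") as f:
        # 1. Overwrite the chosen metadata bytes in place.
        f.seek(offset)
        f.write(new_bytes)

        # 2. Re-read the whole protected region, now including the edit.
        f.seek(checksum_start)
        region = f.read(checksum_length)
        if len(region) != checksum_length:
            raise ValueError("checksummed region extends past end of file")

        # 3. Store the recomputed checksum right after that region.
        f.seek(checksum_start + checksum_length)
        f.write((checksum_fn(region) & 0xFFFFFFFF).to_bytes(4, "little"))
```

Skipping step 3 is exactly the kind of mistake that leaves a file the HDF5 library refuses to open, which is why dependent fields deserve the care mentioned above.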
To join, just jump on Zoom: Launch Meeting - Zoom
March 31, 2025, 12:20 p.m. Central Time (US/Canada)
Save as minimal-hdf5.pk and run with POKE_LOAD_PATH=$PWD/pickles poke -qs minimal-hdf5.pk.
bob@penguin:~/projects/hdf5-pickles$ POKE_LOAD_PATH=$PWD/pickles poke -qs minimal-hdf5.pk
The current IOS is now `*image*'.
     _____
 ---'   __\_______
            ______)  GNU poke 4.3
            __)
           __)
 ---._______)
Copyright (C) 2024 The poke authors.
License GPLv3+: GNU GPL version 3 or later.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Powered by Jitter 0.7.312.
Perpetrated by Jose E. Marchesi.
For help, type ".help".
Type ".exit" to leave the program.
bob@penguin:~/projects/hdf5-pickles$ h5dump empty.h5
HDF5 "empty.h5" {
GROUP "/" {
}
}
What happens when you can’t use the HDF5 library to read a file? In this session of Call the Doctor, we dive into HDF5 Pickles—a new machine-readable specification that allows for low-level binary analysis and repair using GNU poke.
The Concept: You can inspect, modify, and even create HDF5 files from scratch without ever linking the HDF5 library.
The Tool: GNU poke, using a new machine-readable specification called “HDF5 Pickles.”
Why it matters:
Precision Editing: Traditional tools like h5dump or h5edit rely on the library itself. If a file is corrupted, the library might fail to open it. With GNU poke and pickles, you are looking at the raw bytes as defined by the file format specification.
Check Integrity: You can manually trigger and verify checksums (like the Jenkins lookup3 hash) for object headers [18:50].
Security & Research: This is a “giant step forward” for security researchers who want to perform targeted fuzzing—creating slightly “broken” files to see how the HDF5 library handles errors [43:50].
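For readers curious about the checksum mentioned in the list above, here is a pure-Python sketch of Bob Jenkins's lookup3 `hashlittle` function, the routine HDF5 uses for metadata checksums. It is a transcription of the public-domain algorithm, not code from the pickles or the HDF5 library. The helper `verify_trailing_checksum` is hypothetical and assumes an initial value of 0 and a 4-byte little-endian checksum at the end of the block.

```python
import struct

MASK = 0xFFFFFFFF  # keep all arithmetic in 32 bits


def _rot(x: int, k: int) -> int:
    """Rotate a 32-bit value left by k bits."""
    return ((x << k) | (x >> (32 - k))) & MASK


def _mix(a: int, b: int, c: int):
    a = (a - c) & MASK; a ^= _rot(c, 4);  c = (c + b) & MASK
    b = (b - a) & MASK; b ^= _rot(a, 6);  a = (a + c) & MASK
    c = (c - b) & MASK; c ^= _rot(b, 8);  b = (b + a) & MASK
    a = (a - c) & MASK; a ^= _rot(c, 16); c = (c + b) & MASK
    b = (b - a) & MASK; b ^= _rot(a, 19); a = (a + c) & MASK
    c = (c - b) & MASK; c ^= _rot(b, 4);  b = (b + a) & MASK
    return a, b, c


def _final(a: int, b: int, c: int) -> int:
    c ^= b; c = (c - _rot(b, 14)) & MASK
    a ^= c; a = (a - _rot(c, 11)) & MASK
    b ^= a; b = (b - _rot(a, 25)) & MASK
    c ^= b; c = (c - _rot(b, 16)) & MASK
    a ^= c; a = (a - _rot(c, 4)) & MASK
    b ^= a; b = (b - _rot(a, 14)) & MASK
    c ^= b; c = (c - _rot(b, 24)) & MASK
    return c


def lookup3(data: bytes, initval: int = 0) -> int:
    """Jenkins lookup3 `hashlittle` over `data`, returned as a 32-bit int."""
    length = len(data)
    a = b = c = (0xDEADBEEF + length + initval) & MASK
    pos = 0
    # Consume all but the last 1..12 bytes in 12-byte blocks.
    while length > 12:
        k0, k1, k2 = struct.unpack_from("<3I", data, pos)
        a, b, c = _mix((a + k0) & MASK, (b + k1) & MASK, (c + k2) & MASK)
        pos += 12
        length -= 12
    if length == 0:  # only reachable for empty input
        return c
    # Zero-padding the tail is equivalent to the byte-wise switch in the C code.
    k0, k1, k2 = struct.unpack("<3I", data[pos:] + b"\x00" * (12 - length))
    return _final((a + k0) & MASK, (b + k1) & MASK, (c + k2) & MASK)


def verify_trailing_checksum(block: bytes) -> bool:
    """True if the last 4 bytes (little-endian) match lookup3 of the rest."""
    stored = int.from_bytes(block[-4:], "little")
    return lookup3(block[:-4]) == stored
```

Slicing a version 2 object header chunk or superblock out of a file and passing it to `verify_trailing_checksum` reproduces the kind of integrity check described above. For empty input, `lookup3(b"")` returns `0xDEADBEEF`, which makes a handy sanity check.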
Beyond the video, we’ve prepared a structured technical overview and searchable transcript on The HDF Group Blog. This is a great resource for anyone looking to understand the specific file repair and fuzzing strategies discussed today.