Pure Go HDF5 Library - Production-Ready Write Support (v0.11.4-beta)

Project: scigolib/hdf5 on GitHub - a modern pure Go implementation of the HDF5 file format
Latest Release: v0.11.4-beta (November 2, 2025)
Status: Production-ready for read operations, ~90% write support complete


Overview

I’m excited to share a major milestone for the HDF5 Go library - a pure Go implementation of the HDF5 file format with no CGo dependencies. After intensive development, the library is production-ready for read operations, and write support is roughly 90% complete with comprehensive feature coverage.

Notably, this implementation supports all current HDF5 file format versions (Superblock v0, v2, v3), making it ready for HDF5 2.0 when it arrives - any future format versions can be added as library updates without breaking the API. This positions it as a truly future-proof, ultra-modern HDF5 implementation.


What Makes This Implementation Unique

This is likely the most modern pure Go HDF5 implementation available today, featuring:

Future-Proof Architecture

  • All HDF5 Format Versions Supported:
    • Superblock v0 (HDF5 1.0-1.6) - Legacy format
    • Superblock v2 (HDF5 1.8+) - Modern streamlined format
    • Superblock v3 (HDF5 1.10+) - SWMR support
  • Ready for HDF5 2.0 - Future format versions will be added in v1.x updates
  • Backward & Forward Compatible - Read/write files from any HDF5 version
  • Ultra-Modern Library - All formats supported from day one!

This means files created today will remain compatible when HDF5 2.0 arrives, and the library will support the new format through simple updates rather than major rewrites.

Advanced Performance Optimization

  • Smart B-tree Rebalancing - Automatic optimization with 4 modes:
    • Default - No rebalancing (like reference C library)
    • Lazy - Batch processing (10-100x faster deletions)
    • Incremental - Background rebalancing (zero pause time)
    • Smart - Auto-tuning with workload detection

Manual and automatic rebalancing strategies allow users to optimize for their specific workloads - from append-heavy to deletion-intensive patterns.
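
For illustration, selecting a manual mode might look like the sketch below. The WithRebalancingMode option and the RebalanceLazy constant are hypothetical placeholders - only the smart auto-tuning option (WithSmartRebalancing) appears in the documented examples later in this post.

// Hypothetical sketch: the option name and mode constant are
// placeholders, not the library's confirmed API.
fw, _ := hdf5.CreateForWrite("deletions.h5", hdf5.CreateTruncate,
    hdf5.WithRebalancingMode(hdf5.RebalanceLazy), // batch processing for deletion-heavy workloads
)
defer fw.Close()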

Production-Grade Quality

  • 86.1% test coverage overall, 77.8% for core package
  • 8,000+ lines of professional integration tests
  • 57 reference HDF5 files validated (across all format versions)
  • Zero linter issues (34+ linters)
  • Cross-platform - Linux, macOS, Windows

Comprehensive Documentation

  • 2,700+ lines of detailed guides
  • 4 working examples demonstrating different rebalancing strategies
  • Complete API reference
  • Architecture documentation following HDF5 C library patterns

Current Feature Support

Read Support (100% Complete)

  • All HDF5 format versions (Superblock v0, v2, v3) :sparkles:
  • All datatypes (integers, floats, strings, compounds, arrays, enums, references, opaque)
  • All layouts (compact, contiguous, chunked)
  • All storage types (compact, dense with fractal heap + B-tree v2)
  • Compression (GZIP/Deflate)
  • Object headers v1 (legacy HDF5 < 1.8) and v2 (modern HDF5 >= 1.8)
  • Both traditional (symbol table) and modern (object header) groups
  • Attributes (compact and dense storage)

Write Support (~90% Complete)

  • Multiple format versions (Superblock v0 for legacy, v2 for modern) :sparkles:
  • File creation (Truncate/Exclusive modes)
  • Dataset writing (contiguous, chunked layouts, all datatypes)
  • Group creation (symbol table, dense with automatic transition)
  • Attribute writing (compact 0-7, dense 8+ with automatic transition) - sketched after this list
  • Attribute modification/deletion (complete lifecycle)
  • Compression (GZIP/Deflate, Shuffle filter, Fletcher32 checksum)
  • Advanced datatypes (arrays, enums, references, opaque)
  • Dense storage Read-Modify-Write (full RMW cycle)
  • Smart B-tree rebalancing with auto-tuning
  • Legacy compatibility (Superblock v0 + Object Header v1)
  • Free space management
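
As a sketch of the automatic compact-to-dense attribute transition (using only CreateForWrite and WriteAttribute, which appear in the examples below; the loop and attribute names are illustrative):

fw, _ := hdf5.CreateForWrite("attrs.h5", hdf5.CreateTruncate)
for i := 0; i < 10; i++ {
    // attributes 0-7 are stored compact; writing the 8th triggers
    // the automatic switch to dense storage (fractal heap + B-tree v2)
    fw.WriteAttribute("/", fmt.Sprintf("attr_%d", i), "value")
}
fw.Close()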

Remaining Work (~10%)

  • Soft/external links (hard links fully supported)
  • Compound datatype writing (read works perfectly)
  • Indirect blocks for fractal heap (direct blocks cover most use cases)

Why This Matters for the HDF Community

Ultra-Modern Foundation

Unlike implementations locked to specific HDF5 versions, this library was built with all format versions in mind from the start. When HDF5 2.0 arrives, support will be added through regular library updates (v1.x releases) without breaking existing code or requiring major API changes.

Performance Innovation

The Smart Rebalancing API is a unique feature not found in other HDF5 implementations. It addresses a common real-world problem: B-tree degradation under deletion-heavy workloads. Users can now:

  • Let the library auto-detect their workload patterns
  • Choose manual strategies for specific use cases
  • Achieve 10-100x performance improvements for deletions
  • Maintain compact, efficient file structures

Pure Go Benefits

No CGo dependencies means:

  • Easy cross-compilation (compile for any platform from any platform) - see the example after this list
  • No C toolchain required (simpler deployment)
  • Memory safety (Go’s garbage collector)
  • Better debugging (no C/Go boundary issues)
  • Easier maintenance (single language codebase)
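
For example, producing binaries for other platforms needs nothing beyond the standard Go toolchain:

# Cross-compile from any host - no C toolchain required
GOOS=windows GOARCH=amd64 go build ./...
GOOS=linux GOARCH=arm64 go build ./...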

Development Journey

The journey from concept to production took over a year - from studying the HDF5 format specification to implementing all the intricate details of the binary format for multiple format versions. Previous attempts using C bindings and other approaches were unsuccessful.

However, modern development technologies played a crucial role in the successful completion of this project. The combination of:

  • Comprehensive HDF5 format specification (excellent documentation!)
  • Reference C library implementation (invaluable reference)
  • Modern development tools and practices
  • AI-assisted development for testing and documentation

…made it possible to achieve what would otherwise have been extremely challenging, perhaps impossible: without these modern technologies, completing this multi-version implementation in a year would not have been feasible. We’re grateful for the tools that made it happen. :folded_hands:


Technical Implementation

Architecture

The implementation follows HDF5 C library patterns (H5Adense.c, H5Aint.c, H5Oattribute.c) while maintaining idiomatic Go code:

  • Pure Go (no CGo dependencies)
  • Version-aware parsing (automatic format detection)
  • Buffer pooling for memory efficiency (sketched after this list)
  • Context-rich error handling
  • Signature-based format dispatch
  • Table-driven tests with comprehensive scenarios
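
As a sketch of the buffer-pooling idiom (the standard library's sync.Pool; illustrative, not the library's actual internals):

import (
    "io"
    "sync"
)

// Readers borrow a fixed-size buffer instead of allocating on every
// call, which reduces garbage collector pressure.
var bufPool = sync.Pool{
    New: func() any { return make([]byte, 4096) },
}

func readBlock(r io.ReaderAt, off int64) ([]byte, error) {
    buf := bufPool.Get().([]byte)
    defer bufPool.Put(buf) // hand the buffer back for reuse
    if _, err := r.ReadAt(buf, off); err != nil {
        return nil, err
    }
    out := make([]byte, len(buf)) // copy out before the buffer is recycled
    copy(out, buf)
    return out, nil
}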

Format Version Support Strategy

// Read: Automatically detects and handles any supported version
legacyFile, _ := hdf5.Open("legacy_v0.h5")   // Works!
modernFile, _ := hdf5.Open("modern_v2.h5")   // Works!
swmrFile, _ := hdf5.Open("swmr_v3.h5")       // Works!

// Write: Choose a format version to match your compatibility needs
legacyOut, _ := hdf5.CreateForWrite("legacy.h5", hdf5.CreateTruncate,
    hdf5.WithSuperblockVersion(0),  // For HDF5 1.0-1.6 compatibility
)

modernOut, _ := hdf5.CreateForWrite("modern.h5", hdf5.CreateTruncate,
    hdf5.WithSuperblockVersion(2),  // Modern format (default)
)

Validation

  • Round-trip testing (Go write → C library read → verify) - see the sketch after this list
  • h5dump compatibility validation across all format versions
  • Real HDF5 reference files from various sources (v0, v2, v3)
  • Integration tests with actual file I/O
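
A round-trip check in this spirit might look like the following (h5dump is the reference C tool; the test shape and file name are illustrative):

func TestRoundTrip(t *testing.T) {
    fw, _ := hdf5.CreateForWrite("rt.h5", hdf5.CreateTruncate)
    fw.WriteAttribute("/", "version", "1.0")
    fw.Close()

    // Ask the reference C tooling to read the file back
    out, err := exec.Command("h5dump", "rt.h5").CombinedOutput()
    if err != nil || !bytes.Contains(out, []byte("version")) {
        t.Fatalf("h5dump rejected the file: %v\n%s", err, out)
    }
}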

Example Usage

Smart Rebalancing (Auto-Tuning)

package main

import "github.com/scigolib/hdf5"

func main() {
    // Create file with smart auto-tuning rebalancing
    fw, _ := hdf5.CreateForWrite("data.h5", hdf5.CreateTruncate,
        hdf5.WithSmartRebalancing(
            hdf5.SmartAutoDetect(true),   // Detect workload patterns
            hdf5.SmartAutoSwitch(true),   // Switch modes automatically
        ),
    )
    defer fw.Close()

    // Library automatically optimizes for your workload!
    // 10-100x performance improvement for deletion-heavy operations
}

Full Read-Modify-Write Cycle

// Create and write
fw, _ := hdf5.CreateForWrite("data.h5", hdf5.CreateTruncate)
fw.WriteAttribute("/", "version", "1.0")
fw.Close()

// Reopen and modify - full RMW cycle works!
fw2, _ := hdf5.OpenForWrite("data.h5", hdf5.OpenReadWrite)
fw2.ModifyDenseAttribute("/", "version", "2.0")
fw2.Close()

// Verify with h5dump - shows updated value ✓

Multi-Version Compatibility

// Read any version automatically
oldFile, _ := hdf5.Open("1999_hdf5_v0.h5")     // Superblock v0
modernFile, _ := hdf5.Open("2015_hdf5_v2.h5")  // Superblock v2
swmrFile, _ := hdf5.Open("2020_hdf5_v3.h5")    // Superblock v3

// All work seamlessly! Format version auto-detected.


Roadmap to v1.0.0

v0.11.4-beta (Current) - Write support ~90% complete
                        - All current HDF5 versions supported ✨
         ↓
v0.11.5-beta          - Soft/external links, indirect blocks
         ↓
v0.12.0-rc.1          - FEATURE COMPLETE (API frozen)
         ↓
v0.12.x-rc.x          - Community testing (2-3 months)
         ↓
v1.0.0 STABLE         - Production release (Late 2026)
                        - ALL HDF5 formats supported (v0, v2, v3)
                        - Ready for HDF5 2.0 (future updates)
                        - Long-term API stability guarantee

Our v2.0.0 will only happen if we need to change the Go API - not because of HDF5 format changes. When HDF5 2.0 arrives, it will be supported in our v1.x updates!

The path to v1.0.0 is clear - remaining work is primarily “finishing touches” rather than fundamental features.


Community & Feedback

I’d greatly appreciate feedback from the HDF community:

  • Are there specific use cases or edge cases we should prioritize?
  • What additional validation scenarios would be valuable?
  • Interest in contributing or testing with real-world files across different format versions?
  • Thoughts on the multi-version support strategy?

The project welcomes contributions, bug reports, and suggestions. This is an active open-source project with regular releases and improvements.


Special Thanks

Professor Ancha Baranova - This project would not have been possible without her invaluable support and assistance throughout the development process.

The HDF Group - Thank you for the excellent format specification, comprehensive documentation, and the C library reference implementation that made this pure Go implementation possible. The clear specification for multiple format versions was essential for building a future-proof library!


Project Status: Beta - Production-ready for read operations, write support advancing rapidly
Latest Version: v0.11.4-beta (November 2, 2025)
License: MIT
Platform: Cross-platform (Linux, macOS, Windows)
Format Support: HDF5 Superblock v0, v2, v3 (all current versions + ready for HDF5 2.0)


Looking forward to feedback and collaboration with the HDF community! :rocket:


Great to see more and more language-native implementations of HDF5! I am not a Go user, but I did the same for C# (creating a pure C# library), and therefore I have a natural interest in how these different implementations work :slight_smile: That is why I would like to know if your new library already supports data slicing or if that is planned for a future release. Thanks & awesome work!


@apollo3zehn-h5 Thank you for the excellent question!

Currently (v0.11.4-beta), data slicing (hyperslab selection) is NOT implemented.

What works now:

  • Full dataset reading: dataset.Read() reads entire dataset

What’s missing:

  • Hyperslab selection (rectangular blocks)
  • Point selection (specific coordinates)
  • Stride-based reading

Why it’s not there yet:
Our current focus has been on write support (90% complete). We prioritized:

  1. Write features (user-requested)
  2. Attribute modification
  3. Dense storage support

When will it be added:
This is a HIGH priority feature (standard HDF5 capability). We’ll add it in:

  • v0.11.6-beta or v0.12.0-rc.1 (Q1 2026)
  • Estimated: 3-5 days of development

Thank you for bringing this up! It’s definitely needed for production use with large datasets. We’ll create a task (TASK-019) and add it to our roadmap.

Would appreciate your input on API design:

// Option A: Explicit hyperslab
dataset.ReadHyperslab(start, count, stride, block []uint64)

// Option B: Python-like slicing
dataset.ReadSlice(start, end []uint64)

// Option C: Both?

Great to connect with another native HDF5 implementer!

Thanks for your quick answer! Full support for slicing would mean having two selections: in the case of reading, one selection for the source dataset located in the HDF5 file and another for the target buffer (in RAM). Since the algorithm then needs to know the dimensions of the target buffer, a third parameter carrying these dimensions is required.

On top of that, additional read parameters may be required, which the native C library handles via the dataset access property list (docs), e.g. to provide the virtual dataset prefix.

That is why, in my lib, the method signature for reading from a dataset looks like this:

public T Read<T>(
    H5DatasetAccess datasetAccess, 
    Selection? fileSelection = null, 
    Selection? memorySelection = null, 
    ulong[]? memoryDims = null
)

The Selection can be either of type HyperslabSelection or PointSelection. I had to put a significant amount of time into slicing support, so if you aim for all features of the C library, I think 3-5 days could be too optimistic. But for a simpler implementation it would certainly be enough.


@apollo3zehn-h5 Thank you for the detailed technical feedback! This was incredibly valuable.

You’re absolutely right - I had oversimplified the hyperslab selection implementation. After reviewing your points and the C library documentation (the H5Dread signature with both mem_space_id and file_space_id), I’ve completely revised the design for TASK-019.

Key changes:

  • Revised estimate: 5-7 days (was 3-5 days)
  • Two-level API approach: simple convenience methods + full ReadHyperslabAdvanced() with separate file/memory selections (sketched after this list)
  • Phased implementation: MVP (simple API) for v0.11.6-beta, advanced features later
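
A rough sketch of the two levels (every name below is hypothetical except ReadHyperslabAdvanced, mentioned above; the final signatures may differ):

// Level 1: convenience - read a contiguous block into a fresh buffer
data, _ := dataset.ReadSlice([]uint64{100, 0}, []uint64{50, 20})

// Level 2: full control - separate file and memory selections plus
// explicit buffer dimensions, mirroring H5Dread's
// file_space_id / mem_space_id pair (fileSel and memSel built with
// hypothetical selection constructors)
buf := make([]float64, 50*20)
err := dataset.ReadHyperslabAdvanced(fileSel, memSel, []uint64{50, 20}, buf)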

Your feedback prevented an architectural mistake. Much appreciated! :folded_hands:

The updated task with incorporated feedback: [internal docs, will implement in v0.11.6-beta]


Hi @a.kolkov,

Great contribution to the HDF5 ecosystem!

The Smart Rebalancing API seems to be an excellent feature. Do you think it’s possible to port it to the HDF5 C API/library?


Thanks! The API needs production validation first. If it proves broadly useful, porting to C library could be discussed. Open to community feedback!


@a.kolkov Thank you for this excellent work on the pure Go library. On your hyperslab question, I suggest option C (both). Strides and blocks are less frequently used than simple hyperslabs. However, all of these access modes are entangled in the same internal code areas. I think it would be better to design for all hyperslab modes up front, rather than facing redesign efforts when adding missing pieces later.

I suggest priority on these areas which I think are fundamental:

  • Hyperslab access for both read and write.
  • Extendible dimensions, also called unlimited dimensions.

Also please ensure full UTF-8 support everywhere, including HDF5 object names, string data storage, and external file names.


@dave.allured Thanks for the detailed technical feedback!

Hyperslab selection is already our next priority (TASK-019) - planning full implementation with stride/block support as you recommend. Extendible dimensions (TASK-018) are in the roadmap. UTF-8 is already supported across the library.

Appreciate the guidance on avoiding future redesign - will implement complete hyperslab API from the start.

For validation, the current HDF5 1.14.6 distribution includes no fewer than 452 HDF5 test files (*.h5). Consider these an external test suite. Try reading all of them with the Go library. Expect some surprises. (-; Some files may be intentionally invalid HDF5; these will exercise your internal validation logic and exception handling.

There are also 593 text dump files (*.ddl). If the Go library includes a native text-to-HDF5 generator tool, then use these text files to test the library’s write capabilities.

If this causes you to run into advanced format features that you did not anticipate, I would say to focus first on core library functionality, and save the more obscure capabilities for later releases.


@dave.allured Excellent suggestion! Testing against the official HDF5 test suite (452 files from 1.14.6) is a great validation strategy.

Currently we use targeted tests with h5dump and Python h5py for interoperability validation, but comprehensive testing against the official suite would indeed expose edge cases and obscure format details.

We already follow the “core functionality first” approach you recommend - phased implementation with MVP features (as seen in v0.11.5 soft/external links). This helps maintain quality while avoiding complexity overload.

Will definitely consider integrating the official test suite as part of our path to v1.0.0. Thank you for the practical testing guidance!

Update: MATLAB Library Released - Why I Started the HDF5 Project

Hi everyone,

I wanted to share some exciting news and explain what motivated me to start working on the pure Go HDF5 library in the first place.

Today I released v0.2.0-beta of the MATLAB File Reader for Go: Release Notes: v0.2.0-beta · scigolib/matlab · GitHub

This was actually the whole reason I got into writing the HDF5 library!

I needed to read and write MATLAB .mat files from Go applications. MATLAB v7.3+ files use HDF5 format, and there was no pure Go solution - all existing libraries required CGo and external C dependencies, which was a dealbreaker for cross-platform deployment.

So I decided: “Fine, I’ll write my own HDF5 library.” :sweat_smile:

What the MATLAB library does now:

  • :white_check_mark: Reads MATLAB v5 (legacy binary format) and v7.3 (HDF5) files
  • :white_check_mark: Writes both formats - complete bidirectional I/O
  • :white_check_mark: Pure Go, no CGo, works everywhere
  • :white_check_mark: All numeric types, complex numbers, multi-dimensional arrays
  • :white_check_mark: Production-ready: 100% tests passing, 78.5% coverage, 0 linter issues

The HDF5 v0.11.5-beta library you helped me build was absolutely critical for this. The nested datasets, group attributes, and proper MATLAB_class support made it possible to correctly implement MATLAB v7.3 complex numbers and maintain full compatibility with MATLAB/Octave.

Thank you to everyone who provided feedback and helped improve the HDF5 library. The MATLAB library wouldn’t exist without it!

If anyone’s interested in the technical details or has MATLAB file I/O needs, check out the release notes. It’s been quite a journey from “I need to read .mat files” to “let’s write a production HDF5 library.” :rocket:


Cheers,
Andy


Hi everyone!

Just released v0.11.6-beta with the features we discussed earlier:

What’s in this release:

  • :white_check_mark: Hyperslab selection (the feature @apollo3zehn-h5 requested!) - 10-250x faster partial reads
  • :white_check_mark: Dataset resize with unlimited dimensions
  • :white_check_mark: Variable-length datatypes (VLen strings, ragged arrays)

Special thanks to @apollo3zehn-h5 for the excellent technical guidance on hyperslab implementation! Your insights about the two-selection API (file_space + mem_space) and the C library reference were invaluable. The implementation is much more robust thanks to your feedback.

The hyperslab feature now includes:

  • Simple API: ReadSlice(start, count) for basic use cases
  • Advanced API: ReadHyperslab(selection) with stride/block support
  • Smart optimizations (1D fast path, bounding box, chunk-aware reading)
  • Comprehensive tests (22 subtests with round-trip validation)
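
A minimal usage sketch (only the method names ReadSlice and ReadHyperslab come from the release; the dataset accessor, parameter types, and selection constructor are my assumptions):

f, _ := hdf5.Open("data.h5")
ds, _ := f.Dataset("/measurements") // accessor name assumed

// Simple API: contiguous block - rows 100-149, columns 0-19
block, _ := ds.ReadSlice([]uint64{100, 0}, []uint64{50, 20})

// Advanced API: every other row/column via stride (constructor assumed)
sel := hdf5.NewHyperslab(
    []uint64{0, 0},   // start
    []uint64{10, 10}, // count
    []uint64{2, 2},   // stride
    []uint64{1, 1},   // block
)
strided, _ := ds.ReadHyperslab(sel)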

We’re now at ~75% write support. Remaining ~25% includes soft/external links, compound datatype writing, and some filters.

Next sprint (v0.11.7-beta) will focus on the remaining features, including TASK-020 (official HDF5 test suite validation with those 452 .h5 files @dave.allured mentioned).

Release details: 🚀 v0.11.6-beta: Advanced Features - Dataset Resize + VLen + Hyperslab · scigolib/hdf5 · GitHub

Try it: go get github.com/scigolib/hdf5@v0.11.6-beta

Looking forward to your feedback!