PureHDF 1.0-beta.1 released (.NET)

Hello everyone! I am happy to announce that PureHDF has matured into a nearly feature-complete HDF5 reading and (limited) writing library for .NET applications. It has no native dependencies and is available on NuGet as a beta release.

PureHDF was developed in response to the common pitfalls that native cross-platform development and library distribution present for library developers and library users in the .NET world. This library runs on any platform where .NET Standard 2.0 or higher is available (Linux, OSX, Windows) and on nearly any modern hardware architecture.

The main features of PureHDF are shown in the table below.

Reading  Writing  Feature
✓        ✓        generic API
✓        ✓        easy filter access
✓        ✓        hardware-accelerated filters
✓        ✓        data slicing
✓        ✓        multidimensional arrays
✓        ✓        compound data
✓        ✓        variable-length data
✓        -        multithreading (1)
✓        -        Amazon S3 access
✓        -        HSDS access (2)

(1): works in principle, but small issues remain, which will be solved in the final release (v1.0.0)
(2): basic functionality is implemented

Please also see the GitHub project page and the documentation for the reading and writing API.

Note: The work for this beta release aimed at stabilizing the API surface. The work for the final release (1.0.0) will focus on performance improvements and bug fixing.

Quick start

Below are some excerpts from the documentation linked above to give you a quick overview.

Reading

The following code snippets show how to work with the reading API. The first step is to open the file to read from:

using PureHDF;

var file = H5File.OpenRead("path/to/file.h5");

The method H5File.OpenRead returns an instance of type NativeFile which represents the root group /. From there you can access any object within the file:

var group = file.Group("/path/to/my/group");
var dataset = file.Dataset("/path/to/my/dataset");
var commitedDataType = file.Group("/path/to/my/datatype");
var unknownObject = file.Get("/path/to/my/unknown/object");

HDF5 objects can have zero or more attributes attached, which can be accessed either by enumerating all attributes (myObject.Attributes()) or by direct access (myObject.Attribute("attribute-name")).

When you have a dataset or attribute available, you can read its data by providing a compatible generic type as shown below.

var intScalar = dataset.Read<int>();
var doubleArray = dataset.Read<double[]>();
var double2DArray = dataset.Read<double[,]>();
var double3DArray = dataset.Read<double[,,]>();
var floatJaggedArray = dataset.Read<float[][]>();

Note: An overview of compatible return types can be found in the Simple Data and Complex Data sections of the documentation.
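To illustrate how the generic argument selects the shape of the result, here is a minimal sketch. To stay self-contained it first creates a small 2x3 dataset with the writing API described in the Writing section of this post and then reads it back. The temporary file name and the dataset name "matrix" are hypothetical, and the sketch assumes that a double[,] passed as data is stored as a two-dimensional dataset.

```csharp
using System;
using System.IO;
using PureHDF;

// Hypothetical temporary file, used only to keep this sketch self-contained.
var path = Path.Combine(Path.GetTempPath(), "purehdf-read-sketch.h5");
if (File.Exists(path)) File.Delete(path);

// Create a small 2x3 dataset (assumption: the rank of the array
// determines the dimensionality of the dataset in the file).
var source = new H5File()
{
    ["matrix"] = new double[,] { { 1.0, 2.0, 3.0 }, { 4.0, 5.0, 6.0 } }
};
source.Write(path);

// Read it back; the generic argument selects the shape of the result.
var file = H5File.OpenRead(path);
var matrix = file.Dataset("/matrix").Read<double[,]>();

Console.WriteLine($"{matrix.GetLength(0)} x {matrix.GetLength(1)}");
```

Requesting a type whose rank does not match the dataset may not work, so the overview of compatible return types in the documentation is the place to check first.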

Writing

PureHDF can easily create new files, as described in more detail below. However, editing existing files is outside the scope of PureHDF.

To get started, first create a new H5File instance:

var file = new H5File();

An H5File derives from the H5Group type because it represents the root group. H5Group implements the IDictionary interface, where the keys represent the links in an HDF5 file and the value determines the type of the link: it is either another H5Group or an H5Dataset.

You can create an empty group like this:

var group = new H5Group();

If the group should have some datasets, just add them using the dictionary collection initializer - just like with a normal dictionary:

var group = new H5Group()
{
    ["numerical-dataset"] = new double[] { 2.0, 3.1, 4.2 },
    ["string-dataset"] = new string[] { "One", "Two", "Three" }
};
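Because H5Group behaves like a dictionary, links can also be added and inspected after construction. The following is a minimal sketch of that idea, not an excerpt from the documentation. It assumes that the implemented interface is IDictionary<string, object>; the link names are placeholders.

```csharp
using System;
using System.Collections.Generic;
using PureHDF;

var group = new H5Group();

// Links are added through the dictionary indexer,
// the same indexer the collection initializer uses.
group["numerical-dataset"] = new double[] { 2.0, 3.1, 4.2 };
group["subgroup"] = new H5Group();

// Assumption: the implemented interface is IDictionary<string, object>.
IDictionary<string, object> links = group;

// The value decides what kind of link will be written.
foreach (var link in links)
{
    var kind = link.Value is H5Group ? "group" : "dataset";
    Console.WriteLine($"{link.Key}: {kind}");
}
```

Treating the group as a plain dictionary keeps the file definition declarative: nothing is written until the whole tree is handed to the file's Write method.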

Datasets and attributes can both be created either by instantiating their specific class (H5Dataset, H5Attribute) or by just providing some kind of data. This data can be nearly anything: arrays, scalars, numerical values, strings, anonymous types, enums, complex objects, structs, bool values, etc. However, whenever you want to provide more details like the dimensionality of the attribute or dataset, the chunk layout or the filters to be applied to a dataset, you need to instantiate the appropriate class.

But first, let’s see how to add attributes. Attributes cannot be added directly using the dictionary collection initializer because that is only for datasets. However, every H5Group has an Attributes property which accepts our attributes:

var group = new H5Group()
{
    Attributes = new()
    {
        ["numerical-attribute"] = new double[] { 2.0, 3.1, 4.2 },
        ["string-attribute"] = new string[] { "One", "Two", "Three" }
    }
};

The full example with the root group, a subgroup, two datasets and two attributes looks like this:

using PureHDF;

var file = new H5File()
{
    ["my-group"] = new H5Group()
    {
        ["numerical-dataset"] = new double[] { 2.0, 3.1, 4.2 },
        ["string-dataset"] = new string[] { "One", "Two", "Three" },
        Attributes = new()
        {
            ["numerical-attribute"] = new double[] { 2.0, 3.1, 4.2 },
            ["string-attribute"] = new string[] { "One", "Two", "Three" }
        }
    }
};

The last step is to write the defined file to disk:

file.Write("path/to/file.h5");
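To check the result, the written file can be opened again with the reading API from the first part of this post. The sketch below is a round trip under stated assumptions: the temporary file name is hypothetical, and only calls shown elsewhere in this post are used (Write, OpenRead, Group, Dataset, Attribute, Read).

```csharp
using System;
using System.IO;
using PureHDF;

// Hypothetical temporary file for the round trip.
var path = Path.Combine(Path.GetTempPath(), "purehdf-roundtrip-sketch.h5");
if (File.Exists(path)) File.Delete(path);

// Define a file with one group, one dataset and one attribute.
var file = new H5File()
{
    ["my-group"] = new H5Group()
    {
        ["numerical-dataset"] = new double[] { 2.0, 3.1, 4.2 },
        Attributes = new()
        {
            ["numerical-attribute"] = new double[] { 2.0, 3.1, 4.2 }
        }
    }
};
file.Write(path);

// Open the file again and read both values back.
var readFile = H5File.OpenRead(path);
var datasetValues = readFile
    .Dataset("/my-group/numerical-dataset")
    .Read<double[]>();
var attributeValues = readFile
    .Group("/my-group")
    .Attribute("numerical-attribute")
    .Read<double[]>();

Console.WriteLine(string.Join(", ", datasetValues));
Console.WriteLine(string.Join(", ", attributeValues));
```

A round trip like this is also a convenient smoke test when trying out other data types or filters.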

Supported filters overview

The first group of filters is built into PureHDF.

Filter                 Compress  Decompress  Notes
Shuffle                ✓         ✓           hardware-accelerated
Fletcher-32            ✓         ✓
N-Bit                  -         -
Scale-Offset           -         ✓
Deflate                ✓         ✓           based on ZLibStream
C-Blosc2               ✓         ✓           native, hardware-accelerated
BZip2 (SharpZipLib)    ✓         ✓
Deflate (ISA-L)        ✓         ✓           native, hardware-accelerated
Deflate (SharpZipLib)  ✓         ✓
LZF                    ✓         ✓

I hope the .NET developers on this forum will find this library useful. If you have any questions, problems or suggestions, feel free to add them under this post or open a new issue in the repository.

Apollo3zehn
