Hello everyone! I am happy to announce that PureHDF has matured into a nearly feature-complete HDF5 reading and (limited) writing library for .NET applications. It comes without any native dependencies and is available on NuGet as a beta release.
PureHDF was developed in response to the common pitfalls that native cross-platform development and library distribution present for library developers and users in the .NET world. The library runs on any platform where .NET Standard 2.0 or higher is available (Linux, macOS, Windows) and on nearly any modern hardware architecture.
The main features of PureHDF are shown in the table below.
| Reading | Writing | Feature |
|---|---|---|
| ✓ | ✓ | generic API |
| ✓ | ✓ | easy filter access |
| ✓ | ✓ | hardware-accelerated filters |
| ✓ | ✓ | data slicing |
| ✓ | ✓ | multidimensional arrays |
| ✓ | ✓ | compound data |
| ✓ | ✓ | variable-length data |
| ✓ | - | multithreading (1) |
| ✓ | - | Amazon S3 access |
| ✓ | - | HSDS access (2) |
(1): works in principle, but small issues remain, which will be solved in the final release (v1.0.0)
(2): basic functionality is implemented
Please also see the GitHub project page and the documentation for the reading and writing API.
Note: The work for this beta release focused on stabilizing the API surface. The work for the final release (v1.0.0) will focus on performance improvements and bug fixes.
Quick start
Below are some excerpts from the documentation linked above to give you a quick overview.
Reading
The following code snippets show how to work with the reading API. The first step is to open the file to read from:
```csharp
using PureHDF;

var file = H5File.OpenRead("path/to/file.h5");
```
The method `H5File.OpenRead` returns an instance of type `NativeFile`, which represents the root group `/`. From there you can access any object within the file:
```csharp
var group = file.Group("/path/to/my/group");
var dataset = file.Dataset("/path/to/my/dataset");
var commitedDataType = file.CommitedDatatype("/path/to/my/datatype");
var unknownObject = file.Get("/path/to/my/unknown/object");
```
HDF5 objects can have zero or more attributes attached. These can either be accessed by enumerating all attributes (`myObject.Attributes()`) or by direct access (`myObject.Attribute("attribute-name")`).
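For example, both access styles side by side (using the `dataset` object from above; the attribute name is purely illustrative):

```csharp
// enumerate all attributes attached to the dataset
foreach (var attribute in dataset.Attributes())
{
    // ... inspect or read each attribute here
}

// or access a single attribute directly by its name
var myAttribute = dataset.Attribute("my-attribute");
```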
When you have a dataset or attribute available, you can read its data by providing a compatible generic type as shown below.
```csharp
var intScalar = dataset.Read<int>();
var doubleArray = dataset.Read<double[]>();
var double2DArray = dataset.Read<double[,]>();
var double3DArray = dataset.Read<double[,,]>();
var floatJaggedArray = dataset.Read<float[][]>();
```
> [!NOTE]
> An overview of compatible return types can be found in the Simple Data and Complex Data sections of the documentation.
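To give an idea of the complex-data case, here is a hedged sketch of how a compound dataset might be read into a custom type. The `Measurement` class is hypothetical, and the assumption that compound member names map to matching property names is mine; the Complex Data documentation describes the exact mapping rules.

```csharp
// hypothetical compound type; the property names are assumed to match the
// member names of the HDF5 compound datatype (see the Complex Data docs)
class Measurement
{
    public double Time { get; set; }
    public double Value { get; set; }
}

// read the compound dataset as an array of Measurement instances
var measurements = dataset.Read<Measurement[]>();
```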
Writing
PureHDF can easily create new files, as described in more detail below. However, editing existing files is outside the scope of PureHDF.
To get started, first create a new `H5File` instance:

```csharp
var file = new H5File();
```
An `H5File` derives from the `H5Group` type because it represents the root group. `H5Group` implements the `IDictionary` interface, where the keys represent the links in an HDF5 file and the value determines the type of the link: either another `H5Group` or an `H5Dataset`.
You can create an empty group like this:

```csharp
var group = new H5Group();
```
If the group should contain some datasets, add them using the dictionary collection initializer, just like with a normal dictionary:

```csharp
var group = new H5Group()
{
    ["numerical-dataset"] = new double[] { 2.0, 3.1, 4.2 },
    ["string-dataset"] = new string[] { "One", "Two", "Three" }
};
```
Datasets and attributes can both be created either by instantiating their specific class (`H5Dataset`, `H5Attribute`) or by simply providing some kind of data. This data can be nearly anything: arrays, scalars, numerical values, strings, anonymous types, enums, complex objects, structs, bool values, etc. However, whenever you want to provide more details, such as the dimensionality of the attribute or dataset, the chunk layout, or the filters to be applied to a dataset, you need to instantiate the appropriate class, as sketched below.
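A minimal, hedged sketch of that idea: the `H5Dataset` class is instantiated explicitly to control the chunk layout. The `chunks` parameter name and its element type are assumptions on my part; please check the writing documentation for the exact constructor signature.

```csharp
// hedged sketch: instantiate H5Dataset explicitly to control the chunk layout
// (the chunks parameter name is an assumption; see the writing documentation)
var data = new double[100, 100];

var group = new H5Group()
{
    ["chunked-dataset"] = new H5Dataset(data, chunks: new uint[] { 10, 10 })
};
```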
Now, let's see how to add attributes. Attributes cannot be added directly using the dictionary collection initializer because that is reserved for datasets. However, every `H5Group` has an `Attributes` property which accepts our attributes:
```csharp
var group = new H5Group()
{
    Attributes = new()
    {
        ["numerical-attribute"] = new double[] { 2.0, 3.1, 4.2 },
        ["string-attribute"] = new string[] { "One", "Two", "Three" }
    }
};
```
The full example with the root group, a subgroup, two datasets and two attributes looks like this:
```csharp
using PureHDF;

var file = new H5File()
{
    ["my-group"] = new H5Group()
    {
        ["numerical-dataset"] = new double[] { 2.0, 3.1, 4.2 },
        ["string-dataset"] = new string[] { "One", "Two", "Three" },

        Attributes = new()
        {
            ["numerical-attribute"] = new double[] { 2.0, 3.1, 4.2 },
            ["string-attribute"] = new string[] { "One", "Two", "Three" }
        }
    }
};
```
The last step is to write the defined file to the drive:

```csharp
file.Write("path/to/file.h5");
```
Supported filters overview
The first group of filters in the table below is built into PureHDF; the remaining filters are available via separate extension packages.
| Filter | Compress | Decompress | Notes |
|---|---|---|---|
| Shuffle | ✓ | ✓ | hardware-accelerated |
| Fletcher-32 | ✓ | ✓ | |
| N-Bit | - | - | |
| Scale-Offset | - | ✓ | |
| Deflate | ✓ | ✓ | based on ZLibStream |
| C-Blosc2 | ✓ | ✓ | native, hardware-accelerated |
| BZip2 (SharpZipLib) | ✓ | ✓ | |
| Deflate (ISA-L) | ✓ | ✓ | native, hardware-accelerated |
| Deflate (SharpZipLib) | ✓ | ✓ | |
| LZF | ✓ | ✓ | |
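As a rough idea of how one of the non-built-in filters might be wired up for reading, here is a hedged sketch. The `H5Filter.Register` call and the `Blosc2Filter` type name are assumptions based on my reading of the filter documentation; please consult the docs for the authoritative registration code.

```csharp
// hedged sketch: register an external filter once at startup so that PureHDF can
// decompress chunks written with it (type and method names are assumptions)
H5Filter.Register(new Blosc2Filter());

// afterwards, reading a Blosc2-compressed dataset works like any other read
var file = H5File.OpenRead("path/to/blosc2-compressed.h5");
var data = file.Dataset("/my-dataset").Read<double[]>();
```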
I hope the .NET developers on this forum will find this library useful. If you have any questions, problems, or suggestions, feel free to add them below this post or open a new issue in the repository.
Apollo3zehn