Defining and retrieving structured data?

I have been using HDF5 on and off and am now returning to it for a computational geometry project.

I am looking for the correct terminology for a feature that I believe exists, but my searches have come up empty.

I’d like to store a structure of information:

{
  face_index (int32)
  centroid (float32[2])
}

I have no problem storing an array of indices and a separate array of positions.

But I recall there being some structure mechanism that lets us retrieve a whole element of a structure at once.

Or maybe my memory is failing me.

Thank you in advance.

Hi @yue.nicholas,

What you are looking for is called a compound data type in HDF5. It is meant to store structured data just like the one you have posted.

As an example, structured data can be stored in C using HDFql (a high-level declarative language that simplifies handling HDF5) as follows:

// declare structure
struct my_data
{
   int face_index;
   float centroid[2];
};

// declare variables
struct my_data data;
char script[1024];

// populate variable 'data' with some dummy values
data.face_index = 10;
data.centroid[0] = 15.2;
data.centroid[1] = 17.4;

// create an HDF5 file named 'test.h5' and use (i.e. open) it
hdfql_execute("CREATE AND USE FILE test.h5");

// register variable 'data'
hdfql_variable_register(&data);

// prepare script to create a compound dataset named 'dset' which stores the values of variable 'data'
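// (offsetof yields a size_t, so it is formatted with %zu; snprintf bounds the write to the size of 'script')
snprintf(script, sizeof script, "CREATE DATASET dset AS COMPOUND(face_index AS INT OFFSET %zu, centroid AS FLOAT(2) OFFSET %zu) VALUES FROM MEMORY 0", offsetof(struct my_data, face_index), offsetof(struct my_data, centroid));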

// execute script
hdfql_execute(script);

Besides C, HDFql supports C++, C#, Python, Java, Fortran and R.

Great example, Mr. HDFql! Can you attach the h5dump output for everyone’s benefit?

The elements of the dataset created are of an HDF5 compound datatype. This is just another way of saying that you are dealing with records. All records of a given type have a set of named user-visible fields. With HDF5, you can read or write subsets of fields (partial records) over the entire dataset or just parts of it.
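To make the partial-record idea concrete, here is a minimal sketch in Python with h5py (not the HDFql route used above). The file name `faces.h5`, the dataset name `faces` and the sample values are made up for illustration; the record layout is the one from the original question.

```python
import os
import tempfile

import h5py
import numpy as np

# Record layout from the question: an int32 index plus a 2-element float32 centroid.
record_dtype = np.dtype([("face_index", np.int32), ("centroid", np.float32, (2,))])

# Four made-up records, purely for illustration.
records = np.zeros(4, dtype=record_dtype)
records["face_index"] = [0, 1, 2, 3]
records["centroid"] = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [3.0, 1.5]]

# Hypothetical file and dataset names, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "faces.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("faces", data=records)

with h5py.File(path, "r") as f:
    dset = f["faces"]
    all_indices = dset["face_index"]        # one field, every record
    some_centroids = dset[1:3, "centroid"]  # one field, records 1 and 2 only
    one_record = dset[2]                    # every field of a single record
```

Passing a field name to the dataset asks h5py to read only that field from the file, which is the "subset of fields" case; combining it with a slice restricts the read to part of the dataset as well.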

G.

Sure! Here is the output of running h5dump against file test.h5 (which was generated by the C code snippet above):

HDF5 "test.h5" {
GROUP "/" {
   DATASET "dset" {
      DATATYPE  H5T_COMPOUND {
         H5T_STD_I32LE "face_index";
         H5T_ARRAY { [2] H5T_IEEE_F32LE } "centroid";
      }
      DATASPACE  SCALAR
      DATA {
      (0): {
            10,
            [ 15.2, 17.4 ]
         }
      }
   }
}
}

A compound data type allows you to define a structure with multiple fields, and you can then create an array of this compound data type. Here’s a basic example in Python using the h5py library:

import h5py
import numpy as np

# Sample data
data = [
    {"face_index": 1, "centroid": [0.5, 0.5]},
    {"face_index": 2, "centroid": [1.0, 1.0]},
    # Add more data as needed
]

# Define the compound data type
dtype = np.dtype([
    ("face_index", np.int32),
    ("centroid", np.float32, (2,)),
])

# Convert the data to a structured NumPy array
structured_data = np.array(
    [(entry["face_index"], entry["centroid"]) for entry in data],
    dtype=dtype,
)

# Create an HDF5 file and store the structured data
with h5py.File("your_file.h5", "w") as file:
    file.create_dataset("your_dataset", data=structured_data)

In this example, dtype defines the compound data type, and structured_data is a NumPy array with this data type. This array can then be stored in an HDF5 dataset. You can modify the example according to your specific needs and data.

Later, when you read the data back from HDF5, you can access individual fields as you would with a structured NumPy array.
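As a rough sketch of that read-back step, the snippet below writes two sample records and loads them again. It is self-contained, so it repeats the write; the temporary directory is an assumption made only so the example can run anywhere, and the file and dataset names simply mirror the ones used above.

```python
import os
import tempfile

import h5py
import numpy as np

dtype = np.dtype([("face_index", np.int32), ("centroid", np.float32, (2,))])
faces = np.array([(1, [0.5, 0.5]), (2, [1.0, 1.0])], dtype=dtype)

# Write the records to a file in a temporary directory (location chosen for illustration).
path = os.path.join(tempfile.mkdtemp(), "your_file.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("your_dataset", data=faces)

# Read the whole dataset back as a structured NumPy array.
with h5py.File(path, "r") as f:
    loaded = f["your_dataset"][:]

first = loaded[0]                 # one whole record (index and centroid together)
centroids = loaded["centroid"]    # one field across all records, shape (2, 2)
print(first["face_index"], first["centroid"])
print(centroids.mean(axis=0))
```

Indexing the loaded array with an integer gives a whole record, which is the "retrieve a whole element of a structure" behaviour asked about at the top of the thread, while indexing with a field name gives that field for every record.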