Dataset Insert Vector of Compound Type


#1

Using the C HDF5 library, I’m trying to insert a prepopulated vector into my HDF5 file. I would like to avoid recopying the data and just add the vector I already have. Is there any way I can get H5Tarray_create to work in my case? I want to use H5Tarray because it’s easier to debug in HDFView (and looks better). H5Tvlen_create does work, but doesn’t give the desired effect in HDFView.

struct Key {
	float code, value;
};

struct Instruction {
	float timestamp;
	Key *keys;
};

struct v_Instruction {
	float timestamp;
	hvl_t keys;
};

//---

std::vector<Key> keys(2);
keys[0].code = 1;
keys[0].value = 123;
keys[1].code = 2;
keys[1].value = 234;

const hsize_t arr_dims[] = {keys.size()};

hid_t compoundKey = H5Tcreate(H5T_COMPOUND, sizeof(Key));
H5Tinsert(compoundKey, "Code", HOFFSET(Key, code), H5T_NATIVE_FLOAT);
H5Tinsert(compoundKey, "Value", HOFFSET(Key, value), H5T_NATIVE_FLOAT);

//
// Wanted: Array Method
//
hid_t key_array = H5Tarray_create(compoundKey, 1, arr_dims);
size_t s_Instruction = sizeof(Instruction::timestamp) + sizeof(Key) * keys.size();
hid_t compoundInstr = H5Tcreate(H5T_COMPOUND, s_Instruction);
H5Tinsert(compoundInstr, "timestamp", 0, H5T_NATIVE_FLOAT);
H5Tinsert(compoundInstr, "Keys", sizeof(Instruction::timestamp), key_array);

//Create Dataset
//Write Data
Instruction test;
test.timestamp = 1;
test.keys = &keys.front();
H5Dwrite(dset, compoundInstr, H5S_ALL, H5S_ALL, H5P_DEFAULT, &test); // Doesn't set data correctly

//
// V Len Method
//
hid_t vLen_key = H5Tvlen_create(compoundKey);
compoundInstr = H5Tcreate(H5T_COMPOUND, sizeof(v_Instruction));
H5Tinsert(compoundInstr, "timestamp", HOFFSET(v_Instruction, timestamp), H5T_NATIVE_FLOAT);
H5Tinsert(compoundInstr, "Keys", HOFFSET(v_Instruction, keys), vLen_key);

//Create Dataset
//Write Data
v_Instruction v_test;
v_test.timestamp = 1;
v_test.keys.len = keys.size();
v_test.keys.p = &keys.front();
H5Dwrite(dset, compoundInstr, H5S_ALL, H5S_ALL, H5P_DEFAULT, &v_test);

#2

H5CPP implements read|write with zero copy for non-strings and comes with an LLVM-based introspection tool. All descriptors (except the h5::pt_t<T> packet table) are binary compatible with the HDF5 CAPI, allowing you to interchange C and C++ calls within C++ code.
The following excerpt is from the GitHub examples. Presentation slides are here.

#include "struct.h"
#include <h5cpp/core>
	#include "generated.h"
#include <h5cpp/io>

int main(){
  h5::fd_t fd = h5::create("example.h5",H5F_ACC_TRUNC);
	
  std::vector<sn::example::Record> vec = h5::utils::get_test_data<sn::example::Record>(20);
  // this part is zero copy, reads data size out from `vec`
  h5::write(fd, "some dataset, somwhere", vec, h5::chunk{20} | h5::gzip{9});
  // more efficient way to do it is to pre-create dataset:  
  auto ds2 = h5::create<sn::example::Record>(fd, "dataset", h5::current_dims{20});
  // you always can pass typed pointers as well, just specify the size:
  h5::write(ds2, vec.data(), h5::count{20}); 
  // <- this is not recommended with vectors; there is no performance difference between the two syntaxes.
  // Instead, use it for typed pointers or for non-STL-like containers.
}
// RAII will close all resources when leaving code block

The HDF5 type descriptors are generated with the h5cpp introspection tool.
best: steve


#3

We are unable to change from the HDF5 C Library, so while this could be a solution it isn’t a solution for me.

My question is how to get H5Tarray_create to work without using Key keys[#]. I tried (and failed) using std::array, which on the backend uses C-style arrays. The problem with [] is that I would have to loop over it to assign all the values from my std::vector.

For my own understanding, I would like to know why my approach won’t work, or whether there is a way to do this.


#4

From your response it appears you don’t understand what binary compatibility means, so let me elaborate:

  • you can interact with your CAPI calls:
    • pass an HDF5 CAPI identifier to H5CPP: they are the same after the template processor is done
    • pass an H5CPP identifier to HDF5 CAPI calls: they are the same after the template processor is done
  • identity is guaranteed between C code compiled with a C++17 compiler and H5CPP templates
  • AFAIK there is no other HDF5 CAPI

This template meta-programming library was crafted for the very purpose of NOT REPLACING the HDF5 CAPI, but of enhancing it with features that the C language can’t provide and that exist in C++17.


#5

Nathan, the fixed-size array version (H5T_ARRAY) works only if all instructions have
the same key count.

struct Instruction {
	float timestamp;
	Key keys[MAX_KEYS];
};

Is there such a maximum key count and how many key counts do
you have on average? You could just fill in default keys and use compression
to trim the overhead, at least in the file.

G.
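
As a rough sketch of the padding idea above (not code from this thread): assuming some upper bound on the key count exists, a helper could fill a fixed-size record from the vector and leave the unused slots at a default value. The value 4 for MAX_KEYS, the FixedInstruction and make_fixed names, and the zero default are all assumptions made for illustration. Note that this still copies the keys once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Key {
    float code, value;
};

// Assumption: an upper bound on keys per instruction. The thread gives no number.
constexpr std::size_t MAX_KEYS = 4;

// Fixed-size record: timestamp and all key slots live in one contiguous block.
struct FixedInstruction {
    float timestamp;
    Key keys[MAX_KEYS];
};

// Hypothetical helper: copies the keys into the fixed slots and leaves the
// remaining slots zeroed (the "default keys").
FixedInstruction make_fixed(float timestamp, const std::vector<Key>& keys) {
    assert(keys.size() <= MAX_KEYS);  // more keys than slots cannot be stored
    FixedInstruction out{};           // value-initialization zeroes every member
    out.timestamp = timestamp;
    std::copy(keys.begin(), keys.end(), out.keys);
    return out;
}
```

A record built this way is a single contiguous block, so a memory type created with H5Tarray_create(compoundKey, 1, dims) with dims = {MAX_KEYS} and inserted at HOFFSET(FixedInstruction, keys) should describe it. The zero padding is the part that compression would be expected to shrink in the file.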


#6

When I pass my std::vector of my structure for recording it is always the same size.

Key keys[MAX_KEYS] does work, but I have to copy each of my structures into the C-style array. I was hoping for a way to avoid copying the structures. When I tried Key &keys[MAX_KEYS] (to get a reference so I didn’t have to copy), it crashed.


#7

The storage layouts are fundamentally different:

struct Instruction {
	float timestamp;
	Key keys[MAX_KEYS];
};

is contiguous in memory.

struct Instruction {
	float timestamp;
	Key* keys;
};

is not, even if the keys arrays have all the same size.

Using v_Instruction and initializing the pointers in hvl_t is perhaps your best option.

G.
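
To make the layout point concrete, here is a minimal sketch (not from any poster's code). With the pointer version, the bytes at the struct's address are a float followed by an address, so a write that reads sizeof(float) + n * sizeof(Key) bytes from there never sees the keys. The hypothetical helper pack_instruction below builds the contiguous layout that the array-based compound type in the first post describes (timestamp at offset 0, keys at offset sizeof(float)). The helper name and the byte-buffer approach are assumptions, and it does copy the data once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct Key {
    float code, value;
};

// Pointer version from the thread: its size does not depend on the key count,
// because it stores only an address, not the keys themselves.
struct PtrInstruction {
    float timestamp;
    Key* keys;
};

// Hypothetical helper: packs the timestamp and the keys into one contiguous
// byte buffer laid out as { float at offset 0, Key[n] at offset sizeof(float) }.
std::vector<unsigned char> pack_instruction(float timestamp,
                                            const std::vector<Key>& keys) {
    assert(!keys.empty());  // an HDF5 array type needs at least one element
    std::vector<unsigned char> buf(sizeof(float) + keys.size() * sizeof(Key));
    std::memcpy(buf.data(), &timestamp, sizeof(float));
    std::memcpy(buf.data() + sizeof(float), keys.data(),
                keys.size() * sizeof(Key));
    return buf;
}
```

Under these assumptions, buf.data() rather than the address of a PtrInstruction would be the buffer handed to H5Dwrite with the array-based compound type. If the single copy is not acceptable, the hvl_t route remains the way to point at the existing vector storage.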