Simple Linked List in HDF5


#1

Suppose we have the following struct:
typedef struct node {
    int data;
    struct node* next;
} node;

I want to find the equivalent compound type in HDF5 (C++). I was thinking of using H5Tvlen_create, but that seems difficult in this situation, since node* is not an HDF5 datatype. Any ideas would be much appreciated.


#2

I arrived late to the party; I only just noticed this on the C++ list. For good results, I would not suggest encoding node* next in the above structure. Instead, save the data in order as a vector, then recreate the list from the homogeneous dataset.
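To make the round trip concrete, here is a minimal sketch in plain C++ with no HDF5 calls at all. It only shows the two conversions: list to vector before writing, and vector to list after reading. The names `node_t`, `flatten` and `rebuild` are illustrative assumptions, and the in-memory node uses an owning `std::unique_ptr` rather than the raw pointer from the question.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// In-memory list node (assumption: owning pointer instead of a raw node*).
struct node {
    int data;
    std::unique_ptr<node> next;
};

// Record that would go into the dataset: payload only, no pointer.
struct node_t {
    int data;
};

// Walk the list from head to tail and copy the payloads in order.
std::vector<node_t> flatten(const node* head) {
    std::vector<node_t> out;
    for (const node* p = head; p != nullptr; p = p->next.get())
        out.push_back(node_t{p->data});
    return out;
}

// Rebuild the list back to front, so that each new node can take
// ownership of the part of the list that was already rebuilt.
std::unique_ptr<node> rebuild(const std::vector<node_t>& records) {
    std::unique_ptr<node> head;
    for (auto it = records.rbegin(); it != records.rend(); ++it) {
        auto n = std::make_unique<node>();
        n->data = it->data;
        n->next = std::move(head);
        head = std::move(n);
    }
    return head;
}
```

The position of a record in the vector carries the ordering, so the `next` pointer never has to be stored.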

Here is the code with h5cpp:

#include <list>
#include "struct.h" // arbitrarily complex POD structure
#include <h5cpp/core>
#include "generated.h"
// the compound type descriptor must be sandwiched between <h5cpp/core> and <h5cpp/io>
#include <h5cpp/io>


int main(){
   // create HDF5 container with truncate
   auto fd = h5::create("example.h5",H5F_ACC_TRUNC);
   // creating a dataset of `my::node_t` requires the shim code in `generated.h`, see below
   // the returned dataset handle `h5::ds_t` is converted to a packet table handle `h5::pt_t`
   h5::pt_t pt = h5::create<my::node_t>(fd, "stream of struct",
      h5::max_dims{H5S_UNLIMITED}, h5::chunk{1024} | h5::gzip{9} );
   // build your linked list
   std::list<my::node_t> linked_list = { ... };
   // iterate through the collection and persist the nodes in order
   for( const auto& node : linked_list )
      h5::append(pt, node); // buffers nodes up to the chunk size;
      // once the buffer is full, it is written to the dataset
} // all resources are RAII enabled, properly closed when leaving code block

Your node in struct.h is:

#ifndef  MY_STRUCT_79843 // include guard
#define MY_STRUCT_79843
namespace my {
	struct node_t {
		int data;
	};
}
#endif

The generated.h may be handwritten as follows, or machine generated with the h5cpp compiler-assisted reflection:

#ifndef H5CPP_GUARD_ErRrk
#define H5CPP_GUARD_ErRrk

namespace h5{
    template<> hid_t inline register_struct<my::node_t>(){

        hid_t ct_00 = H5Tcreate(H5T_COMPOUND, sizeof (my::node_t));
        H5Tinsert(ct_00, "data", HOFFSET(my::node_t,data), H5T_NATIVE_INT);
        return ct_00;
    }
}
H5CPP_REGISTER_STRUCT(my::node_t);
#endif

best wishes: steven


#3

As have I. That said, I have some code that demonstrates just this sort of thing here. This example is a bit more complicated because it involves 3 such types instead of just one. It also uses HDF5’s type conversion mechanisms to handle converting memory POINTER types to dataset OFFSET values. And, there is no read-back example yet. But, it might be helpful to you in stimulating thought.
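To give a rough idea of the pointer-to-offset step in isolation, here is a minimal sketch in plain C++. It is not the code referred to above and does not use HDF5's type conversion machinery; it does the translation by hand for a single node type. The names `node_rec`, `to_offsets` and `from_offsets` are assumptions for illustration, and -1 is an arbitrary choice for encoding a null pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// In-memory node, as in the original question.
struct node {
    int data;
    node* next;
};

// Storable form: the pointer is replaced by the row index of the target
// node in the same table; -1 stands for a null pointer.
struct node_rec {
    int data;
    std::int64_t next;
};

// Translate pointers to row offsets. `nodes` lists every node once, in the
// order in which the rows will be stored. Throws std::out_of_range if a
// `next` pointer refers to a node that is not in the table.
std::vector<node_rec> to_offsets(const std::vector<node*>& nodes) {
    std::unordered_map<const node*, std::int64_t> row;
    for (std::size_t i = 0; i < nodes.size(); ++i)
        row[nodes[i]] = static_cast<std::int64_t>(i);

    std::vector<node_rec> out;
    out.reserve(nodes.size());
    for (const node* n : nodes) {
        std::int64_t next = -1;
        if (n->next != nullptr) next = row.at(n->next);
        out.push_back({n->data, next});
    }
    return out;
}

// Translate row offsets back to pointers. The restored nodes live in
// `nodes`, so the pointers stay valid only while that vector is not resized.
void from_offsets(const std::vector<node_rec>& recs, std::vector<node>& nodes) {
    nodes.assign(recs.size(), node{0, nullptr});
    for (std::size_t i = 0; i < recs.size(); ++i) {
        nodes[i].data = recs[i].data;
        if (recs[i].next >= 0)
            nodes[i].next = &nodes.at(static_cast<std::size_t>(recs[i].next));
    }
}
```

Because the offset is just a row number, the rows do not have to be stored in list order, and a cycle is stored like any other link.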


#4

Yes indeed, it is a thought-provoking approach! If someone stole the idea on a bright morning, I could be a suspect :)

Here is an idea using References:
Picture a graph of arbitrary C++ objects at different memory locations, where the task is to save and restore the state efficiently with respect to time and space. Given how the objects are distributed across types, it seems reasonable to store all objects of the same type/class in a single vector of an HDF5 Compound type, and to represent references/pointers to other objects as HDF5 References.

The above appears to be a sound approach:

  • References to objects of other classes: OK
  • Reference to same class but different object: OK
  • Circular reference to same object: OK
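
The three cases above can be sketched without HDF5 by letting a (table, row) pair play the role of the reference. This is only an in-memory analogy under stated assumptions: `ref`, `table_id`, `sensor_rec`, `station_rec` and `store` are hypothetical names, each vector stands in for one dataset of a compound type, and a real implementation would use HDF5 References instead of the hand-rolled `ref`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a reference: which table, and which row in it.
enum class table_id : std::uint8_t { none, sensors, stations };

struct ref {
    table_id table;
    std::uint64_t row;
};

// Two unrelated record types, each stored in its own homogeneous table.
struct sensor_rec {
    int id;
    ref station; // reference to an object of another class
    ref peer;    // reference to the same class (possibly the same object)
};

struct station_rec {
    int id;
    ref primary; // reference back into the sensor table
};

struct store {
    std::vector<sensor_rec>  sensors;
    std::vector<station_rec> stations;
};

// Resolve a reference; returns nullptr for a null, mistyped or
// out-of-range reference.
const sensor_rec* as_sensor(const store& s, ref r) {
    if (r.table != table_id::sensors || r.row >= s.sensors.size()) return nullptr;
    return &s.sensors[r.row];
}

const station_rec* as_station(const store& s, ref r) {
    if (r.table != table_id::stations || r.row >= s.stations.size()) return nullptr;
    return &s.stations[r.row];
}
```

Since a reference is resolved by lookup rather than by following a raw address, a record that refers to itself needs no special handling when it is written or restored.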

Saving a graph of objects into HDF5 is more interesting if it is designed so that the objects can be restored on other systems. I am winking at you, Python, Julia, Matlab, R and friends with object support.

It also appears that some mappings are cumbersome:
Should C++ std::array<T,N> map to an HDF5 Array or to a hyperslab? The C++ implementation is a template, so N has to be known at compile time; on the other hand, an HDF5 Array doesn't allow partial IO, which leads to a mismatch.

C++ enum types are even stricter: the enumerator values must be compile-time constants (constexpr).

Then again, this thread is about linked lists, and I should probably start a new one: RFC object level interop between popular HPC programming environments.