Creating an HDF5 with dynamic Members

I am trying to create an HDF5 file that is extendable, has attributes, and
is compressed. Along with this, my dataset uses a compound datatype with
several members. All of this is simple to create when the members are
static:

    #include <iostream>
    #include <string>
    #include <vector>

    #include <H5Cpp.h>

    using namespace H5;
    using namespace std;

    const double NULL_VALUE = -999.99;
    const int ATTR_DIM = 1;
    const int ATTR_RANK = 1;
    const int CHUNK_RANK = 1;
    // The dataset is 1-D, so the chunk shape needs exactly one extent.
    const hsize_t CHUNK_DIM[CHUNK_RANK] = {365};
    const int STRTYPE_LENGTH = 20;
    const int COMPRESS_RATE = 6;
    const H5std_string STATIONID("stationID");
    const H5std_string STARTUP("startup");
    const H5std_string SAMPLERATE("samplerate");

    int main(void)
    {
      string filename = "NEALL01_60.h5";
      vector<string> sensors;
      string datasetName = "60";
      const hsize_t dimSize = 1000;
      string stationID = "NEALL01";
      string startup = "1988-05-10 05:00:00";
      int samplerate = 60;

      // Attributes are written as fixed-length strings.
      hsize_t attrdim[] = {ATTR_DIM};
      DataSpace attrspace(ATTR_RANK, attrdim);
      StrType strdatatype(H5::PredType::C_S1, STRTYPE_LENGTH);
      const H5std_string stationIDBuff(stationID);
      const H5std_string startupBuff(startup);
      const H5std_string samplerateBuff(std::to_string(samplerate));

      // One maximum extent per dimension; rank is 1 here.
      const int rank = 1;
      hsize_t maxdim[] = {H5S_UNLIMITED};

      sensors.push_back("ml2_10cm");
      sensors.push_back("ml2_20cm");

      // The record layout is fixed at compile time: this is the limitation.
      typedef struct s1_t {
        double a;
        double b;
      } s1_t;

      // std::vector instead of a variable-length array (not standard C++).
      vector<s1_t> s1(dimSize);
      for (auto &rec : s1)
      {
        rec.a = NULL_VALUE;
        rec.b = NULL_VALUE;
      }

      hsize_t dim[] = {dimSize};
      DataSpace space(rank, dim, maxdim);
      H5File *file = new H5File(filename, H5F_ACC_TRUNC);

      // One compound member per sensor, laid out back to back as doubles.
      CompType mtype1(sizeof(s1_t));
      int count = 0;
      for (auto a = sensors.cbegin(); a != sensors.cend(); ++a)
      {
        mtype1.insertMember(*a, sizeof(double) * count, PredType::NATIVE_DOUBLE);
        count++;
      }

      // Chunking is required for both extendable and compressed datasets.
      DSetCreatPropList creatplist;
      creatplist.setChunk(CHUNK_RANK, CHUNK_DIM);
      creatplist.setDeflate(COMPRESS_RATE);

      DataSet *dataset;
      dataset = new DataSet(file->createDataSet(datasetName, mtype1, space, creatplist));
      dataset->write(s1.data(), mtype1);

      Attribute myatt1 = dataset->createAttribute(STATIONID, strdatatype, attrspace);
      Attribute myatt2 = dataset->createAttribute(STARTUP, strdatatype, attrspace);
      Attribute myatt3 = dataset->createAttribute(SAMPLERATE, strdatatype, attrspace);
      myatt1.write(strdatatype, stationIDBuff);
      myatt2.write(strdatatype, startupBuff);
      myatt3.write(strdatatype, samplerateBuff);

      delete dataset;
      delete file;
      return 0;
    }

But I don't know the number of members to be added to the dataset until
run time. In this case a struct will not do, since members can't be added
to it. What's left are C++ containers such as vector or map. My question
is: how do I implement the above code using a vector or map?

Hmm. It kind of feels like you are asking two different but related questions.

First, how do you deal with a compound-typed dataset in which that type may change with time (e.g. members are added -- or maybe even deleted).

Second, how might one do something similar with an STL container (e.g. like a map).

On the first question: in your example, the s1_t struct members are *both* the same type. Is that true in general? I mean, if you add more members, are those new members *always* going to be the same type as the members already there (double in your example), or can you easily make them be (e.g. ints fit into doubles just fine even though you wind up wasting 4 bytes for each int)? If so, here's how that might look. . .

You create an HDF5 dataset of type double with 2 dimensions, both UNLIMITED. Conceptually, think of this dataset as "s1_t[MxT]" where 'M=members' and 'T=time'. Now, suppose you have s1_t data that varies in time like so. . .

5 values of s1_t data of type struct {double a; double b;}
7 values of s1_t data of type struct {double a; double b; double i; double x;}
1 value of s1_t data of type struct {double b; double i; double z;}

Here, 'a', 'b', 'i', 'x' and 'z' are semantically totally different (double) objects you want to store. And 'i' is really supposed to be an int.

Write the first 5 values as a 2x5 space of doubles to the dataset. After completion, your dataset will look like

a a a a a. . .
b b b b b. . .
.
.
.

where the '. . .' are meant to indicate that it can be extended in either of these directions.

Write the next 7 values as a 4x7 space of doubles to the dataset, resulting in. . .

a a a a a a a a a a a a. . .
b b b b b b b b b b b b. . .
. i i i i i i i. . .
. x x x x x x x. . .
.

Write the next 1 value as a 3x1 space of doubles, resulting in. . .

a a a a a a a a a a a a b
b b b b b b b b b b b b i
. i i i i i i i z
. x x x x x x x
.
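To make the bookkeeping for those three writes concrete, here is a minimal sketch in plain C++ (no HDF5 calls) of how the dataset extent grows. The names `Extent` and `extentAfterWrite` are made up for illustration, and the sketch assumes each block is written starting at row 0, as in the diagrams. In the HDF5 C++ API the resulting extent would be what you hand to `DataSet::extend` before writing the new block.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Extent of the 2-D dataset as {members (rows), time steps (columns)}.
using Extent = std::array<std::size_t, 2>;

// Smallest extent that holds the current data plus a block of
// blockRows x blockCols values whose first column is colOffset.
// Assumption: blocks always start at row 0.
Extent extentAfterWrite(const Extent &current, std::size_t blockRows,
                        std::size_t blockCols, std::size_t colOffset)
{
  assert(blockRows > 0 && blockCols > 0);
  return Extent{{std::max(current[0], blockRows),
                 std::max(current[1], colOffset + blockCols)}};
}
```

Starting from an empty dataset, the 2x5 block at column 0 gives an extent of 2x5, the 4x7 block at column 5 gives 4x12, and the 3x1 block at column 12 gives 4x13, which matches the last diagram.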

Ok, that takes care of actually writing it. But, what about decoding (reading) it later? In particular, what about preserving the member names so that you could faithfully present those back to a user at some point?

Well, you need to use another dataset to capture that information. Something simple you could use is an UNLIMITED 1-D character dataset that looks like. . .

"<{double a; double b;},0,5><{double a; double b; int i; double x},5,7><{double b; int i; double z;},12,1>"

You can use '<>' chars to parse this array of chars and conclude that the s1_t dataset has 5 entries of type {double a; double b;} starting at index 0, 7 entries of type {double a; double b; int i; double x} starting at index 5 and 1 entry of type {double b; int i; double z;} starting at index 12. Note also that in this character dataset, we also capture that 'i' is really intended to be treated as an int.
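As a rough sketch of that parsing step (plain C++, reading the string from HDF5 is left out), something like the following would split the descriptor into its entries. `Segment` and `parseLayout` are hypothetical names, and the code assumes the declaration part never contains a comma, which holds for the example string above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One "<{decl},start,count>" entry of the descriptor string.
struct Segment {
  std::string decl;   // e.g. "{double a; double b;}"
  std::size_t start;  // index of the first record with this layout
  std::size_t count;  // number of consecutive records with this layout
};

// Parse a descriptor such as "<{double a;},0,5><{double b;},5,2>".
// A structurally malformed entry stops the parse and the entries read so
// far are returned; a non-numeric start or count makes std::stoul throw.
std::vector<Segment> parseLayout(const std::string &text)
{
  std::vector<Segment> segments;
  std::size_t pos = 0;
  while ((pos = text.find('<', pos)) != std::string::npos) {
    const std::size_t end = text.find('>', pos);
    if (end == std::string::npos) break;
    const std::string body = text.substr(pos + 1, end - pos - 1);
    // The last two commas separate the declaration, start and count.
    const std::size_t c2 = body.rfind(',');
    if (c2 == std::string::npos || c2 == 0) break;
    const std::size_t c1 = body.rfind(',', c2 - 1);
    if (c1 == std::string::npos) break;
    assert(c1 < c2);
    segments.push_back({body.substr(0, c1),
                        std::stoul(body.substr(c1 + 1, c2 - c1 - 1)),
                        std::stoul(body.substr(c2 + 1))});
    pos = end + 1;
  }
  return segments;
}
```

Each `Segment` then tells a reader which columns of the 2-D dataset belong together and which member names (and intended types) to attach to the rows.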

Now, if your member types are going to vary as well, then I think you'd wind up having to create a separate dataset for each of the primitive types your members can have and then marshal values from these structs to the appropriate datasets. Note that long swaths of N/A entries in an HDF5 dataset should *not* result in wasted space because, with chunked storage, HDF5 is smart enough NOT to store any blocks (chunks) that never have values written to them.

Hope that helps.

Mark

···

From: Hdf-forum <hdf-forum-bounces@lists.hdfgroup.org> on behalf of Jamie Ryan Lahowetz <jlahowetz2@unl.edu>
Reply-To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Date: Friday, September 18, 2015 11:29 AM
To: "hdf-forum@lists.hdfgroup.org" <hdf-forum@lists.hdfgroup.org>
Subject: [Hdf-forum] Creating an HDF5 with dynamic Members


I'm not sure I understand your answer entirely. I did find a way to
dynamically write a structure using an array of doubles whose dimensions are
set by a variable, or to just expand a dataset to the size needed.
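For anyone following along, the array-of-doubles approach might look roughly like the sketch below. It is plain C++ with no HDF5 calls, `RecordBuffer` is a made-up name, and it assumes every member is a double. The idea is that `recordSize()` is what you would pass to the `CompType` constructor in place of `sizeof(s1_t)`, `offsetOf(i)` is the offset for `insertMember`, and `data()` is the buffer for `DataSet::write`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Records whose member count is only known at run time, stored back to
// back in one flat buffer of doubles (record-major order).
class RecordBuffer {
public:
  RecordBuffer(std::vector<std::string> names, std::size_t records, double fill)
      : names_(std::move(names)), data_(names_.size() * records, fill) {}

  // Size in bytes of one record; plays the role of sizeof(s1_t).
  std::size_t recordSize() const { return names_.size() * sizeof(double); }

  // Byte offset of member `index` within a record.
  std::size_t offsetOf(std::size_t index) const { return index * sizeof(double); }

  std::size_t records() const
  {
    return names_.empty() ? 0 : data_.size() / names_.size();
  }

  // Value of member `member` in record `record`.
  double &at(std::size_t record, std::size_t member)
  {
    assert(record < records() && member < names_.size());
    return data_[record * names_.size() + member];
  }

  const std::vector<std::string> &names() const { return names_; }
  const double *data() const { return data_.data(); }

private:
  std::vector<std::string> names_;  // declared first: data_ depends on its size
  std::vector<double> data_;
};
```

With the sensor names in a `vector<string>`, the loop that calls `insertMember` stays the same as in the original program; only the struct and its `sizeof` go away.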

I think the real issue I'm having is that each rank needs to be labeled. I
can use the table API, but I'm worried about performance. So two new
questions have come up: can the chunk of data that belongs to a single
comptype member be retrieved on its own? And is there any performance
degradation when using the high-level table API?
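On the first of those questions: as far as I know, HDF5 lets you read a compound dataset with a memory compound type that names only a subset of the file type's members, in which case the library does the gather for you. The sketch below only illustrates the effect of such a read in plain C++ on an already loaded buffer. `extractMember` is a hypothetical helper and it assumes every member is a double packed back to back.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Copy one named member out of records packed back to back as doubles.
// Returns an empty vector if the name is not among the members.
std::vector<double> extractMember(const std::vector<double> &packed,
                                  const std::vector<std::string> &names,
                                  const std::string &wanted)
{
  assert(!names.empty() && packed.size() % names.size() == 0);
  const auto it = std::find(names.begin(), names.end(), wanted);
  if (it == names.end()) return {};
  const std::size_t member = static_cast<std::size_t>(it - names.begin());
  std::vector<double> column;
  column.reserve(packed.size() / names.size());
  // Step through the buffer one record at a time, picking the same slot.
  for (std::size_t i = member; i < packed.size(); i += names.size())
    column.push_back(packed[i]);
  return column;
}
```

Note that with a compound dataset the members of a record are stored together, so the whole chunk is still read and decompressed even if only one member is wanted.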

···

On Fri, Sep 18, 2015, 4:32 PM Miller, Mark C. <miller86@llnl.gov> wrote:

