Reading a compound dataset (rank=1) and accessing a field "as an array"

The HDF5 file I’m working with has a dataspace of rank 1 with DIM0 = 50,000. The datatype is a compound datatype. Let’s say each element has the following fields:

```cpp
typedef struct element {
  long long int SCALAR_FIELD;
  double ARRAY_FIELD[ARRAY_SIZE];
} element;
```

I managed to read the dataset and store the data in an array of structs:

```cpp
// …
element * data = new element[NUMBER_OF_ELEMENTS];
// …
dataset->read(data, e_mtype);
```

So far so good.

Using the Python HDF5 library, I can read the same compound dataset, and the result is a numpy structured array. A numpy structured array lets me read the fields column-wise:

```python
scalar_field_data = data["SCALAR_FIELD"]  # outputs a 1-D array
array_field_data = data["ARRAY_FIELD"]    # outputs a 2-D array (a matrix)
```

How can I achieve the same behaviour in C++? Of course, I can loop through the array of structs, read the field I am interested in from each element, and copy it into a new array, something like:

```cpp
long long * scalar_field_data = new long long[NUMBER_OF_ELEMENTS];
for (int i = 0; i < NUMBER_OF_ELEMENTS; i++) {
    scalar_field_data[i] = data[i].SCALAR_FIELD;
}
```
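If the structs are already in memory, the copy loop can at least be written once and reused for any scalar member. Below is a minimal C++ sketch using a pointer to member; extract_field is a hypothetical helper, and Element with ARRAY_SIZE = 3 is an assumed stand-in for the struct above. It still copies element by element, so it tidies the code rather than removing the loop:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed stand-in for the question's `element` struct (ARRAY_SIZE = 3 here).
constexpr std::size_t ARRAY_SIZE = 3;

struct Element {
    long long SCALAR_FIELD;
    double ARRAY_FIELD[ARRAY_SIZE];
};

// Hypothetical helper: copies one scalar member out of every struct into a
// contiguous std::vector, i.e. "column" access like data["SCALAR_FIELD"] in numpy.
template <typename Struct, typename Field>
std::vector<Field> extract_field(const Struct* data, std::size_t n,
                                 Field Struct::*member) {
    assert(data != nullptr || n == 0);  // precondition: valid input buffer
    std::vector<Field> column;
    column.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        column.push_back(data[i].*member);
    }
    return column;
}
```

Usage: extract_field(data, NUMBER_OF_ELEMENTS, &Element::SCALAR_FIELD) returns a std::vector<long long>, so there is no separate new[]/delete[] to manage.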

Is there a way to avoid this loop and read the “SCALAR_FIELD” (for each dataset element) directly into an array?

In addition, I need to put this data into an xtensor array. I am aware of this library, but it seems it doesn’t support compound datatypes.

Thank you in advance.

Yes, just define a compound in-memory datatype that has just one member, e.g., “SCALAR_FIELD”, or the subset of members that you care about, and call H5Dread. (The same goes for writing.) It’s important that the field names match exactly. If there’s no match, the library won’t do anything. See also the compound.[h,c,cpp] examples in H5CPP.

G.
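To see why a plain array can serve as the read buffer, here is a sketch that assumes SCALAR_FIELD is stored as a native 64-bit integer. A one-member in-memory compound type describes the struct below: its only member sits at offset 0 and there is no padding, so each element H5Dread writes is exactly one long long. ScalarOnly, the k-prefixed constants, buffer, n_elements, and dataset are hypothetical names, and the HDF5 C calls appear only in comments:

```cpp
#include <cstddef>

// Assumption: SCALAR_FIELD is a native 64-bit integer in the file.
// A compound memory type with this single member corresponds to:
struct ScalarOnly {
    long long SCALAR_FIELD;
};

// Size and offset one would hand to the HDF5 C API (comments only):
//   std::vector<long long> buffer(n_elements);
//   hid_t mtype = H5Tcreate(H5T_COMPOUND, kScalarOnlySize);
//   H5Tinsert(mtype, "SCALAR_FIELD", kScalarOffset, H5T_NATIVE_LLONG);
//   H5Dread(dataset, mtype, H5S_ALL, H5S_ALL, H5P_DEFAULT, buffer.data());
//   H5Tclose(mtype);
constexpr std::size_t kScalarOnlySize = sizeof(ScalarOnly);
constexpr std::size_t kScalarOffset = offsetof(ScalarOnly, SCALAR_FIELD);

// One member at offset 0 with no padding: each compound element occupies
// exactly sizeof(long long) bytes, so a plain long long array fits.
static_assert(kScalarOffset == 0, "single member sits at offset 0");
static_assert(kScalarOnlySize == sizeof(long long), "no padding expected");
```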

Thank you very much @gheber. As you suggested, this example shows the right way to do the job.

The next step is to read the double ARRAY_FIELD[ARRAY_SIZE] field and put it directly into a matrix.

I managed to handle the case in which the matrix is statically allocated, with this code:

```c
#include <stdio.h>
#include "hdf5.h"

#define LENGTH   10
#define ARRSIZE  10
/* H5FILE_NAME (the file to read) is assumed to be #defined elsewhere */

typedef struct s1_t {
    int    a;
    float  b;
    double c;
    float  arr[ARRSIZE];  /* I want to read this directly into a matrix */
} s1_t;

void read() {
    float   s3[LENGTH][ARRSIZE];  /* the matrix (statically allocated, contiguous) */
    hid_t   file, dataset, s3_tid, loctype;
    herr_t  status;
    hsize_t adims[1] = { ARRSIZE };

    file    = H5Fopen(H5FILE_NAME, H5F_ACC_RDONLY, H5P_DEFAULT);
    dataset = H5Dopen2(file, "dset", H5P_DEFAULT);

    /* In-memory compound type with a single float[ARRSIZE] member */
    s3_tid  = H5Tcreate(H5T_COMPOUND, sizeof(float) * ARRSIZE);
    loctype = H5Tarray_create(H5T_NATIVE_FLOAT, 1, adims);
    status  = H5Tinsert(s3_tid, "arr_field", 0, loctype);

    /* Members are matched by name: only "arr_field" is read from each element */
    status = H5Dread(dataset, s3_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s3);

    if (status >= 0) {
        for (int i = 0; i < LENGTH; i++) {
            for (int j = 0; j < ARRSIZE; j++)
                printf("%.4f ", s3[i][j]);
            printf("\n");
        }
    }

    H5Tclose(loctype);
    H5Tclose(s3_tid);
    H5Dclose(dataset);
    H5Fclose(file);
}
```

So far so good. The output is fine.

I am trying to perform the same read using a dynamically allocated matrix:

```cpp
hid_t      s3_tid;

float** s3 = new float*[LENGTH];
for (int i = 0; i < LENGTH; ++i)
    s3[i] = new float[ARRSIZE];
```

As before, (1) I need to define the datatype for s3.

```cpp
s3_tid = H5Tcreate(H5T_COMPOUND, sizeof(float)*ARRSIZE);
```

(2) I need to add the member to the compound datatype s3_tid:

```cpp
hsize_t adims[1] = { ARRSIZE };
hid_t loctype = H5Tarray_create(H5T_NATIVE_FLOAT, 1, adims);
status = H5Tinsert(s3_tid, "arr_field", 0, loctype);
```

(3) Finally I can read the data:

```cpp
status = H5Dread(dataset, s3_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s3);
```

In this case, I get a segmentation fault during H5Dread().

Maybe the problem is that the memory is allocated contiguously only along a single dimension of the matrix (LENGTH in this case), and I did not tell the HDF5 library about that…

Which of steps (1), (2), and (3) are incorrect, and how can I fix them?

Thank you in advance.

The problem is that your s3 is a ragged array and (generally) not a contiguous region in memory.

```cpp
float* s3 = new float[LENGTH*ARRSIZE];
```

should fix that. Element (i, j) is then s3[i*ARRSIZE + j].

G.
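A slightly more self-managing variant of the same fix, as a sketch: keep all LENGTH × ARRSIZE floats in one contiguous, row-major block (the order in which H5Dread writes LENGTH elements of a compound type whose only member is float[ARRSIZE]) and map (i, j) to offset i*cols + j. FlatMatrix is a hypothetical helper, not part of HDF5 or H5Cpp:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a rows x cols matrix stored in ONE contiguous,
// row-major block, unlike a float** whose rows are separate allocations.
class FlatMatrix {
public:
    FlatMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), values_(rows * cols) {}

    // Element (i, j) lives at offset i * cols + j.
    float& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return values_[i * cols_ + j];
    }
    const float& operator()(std::size_t i, std::size_t j) const {
        assert(i < rows_ && j < cols_);
        return values_[i * cols_ + j];
    }

    // Pointer to the first value; this is the buffer H5Dread would fill.
    float* data() { return values_.data(); }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<float> values_;  // zero-initialized, freed automatically
};
```

With the question's definitions, one would create FlatMatrix s3(LENGTH, ARRSIZE), pass s3.data() as the last argument of H5Dread, and then read values as s3(i, j). Backing the storage with std::vector also removes the need for a matching delete[].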

Yes! Thank you very much @gheber


Hi gheber, I am facing a similar issue: I am trying to read a vector field from a compound dataset.
Do I need to malloc memory for the vector field inside the compound when using C++?