H5TBget_field_info() Offsets seem wrong

When reading a compound data type, I first call H5TBget_field_info(). This gives me arrays for names, sizes and offsets. Maybe I’m mistaking the meaning of the help text:

char *field_names[]
OUT: An array containing the names of the fields.
size_t *field_sizes
OUT: An array containing the size of the fields.
size_t *field_offsets
OUT: An array containing the offsets of the fields.

I assumed that field_offsets would give the byte offset of each field within a record of the compound data table. This doesn’t always seem to be the case. For example, I have a compound with the four fields char[16], uint8_t, double, and double. Below are the sizes and offsets returned:

Size   Offset
16     0
1      16
8      24
8      32

I would be okay with this, except that the total size of the chunked data is 36 bytes. That tells me the offsets reported for the two doubles are off by 4, and the values I read from those locations come out wrong. The data sets that I’m reading are unknown to me (see previous posts “Reading data from an unknown source” and …“Take 2”).
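For what it's worth, the numbers above are consistent with two different alignment rules being applied to the same four fields. The sketch below is plain C++ with no HDF5 calls; Member, Layout and layout_members are hypothetical names, and the idea that the file was written with doubles aligned to 4 bytes is only a guess that happens to reproduce the 36-byte total.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical description of one compound member: its size and its
// natural alignment in bytes (a char array has alignment 1).
struct Member {
    std::size_t size;
    std::size_t align;
};

struct Layout {
    std::vector<std::size_t> offsets;  // byte offset of each member
    std::size_t total;                 // padded size of one record
};

// Lay the members out in order. Each offset is rounded up to the member's
// alignment, capped at max_align; the total is padded to the widest
// alignment actually used.
Layout layout_members(const std::vector<Member>& members, std::size_t max_align) {
    Layout out{{}, 0};
    std::size_t pos = 0;
    std::size_t widest = 1;
    for (const Member& m : members) {
        std::size_t a = std::min(m.align, max_align);
        pos = (pos + a - 1) / a * a;  // round up to a multiple of a
        out.offsets.push_back(pos);
        pos += m.size;
        widest = std::max(widest, a);
    }
    out.total = (pos + widest - 1) / widest * widest;
    return out;
}
```

For the members {16,1}, {1,1}, {8,8}, {8,8}, a cap of 8 gives offsets 0, 16, 24, 32 with a total of 40, while a cap of 4 gives 0, 16, 20, 28 with a total of 36. If the offsets come from one layout and the buffer was filled using the other, the doubles will be read from the wrong place.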

I’m still attempting to understand this. This next data point confuses me even more:

struct {
        unsigned char;
        unsigned char;
        unsigned char;
        unsigned char;
        unsigned char;
        unsigned char;
        unsigned char;

The above is the struct that I see from reading the data programmatically, which agrees with HDFView. A call to H5Tget_size() returns 208, which is the chunk size reported in HDFView for this compound.

The odd part is the following, as read by H5TBget_field_info():

Offset     Size
     0        1
     1        1
     2        1
     3        1
     4        1
     5        1
     6        1
     7        1
     8        8

How do these sizes equate to 208? More importantly: How do I read these compounds in the correct locations? Is there a function for reading field values that I’m missing somewhere? My approach for getting the data from a compound is:

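hsize_t fields = 0;   // number of fields per record
hsize_t num = 0;      // number of records
size_t size = 0;      // record size reported by H5TBget_field_info()
vector<char *> names;
vector<size_t> sizes;
vector<size_t> offsets;
hid_t did = H5Dopen(parent, name);
hid_t tdid = H5Dget_type(did);
H5T_class_t tclass = H5Tget_class(tdid);
switch (tclass){
        case H5T_COMPOUND:
                H5TBget_table_info(parent, name, &fields, &num);
                // The output arrays need one entry per field before the call;
                // each names[i] must also point to a buffer big enough for a field name.
                names.resize(fields); sizes.resize(fields); offsets.resize(fields);
                H5TBget_field_info(parent, name, &names[0], &sizes[0], &offsets[0], &size);
                size_t chunk = H5Tget_size(tdid);
                // One record of `chunk` bytes per row of the table.
                char *buf = (char *)calloc(num, chunk);
                H5Dread(did, tdid, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
                // From here, I parse the dataset using the offsets.  This seems invalid.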

If you can shed any light, please do so.


Hi Andy,
Are you using C++? Would you like to give H5CPP a go? Here is a link to a compound datatype example.

std::vector<sn::example::Record> vec = h5::utils::get_test_data<sn::example::Record>(20);
h5::write(fd, "orm/partial/vector one_shot", vec ); // it will do the right thing!!!

H5CPP doesn’t have a table API per se. Instead it comes with LLVM-based, compiler-assisted reflection: you are free to code away with arbitrarily complex POD structs, and the descriptors are generated for you. You might also want to check out the h5::append operator, which provides throughput near that of the underlying filesystem.

best: steve

If anyone ever comes across this looking for a solution, here is what I ended up doing:

        vector<char *> names;
        vector<size_t> sizes;
        vector<size_t> h5sizes;
        vector<size_t> offsets;
        vector<int> inds;
        size_t slab_size = 0;
        H5TBget_table_info(ID, name, &fields, &npoints);
        npoints = H5Sget_simple_extent_npoints();
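        // Size the output arrays before the call; each names[i] must also point to a caller-allocated buffer.
        names.resize(fields); h5sizes.resize(fields); offsets.resize(fields); inds.resize(fields);
        H5TBget_field_info(ID, name, &names[0], &h5sizes[0], &offsets[0], &size);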
        // Ignore the offsets HDF5 returned and rebuild them as a running sum of the field sizes.
        for (int i=0; i<fields; ++i){
                inds[i] = i;
                offsets[i] = (i == 0) ? 0 : offsets[i-1] + sizes[i-1];
                slab_size += sizes[i];
        }
        char *buf = (char *)calloc(npoints, slab_size);
        union {
                float *d_float;
                int *d_int;
                char *d_str;
                char *d_char;
                // Etc, etc
        } ptr;
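        for (int i=0; i<npoints; ++i){
                for (int j=0; j<fields; ++j){
                        // Record i starts at slab_size * i; field j sits offsets[j] bytes into it.
                        ptr.d_char = buf + (slab_size * i) + offsets[j];
                        // Then copy the data over to whatever you are doing... etc, etc, etc
                }
        }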
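Instead of pointing a union member into the buffer, a field can also be copied out with memcpy, which avoids unaligned access when a double or int lands on an odd offset. A minimal sketch, assuming record_size and field_offset describe the datatype that was actually used to fill the buffer (for example, offsets taken with H5Tget_member_offset() from the same type passed to H5Dread()); read_field is a hypothetical helper, not part of HDF5:

```cpp
#include <cstddef>
#include <cstring>

// Copy one field of type T out of a buffer of fixed-size records.
// record_size and field_offset are assumed to match the layout of the
// datatype that was used to fill buf.
template <typename T>
T read_field(const char* buf, std::size_t record_size,
             std::size_t record_index, std::size_t field_offset) {
    T value;
    // memcpy works even when the source address is not aligned for T.
    std::memcpy(&value, buf + record_size * record_index + field_offset, sizeof(T));
    return value;
}
```

For the first compound above, something like read_field<double>(buf, 36, i, 20) would fetch the first double of record i, provided 36 and 20 really are the record size and offset of the type the data was read with.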