Reading from an unknown source -- Take 2

I’m nearly there. I can read all of the data types but one. Looking at the various HDF5 files with HDFView, I see a type in a compound table of “String, length = 32, padding = H5T_STR_NULLTERM, cset = H5T_CSET_ASCII”. Using the function H5Tget_member_type() I get the type value. I have an internal db of all of the H5T_NATIVE_ types, and I cannot match this returned type to any of them. I have attempted H5Tcopy(H5T_C_S1) (which I know was used to make these) but cannot match it. The two H5T identifiers in the quoted text do not work in H5Tcopy() – though it does compile.

What is the correct way/identifier to read these NULL terminated strings?

EDIT: NOTE: I was using H5Tcopy() to feed into H5Tequal() to test if this was the type I’m getting back from H5Tget_member_type().

Thanks in advance,
Andy

Andy (?), I’m not sure what you are trying to achieve, but here’s a guess.
If your question is “What is the native type of a string type?”, the answer is “The type itself.”
The documentation for H5Tget_native_type(dtype_id, ...) states: “…String, time, opaque, and reference datatypes are returned as a copy of dtype_id.” That is, for string datatypes (H5Tget_class(dtype_id) == H5T_STRING), there’s nothing to see here as far as the native datatype is concerned. Look at what its encoding and size are (i.e., is it fixed-length or variable-length), and then what the padding is. That should be sufficient to “disentangle” any string datatype.

Here’s a little more food for thought.

#include "hdf5.h"
#include <stdio.h>

int main(int argc, char** argv)
{
  {
    hid_t strtype = H5Tcreate(H5T_STRING,32);
    hid_t n_strtype = H5Tget_native_type(strtype, H5T_DIR_ASCEND);
    hid_t n_h5t_c_s1 = H5Tget_native_type(H5T_C_S1, H5T_DIR_ASCEND);

    printf(" 0: %zu\n", H5Tget_size(strtype));
    printf(" 1: %d\n", H5T_CSET_ASCII == H5Tget_cset(strtype));
    printf(" 2: %d\n", H5T_STR_NULLTERM == H5Tget_strpad(strtype));
    printf(" 3: %zu\n", H5Tget_size(H5T_C_S1));
    printf(" 4: %d\n", H5T_CSET_ASCII == H5Tget_cset(H5T_C_S1));
    printf(" 5: %d\n", H5T_STR_NULLTERM == H5Tget_strpad(H5T_C_S1));
    printf(" 6: %d\n", H5Tequal(n_strtype, n_h5t_c_s1));
    printf(" 7: %d\n", H5Tequal(n_strtype, H5T_NATIVE_CHAR));
    printf(" 8: %d\n", H5Tequal(n_strtype, H5T_NATIVE_UCHAR));
    printf(" 9: %d\n", H5Tequal(n_strtype, strtype));
    printf("10: %d\n", H5Tequal(n_h5t_c_s1, H5T_C_S1));

    H5Tclose(n_h5t_c_s1);
    H5Tclose(n_strtype);
    H5Tclose(strtype);
  }

  return 0;
}

The output should read:

 0: 32
 1: 1
 2: 1
 3: 1
 4: 1
 5: 1
 6: 0
 7: 0
 8: 0
 9: 1
10: 1

G.

I guess I should have tagged the previous thread: https://forum.hdfgroup.org/t/reading-data-from-an-unknown-source/6342. I’ll now copy the example from there and append to it:

pid = [open group]
did = H5Dopen(pid, name);
type = H5Tget_class((tdid = H5Dget_type(did)));
switch (type){
        case H5T_COMPOUND:
                if ((H5TBget_table_info(pid, name, &fields, &records))<0){
                        // handle error
                }

                if ((H5TBget_field_info(pid, name, fnames, fsizes, foffsets, &size))<0){
                        // handle error
                }
                for (i=0; i<fields; ++i){
                        native_hid = H5Tget_member_type(tdid, i);
                        // From here, you have to use H5Tequal(native_hid, H5T_NATIVE_...)
                        // I keep a local static structure array with all of the H5T_NATIVE_ types and run it through a loop
                        // (native_types and n_native are placeholder names for that array and its length):
                        for (j=0; j<n_native; ++j)
                                if (H5Tequal(native_hid, native_types[j])>0)
                                        break;
                        if (j<n_native){
                                // Found it.  This is not the problem.
                        } else {
                                // This is the string type that I cannot figure out.
                        }
                }
                break;
}

As I stated in the above post, I have attempted H5Tequal() with H5T_STR_NULLTERM (which does not compile), the same with H5T_CSET_ASCII (which also does not compile), and have attempted H5T_C_S1, which does not compare equal.

@gheber: What you have is the writing side of things. What I have is a mixed bag of data files that I cannot trace back to any source that created them. I need to read the anonymous data sets to figure out where the data belongs… I can map it off of type and (in the case of the strings) the length, but I gotta have the length. I will, however, attempt the H5Tget_size() to see if that gives me what I want. If that works, I’ll try to limp along with that. :)

Thanks for the reply!
Andy

So, calling H5Tget_size() on tdid returns the combined size of all members in the data set. In one case in this file, there are three string values, each of size 32. The return from H5Tget_size() with tdid above gives me 96. Not what I need.

– Edit: 96 is the size of the CHUNKED value within HDF5 viewer.

Found a workshop slide deck that was exactly what I’m attempting to do. Unfortunately, it looks like it stops prematurely.

The last slide suggests that one should read the class of each member of the compound data type. When I do this I get back a response of 3. I have tried a few data types, but I cannot find one that matches 3. Anyone?

EDIT: Update: Crap, I didn’t feed H5T_C_S1 into the H5Tget_class(), which returns 3 as well. So, I guess that means I found my identifier.

Thanks,
Andy

I have now come full circle. I am back to needing the H5Tget_size() per @gheber’s suggestion.

Questions:

  1. Is there a way to get a handle to a member of a compound data set?
  2. Is there a way to get the size of a member of a compound data set?
  3. Is the “right way” to get the size of this member to get the offset value of this member and the offset value of the next member (or, in case of the last member the compound data set’s size) and take the difference in these values?
  4. What is the “Right Way” or the “HDF5 Way” of getting the size of a fixed width string within a compound data set?

Thanks,
Andy

Oh. I get it. I got confused about what H5Tget_member_type() returns. I can just feed that into H5Tget_size(). I was feeding in the tdid in place of the type returned.

I got it now.