Dataset with unknown datatype


#1

I have been provided with an HDF5 file that triggers strange file-locking behaviour in my software when certain datasets are read (GitHub issue here, file available from here).

It’s most likely a bug in my software, but when I use h5dump on the offending datasets it returns a datatype that I don’t recognise, e.g.

$ h5dump -H -N /row_attrs/Accession minimal_example.h5
HDF5 "/home/msmith/Downloads/minimal_example.h5" {
DATASET "/row_attrs/Accession" {
   DATATYPE  "/#1832"
   DATASPACE  SIMPLE { ( 23469 ) / ( 23469 ) }
}
}

I think the dataset is supposed to be H5T_STRING with variable length, and it’s possible to read it as such, but I don’t see this "/#1832" datatype when I try to create my own dataset with variable length strings.

Can anyone help me identify the datatype, or tell me whether it might lead to odd behaviour when reading it?


#2

I believe you are dealing with a so-called named or committed datatype. Check the file for a datatype object linked as “#1832” in the root group. h5dump or HDFView should give you a rendering of the datatype definition included in this object. When accessing this dataset, the HDF5 library will automatically locate the datatype definition and “do the right thing.” G.
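
For anyone who wants to see a committed datatype in isolation, here is a minimal sketch using h5py. It assumes h5py 2.10 or later is installed; the file name and the names "vlen_utf8", "shared" and "private" are hypothetical and have nothing to do with the file in question. It commits a variable-length UTF-8 string type, creates one dataset that refers to it and one that carries its own private copy, and then asks the library which is which.

```python
import h5py


def build_demo():
    """Create an in-memory file with a committed string type and two datasets."""
    # driver="core" with backing_store=False keeps everything in memory,
    # so nothing is written to disk. The file name is a placeholder.
    f = h5py.File("committed_demo.h5", "w", driver="core", backing_store=False)

    # Assigning a dtype to a name commits it: the datatype becomes a linked
    # object in the file. "vlen_utf8" is a made-up name for this sketch.
    f["vlen_utf8"] = h5py.string_dtype(encoding="utf-8")

    # Passing the committed type makes the dataset refer to that shared object.
    shared = f.create_dataset("shared", shape=(3,), dtype=f["vlen_utf8"])

    # For comparison: a dataset that stores its own (transient) copy of the type.
    private = f.create_dataset(
        "private", shape=(3,), dtype=h5py.string_dtype(encoding="utf-8")
    )
    return f, shared, private


f, shared, private = build_demo()
print("link is a datatype object:", isinstance(f["vlen_utf8"], h5py.Datatype))
print("shared uses committed type:", shared.id.get_type().committed())
print("private uses committed type:", private.id.get_type().committed())
f.close()
```

Both datasets read and write as ordinary variable-length strings; the only difference is where the type definition is stored, which is why h5dump prints a path for one and the full definition for the other.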


#3

Thanks a lot, that’s very helpful. h5dump on the whole file shows me this definition.

DATATYPE "#1832" H5T_STRING {
  STRSIZE H5T_VARIABLE;
  STRPAD H5T_STR_NULLTERM;
  CSET H5T_CSET_UTF8;
  CTYPE H5T_C_S1;
};

Thanks for introducing me to a concept in HDF5 I hadn’t seen before.