How to read a UTF-8 string


#1

I’m trying to read a string attribute in an HDF5 file whose CSET is H5T_CSET_UTF8. h5dump shows the attribute as:

   ATTRIBUTE "basePath" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "/screen/%T/"
      }
   }

I use the following test code to read this:

program test

use h5lt
use hdf5

implicit none

integer h5_err
integer(hid_t) file_id
character(16) string

call h5open_f(h5_err)
call h5fopen_f('astra.h5', H5F_ACC_RDONLY_F, file_id, h5_err)
call H5Eset_auto_f(1, h5_err)    ! Reset
call h5eclear_f(h5_err)
call H5LTget_attribute_string_f(file_id, '.', 'basePath', string, h5_err)

print *, h5_err
print *, string

end program

Running the program gives:

erpsim1:~/linux_lib/test> ../debug/bin/test
           0
 ???     

In other words, I get garbage for the string. I don’t see why the program fails, since the string contains only ASCII-compatible characters. In any case, what is the correct way to read this attribute? There must be a way, since h5dump can read it.

Thanks for any help.


#2

From reading other posts, it looks like the problem is probably not UTF8 but the fact that the string size is H5T_VARIABLE. I still need a way to read an attribute string that can be of either fixed or variable length. Does anyone know of any example code for this (Fortran preferred)?


#3

Hi,

I have an example that does not use the high level libraries. It reads an attribute that looks like this:

$ h5dump vatt.h5
HDF5 "vatt.h5" {
GROUP "/" {
   ATTRIBUTE "MyAttribute" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "a variable-length string!"
      }
   }
}
}

See the example: vlenatt_F03.f90 (1.5 KB)

-Barbara


#4

The HL API you are using (H5LTget_attribute_string_f) can’t read variable-length (VL) strings.

I would take a look at the examples found here:

https://support.hdfgroup.org/HDF5/examples/api-fortran.html

and in particular,

h5ex_t_string_F03.f90
h5ex_t_vlen_F03.f90
h5ex_t_vlenatt_F03.f90
h5ex_t_vlstring.f90
h5ex_t_vlstring_F03.f90

Use the Fortran 2003 version of h5aread_f (the one that takes a TYPE(C_PTR) buffer) to read both fixed-length and variable-length strings. Get the datatype of the attribute first, then pass the matching arguments to h5aread_f.

Scot
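
The approach described in #4 could be sketched roughly as follows. This is a minimal, untested sketch rather than a definitive implementation: it reuses the file name 'astra.h5' and attribute name 'basePath' from #1, assumes the attribute is attached to the root group and is scalar, and assumes an HDF5 build with the Fortran 2003 interface. The helper to_string and the strlen binding c_strlen are names made up for this illustration, not part of HDF5. The attribute's own datatype is passed as the memory type, so the bytes should come back unconverted.

```fortran
program read_string_attr
  use hdf5
  use, intrinsic :: iso_c_binding
  implicit none

  interface
    ! Binding to the C library's strlen (name c_strlen is chosen here).
    function c_strlen(s) bind(C, name='strlen') result(n)
      import :: c_ptr, c_size_t
      type(c_ptr), value :: s
      integer(c_size_t) :: n
    end function c_strlen
  end interface

  integer :: h5_err
  integer(hid_t) :: file_id, attr_id, type_id
  integer(size_t) :: n
  logical :: is_vl
  type(c_ptr) :: f_ptr                       ! buffer address handed to h5aread_f
  type(c_ptr), target :: vl_ptr              ! receives the C char* of a VL string
  character(kind=c_char), pointer :: vl_chars(:)
  character(kind=c_char), allocatable, target :: fixed_chars(:)
  character(len=:), allocatable :: string

  call h5open_f(h5_err)
  call h5fopen_f('astra.h5', H5F_ACC_RDONLY_F, file_id, h5_err)
  if (h5_err /= 0) stop 1
  call h5aopen_f(file_id, 'basePath', attr_id, h5_err)
  if (h5_err /= 0) stop 1

  ! Ask the attribute for its datatype and check whether it is variable-length.
  call h5aget_type_f(attr_id, type_id, h5_err)
  call h5tis_variable_str_f(type_id, is_vl, h5_err)

  if (is_vl) then
    ! Variable-length: HDF5 allocates the string and returns a pointer to it.
    ! Note: that memory is not released in this sketch.
    vl_ptr = c_null_ptr
    f_ptr = c_loc(vl_ptr)
    call h5aread_f(attr_id, type_id, f_ptr, h5_err)
    if (h5_err == 0 .and. c_associated(vl_ptr)) then
      n = c_strlen(vl_ptr)
      call c_f_pointer(vl_ptr, vl_chars, [n])
      string = to_string(vl_chars)
    else
      string = ''
    end if
  else
    ! Fixed-length: the caller supplies a buffer of the stored size in bytes.
    call h5tget_size_f(type_id, n, h5_err)
    allocate(fixed_chars(n))
    f_ptr = c_loc(fixed_chars(1))
    call h5aread_f(attr_id, type_id, f_ptr, h5_err)
    string = trim(to_string(fixed_chars))    ! drop any space padding
  end if

  print '(a)', string

  call h5tclose_f(type_id, h5_err)
  call h5aclose_f(attr_id, h5_err)
  call h5fclose_f(file_id, h5_err)
  call h5close_f(h5_err)

contains

  ! Copy a C character array into a Fortran string, stopping at the first NUL.
  function to_string(chars) result(s)
    character(kind=c_char), intent(in) :: chars(:)
    character(len=:), allocatable :: s
    integer :: k, m
    m = size(chars)
    do k = 1, size(chars)
      if (chars(k) == c_null_char) then
        m = k - 1
        exit
      end if
    end do
    allocate(character(len=m) :: s)
    do k = 1, m
      s(k:k) = chars(k)
    end do
  end function to_string

end program read_string_attr
```

Because no character-set conversion is requested, a UTF-8 attribute arrives as raw bytes in the Fortran string. For ASCII-only content such as "/screen/%T/" those bytes can be printed directly.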