How to read a UTF-8 string


#1

I’m trying to read a string attribute in an HDF5 file that has CSET set to H5T_CSET_UTF8. h5dump shows the attribute as:

   ATTRIBUTE "basePath" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "/screen/%T/"
      }
   }

I use the following test code to read this:

program test

use h5lt
use hdf5

implicit none

integer h5_err
integer(hid_t) file_id
character(16) string

call h5open_f(h5_err)
call h5fopen_f('astra.h5', H5F_ACC_RDONLY_F, file_id, h5_err)
call H5Eset_auto_f(1, h5_err)    ! Reset
call h5eclear_f(h5_err)
call H5LTget_attribute_string_f(file_id, '.', 'basePath', string, h5_err)

print *, h5_err
print *, string

end program

Running the program gives:

erpsim1:~/linux_lib/test> ../debug/bin/test
           0
 ???     

In other words, I get garbage for the string. I don’t see why the program does not work, since the string contains only ASCII-compatible characters. In any case, what is the correct way to read this attribute? There must be a way, since h5dump can read it.

Thanks for any help.


#2

From reading other posts, it looks like the problem is probably not UTF8 but the fact that the string size is H5T_VARIABLE. I still need a way to read an attribute string that can be of either fixed or variable length. Does anyone know of example code for this (Fortran preferred)?


#3

Hi,

I have an example that does not use the high level libraries. It reads an attribute that looks like this:

$ h5dump vatt.h5
HDF5 "vatt.h5" {
GROUP "/" {
   ATTRIBUTE "MyAttribute" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "a variable-length string!"
      }
   }
}
}

See the example: vlenatt_F03.f90 (1.5 KB)

-Barbara


#4

The HL API you are using can’t read VL strings.

I would take a look at the examples found here:

https://support.hdfgroup.org/HDF5/examples/api-fortran.html

and in particular,

h5ex_t_string_F03.f90
h5ex_t_vlen_F03.f90
h5ex_t_vlenatt_F03.f90
h5ex_t_vlstring.f90
h5ex_t_vlstring_F03.f90

Use the Fortran 2003 H5Aread_f version for reading both datatypes. You can get the datatype of the attribute beforehand, and then pass the correct arguments to H5Aread accordingly.
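A rough sketch of that approach, not a tested solution: it opens the attribute, asks whether its type is a variable-length string, and picks the buffer to match. The file and attribute names are the ones from post #1. The 256-character limit, the program name, and the choice to pass the attribute's own datatype as the memory type (so that size, padding and character set match the file) are assumptions made for this sketch.

```fortran
program read_string_attr
  ! Sketch only: read a scalar string attribute that may be either
  ! fixed-length or variable-length, using the Fortran 2003 h5aread_f.
  use hdf5
  use iso_c_binding
  implicit none

  integer, parameter :: max_len = 256   ! arbitrary limit for this sketch

  integer :: h5_err, n, k
  integer(hid_t) :: file_id, attr_id, type_id
  integer(size_t) :: str_size
  logical :: is_vlen

  character(len=max_len, kind=c_char), target  :: fixed_buf
  character(len=max_len, kind=c_char), pointer :: vl_buf
  type(c_ptr), dimension(1), target :: vl_ptr
  type(c_ptr) :: f_ptr

  call h5open_f(h5_err)
  call h5fopen_f('astra.h5', H5F_ACC_RDONLY_F, file_id, h5_err)
  call h5aopen_f(file_id, 'basePath', attr_id, h5_err)

  ! Get the attribute's datatype and ask whether it is a VL string.
  call h5aget_type_f(attr_id, type_id, h5_err)
  call h5tis_variable_str_f(type_id, is_vlen, h5_err)

  if (is_vlen) then
    ! The library allocates the string and stores its address in vl_ptr(1).
    f_ptr = c_loc(vl_ptr(1))
    call h5aread_f(attr_id, type_id, f_ptr, h5_err)
    call c_f_pointer(vl_ptr(1), vl_buf)
    ! Scan for the terminating null to find the actual length.
    n = 0
    do
      if (n >= max_len) exit
      if (vl_buf(n+1:n+1) == c_null_char) exit
      n = n + 1
    end do
    print *, vl_buf(1:n)
    ! Note: the library-allocated string is not freed in this sketch.
  else
    ! Fixed-length string: the file type tells us how many bytes to expect.
    call h5tget_size_f(type_id, str_size, h5_err)
    if (str_size > max_len) stop 'attribute string longer than max_len'
    fixed_buf = ' '
    f_ptr = c_loc(fixed_buf(1:1))
    call h5aread_f(attr_id, type_id, f_ptr, h5_err)
    ! Trim at the first null, if the string is null-terminated or null-padded.
    n = int(str_size)
    k = index(fixed_buf(1:n), c_null_char)
    if (k > 0) n = k - 1
    print *, fixed_buf(1:n)
  end if

  call h5tclose_f(type_id, h5_err)
  call h5aclose_f(attr_id, h5_err)
  call h5fclose_f(file_id, h5_err)
  call h5close_f(h5_err)
end program read_string_attr
```

This should build with the h5fc wrapper if your HDF5 installation includes the Fortran 2003 interface. Error checking on h5_err is left out for brevity. The bytes are copied as stored, so a UTF-8 string containing non-ASCII characters will only print correctly on a UTF-8 terminal.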

Scot


#5

Hello,
Sorry to revive this old thread, but I have the same problem and have been unable to solve it.
Your program, run on the same h5 file except with H5T_CSET_ASCII replaced by H5T_CSET_UTF8, fails with:

$ ./vlenatt_F03
HDF5-DIAG: Error detected in HDF5 (1.12.0) thread 0:
  #000: H5A.c line 726 in H5Aread(): unable to read attribute
    major: Attribute
    minor: Read failed
  #001: H5VLcallback.c line 1232 in H5VL_attr_read(): attribute read failed
    major: Virtual Object Layer
    minor: Read failed
  #002: H5VLcallback.c line 1199 in H5VL__attr_read(): attribute read failed
    major: Virtual Object Layer
    minor: Read failed
  #003: H5VLnative_attr.c line 176 in H5VL__native_attr_read(): unable to read attribute
    major: Attribute
    minor: Read failed
  #004: H5Aint.c line 635 in H5A__read(): unable to convert between src and dst datatypes
    major: Attribute
    minor: Feature is unsupported
  #005: H5T.c line 4815 in H5T_path_find(): can't find datatype conversion path
    major: Datatype
    minor: Can't get value
  #006: H5T.c line 5028 in H5T__path_find_real(): no appropriate function for conversion path
    major: Datatype
    minor: Unable to initialize object

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

How is it possible to read a UTF-8 string? The example file h5ex_t_vlenatt_F03.f90 does not use UTF-8.
What’s the correct way?

Thanks in advance,

Gérard