Problems with H5::DataSet::read(H5std_string...) for fixed length H5T_STR_NULLPAD strings

There seems to be a problem reading fixed-length strings that are not null-terminated (H5T_STR_NULLPAD) with the H5::DataSet::read(H5std_string...) method. What happens is the following:
1: H5::DataSet::read(H5std_string...) calls DataSet::p_read_fixed_len.
2: A new "strg_C" string buffer of length attr_size+1 is allocated in DataSet::p_read_fixed_len. This buffer is neither cleared nor null-terminated in any way.
3: The "strg_C" buffer is filled with data from an H5Dread call. H5Dread only copies the actual string data, and _not_ a zero terminator, for H5T_STR_NULLPAD strings. This leaves "strg_C" without any zero-termination.
4: "strg_C" is assigned to the std::string output. The assignment scans for the first zero byte, so the result is nondeterministic: the output can have arbitrary length, depending on memory content.
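To illustrate the four steps without needing an HDF5 file, here is a minimal sketch in plain C++. It is not the library code: fake_read_nullpad is a hypothetical stand-in for H5Dread on a NULLPAD string, and the stale memory is simulated with a '#'-filled array (terminated at its very end) so that the demo stays well-defined instead of reading past an allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for H5Dread on an H5T_STR_NULLPAD string:
// copies exactly `size` bytes and writes no terminator.
void fake_read_nullpad(char* dst, const char* stored, std::size_t size) {
    std::memcpy(dst, stored, size);
}

// Mimics steps 2-4 with an uncleared buffer.
std::string read_without_clearing(const char* stored, std::size_t size) {
    // Simulated stale memory: non-zero bytes everywhere, with a terminator
    // only in the last byte (an assumption to keep the demo deterministic).
    char memory[16];
    assert(size + 1 <= sizeof(memory));
    std::memset(memory, '#', sizeof(memory) - 1);
    memory[sizeof(memory) - 1] = '\0';

    char* strg_C = memory;                    // logically a buffer of size + 1 bytes
    fake_read_nullpad(strg_C, stored, size);  // fills bytes [0, size), no terminator
    return std::string(strg_C);               // scans until the first zero byte
}
```

With a 4-byte stored value "ABCD", the returned string starts with "ABCD" but continues with whatever non-zero bytes follow in memory, which is the behaviour described in step 4.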

I believe the above problem can be resolved by clearing "strg_C" immediately after allocation. This ensures that the string is always zero-terminated, regardless of its actual length. A patch that attempts to fix this issue is attached.
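The idea behind the fix can be sketched as follows. This is not the attached patch itself, only an illustration of the same approach in plain C++: fake_read_nullpad is again a hypothetical stand-in for H5Dread, and a zero-initialized std::vector<char> plays the role of the cleared "strg_C" buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for H5Dread on an H5T_STR_NULLPAD string:
// copies exactly `size` bytes and writes no terminator.
void fake_read_nullpad(char* dst, const char* stored, std::size_t size) {
    std::memcpy(dst, stored, size);
}

// Same flow as before, but the buffer is cleared right after allocation.
std::string read_with_clearing(const char* stored, std::size_t size) {
    // size + 1 bytes, all zero: byte [size] stays '\0' whatever the read does.
    std::vector<char> strg_C(size + 1, '\0');
    fake_read_nullpad(strg_C.data(), stored, size);
    // Stops at the first zero byte: either null padding inside the data
    // or the guaranteed terminator at index `size`.
    return std::string(strg_C.data());
}
```

A full-length value such as "ABCD" (size 4) comes back as exactly "ABCD", and a shorter value stored with null padding comes back without the padding.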

Best regards,
Fredrik Orderud

H5DataSet.cpp.patch (495 Bytes)

Hello Fredrik,

Thank you for reporting the problem. It will be addressed in the next release.

Binh-Minh

________________________________________
From: Hdf-forum <hdf-forum-bounces@lists.hdfgroup.org> on behalf of Fredrik Orderud <fredrik@orderud.no>
Sent: Tuesday, February 18, 2014 5:50 PM
To: HDF Users Discussion List
Subject: [Hdf-forum] Problems with H5::DataSet::read(H5std_string...) for fixed length H5T_STR_NULLPAD strings
