Writing and reading variable-length std::string in a compound data type

Hi,

I’m integrating HDF5 to store and retrieve our simulation results. Some of the datasets need to hold compound data types (structs), and in particular those structs contain std::string members. Writing appears to work fine, as I’m able to view the data using HDFView. Reading results in a crash because the length of the string member of the struct looks to be incorrectly set to a very large number.

GCC version: 6.3.0p2 with C++11 enabled.
OS: Linux Red Hat 7.7

Here’s a code snippet that illustrates the struct and performs read from the dataset:

struct layerLegend_hdf5 {
    int layerId = 0;
    std::string layerName;
};

// Register compound type
hid_t m_strType = H5Tcopy(H5T_C_S1);
H5Tset_size(m_strType, H5T_VARIABLE);
CompType compType = H5Tcreate(H5T_COMPOUND, sizeof(layerLegend_hdf5));
compType.insertMember("layerId", HOFFSET(layerLegend_hdf5, layerId), PredType::NATIVE_INT);
compType.insertMember("layerName", HOFFSET(layerLegend_hdf5, layerName), m_strType);

// Template function to read from a dataset:
template <typename T>
Status readDataSet(const h5FilePtr &file, const std::string &dsPath,
                   const DataType &dataType, std::vector<T> &values, bool clearValues=true) {
    if (file && file->exists(dsPath)) {
        auto dataset(file->openDataSet(dsPath));
        // Get dataspace of the dataset.
        DataSpace dataspace(dataset.getSpace());
        // Get the number of dimensions in the dataspace.
        const int rank(dataspace.getSimpleExtentNdims());
        hsize_t dims_out[rank]; // Rank is always 1 in our case
        dataspace.getSimpleExtentDims(dims_out, nullptr);
        T* data_out(new T[dims_out[0]]);
        if (clearValues) {
            values.clear();
            values.reserve(dims_out[0]);
        }
        if (data_out) {
            dataset.read(data_out, dataType);
            for (hsize_t i = 0; i < dims_out[0]; i++) {
                values.emplace_back(data_out[i]);
            }
            delete [] data_out;
        }
        dataset.close();
        return Status();
    }
    return Status::IOError("readDataSet, problem opening dataset: ", dsPath);
}

As mentioned above, the read call itself appears to succeed. But when I retrieve the ‘layerName’ field of the ‘layerLegend_hdf5’ struct, the expected string is there while its length is set to a very large number. This causes other client code to crash.

Is there anything else that needs to be done when reading such compound data types that have variable length std::string data members? Please let me know.

Thanks
-Kat

Hi @kat,

I’m not sure whether the issue arises because the member layerName (of struct layerLegend_hdf5) stores a large string or because it is declared as a std::string (my guess is the latter). Would you mind declaring the member as a char * instead and seeing whether the issue goes away?
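To illustrate the guess above with a minimal sketch: as far as I know, for a variable-length string member (H5T_VARIABLE) HDF5 treats the in-memory field as a plain char * and fills the struct in as raw bytes. A std::string is an object with its own internal bookkeeping, so overwriting its bytes with a pointer can leave the characters readable while the stored length is garbage (the exact effect depends on the standard library's string layout). The names WithString, WithCharPtr and isRawCopyable below are hypothetical and only serve the illustration.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in for the original struct: the member is a std::string object.
struct WithString {
    int layerId;
    std::string layerName;
};

// Hypothetical stand-in for the suggested struct: the member is a raw pointer.
struct WithCharPtr {
    int layerId;
    char *layerName;
};

// True when a type can be filled in byte by byte by a C library
// (trivially copyable and standard layout).
template <typename T>
constexpr bool isRawCopyable() {
    return std::is_trivially_copyable<T>::value &&
           std::is_standard_layout<T>::value;
}

// The char * version passes the check; the std::string version does not.
static_assert(isRawCopyable<WithCharPtr>(),
              "struct with a char * member is safe for raw byte I/O");
static_assert(!isRawCopyable<WithString>(),
              "struct with a std::string member is not safe for raw byte I/O");
```

A check like this could be placed next to the compound type registration so that an unsuitable struct is rejected at compile time instead of crashing at run time.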

To give you an idea/example, the following code snippet works as expected in C++ using HDFql (unfortunately, I am not familiar with the API you are using):

// declare structure
struct layerLegend_hdf5
{
    int layerId;
    char *layerName;
};

// declare variables
struct layerLegend_hdf5 write;
struct layerLegend_hdf5 read;
std::stringstream script;

// create an HDF5 file named 'test.h5' and use (i.e. open) it
HDFql::execute("CREATE AND USE FILE test.h5");

// set variable 'write' with dummy values
write.layerId = 15;
write.layerName = (char *) malloc(10);
strcpy(write.layerName, "my layer");

// prepare script to create a compound dataset named 'my_compound' (with two members: 'layerId' and 'layerName') and write values stored in variable 'write' into it
script << "CREATE DATASET my_compound AS COMPOUND(layerId AS INT OFFSET " << offsetof(layerLegend_hdf5, layerId) << ", layerName as VARCHAR OFFSET " << offsetof(layerLegend_hdf5, layerName) << ") SIZE " << sizeof(struct layerLegend_hdf5) << " VALUES FROM MEMORY " << HDFql::variableTransientRegister(&write);

// execute script
HDFql::execute(script);

// prepare script to read values from compound 'my_compound' and populate variable 'read' with these
script.str("");
script << "SELECT FROM my_compound INTO MEMORY " << HDFql::variableTransientRegister(&read);

// execute script
HDFql::execute(script);

// print values stored in variable 'read'
std::cout << "layerId=" << read.layerId << std::endl;
std::cout << "layerName=" << read.layerName << std::endl;

Hope this helps!

Hi @contact,

Thanks! I’d come to the same conclusion about using char* instead of std::string as I looked through various sites.

Once I changed ‘layerName’ type to be char*, I’m able to read/write just like how you’ve shown.
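In case it helps others who still want std::string in client code, here is a minimal sketch of one possible approach: use a char * struct only for the HDF5 read/write, then copy the records into a std::string-owning struct. The names LayerLegendRaw, LayerLegend and toOwned are hypothetical, and treating a null pointer as an empty string is my assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Raw struct used only for I/O (same shape as the char * struct shown earlier).
struct LayerLegendRaw {
    int layerId;
    char *layerName;
};

// Client-facing struct that keeps std::string (hypothetical name).
struct LayerLegend {
    int layerId = 0;
    std::string layerName;
};

// Copy raw records into std::string-owning records.
// The raw character buffers are NOT freed here; the caller still owns them.
std::vector<LayerLegend> toOwned(const LayerLegendRaw *raw, std::size_t count) {
    assert(raw != nullptr || count == 0);
    std::vector<LayerLegend> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        LayerLegend rec;
        rec.layerId = raw[i].layerId;
        // Assumption: a null pointer is treated as an empty string.
        if (raw[i].layerName != nullptr) {
            rec.layerName = raw[i].layerName;  // deep copy of the characters
        }
        out.push_back(rec);
    }
    return out;
}
```

After the copy, the strings that HDF5 allocated during the read still need to be released. I believe H5Dvlen_reclaim in the C API is meant for this, but please check the documentation for the HDF5 version in use.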

It would be useful to:

  1. Support std::string natively for compound data types.
  2. If there is no plan to support it in the near future, document clearly that std::string is not supported in compound types.

Thanks
-Kat

Great to know that the issue is solved @kat!