Writing Variable Length Data To Compound Dataset


I am writing a program that converts Google Protocol Buffer messages into the HDF5 file format via the C++ interface, and I am hitting an issue when copying strings and other variable-length fields.

I have a compound type compType that I add a member to as follows

// Note: I am calculating the offset by hand since I do not have the necessary structures at compile time
H5::DataType v = H5::StrType(H5::PredType::C_S1, H5T_VARIABLE);
compType.insertMember(fieldName, currentOffset, v);
currentOffset += sizeof(char *);
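For readers following the hand-computed offsets, here is a minimal sketch of a runtime layout calculation. It assumes a packed layout (no padding between members) and relies on the fact that a variable-length string member occupies one char * in the memory buffer. The names FieldKind, Layout, memorySize and computePackedLayout are hypothetical and do not come from the HDF5 API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical field kinds; a real converter would cover more protobuf types.
enum class FieldKind { UInt32, VarString };

// Number of bytes one field occupies in the in-memory record.
// A variable-length string is stored as a single char * in the record.
std::size_t memorySize(FieldKind kind)
{
    switch (kind) {
    case FieldKind::UInt32:    return sizeof(std::uint32_t);
    case FieldKind::VarString: return sizeof(char *);
    }
    assert(false && "unknown field kind");
    return 0;
}

// Offsets of each field plus the total record size.
struct Layout {
    std::vector<std::size_t> offsets;
    std::size_t totalSize = 0;
};

// Assign each field the running offset, then advance by the field's size.
Layout computePackedLayout(const std::vector<FieldKind> &fields)
{
    Layout layout;
    for (FieldKind kind : fields) {
        layout.offsets.push_back(layout.totalSize);
        layout.totalSize += memorySize(kind);
    }
    return layout;
}
```

Each offset would then be passed as the second argument of insertMember, and totalSize would be used as the size of the compound type.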

I write the structured data to a buffer containing the rest of my compound type as follows

char * value = (char *)calloc(1, src.length() + 1);
memcpy(value, src.c_str(), src.length());
buffer.replace(offset, sizeof(char *), (char *)&value, sizeof(char *));

Then I write the buffers containing my compound data to a dataset

dataSet.write(buffer, compType);

This process works for all of the fixed-length data types I want to support, but I get either bad data or segfaults when writing strings and other variable-length types, no matter what approach I use. What is the correct approach here: which type should I use, and what goes in the binary buffer?

NOTE: please don’t suggest HDFql, it’s not available to me

EDIT: the approach above segfaults inside some vlen conversion function that I don’t have debug symbols for. Another approach I have tried is as follows:

Create the compound type

// Note: I am calculating the offset by hand since I do not have the necessary structures at compile time
H5::VarLenType v = H5::VarLenType(H5::PredType::C_S1);
compType.insertMember(fieldName, currentOffset, v);
currentOffset += sizeof(hvl_t);

Copy the data from std::string

hvl_t vlInfo;
vlInfo.len = value.length() + 1;
vlInfo.p = calloc(1, value.length() + 1);
memcpy(vlInfo.p, value.c_str(), value.length());

buffer.replace(offset, sizeof(hvl_t), (char *)&vlInfo, sizeof(hvl_t));

Write to file

dataSet.write(buffer, compType);

This approach does not segfault, but all of my strings show up as ERROR in HDFView.



Hi Nicholas,

I don’t think you want sizeof(char *); that’s only the size of the pointer.
Also, the last argument of insertMember is the datatype of the new member.


It was my understanding that for variable-length types you use either a pointer or an hvl_t item. Regardless, I don’t find this answer all that helpful, since I already know that what I was doing was wrong.

I was really hoping the C++ library has the ability to put variable-length types inside a compound type, and that someone could give advice on how to do that, not just tell me that the code I said isn’t working doesn’t work.

I understand that not having a minimal example to work off of can make this process more difficult so I’ve cobbled together a basic example of what I’m trying to do:

#include <H5Cpp.h>
#include <cstdint>
#include <vector>
#include <string>

bool recreateIssue = true;

constexpr uint32_t dataspaceRank = 1;
constexpr uint32_t datapointCount = 10;
constexpr hsize_t dataspaceDims[] = { 10 };

// The initializer values are missing from the post; each vector needs datapointCount entries
std::vector<std::string> stringData = { /* values missing from the post */ };

std::string empty = "";

std::vector<uint32_t> uintData = { /* values missing from the post */ };

int main()
{
    // Open file
    H5::H5File file("test.hdf", H5F_ACC_TRUNC);

    // Create compound type
    // PLEASE NOTE: I am not using HOFFSET since in my real program I don't know the structure of the data until runtime
    H5::CompType compType( (size_t)100 );

    compType.insertMember("uint1", 0, H5::PredType::NATIVE_UINT32);
    compType.insertMember("str", sizeof(uint32_t), H5::StrType(0, H5T_VARIABLE));
    compType.insertMember("uint2", sizeof(uint32_t) + sizeof(char *), H5::PredType::NATIVE_UINT32);
    compType.setSize(sizeof(uint32_t) + sizeof(char *) + sizeof(uint32_t));

    // Create dataset/space
    H5::DataSpace dataSpace(dataspaceRank, dataspaceDims);

    H5::DataSet dataSet = file.createDataSet("test", compType, dataSpace);

    // Organize data into buffer

    std::string databuffer(compType.getSize() * datapointCount, ' ');

    uint32_t offset = 0;
    for(uint32_t i = 0; i < datapointCount; i++)
    {
        databuffer.replace(offset, sizeof(uint32_t), (char *)&uintData[i], sizeof(uint32_t));
        offset += sizeof(uint32_t);

        if(recreateIssue)
            databuffer.replace(offset, sizeof(char *), stringData[i].c_str(), sizeof(char *));
        else
            databuffer.replace(offset, sizeof(char *), empty.c_str(), sizeof(char *));
        offset += sizeof(char *);

        databuffer.replace(offset, sizeof(uint32_t), (char *)&uintData[i], sizeof(uint32_t));
        offset += sizeof(uint32_t);
    }

    // Write buffer to file

    dataSet.write(databuffer, compType);

    return 0;
}

This was the only approach I could find online that people said worked, but it segfaults whenever the strings actually have data in them. I’ve also attached the backtrace for additional information.

#0  0x00007ffff699ce57 in __strlen_avx2 () from /lib64/libc.so.6
#1  0x00007ffff7a85812 in H5T__conv_vlen () from /lib64/libhdf5.so.103
#2  0x00007ffff7a7a29b in H5T_convert () from /lib64/libhdf5.so.103
#3  0x00007ffff7a840df in H5T__conv_struct_opt () from /lib64/libhdf5.so.103
#4  0x00007ffff7a7a29b in H5T_convert () from /lib64/libhdf5.so.103
#5  0x00007ffff790e931 in H5D__scatgath_write () from /lib64/libhdf5.so.103
#6  0x00007ffff78f6426 in H5D__contig_write () from /lib64/libhdf5.so.103
#7  0x00007ffff790a260 in H5D__write () from /lib64/libhdf5.so.103
#8  0x00007ffff790a9aa in H5Dwrite () from /lib64/libhdf5.so.103
#9  0x00007ffff7621673 in H5::DataSet::write(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, H5::DataType const&, H5::DataSpace const&, H5::DataSpace const&, H5::DSetMemXferPropList const&) const () from /lib64/libhdf5_cpp.so.103
#10 0x0000000000402bc4 in main () at /home/nicholas.desmarais/Documents/hdf_example/src/main.cpp:75
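
The strlen frame at the top of this backtrace is consistent with the library treating the pointer-sized slot of each record as a char * and following it. As a rough illustration (plain C++, no HDF5 calls), the sketch below shows what happens to the first few characters of a string when they are read back as an address. The helper name charactersAsAddress is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: reinterpret the first pointer-sized run of characters
// as if it were a memory address, which is what a reader expecting a char *
// in that slot would effectively do.
std::uintptr_t charactersAsAddress(const std::string &text)
{
    // The string must hold at least one pointer's worth of bytes.
    assert(text.size() >= sizeof(std::uintptr_t));

    std::uintptr_t address = 0;
    std::memcpy(&address, text.data(), sizeof(address));
    return address;
}
```

The returned value is made of the character codes themselves, so it almost certainly does not point at valid memory, and calling strlen on it can crash.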

Thanks for the example! I’ll try to take a look.

Not a C++ expert, but from my reading of std::string::replace, it looks like the line

databuffer.replace(offset, sizeof(char *), stringData[i].c_str(), sizeof(char *));

will copy the first sizeof(char *) bytes of the string data into the compound data buffer, when what it needs to copy is the address of the string data. Maybe something like

const char *tmp_c_str = stringData[i].c_str();
databuffer.replace(offset, sizeof(char *), (const char *)&tmp_c_str, sizeof(char *));

would work?
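
To make the idea concrete, here is a minimal sketch in plain C++ (no HDF5 calls) of storing the address of a string in a record buffer and reading it back. The helper names packStringPointer and unpackStringPointer are hypothetical. It assumes the string being pointed to stays alive and unmodified until the write call has returned, since only its address is stored in the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copy the ADDRESS held in `str` (not the characters it
// points to) into the record buffer at the given byte offset.
void packStringPointer(std::string &buffer, std::size_t offset, const char *str)
{
    assert(offset + sizeof(const char *) <= buffer.size());
    std::memcpy(&buffer[offset], &str, sizeof(const char *));
}

// Hypothetical helper: read the stored address back out of the buffer.
const char *unpackStringPointer(const std::string &buffer, std::size_t offset)
{
    assert(offset + sizeof(const char *) <= buffer.size());
    const char *str = nullptr;
    std::memcpy(&str, buffer.data() + offset, sizeof(const char *));
    return str;
}
```

Using memcpy keeps the copy valid even when the offset is not pointer-aligned, which can happen with hand-packed records such as a uint32_t followed directly by a pointer.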