How to make a dataset of multiple 2D char arrays in C++?

I want to write multiple 2D char arrays into one HDF5 dataset. In the code below, there are two arrays in a struct, and I copy (std::copy) "hello"s into one array (name) and "hey"s into the other (nickname). I thought the result would look like:

(b'hello', b'hey')
(b'hello', b'hey')
(b'hello', b'hey')
(b'hello', b'hey')

But when I actually run the code, the result looks like:

(b'hello', b'hey')
(b'(\xae\nVD\xae\nV\xff\xff', b'\xdc\x82\nV\x12\xd2\xfcv\x8f\x01')
(b'\x04', b'\x01') 
(b'$\xf8O', b'9\x08\xd2\xd5\xb4\xf8O')]

Since the very first row looks fine, I guess that there’s something wrong with null characters when writing the arrays to the dataset. I have looked into the example code on the official web page (https://support.hdfgroup.org/HDF5/examples/), but I still don’t understand where this problem comes from. I’d like to know what the problem is and how to fix it.

#include "pch.h"
#include <iostream>
#include "H5Cpp.h"

using namespace H5;

const H5std_string MEMBER1("name");
const H5std_string MEMBER2("nickname");

struct newStruct1 {
    char        name[4][10]{};
    char        nickname[4][10]{};
};


void struct_to_dataset(newStruct1 *ptr_struct)
{        
    H5File file("file.h5", H5F_ACC_TRUNC);

    hid_t dtype_str = H5Tcopy(H5T_C_S1);
    size_t size = 10 * sizeof(char);                    
    H5Tset_size(dtype_str, size);

    hsize_t dim[] = { 4 };   
    DataSpace space(1, dim);

    CompType mtype1(sizeof(newStruct1));
    mtype1.insertMember(MEMBER1, HOFFSET(newStruct1, name), dtype_str);
    mtype1.insertMember(MEMBER2, HOFFSET(newStruct1, nickname), dtype_str);

    DataSet dataset = file.createDataSet("dset", mtype1, space);

    dataset.write(ptr_struct, mtype1);  
}

int main()
{
    newStruct1 struct01;
    newStruct1 *ptr01 = &struct01;

    char word1[10] = { "hello" };
    char word2[10] = { "hey" };

    for (int i = 0; i < 4; i++) {
        std::copy(word1, word1 + 10, ptr01->name[i]);
        std::copy(word2, word2 + 10, ptr01->nickname[i]);
    }

    struct_to_dataset(ptr01);
}

If you consider using modern C++, here is the implementation with H5CPP, a header-only library for HDF5:

#include <armadillo>
#include <h5cpp/all>
int main(){
   arma::Mat<char> M(2,3); // create a 2D data structure, often called a rank-2 array, or matrix
   h5::fd_t fd = h5::create("arma.h5",H5F_ACC_TRUNC);  // and a file
   h5::write(fd,  "my dataset",  M ); // write it with single shot
}

Compile and link against libhdf5.so.
Most popular linear algebra systems are supported, as well as writing/reading from/to raw memory locations. In addition, all descriptors are compatible with the C API calls, which means you have the freedom to fall back to the C API for features not yet implemented.

The MIT-licensed library is lean, kind (when it comes to syntax), tuned, and constantly improved. For details, see the H5CPP webpage, and if you’re interested in a sneak peek, check out this sandbox.

best wishes: steve

Hi dodeuri,

I can attest to the quality and usability of Steve’s H5CPP library (https://github.com/steven-varga/h5cpp). I went from knowing nothing about HDF5 to creating files in very little time. It is useful for everything from creating basic datasets (like yours) to very complex, fast solutions (e.g. large packet tables), with little knowledge required beyond modern C++ skills and a basic understanding of HDF5.

I deduced you are using Visual C++ from the presence of pch.h in your source.

So, I created a very simple Visual Studio 2017 (or later) solution as an example of writing a 2D char array to a 2D HDF5 dataset using the vs2017-windows branch of the library. Only three lines are needed for the HDF5 part. You can see the solution here: https://github.com/ChrisDrozdowski/2d_char_array_ex

Please read the brief README.md.

It has all the files you need, except that you need to install HDF5 1.10.5 into: C:\Program Files\HDF_Group\HDF5\1.10.5

Give it a try.

~Chris

Thank you for the example code and suggestion. It’s cool that you can use a linear algebra library with HDF5.
I personally use HDF5 and C++ only for receiving and writing real-time data, and I analyze it with Python, so I wish there were a way to do the job in a simple and basic way without other libraries (so that I don’t get lost looking at the code in the future :sweat_smile:).
I would appreciate it if I could get some example code of writing multiple 2d char arrays to the same dataset.
Thank you again for your answer! Have a nice weekend!

Thank you for sharing. “Simple” and “basic” may be contradictory here. In this case, the very basic option is the HDF5 C API, a solid, constantly developed file format with an API that is far from simple. That complexity has been reduced to pythonic simplicity with C++ template metaprogramming techniques, resulting in extremely fast, easy-to-use, CRUD-like operators, where you don’t need to remember the order of parameters and can omit as many as context allows.

A header-only library is the C++ way of saying “without using other libraries”. When Armadillo is used only as a container, it is header-only, similar to H5CPP, and in most cases you’d ship them with your project, preserving credits.

Of course, you don’t need to use any linear algebra system. By giving up some readability, you can call the h5 operators on typed memory pointers. Notice that h5::count{...} is mandatory in that case, and not providing it will result in an informative compile-time error. FYI: the upcoming H5CPP version will have full STL-like object support, and it will do the right thing with your own C++ classes as long as you follow the STL scheme.

An example with pointers:

#include <h5cpp/all>
#include <cstdlib>   // calloc, free

int main(){
    h5::fd_t fd = h5::create("test.h5", H5F_ACC_TRUNC);

    {   // write a 1x10 block of chars from a raw buffer
        char* ptr = static_cast<char*>( calloc(10, sizeof(char)) );
        h5::write<char>(fd, "dataset", ptr, h5::count{1,10});
        free( ptr );
    }
    {   // read a 1x8 block back, starting at offset {0,2}
        char* ptr = static_cast<char*>( calloc(10, sizeof(char)) );
        h5::read<char>(fd, "dataset", ptr, h5::count{1,8}, h5::offset{0,2} );
        free( ptr );
    }
}

If you work with the Windows OS and the Visual C++ compiler chain, follow Chris’s instructions.
As for the examples: Chris has them for Visual C++ here, or you can browse the original POSIX ones here.