I have a buffer (a char array) in memory that is essentially the rows of a CSV file.
The example CSV file is, say:
A | B | C |
---|---|---|
1 | 1.0 | matt |
2 | 2.0 | jack |
For simplicity, assume each value in column C is a char array of 5 bytes (4 ASCII characters + null terminator). The datatype of column A is uint8 (1 byte) and that of column B is float (4 bytes).
After reading the CSV, I have the following in memory (the '\n' row separator included):
1, 1.0, 'matt', '\n', 2, 2.0, 'jack'
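To spell out the layout I have in mind, here is a minimal sketch in plain C++ (no HDF5 calls). It assumes each row is packed as 1 + 4 + 5 bytes followed by a single '\n' byte, and that floats are stored in little-endian byte order. The names kOffsetA, kOffsetB, kOffsetC, kRowStride, kRows, and the read helpers are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed packed row layout: A (uint8), B (float), C (char[5]), then '\n'.
constexpr std::size_t kOffsetA = 0;                // 1 byte
constexpr std::size_t kOffsetB = 1;                // 4 bytes, not 4-byte aligned
constexpr std::size_t kOffsetC = 5;                // 4 ASCII chars + null
constexpr std::size_t kRowStride = 1 + 4 + 5 + 1;  // 11 bytes, '\n' included

// Two rows as described above (assumes a little-endian float encoding).
const unsigned char kRows[] = {
    0x01, 0x00, 0x00, 0x80, 0x3F, 'm', 'a', 't', 't', 0x00, '\n',
    0x02, 0x00, 0x00, 0x00, 0x40, 'j', 'a', 'c', 'k', 0x00};

// Column A: a single byte at the start of each row.
std::uint8_t readA(std::size_t row) {
    return kRows[row * kRowStride + kOffsetA];
}

// Column B: copied out with memcpy because the float is unaligned in the row.
float readB(std::size_t row) {
    float value;
    std::memcpy(&value, kRows + row * kRowStride + kOffsetB, sizeof value);
    return value;
}

// Column C: a null-terminated string living inside the buffer.
const char* readC(std::size_t row) {
    return reinterpret_cast<const char*>(kRows + row * kRowStride + kOffsetC);
}
```

These helpers only show where each column lives inside a row. The memcpy in readB is exactly the kind of copy I am hoping HDF5 can spare me when writing.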
What I would like is to go from that char array holding the raw bytes to an HDF5 file with 3 datasets within it, called "A", "B", and "C" respectively. I would like to do this with zero copy, going straight from buffer to the respective dataset. My gut feeling is that I should be able to do this through the mem_space argument of H5::DataSet::write by using H5::DataSpace::selectHyperslab. I just don't understand how I am supposed to use H5::DataSpace::selectHyperslab here. I also don't know if the hyperslab selection is going to be able to slice around the '\n'.
#include <H5Cpp.h>
// hex: | 1 | 1.0 | matt | \n | 2 | 2.0 | jack | (floats are little-endian)
const unsigned char buffer[] = {0x01, 0x00, 0x00, 0x80, 0x3F, 0x6D, 0x61, 0x74, 0x74, 0x00, 0x0A, 0x02, 0x00, 0x00, 0x00, 0x40, 0x6A, 0x61, 0x63, 0x6B, 0x00};
H5::H5File* file = new H5::H5File("example.h5", H5F_ACC_TRUNC);
hsize_t dims[] = {2};  // two rows, so each dataset holds 2 elements
H5::DataSpace* dataspace = new H5::DataSpace(1, dims);
H5::DataSet* A = new H5::DataSet(file->createDataSet("/A", H5::PredType::NATIVE_UINT8, *dataspace));
H5::DataSet* B = new H5::DataSet(file->createDataSet("/B", H5::PredType::NATIVE_FLOAT, *dataspace));
H5::DataSet* C = new H5::DataSet(file->createDataSet("/C", H5::PredType::NATIVE_CHAR, *dataspace));
// ??? H5::DataSpace memspace();
// ??? memspace.selectHyperslab();
A->write(buffer, H5::PredType::NATIVE_UINT8, memspace);
// ??? memspace.selectHyperslab();
B->write(buffer, H5::PredType::NATIVE_FLOAT, memspace);
// ??? memspace.selectHyperslab();
C->write(buffer, H5::PredType::NATIVE_CHAR, memspace);
My naive understanding at the moment is that I construct one H5::DataSpace instance and can simply keep calling H5::DataSpace::selectHyperslab with varying start, stride, count, and block. Honestly, I don't really grasp what the start, stride, count, and block arguments mean. I come from Python, where this is a lot easier, but I need to do this in C++. I expect to be able to do this without having to copy the contents of buffer into 3 separate arrays of length 2.
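To make my current mental model concrete, here is a plain C++ sketch (no HDF5 calls) of which element indices I believe a 1-D selection picks for a given start, stride, count, and block. The function name selectedIndices is hypothetical. The stride of 11 used in the note afterwards assumes rows packed as 1 + 4 + 5 bytes plus a single '\n'.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Indices picked by a 1-D selection: `count` blocks of `block` consecutive
// elements, where block i begins at index start + i * stride.
std::vector<std::size_t> selectedIndices(std::size_t start, std::size_t stride,
                                         std::size_t count, std::size_t block) {
    std::vector<std::size_t> indices;
    for (std::size_t i = 0; i < count; ++i) {
        for (std::size_t j = 0; j < block; ++j) {
            indices.push_back(start + i * stride + j);
        }
    }
    return indices;
}
```

If the buffer is viewed as a 1-D array of bytes, then under this model column A would be selectedIndices(0, 11, 2, 1), giving bytes 0 and 11. Column B would be selectedIndices(1, 11, 2, 4), giving bytes 1 to 4 and 12 to 15, and column C would be selectedIndices(5, 11, 2, 5), giving bytes 5 to 9 and 16 to 20. The '\n' at byte 10 is never selected. What I cannot tell is whether HDF5 would let the 4 bytes selected for B be written as a float rather than as 4 separate bytes.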
What I expect is that example.h5 contains three datasets where:
A = [1, 2]
B = [1.0, 2.0]
C = ["matt", "jack"]