Having difficulty getting multiple columns in HDF5 Table Data

I am new to HDF5 and am trying to store a DataFrame in the HDF5 format. I want to append rows at different points in the file; however, every time I append, the row shows up as an array in a single column rather than as single values in multiple columns. I have tried both h5py and pandas, and pandas seems to be the better option for appending. I have tried a lot of different methods, so any help would be greatly appreciated.

Here is my code, which writes the same one-row DataFrame to the HDF5 file several times:

import pandas as pd
import numpy as np
data = np.zeros((1,48), dtype = float)

columnName = ['Hello'+str(y) for (x,y), item in np.ndenumerate(data)]
df = pd.DataFrame(data=data, columns=columnName)

file = pd.HDFStore('file.hdf5', mode='a', complevel=9, complib='blosc')
for x in range(0, 11):
    file.put('/data', df, column_data=columnName, append=True, format='table')
file.close()

Just as a reminder: a DataFrame is an abstraction; when it comes to storing the data, there are two distinct ways to go about it:

  • homogeneous: cells of the same type and size sit next to each other, possibly aligned or packed, i.e. a matrix
  • non-homogeneous: adjacent cells may differ in length or type, i.e. a record/struct
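To make the two layouts concrete in the question's language, here is a minimal NumPy sketch (NumPy only, no HDF5 yet). The field names `id`, `price` and `tag` are hypothetical; the point is that a matrix is one uniform buffer addressed by strides, while a record packs fields of different types and widths side by side, like a C struct.

```python
import numpy as np

# Homogeneous layout (matrix): every cell is a float64, so a row is just
# 4 cells * 8 bytes and a column step is 8 bytes in one contiguous buffer.
matrix = np.zeros((3, 4), dtype=np.float64)
print(matrix.strides)

# Non-homogeneous layout (record/struct): each row packs fields of different
# types and sizes next to each other. Without align=True NumPy packs them:
# 4 (int32) + 8 (float64) + 5 (5-byte string) = 17 bytes per row.
record_t = np.dtype([("id", np.int32), ("price", np.float64), ("tag", "S5")])
records = np.zeros(3, dtype=record_t)
records[0] = (1, 9.5, b"alpha")
print(record_t.itemsize)

# A single field can still be viewed across all rows (a strided "column").
print(records["price"])
```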

DataFrames may be laid out either way: column by column or row by row. In the latter case, all you have to do is create a rank-1 dataset (a vector) of a compound datatype in HDF5. If your data model is the former, create as many vectors as there are columns and handle the bookkeeping yourself (for example, keeping all the vectors the same length as you append).
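A minimal h5py sketch of both options, assuming h5py and NumPy are installed. The file name `layouts.h5`, the dataset names `by_row` and `by_column/...`, the three-field schema, the chunk size and the `append` helper are all illustrative choices, not part of any library API. Each dataset is created empty with an unlimited first dimension (`maxshape=(None,)`), which HDF5 only allows for chunked datasets, and grows with `resize` on every append.

```python
import numpy as np
import h5py

# Hypothetical schema: three columns of mixed types, as a compound datatype.
row_t = np.dtype([("id", np.int32), ("price", np.float64), ("qty", np.int16)])

def append(ds, values):
    """Grow a 1-D extendible dataset along axis 0 and write `values` at the end."""
    start = ds.shape[0]
    ds.resize((start + len(values),))
    ds[start:] = values

rows = np.array([(1, 9.5, 3), (2, 7.25, 10)], dtype=row_t)

with h5py.File("layouts.h5", "w") as f:
    # Row by row: one rank-1 dataset whose element type is the compound (struct).
    table = f.create_dataset("by_row", shape=(0,), maxshape=(None,),
                             dtype=row_t, chunks=(1024,),
                             compression="gzip", compression_opts=9)
    append(table, rows)

    # Column by column: one rank-1 dataset per column. Keeping them the same
    # length is up to the caller, so every column is appended together.
    grp = f.create_group("by_column")
    for name in row_t.names:
        grp.create_dataset(name, shape=(0,), maxshape=(None,),
                           dtype=row_t[name], chunks=(1024,))
    for name in row_t.names:
        append(grp[name], np.ascontiguousarray(rows[name]))

with h5py.File("layouts.h5", "r") as f:
    print(f["by_row"][:])           # structured array, one record per row
    print(f["by_column/price"][:])  # plain float64 vector
```

The compound layout keeps each row's fields together, so appending a row is a single write; the per-column layout makes reading one column cheap but turns every append into one write per column.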

Both storage methods have their pros and cons; it is the architect's call to make the right choice. In any event, I can't speak for the Python implementation, but here is an example for C++:

// excerpt from h5cpp/examples/packet-table
 // SCALAR: pod 
    try { // centrally used error handling
        std::vector<sn::example::Record> stream = h5::utils::get_test_data<sn::example::Record>(127);
        // implicit conversion from h5::ds_t to h5::pt_t makes it a breeze to create
        // packet_table from h5::open | h5::create calls,
        // The newly created h5::pt_t stateful container caches the incoming data until
        // the bucket is filled. IO operations happen at h5::chunk boundaries
        // or when resource is released. Last partial chunk handled as expected.
        //
        // compiler assisted introspection generates boilerplate, developer 
        // can focus on the idea, leaving boring details to machines 
        h5::pt_t pt = h5::create<sn::example::Record>(fd, "stream of struct",
                 h5::max_dims{H5S_UNLIMITED,7}, h5::chunk{4,7} | h5::gzip{9} );
        for( auto record : stream )
            h5::append(pt, record);
    } catch ( const h5::error::any& e ){
        std::cerr << "ERROR:" << e.what();
    }
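For readers working in Python, the packet-table behaviour described in the comments of the excerpt (cache records, write whole chunks, flush the last partial chunk when the resource is released) can be approximated with h5py. This is a hypothetical analogue, not part of H5CPP or h5py: the class name `PacketTable`, the two-field `record_t` standing in for `sn::example::Record`, and the file name `stream.h5` are assumptions.

```python
import numpy as np
import h5py

class PacketTable:
    """Hypothetical h5py analogue of h5::pt_t: buffers records in memory and
    writes one whole chunk at a time; the last partial chunk is flushed on exit."""

    def __init__(self, group, name, dtype, chunk=4, **filters):
        self.ds = group.create_dataset(name, shape=(0,), maxshape=(None,),
                                       dtype=dtype, chunks=(chunk,), **filters)
        self.buf = np.empty(chunk, dtype=dtype)  # one chunk worth of records
        self.n = 0                               # records currently buffered

    def append(self, record):
        self.buf[self.n] = record
        self.n += 1
        if self.n == len(self.buf):  # bucket filled: write exactly one chunk
            self.flush()

    def flush(self):
        if self.n:
            start = self.ds.shape[0]
            self.ds.resize((start + self.n,))
            self.ds[start:] = self.buf[:self.n]
            self.n = 0

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.flush()  # handle the last partial chunk

# Hypothetical stand-in for sn::example::Record.
record_t = np.dtype([("id", np.int64), ("value", np.float64)])

with h5py.File("stream.h5", "w") as f:
    with PacketTable(f, "stream of struct", record_t, chunk=4,
                     compression="gzip", compression_opts=9) as pt:
        for i in range(127):
            pt.append((i, i * 0.5))
```

With 127 records and a chunk of 4, this performs 31 full-chunk writes plus one final write of the remaining 3 records when the `with` block ends.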

You will find the full examples on the H5CPP project website or on my GitHub page.
You should also be able to find other examples for compound datatypes there. For the homogeneous arrangement, use std::vector or any of the linear algebra objects supported by H5CPP.
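The homogeneous arrangement has a direct h5py counterpart as well. Below is a minimal sketch, assuming h5py and NumPy; the file name `matrix.h5`, the dataset name `data` and the chunk shape are illustrative. It stores the question's 48 float columns as a rank-2 float64 dataset with an unlimited number of rows, so each append adds one row of 48 separate values.

```python
import numpy as np
import h5py

n_cols = 48                  # same width as the question's (1, 48) DataFrame
row = np.zeros((1, n_cols))  # one row of float64 values

with h5py.File("matrix.h5", "w") as f:
    # Rank-2 dataset: unlimited rows, fixed column count, chunked so it can grow.
    ds = f.create_dataset("data", shape=(0, n_cols), maxshape=(None, n_cols),
                          dtype="f8", chunks=(64, n_cols),
                          compression="gzip", compression_opts=9)
    for _ in range(11):  # append the same row 11 times, as in the question
        start = ds.shape[0]
        ds.resize(start + len(row), axis=0)  # grow along the row axis only
        ds[start:] = row
```

Resizing once per row keeps the sketch simple; for large streams, buffering rows and appending them in blocks reduces the number of resize and write calls.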

best: steve