C++ equivalent of Python code for HDF5: working with datasets (resize, indexes, etc.)

This post is about converting HDF5 code from Python to C++ while keeping the C++ part as simple as possible. It covers one specific part of a previous post (here), which I am splitting into separate, more specific sub-questions.

General HDF5 workflow in the program:

  • Generate a .h5 file with one or more datasets.
  • Read/write the .h5 file if it exists; otherwise, create it and then read/write.
  • Each dataset in the file contains data that comes from a variable-size 2D collection.
  • The 2D collection to insert into the dataset is returned by a function, and we don’t know its exact size in advance (at compile time). In some parts of the program we may know only the number of rows in advance, and in other parts only the number of columns. This is why the C++ code below tests different “2-dimensional” data structures: C++ offers more options than Python for these different cases.
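Since the row or column count is only known at run time, one way to feed any of these containers to HDF5 is to copy them into a single contiguous, row-major buffer first. `H5::DataSet::write()` reads from one contiguous block of memory, which `std::vector<std::array<int, 3>>` provides, but `std::vector<std::vector<int>>` and `std::array<std::vector<int>, 4>` (each row is a separate heap allocation) do not. A minimal sketch, assuming `int` data and rectangular input; `Flat2D` and `flatten_rows` are hypothetical names, not part of HDF5:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Contiguous row-major copy of a 2D collection: the memory layout that a
// 2D memory dataspace of {rows, cols} describes for H5::DataSet::write().
struct Flat2D {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<int> data;  // rows * cols elements, one row after another
};

// Accepts any container of rows whose rows support size() and begin()/end(),
// e.g. std::vector<std::vector<int>> or std::vector<std::array<int, 3>>.
template <typename Rows>
Flat2D flatten_rows(const Rows& rows_in) {
    Flat2D out;
    out.rows = rows_in.size();
    if (out.rows == 0) return out;  // empty input: 0 x 0
    out.cols = rows_in.begin()->size();
    out.data.reserve(out.rows * out.cols);
    for (const auto& row : rows_in) {
        // HDF5 datasets are rectangular, so reject ragged input.
        if (row.size() != out.cols)
            throw std::invalid_argument("all rows must have the same length");
        out.data.insert(out.data.end(), row.begin(), row.end());
    }
    return out;
}
```

`flat.data.data()` can then be passed to `write()` together with a 2D dataspace of `{flat.rows, flat.cols}`. The extra copy is the price for accepting any row container.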

Main question for this post:

What is the C++ equivalent of the following Python section (full code below)?

f[dset_name].resize(f[dset_name].shape[0] + a_2d.shape[0], axis=0)
f[dset_name][-a_2d.shape[0]:] = a_2d
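In these two lines, `resize(..., axis=0)` grows the first dimension, and `[-n:]` selects the last n rows, i.e. the rows that were just added. As far as I understand the C++ API, this maps to `DataSet::extend()` with the new full dimensions, followed by `DataSpace::selectHyperslab()` on the dataset's file dataspace with start `{old_rows, 0}` and count `{n, cols}`. The index arithmetic alone, as a hedged sketch with hypothetical names (`AppendPlan`, `plan_append_rows`):

```cpp
#include <array>

// HDF5's hsize_t is an unsigned 64-bit integer type; a local alias keeps
// this sketch independent of H5Cpp.h.
using hsize = unsigned long long;

struct AppendPlan {
    std::array<hsize, 2> new_dims;  // argument for DataSet::extend()
    std::array<hsize, 2> start;     // hyperslab offset in the file dataspace
    std::array<hsize, 2> count;     // hyperslab size == memory dataspace dims
};

// current_dims = dataset shape before appending, n_new_rows = a_2d.shape[0]
AppendPlan plan_append_rows(const std::array<hsize, 2>& current_dims, hsize n_new_rows) {
    AppendPlan p;
    p.new_dims = {current_dims[0] + n_new_rows, current_dims[1]};  // resize(shape[0] + n, axis=0)
    p.start    = {current_dims[0], 0};                             // first new row == old row count
    p.count    = {n_new_rows, current_dims[1]};                    // [-n:] covers all columns
    return p;
}
```

For the first write into the `(0, 3)` dataset with 4 rows, this gives new dimensions `{4, 3}`, start `{0, 0}` and count `{4, 3}`; a second append of 4 rows gives `{8, 3}`, `{4, 0}` and `{4, 3}`.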

Python code to convert to C++:

import numpy as np
import h5py


file_name = "f_1"
dset_name = "dset_1"

# list-of-lists representing a function call that returns a list-of-lists with variable size.
l_2d = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]

f = h5py.File(f"{file_name}.h5", "a")
# f.flush()

# Create dataset
if dset_name not in f.keys():
  f.create_dataset(dset_name, (0, 3), maxshape=(None, 3), dtype="i")
  # f.flush()

# Write data
a_2d = np.array(l_2d)

f[dset_name].resize(f[dset_name].shape[0] + a_2d.shape[0], axis=0)
f[dset_name][-a_2d.shape[0]:] = a_2d

f.flush()
f.close()

C++ code (progress so far):

#include <array>
#include <vector>
#include "H5Cpp.h"

const H5std_string FILE_NAME("f_1.h5");
const H5std_string DATASET_NAME("dset_1");

int main()
{
  // Tests with different data structures
  // int                               a_2d[4][3] = {{1, 1, 1}, {2, 2, 2}, {3, 3, 3}, {4, 4, 4}};
  // std::array<std::array<int, 3>, 4> a_2d{{ {1, 1, 1}, {2, 2, 2}, {3, 3, 3}, {4, 4, 4} }};
  std::vector<std::array<int, 3>>      a_2d{{1, 1, 1}, {2, 2, 2}, {3, 3, 3}, {4, 4, 4}};
  // std::array<std::vector<int>, 4>   a_2d{{ {1, 1, 1}, {2, 2, 2}, {3, 3, 3}, {4, 4, 4} }};
  // std::vector<std::vector<int>>     a_2d{{1, 1, 1}, {2, 2, 2}, {3, 3, 3}, {4, 4, 4}};
  
  try
  {
    H5::Exception::dontPrint();  // `Exception` lives in namespace H5
    
    H5::H5File f(FILE_NAME, H5F_ACC_TRUNC);  // AFAIK `H5F_ACC_TRUNC` wouldn't be the equivalence for the `"a"` mode in `h5py.File(f"{file_name}", "a")`
    
    
    hsize_t dims[2];
    dims[0] = 4;  // a_2d.size();
    dims[1] = 3;  // a_2d[0].size();
    H5::DataSpace dspace(2, dims);
    
    H5::DataSet dset = f.createDataSet(DATASET_NAME, H5::PredType::NATIVE_INT32, dspace);  // `H5::PredType::STD_I32BE`
    
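    // dset.write(a_2d, H5::PredType::NATIVE_INT32);     // for C-style arrays
    dset.write(a_2d.data(), H5::PredType::NATIVE_INT32); // for contiguous containers such as std::vector<std::array<int, 3>>;
                                                         // std::vector<std::vector<int>> is not contiguous and would need flattening first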
    
    dspace.close();  // After checking some HDF5 C++ API examples on GitHub,
    dset.close();    // it is not clear exactly what is needed regarding
                     // `close` and `flush`
    
    // f.flush();  // Not tested because it asks for `H5F_scope_t scope`
    f.close();
    
  }

  catch (const H5::FileIException& error)
  {
    error.printErrorStack();
    return -1;
  }

  catch (const H5::DataSetIException& error)
  {
    error.printErrorStack();
    return -1;
  }

  catch (const H5::DataSpaceIException& error)
  {
    error.printErrorStack();
    return -1;
  }

  return 0; // successfully terminated
}


About the HDF5 C++ API

I’m new to HDF5. When searching online for the HDF5 C++ API documentation (this and other parts of https://support.hdfgroup.org), for code examples from the documentation, and for HDF5 information in general (another example here), I notice that the content about the C++ API is not always as complete as the content about the HDF5 C API. As a mere illustration: if you compare the same documentation example in C and in Python with the equivalent C++ example, the C++ example doesn’t contain the parts that close the dataspace and close the file. I mention this mainly to highlight that it would be great if the documentation examples had a 1:1 equivalence across languages, covering basic aspects that appear in this post, such as the C and C++ equivalent of the Python “a” mode (“Read/write if exists, create otherwise”) or when to use flush.