Append to compound dataset

Hello All,

I am new to HDF5 and am using HDF5 version 1.8.16.

I have successfully created a compound dataset using the C API and then closed the HDF5 file. I would like to be able to open this HDF5 file up and add another record/row to the existing compound dataset. Is this possible and if so is there an example of how to do this with the C API?

Thank you for any help.

Hi Richard,

you might be interested in this C++ implementation, which comes with an h5::append operator. The performant C++ code is based on HDF5's direct chunk write. You can find relevant examples on my GitHub page.

The presentation decks are here.

best wishes: steven

Here is a “starter kit.”

Homework: Run the program, then write your own that appends another 5,000 elements to the "(4-byte) integers" dataset. (Extra credit: Pretend you do not know how many elements the dataset you're about to extend contains.)

G.

#include "hdf5.h"

#include <stdlib.h>

int main()
{
  __label__ fail_file, fail_dspace, fail_dcpl, fail_dset, fail_extent;

  int retval = EXIT_SUCCESS;

  hid_t file, dspace, dcpl, dset;

  if ((file = H5Fcreate("foo.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT)) ==
      H5I_INVALID_HID) {
    retval = EXIT_FAILURE;
    goto fail_file;
  }

  // create a 1D dataspace of indefinite extent, initial extent 0 (elements)
  if ((dspace = H5Screate_simple(1, (hsize_t[]){0}, (hsize_t[]){H5S_UNLIMITED}))
      == H5I_INVALID_HID) {
    retval = EXIT_FAILURE;
    goto fail_dspace;
  }

  // allocate space in the file in batches of 1024 dataset elements
  if ((dcpl = H5Pcreate(H5P_DATASET_CREATE)) == H5I_INVALID_HID) {
    retval = EXIT_FAILURE;
    goto fail_dcpl;
  }
  if (H5Pset_chunk(dcpl, 1, (hsize_t[]){1024}) < 0) {
    retval = EXIT_FAILURE;
    goto fail_dset;
  }

  // create the dataset
  // (replace H5T_STD_I32LE with your favorite datatype)
  if ((dset = H5Dcreate(file, "(4-byte) integers", H5T_STD_I32LE, dspace,
                        H5P_DEFAULT, dcpl, H5P_DEFAULT)) ==
      H5I_INVALID_HID) {
    retval = EXIT_FAILURE;
    goto fail_dset;
  }

  // grow from here!

  // "add one row"
  if (H5Dset_extent(dset, (hsize_t[]){1}) < 0) {
    retval = EXIT_FAILURE;
    goto fail_extent;
  }

  // "add 99 more rows"
  // 100 = 1 + 99
  if (H5Dset_extent(dset, (hsize_t[]){100}) < 0) {
    retval = EXIT_FAILURE;
    goto fail_extent;
  }

  // you can also shrink the dataset...

fail_extent:
  H5Dclose(dset);
fail_dset:
  H5Pclose(dcpl);
fail_dcpl:
  H5Sclose(dspace);
fail_dspace:
  H5Fclose(file);
fail_file:

  return retval;
}

Thank you @Steven for the reply and links.

However, for reasons beyond my control I have to stay within the confines of the C API for HDF5.

Hi @richard.haney

Yes, it is possible to create a compound dataset, write data to it, and append data to it at a later stage. To achieve this, the dataset needs to be extendible (as you most probably know) and the writing done with the help of either a hyperslab or a point selection. To illustrate, here is a C program using HDFql that does this:

#include <stdlib.h>
#include <stdio.h>
#include <stddef.h>   // offsetof
#include "HDFql.h"

struct data
{
    float latitude;
    float longitude;
    int value;
};

int main(int argc, char *argv[])
{
    // declare variables
    struct data element;
    char script[256];   // large enough for the longest script below

    // create an HDF5 file named 'example.h5' and use (i.e. open) it
    hdfql_execute("CREATE AND USE FILE example.h5");

    // prepare script that creates an extendible (of unlimited size) dataset named 'dset' of type compound with three members ('latitude', 'longitude' and 'value')
    sprintf(script, "CREATE DATASET dset AS COMPOUND(latitude AS FLOAT OFFSET %d, longitude AS FLOAT OFFSET %d, value AS INT OFFSET %d)(0 TO UNLIMITED) SIZE %d", (int) offsetof(struct data, latitude), (int) offsetof(struct data, longitude), (int) offsetof(struct data, value), (int) sizeof(struct data));

    // execute script
    hdfql_execute(script);
    
    // populate variable 'element' with some data
    element.latitude = 48.856613;
    element.longitude = 2.352222;
    element.value = 15;

    // extend dataset 'dset' by one unit (i.e. add a new row)
    hdfql_execute("ALTER DIMENSION dset TO +1");

    // prepare script that writes variable 'element' in the last row of dataset 'dset' using a point selection
    sprintf(script, "INSERT INTO dset(-1) VALUES FROM MEMORY %d", hdfql_variable_transient_register(&element));

    // execute script
    hdfql_execute(script);

    // close HDF5 file in use (i.e. example.h5)
    hdfql_execute("CLOSE FILE");

    // use (i.e. open) HDF5 file 'example.h5'
    hdfql_execute("USE FILE example.h5");

    // populate variable 'element' with some data
    element.latitude = 52.520008;
    element.longitude = 13.404954;
    element.value = 17;

    // extend dataset 'dset' by one unit (i.e. add a new row)
    hdfql_execute("ALTER DIMENSION dset TO +1");

    // prepare script that writes variable 'element' in the last row of dataset 'dset' using a point selection
    sprintf(script, "INSERT INTO dset(-1) VALUES FROM MEMORY %d", hdfql_variable_transient_register(&element));

    // execute script
    hdfql_execute(script);

    // close HDF5 file in use (i.e. example.h5)
    hdfql_execute("CLOSE FILE");

    return EXIT_SUCCESS;
}

In case performance is a concern, you may want to write the compound data (i.e. the chunk) directly by using the keyword DIRECTLY. This bypasses several internal processing steps of the HDF5 library itself (e.g. datatype conversion, the filter pipeline), which can lead to much faster writes. In other words:

sprintf(script, "INSERT DIRECTLY INTO dset(-1) VALUES FROM MEMORY %d", hdfql_variable_transient_register(&element));

Is calling HDFql from C legal or cheating? :thinking:
