H5Dread fails on preallocated array

Hi,

I am having an issue when reading a dataset if I use malloc to allocate memory. Apologies in advance: I am not very proficient in C and am just starting to learn HDF5. Unfortunately, I cannot use Python because of legacy software.

Let me explain.

This works with a small dataset:

double dset_data[num_timesteps][num_cells];

file_id = H5Fopen(filename, H5F_ACC_RDWR, H5P_DEFAULT);
group_id = H5Gopen2(file_id, discipline, H5P_DEFAULT);
dataset_id = H5Dopen2(group_id, variable, H5P_DEFAULT);
status_read = H5Dread(dataset_id, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);

and I can read the values from dset_data.

However, my actual dataset is quite large (8760 x 25000), so declaring double dset_data[num_timesteps][num_cells] as a local array fails.

To avoid this issue, I use malloc to allocate memory for the dset_data array, using the following code:

int total_elements = num_timesteps * num_cells;
double *array = (double *)malloc(total_elements * sizeof(double));
for (int i = 0; i < total_elements; i++)
{
  array[i] = 0.0;
}

double **dset_data = (double **)malloc(num_timesteps * sizeof(double *));
for (int i = 0; i < num_timesteps; i++)
{
  dset_data[i] = &array[i * num_cells];
}

(The array is created correctly)

In this situation H5Dread fails (I get status_read == 0. Is there a way to see the actual error?), and when I try to read the values from dset_data I get a segmentation fault.

It might be a very silly mistake, but hopefully you can help to shed some light on what I am doing wrong.

Thanks

Hi @ruggiero,

Since you are not accustomed to working with C, you may want to check out HDFql, as it (greatly) lowers the complexity of dealing with HDF5 in that language. Looking at the posted code, your use case could be solved as follows in C using HDFql:

// declare variables
double *dset_data;
int i;

// allocate memory and assign it to variable 'dset_data'
dset_data = (double *) malloc(num_timesteps * num_cells * sizeof(double));

// register the buffer that 'dset_data' points to for subsequent use (by HDFql)
hdfql_variable_transient_register(dset_data);

// read dataset 'my_dataset' (from HDF5 file 'my_file.h5') and populate variable 'dset_data'
hdfql_execute("SELECT FROM my_file.h5 my_dataset INTO MEMORY 0");

// print content of variable 'dset_data'
for(i = 0; i < num_timesteps * num_cells; i++)
{
    printf("Value=%f\n", *(dset_data + i));
}

Hope it helps!

Thanks @contact

That would be a good idea. I downloaded it, but unfortunately I need the Apple Silicon version and it does not seem to be available.

I think I understand now what the problem was.

If I allocate dset_data with

double *dset_data;
dset_data = (double *)malloc(total_elements * sizeof(double));

everything works fine. The only difference is that I cannot use dset_data[i][j] to reference elements; instead I need to use row-major indexing: dset_data[i * num_cells + j].
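
For reference, here is a minimal sketch of that row-major indexing. The helper names flat_index and get_cell are hypothetical (they are not part of HDF5), and size_t is used on the assumption that the index arithmetic should not overflow for large shapes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: flat offset of element (row, col) in a row-major
 * array that has num_cols columns per row. */
static size_t flat_index(size_t row, size_t col, size_t num_cols)
{
    assert(col < num_cols);
    return row * num_cols + col;
}

/* Hypothetical helper: read element (row, col) from a flat row-major buffer. */
static double get_cell(const double *data, size_t row, size_t col,
                       size_t num_cols)
{
    return data[flat_index(row, col, num_cols)];
}
```

With this, get_cell(dset_data, i, j, num_cells) reads the same element as dset_data[i * num_cells + j].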

With the array allocated as I was doing before, I was passing H5Dread the array of row pointers (the double **) rather than a contiguous block of doubles, which is what, I assume, H5Dread expects.
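
If the dset_data[i][j] syntax is still wanted, one possible approach is to keep the table of row pointers but hand the reader the start of the contiguous block (dset_data[0]) instead of the table itself. The sketch below illustrates this without HDF5: fake_read is a hypothetical stand-in that, like H5Dread, writes consecutive doubles into the buffer it is given, and alloc_2d and free_2d are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for H5Dread: writes n consecutive doubles into buf. */
static void fake_read(void *buf, size_t n)
{
    double *out = buf;
    for (size_t k = 0; k < n; k++)
        out[k] = (double)k;
}

/* Allocate one contiguous, zero-initialised block of doubles plus a table
 * of row pointers into it. rows[0] is the start of the contiguous block.
 * Returns NULL if either allocation fails. */
static double **alloc_2d(size_t num_rows, size_t num_cols)
{
    assert(num_rows > 0 && num_cols > 0);
    double *block = calloc(num_rows * num_cols, sizeof *block);
    double **rows = malloc(num_rows * sizeof *rows);
    if (block == NULL || rows == NULL) {
        free(block);
        free(rows);
        return NULL;
    }
    for (size_t i = 0; i < num_rows; i++)
        rows[i] = block + i * num_cols;
    return rows;
}

/* Release both the contiguous block and the row-pointer table. */
static void free_2d(double **rows)
{
    if (rows != NULL) {
        free(rows[0]);
        free(rows);
    }
}
```

After double **dset_data = alloc_2d(num_timesteps, num_cells), calling fake_read(dset_data[0], num_timesteps * num_cells) fills the block, and dset_data[i][j] then reads element (i, j). In the real program the same pointer, dset_data[0], would be the last argument of H5Dread. Regarding the earlier question about errors: H5Dread returns a negative value on failure, so a status of 0 does not indicate an error, and H5Eprint2(H5E_DEFAULT, stderr) prints the HDF5 error stack when a call does fail.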
