*Version:* 1.8.1, built with -D H5_USE_16_API since I couldn't find examples
that are compatible with the 1.8 API
*Hardware:* 3.2GHz Xeon with 1GB RAM
*OS:* Linux 2.6.24-19 (64 bit)
*Compiler:* gcc (compiled using h5cc script)
I am just starting out with HDF5 and I would like to know the most efficient
way to write a large number of rows of compound data. I combined the examples
for a compound dataset and an extendible dataset to create a program that
writes one row of data at a time. The program (given below) performs very
poorly: it takes about 3 real seconds to process 100K records, and when I run
it on 1MM rows it brings down my machine. I also tried the example that uses
the Table High Level API; it takes slightly more than a minute (90 sec) to
write 1MM rows but at least succeeds. I also created a similar program using
PyTables, and that one finishes in 1.8 seconds. I looked through the Table API
code in PyTables and it appears to be somewhat similar to the H5TB code, but
PyTables' comments say it is a stripped-down version, so I am not sure why its
performance is so much better. I have provided my code below; could someone
suggest the best way to construct the program so that it performs at least
slightly better than PyTables? I read about chunking and its impact on I/O
performance but could not figure out what I need to change - I did play with
chunk_dims but saw no impact.
My dataset has the following properties:
1. It has a fixed structure (number of fields is fixed)
2. Number of rows is unknown and data will be read one row at a time.
3. There is no requirement to read/write more than one row at any given
time.
Thanks very much in advance for your help!
-SK
*Code:*
#include "hdf5.h"
#define FILE "SDScompound.h5"
#define DATASETNAME "ArrayOfStructures"
#define LENGTH 1
#define RANK 1
#define ITER 100000
int main(void) {
/* First structure and dataset*/
typedef struct s1_t {
int a;
float b;
double c;
} s1_t;
s1_t s1[LENGTH];
hid_t s1_tid; /* File datatype identifier */
int i;
hid_t file, dataset, space, filespace, cparms; /* Handles */
herr_t status;
hsize_t dim[] = { LENGTH }; /* Dataspace dimensions */
hsize_t offset[LENGTH], size[LENGTH]; /* Hyperslab offset and current dataset size */
hsize_t maxdims[] = { H5S_UNLIMITED };
hsize_t chunk_dims[] = { 5 }; // what is the best number if I am reading/writing one row at a time?
/*
* Initialize the data
*/
s1[0].a = 10; // LENGTH is 1, so initialize the single row directly (i is not yet set here)
s1[0].b = 15;
s1[0].c = 99.99;
/*
* Create the data space.
*/
space = H5Screate_simple (RANK, dim, maxdims);
/*
* Create the file.
*/
file = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
/* Modify dataset creation properties, i.e. enable chunking */
cparms = H5Pcreate (H5P_DATASET_CREATE);
status = H5Pset_chunk ( cparms, RANK, chunk_dims);
/*
* Create the memory data type.
*/
s1_tid = H5Tcreate (H5T_COMPOUND, sizeof(s1_t));
H5Tinsert(s1_tid, "a_name", HOFFSET(s1_t, a), H5T_NATIVE_INT);
H5Tinsert(s1_tid, "c_name", HOFFSET(s1_t, c), H5T_NATIVE_DOUBLE);
H5Tinsert(s1_tid, "b_name", HOFFSET(s1_t, b), H5T_NATIVE_FLOAT);
/*
* Create the dataset.
*/
dataset = H5Dcreate(file, DATASETNAME, s1_tid, space, cparms);
/* Extend the dataset to the orig dimension */
size[0] = dim[0];
status = H5Dextend (dataset, size);
/* Select a hyperslab */
filespace = H5Dget_space (dataset);
offset[0] = 0;
status = H5Sselect_hyperslab (filespace, H5S_SELECT_SET, offset, NULL, dim, NULL);
/* Write the data to the hyperslab */
status = H5Dwrite (dataset, s1_tid, space, filespace, H5P_DEFAULT, s1);
H5Sclose (filespace); // release the file dataspace before re-acquiring it in the loop
for (i = 0; i < ITER; ++i) {
/* Extend the dataset. Add one more row */
++size[0]; // increase the row count by 1
status = H5Dextend (dataset, size);
/* Select a hyperslab */
filespace = H5Dget_space (dataset);
offset[0] = size[0] - 1; // offset starts at 0
status = H5Sselect_hyperslab (filespace, H5S_SELECT_SET, offset, NULL, dim, NULL);
/* Reuse the memory dataspace created above instead of creating a new one each pass */
status = H5Dwrite (dataset, s1_tid, space, filespace, H5P_DEFAULT, s1);
H5Sclose (filespace); // close the per-iteration dataspace; leaking 100K of these exhausts memory
// status = H5Fflush(file, H5F_SCOPE_GLOBAL); // program still brings down the system w/ or w/o flush
}
/*
* Release resources
*/
H5Tclose(s1_tid);
H5Sclose(space);
H5Pclose(cparms);
H5Dclose(dataset);
H5Fclose(file);
return 0;
}