G'day,
I'm using 24 bit integers with HDF5 but finding the performance very poor.
I'm new to HDF5. I'm evaluating the format for use with a radio telescope.
The telescope will produce about 7TB per 12 hours of raw data, so space and
write efficiency are important.
The telescope system produces 24 bit signed integers. If we convert to 32
bits our files will grow by 33%, i.e. to about 9.3TB instead of 7TB.
I successfully wrote 24 bit integers with HDF5. Unfortunately the writing is
very inefficient: the CPU sits at 100% and the write rate (in integers per
second) is about five times slower than with 32 bit ints, even though the
file is smaller. The file writing is CPU-bound: the disk is hardly working
at all, unlike with 32 bits where it's disk-bound.
It appears that the library's conversion from 32 bits in memory to 24 bits
in the file is very inefficient. When I do the same conversion in my own
code it is fast enough that the writing stays disk-bound. In all cases I'm
using little-endian as my platform is Intel.
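To illustrate the kind of hand-written conversion meant here, a minimal
sketch follows. It is not the code from the telescope system: the names
pack24_le and unpack24_le are made up for illustration, and it assumes that
every value already fits in the signed 24 bit range and that the output is
little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack n 32-bit signed integers into 3 bytes each, least significant byte
 * first. Assumption: every input fits in 24 bits signed, and `out` has room
 * for 3 * n bytes. The assert is compiled out when NDEBUG is defined. */
static void pack24_le(const int32_t *in, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(in[i] >= -8388608 && in[i] <= 8388607);
        uint32_t u = (uint32_t)in[i];          /* two's complement bits */
        out[3 * i + 0] = (uint8_t)(u & 0xFFu);
        out[3 * i + 1] = (uint8_t)((u >> 8) & 0xFFu);
        out[3 * i + 2] = (uint8_t)((u >> 16) & 0xFFu);
    }
}

/* Inverse of pack24_le: rebuild each 32-bit value and sign-extend bit 23. */
static void unpack24_le(const uint8_t *in, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t u = (uint32_t)in[3 * i + 0]
                   | ((uint32_t)in[3 * i + 1] << 8)
                   | ((uint32_t)in[3 * i + 2] << 16);
        /* Flip the sign bit, then subtract its weight: portable sign extension. */
        out[i] = (int32_t)(u ^ 0x800000u) - 0x800000;
    }
}
```

The loop touches each value once and does only shifts and masks, which is
why a conversion of this kind would be expected to keep up with the disk.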
Is it possible to tune HDF5 to write 24 bit integers more efficiently? I've
included code snippets below.
Cheers,
Jay.
void write_integration_hdf5(struct cmac_cells* cmac,
                            struct corr_packet_header* header, hid_t f) {
    static hid_t datatype = -1;
    static hid_t dataspace = -1;
    // We use either 32 or 24 bit types for values, according to args.
    // 24 bit is custom made.
    if (datatype < 0) {
        if (arg_24bits) {
            // Define an HDF5 custom type for 24 bit int, little-endian
            // (actually, native)
            datatype = H5Tcopy(H5T_NATIVE_INT32);
            // Check the copy before resizing it, and check the resize too
            if (datatype < 0 || H5Tset_size(datatype, 3) < 0) { // 3 bytes
                fprintf(stderr, "Error creating HDF5 24 bit int type\n");
                exit(-1);
            }
        } else {
            datatype = H5T_NATIVE_INT32;
        }
    }
    // Define the HDF5 dataspace, ie the rank and size of the dataset array
    if (dataspace < 0) {
        int rank = 2; // baseline x re/im
        hsize_t dims[] = {NUM_CMAC_CELLS*NUM_VIS_PER_CELL, 2};
        dataspace = H5Screate_simple(rank, dims, dims);
        if (dataspace < 0) {
            fprintf(stderr, "Error creating HDF5 dataspace\n");
            exit(-1);
        }
    }
    // Add a dataset
    char name[50];
    sprintf(name, "INT%d_FREQ%d", header->integration_num, header->frequency);
    hid_t dataset = H5Dcreate(f, name, datatype, dataspace, H5P_DEFAULT);
    if (dataset < 0) {
        fprintf(stderr, "Error creating dataset %s\n", name);
        exit(-1);
    }
    // Write the vis data to the new dataset
    const char* buffer = (const char*)cmac->cells;
    if (H5Dwrite(dataset, H5T_NATIVE_INT32, H5S_ALL, H5S_ALL, H5P_DEFAULT,
                 buffer) < 0) {
        fprintf(stderr, "Error writing to dataset %s\n", name);
        exit(-1);
    }
    // Close the dataset
    H5Dclose(dataset);
}