Hi forum,
I apologize in advance for the length of this message, but the devil is in
the details. I've scoured the archives and have had trouble finding anything
similar to my problem in terms of scale.
I am using an analog-to-digital acquisition device which delivers 16-bit
integers in 2D row-major contiguous blocks. The row (time) dimension can be
128, 256, 512, or 1024, and the column (space) dimension can be any integer
between 1000 and 50000. This means the largest block delivered by the device
is 1024 * 50000 * 16 / 8 / 1024 / 1024, approximately 97.66 megabytes, to be
written to disk per block. I receive roughly 3 of these blocks per second,
which translates to approximately 300 MB/s. For the smaller blocks, I simply
receive more of them per second, again adding up to roughly 300 MB/s.
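For reference, here's that arithmetic as a quick self-contained sanity
check:

// Quick sanity check of the block-size and throughput figures above.
#include <cstdint>
#include <cstdio>

int main() {
    const std::uint64_t rows = 1024;   // largest time dimension per block
    const std::uint64_t cols = 50000;  // largest space dimension
    const std::uint64_t bytes = 2;     // one 16-bit sample
    const double block_mib =
        static_cast<double>(rows * cols * bytes) / (1024.0 * 1024.0);
    std::printf("block size: %.2f MiB\n", block_mib);         // ~97.66
    std::printf("throughput: %.1f MiB/s\n", 3.0 * block_mib); // ~293
}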
Since we are acquiring data for an unknown amount of time, the row dimension
is unlimited; say the space dimension is 50000:
const std::string name = data_name;
const hsize_t dims[2] = {0, 50000};
const hsize_t maxdims[2] = {H5S_UNLIMITED, 50000};
const hsize_t time_count = 1024;
const hsize_t chunk_dims[2] = {time_count, 50000};
const size_t chunk_size = chunk_dims[0] * chunk_dims[1];

// A chunked layout is required for the unlimited dimension.
auto create_plist = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_chunk(create_plist, 2, chunk_dims);

auto type = get_type(elementDesc, 0); // helper returning the HDF5 data type

// Disable the chunk cache entirely (discussed below).
auto access_plist = H5Pcreate(H5P_DATASET_ACCESS);
const size_t rdcc_nbytes = 0;
const size_t rdcc_nslots = 0;
const double rdcc_w0 = 1;
H5Pset_chunk_cache(access_plist, rdcc_nslots, rdcc_nbytes, rdcc_w0);

if (use_szip_filter()) {
    const size_t options_mask = H5_SZIP_NN_OPTION_MASK;
    const size_t pixels_per_block = 16u;
    H5Pset_szip(create_plist, options_mask, pixels_per_block);
}

auto datatype = H5Tcopy(std::get<1>(type));
H5Pset_fill_value(create_plist, datatype, NULL);
auto dataspace = H5Screate_simple(2, dims, maxdims);
auto dataset = H5Dcreate(file, name.c_str(), datatype, dataspace,
                         H5P_DEFAULT, create_plist, access_plist);
if (dataset < 0)
    throw FileWriterError("Unable to create data var");
H5Sclose(dataspace);
H5Pclose(access_plist);
H5Pclose(create_plist);
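For completeness, use_szip_filter() gates the compression; the HDF5 side of
such a check can be written with H5Zfilter_avail and H5Zget_filter_info.
This is only a sketch (with a hypothetical name), not necessarily what my
helper actually does:

bool szip_encoder_available() {
    // Is the SZIP filter available in this HDF5 build at all?
    if (H5Zfilter_avail(H5Z_FILTER_SZIP) <= 0)
        return false;
    // SZIP may be present decode-only; verify encoding is enabled.
    unsigned info = 0;
    if (H5Zget_filter_info(H5Z_FILTER_SZIP, &info) < 0)
        return false;
    return (info & H5Z_FILTER_CONFIG_ENCODE_ENABLED) != 0;
}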
Every time a block comes in from the acquisition device, the data set is
extended along the row dimension, i.e., by 1024, and a write is performed:
const hsize_t size[2] = {some_previous_multiple_of_1024 + 1024, 50000};
H5Dextend(data_var.id, size); // deprecated name for H5Dset_extent

const hsize_t dims[2] = {1024, 50000};
const hsize_t offset[2] = {some_previous_multiple_of_1024, 0};

// Select the newly added rows in the file, describe the block in memory.
auto filespace = H5Dget_space(data_var.id);
H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, dims, NULL);
auto memspace = H5Screate_simple(2, dims, NULL);
auto status = H5Dwrite(data_var.id, data_var.type, memspace, filespace,
                       H5P_DEFAULT, data);
H5Sclose(memspace); // close dataspaces before bailing out on error
H5Sclose(filespace);
if (status < 0)
    return 0;
I have experimented with a variety of chunking and caching strategies in
both time and space. For example, since the acquired data would typically be
processed sequentially in time, the literature suggested using a single row
as a chunk, i.e. dimensions 1 x 50000, which is approximately 0.1 MB per
chunk. However, every strategy I tried only hindered write performance. The
only way I was able to achieve a write speed of 300 MB/s was by disabling
the chunk cache completely. I also disabled strip mining, though that may
have had a negligible impact. In this configuration the chunk dimensions
match the acquisition blocks exactly, so each write is a single chunk. This
seemed counterintuitive to me, at least the part about caching. Furthermore,
I take a considerable I/O performance hit when SZIP compression is enabled,
even with experimental values for chunking in both the time and space
dimensions. Does anyone have any thoughts on this, or suggestions for
appropriate cache and chunking parameters? Or any other configurable
parameters I've missed?
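For concreteness, the kind of cache configuration I experimented with looked
roughly like this; the values are illustrative, sized so the cache can hold
one full 1024 x 50000 chunk of 16-bit samples:

// Illustrative chunk-cache settings, not a recommendation: size the
// cache to hold one full chunk of 16-bit samples.
const size_t chunk_bytes = size_t(1024) * 50000 * 2;  // ~97.7 MiB
const size_t rdcc_nbytes = chunk_bytes;
// The docs suggest a prime roughly 100x the number of chunks that fit
// in the cache; one chunk fits here, so ~101 slots.
const size_t rdcc_nslots = 101;
const double rdcc_w0 = 1.0;  // evict fully read/written chunks first
auto access_plist = H5Pcreate(H5P_DATASET_ACCESS);
H5Pset_chunk_cache(access_plist, rdcc_nslots, rdcc_nbytes, rdcc_w0);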
Also, I have been trying to understand the B-tree usage. Since in my
application the data is only ever written and read sequentially in time, it
seems to me that the fastest layout would be a root node with a child whose
children only ever have one child themselves; in other words, the tree would
be something more akin to a linked list. Does this make logical sense, and
is there any way to take advantage of this assumption?
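The closest knob I've found so far is H5Pset_istore_k on the file creation
property list, which sets ik, one half the rank of the B-tree used to index
chunked datasets. I don't know whether it can exploit an access pattern like
mine, but for reference (the file name and ik value here are illustrative):

// Sketch: tune the chunk-index B-tree width at file creation time.
// ik is one half the rank of the chunk B-tree (library default: 32);
// the value below is only an example.
auto fcpl = H5Pcreate(H5P_FILE_CREATE);
H5Pset_istore_k(fcpl, 16);
auto file = H5Fcreate("data.h5", H5F_ACC_TRUNC, fcpl, H5P_DEFAULT);
H5Pclose(fcpl);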
I appreciate any input.
Best regards,
Brock Hargreaves