SWMR - extend dataset by 1000 records and write records one by one


#1

Hi,

My question is about SWMR. I have a one-dimensional dataset with unlimited size. In the writer, I receive the records one by one (not in a block of records).
My implementation is based on swmr_addrem_writer.c from hdf5 tests.
For each record, I call H5Dset_extent, H5Dget_space, H5Sselect_hyperslab, and H5Dwrite.
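
In outline, the per-record pattern looks like this (a sketch only; memtype, memspace, record, and nrecords stand in for my real handles and counters):

/* sketch: append a single record to a 1-D, unlimited, chunked dataset */
hsize_t count[1]   = {1};
hsize_t start[1]   = {nrecords};            /* nrecords = records written so far */
hsize_t newsize[1] = {nrecords + 1};

H5Dset_extent(dataset, newsize);            /* grow the dataset by one element */
hid_t filespace = H5Dget_space(dataset);    /* re-fetch the file dataspace     */
H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
H5Dwrite(dataset, memtype, memspace, filespace, H5P_DEFAULT, &record);
H5Sclose(filespace);
nrecords++;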
I’m trying to improve runtime performance by calling H5Dset_extent and H5Dget_space less often. I extended the dataset by 1000 instead of by 1, and then called H5Sselect_hyperslab and H5Dwrite once per record, 1000 times. I got a ~50% runtime improvement. For example, in pseudo code:

hsize_t size[] = {1000};
H5Dset_extent(dataset, size);               /* extend once, by 1000 records  */
hid_t filespace = H5Dget_space(dataset);    /* fetch the file dataspace once */
for (int i = 0; i < 1000; i++)
{
  H5Sselect_hyperslab(filespace, ...);      /* move the selection to record i */
  H5Dwrite(dataset, ..., filespace, ...);   /* write one record               */
}

The output *.h5 file is fine. It just has “zeroed” records at the end of the dataset for the unwritten records (1000 minus the number of H5Dwrite calls in the last chunk).

Is it safe to do this? Will the SWMR reader see “zeroed” records during writing (in the middle/at the end)?
Is it safe, from the SWMR point of view, to shrink the dataset back to its exact size with H5Dset_extent just before closing the dataset?
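
Concretely, the final trim I have in mind is something like this (sketch; total_written stands for the real count of written records):

/* sketch: shrink the dataset to the number of records actually written */
hsize_t final_size[1] = {total_written};
H5Dset_extent(dataset, final_size);
H5Dclose(dataset);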

Thanks,
Maoz


#2

May I ask what throughput you are getting with this approach?

That is: sizeof(object) * nelements per second, plus the type of computer / HDD.


#3

I’m trying to improve writer run-time by calling HDF5 APIs less often. I got an improvement, but I don’t know if it will work when an SWMR reader is running in parallel.

Common for both cases:
Computer: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
HDD: NFS - I know SWMR is not supported on it; I'm currently developing the writer as a POC.
OS: SLES 11
I/O syscall counts from strace:
* write: 1055
* lseek: 48
* other: 10
The chunk size (H5Pset_chunk) is 1000 records, and the chunk cache (H5Pset_chunk_cache) is sized to hold all 1000 records of a chunk.
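
For completeness, the dataset creation looks roughly like this (sketch; record_t, memtype, and file are placeholders, and the slot count passed to H5Pset_chunk_cache is just an example value):

/* sketch: 1-D unlimited dataset, chunks of 1000 records,
   chunk cache sized to hold one full chunk                */
hsize_t dims[1]    = {0};
hsize_t maxdims[1] = {H5S_UNLIMITED};
hsize_t chunk[1]   = {1000};

hid_t space = H5Screate_simple(1, dims, maxdims);

hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_chunk(dcpl, 1, chunk);

hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
H5Pset_chunk_cache(dapl, 521 /* slots */, 1000 * sizeof(record_t),
                   H5D_CHUNK_CACHE_W0_DEFAULT);

hid_t dataset = H5Dcreate2(file, "records", memtype, space,
                           H5P_DEFAULT, dcpl, dapl);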

Without improvement (extend by 1):
| dataset | sizeof(object) (bytes) | number of elements | Total size (bytes) |
|---------|------------------------|--------------------|--------------------|
| 1       | 32                     | 163,940            | 5,246,080          |
| 2       | 24                     | 180,324            | 4,327,776          |
| 3       | 32                     | 163,930            | 5,245,760          |
| 4       | 32                     | 163,930            | 5,245,760          |
| 5       | 24                     | 163,930            | 3,934,320          |
| 6       | 32                     | 163,930            | 5,245,760          |
| Total   |                        |                    | 29,245,456         |
file size (bytes)        : 29,306,340
user time (seconds)      : 12.58
system time (seconds)    : 0.21
total time (seconds)     : 12.79
write rate (bytes/second): 2,286,587

With improvement (extend by 1000):
| dataset | sizeof(object) (bytes) | number of elements | Total size (bytes) |
|---------|------------------------|--------------------|--------------------|
| 1       | 32                     | 164,000            | 5,248,000          |
| 2       | 24                     | 181,000            | 4,344,000          |
| 3       | 32                     | 164,000            | 5,248,000          |
| 4       | 32                     | 164,000            | 5,248,000          |
| 5       | 24                     | 164,000            | 3,936,000          |
| 6       | 32                     | 164,000            | 5,248,000          |
| Total   |                        |                    | 29,272,000         |
file size (bytes)        : 29,306,340
user time (seconds)      : 7.28
system time (seconds)    : 0.18
total time (seconds)     : 7.46
write rate (bytes/second): 3,923,860

The write rate is up from 2,286,587 to 3,923,860 bytes/second.
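
(For reference against the formula asked for above: the write rate is the total dataset size divided by the total run time, e.g. for the first case:

29,245,456 bytes / 12.79 s ≈ 2,286,587 bytes/second

The second case works out the same way, up to rounding of the reported total time.)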

The same test case, when dumping a binary file with C++ ofstream, gives a ~4.5x write rate compared to HDF5.
The total run-time I used in the write rate formula includes the “overhead” of generating the objects from a simulator.

Thanks,
Maoz


#4

Hi Maoz!

Extending by one record at a time is a no-go at the moment, but is there a chance the bug gets fixed someday
(or at least gets entered into JIRA)?

Best wishes,
Andrey Paramonov


#5

The reason I was asking for some metrics is to ballpark your effort. In the current version of H5CPP, the record/element-wise read/write rate is about on par with raw C I/O: on a Lenovo x230 laptop, 350 MB/sec for datasets larger than system memory, and 1.5 - 3.5 GB/sec for small burst I/O requests.

The comparison was against Armadillo save/load: arma_binary, raw_binary.
Therefore, if this is about speed improvement, please consider evaluating the H5CPP h5::append operator performance and/or the techniques used there.
Although the h5::append filter chain is not plugged in yet, the operator is functional with good I/O properties and can take arbitrary T ::= POD types | fundamental types | std::vector | the major linear algebra objects up to rank 3.

Here is an example from examples/packet_table (it appends an arma::vec; the same works for std::vector, see below):

...
h5::fd_t fd = h5::create("example.h5", H5F_ACC_TRUNC);

// the dataset descriptor is automatically converted to a packet table descriptor
h5::pt_t pt = h5::create<double>(fd, "stream of vectors",
        h5::max_dims{H5S_UNLIMITED, 5}, h5::chunk{1, 5});

arma::vec V(5);
for (int i = 0; i < 5; i++) V(i) = i;

// the actual append operation
for (int i = 0; i < 7; i++)
    h5::append(pt, V);
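
Something along the same lines should work for std::vector as well; a sketch only (not tested here, with the usual umbrella include assumed):

#include <h5cpp/all>   // umbrella header, assumed
#include <vector>

int main() {
    h5::fd_t fd = h5::create("vector.h5", H5F_ACC_TRUNC);
    h5::pt_t pt = h5::create<double>(fd, "stream of vectors",
            h5::max_dims{H5S_UNLIMITED, 5}, h5::chunk{1, 5});

    std::vector<double> v = {0, 1, 2, 3, 4};   // one 5-element record
    for (int i = 0; i < 7; i++)
        h5::append(pt, v);                     // record-wise append, as with arma::vec
}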