SWMR - extend dataset by 1000 records and write records one by one

maoz.guttman · February 5, 2019, 3:48pm

Hi,

My question is about SWMR. I have a one-dimensional dataset with unlimited size. In the writer, I receive the records one by one (not in a block of records).
My implementation is based on swmr_addrem_writer.c from the HDF5 tests.
For each record, I call H5Dset_extent, H5Dget_space, H5Sselect_hyperslab, and H5Dwrite.
I’m trying to improve runtime performance by calling H5Dset_extent and H5Dget_space less often. I extend the dataset by 1000 records instead of by 1, and then call H5Sselect_hyperslab and H5Dwrite once per record, 1000 times in total. This gave a ~50% runtime improvement. For example, in pseudocode:

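hsize_t cur_size;                      // current number of records in the dataset (set elsewhere)
hsize_t new_size[1] = {cur_size + 1000};
H5Dset_extent(dataset, new_size);      // grow by 1000 records at once
hid_t hidFileSpace = H5Dget_space(dataset); // fetch the file dataspace once per extension
hsize_t one[1] = {1};
hid_t hidMemSpace = H5Screate_simple(1, one, NULL); // memory dataspace for a single record
for (int i = 0; i < 1000; i++)
{
  hsize_t start[1] = {cur_size + i};
  H5Sselect_hyperslab(hidFileSpace, H5S_SELECT_SET, start, NULL, one, NULL); // move to the next (last) record
  H5Dwrite(dataset, recType, hidMemSpace, hidFileSpace, H5P_DEFAULT, &record); // write one record as it arrives (recType/record are placeholders)
}
H5Sclose(hidMemSpace);
H5Sclose(hidFileSpace);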
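To see where the savings come from, the sketch below counts how many times a writer calls H5Dset_extent when it appends records one at a time and grows the extent by a fixed increment only once the previously allocated slots are used up. The helper name extend_calls and this pure counting model are hypothetical (plain uint64_t stands in for hsize_t); it ignores chunking, caching, and I/O cost, so it illustrates only the reduction in calls, not the measured ~50% speedup.

```c
#include <stdint.h>

/* Hypothetical helper: number of H5Dset_extent calls a writer makes when it
 * appends n_records one at a time and grows the 1-D extent by `increment`
 * records whenever all previously allocated slots are filled. */
static uint64_t extend_calls(uint64_t n_records, uint64_t increment)
{
    uint64_t allocated = 0; /* current extent of the dataset */
    uint64_t calls = 0;     /* H5Dset_extent calls made so far */

    for (uint64_t written = 0; written < n_records; written++) {
        if (written == allocated) { /* no free slot left: grow the extent */
            allocated += increment;
            calls++;
        }
        /* ... select slot `written` and write one record here ... */
    }
    return calls;
}
```

With 2500 records, extending by 1 costs 2500 calls, while extending by 1000 costs 3.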

The output .h5 file is fine. It just has “zeroed” records at the end of the dataset for the unwritten slots (1000 minus the number of H5Dwrite calls in the last 1000-record extension).
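The number of trailing “zeroed” records follows directly from the increment: after n records, the extent is n rounded up to the next multiple of the increment, and the difference is the padding. A small sketch of that arithmetic, with hypothetical helper names and uint64_t in place of hsize_t:

```c
#include <stdint.h>

/* Hypothetical helper: extent after writing n_records when the dataset is
 * grown in steps of `increment` (n_records rounded up to a multiple). */
static uint64_t final_extent(uint64_t n_records, uint64_t increment)
{
    return ((n_records + increment - 1) / increment) * increment;
}

/* Trailing records that were allocated by the last extension but never
 * written, i.e. the "zeroed" records seen at the end of the dataset. */
static uint64_t padding_records(uint64_t n_records, uint64_t increment)
{
    return final_extent(n_records, increment) - n_records;
}
```

For example, 2500 records written with an increment of 1000 leave an extent of 3000 and 500 unwritten records.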

Is this safe for the SWMR reader? Will it read “zeroed” records during writing (in the middle or at the end)?
Is it safe, from the SWMR point of view, to shrink the dataset to its exact size with H5Dset_extent just before closing the dataset?

Thanks,
Maoz

epourmal · February 21, 2019, 3:14am

Hi Maoz,

The current SWMR implementation cannot guarantee when a reader sees data. The dataset’s metadata, including the new extent value, may be written to the file before the raw data. Therefore, the reader may see partially written chunks.

To answer your question: yes, it is safe for a SWMR reader to access the chunk and read back some fill values; just keep in mind that the chunk may not be completely written yet.
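One possible way for a reader to tell fill values apart from real data, assuming the writer creates the dataset with a distinctive fill value (for example via H5Pset_fill_value on the dataset creation property list) that never occurs in a real record, is to scan each buffer it reads for that sentinel. The sentinel value, the int record type, and the helper below are assumptions for illustration, not part of the original code. Because chunks may be only partially written, the sketch checks every slot rather than only the tail.

```c
#include <stddef.h>

/* Hypothetical sentinel: assumes the writer set this as the dataset's fill
 * value and that no real record ever has this value. */
#define RECORD_FILL_SENTINEL (-1)

/* Counts records in a buffer read by the SWMR reader that still hold the
 * sentinel fill value, i.e. records not written yet (or whose raw data is
 * not yet visible). Stores the index of the first such record in *first,
 * or n if every record holds real data. */
static size_t count_unwritten(const int *buf, size_t n, size_t *first)
{
    size_t count = 0;
    *first = n;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == RECORD_FILL_SENTINEL) {
            if (count == 0)
                *first = i;
            count++;
        }
    }
    return count;
}
```

A reader could call H5Drefresh on the dataset, re-query its extent, read the new records, and treat any sentinel slots as not yet available.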

It is not safe to reduce the size of a dataset that is under SWMR access.

Thank you!

Elena