Sustained write performance

Hello,

I'm using the HDF5 Lite (H5LT) calls to repeatedly write datasets to a large
data file. However, after a period of time (usually after writing between 10
and 30 GB), write performance drops dramatically. Each dataset is rather
small (on the order of 100 kB), and the system writes data at about 50 MB/s.

Replacing the HDF5 calls with plain I/O (e.g., fwrite()), I've shown that the
disk array and system can easily write multi-hundred-GB files at even higher
rates (75 MB/s). And even with the HDF5 library, neither the CPU nor disk I/O
comes close to its limit.
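
The baseline was essentially a loop like the one below; the record size and
count are illustrative, not the exact values I used:

#include <stdio.h>
#include <stdlib.h>

/* Simplified fwrite() baseline: stream ~100 kB records into one flat
   file.  The record count below is illustrative (~100 GB total). */
int main(void)
{
    enum { REC_BYTES = 100 * 1024 };
    char *rec = calloc(1, REC_BYTES);   /* placeholder data */
    FILE *fp  = fopen("baseline.bin", "wb");

    for (long i = 0; i < 1000000L; i++)
        fwrite(rec, 1, REC_BYTES, fp);

    fclose(fp);
    free(rec);
    return 0;
}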

However, I've noticed that HDF5 also triggers disk reads, which I suspect are
destroying my write performance.

So, a few questions: Does the HDF5 Lite library perform any hidden disk reads
when creating or writing datasets? Would the more general API provide more
control? Does anyone have suggestions for, or experience with, writing this
kind of data to disk?

Essentially, the program follows this pattern:

hid_t file = H5Fcreate(/* name, flags, fcpl, fapl */);
OpenAndLockSharedMemoryBuffer();      /* application-specific */

while (/* still running */) {

    while (/* still more data in buffer */) {
        H5LTmake_dataset(/* file, name, rank, dims, type, buffer */);
    }
    SwitchSharedMemoryBuffer();       /* application-specific */
}
H5Fclose(file);

Thanks,

--
Matthew Sunderland

Hi Matthew,

On Jun 9, 2010, at 7:29 PM, Matthew Sunderland wrote:

  I can't think of any obvious "hidden" reads in the Lite API routines, but there may be some read operations if the metadata for the group that will contain the new dataset isn't already in the metadata cache. However, the Lite routines are designed more for ease of use than for performance...
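
  To your second question: yes, the general API gives you more control. Below is roughly what a single H5LTmake_dataset() call expands to; the file name, dataset name, dimensions, and buffer are placeholders, and the dataset creation property list (the sixth argument to H5Dcreate2()) is where you could adjust chunking, allocation time, etc.:

#include <stdlib.h>
#include "hdf5.h"

int main(void)
{
    hsize_t dims[1] = {25600};                    /* ~100 kB of floats */
    float *buf = calloc(dims[0], sizeof(float));  /* placeholder data */

    hid_t file  = H5Fcreate("data.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(1, dims, NULL);

    /* With H5P_DEFAULT everywhere this roughly matches what
       H5LTmake_dataset() does; a non-default dcpl here is where
       the extra control lives. */
    hid_t dset = H5Dcreate2(file, "/dset0", H5T_NATIVE_FLOAT, space,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    H5Dwrite(dset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);
    free(buf);
    return 0;
}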

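  And if the metadata cache does turn out to be the source of the reads, its configuration is one knob to experiment with. A sketch, with arbitrary (untuned) sizes:

#include "hdf5.h"

/* Enlarge the metadata cache on an open file.  The sizes below are
   arbitrary placeholders, not tuned recommendations. */
static void enlarge_mdc(hid_t file)
{
    H5AC_cache_config_t cfg;

    cfg.version = H5AC__CURR_CACHE_CONFIG_VERSION;  /* required before the get */
    H5Fget_mdc_config(file, &cfg);

    cfg.set_initial_size = 1;                 /* apply initial_size below */
    cfg.initial_size     = 16 * 1024 * 1024;
    cfg.min_size         =  8 * 1024 * 1024;
    cfg.max_size         = 64 * 1024 * 1024;

    H5Fset_mdc_config(file, &cfg);
}
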
  Quincey