How to optimize performance when writing multiple chunked datasets incrementally

We have an application that simultaneously logs samples of multiple sensors to an HDF5 file. For each sensor a chunked dataset is created. The sensors are sampled atomically, so on each sample cycle a single value is written to each dataset. There are about 100 sensors (= datasets), each sampled at 500 Hz. The chunk size is 200,000 elements and the data type is 32-bit float, so the data rate is about 100 * 4 * 500 = ~200 KB/s.
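
For concreteness, here is a simplified sketch of what the per-sample append roughly looks like (error handling omitted; the dataset names and the H5Dset_extent/hyperslab appending are illustrative and differ from our exact code):

```c
/* Simplified sketch of the current per-sample append pattern.
   Names, constants and the appending strategy are illustrative. */
#include <stdio.h>
#include <hdf5.h>

#define NSENSORS 100

static hid_t   dsets[NSENSORS];
static hsize_t nwritten = 0;                 /* samples written so far (same for all datasets) */

void create_datasets(hid_t file)
{
    hsize_t dims[1]    = {0};                /* start empty */
    hsize_t maxdims[1] = {H5S_UNLIMITED};
    hsize_t chunk[1]   = {200000};           /* chunk size from the description above */

    hid_t space = H5Screate_simple(1, dims, maxdims);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);

    for (int i = 0; i < NSENSORS; ++i) {
        char name[32];
        snprintf(name, sizeof name, "sensor_%03d", i);   /* illustrative names */
        dsets[i] = H5Dcreate2(file, name, H5T_NATIVE_FLOAT, space,
                              H5P_DEFAULT, dcpl, H5P_DEFAULT);
    }
    H5Pclose(dcpl);
    H5Sclose(space);
}

/* Called once per 500 Hz sample cycle: appends one value to every dataset. */
void write_sample(const float values[NSENSORS])
{
    hsize_t newsize[1] = {nwritten + 1};
    hsize_t start[1]   = {nwritten};
    hsize_t count[1]   = {1};
    hid_t   memspace   = H5Screate_simple(1, count, NULL);

    for (int i = 0; i < NSENSORS; ++i) {
        H5Dset_extent(dsets[i], newsize);
        hid_t filespace = H5Dget_space(dsets[i]);
        H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
        H5Dwrite(dsets[i], H5T_NATIVE_FLOAT, memspace, filespace,
                 H5P_DEFAULT, &values[i]);
        H5Sclose(filespace);
    }
    H5Sclose(memspace);
    ++nwritten;
}
```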
On some of the production PCs we experience very high CPU load, and we wonder what we can do to optimize the writing performance.

  • Is it correct that writing a single value to a chunked dataset will not hit the disk until a chunk is full?
  • Should we buffer a larger number of samples in the application and write the whole buffer at once instead of writing single samples (see the sketch below)?

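Regarding the second point, this is the kind of buffered variant we have in mind: collect a block of samples per sensor in memory and issue one H5Dwrite per dataset per flush. BUFSAMPLES, the static buffer layout and the flush_buffer helper are illustrative assumptions, not something we have measured:

```c
/* Sketch of a buffered write: accumulate BUFSAMPLES values per sensor,
   then append them with one H5Dwrite per dataset. Constants are illustrative. */
#include <hdf5.h>

#define NSENSORS   100
#define BUFSAMPLES 5000                      /* e.g. 10 s at 500 Hz; to be tuned */

static float   buffer[NSENSORS][BUFSAMPLES];
static hsize_t buffered = 0;                 /* samples currently held in memory */
static hsize_t flushed  = 0;                 /* samples already written to the file */

void flush_buffer(hid_t dsets[NSENSORS])
{
    hsize_t newsize[1] = {flushed + buffered};
    hsize_t start[1]   = {flushed};
    hsize_t count[1]   = {buffered};
    hid_t   memspace   = H5Screate_simple(1, count, NULL);

    for (int i = 0; i < NSENSORS; ++i) {
        H5Dset_extent(dsets[i], newsize);
        hid_t filespace = H5Dget_space(dsets[i]);
        H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
        H5Dwrite(dsets[i], H5T_NATIVE_FLOAT, memspace, filespace,
                 H5P_DEFAULT, buffer[i]);
        H5Sclose(filespace);
    }
    H5Sclose(memspace);
    flushed += buffered;
    buffered = 0;
}

/* Called once per sample cycle instead of writing directly to the file. */
void buffer_sample(hid_t dsets[NSENSORS], const float values[NSENSORS])
{
    for (int i = 0; i < NSENSORS; ++i)
        buffer[i][buffered] = values[i];

    if (++buffered == BUFSAMPLES)
        flush_buffer(dsets);
}
```

At shutdown the partially filled buffer would of course need a final flush_buffer() call.
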
Any tips are appreciated.