Tuning HDF5 for big transfer rates

Dear HDF5 mailing list subscribers,

I'm using HDF5 to record data received periodically from an EEG amplifier
device to disk. The amplifier can sample at 38400 Hz, where each sample
consists of 256 float values (32-bit), one for each of the 256 channels.
It sends blocks of 512 samples rather than sending each sample separately,
so I get an average transfer rate of about 38400 * 256 * 4 =~ 40 MB/sec.
The recording application is written in managed C++.

So my question is: what is the best configuration of the HDF5 library to
handle such transfer rates (assuming that the disk itself can handle them)?
For example, which dataset layout (contiguous, chunked, ...) would be the
best solution, what chunk size (if using the chunked layout), and so on?
The data acquisition thread and the other threads of the application should
still be able to run while recording.
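
To make the layout concrete, here is a minimal sketch of the kind of chunked,
extendable dataset I have in mind, using the plain C API (the dataset name
"/eeg", the helper names, and the one-chunk-per-amplifier-block choice are
just placeholders for illustration):

#include "hdf5.h"

#define NCHANNELS 256
#define BLOCKSIZE 512   /* samples per block delivered by the amplifier */

/* Create an empty, unlimited chunked dataset with one chunk per amplifier
   block (512 x 256 floats = 512 KB per chunk). */
hid_t create_eeg_dataset(hid_t file)
{
    hsize_t dims[2]    = {0, NCHANNELS};               /* start with zero rows */
    hsize_t maxdims[2] = {H5S_UNLIMITED, NCHANNELS};   /* grow while recording */
    hsize_t chunk[2]   = {BLOCKSIZE, NCHANNELS};

    hid_t space  = H5Screate_simple(2, dims, maxdims);
    hid_t cparms = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(cparms, 2, chunk);

    hid_t dset = H5Dcreate2(file, "/eeg", H5T_NATIVE_FLOAT, space,
                            H5P_DEFAULT, cparms, H5P_DEFAULT);
    H5Pclose(cparms);
    H5Sclose(space);
    return dset;
}

/* Append one 512-sample block by extending the dataset and writing a
   hyperslab. nwritten is the number of samples already in the dataset. */
void append_block(hid_t dset, const float *block, hsize_t nwritten)
{
    hsize_t newsize[2] = {nwritten + BLOCKSIZE, NCHANNELS};
    H5Dset_extent(dset, newsize);

    hid_t   filespace = H5Dget_space(dset);
    hsize_t start[2]  = {nwritten, 0};
    hsize_t count[2]  = {BLOCKSIZE, NCHANNELS};
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);

    hid_t memspace = H5Screate_simple(2, count, NULL);
    H5Dwrite(dset, H5T_NATIVE_FLOAT, memspace, filespace, H5P_DEFAULT, block);

    H5Sclose(memspace);
    H5Sclose(filespace);
}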

After experimenting a little bit, I found that the DataSet.write method,
which writes data to the HDF5 dataset, blocks for a very long time with
larger chunk sizes, which makes me believe that the write method always
writes the data directly to disk. So I was wondering if it is possible to
collect data in an internal buffer of the dataset that is flushed to disk
only every few seconds by a separate thread (so that new data can still be
collected while the previously collected data is being written to disk).
Or would another solution fit better?
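
Something along the lines of the following producer/consumer sketch is what
I have in mind, using pthreads purely for illustration (the queue type, the
append_block() helper from the sketch above, and all names are hypothetical):

#include "hdf5.h"
#include <pthread.h>
#include <string.h>

/* BLOCKSIZE and NCHANNELS as defined in the sketch above. */
#define QDEPTH 32   /* amplifier blocks buffered between acquisition and disk (~16 MB) */

/* Hypothetical helper from the sketch above. */
void append_block(hid_t dset, const float *block, hsize_t nwritten);

typedef struct {
    float data[QDEPTH][BLOCKSIZE * NCHANNELS];  /* allocate statically or on the heap */
    int   head, tail, count;
    pthread_mutex_t lock;                       /* initialize with pthread_mutex_init() */
    pthread_cond_t  not_empty, not_full;        /* initialize with pthread_cond_init()  */
    hid_t dset;                                 /* dataset the writer thread appends to */
} block_queue;

/* Acquisition thread: copy a freshly received block into the queue and return quickly. */
void enqueue_block(block_queue *q, const float *block)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QDEPTH)                  /* back-pressure if the disk falls behind */
        pthread_cond_wait(&q->not_full, &q->lock);
    memcpy(q->data[q->head], block, sizeof q->data[0]);
    q->head = (q->head + 1) % QDEPTH;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Writer thread: drain the queue and hand blocks to HDF5; only this thread calls HDF5. */
void *writer_main(void *arg)
{
    block_queue *q = (block_queue *)arg;
    hsize_t nwritten = 0;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->count == 0)
            pthread_cond_wait(&q->not_empty, &q->lock);
        float *block = q->data[q->tail];        /* slot stays reserved during the write */
        pthread_mutex_unlock(&q->lock);

        append_block(q->dset, block, nwritten); /* the potentially slow disk write */
        nwritten += BLOCKSIZE;

        pthread_mutex_lock(&q->lock);
        q->tail = (q->tail + 1) % QDEPTH;
        q->count--;
        pthread_cond_signal(&q->not_full);
        pthread_mutex_unlock(&q->lock);
    }
    return NULL;
}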

I would be glad if some more experienced users out there could give me some
tips for tuning HDF5.

Best regards,

Matthias

PS: Just to let you know about our upcoming events:

<http://www.wcnr2010.org/> World Congress on NeuroRehabilitation from March
21 - 25, 2010 in Vienna, Austria
<http://www.cnsmeeting.org/> Cognitive Neuroscience Society Meeting from
April 17 - 20, 2010 with <http://www.gtec.at/profile/BCI_WS_CNS_2010.html>
g.tec BCI workshop in Montreal, Canada
<http://bcimeeting.org/> BCI Meeting 2010 from May 31 - June 4, 2010 with
<http://www.bci2000.org/BCI2000/Workshop.html> BCI2000 workshop on May
30 - 31, 2010 in Asilomar, California, USA

16th Annual Meeting of the Organization for Human Brain Mapping from June
6 - 10, 2010 with <http://www.gtec.at/profile/BCI_WS_Brainmapping_2010.html>
g.tec BCI workshop in Barcelona, Spain
<http://www.icchp.org/> ICCHP 2010 from July 14 - 16, 2010 with g.tec BCI
workshop in Vienna, Austria
<http://www.cnsorg.org/2010/> 19th Computational Neurosciences Meeting from
July 24 - 30, 2010 in San Antonio, Texas, USA
ICPR 2010 from August 23 - 26, 2010 in Istanbul, Turkey

Would be nice to meet you there!



_________________________________________________


Matthias Zeintlinger, Dipl.-Ing.
GUGER TECHNOLOGIES OG

Herbersteinstr. 60, 8020 Graz, Austria
phone: +43 316 675106 - 22
fax: +43 316 675106 - 39
e-mail: zeintlinger@gtec.at

web: http://www.gtec.at/

_________________________________________________


> ... makes me believe that the write method always writes the data directly
> to disk. So I was wondering if it is possible to collect data in an
> internal buffer of the dataset that is flushed to disk only every few
> seconds by a separate thread (so that new data can still be collected
> while the previously collected data is being written to disk). Or would
> another solution fit better?

I'm in the same boat and I've found a few things:

1) Set the allocation time: H5Pset_alloc_time(cparms, H5D_ALLOC_TIME_INCR);
(cparms is the id of the property list associated with dataset creation)

2) Set the chunk cache size (VERY IMPORTANT), as the default cache is only 1 MB.
My off-the-cuff recommendation is to start with a cache size of at least
50% of your usable RAM (I use about 80%).

Example:

accessparms = H5Pcreate(H5P_DATASET_ACCESS);  /* access property list for the dataset */
status = H5Pset_chunk_cache(accessparms, 58757, 6000000000, 1);

The second parameter of the chunk cache function is the number of cache
slots. Per the API docs, I recommend determining the number of chunks that
can fit in the cache (in this case 6 GB), multiplying by 100, and finding
the nearest prime number to that. Obscure, I know, but check out the API
docs for more info.
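
A rough sketch of that calculation, assuming a 512 KB chunk and a ~6 GB
cache (the naive prime search, the helper name, and the "/eeg" dataset are
just illustrative):

#include "hdf5.h"
#include <stddef.h>

/* Naive search for a prime at or above n (fine for a one-off setup call). */
static size_t next_prime(size_t n)
{
    for (;; n++) {
        int is_prime = (n > 1);
        for (size_t d = 2; d * d <= n; d++)
            if (n % d == 0) { is_prime = 0; break; }
        if (is_prime) return n;
    }
}

/* Build a dataset-access property list whose chunk cache holds cache_bytes,
   with the slot count set to a prime near 100x the number of chunks that fit. */
static hid_t make_big_cache_dapl(size_t chunk_bytes, size_t cache_bytes)
{
    size_t nslots = next_prime((cache_bytes / chunk_bytes) * 100);
    hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
    H5Pset_chunk_cache(dapl, nslots, cache_bytes, 1.0);
    return dapl;
}

/* Usage (file is an already-open file id; "/eeg" is a placeholder name):
 *   hid_t dapl = make_big_cache_dapl(512 * 256 * sizeof(float), (size_t)6000000000ULL);
 *   hid_t dset = H5Dopen2(file, "/eeg", dapl);
 */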

Hopefully this will get you underway. Using this method, you will pretty
much do all your writing to RAM until it starts to fill up.

Thanks,
C
