We're looking into replacing our custom storage design for time-series data with HDF5, focusing on HDF5 1.10 for its SWMR capability, since our custom storage already supports single-writer/multiple-reader access.
To find the best layout, we drafted a few test cases and started from a C++ tutorial sample, adapting it to replicate our current database structure of one file per signal. So we create new empty files in a loop, and there we already ran into problems:
- the HDF5 garbage collector allocates a lot of memory as soon as files are created; we tried to tune this with setGcReferences(), but without success;
- once memory usage reaches 2 GB, the HDF5 create function throws the exception "no space available for allocation" (we're running 64-bit Windows 8 with 16 GB of RAM).
I'd have a few questions at this point:
- Can we reduce the amount of memory used by the garbage collector? If so, how?
- Taking a step back: is the HDF5 API designed to handle thousands of files in practice?
- Or would it be better to have a single file with the same number of datasets in it? (We're talking about a few thousand datasets, each with several million rows.)
Thanks for your kind support