Dear HDF experts,
I'm looking into using HDF5 for storing data from a large high-energy
astrophysics observatory (vs using a custom binary format or something like
ROOT's file format, which is commonly used in high-energy physics), and
have run into a few problems. At the basic level, our data can be
described as follows:
- large data sets (gigabytes per second) whose full size we don't know
a priori when writing
- the data is a set of "events", where each event contains a set of
instrumental readouts (vectors of numbers) + multiple sets of parameters
- there will be many hundreds of thousands of such events in a single file
- due to the large data rates, zero-suppression is needed (compression is
not enough), meaning that the vector data must be variable-length
- we need only sequential access, not random access, but speed and
size-efficiency of reading and writing are critical due to the large
data rates
- the data will be written (and generally read) one event at a time (i.e.
one row of the table at once)
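
To make the layout above concrete, here is a minimal Python sketch of one event with a zero-suppressed readout. It is illustrative only: the names `Event` and `zero_suppress`, the `(channel, value)` pair encoding, and the threshold are assumptions made for this example, not our actual software or any HDF5 API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Event:
    """One event: a variable-length readout plus a set of parameters."""
    event_id: int
    # Zero-suppressed readout: only (channel index, value) pairs are kept,
    # so the length differs from event to event.
    readout: List[Tuple[int, float]]
    parameters: Dict[str, float] = field(default_factory=dict)


def zero_suppress(samples, threshold=0.0):
    """Keep only the samples whose magnitude exceeds the threshold."""
    return [(i, v) for i, v in enumerate(samples) if abs(v) > threshold]


raw = [0.0, 0.0, 3.5, 0.0, 1.2, 0.0]
event = Event(event_id=1, readout=zero_suppress(raw),
              parameters={"energy": 42.0})
print(event.readout)  # [(2, 3.5), (4, 1.2)]
```

Each such event would correspond to one row (or packet) in the file, which is why the readout column has to be variable-length.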
At first glance, the HDF packet-table interface looked like a great
solution, where each packet stores an event, and within the packet we would
put structured HDF data (the resulting data set would then look like a
table, with a few columns containing variable-length arrays). However, the
variable-length packet tables do not seem ever to have been implemented in
the HDF5 libraries, despite appearing in examples and documentation. Is it
possible to store variable-length arrays in a fixed-length packet table?
Furthermore, variable-length arrays in general do not seem to be well
documented in HDF, though they appear to be supported.
Has anybody had experience with similar data, particularly tables
containing columns of variable-length arrays? Is this
efficient in HDF, and are there examples of its use, or a recommendation on
what interfaces to use?
A second question: in reality, this data contains
variable-length arrays of variable-length arrays. However, we can get
around one level of nesting by using an index variable, or by
separating one dimension into separate tables, so it is not critical to
store the data this way. It would be nice though, since it would reflect
the actual hierarchy of the data directly in the format. Is such a format
even possible in HDF5?
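
For clarity, the index-variable workaround mentioned above could look roughly like the following Python sketch. The nested arrays are flattened into a single array of values plus an array of offsets. The function names `flatten` and `unflatten` and the sample data are hypothetical, chosen only for this illustration.

```python
def flatten(nested):
    """Flatten a list of variable-length lists into (values, offsets).

    offsets[i] and offsets[i + 1] delimit the i-th inner list in values.
    """
    values = []
    offsets = [0]
    for inner in nested:
        values.extend(inner)
        offsets.append(len(values))
    return values, offsets


def unflatten(values, offsets):
    """Rebuild the nested lists from the flat values and the offsets."""
    return [values[offsets[i]:offsets[i + 1]]
            for i in range(len(offsets) - 1)]


# One event whose readout is itself a list of variable-length pulses.
event_pulses = [[1.0, 2.0], [], [3.0, 4.0, 5.0]]
values, offsets = flatten(event_pulses)
# values  == [1.0, 2.0, 3.0, 4.0, 5.0]
# offsets == [0, 2, 2, 5]
assert unflatten(values, offsets) == event_pulses
```

This removes one level of nesting at the cost of an extra offsets column per event, which is why storing the hierarchy directly would be preferable if the format allows it.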
Dr. Karl Kosack
CEA Saclay Bat 709
F-91191 Gif sur Yvette Cedex FRANCE