The first search is pretty straightforward. My link is simple: there's a 1:1 correspondence between the line numbers in the time scale and the dataset.
The two other searches have to be brute-forced from within the Packet Table interface (H5PT) by iterating over each line and pulling just the individual field(s). There may be a better way from the dataset interface (H5D). I've stuck with the PT interface because I generally grab the whole dataset anyway, and it simplifies the process of adding new data.
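For concreteness, that brute-force scan looks roughly like the following C sketch; the record struct and its field names are made up for illustration, and error checking is omitted.

#include "hdf5.h"
#include "hdf5_hl.h"

/* Hypothetical record layout matching the packet table's compound type. */
typedef struct {
    double timestamp;
    double depth;
} record_t;

void scan_depths(hid_t file_id, const char *table_name)
{
    hid_t   table = H5PTopen(file_id, table_name);
    hsize_t nrecords;
    H5PTget_num_packets(table, &nrecords);

    record_t rec;
    for (hsize_t i = 0; i < nrecords; i++) {
        /* H5PT reads whole packets, so read the record and keep
           only the field of interest. */
        H5PTread_packets(table, i, 1, &rec);
        /* ... test rec.depth against the search criteria here ... */
    }
    H5PTclose(table);
}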
-----Original Message-----
From: hdf-forum-bounces@hdfgroup.org [mailto:hdf-forum-bounces@hdfgroup.org]
On Behalf Of Val Schmidt
Sent: Friday, February 11, 2011 11:28 AM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] hdf suitability for packetized data
Your question is a good one.
I would need to be able to pull a full record (or set of records) within a set of time bounds.
I would need to be able to pull some field from all records for all times, as a time series.
I might need to be able to pull all the fields within some field range for all times.
I'm thinking of something similar to what you have done (I think): self-index the file. The index would be its own dataset with an array of time records, perhaps a few other fields, and references (HDF5's object or region references) to the actual data records.
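Something like this C sketch is what I have in mind, using a region reference for the link; the "/packets" dataset path is hypothetical, and error checking is omitted.

#include "hdf5.h"

typedef struct {
    double          t;   /* timestamp of the record */
    hdset_reg_ref_t ref; /* region reference to the record itself */
} index_entry_t;

/* Build one index entry pointing at record 'recno' of /packets. */
static index_entry_t make_entry(hid_t file, double t, hsize_t recno)
{
    index_entry_t e;
    hid_t   dset  = H5Dopen2(file, "/packets", H5P_DEFAULT);
    hid_t   space = H5Dget_space(dset);
    hsize_t count = 1;

    e.t = t;
    /* Select the single record and wrap the selection in a reference. */
    H5Sselect_hyperslab(space, H5S_SELECT_SET, &recno, NULL, &count, NULL);
    H5Rcreate(&e.ref, file, "/packets", H5R_DATASET_REGION, space);

    H5Sclose(space);
    H5Dclose(dset);
    return e;
}

Entries like these would be appended to a compound-typed index dataset (an H5T_NATIVE_DOUBLE member plus an H5T_STD_REF_DSETREG member) and resolved later with H5Rdereference/H5Rget_region.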
-Val
On Feb 11, 2011, at 10:56 AM, Mitchell, Scott - IS wrote:
I'm doing something similar to what you are looking at. I have data coming
in from multiple instruments which go through processing and result in one or
several C# structures/arrays. In my example each instrument type has a
structure containing Packet Tables with associated time axes/scales. The
packet table structure mimics the instrument data structures.
Metadata is held in Attributes and other Packet Tables. I've created a
standard across the program, with specifics defined for each instrument.
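A rough C sketch of one such table, with an invented instrument record and no error checking (our processing code is C#, but the underlying H5PT calls are this same C API):

#include "hdf5.h"
#include "hdf5_hl.h"

/* Invented instrument record; the real tables mimic each
   instrument's own structures field-for-field. */
typedef struct {
    double ping_time;  /* seconds since epoch */
    float  range_m;    /* measured range, meters */
} sonar_rec_t;

hid_t create_sonar_table(hid_t loc)
{
    /* Compound datatype mirroring the C struct. */
    hid_t tid = H5Tcreate(H5T_COMPOUND, sizeof(sonar_rec_t));
    H5Tinsert(tid, "ping_time", HOFFSET(sonar_rec_t, ping_time),
              H5T_NATIVE_DOUBLE);
    H5Tinsert(tid, "range_m", HOFFSET(sonar_rec_t, range_m),
              H5T_NATIVE_FLOAT);

    /* Fixed-length packet table, chunked 512 records at a time,
       uncompressed (-1). */
    hid_t table = H5PTcreate_fl(loc, "sonar_packets", tid, 512, -1);
    H5Tclose(tid);
    return table;  /* append records with H5PTappend(table, n, recs) */
}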
I end up storing each individual instrument's data in its own file. In most
cases, a single thread processes and stores data, so I don't have to worry
about synchronization (as much).
I believe you'll want to store each data type in its own dataset or file, both for the ability to search by data type and to avoid data-length issues. How are you expecting to search?
In my case, we allow users to 'play back' the data. I have the time scale as a separate dataset so I can do random-access lookups without having to load large data records to find a specific time.
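The lookup itself can be a binary search over the 1-D time dataset, reading one element at a time so the bulk records are never touched; a sketch, assuming monotonically increasing timestamps and omitting error checks:

#include "hdf5.h"

/* Read element 'i' of the 1-D time dataset. */
static double read_time(hid_t dset, hsize_t i)
{
    double  t;
    hsize_t one = 1;
    hid_t   fspace = H5Dget_space(dset);
    hid_t   mspace = H5Screate_simple(1, &one, NULL);

    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, &i, NULL, &one, NULL);
    H5Dread(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, &t);

    H5Sclose(mspace);
    H5Sclose(fspace);
    return t;
}

/* Return the index of the first record with time >= target. */
hsize_t find_record(hid_t time_dset, hsize_t nrecords, double target)
{
    hsize_t lo = 0, hi = nrecords;
    while (lo < hi) {
        hsize_t mid = lo + (hi - lo) / 2;
        if (read_time(time_dset, mid) < target)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;  /* record number to hand to the packet table read */
}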
Scott
-----Original Message-----
From: hdf-forum-bounces@hdfgroup.org [mailto:hdf-forum-bounces@hdfgroup.org]
On Behalf Of Val Schmidt
Sent: Thursday, February 10, 2011 6:10 PM
To: hdf-forum@hdfgroup.org
Subject: [Hdf-forum] hdf suitability for packetized data
Hello everyone,
I am new to HDF and am trying to understand whether it might be a suitable file format for my application. The data I'm interested in storing is usually written by the collecting instrument to basic binary files of concatenated packets (think C structures), each of which contains a header with a time stamp, packet format, packet identifier, and packet size, followed by the data itself (arrays) and associated metadata. There are tens of types of packets that may come in any order, and they are usually written to the file sequentially. Packets contain 10-100 fields, some of which may be arrays of data of various sizes.
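For illustration, a hypothetical C rendering of such a header; the field widths are invented, not from any particular instrument.

#include <stdint.h>

typedef struct {
    double   timestamp;   /* acquisition time */
    uint16_t format;      /* packet format/version */
    uint16_t packet_id;   /* which of the tens of packet types */
    uint32_t size_bytes;  /* total packet size, header included */
    /* ... followed by the packet's fields, arrays, and metadata ... */
} packet_header_t;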
This format allows one to relatively quickly index a file by passing through it and parsing only these headers. Then one can use the index to pull subsets of the data in a non-linear fashion, sometimes simultaneously in multiple threads, for quite fast reading. The problem is that every instrument manufacturer has their own method of encoding packets, and a single format is needed for archival purposes.
My question to you is: how might a similar model be implemented in HDF5 such that the same kind of indexing and parallel data retrieval is possible? What is to be avoided is the need to read through a file sequentially to get to the fields to extract.
It seems like HDF5 should handle this kind of thing well, but because I am inexperienced, and because most folks using it seem to be storing relatively small numbers of very large arrays (imagery in many cases) rather than relatively large numbers of small records with fewer fields and smaller arrays, it is not clear to me how such an implementation might perform. So I guess I'm also asking: what is the relative penalty for writing lots of small sets of data?
I hope this makes sense.
Thanks in advance,
Val
------------------------------------------------------
Val Schmidt
CCOM/JHC
University of New Hampshire
Chase Ocean Engineering Lab
24 Colovos Road
Durham, NH 03824
e: vschmidt [AT] ccom.unh.edu
m: 614.286.3726
_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org