I'm a member of the MAST team at Culham Laboratory, UK. We are proposing to
migrate from our own archive format to HDF5 for MAST shot data. The majority
of MAST archive data items are simple real functions of time. Others are
real functions of time and one or more spatial dimensions. Each archive file
stores a subset of data items for a particular shot.
Are you considering netcdf4 (which is based on hdf5)?
And there is bp:
"The bp format was designed to be largely convertable to both
HDF-5 and netCDF while relaxing some of the consistency
requirements to increase performance for both writing and
reading for serial and parallel file systems. As a forward-looking
standard, all offsets and sizes of potentially large items use
64-bits to express the offset or size."
I'd like to know if there is an established HDF5 file layout standard for
our type of data which we should consider adopting. One issue is how to link
the principal dataset for each data item to datasets storing the independent
variable values. Another is how to store values of the independent
(dimension) variables efficiently. For example, in our work sets of data
items often share the time dimension, and when sampling is regular the time
values can be recorded compactly as just three scalars: startTime,
timeIncrement, numberOfValues (two floats and an integer).
Is there an appropriate standard, or must we roll our own and hope others in
the field will accept it?
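For what it's worth, HDF5 itself has a Dimension Scales feature that covers
exactly this dataset-to-coordinate linking. A minimal sketch using the h5py
Python bindings -- the file name, the "ip" dataset name, and storing the
compact time description as attributes are my own illustrative assumptions,
not an established layout standard:

```python
import os
import tempfile

import numpy as np
import h5py

path = os.path.join(tempfile.mkdtemp(), "shot.h5")

n = 5                      # numberOfValues
t0, dt = 0.0, 0.001        # startTime, timeIncrement

with h5py.File(path, "w") as f:
    # Principal dataset for one data item (the name "ip" is invented).
    data = f.create_dataset("ip", data=np.linspace(0.0, 1.0, n))

    # Shared independent variable, expanded once for all data items.
    time = f.create_dataset("time", data=t0 + dt * np.arange(n))
    # Keep the compact description alongside the expanded values.
    time.attrs["startTime"] = t0
    time.attrs["timeIncrement"] = dt
    time.attrs["numberOfValues"] = n

    # Link the data item to its coordinate via HDF5 Dimension Scales.
    time.make_scale("time")
    data.dims[0].attach_scale(time)

with h5py.File(path, "r") as f:
    scale = f["ip"].dims[0][0]      # follow the link back to "time"
    scale_name = scale.name
    count = int(scale.attrs["numberOfValues"])
```

Many data items can attach the same scale, so the shared time dataset is
stored only once per file.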
Marcus G. Daniels at Los Alamos might have some pointers -- he wrote
a simple hdf5 interface for the R (Splus clone) stats package. His email
is in the documentation -- just look for the hdf5 package on CRAN. Lots
of people at Los Alamos have encountered fusion data even if they were
not directly involved in the project -- one advantage of a small town!
How big are your datasets? Do you need to support parallel processing?
Are you more concerned with efficiency in your particular processing
environment, or with ensuring data are accessible to a wide (in time
and/or space) audience using diverse tools/platforms? Is most
processing standardized or is it mostly ad-hoc?
Maybe you can steal something from netcdf4 (based on hdf5).
It would be nice to see a set of examples of HDF5 (and netcdf4) file layouts,
perhaps with some expert comments on what's good, bad, or ugly about
the format. In my field, hdf4 and netcdf3 are widely used, but the current
examples suffer from decisions made many years ago, so they might
work better as examples of "what not to do".
Do you use HLL's (matlab, IDL, Splus, etc.)? One approach is to load
the data into a package and use the simplest interface to save the data,
then see what you get. Octave (matlab "clone") supports the option to
use hdf5 for "save" files. If a variable is defined as a sequence (like your
time variable), octave uses the corresponding hdf5 data type.
The downside is that other HLL's may not support all the hdf5 capabilities.
R (Splus "clone") has a simple hdf5 interface, but at present does not
understand sequences. Being open source, I expect the capabilities
will evolve, so if you don't currently use R, that might never be a problem.
Even if you don't currently use HLL's, it could be helpful, when designing
a format, to check it for interoperability with them. If you don't want to shell out for
tools, there are good open source "clones" (already mentioned, plus
gnudatalanguage for IDL), all with some level of hdf5 support. While you
don't expect everything to work, you can try to make sure that the things
that don't work would not be difficult to implement if they become
necessary in the future.
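Along those lines, once an HLL has written a save file you can walk it with
the plain h5py bindings to see exactly what layout the writer chose. A small
sketch -- the "te/data" dataset stands in for whatever your package actually
produces, so the names here are invented:

```python
import os
import tempfile

import numpy as np
import h5py

path = os.path.join(tempfile.mkdtemp(), "saved.h5")

# Stand-in for a file written by an HLL "save" command (names invented).
with h5py.File(path, "w") as f:
    f.create_dataset("te/data", data=np.zeros((3, 4)))
    f["te"].attrs["units"] = "eV"

# Walk the file and report every dataset the writer actually produced.
found = []

def report(name, obj):
    if isinstance(obj, h5py.Dataset):
        found.append((name, obj.shape, str(obj.dtype)))

with h5py.File(path, "r") as f:
    f.visititems(report)

print(found)
```

Comparing this listing across Octave, R, etc. shows quickly which layout
choices survive a round trip and which don't.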
On Tue, Nov 25, 2008 at 8:00 AM, John Storrs <email@example.com> wrote:
George N. White III <firstname.lastname@example.org>
Head of St. Margarets Bay, Nova Scotia
This mailing list is for HDF software users discussion.