HDF5 in Fusion Research

John_Storrs · November 25, 2008, 12:00pm

I'm a member of the MAST team at Culham Laboratory, UK. We are proposing to
migrate from our own archive format to HDF5 for MAST shot data. The majority
of MAST archive data items are simple real functions of time. Others are real
functions of time and one or more spatial dimensions. Each archive file stores
a subset of data items for a particular shot.

I'd like to know if there is an established HDF5 file layout standard for our
type of data which we should consider adopting. One issue is how to link the
principal dataset for each data item to datasets storing the independent
variable values. Another is how to store values of the independent (dimension)
variables efficiently. For example, in our work sets of data items often share
the time dimension, and time values can often be recorded simply as 3 floats:
startTime, timeIncrement, numberOfValues.
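
As an aside for readers, the compact time encoding described above is easy to expand on demand. The following is a minimal Python sketch, not MAST code: the class `UniformTimeBase` and its field names are hypothetical, the sample values are invented, and the count is held as an integer here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UniformTimeBase:
    """Compact description of evenly spaced sample times (hypothetical helper)."""

    start_time: float      # time of the first sample
    time_increment: float  # spacing between consecutive samples
    number_of_values: int  # sample count, stored here as an integer

    def time_at(self, index: int) -> float:
        """Return the time of sample `index` without building the full axis."""
        if not 0 <= index < self.number_of_values:
            raise IndexError(index)
        # Multiply rather than accumulate, so rounding error does not grow with index.
        return self.start_time + index * self.time_increment

    def expand(self) -> list[float]:
        """Materialise the full time axis as a list of floats."""
        return [self.time_at(i) for i in range(self.number_of_values)]


# Illustrative numbers only; they are not taken from a real MAST shot.
base = UniformTimeBase(start_time=-0.5, time_increment=0.25, number_of_values=5)
times = base.expand()
print(times)  # [-0.5, -0.25, 0.0, 0.25, 0.5]
```

Storing only the three numbers keeps the file small when many data items share one time base; a reader expands the axis only when it is actually needed.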

Is there an appropriate standard, or must we roll our own and hope others in
the field will accept it?

Regards
John

--
John Storrs, Experiments Dept e-mail: john.storrs@ukaea.org.uk
Building D3, UKAEA Fusion tel: 01235 466338
Culham Science Centre fax: 01235 466379
Abingdon, Oxfordshire OX14 3DB http://www.fusion.org.uk

This message has been scanned for viruses by BlackSpider MailControl - www.blackspider.com

roy.mendelssohn · November 25, 2008, 3:38pm

Have you looked at netCDF-4, which uses HDF5 as its underlying file format? It supports shared dimensions.

-Roy
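
For context, netCDF-4 implements shared dimensions on top of HDF5 dimension scales, which can also be used from plain HDF5. Below is a minimal sketch using h5py. That choice is an assumption, since the thread does not name a language binding, and `make_scale` needs a reasonably recent h5py release. The dataset names and values are hypothetical.

```python
import h5py
import numpy as np

# In-memory file so the sketch leaves nothing on disk; the file name is a placeholder.
with h5py.File("shot_sketch.h5", "w", driver="core", backing_store=False) as f:
    # One shared time axis (values are illustrative).
    time = f.create_dataset("time", data=np.linspace(0.0, 1.0, 5))
    time.make_scale("time")  # mark the dataset as an HDF5 dimension scale

    # Two hypothetical signals sampled on that same axis.
    current = f.create_dataset("plasma_current", data=np.zeros(5))
    density = f.create_dataset("density", data=np.ones(5))

    for signal in (current, density):
        signal.dims[0].label = "time"
        signal.dims[0].attach_scale(time)  # link dimension 0 to the shared axis

    # Follow the link back from each signal to its independent variable.
    linked = current.dims[0][0]
    shared_names = (linked.name, density.dims[0][0].name)
    axis_values = linked[()].tolist()
```

Both signals end up pointing at the single `/time` dataset, so the axis is stored once however many data items use it.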

**********************
"The contents of this message do not reflect any position of the U.S. Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097

e-mail: Roy.Mendelssohn@noaa.gov (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.

gnwiii · November 25, 2008, 2:30pm

John Storrs wrote:

> I'm a member of the MAST team at Culham Laboratory, UK. We are proposing to
> migrate from our own archive format to HDF5 for MAST shot data. The majority
> of MAST archive data items are simple real functions of time. Others are
> real functions of time and one or more spatial dimensions. Each archive file
> stores a subset of data items for a particular shot.

Are you considering netcdf4 (which is based on hdf5)?

And there is bp:
"The bp format was designed to be largely convertable to both
HDF-5 and netCDF while relaxing some of the consistency
requirements to increase performance for both writing and
reading for serial and parallel file systems. As a forward-looking
standard, all offsets and sizes of potentially large items use
64-bits to express the offset or size."
-- <http://adiosapi.org/index.php5?title=Dataformat>

> I'd like to know if there is an established HDF5 file layout standard for
> our type of data which we should consider adopting. One issue is how to link
> the principal dataset for each data item to datasets storing the independent
> variable values. Another is how to store values of the independent
> (dimension) variables efficiently. For example, in our work sets of data
> items often share the time dimension, and time values can often be recorded
> simply as 3 floats: startTime, timeIncrement, numberOfValues.
>
> Is there an appropriate standard, or must we roll our own and hope others in
> the field will accept it?

Marcus G. Daniels at Los Alamos might have some pointers -- he wrote
a simple hdf5 interface for the R (Splus clone) stats package. His email
is in the documentation -- just look for the hdf5 package on CRAN. Lots
of people at Los Alamos have encountered fusion data even if they were
not directly involved in the project -- one advantage of a small town!

How big are your datasets? Do you need to support parallel processing?
Are you more concerned with efficiency in your particular processing
environment, or with ensuring the data are accessible to a wide (in time
and/or space) audience using diverse tools and platforms? Is most
processing standardized, or is it mostly ad hoc?

Maybe you can steal something from netcdf4 (based on hdf5) and the
CF conventions:

<http://www.unidata.ucar.edu/projects/THREDDS/GALEON/netCDFprofile-short.htm>

It would be nice to see a set of examples of HDF5 (and netcdf4) file layouts,
perhaps with some expert comments on what's good, bad, or ugly about
the format. In my field, hdf4 and netcdf3 are widely used, but the current
examples suffer from decisions made too many years ago, so might
work better as examples of "what not to do".

Do you use high-level languages (HLLs) such as MATLAB, IDL, or S-PLUS? One
approach is to load the data into such a package, save it through the
simplest interface, and see what you get. Octave (a MATLAB "clone") has an
option to use hdf5 for "save" files. If a variable is defined as a sequence
(like your time variable), Octave uses the corresponding hdf5 data type.

The downside is that other HLLs may not support all the hdf5 capabilities.
R (an S-PLUS "clone") has a simple hdf5 interface, but at present does not
understand sequences. Since it is open source, I expect its capabilities
will evolve, so if you don't currently use R, that might never be a problem.
Even if you don't currently use HLLs, they can be helpful when designing a
format, as a check on interoperability. If you don't want to shell out for
the commercial tools, there are good open source "clones" (those already
mentioned, plus gnudatalanguage for IDL), all with some level of hdf5
support. While you can't expect everything to work, you can try to make sure
that the things that don't work would not be difficult to implement if they
become necessary in the future.

--
George N. White III <aa056@chebucto.ns.ca>
Head of St. Margarets Bay, Nova Scotia

Quincey_Koziol · November 25, 2008, 6:38pm

Elena, would the neutron scattering community's standard Nexus file be similar?

Quincey

epourmal · November 25, 2008, 8:57pm

Quincey Koziol wrote:

> Elena, would the neutron scattering community's standard Nexus file be similar?

I don't think so. Using NetCDF-4 will be a better solution in my opinion.

Elena
