HDF5/XML synergy

Dear HDF5 users,

We (The HDF Group) would like to get your thoughts on the importance and
role of XML in the HDF5 ecosystem. To focus the discussion, we have

1) put together an online survey at HDF5/XML Synergy Survey

and

2) written the attached document summarizing some of our thoughts on the
subject.

If you get a chance, please complete the survey and share your
comments on the document with us by email or through the HDF-forum
(hdf-forum@hdfgroup.org).

Thank you.

Gerd Heber

Gerd Heber | The HDF Group
1800 South Oak Street, Suite 203, Champaign, IL 61820
Email: gheber@hdfgroup.org | Work: (217) 531-6109 | Mobile: (217) 419-0660

HDF5XML.pdf (515 KB)

> We (The HDF Group) would like to get your thoughts on the importance and
> role of XML in the HDF5 ecosystem.

I like it. Especially the possibility of representing new domains
entirely within HDF5/XML.

On the data side, using a skeletal HDF5/XML dump of a datafile to
check that it is valid according to some domain-specific schema will
be handy. I suggest adding recommended HDF5 attributes with
particular names for such validation purposes (e.g. an XML schema
against which '/' or some other object should verify) so that tools
like h5diff could perform such a verification.
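To make the suggestion concrete, here is a minimal sketch of what such a
skeletal dump might look like; every element and attribute name below is
hypothetical, since no fixed HDF5/XML mapping is assumed:

```xml
<!-- Hypothetical skeletal HDF5/XML dump: raw data elided, structure kept.
     The "validation_schema" attribute on '/' is the suggested convention. -->
<HDF5File>
  <Group Name="/">
    <Attribute Name="validation_schema"
               Value="http://example.org/schemas/my-domain.xsd"/>
    <Group Name="mesh">
      <Dataset Name="coords">
        <Dataspace Rank="2" Dims="1024 3"/>
        <!-- element data omitted in a skeletal dump -->
      </Dataset>
    </Group>
  </Group>
</HDF5File>
```

A tool in the h5diff family could then fetch the schema named by the
attribute and validate the skeletal dump against it.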

On the developer/user side, if there's a one-to-one HDF5-to-XML
mapping it should be possible to use XML DOM and SAX APIs, ideally
within the mainstream XML parsers, to traverse an HDF5 data file.
Suddenly you'll find HDF5 combined with commodity web technologies and
datasets relatively easily rendered with DOM/CSS approaches. It would
be a huge win to toss a plugin into Firefox, point it at an HDF5 file,
provide a stylesheet, and find the datasets genuinely browsable. I
personally love the utility and quality of the un*x CLI toolset but I
could imagine browser capabilities easing HDF5 adoption by many folks.
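As a rough illustration of the stylesheet half of that vision, a few XSLT
lines could already turn such a dump into browsable HTML; the element names
(Dataset, Dataspace) are hypothetical placeholders for whatever the
HDF5/XML mapping ends up defining:

```xml
<!-- Sketch: render every (hypothetical) <Dataset> element of an
     HDF5/XML dump as a row in an HTML table. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <table border="1">
          <tr><th>Dataset</th><th>Dims</th></tr>
          <xsl:apply-templates select="//Dataset"/>
        </table>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="Dataset">
    <tr>
      <td><xsl:value-of select="@Name"/></td>
      <td><xsl:value-of select="Dataspace/@Dims"/></td>
    </tr>
  </xsl:template>
</xsl:stylesheet>
```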

Random comment on the document: From where did you pull the fooT,
barG, etc. naming convention? I find the Hungarian-like notation a
bit distracting compared to, say, just spelling out "Type". No need
to introduce brevity; it's XML.

- Rhys

Something I've always wanted in HDF5 is a way to describe the properties of
groups, datasets, etc. externally. For instance, when I want to find the
best chunk size for a dataset to get good I/O performance, I have to change
the parameters in the right HDF5 function calls in my code and recompile
the application, comment or uncomment some of the calls that set or unset
properties, and so on. Having an XML file describing the data (groups,
datasets, datatypes, etc.) together with the properties to use (compression
feature, chunk size, etc.) would ease the task of adapting the I/O part of
an application to particular needs or platforms. This idea is already
present in the ADIOS data format
(http://www.olcf.ornl.gov/center-projects/adios/), and I have also used it
in my own I/O middleware, Damaris (http://damaris.gforge.inria.fr).
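A sketch of what such an external description might look like (the element
names here are made up, and ADIOS's actual XML format differs in its
details):

```xml
<!-- Hypothetical layout file: I/O properties live outside the compiled
     application, so chunking/compression can be tuned per platform. -->
<hdf5-layout>
  <dataset name="/results/temperature" type="H5T_NATIVE_DOUBLE">
    <dataspace rank="3" dims="512 512 128"/>
    <properties>
      <chunk dims="64 64 16"/>
      <compression filter="gzip" level="6"/>
    </properties>
  </dataset>
</hdf5-layout>
```

At startup the application would read this file and translate the
properties into the usual H5Pset_chunk / H5Pset_deflate calls, so retuning
the chunk size no longer requires a recompile.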

Matthieu


2011/11/11 Rhys Ulerich <rhys.ulerich@gmail.com>

> We (The HDF group) would like to get your thoughts on the importance and
> role of XML in the HDF5 ecosystem.



--
Matthieu Dorier
ENS Cachan, Brittany (Computer Science dpt.)
IRISA Rennes, Office E324
http://perso.eleves.bretagne.ens-cachan.fr/~mdori307/wiki/

Rhys, thanks for the comment and sorry for the delay. (Just got back from
travel...)

> On the data side, using a skeletal HDF5/XML dump of a datafile to check that
> it is valid according to some domain-specific schema will be handy.
> I suggest adding recommended HDF5 attributes with particular names for such
> validation purposes (e.g. an XML schema against which '/' or some other
> object should verify) so that tools like h5diff could perform such a
> verification.

This is an interesting idea, although some users may perceive "recommended
HDF5 attributes" as too prescriptive and as stepping on their toes.
(Carrying that through the tool chain would be a big investment.)

We were thinking of a more constraint-based approach, i.e., a domain expert
would supply an XQuery (or XSL) transform that would convert an HDF5/XML
representation into a Boolean-valued checklist so that non-compliance can be
easily assessed. The XQuery transform would consist more or less of a list of
user-defined predicates (Boolean functions) which check, e.g., for the
presence of certain groups or attributes, certain sizes, etc.
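
A minimal sketch of such a transform, assuming hypothetical
Group/Dataset/Dataspace element names in the HDF5/XML representation:

```xquery
(: Each <check> evaluates one user-defined Boolean predicate over the
   HDF5/XML document bound to $doc; non-compliance shows up as 'false'. :)
declare variable $doc external;

declare function local:has-group($d, $name) {
  exists($d//Group[@Name = $name])
};

<checklist>
  <check name="mesh-group-present">{ local:has-group($doc, "mesh") }</check>
  <check name="coords-is-rank-2">{
    every $ds in $doc//Dataset[@Name = "coords"]
    satisfies $ds/Dataspace/@Rank = "2"
  }</check>
</checklist>
```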

> It would be a huge win to toss a plugin into Firefox, point it at an HDF5
> file, provide a stylesheet, and find the datasets genuinely browsable.
> I personally love the utility and quality of the un*x CLI toolset but I
> could imagine browser capabilities easing HDF5 adoption by many folks.

Good point. I hope HDFView will let you do that but, in its absence, a
browser will most likely be at hand.

> From where did you pull the fooT, barG, etc. naming convention?
> I find the Hungarian-like notation a bit distracting compared to,
> say, just spelling out "Type". No need to introduce brevity; it's XML.

No deep philosophical reason here and, no, brevity wasn't the goal.
I just felt odd calling something 'datatypeType'.
There's a certain terminological overload here, because we are dealing with
HDF5 attributes and XML attributes, with HDF5 datatypes and XML schema
datatypes, etc.
Consistency and avoiding ambiguity, maybe at the expense of aesthetics,
were the main goals.

Thanks for those comments and let's keep up the discussion!

Best, G.

Hi Gerd,

>> On the data side, using a skeletal HDF5/XML dump of a datafile to check
>> that it is valid according to some domain-specific schema will be handy.

> We were thinking of a more constraint-based approach, i.e., a domain expert
> would supply an XQuery (or XSL) transform that would convert an HDF5/XML
> representation into a Boolean-valued checklist so that non-compliance can
> be easily assessed. The XQuery transform would consist more or less of a
> list of user-defined predicates (Boolean functions) which check, e.g., for
> the presence of certain groups or attributes, certain sizes, etc.

That XQuery approach sounds handy for details like making sure
different datasets have congruent sizes. Still, for gross structural
validation I'd much rather have a schema. Sounds like both approaches
might have a place. Fortunately XML gives you both for free once
HDF5/XML is involved.
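
For the gross structural side, the schema could stay quite small; a sketch,
again with hypothetical element names:

```xml
<!-- Sketch of a domain schema: every Group must carry a Name and may
     contain Datasets, each of which must declare a Dataspace rank. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Group">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Dataset" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Dataspace">
                <xs:complexType>
                  <xs:attribute name="Rank" type="xs:positiveInteger"
                                use="required"/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="Name" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="Name" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```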

- Rhys