Detecting netCDF versus HDF5

Hi Ward

As you know, Data Explorer is going to be a general-purpose data reader for many formats, including HDF5 and netCDF. See here:

http://www.space-research.org/

Regarding the handling of both HDF5 and netCDF, there is a potential issue: how can one tell whether a given HDF5 file was saved by the HDF5 API or by the netCDF API?

It seems to me that this is not possible. Is this correct?

netCDF uses an internal function, NC_check_file_type, to examine the first few bytes of a file. For an HDF5 file, the test is:

/* Look at the magic number */
/* Ignore the first byte for HDF */
if(magic[1] == 'H' && magic[2] == 'D' && magic[3] == 'F') {
    *filetype = FT_HDF;
    *version = 5;
}

The problem is that this test passes for any HDF5 file and for any netCDF-4 file, which makes it impossible to tell which is which.

That in turn makes it impossible for any general-purpose data reader to decide whether to use the netCDF API or the HDF5 API.
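
A minimal sketch of the ambiguity, assuming both libraries are installed: on a netCDF-4 file, both probes below succeed, so neither one settles which API wrote the file.

#include <stdio.h>
#include <hdf5.h>
#include <netcdf.h>

int main(int argc, char **argv)
{
    int ncid;

    if (argc < 2)
        return 1;

    /* HDF5 signature check: true for plain HDF5 *and* netCDF-4 files */
    if (H5Fis_hdf5(argv[1]) > 0)
        printf("%s: valid HDF5 file\n", argv[1]);

    /* netCDF open: also succeeds on many plain HDF5 files */
    if (nc_open(argv[1], NC_NOWRITE, &ncid) == NC_NOERR) {
        printf("%s: readable by the netCDF API\n", argv[1]);
        nc_close(ncid);
    }
    return 0;
}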

I have a possible solution for this, but before going any further, I would just like to confirm two things:

1) That this is indeed not possible.

2) Whether you have a solid workaround, excluding the obvious ones, such as deciding based on a .nc or .h5 file extension, or traversing the HDF5 file to see whether it is netCDF conformant. Yes, to further complicate things, it is possible that the above test says OK for an HDF5 file, but the read by the netCDF API then fails because the file is HDF5 but not netCDF conformant.

Thanks

···

----------------------
Pedro Vicente
pedro.vicente@space-research.org
http://www.space-research.org/

Perhaps NetCDF (and other higher-level APIs that are built on top of HDF5) should include an attribute attached to the root group that identifies the name and version of the API that created the file? (adopt this as a convention)

-john
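
A minimal sketch of that convention, assuming a hypothetical attribute name "_CreatorAPI" (no such standard name has been adopted): the writing library would attach it to the root group at creation time.

#include <hdf5.h>
#include <string.h>

/* Attach a creator-identification attribute to the root group.
 * "_CreatorAPI" is a hypothetical convention name; sketch only. */
static herr_t tag_creator(hid_t file_id, const char *creator)
{
    hid_t root  = H5Gopen2(file_id, "/", H5P_DEFAULT);
    hid_t space = H5Screate(H5S_SCALAR);
    hid_t type  = H5Tcopy(H5T_C_S1);
    H5Tset_size(type, strlen(creator) + 1);

    hid_t attr = H5Acreate2(root, "_CreatorAPI", type, space,
                            H5P_DEFAULT, H5P_DEFAULT);
    herr_t status = H5Awrite(attr, type, creator);

    H5Aclose(attr);
    H5Tclose(type);
    H5Sclose(space);
    H5Gclose(root);
    return status;
}

/* Usage (hypothetical): tag_creator(file_id, "netCDF 4.4.0"); */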

···


Hmmm. Is there any big reason NOT to try to read a netCDF-produced HDF5 file with the native HDF5 library if someone so chooses?

As far as detecting the data producer goes, I have a similar problem with my Silo library. Silo can write to HDF5. It can also write to PDB (that's 'Portable Database', https://wci.llnl.gov/codes/pact/pdb.html, not Protein Data Bank).

And, attempting to read an HDF5 file produced by Silo using just the HDF5 library (e.g. w/o Silo) is a major pain.

To handle detection of Silo/HDF5 and Silo/PDB files, there are a couple of things I do.

First, I augment the Linux 'file' utility with a wrapper script called 'silofile'...

#!/bin/sh

#
# Use octal dump (od) command to examine first few bytes of file.
# If do not find expected bytes of any of the formats we'd like
# to identify here, fall back to using the good ole' file command.
#
for f in $*; do
    if test -f $f; then
        headerBytes=$(od -a -N 10 $f)
        if test -n "$(echo $headerBytes | tr -d ' ' | grep '<<PDB:')"; then
            echo "$f: Portable Database (PDB) data"
        elif test -n "$(echo $headerBytes | tr -d ' \\' | grep 'HDFcrnl')"; then
            echo "$f: Hierarchical Data Format version 5 (HDF5) data"
        else
            headerBytes=$(od -t x1 -N 4 $f)
            if test -n "$(echo $headerBytes | grep '0000000 0e 03 13 01')"; then
                echo "$f: Hierarchical Data Format version 4 (HDF4) data"
            else
                file $f
            fi
        fi
    else # not a regular file
        file $f
    fi
done

Now, this won't tell a user if the file was produced by Silo, but it will tell a user whether the file appears to be HDF5, PDB, or HDF4, and that is usually sufficient for Silo users.

Now, from within C code, it's sufficient for me to just attempt to open the file using Silo's open routines. That process involves looking for telltale signs that the file was produced by Silo. It turns out the Silo library creates a couple of somewhat uniquely named char datasets in the root group of the file, "_silolibinfo" and "_hdf5libinfo". So, if Silo's open succeeds, it's a fairly certain sign the file was actually produced by Silo.
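
A minimal sketch of that kind of probe, assuming the dataset names Mark mentions and using only the public HDF5 API:

#include <hdf5.h>

/* Heuristic: does the root group contain Silo's marker datasets?
 * Dataset names are taken from Mark's description above. */
static int looks_like_silo(const char *path)
{
    hid_t file = H5Fopen(path, H5F_ACC_RDONLY, H5P_DEFAULT);
    int found;

    if (file < 0)
        return 0;

    found = H5Lexists(file, "_silolibinfo", H5P_DEFAULT) > 0 &&
            H5Lexists(file, "_hdf5libinfo", H5P_DEFAULT) > 0;

    H5Fclose(file);
    return found;
}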

In a cursory look over the libsrc4 sources in the netCDF distro, I see a few things that might give a hint that a file was created with netCDF...

First, in NC_CLASSIC_MODEL, an attribute gets attached to the root group named "_nc3_strict". So, the existence of an attribute on the root group by that name would suggest the HDF5 file was generated by netCDF.
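
A sketch of that check, assuming the attribute name is exactly "_nc3_strict"; H5Aexists on the root group is enough (it only catches classic-model files, per the caveat above):

#include <hdf5.h>

/* Heuristic: was this HDF5 file written by netCDF-4 in classic model?
 * Relies on the root-group attribute "_nc3_strict" described above. */
static int has_nc3_strict(const char *path)
{
    hid_t file, root;
    int found;

    file = H5Fopen(path, H5F_ACC_RDONLY, H5P_DEFAULT);
    if (file < 0)
        return 0;

    root  = H5Gopen2(file, "/", H5P_DEFAULT);
    found = H5Aexists(root, "_nc3_strict") > 0;

    H5Gclose(root);
    H5Fclose(file);
    return found;
}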

Also, I tested a simple case of nc_create, nc_def_dim, etc., nc_close to see what it produced.

It appears to produce a dataset for each 'dimension' defined, with two attributes named "CLASS" and "NAME". The value of "CLASS" is a 16-char null-terminated string, "DIMENSION_SCALE", and the value of "NAME" is a 64-char null-terminated string of the form "This is a netCDF dimension but not a netCDF variable. %d".

Finally, if someone does an nc_create followed immediately by nc_close, then I don't think the resulting HDF5 file has anything to suggest it might have been created by netCDF. OTOH, the file is also devoid of any objects in that case, and so who cares whether netCDF produced it.

Hope that helps.

Mark

···

I like John's suggestion here.

But, any code you add to any applications now will work *only* for files that were produced post-adoption of this convention.

There are probably a bazillion files out there at this point that don't follow that convention and you probably still want your applications to be able to read them.

In VisIt, we support >140 format readers. Over 20 of those are different variants of HDF5 files (H5part, Xdmf, Pixie, Silo, Samrai, netCDF, Flash, Enzo, Chombo, etc., etc.). When opening a file, how does VisIt figure out which plugin to use? In particular, how do we keep one poorly written reader plugin (which may be the wrong one for a given file) from preventing the correct one from being found? It's kind of a hard problem.

Some of our discussion is captured here...

http://www.visitusers.org/index.php?title=Database_Format_Detection

Mark

···


DETECTING HDF5 VERSUS NETCDF GENERATED FILES
REQUEST FOR COMMENTS

AUTHOR: Pedro Vicente

AUDIENCE:
1) HDF and netCDF developers:

Ed Hartnett
Kent Yang

2) HDF and netCDF users that replied to this thread:

Mark C. Miller
John Shalf

3) netCDF tools developers:

Mary Haley, NCL

4) HDF and netCDF managers and sponsors:

David Pearah, CEO, HDF Group
Ward Fisher, UCAR
Daniel J. Marinelli, Richard Ullman, Christopher Lynnes, NASA

5) The [CF-metadata] list

After this thread started 2 months ago, there was an announcement on the [CF-metadata] mailing list about
"a meeting to discuss current and future netCDF-CF efforts and directions.
The meeting will be held on 24-26 May 2016 in Boulder, CO, USA at the UCAR Center Green facility."

This would be a good topic to put on the agenda, maybe?

THE PROBLEM:

Currently it is impossible to detect whether an HDF5 file was generated by the HDF5 API or by the netCDF API.
See the previous email for the reasons why.

WHY THIS MATTERS:

Software applications that need to handle both netCDF and HDF5 files cannot decide which API to use.
This includes popular visualization tools like IDL, Matlab, NCL, HDF Explorer.

SOLUTIONS PROPOSED: 2

SOLUTION 1: Add a flag to HDF5 source

The HDF5 format specification, listed here

https://www.hdfgroup.org/HDF5/doc/H5.format.html

describes a sequence of bytes in the file layout that have special meaning for the HDF5 API. It is common practice, when designing a data format,
to leave some fields "reserved for future use".

This solution makes use of one of these empty "reserved for future use" spaces to save a byte (for example) that encodes an enumerator
of "HDF5 compatible formats".

An "HDF5 compatible format" is a data format that uses the HDF5 API at a lower level (usually hidden from the user of the upper API),
and providing its own API.

This category can still be divide in 2 formats:
1) A "pure HDF5 compatible format". Example, NeXus

http://www.nexusformat.org/

NeXus just writes some metadata (attributes) on top of the HDF5 API, that has some special meaning for the NeXus community

2) A "non pure HDF5 compatible format". Example, netCDF

Here, the format adds some extra feature besides HDF5. In the case of netCDF, these are shared dimensions between variables.

This sub-division between 1) and 2) is irrelevant for the problem and solution in question

The solution consists of writing a different enumerator value in the "reserved for future use" space. For example:

Value decimal 0 (current value): this file was generated by the HDF5 API (meaning the HDF5-only API)
Value decimal 1: this file was generated by the netCDF API (using HDF5)
Value decimal 2: this file was generated by <put here another HDF5-based format>
and so on
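
A sketch of what such an enumerator could look like; the values and names are purely illustrative, since no such field exists in the HDF5 superblock today:

/* Hypothetical registry for a "reserved for future use" superblock byte.
 * Nothing like this is currently defined in the HDF5 file format spec. */
enum hdf5_creator_format {
    H5_CREATOR_HDF5   = 0,  /* written with the plain HDF5 API */
    H5_CREATOR_NETCDF = 1,  /* written by netCDF-4 on top of HDF5 */
    H5_CREATOR_NEXUS  = 2,  /* written by NeXus on top of HDF5 */
    /* ... further HDF5-based formats registered with the HDF Group */
};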

The advantage of this solution is that the process involves 2 parties: the HDF Group and the other format's organization.

This allows the HDF Group to "keep track" of new HDF5-based formats. It also allows the other format to become "HDF5 certified".

SOLUTION 2: Add some metadata to the other API on top of HDF5

This is what NeXus uses.
On creation, a NeXus file writes several attributes on the root group, like "NeXus_version", plus other numeric data.
This is done using the public HDF5 API calls.

The solution for netCDF consists of the same approach: write some specific attributes, and add a special netCDF API to write/read them.

This solution requires the work of only one party (the netCDF group).
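
The detection side of Solution 2 is equally small. A sketch, reusing the hypothetical "_CreatorAPI" attribute from the earlier sketch in this thread:

#include <stddef.h>
#include <hdf5.h>

/* Read back a hypothetical "_CreatorAPI" root-group attribute.
 * Returns 1 and fills 'buf' if present, 0 otherwise. */
static int read_creator(const char *path, char *buf, size_t buflen)
{
    int ok = 0;
    hid_t root;
    hid_t file = H5Fopen(path, H5F_ACC_RDONLY, H5P_DEFAULT);

    if (file < 0)
        return 0;

    root = H5Gopen2(file, "/", H5P_DEFAULT);
    if (H5Aexists(root, "_CreatorAPI") > 0) {
        hid_t attr = H5Aopen(root, "_CreatorAPI", H5P_DEFAULT);
        hid_t type = H5Tcopy(H5T_C_S1);
        H5Tset_size(type, buflen);
        ok = H5Aread(attr, type, buf) >= 0;
        H5Tclose(type);
        H5Aclose(attr);
    }
    H5Gclose(root);
    H5Fclose(file);
    return ok;
}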

END OF RFC

In reply to people who commented in the thread:

@John Shalf

Perhaps NetCDF (and other higher-level APIs that are built on top of HDF5) should include an attribute attached
to the root group that identifies the name and version of the API that created the file? (adopt this as a convention)

Yes, that's one way to do it: Solution 2 above.

@Mark Miller

Hmmm. Is there any big reason NOT to try to read a netCDF produced HDF5 file with the native HDF5 library if someone so chooses?

It's possible to read a netCDF file using HDF5, yes.
There are 2 things that you will miss by doing this:

1) the ability to inquire about shared netCDF dimensions.
2) the ability to read remotely with OPeNDAP.
Reading with HDF5 also exposes metadata that is supposed to be private to netCDF. See below.

And, attempting to read an HDF5 file produced by Silo using just the HDF5 library (e.g. w/o Silo) is a major pain.

This I don't understand. Why not read the Silo file with the Silo API?

That's the whole point of this issue: each higher-level API on top of HDF5 should be able to detect "itself".
I am not familiar with Silo, but if Silo cannot do this, then it has the same design flaw that netCDF has.

In a cursory look over the libsrc4 sources in the netCDF distro, I see a few things that might give a hint that a file was created with netCDF...

First, in NC_CLASSIC_MODEL, an attribute gets attached to the root group named "_nc3_strict". So, the existence of an attribute on the root group by that name would suggest the HDF5 file was generated by netCDF.

I think this is done only for files written in the "old" netCDF-3 classic model.

Also, I tested a simple case of nc_open, nc_def_dim, etc. nc_close to see what it produced.

It appears to produce datasets for each 'dimension' defined with two attributes named "CLASS" and "NAME".

This is because netCDF uses the HDF5 Dimension Scales API internally to keep track of shared dimensions. These are internal attributes
of Dimension Scales. This approach alone would not work, because an HDF5-only file that uses Dimension Scales would have the same attributes.

I like John's suggestion here.

But, any code you add to any applications now will work *only* for files that were produced post-adoption of this convention.

Yes. There are 2 actions to take here:
1) fix the issue for the future
2) try to find a retroactive workaround that makes it possible to differentiate HDF5/netCDF files made before the adopted convention
See below.

In VisIt, we support >140 format readers. Over 20 of those are different variants of HDF5 files (H5part, Xdmf, Pixie, Silo, Samrai, netCDF, Flash, Enzo, Chombo, etc., etc.).
When opening a file, how does VisIt figure out which plugin to use? In particular, how do we keep one poorly written reader plugin (which may be the wrong one for a given file) from preventing the correct one from being found? It's kind of a hard problem.

Yes, that's the problem we are trying to solve. I have to say, that is quite a list of HDF5-based formats there.

Some of our discussion is captured here...

http://www.visitusers.org/index.php?title=Database_Format_Detection

I"ll check it out, thank you for the suggestions

@Ed Hartnett

I must admit that when putting netCDF-4 together I never considered that someone might want to tell the difference between a "native" HDF5 file and a netCDF-4/HDF5 file.

Well, you can't think of everything.

This is a major design flaw.
If you are in the business of designing data file formats, one of the things you have to do is make it possible to distinguish your format from others.

I agree that it is not possible to canonically tell the difference. The netCDF-4 API does use some special attributes to track named dimensions,
and to tell whether classic mode should be enforced. But it can easily produce files without any named dimensions, etc.

So I don't think there is any easy way to tell.

I remember you wrote that code together with Kent Yang from the HDF Group.
At the time I was with the HDF Group, but unfortunately I did not follow closely what you were doing.
I don't remember any design document being circulated that explains the internals of how to make the netCDF (classic) model of shared dimensions
use the hierarchical group model of HDF5.
I know this was done using the HDF5 Dimension Scales (which I wrote), but is there any design document that explains it?

Maybe just some internal email exchange between you and Kent Yang?
Kent, how are you?
Do you remember having any design document that explains this?
Maybe something like a unique private attribute that is written somewhere in the netCDF file?

@Mary Haley, NCL

NCL is a widely used tool that handles both netCDF and HDF5.

Mary, how are you?
How does NCL deal with the case of reading both pure HDF5 files and netCDF files that use HDF5?
Would you be interested in joining a community based effort to deal with this, in case this is an issue for you?

@David Pearah, CEO, HDF Group

I volunteer to participate in the effort of this RFC together with the HDF Group (and netCDF group).
Maybe we could form a "task force" between the HDF Group, the netCDF group, and any volunteers (such as tools developers who happen to be on these mailing lists)?

The "task force" would have 2 tasks:
1) make a HDF5 based convention for the future and
2) try to retroactively salvage the current design issue of netCDF
My phone is 217-898-9356, you are welcome to call in anytime.

···

----------------------
Pedro Vicente
pedro.vicente@space-research.org
https://twitter.com/_pedro__vicente
http://www.space-research.org/

···

I'll admit to not having time to read this whole email in detail. But, I've read enough and wanted to make just a few remarks.

  1. Silo *does* know whether a given HDF5 file was produced by Silo. It does so by storing some key datasets in the HDF5 file that are, in all likelihood, unique to Silo. That isn't to say that some other workflow somewhere in the world couldn't generate similarly named, shaped, and typed datasets with similar contents. But it's an unlikely enough situation that I claim Silo knows with certainty, when it is given an HDF5 file, whether the file was indeed produced with Silo.
  2. IMHO, this issue is totally analogous to the global symbol namespace in C applications. Every now and then, when linking together umpteen C libraries, you encounter situations where two libraries export the same public symbol and the link fails. The best practice is to avoid using *common* symbol names like 'temp', 'lib', 'status', etc. in the public symbol space. For example, we all prepend some 3- or 4-letter moniker to library function names (e.g. MPI_). It works, obviously, when everyone in the community observes the best practice. Why can't the same approach be taken for HDF5 files? The HDF Group could advocate for, and we, the community, could adopt, the best practice of associating, say, a string-valued attribute with the root group in the file. The attribute's name could be shared or it could be unique. Unique may be a bit better, but is not required. What is required is the same best practice: that the contents of that attribute be designed to be unique to the upper-level API that is using it.
  3. I am not sure I appreciate or agree with the distinction others are trying to make between an "Upper Level API" and a "pure HDF5 compatible format". Anything written with HDF5 can be read with HDF5 (without the upper-level API). Of course, there may be conventions that the upper-level API utilizes that the HDF5 API itself may be ignorant of. So what? We do this quite frequently with Silo and Python. Silo writes HDF5 files and some users write Python scripts to read them. Those users understand the conventional ways in which Silo is using HDF5, and those conventions become codified in the Python they write to read HDF5 directly (e.g. without Silo). It's sometimes a pain because Silo does actually try very hard to obscure the details of how it's using HDF5. But it is nonetheless possible, and so I see this distinction as rather moot.
  4. I think (not really sure) HDF5 may have some low-level features to insert a magic byte sequence into the boot block or other parts of the file header, *and* such data can be queried back in the application. If so, that solution might even be better, as it avoids stuffing anything into the file's "HDF5 object global namespace" (see the sketch after this list).
  5. We still have a bazillion legacy files out there. Can't fix those, and so we still need some heuristics to facilitate workflows using them.
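
Regarding point 4: this likely refers to the HDF5 user block, a reserved region at the start of the file that the HDF5 library skips and that an application can fill with its own magic bytes. A minimal sketch, assuming a 512-byte user block and a made-up tag string:

#include <stdio.h>
#include <hdf5.h>

/* Create a file with a 512-byte user block, then stamp a magic
 * string into it with ordinary stdio (HDF5 ignores this region). */
int main(void)
{
    hid_t fcpl = H5Pcreate(H5P_FILE_CREATE);
    hid_t file;
    FILE *fp;
    const char magic[] = "MYFORMAT-1.0";  /* hypothetical tag */

    H5Pset_userblock(fcpl, 512);          /* must be a power of 2 >= 512 */
    file = H5Fcreate("tagged.h5", H5F_ACC_TRUNC, fcpl, H5P_DEFAULT);
    H5Fclose(file);
    H5Pclose(fcpl);

    /* Write the magic bytes at offset 0, inside the user block */
    fp = fopen("tagged.h5", "r+b");
    fwrite(magic, 1, sizeof magic, fp);
    fclose(fp);
    return 0;
}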

Mark

···


An attribute in the root group with a name that only a specific format would use
is both effective and trivial to implement. I consider not needing HDF
Group involvement or certification to be an advantage.

David

All,

I am probably missing something in this discussion. Since Pedro asked me to chime in and answer his question, I’ll try... [I am referring to Pedro’s initial question "Regarding the handling of both HDF5 and netCDF, it seems there is a potential issue, which is, how to tell if any HDF5 file was saved by the HDF5 API or by the netCDF API?”]

A netCDF-4 file is an HDF5 file. netCDF-4 is not a file format but a convention for how to store data that is described by the netCDF-4 data model in HDF5.

I don't think there is a solution to the problem of determining which API wrote the file. One can write a pure C program that doesn't call the HDF5 or netCDF-4 library but writes an HDF5 file according to the HDF5 file format and to the netCDF-4 convention, making it a netCDF-4 file.

One should probably have a checker function that traverses an HDF5 file and tells whether the file is compliant with the netCDF-4 convention. Adding attributes, etc., really will not help. I can add an attribute to a "non-netCDF-4" HDF5 file and fool the netCDF-4 library. I can also write a netCDF-4 file using just the pure HDF5 library by following the convention of the netCDF-4 library.

I think the tool should follow the Common Data Model and shield data formats from the user. What am I missing?

Elena

···

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal The HDF Group http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On Apr 24, 2016, at 6:08 PM, Pedro Vicente <pedro.vicente@space-research.org> wrote:

All

I posted some code on GitHub that solves the issue for older netCDF files; see below.

In reply to previous comments:

@ John Caron

Here are the blogs:

http://www.unidata.ucar.edu/blogs/developer/en/entry/netcdf4_shared_dimensions

I had seen some of your blogs, but not the one above.
By looking at the netCDF code I came up with the code below, which detects
one of the "hidden" attributes described in that blog, and another one that is not described.

@ David Brown

But this is not ideal, because we only want to open files that are explicitly written using NetCDF4 as NetCDF

Hi David, yes, that's the issue.

I think this piece of code I posted on GitHub is possibly the best solution
for this:

https://github.com/pedro-vicente/netcdf-detect

@ Ed Hartnett

I wrote that code by reading the comments you wrote in the files nc4file.c and nc4hdf.c, here: https://github.com/Unidata/netcdf-c/tree/master/libsrc4

Do you agree with the solution?

Anyone, feel free to use that code.

The C function is called is_netcdf().

The netCDF API writes, if variables and dimensions are present in the file:

1) an attribute named "_Netcdf4Dimid" (in some cases)
2) an attribute named "NAME" (always), saved by the HDF5 Dimension Scales API,
that contains the string "This is a netCDF dimension but not a netCDF variable."

This utility tries to detect both attributes by traversing the HDF5 file; if either is found, it returns a value of 1.
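
A sketch of what such a traversal can look like (Pedro's actual utility is at the GitHub link above; this is not that code), using the HDF5 1.8-era H5Ovisit signature:

#include <string.h>
#include <hdf5.h>

#define NC_DIM_MARKER "This is a netCDF dimension but not a netCDF variable."

/* H5Ovisit callback: flag the file as netCDF if any object carries a
 * "_Netcdf4Dimid" attribute, or a "NAME" attribute starting with the
 * dimension-scale marker text quoted above. */
static herr_t visit_cb(hid_t loc, const char *name,
                       const H5O_info_t *info, void *op_data)
{
    int *found = (int *)op_data;

    if (H5Aexists_by_name(loc, name, "_Netcdf4Dimid", H5P_DEFAULT) > 0) {
        *found = 1;
        return 1;                      /* positive value stops the visit */
    }
    if (H5Aexists_by_name(loc, name, "NAME", H5P_DEFAULT) > 0) {
        char buf[256] = {0};
        hid_t attr = H5Aopen_by_name(loc, name, "NAME",
                                     H5P_DEFAULT, H5P_DEFAULT);
        hid_t type = H5Tcopy(H5T_C_S1);
        H5Tset_size(type, sizeof buf);
        if (H5Aread(attr, type, buf) >= 0 &&
            strncmp(buf, NC_DIM_MARKER, strlen(NC_DIM_MARKER)) == 0)
            *found = 1;
        H5Tclose(type);
        H5Aclose(attr);
        if (*found)
            return 1;
    }
    return 0;                          /* keep visiting */
}

static int detect_netcdf_markers(const char *path)
{
    int found = 0;
    hid_t file = H5Fopen(path, H5F_ACC_RDONLY, H5P_DEFAULT);

    if (file < 0)
        return 0;
    H5Ovisit(file, H5_INDEX_NAME, H5_ITER_NATIVE, visit_cb, &found);
    H5Fclose(file);
    return found;
}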

The program includes 3 test cases: 2 cases that generate a file with 1) and 2) above (they are mutually exclusive, it seems),
and a third one that simply does

nc_create
nc_close

In this case, the above attributes are not written, so the test will fail, as someone posted here before.
I would say that if someone writes this kind of file, it is irrelevant whether HDF5 or netCDF was used; the files are virtually identical.

*Another* case that would give a false positive is where someone tries to be a spoiler and uses the HDF5 API
to write these 2 attributes:
"_Netcdf4Dimid"
"This is a netCDF dimension but not a netCDF variable."

The only truly "spoiler proof", 100% solution is the SOLUTION 1 I posted before:
have HDF5 save a byte in the file that explicitly tells what kind of derived API wrote it.
This would be a private HDF5 function called by the derived API, say on
nc_create().
So, it would not deal with attributes written by public APIs at all.

@ Elena Pourmal

Hi Elena, how are you?

Any chance of discussing this solution?

By the way, one of my emails in this thread, sent to the hdf-forum last Friday, is waiting for approval:

"Your mail to 'Hdf-forum' with the subject...
Is being held until the list moderator can review it for approval.
"

The hdf-forum now requires approval by a moderator?
That does not work very well on weekends, for example.

----------------------
Pedro Vicente
pedro.vicente@space-research.org
https://twitter.com/_pedro__vicente
http://www.space-research.org/

----- Original Message -----
From: "David Brown" <dbrown@ucar.edu>
To: <netcdfgroup@unidata.ucar.edu>
Sent: Saturday, April 23, 2016 3:06 PM
Subject: Re: [netcdfgroup] netcdfgroup Digest, Vol 1126, Issue 2

Since Pedro asked earlier about how NCL distinguishes between NetCDF4
and HDF5, I'm going to add my 2 cents to what now appears to be the
longest thread ever on this mailing list.

First, a bit of background. Traditionally, NCL has distinguished among
file formats based solely on file extensions. If a file name ends with
".nc" then it is considered to be a NetCDF file and will be opened
using the NetCDF library calls. Additionally, there is an idiosyncratic
feature where you can add a "virtual" extension to a file name to
specify the format you want to use. For example, if the file is named
"test", you can open it as "test.h5" to open it using HDF5 calls.
Given this name, NCL will look first for a file called "test.h5", and if
that is not found, then it will look for "test". You can even add
extensions to files that already have them to open a file using
another format: e.g. test.hdf.nc.

But recent versions of NCL attempt to figure out the format of files
that do not have recognized extensions. And that means we have
definitely run into the issue that Pedro originally brought up. We
want our HDF5 module to handle HDF5 files on their own terms,
including, e.g., recognizing reference types. For now, we first try to
see if the file can be opened using the NetCDF library, and if not, we
try various versions of HDF. But this is not ideal, because we only
want to open files that are explicitly written using NetCDF4 as
NetCDF. So it is indeed welcome news that there will be global
attributes added to explicitly identify the file as NetCDF4. However,
it also would be nice if nc_inq_format or nc_inq_format_extended could
be adjusted to give a definitive answer as to whether the file was
created as NetCDF4. I have to admit I was quite surprised to discover
that nc_inq_format_extended would not answer this seemingly obvious
(to me at least) question.
-Dave Brown
NCL technical architect
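
For reference, a minimal sketch of the inquiry mentioned above: nc_inq_format_extended reports the underlying storage (e.g. NC_FORMATX_NC_HDF5), but, as noted, that alone does not prove the file was created through the netCDF API.

#include <stdio.h>
#include <netcdf.h>

int main(int argc, char **argv)
{
    int ncid, fmt, mode;

    if (argc < 2 || nc_open(argv[1], NC_NOWRITE, &ncid) != NC_NOERR)
        return 1;

    /* Reports the dispatch layer / storage format actually in use */
    nc_inq_format_extended(ncid, &fmt, &mode);
    if (fmt == NC_FORMATX_NC_HDF5)
        printf("HDF5 storage -- but created via netCDF? Still unknown.\n");

    nc_close(ncid);
    return 0;
}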

On Sat, Apr 23, 2016 at 10:21 AM, <netcdfgroup-request@unidata.ucar.edu> >> wrote:


Message: 1
Date: Fri, 22 Apr 2016 21:57:51 -0600
From: John Caron <jcaron1129@gmail.com>
To: Pedro Vicente <pedro.vicente@space-research.org>
Cc: cf-metadata@cgd.ucar.edu, NetCDF-Java community
       <netcdf-java@unidata.ucar.edu>, netcdfgroup@unidata.ucar.edu
Subject: Re: [netcdfgroup] [CF-metadata] [Hdf-forum] Detecting netCDF
       versus HDF5 -- PROPOSED SOLUTIONS --REQUEST FOR COMMENTS

Here are the blogs:

http://www.unidata.ucar.edu/blogs/developer/en/entry/dimensions_scales
http://www.unidata.ucar.edu/blogs/developer/en/entry/dimension_scale2
http://www.unidata.ucar.edu/blogs/developer/en/entry/dimension_scales_part_3
http://www.unidata.ucar.edu/blogs/developer/en/entry/netcdf4_shared_dimensions
http://www.unidata.ucar.edu/blogs/developer/en/entry/netcdf4_use_of_dimension_scales

On Fri, Apr 22, 2016 at 7:57 AM, Pedro Vicente <pedro.vicente@space-research.org> wrote:

John

>>> i have written various blogs on the unidata site about why netcdf4 != hdf5, and what the unique signature for shared dimensions looks like, in case you want details.

yes, I am interested; I had the impression, from looking at the code some
years ago, that netCDF writes some uniquely named attributes somewhere

----------------------
Pedro Vicente
pedro.vicente@space-research.org
https://twitter.com/_pedro__vicente
http://www.space-research.org/

----- Original Message -----
From: John Caron <jcaron1129@gmail.com>
To: Pedro Vicente <pedro.vicente@space-research.org>
Cc: cf-metadata@cgd.ucar.edu; Discussion forum for the NeXus data format <nexus@nexusformat.org>; netcdfgroup@unidata.ucar.edu; Dennis Heimbigner <dmh@ucar.edu>; NetCDF-Java community <netcdf-java@unidata.ucar.edu>
Sent: Thursday, April 21, 2016 11:11 PM
Subject: Re: [CF-metadata] [netcdfgroup] [Hdf-forum] Detecting netCDF versus HDF5 -- PROPOSED SOLUTIONS --REQUEST FOR COMMENTS

1) I completely agree with the idea of adding system metadata that
indicates the library version(s) that wrote the file.

2) the way shared dimensions are implemented by netcdf4 is a unique
signature that would likely identify (100 - epsilon)% of real data files
in the wild. One could add such detection to the netcdf4 and/or hdf5
libraries, and/or write a utility program to detect it.

there are 2 variants:

2.1) one could write a netcdf4 file without shared dimensions, though I'm
pretty sure no one does. But you could argue then that it's fine to just
treat it as an hdf5 file and read it through the hdf5 library.

2.2) one could write a netcdf4 file with the hdf5 library, if you knew what
you were doing. I have heard of this happening. But then you could argue
that it's really a netcdf4 file and you should use the netcdf library to
read it.

I have written various blogs on the unidata site about why netcdf4 !=
hdf5, and what the unique signature for shared dimensions looks like, in
case you want details.
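To make the "unique signature" point concrete, here is a rough sketch that counts HDF5 Dimension Scales in a file. It assumes the HDF5 1.8/1.10 form of H5Ovisit plus the high-level Dimension Scales API, and it is a heuristic, not a definitive test:

#include <hdf5.h>
#include <hdf5_hl.h>

/* Callback: count every dataset that is an HDF5 Dimension Scale. */
static herr_t count_scales(hid_t loc, const char *name,
                           const H5O_info_t *info, void *op_data)
{
    if (info->type == H5O_TYPE_DATASET) {
        hid_t did = H5Dopen2(loc, name, H5P_DEFAULT);
        if (did >= 0) {
            if (H5DSis_scale(did) > 0)
                ++*(int *)op_data;
            H5Dclose(did);
        }
    }
    return 0;  /* continue iteration */
}

int looks_like_netcdf4(const char *path)
{
    int nscales = 0;
    hid_t fid = H5Fopen(path, H5F_ACC_RDONLY, H5P_DEFAULT);
    if (fid < 0) return 0;
    H5Ovisit(fid, H5_INDEX_NAME, H5_ITER_NATIVE, count_scales, &nscales);
    H5Fclose(fid);
    /* imperfect: a pure HDF5 file can use Dimension Scales too */
    return nscales > 0;
}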

On Thu, Apr 21, 2016 at 4:18 PM, Pedro Vicente <pedro.vicente@space-research.org> wrote:

> If you have hdf5 files that should be readable, then I will undertake to look at them and see what the problem is.

ok, thank you

> WRT old files: We could produce a utility that would redef the file and insert the _NCProperties attribute. This would allow someone to wholesale mark old files.

Excellent idea, Dennis.
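A minimal sketch of such a retro-tagging utility, using the standard netCDF C API; the attribute name "_Provenance" is a hypothetical stand-in, since _NCProperties itself is managed by the library internals:

#include <string.h>
#include <netcdf.h>

int tag_file(const char *path, const char *stamp)
{
    int ncid, st;
    if ((st = nc_open(path, NC_WRITE, &ncid)) != NC_NOERR)
        return st;
    nc_redef(ncid);   /* enter define mode; harmless for netCDF-4 files */
    st = nc_put_att_text(ncid, NC_GLOBAL, "_Provenance",
                         strlen(stamp), stamp);
    nc_enddef(ncid);
    nc_close(ncid);
    return st;
}

/* usage: tag_file("old.nc", "netcdflibversion=4.4.1"); */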

----------------------
Pedro Vicente
pedro.vicente@space-research.org
https://twitter.com/_pedro__vicente
http://www.space-research.org/

----- Original Message ----- From: <dmh@ucar.edu>
To: "Pedro Vicente" <pedro.vicente@space-research.org>; <cf-metadata@cgd.ucar.edu>; "Discussion forum for the NeXus data format" <nexus@nexusformat.org>; <netcdfgroup@unidata.ucar.edu>
Sent: Thursday, April 21, 2016 5:02 PM
Subject: Re: [netcdfgroup] [Hdf-forum] Detecting netCDF versus HDF5 -- PROPOSED SOLUTIONS --REQUEST FOR COMMENTS

If you have hdf5 files that should be readable, then I will undertake to look at them and see what the problem is.
WRT old files: We could produce a utility that would redef the file and insert the _NCProperties attribute. This would allow someone to wholesale mark old files.
=Dennis Heimbigner
Unidata

On 4/21/2016 2:17 PM, Pedro Vicente wrote:

Dennis

> I am in the process of adding a global attribute in the root group that captures both the netcdf library version and the hdf5 library version whenever a netcdf file is created. The current form is
> _NCProperties="version=...|netcdflibversion=...|hdflibversion=..."

ok, good to know, thank you
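For illustration, here is a sketch of how a reader could look for that attribute from the HDF5 side. It assumes the attribute fits in a small buffer and that the file was written by a netCDF version new enough to carry it:

#include <stdio.h>
#include <hdf5.h>
#include <hdf5_hl.h>

int written_by_netcdf(const char *path)
{
    char buf[256] = "";          /* _NCProperties is a short string */
    int found = 0;
    hid_t fid = H5Fopen(path, H5F_ACC_RDONLY, H5P_DEFAULT);
    if (fid < 0) return 0;
    if (H5Aexists(fid, "_NCProperties") > 0 &&
        H5LTget_attribute_string(fid, "/", "_NCProperties", buf) >= 0) {
        printf("_NCProperties = %s\n", buf);
        found = 1;
    }
    H5Fclose(fid);
    return found;  /* 0 for older netCDF files and for plain HDF5 */
}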

> 1. I am open to suggestions about changing the format or adding info to it.

I personally don't care; anything that uniquely identifies a netCDF
(HDF5-based) file as such will work.

> 2. Of course this attribute will not exist in files written using older versions of the netcdf library, but at least the process will have begun.

yes

> 3. This technically does not address the original issue because there exist hdf5 files not written by netcdf that are still compatible with and can be read by netcdf. Not sure this case is important or not.

there will always be HDF5 files not written by netCDF that netCDF will
read, as happens now.

this is not really the issue, but you just raised a further issue :-)

the issue is that I would like an application that reads a netCDF
(HDF5-based) file to be able to decide whether to use the netCDF or the
HDF5 API. Your attribute writing will do, for future files. For older
netCDF files there may be a way to inspect the current attributes and
data structures to make the file "identify itself" as netCDF. A bit of
debugging will confirm that; since Dimension Scales are used, that would
be a (maybe imperfect) way to do it.

regarding the "further issue" above:

you could go one step further and, for any HDF5 file not written by
netCDF, make netCDF reject the read, because the file is not "netCDF
compliant". Since having netCDF read pure HDF5 files is not a problem
(at least for me), I don't know if you would want to do this; just an
idea. In my mind, taking complexity and ambiguity out of problems is
always a good thing.

ah, I forgot one thing related to this.

In the past I have found several pure HDF5 files that netCDF failed to
read. Since netCDF is HDF5 binary compatible, one would expect all HDF5
files to be readable by netCDF, unless you specifically wrote something
in the code that makes it fail when some condition is not met. This was
a while ago; I'll try to find those cases and send them to the bug
report email.

----------------------
Pedro Vicente
pedro.vicente@space-research.org
https://twitter.com/_pedro__vicente
http://www.space-research.org/

----- Original Message ----- From: <dmh@ucar.edu>
To: "Pedro Vicente" <pedro.vicente@space-research.org>; "HDF Users Discussion List" <hdf-forum@lists.hdfgroup.org>; <cf-metadata@cgd.ucar.edu>; "Discussion forum for the NeXus data format" <nexus@nexusformat.org>; <netcdfgroup@unidata.ucar.edu>
Cc: "John Shalf" <jshalf@lbl.gov>; <Richard.E.Ullman@nasa.gov>; "Marinelli, Daniel J. (GSFC-5810)" <daniel.j.marinelli@nasa.gov>; "Miller, Mark C." <miller86@llnl.gov>
Sent: Thursday, April 21, 2016 2:30 PM
Subject: Re: [netcdfgroup] [Hdf-forum] Detecting netCDF versus HDF5 -- PROPOSED SOLUTIONS --REQUEST FOR COMMENTS

I am in the process of adding a global attribute in the root group that captures both the netcdf library version and the hdf5 library version whenever a netcdf file is created. The current form is
_NCProperties="version=...|netcdflibversion=...|hdflibversion=..."
where version is the version of the _NCProperties attribute and the others are e.g. 1.8.18 or 4.4.1-rc1.
Issues:
1. I am open to suggestions about changing the format or adding info to it.
2. Of course this attribute will not exist in files written using older versions of the netcdf library, but at least the process will have begun.
3. This technically does not address the original issue because there exist hdf5 files not written by netcdf that are still compatible with and can be read by netcdf. Not sure this case is important or not.
=Dennis Heimbigner
 Unidata

On 4/21/2016 9:33 AM, Pedro Vicente wrote:

DETECTING HDF5 VERSUS NETCDF GENERATED FILES
REQUEST FOR COMMENTS
AUTHOR: Pedro Vicente

AUDIENCE:
1) HDF, netCDF developers:
Ed Hartnett
Kent Yang
2) HDF, netCDF users that replied to this thread:
Miller, Mark C.
John Shalf
3) netCDF tools developers:
Mary Haley, NCL
4) HDF, netCDF managers and sponsors:
David Pearah, CEO HDF Group
Ward Fisher, UCAR
Marinelli, Daniel J., Richard Ullman, Christopher Lynnes, NASA
5) [CF-metadata] list:
After this thread started 2 months ago, there was an announcement on the [CF-metadata] mail list about "a meeting to discuss current and future netCDF-CF efforts and directions. The meeting will be held on 24-26 May 2016 in Boulder, CO, USA at the UCAR Center Green facility."
This would be a good topic to put on the agenda, maybe?
THE PROBLEM:
Currently it is impossible to detect whether an HDF5 file was generated by the HDF5 API or by the netCDF API. See the previous email for the reasons why.

WHY THIS MATTERS:
Software applications that need to handle both netCDF and HDF5 files cannot decide which API to use. This includes popular visualization tools like IDL, Matlab, NCL, and HDF Explorer.
SOLUTIONS PROPOSED: 2
SOLUTION 1: Add a flag to the HDF5 source

The HDF5 format specification, listed here
https://www.hdfgroup.org/HDF5/doc/H5.format.html
describes a sequence of bytes in the file layout that have special meaning for the HDF5 API. It is common practice, when designing a data format, to leave some fields "reserved for future use".

This solution makes use of one of these empty "reserved for future use" spaces to save a byte (for example) that describes an enumerator of "HDF5 compatible formats". An "HDF5 compatible format" is a data format that uses the HDF5 API at a lower level (usually hidden from the user of the upper API), while providing its own API.

This category can be further divided into 2 kinds of format:
1) A "pure HDF5 compatible format". Example: NeXus
http://www.nexusformat.org/
NeXus just writes some metadata (attributes) on top of the HDF5 API that has some special meaning for the NeXus community.
2) A "non pure HDF5 compatible format". Example: netCDF
Here, the format adds some extra feature besides HDF5. In the case of netCDF, these are shared dimensions between variables.
This sub-division between 1) and 2) is irrelevant for the problem and solution in question.

The solution consists of writing a different enumerator value in the "reserved for future use" space. For example:
Value decimal 0 (current value): this file was generated by the HDF5 API (meaning the HDF5-only API)
Value decimal 1: this file was generated by the netCDF API (using HDF5)
Value decimal 2: this file was generated by <put here another HDF5 based format>
and so on.

The advantage of this solution is that the process involves 2 parties: the HDF Group and the other format's organization. This allows the HDF Group to "keep track" of new HDF5-based formats. It also allows the other format to be made "HDF5 certified".
SOLUTION 2: Add some metadata to the other API on top of HDF5

This is what NeXus does. A NeXus file, on creation, writes several attributes on the root group, like "NeXus_version" and other numeric data. This is done using the public HDF5 API calls.

The solution for netCDF consists of the same approach: just write some specific attributes, plus a special netCDF API to write/read them. This solution requires the work of only one party (the netCDF group).
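A minimal sketch of this NeXus-style stamping, using only public HDF5 calls; the attribute name "creator_api" is illustrative, not an adopted convention:

#include <string.h>
#include <hdf5.h>

int stamp_creator(hid_t file_id, const char *api_name)
{
    herr_t st = -1;
    hid_t atype = H5Tcopy(H5T_C_S1);              /* C string type */
    H5Tset_size(atype, strlen(api_name) + 1);
    hid_t aspace = H5Screate(H5S_SCALAR);
    hid_t attr = H5Acreate2(file_id, "creator_api", atype, aspace,
                            H5P_DEFAULT, H5P_DEFAULT);
    if (attr >= 0) {
        st = H5Awrite(attr, atype, api_name);
        H5Aclose(attr);
    }
    H5Sclose(aspace);
    H5Tclose(atype);
    return (int)st;
}

/* e.g. stamp_creator(fid, "MyFormat 1.0"); right after H5Fcreate() */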
END OF RFC
In reply to people that commented in the thread:

@John Shalf
>> Perhaps NetCDF (and other higher-level APIs that are built on top of HDF5) should include an attribute attached to the root group that identifies the name and version of the API that created the file? (adopt this as a convention)

yes, that's one way to do it, Solution 2 above
@Mark Miller
>>> Hmmm. Is there any big reason NOT to try to read a netCDF produced HDF5 file with the native HDF5 library if someone so chooses?

It's possible to read a netCDF file using HDF5, yes. There are 2 things that you will miss doing this:
1) the ability to inquire about shared netCDF dimensions.
2) the ability to read remotely with OPeNDAP.
Reading with HDF5 also exposes metadata that is supposed to be private to netCDF. See below.

>>> And, attempting to read an HDF5 file produced by Silo using just the HDF5 library (e.g. w/o Silo) is a major pain.

This I don't understand. Why not read the Silo file with the Silo API? That's the whole purpose of this issue: each higher-level API on top of HDF5 should be able to detect "itself". I am not familiar with Silo, but if Silo cannot do this, then you have the same design flaw that netCDF has.

>>> In a cursory look over the libsrc4 sources in the netCDF distro, I see a few things that might give a hint a file was created with netCDF.
...
>>> First, in NC_CLASSIC_MODEL, an attribute gets attached to the root group named "_nc3_strict". So, the existence of an attribute on the root group by that name would suggest the HDF5 file was generated by netCDF.

I think this is written only for files created in classic model (NC_CLASSIC_MODEL), so it would not identify every netCDF-4 file; a check for it is sketched just below.
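A quick sketch of that check; the attribute is visible to plain HDF5, and its absence proves nothing:

#include <hdf5.h>

int has_nc3_strict(const char *path)
{
    hid_t fid = H5Fopen(path, H5F_ACC_RDONLY, H5P_DEFAULT);
    if (fid < 0) return 0;
    int found = H5Aexists(fid, "_nc3_strict") > 0;  /* root group attr */
    H5Fclose(fid);
    return found;
}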
>>>>> Also, I tested a simple case of nc_open, nc_def_dim, etc., nc_close to see what it produced.
>>>> It appears to produce datasets for each 'dimension' defined, with two attributes named "CLASS" and "NAME".

This is because netCDF uses the HDF5 Dimension Scales API internally to keep track of shared dimensions. These are internal attributes of Dimension Scales. This approach would not work because an HDF5-only file with Dimension Scales would have the same attributes.

>>>> I like John's suggestion here.
>>>> But, any code you add to any applications now will work *only* for files that were produced post-adoption of this convention.

yes. there are 2 actions to take here:
1) fix the issue for the future
2) try to retroactively find some workaround that makes it possible to differentiate HDF5/netCDF files made before the adopted convention
see below

>>>> In VisIt, we support >140 format readers. Over 20 of those are different variants of HDF5 files (H5part, Xdmf, Pixie, Silo, Samrai, netCDF, Flash, Enzo, Chombo, etc., etc.)
>>>> When opening a file, how does VisIt figure out which plugin to use? In particular, how do we avoid one poorly written reader plugin (which may be the wrong one for a given file) from preventing the correct one from being found. It's kinda a hard problem.

Yes, that's the problem we are trying to solve. I have to say, that is quite a list of HDF5-based formats there.

>>>> Some of our discussion is captured here...
http://www.visitusers.org/index.php?title=Database_Format_Detection

I'll check it out, thank you for the suggestions.
@Ed Hartnett
>>> I must admit that when putting netCDF-4 together I never considered that someone might want to tell the difference between a "native" HDF5 file and a netCDF-4/HDF5 file.
>>> Well, you can't think of everything.

This is a major design flaw. If you are in the business of designing data file formats, one of the things you have to do is make it possible to identify your format from the other formats.

>>> I agree that it is not possible to canonically tell the difference. The netCDF-4 API does use some special attributes to track named dimensions, and to tell whether classic mode should be enforced. But it can easily produce files without any named dimensions, etc.
>>> So I don't think there is any easy way to tell.

I remember you wrote that code together with Kent Yang from the HDF Group. At the time I was with the HDF Group but unfortunately I did not follow closely what you were doing. I don't remember any design document being circulated that explains the internals of how to make the netCDF (classic) model of shared dimensions use the hierarchical group model of HDF5. I know this was done using the HDF5 Dimension Scales (that I wrote), but is there any design document that explains it? Maybe just some internal email exchange between you and Kent Yang?

Kent, how are you? Do you remember having any design document that explains this? Maybe something like a unique private attribute that is written somewhere in the netCDF file?

@Mary Haley, NCL
NCL is a widely used tool that handles both netCDF and HDF5. Mary, how are you? How does NCL deal with the case of reading both pure HDF5 files and netCDF files that use HDF5? Would you be interested in joining a community-based effort to deal with this, in case this is an issue for you?

@David Pearah, CEO HDF Group
I volunteer to participate in the effort of this RFC together with the HDF Group (and netCDF Group). Maybe we could make a "task force" between the HDF Group, the netCDF Group and any volunteers (such as tools developers that happen to be on these mail lists)?
The "task force" would have 2 tasks:
1) make an HDF5-based convention for the future, and
2) try to retroactively salvage the current design issue of netCDF.
My phone is 217-898-9356, you are welcome to call anytime.
----------------------
Pedro Vicente
pedro.vicente@space-research.org
https://twitter.com/_pedro__vicente
http://www.space-research.org/

----- Original Message -----
From: Miller, Mark C. <miller86@llnl.gov>
To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Cc: netcdfgroup@unidata.ucar.edu; Ward Fisher <wfisher@ucar.edu>
Sent: Wednesday, March 02, 2016 7:07 PM
Subject: Re: [Hdf-forum] Detecting netCDF versus HDF5

I like John's suggestion here.

But, any code you add to any applications now will work *only* for files that were produced post-adoption of this convention.

There are probably a bazillion files out there at this point that don't follow that convention and you probably still want your applications to be able to read them.

In VisIt, we support >140 format readers. Over 20 of those are different variants of HDF5 files (H5part, Xdmf, Pixie, Silo, Samrai, netCDF, Flash, Enzo, Chombo, etc., etc.) When opening a file, how does VisIt figure out which plugin to use? In particular, how do we avoid one poorly written reader plugin (which may be the wrong one for a given file) from preventing the correct one from being found. It's kinda a hard problem.

Some of our discussion is captured here...
http://www.visitusers.org/index.php?title=Database_Format_Detection

Mark

