requested feedback on the usage of HDF5 dimension scales and CF metadata conventions (fwd)

hello HDFers

this has been an interesting discussion about caching, but I am going to change the subject a little with some information regarding the newly released HDF5 Dimension Scales (DS) API that I thought might be useful for everybody.

We received some DS usage questions from users from the Royal Netherlands Meteorological Institute (KNMI), which I reproduce below, with their kind permission.

I prepared a follow-up that I will send soon; this email has the original question.

Pedro

SCI_IMLM_lv2_20030911_08001_2741_v6.4.h5 (166 KB)

test_ds6.h5 (144 KB)


Date: Fri, 21 Mar 2008 15:54:37 -0500
To: R.M.van.Hees@sron.nl
From: Pedro Vicente Nunes <pvn@hdfgroup.org>
Subject: Re: requested feedback on the usage of HDF5 dimension scales and CF metadata conventions (fwd)
Cc: olgaw@ucar.edu, ben@unidata.ucar.edu, vegtevd@knmi.nl, HDF Helpdesk <help@hdfgroup.org>

Dear Richard and others

Barbara forwarded me your questions, since I implemented the Dimension Scales (DS) API.

Taking a look at your binary file, I would guess that lat_bnds and lon_bnds contain 4 sets of 1826 latitude or longitude values.

We did not foresee this case of having several lat/lon sets in one dataset. That would of course be possible to do in the future, but in the meantime, if my assumption is right that those values are indeed 4 sets of lat and lon, you will have to modify, if possible, the way you save these datasets.

For your program to "work" correctly with our DSs, you will have to write each of these 4 sets to a different dataset (that is, 4 datasets for lat and 4 for lon), and then "link" each one of them to your data dataset, along the lines of the sketch below.
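A rough sketch of what I mean, in C (not tested against your file; the dataset names "lat_bnds_0" .. "lat_bnds_3" and "data" are placeholders, lat_bnds is the 4 x 1826 corner buffer in memory, fid is an open file identifier, and I use the 1.6-style H5Dopen):

/* write each of the 4 corner sets to its own 1D dataset and attach
   each one as a scale to dimension 0 of the data dataset;
   error checks omitted for brevity */
hsize_t dims[1] = {1826};
char    name[16];
int     i;
hid_t   did = H5Dopen(fid, "data");
for(i = 0; i < 4; i++)
{
    sprintf(name, "lat_bnds_%d", i);
    H5LTmake_dataset_float(fid, name, 1, dims, &lat_bnds[i * 1826]);
    hid_t dsid = H5Dopen(fid, name);
    H5DSattach_scale(did, dsid, 0);
    H5Dclose(dsid);
}
H5Dclose(did);

The same goes for the 4 lon_bnds sets, with a different name in the sprintf.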

You are right that there is no user's manual for this. I was hoping to write such a guide in the near future, but this was delayed for a couple of months. It will be done, though.

In the meantime we only have the reference manual

http://www.hdfgroup.org/HDF5/doc/HL/RM_H5DS.html

and a test program for the API in the hdf5 distribution under

/hl/test/test_ds.c

by looking at this C program you'll probably get a good idea of the "correct" way of using it.

Amongst other things, the program reads and writes realistic lat and lon data from the North Atlantic (I used this data from an earlier position I had at a numerical hydrodynamics institute, IST, in Portugal).

Also, do download the HDF5 visualization program called HDF Explorer (if you happen to use Windows)

http://www.space-research.org/

go to the left menu and click Download

it handles dimension scales; you can add several and choose the ones you want displayed on the map

here's a snapshot, and the HDF5 file that the test program produces with the Atlantic data is attached

let us know if you have more questions and thanks for using the DS API

what are the CF metadata conventions?

Pedro


---------- Forwarded message ----------
Date: Thu, 20 Mar 2008 12:10:03 +0100
From: Richard van Hees <R.M.van.Hees@sron.nl>
To: help@hdfgroup.org, olgaw@ucar.edu, ben@unidata.ucar.edu
Cc: John van de Vegte <vegtevd@knmi.nl>
Subject: requested feedback on the usage of HDF5 dimension scales and CF metadata conventions

Dear Olga, Ben and HDF5 helpdesk,

For the ADAGUC project, I am trying to convert satellite observations written in various formats to a structured HDF5 file using the CF metadata conventions. In addition, I use the newly released HDF5 dimension scale API. Eventually, the goal is to use these HDF5 datasets as input for software that converts them to different formats (netCDF, GeoTIFF, KML, etc.) for users and applications that are not familiar with HDF5.

What I kindly ask you is to take a look at a small dataset with satellite data: Sciamachy level 2 retrievals of CH4, CO and H2O. The main dimension of the data is time, which is associated with spatial locations, given by (lon,lat), and bounds with 4-sided cells. This is altogether quite a complicated case, but typical for satellite observations.

Dear HDF5 helpdesk, I have tried to use the dimension scale API; however, due to the lack of documentation, I am not sure that I am using the API correctly. For example, would you suggest using lat_bnds and lon_bnds as dimension scales attached to the datasets lat and lon?

Any remark is welcome.

Best regards,

Richard van Hees (SRON, the Netherlands)

--------------------------------------------------------------
Pedro

hi, Richard and others

let's go back to my original example that produces the file "test_ds6.h5"
in /hl/test/test_ds.c (file is attached)

It illustrates a "typical" use of the Dimension Scales API for this case. I had a set of bathymetry values for the North Atlantic (a 2D set of values) that were used in a numerical hydrodynamic model *with variable grid size*. This means that in the grid used, the squares that represent each cell are not all equal; some are bigger than others.

If you make a map in HDF Explorer by right-clicking on the dataset named "data" and choosing Discrete Map View, and then go to the Options menu, choose Map, and enable the check box that says "Grid", you will see a map with a grid in which the squares are different.

This approach is commonly used in numerical models, where we want a finer resolution in some areas but not in others, because having a smaller grid for the whole domain might be computationally expensive.

So, anyway, this is a perfect example of the use of HDF5 Dimension Scales. I happened to have the spatial information along the X and Y axes (the longitude and latitude values) in ASCII files.

So, what that program does is read all these ASCII files (data, lat and lon) into memory and convert them to a DS HDF5 file.

This is done at line 2691 of the file /hl/test/test_ds.c; for example, for the latitude:

if(read_data("dslat.txt",1,latdims,&latbuf) < 0)
  goto out;

then I make 3 HDF5 datasets (one with the lat values, a 1D array; another with the lon values, a 1D array; and another with my data values, the bathymetry, a 2D array)

for example, for lat

if(H5LTmake_dataset_float(fid, "lat", 1, latdims, latbuf) < 0)
  goto out;

note that in HDF Explorer you see the "land" values (of Western Europe) in grey. That means the data dataset was saved with a "fill value" for the land values, -99 in this case. The practical effect is that these values are not taken into account when drawing the legend and the map (in the legend, you will see that the "sea" values range from 20 to 5500 meters); the -99 land values are simply ignored.

This dataset is made at line 2719 of test_ds.c. You will notice that this time we did not use the High-Level API; the HL API does not let you set a fill value, so I used the basic API, roughly as in the sketch below.
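Condensed, that part of test_ds.c amounts to something like this (a sketch, not the verbatim code; dims and databuf stand in for the real 2D size and buffer, I use the 1.6-style H5Dcreate, and error checks are omitted):

/* create "data" with a fill value of -99 for the land cells */
float fill_value = -99;
hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_fill_value(dcpl, H5T_NATIVE_FLOAT, &fill_value);
hid_t sid = H5Screate_simple(2, dims, NULL);
hid_t did = H5Dcreate(fid, "data", H5T_NATIVE_FLOAT, sid, dcpl);
H5Dwrite(did, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, databuf);
H5Dclose(did);
H5Sclose(sid);
H5Pclose(dcpl);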

Then comes the part you will have to do for your datasets. You open the data dataset and get an identifier (the did variable here), open the latitude dataset and get its identifier (the dsid variable here), and "link" the two with this call

if(H5DSattach_scale(did, dsid, DIM0) < 0)
  goto out;

what this function does is associate the dataset dsid (latitude) with the *dimension* specified by the parameter DIM0 (0 in this case, the first dimension of the 2D array) of the dataset did

If you open HDF Explorer and expand the attributes of the "data" dataset, you will see an attribute called DIMENSION_LIST.
This attribute is written by that function. It is an array that contains 2 HDF5 references, one to the latitude dataset and one to the longitude dataset.

If you expand the "lat" dataset, you will see that it contains an attribute called REFERENCE_LIST. It is a compound type that contains
1) a reference to my "data" dataset
2) the index of the dimension of the data dataset that this scale is associated with (0 for lat, 1 for lon)

this is also written by this function.
So, basically, you just need to call H5DSattach_scale to link your scales with your data datasets; a complete sketch follows.
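For the bathymetry file, the whole linking step then reduces to a few calls (a sketch, error checks omitted; "data", "lat" and "lon" are the dataset names in test_ds6.h5):

/* attach the lat and lon scales to the two dimensions of "data" */
hid_t did = H5Dopen(fid, "data");    /* the 2D bathymetry  */
hid_t lat = H5Dopen(fid, "lat");     /* 1D latitude scale  */
hid_t lon = H5Dopen(fid, "lon");     /* 1D longitude scale */
H5DSattach_scale(did, lat, 0);       /* lat -> dimension 0 */
H5DSattach_scale(did, lon, 1);       /* lon -> dimension 1 */
H5Dclose(lon);
H5Dclose(lat);
H5Dclose(did);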

in HDF Explorer, if you open the "select dimension scale" dialog, there are options to select the scales for any of the dimensions you have in your dataset (you can switch lat and lon, for example)

So, now to your file

I see that the dataset "lat" has the DIMENSION_LIST attribute. It should not have one, so I suspect it was somehow "incorrectly" linked, the reverse of what I described above: the data datasets get DIMENSION_LIST, while the scales get REFERENCE_LIST.
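If you want to check this in code rather than in a viewer, something along these lines would do it (a sketch; H5DSis_scale is part of the DS API, and H5Aexists needs HDF5 1.8):

/* sanity-check which way "lat" is linked */
hid_t lat = H5Dopen(fid, "lat");
if(H5DSis_scale(lat) > 0)                  /* was it converted to a scale? */
    printf("lat is a dimension scale\n");
if(H5Aexists(lat, "DIMENSION_LIST") > 0)   /* a scale should not carry this */
    printf("warning: lat has a DIMENSION_LIST attribute\n");
H5Dclose(lat);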

I also see that your data (CH4, H2O, and so on) is 1D and that it contains 1827 values (the values themselves seem very large, but my knowledge of CH4 concentration units is a bit rusty at this time :-) )

So maybe you will want to link these datasets like this (where ch4_did and lat_did stand for the open identifiers of the CH4 and lat datasets):

H5DSattach_scale(ch4_did, lat_did, 0);

Also, your file exposed a bug in HDF Explorer (the dimension scale handling was not checking for the 1D dataset case); I uploaded a new version to the web site. I still get some strange values for the array indices, though: for example, for CH4 I get "1349.04" for all the indices, maybe a side effect of the DS association you made in your file, or just another bug in HDF Explorer.

I don't quite understand your use of the corner coordinates of the grid cells in the lat_bnds dataset versus the center coordinates of the grid cells in the variables lat and lon, and I don't have any suggestion for how "time" could be used here, but I think you now have a better idea of how to use our DS API.

Regarding HDFView, I am not sure if there are plans to support Dimension Scales.

The netCDF folks might have some suggestions on the use of the "time" variable here, since they use it regularly. By the way, netCDF4 also uses the HDF5 Dimension Scales API underneath; see the sketch below.
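For instance, a coordinate variable written through the netCDF-4 C API is stored underneath as an HDF5 dataset marked as a dimension scale and attached to the variables that use that dimension (a sketch; the file and variable names are illustrative):

/* netCDF-4 sketch: "time" becomes an HDF5 dimension scale under the hood */
int ncid, time_dim, time_var, ch4_var;
nc_create("sci.nc", NC_NETCDF4, &ncid);
nc_def_dim(ncid, "time", 1827, &time_dim);
nc_def_var(ncid, "time", NC_DOUBLE, 1, &time_dim, &time_var);
nc_def_var(ncid, "CH4", NC_FLOAT, 1, &time_dim, &ch4_var);
nc_close(ncid);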

some links

HDF Explorer

http://www.space-research.org/

Dimension Scales

http://www.hdfgroup.org/HDF5/doc/HL/

this last page is being reworked to include the specification documents for Dimension Scales (and also for the Table and Image APIs); check it again soon

Pedro

a snapshot of HDF Explorer showing the "select dimension scale" dialog, and the variable grid size on the map


test_ds6.h5 (144 KB)


At 04:03 AM 3/26/2008, Richard van Hees wrote:

Dear Pedro,

Many thanks for your quick response.

The CF metadata conventions are a reference document for netCDF usage in the climate and forecast community (http://cf-pcmdi.llnl.gov/).

As you guessed correctly, the variables lat_bnds and lon_bnds are the corner coordinates of the grid cells, while the variables lat and lon contain the center coordinates of the grid cells. From your answer, I understand that I have to make time, lat and lon dimension scales of the measured species (like H2O, CO and CH4). I am not sure that I should define four different datasets for the corners of the grid cells (quite an ugly solution). Currently, I am trying to figure out how the netCDF4 library solves this case.

Thank you for pointing me to HDF Explorer. Unfortunately, it only works under Windows, while my main platform is Linux. I have access to a Windows machine, and the program works nicely with your product. I will use this program to check my products. Will HDFView also be upgraded to work with DS?

Again many thanks for your help.

Best regards,

Richard

--------------------------------------------------------------
Pedro