Hi,
I use h5diff a lot when testing our software. When I compare the software's outputs against a reference output product, I would like to know whether any data has been filled in. Unfortunately, h5diff does not compare two datasets if one of them contains only fill values.
For example, if a field in the reference product is not filled and contains only fill values, I would like to make sure that the software's output also contains only fill values. Currently h5diff skips the comparison because it considers the reference dataset empty.
Is this a bug or a feature of h5diff? I've found this behaviour to be consistent at least in h5diff versions 1.8 through 1.10. If it is a feature, could an extra command line option be added that forces comparison of fields containing only fill values as well?
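To make the request concrete, here is a rough sketch of the comparison semantics I have in mind. This is only an illustration, not how h5diff is implemented, and the function name `differing_positions` is my own. The idea is that fill values are treated as ordinary data, so a dataset holding only fill values is still compared element by element instead of being skipped as "empty":

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch (not h5diff code): compare two int32 datasets element
// by element, treating fill values like any other value, and return the
// positions at which they differ.
std::vector<std::size_t> differing_positions(const std::vector<std::int32_t>& reference,
                                             const std::vector<std::int32_t>& output)
{
    // Both datasets are assumed to have the same dataspace (shape).
    assert(reference.size() == output.size());
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        // A fill value in one file and real data in the other counts as a difference.
        if (reference[i] != output[i]) {
            positions.push_back(i);
        }
    }
    return positions;
}
```

With the example files below, comparing `/variable` this way would report one difference at position 0 (the fill value -2147483647 versus 1), and two all-fill datasets would compare as equal rather than "Not comparable".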
I found the following document about h5diff's behaviour, but it does not seem to explicitly address when a dataset is considered empty.
I tried to detail the behaviour in the following example.
Create a test file test.nc that has a single dataset called ‘variable’ and leave it filled with the fill value. Create a second file, test2.nc, that is identical except that the dataset ‘variable’ contains data.
h5diff does not compare the datasets because one of them is considered empty. I consider this incorrect behaviour: the datasets are not empty, and they contain different values.
```sh
$ h5diff -c test.nc test2.nc
Not comparable: </dimension1> or </dimension1> is an empty dataset
Not comparable: </variable> or </variable> is an empty dataset
```
Output from `h5dump -d /variable test.nc`:
```sh
$ h5dump -d /variable test.nc
HDF5 "test.nc" {
DATASET "/variable" {
DATATYPE H5T_STD_I32LE
DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
DATA {
(0): -2147483647
}
ATTRIBUTE "DIMENSION_LIST" {
DATATYPE H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
DATA {
(0): (DATASET 331 /dimension1 )
}
}
}
}
```
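For reference, the value -2147483647 in test.nc is netCDF's default fill value for `NC_INT` (`NC_FILL_INT` in netcdf.h), which is one more than the most negative 32-bit integer. A small compile-time check of that, with the constant copied by hand so it builds without netcdf.h (the names `kNcFillInt` and `is_default_int_fill` are my own, not part of netCDF):

```cpp
#include <cstdint>
#include <limits>

// Copied by hand from netcdf.h (NC_FILL_INT); hypothetical name.
constexpr std::int32_t kNcFillInt = -2147483647;

// The default int fill value sits one above the most negative 32-bit integer.
static_assert(kNcFillInt == std::numeric_limits<std::int32_t>::min() + 1,
              "NC_FILL_INT is INT32_MIN + 1");

// True if a value read back from an NC_INT variable equals the default fill value.
constexpr bool is_default_int_fill(std::int32_t value)
{
    return value == kNcFillInt;
}
```

This assumes the variable uses netCDF's default fill value; a variable with a custom `_FillValue` attribute would need that value instead.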
Output from `h5dump -d /variable test2.nc`:
```sh
$ h5dump -d /variable test2.nc
HDF5 "test2.nc" {
DATASET "/variable" {
DATATYPE H5T_STD_I32LE
DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
DATA {
(0): 1
}
ATTRIBUTE "DIMENSION_LIST" {
DATATYPE H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
DATA {
(0): (DATASET 331 /dimension1 )
}
}
}
}
```
The example files can be generated with the following code, compiled with `gcc main.cpp -Wall -std=c++11 -O3 -lnetcdf -lcurl -lstdc++ -o main` on Ubuntu 18.04 LTS under WSL:
```cpp
#include <cstdio>
#include <vector>
#include "netcdf.h"

// Print the netCDF error message (with the source line) if the last call failed
#define ERR {if(err!=NC_NOERR)printf("Error at line %d: %s\n",__LINE__,nc_strerror(err));}

int main()
{
    int err, file_id, dim1_id, variable_id;

    // test.nc: one dimension and a single variable; no data is written,
    // so the variable holds only the default fill value
    err = nc_create("test.nc", NC_CLOBBER | NC_NETCDF4, &file_id); ERR
    err = nc_def_dim(file_id, "dimension1", 1, &dim1_id); ERR
    std::vector<int> dim_ids {dim1_id};
    err = nc_def_var(file_id, "variable", NC_INT, static_cast<int>(dim_ids.size()), dim_ids.data(), &variable_id); ERR
    // Close the file
    err = nc_close(file_id); ERR

    // test2.nc: identical layout, but the variable contains data
    err = nc_create("test2.nc", NC_CLOBBER | NC_NETCDF4, &file_id); ERR
    err = nc_def_dim(file_id, "dimension1", 1, &dim1_id); ERR
    std::vector<int> dim_ids2 {dim1_id};
    err = nc_def_var(file_id, "variable", NC_INT, static_cast<int>(dim_ids2.size()), dim_ids2.data(), &variable_id); ERR
    int data = 1;
    err = nc_put_var(file_id, variable_id, &data); ERR
    // Close the file
    err = nc_close(file_id); ERR
    return 0;
}
```
Cheers
- Miika