H5diff considers datasets with only fillvalues as empty

Hi,

I use h5diff a lot in testing our software. When I compare the outputs of the software, I would like to know if some data is filled in when compared to a reference output product. Unfortunately, h5diff does not compare two datasets if one of them contains only fillvalues.

As a possible example case, if a field in the reference product is not filled and only contains fillvalues, I would like to make sure that the output of the software also contains only fillvalues. Currently h5diff skips the comparison because it considers the dataset in the reference product empty.

I wanted to post here to ask whether this is a bug or a feature of h5diff. I have found this behaviour to be consistent at least in h5diff versions 1.8 to 1.10. If it is a feature, could an extra command line option be added that would force the comparison of datasets that contain only fillvalues as well?

I found the following document about the h5diff behaviour. It does not seem to explicitly address when datasets are considered empty.

I tried to detail the behaviour in the following example.

Create a test file test.nc that has a single dataset called ‘variable’ and leave it as FILLVALUE. Create a file test2.nc that is identical, except the dataset ‘variable’ contains data.

h5diff does not compare the datasets because one of them is considered empty. I consider this incorrect behaviour: the datasets are not empty and they hold different values.

        ```sh
        $ h5diff -c test.nc test2.nc
        Not comparable: </dimension1> or </dimension1> is an empty dataset
        Not comparable: </variable> or </variable> is an empty dataset
        ```

        OUTPUT from h5dump -d /variable test.nc
        
        ```sh
        $ h5dump -d /variable test.nc
        HDF5 "test.nc" {
        DATASET "/variable" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
            DATA {
            (0): -2147483647
            }
            ATTRIBUTE "DIMENSION_LIST" {
                DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
                DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
                DATA {
                (0): (DATASET 331 /dimension1 )
                }
            }
            }
        }
        ```

        OUTPUT from h5dump -d /variable test2.nc

        ```sh
        $ h5dump -d /variable test2.nc
        HDF5 "test2.nc" {
        DATASET "/variable" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
            DATA {
            (0): 1
            }
            ATTRIBUTE "DIMENSION_LIST" {
                DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
                DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
                DATA {
                (0): (DATASET 331 /dimension1 )
                }
            }
            }
        }
        ```

The example files can be generated with the following code. It was compiled with gcc main.cpp -Wall -std=c++11 -O3 -lnetcdf -lcurl -lstdc++ -o main on Ubuntu 18.04 LTS on WSL.

```cpp
#include <cstdio>
#include <vector>

#include "netcdf.h"

// Print the netCDF error message and source line if the last call failed
#define ERR {if(err!=NC_NOERR)printf("Error at line %d: %s\n",__LINE__,nc_strerror(err));}

int main()
{
    int err, file_id, dim1_id, variable_id;

    // Create test.nc with one dimension and a single variable that is never written,
    // so it only holds the fill value
    err = nc_create("test.nc", NC_CLOBBER | NC_NETCDF4, &file_id); ERR
    err = nc_def_dim(file_id, "dimension1", 1, &dim1_id); ERR
    std::vector<int> dim_ids {dim1_id};
    err = nc_def_var(file_id, "variable", NC_INT, dim_ids.size(), dim_ids.data(), &variable_id); ERR

    // Close the file
    err = nc_close(file_id); ERR

    // Create test2.nc with the same layout, but write a value to the variable
    err = nc_create("test2.nc", NC_CLOBBER | NC_NETCDF4, &file_id); ERR
    err = nc_def_dim(file_id, "dimension1", 1, &dim1_id); ERR
    std::vector<int> dim_ids2 {dim1_id};
    err = nc_def_var(file_id, "variable", NC_INT, dim_ids2.size(), dim_ids2.data(), &variable_id); ERR
    int data = 1;
    err = nc_put_var(file_id, variable_id, &data); ERR

    // Close the file
    err = nc_close(file_id); ERR

    return 0;
}
```
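
For what it is worth, the check I am after could be sketched outside h5diff roughly as follows. This is only an illustration and not how h5diff works internally: the helper names `only_fill_values` and `count_differences` are hypothetical, and the fill value is assumed to be the netCDF default for NC_INT (-2147483647, the value h5dump prints for test.nc above). A file that sets its own _FillValue attribute would need that value passed in instead.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: default netCDF fill value for NC_INT (NC_FILL_INT in netcdf.h),
// which matches the value h5dump reports for /variable in test.nc.
constexpr std::int32_t kAssumedFillInt = -2147483647;

// Hypothetical helper: true if every element equals the fill value.
// Note that a zero-length vector also returns true.
bool only_fill_values(const std::vector<std::int32_t>& values,
                      std::int32_t fill = kAssumedFillInt)
{
    return std::all_of(values.begin(), values.end(),
                       [fill](std::int32_t v) { return v == fill; });
}

// Hypothetical helper: count differing elements, treating fill values as
// ordinary data. Elements present in only one of the inputs count as differences.
std::size_t count_differences(const std::vector<std::int32_t>& reference,
                              const std::vector<std::int32_t>& output)
{
    const std::size_t common = std::min(reference.size(), output.size());
    std::size_t differences = std::max(reference.size(), output.size()) - common;
    for (std::size_t i = 0; i < common; ++i) {
        if (reference[i] != output[i]) {
            ++differences;
        }
    }
    return differences;
}
```

With the two example files, the values of ‘variable’ could be read into vectors (for instance with nc_get_var_int) and passed to these helpers. I would expect one difference to be reported for test.nc against test2.nc, instead of the comparison being skipped.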

Cheers

  • Miika