h5diff and ignoring attributes


#1

Hi,

I work on software that uses the R package rhdf5 (https://bioconductor.org/packages/release/bioc/html/rhdf5.html) to write HDF5 files. We use h5diff to compare HDF5 files for equality in our regression tests. As of rhdf5 2.34.0, ‘rhdf5-NA.OK’ attributes are automatically added to HDF5 files written using rhdf5. The presence of this attribute causes our h5diff comparisons to fail when HDF5 files produced in an environment with rhdf5 2.34.0 or above are compared to those produced with a version of rhdf5 prior to 2.34.0 (even if all other attributes and data within the HDF5 files are identical).

It would be useful for h5diff to ignore certain attributes when comparing files. I’m aware this request has been raised before (h5diff --exclude-path for attributes, Oct 2012) with the response that “Unfortunately the h5diff doesn’t have option to ignore attribute yet. It’s in the task queue but not yet supported due to no funding source.” I was wondering whether this is still in the task queue and whether there is a plan to implement it in h5diff.

I do appreciate that in the longer term this will not be an issue as, in time, our users will all end up using rhdf5 2.34.0 or later anyway.

thanks and best wishes,
mike


#2

I believe this is in recent releases. From the develop RELEASE.txt file:

- h5diff added a command line option to ignore attributes.

    h5diff would ignore all objects with a supplied path if the exclude-path argument is used.
    Adding the exclude-attribute argument will only exclude attributes, with the supplied path,
    from comparison.

    (ADB - 2020/07/20, HDFFV-5935)

#3

Hi Byrn,
Thanks for your reply. I’ve tried h5diff 1.10.7 and can see --exclude-attribute is now supported. However, I am having trouble getting it to work. I have an HDF5 file with the following structure (I’ve excised non-relevant detail and replaced it with ...):

HDF5 "vignette/output/WTnone/WTnone.h5" {
GROUP "/" {
   GROUP "YAL068C" {
      ...
   }
   ...
   GROUP "YAL001C" {
      GROUP "vignette" {
         GROUP "reads" {
            ATTRIBUTE "buffer_left" { ... }
            ATTRIBUTE "buffer_right" { ... }
            ATTRIBUTE "lengths" { ... }
            ATTRIBUTE "reads_by_len" { ... }
            ATTRIBUTE "reads_total" { ... }
            ATTRIBUTE "start_codon_pos" { ... }
            ATTRIBUTE "stop_codon_pos" {  ... }
            DATASET "data" {
               DATA {
               (0,0): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
               (0,18): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
               ...
               (3982,34): 0, 0, 0, 0, 0, 0, 0
               }
               ATTRIBUTE "rhdf5-NA.OK" {
                  DATATYPE  H5T_STD_I32LE
                  DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
                  DATA {
                  (0): 1
                  }
               }
            }
         }
      }
   }
}
}

The version of rhdf5 used determines whether ATTRIBUTE "rhdf5-NA.OK" is present in the file. I want to ignore this attribute when comparing files created using different versions of rhdf5. I’ve tried several alternatives, but to no avail:

Ignore attribute for YAL001C group’s data only, specifying absolute path to attribute:

$ ../hdf5-1.10.7-linux-centos7-x86_64-gcc485-shared/bin/h5diff -v2 --exclude-attribute "/YAL001C/vignette/reads/data/rhdf5-NA.OK" /home/ubuntu/test-data-20210618/output/WTnone/WTnone.h5 vignette/output/WTnone/WTnone.h5 /YAL001C/vignette/reads/data

dataset: </YAL001C/vignette/reads/data> and </YAL001C/vignette/reads/data>
0 differences found
   obj1   obj2
 --------------------------------------
           x    rhdf5-NA.OK    
Attributes status:  0 common, 0 only in obj1, 1 only in obj2

Ignore attribute for YAL001C group’s data only, specifying relative path to attribute:

$ ../hdf5-1.10.7-linux-centos7-x86_64-gcc485-shared/bin/h5diff -v2 --exclude-attribute "rhdf5-NA.OK" /home/ubuntu/test-data-20210618/output/WTnone/WTnone.h5 vignette/output/WTnone/WTnone.h5 /YAL001C/vignette/reads/data

dataset: </YAL001C/vignette/reads/data> and </YAL001C/vignette/reads/data>
0 differences found
   obj1   obj2
 --------------------------------------
           x    rhdf5-NA.OK    
Attributes status:  0 common, 0 only in obj1, 1 only in obj2

Ignore attribute for YAL001C group, specifying absolute path to attribute:

$ ../hdf5-1.10.7-linux-centos7-x86_64-gcc485-shared/bin/h5diff -v2 --exclude-attribute "/YAL001C/vignette/reads/data/rhdf5-NA.OK" /home/ubuntu/test-data-20210618/output/WTnone/WTnone.h5 vignette/output/WTnone/WTnone.h5 /YAL001C
...
dataset: </YAL001C/vignette/reads/data> and </YAL001C/vignette/reads/data>
0 differences found
   obj1   obj2
 --------------------------------------
           x    rhdf5-NA.OK    
Attributes status:  0 common, 0 only in obj1, 1 only in obj2

Ignore attribute for YAL001C group, specifying relative path to attribute:

$ ../hdf5-1.10.7-linux-centos7-x86_64-gcc485-shared/bin/h5diff -v2 --exclude-attribute "rhdf5-NA.OK" /home/ubuntu/test-data-20210618/output/WTnone/WTnone.h5 vignette/output/WTnone/WTnone.h5 /YAL001C

dataset: </YAL001C/vignette/reads/data> and </YAL001C/vignette/reads/data>
0 differences found
   obj1   obj2
 --------------------------------------
           x    rhdf5-NA.OK    
Attributes status:  0 common, 0 only in obj1, 1 only in obj2

Compare all groups, specifying relative path to attribute:

$ ../hdf5-1.10.7-linux-centos7-x86_64-gcc485-shared/bin/h5diff --exclude-attribute "rhdf5-NA.OK" /home/ubuntu/test-data-20210618/output/WTnone/WTnone.h5 vignette/output/WTnone/WTnone.h5
$ echo $?
1
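
For context on the exit status above: as far as I understand the h5diff help text, it exits with 0 when no differences are found, 1 when differences are found, and a value greater than 1 on error. A minimal sketch of how a regression test might interpret that status follows; the function name `describe_h5diff_status` is hypothetical and used only for illustration.

```python
def describe_h5diff_status(returncode: int) -> str:
    """Map an h5diff exit status to a short label.

    Assumes the documented convention: 0 = no differences,
    1 = differences found, anything above 1 = error.
    """
    if returncode == 0:
        return "identical"
    if returncode == 1:
        return "different"
    return "error"
```

With this convention, the status of 1 shown above means h5diff still counted the extra attribute as a difference.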

Any hints would be most welcome, and apologies if I’ve missed or misinterpreted something.
thanks and best wishes,
mike


#4

Try the absolute path to the object that contains the attribute, maybe.

So “/YAL001C/vignette/reads/data”

Allen


#5

Hi Allen,
Thanks, that worked for the specific YAL001C entry:

$ ../hdf5-1.10.7-linux-centos7-x86_64-gcc485-shared/bin/h5diff -v2 --exclude-attribute /YAL001C/vignette/reads/data /home/ubuntu/test-data-20210618/output/WTnone/WTnone.h5 vignette/output/WTnone/WTnone.h5 /YAL001C/vignette/reads/data

dataset: </YAL001C/vignette/reads/data> and </YAL001C/vignette/reads/data>
0 differences found

OK

$ ../hdf5-1.10.7-linux-centos7-x86_64-gcc485-shared/bin/h5diff -v2 --exclude-attribute /YAL001C/vignette/reads/data /home/ubuntu/test-data-20210618/output/WTnone/WTnone.h5 vignette/output/WTnone/WTnone.h5 /YAL001C
dataset: </YAL001C/vignette/reads/data> and </YAL001C/vignette/reads/data>
0 differences found

OK.
The HDF5 file has that attribute under every top-level group. I’d like to avoid having to specify an --exclude-attribute parameter for every group name in the file. Are relative paths or wildcards supported in the --exclude-attribute parameter? I tried a relative path, vignette/reads/data, and also /*/vignette/reads/data, but with no success.
thanks,
mike


#6

Looking at the code, I would say no, only absolute paths are supported. However, the logic seems similar to the other exclude option, which I think allows relative paths.

Maybe an enhancement issue is needed?

Allen


#7

That enhancement would be useful, thanks.
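
Until such an enhancement exists, one possible workaround is to generate one --exclude-attribute argument per group from a script. The sketch below is an illustration under several assumptions, not a tested recipe: it assumes the option can be repeated on one command line (the h5diff help text suggests it can), that `h5ls FILE` prints one `NAME  Group` line per top-level group, and that every dataset of interest lives at `/<group>/vignette/reads/data` as in the dump above. The helper names `top_level_groups`, `build_h5diff_command` and `files_match` are hypothetical.

```python
import subprocess
from typing import List


def top_level_groups(h5ls_output: str) -> List[str]:
    """Extract group names from the text printed by `h5ls FILE`.

    Assumes lines of the form "YAL001C                  Group".
    """
    names = []
    for line in h5ls_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == "Group":
            names.append(parts[0])
    return names


def build_h5diff_command(h5diff: str, file1: str, file2: str,
                         groups: List[str],
                         suffix: str = "/vignette/reads/data") -> List[str]:
    """Build an h5diff argv with one --exclude-attribute per group."""
    cmd = [h5diff]
    for name in groups:
        # Absolute path to the object that carries the attribute.
        cmd += ["--exclude-attribute", f"/{name}{suffix}"]
    cmd += [file1, file2]
    return cmd


def files_match(h5diff: str, file1: str, file2: str) -> bool:
    """Run h5ls and h5diff; True when h5diff exits with status 0."""
    listing = subprocess.run(["h5ls", file2], capture_output=True,
                             text=True, check=True).stdout
    cmd = build_h5diff_command(h5diff, file1, file2,
                               top_level_groups(listing))
    return subprocess.run(cmd).returncode == 0
```

Two caveats: judging by the output in #5, excluding an object path appears to skip every attribute on that object, not only rhdf5-NA.OK, and a file with thousands of groups could produce a very long command line.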