h5diff and compound types

I was trying to use h5diff to compare results between two hardware platforms and was quickly reminded that the results will always differ: we store the latency of the data, which is of course different between the two systems.

Nearly all of the datasets I'm comparing are compound types, and I'd like to ignore the "latency" field when comparing them, but I don't see any way to do this with h5diff. What's more, I haven't been able to get h5diff to print the field name in its difference output, so I can't be sure the differences are all in the latency field without hand-inspecting every value (there are thousands).

Is there a more efficient way to diff HDF5 files while ignoring one or more fields of a compound type?
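One workaround outside h5diff is to compare the records field by field in a script and skip the fields you don't care about. The following is only a minimal sketch under stated assumptions: the function name `diff_records` and the field names `id` and `value` are hypothetical, and the rows are plain dicts standing in for the real data. Because it only requires that each row can be indexed by field name, the same function should also work on the NumPy structured arrays that h5py returns for compound datasets.

```python
import math

def diff_records(rows_a, rows_b, fields, ignore=(), rel_tol=0.0):
    """Yield (row_index, field, a_value, b_value) for every differing field.

    Rows only need to support indexing by field name (dicts here).
    Fields listed in `ignore` are never compared.
    Note: zip() stops at the shorter input, so compare lengths separately.
    """
    skipped = set(ignore)
    for index, (rec_a, rec_b) in enumerate(zip(rows_a, rows_b)):
        for name in fields:
            if name in skipped:
                continue
            a, b = rec_a[name], rec_b[name]
            if isinstance(a, float) and isinstance(b, float):
                # rel_tol=0.0 means exact equality; raise it to allow noise.
                same = math.isclose(a, b, rel_tol=rel_tol)
            else:
                same = a == b
            if not same:
                yield index, name, a, b

# Hypothetical records from the two platforms; only "latency" comes from the thread.
platform_a = [{"id": 1, "value": 2.5, "latency": -198.484},
              {"id": 2, "value": 3.5, "latency": -120.0}]
platform_b = [{"id": 1, "value": 2.5, "latency": -136.111},
              {"id": 2, "value": 4.0, "latency": -121.5}]
fields = ("id", "value", "latency")

diffs = list(diff_records(platform_a, platform_b, fields, ignore=("latency",)))
for row, name, a, b in diffs:
    print(row, name, a, b)
```

With "latency" ignored, only the genuine mismatch in the second row is reported, and each reported difference carries its field name.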

Given the lack of response, I've been hacking h5diff to print compound type field labels. I plan to share patches in the hope that the changes will be integrated into the distribution.

That said, is there a preference for where the field name goes? I've currently added it in the function that prints the element position in the dataset, so the output looks like:

[ 1 0 ]latency -198.484 -136.111 62.3722

but I could add it to the end instead (yes, I realize the format above needs a bit of work). Adding a field label to the end is slightly more difficult, but not impossible.
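To make the two candidate layouts easier to compare side by side, here is a small illustrative sketch. It is not h5diff code: `format_diff` and its `label_at_end` parameter are hypothetical, and the number formatting is only an approximation of what the tool prints.

```python
def format_diff(position, member, v1, v2, label_at_end=False):
    """Format one difference line with the member name in either position."""
    pos = "[ " + " ".join(str(i) for i in position) + " ]"
    # Columns: first value, second value, absolute difference.
    values = f"{v1:g} {v2:g} {abs(v1 - v2):g}"
    if label_at_end:
        return f"{pos} {values} {member}"
    return f"{pos} {member} {values}"

# Label right after the element position (the layout currently implemented).
print(format_diff((1, 0), "latency", -198.484, -136.111))
# Label appended at the end of the line (the alternative).
print(format_diff((1, 0), "latency", -198.484, -136.111, label_at_end=True))
```

Putting the label at the end keeps the numeric columns where existing scripts expect them, while putting it next to the position keeps the "where" information together.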

Any comments from the HDF team? Quincy? Peter? Barbara?


On 08/26/2013 03:55 PM, John K wrote:

Is there a more efficient way to diff HDF5 files while ignoring one or more fields of a compound type?

Here's a patch that adds member names to h5diff when compound types are being differenced. This patch is for 1.8.9 (we have not yet upgraded).

In detail:
1) A member name is added to the mcomp_t type defined in tools/lib/h5diff.h.
2) The format strings in tools/lib/h5diff_array.c now include a slot for the member name.
3) Several functions in that file have been modified to pass the mcomp_t structure around.
4) Calls to those functions have been updated accordingly.
5) Uses of the format strings (parallel_print, etc.) now pass either the member name (in diff_datum) or an empty string (everywhere else).
6) The new mcomp_t field is freed in close_member_types().
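The general idea behind the steps above, carrying the member description down through the comparison so the name is available at the point where a difference is printed, can be illustrated with a loose analogy. This is not the patch and not the h5diff data structures: `Member`, `diff_members`, and the record layout are hypothetical, and nested dicts stand in for nested compound values.

```python
from dataclasses import dataclass, field

@dataclass
class Member:
    """Hypothetical stand-in for one compound member: a name plus nested members."""
    name: str
    members: list = field(default_factory=list)

def diff_members(v1, v2, member, path=""):
    """Recursively compare two values, yielding (member_path, v1, v2) on mismatch."""
    label = f"{path}.{member.name}" if path else member.name
    if member.members:
        # Compound member: recurse, passing the accumulated name down.
        for child in member.members:
            yield from diff_members(v1[child.name], v2[child.name], child, label)
    elif v1 != v2:
        # Leaf member: the name is at hand exactly where the difference is found.
        yield label, v1, v2

# Hypothetical layout: a record with a nested "timing" compound.
layout = Member("", [Member("id"),
                     Member("timing", [Member("start"), Member("latency")])])
rec_a = {"id": 7, "timing": {"start": 10, "latency": -198.484}}
rec_b = {"id": 7, "timing": {"start": 10, "latency": -136.111}}

found = list(diff_members(rec_a, rec_b, layout))
for name, a, b in found:
    print(name, a, b)
```

The dotted path is just one possible labeling choice for nested members; the thread does not say how the patch names members of nested compounds.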

You should be able to patch your HDF5 1.8.9 source using GNU patch like so:
cd hdf5-1.8.9
patch -p1 <h5diff-member-names.patch

It may even work on 1.8.10 or 1.8.11.

Of course I'll feel slightly foolish if someone tells me the reason there's been no interest so far is that this has already been addressed in one of those later versions :)

h5diff-member-names.patch (127 KB)