Hello
h5diff seems to have slowed down by 2 orders of magnitude from 1.6.5
to 1.8.1. To wit:
[dws@ice empic]$ time h5diff-1.6.5 unideltaf3p_electrons_1.h5 /ice/httpd/html/wwwice/vpresults-ice-opt/empic/unideltaf3p/unideltaf3p_electrons_1.h5 /electrons -r
0 differences found
real 0m0.512s
user 0m0.101s
sys 0m0.411s
[dws@ice empic]$ time h5diff-1.8.1 -r unideltaf3p_electrons_1.h5 /ice/httpd/html/wwwice/vpresults-ice-opt/empic/unideltaf3p/unideltaf3p_electrons_1.h5 /electrons
real 0m47.461s
user 0m47.270s
sys 0m0.183s
I just tried 1.8.2 and it's comparable to 1.8.1.
It's pretty much unusable for us now, as a test suite which used to
take about an hour now takes 2.5 hours due to the time increase for
h5diff'ing.
Thanks,
Dave
What made h5diff slower between those 2 versions (1.6.5 and 1.8.1) was the introduction of Not a Number (NaN) detection, or more specifically the way we deal with this detection.
We are requesting comments on a new h5diff option that would enable (or disable) NaN detection. Below I explain (1) how h5diff now handles NaN and (2) the new option.
1) Detection of NaN
Consider a floating point number X.
Currently, NaN is detected (in some cases) by converting X to a string (using the C library function sprintf, computationally a very slow operation) and then doing a string comparison between this string and each of the possible string representations of NaN. The NaN string representation varies between operating systems and can be, for example, "NAN", "nan", "1.#SNAN", etc.
The expression "X != X" is false for every finite or infinite number X and true only if X is NaN. So we could theoretically use this expression to detect NaN.
However, on some platforms this expression does not evaluate to true for NaN, so in these cases we have to use the string approach.
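To make the self-comparison test concrete, here is a minimal sketch in C. The function names are mine, not taken from the h5diff source, and the second variant assumes a C99 compiler that provides isnan() in <math.h>, which may not hold for every platform discussed here.

```c
#include <math.h>
#include <stdbool.h>

/* Self-comparison test: under IEEE 754 arithmetic, NaN is the only
 * value for which x != x is true. Hypothetical helper name. */
static bool is_nan_self_compare(double x)
{
    return x != x;
}

/* Alternative that assumes C99: the isnan() macro from <math.h>. */
static bool is_nan_c99(double x)
{
    return isnan(x) != 0;
}
```

Both helpers return false for ordinary numbers and for infinities. Note that aggressive floating-point optimization settings can let a compiler fold x != x to false, which is one way the test can fail to detect NaN.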
Here's how the code works.
First, we test whether X is NaN or not:
retval = (X != X);
if (retval == FALSE)
{
    /* call sprintf and compare the resulting string with the NaN strings */
}
For a "regular" number, retval evaluates to FALSE, so sprintf is always called. So in the worst case, a file with no NaNs at all (like your file), sprintf is called for every element.
If X is a NaN on a platform where that expression correctly detects NaN (Linux, for example), retval evaluates to TRUE and sprintf is *not* called. So if your file contained only NaNs, you would not see that slowdown of 2 orders of magnitude.
On some platforms (Windows, for example) that expression does not correctly detect NaN, and retval evaluates to FALSE when X is a NaN. In that case sprintf is always called too.
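The two-step logic described above can be sketched roughly as follows. This is an illustration only, not the h5diff code: the helper names, the counter, the "%g" format and the list of NaN spellings are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Counts how often the slow string path runs (illustration only). */
static unsigned long slow_path_calls = 0;

/* Slow path: format x as text and look for a NaN spelling.
 * The spellings checked here are an illustrative, incomplete list. */
static bool is_nan_by_string(double x)
{
    char buf[64];

    slow_path_calls++;
    snprintf(buf, sizeof buf, "%g", x);
    return strstr(buf, "nan") != NULL || strstr(buf, "NAN") != NULL ||
           strstr(buf, "NaN") != NULL || strstr(buf, "IND") != NULL;
}

/* Fast test first; fall back to the string test whenever it says "not NaN". */
static bool is_nan_hybrid(double x)
{
    if (x != x)
        return true;            /* fast path, no formatting needed */
    return is_nan_by_string(x); /* taken for every ordinary number */
}
```

After checking a buffer of ordinary numbers, slow_path_calls equals the number of elements, which is the worst case described above: data without any NaN pays the formatting cost on every element.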
Here's the way h5diff prints these quantities.
Consider the following 2 datasets with 6 elements each:
Array element   0    1    2    3    4    5
Dataset 1       nan  1    nan  1    1    1
Dataset 2       nan  nan  1    1    1    1
h5diff now reports this:
dataset: </g1/fp17> and </g1/fp18>
size: [6] [6]
position fp17 fp18 difference
At 12:28 PM 12/7/2008, Dave Wade-Stein wrote:
------------------------------------------------------------
[ 1 ] 1 -1.#IND 1.#QNAN
[ 2 ] -1.#IND 1 1.#QNAN
2 differences found
That is, h5diff reports 2 differences between the datasets. This result is consistent across all platforms we support. We do not count array position 0 (where both elements are NaN) as a difference.
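The comparison rule just described (two NaNs at the same position are not a difference, a NaN against a number is) can be written down as a short sketch. It assumes C99 isnan() and exact comparison with no tolerance options, and the function names are hypothetical rather than taken from h5diff.

```c
#include <math.h>
#include <stddef.h>

/* Two elements differ unless they compare equal or are both NaN. */
static int elements_differ(double a, double b)
{
    int a_nan = isnan(a) != 0;
    int b_nan = isnan(b) != 0;

    if (a_nan && b_nan)
        return 0;          /* NaN vs NaN: treated as "no difference" */
    if (a_nan || b_nan)
        return 1;          /* NaN vs number: always a difference */
    return a != b;         /* ordinary exact comparison */
}

/* Count differing positions in two arrays of the same length n. */
static size_t count_differences(const double *d1, const double *d2, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        count += (size_t)elements_differ(d1[i], d2[i]);
    return count;
}
```

Applied to the two 6-element datasets above, this counts positions 1 and 2 and skips position 0, giving 2 differences.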
2) Option in h5diff to avoid the NaN detection
To avoid these performance issues, we are introducing an option to enable (or disable) NaN detection. We are requesting comments on whether h5diff should detect NaNs by default or not.
2.1) h5diff compares NaN by default
The new proposed option would be something like
-N, --nan Avoid NaN detection
Example
./h5diff file1 file2
would compare file1 and file2, doing NaN detection (the default, slow).
To disable NaN detection, one would use
./h5diff -N file1 file2
2.2) h5diff does NOT compare NaN by default
The new proposed option would be something like
-N, --nan Detect NaNs
Example
./h5diff file1 file2
would compare file1 and file2, NOT doing NaN detection (the default, fast).
To enable NaN detection, one would use
./h5diff -N file1 file2
Here are some examples of h5diff output for the previous example, this time NOT detecting NaN.
For example, running on Linux, we get
dataset: </g1/fp17> and </g1/fp18>
size: [6] [6]
position fp17 fp18 difference
------------------------------------------------------------
[ 0 ] nan nan nan
[ 1 ] 1 nan nan
[ 2 ] nan 1 nan
3 differences found
And running on Windows we get
dataset: </g1/fp17> and </g1/fp18>
0 differences found
Note that these results differ from each other and from the result obtained with NaN detection.
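One plausible reading of the Linux result is that, without NaN handling, the comparison falls through to ordinary floating-point inequality, where NaN is unequal to everything including itself. The sketch below shows that behavior under IEEE 754 arithmetic. It is an assumption for illustration, not the actual h5diff comparison, and it does not attempt to explain the Windows result.

```c
#include <stddef.h>

/* Naive comparison with no NaN handling: the count depends entirely on
 * what the platform yields for a != b when NaN is involved. */
static size_t count_differences_naive(const double *d1, const double *d2,
                                      size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (d1[i] != d2[i])
            count++;
    }
    return count;
}
```

On a platform with IEEE 754 semantics, the two example datasets give 3 here (positions 0, 1 and 2), matching the Linux output above, whereas the NaN-aware comparison gives 2.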
Attached is a document (RFC_NaNsHDF5.pdf) with more details about the way h5diff currently detects NaN.
----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.
--------------------------------------------------------------
Pedro Vicente (T) 217.265-0311
pvn@hdfgroup.org
The HDF Group. 1901 S. First. Champaign, IL 61820