RFC (Request For Comments): h5dump, specify precision of output for floating-point numbers

Dear James, Chris, Cheryl, and all HDF users,

A feature has been requested: an option for h5dump to specify the precision of output for floating-point numbers. We propose the following new flag:

-c --type=<string>

where <string> is the full printf format string to be used for floats and doubles, for example "%7.3e".
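In C, a float passed to a variadic function such as printf is promoted to double, so one format string of this kind can serve both H5T_IEEE_F32LE and H5T_IEEE_F64LE data. A minimal sketch, assuming a hypothetical per-element helper named format_element() (not an existing h5dump function):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of h5dump: format one floating-point
 * element with a user-supplied conversion such as "%7.3e".
 * A float argument is promoted to double at the call, so the same
 * format string works for float and double data alike.
 * Returns 1 if the whole result fit in buf, 0 otherwise. */
int format_element(char *buf, size_t size, const char *fmt, double value)
{
    int n = snprintf(buf, size, fmt, value);
    return n >= 0 && (size_t)n < size;
}
```

For example, a float 1.5f formatted with "%7.3e" gives "1.500e+00"; the field is already wider than 7 characters, so no padding is added.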

We could allow the "%" to be omitted from <string>, since it is redundant; on the other hand, requiring it might be clearer. The name of this switch can also be changed if anybody has a better suggestion:
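Accepting the string with or without the "%" is easy to support. Because handing arbitrary user text to printf is unsafe (conversions such as %s or %n read or write memory through missing arguments), the option would also need to check that the string is a single floating-point conversion. A minimal sketch, assuming a hypothetical helper normalize_float_format() that accepts only %[flags][width][.precision]type with type one of e, E, f, g, G:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not an existing h5dump function.
 * Accepts "7.3e" or "%7.3e", checks that it is exactly one
 * %[flags][width][.precision]type conversion with type in eEfgG,
 * and writes the normalized form (always starting with '%') to out. */
bool normalize_float_format(const char *user, char *out, size_t out_size)
{
    const char *body = (user[0] == '%') ? user + 1 : user;
    const char *p = body;

    /* flags: any of - + space # 0 (guard '\0', which strchr would match) */
    while (*p != '\0' && strchr("-+ #0", *p) != NULL)
        p++;
    /* width: optional decimal digits */
    while (isdigit((unsigned char)*p))
        p++;
    /* precision: optional '.' followed by optional digits */
    if (*p == '.') {
        p++;
        while (isdigit((unsigned char)*p))
            p++;
    }
    /* exactly one type character, with nothing after it */
    if (*p == '\0' || strchr("eEfgG", *p) == NULL || p[1] != '\0')
        return false;

    int n = snprintf(out, out_size, "%%%s", body);
    return n > 0 && (size_t)n < out_size;
}
```

With this check, "7.3e" and "%7.3e" both normalize to "%7.3e", while "%s" or "%7.3e%n" are rejected.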

-c --type=<string>

We would also welcome proposals for other ways to achieve this.

Currently, h5dump uses "%g" for floats and doubles. With its default precision, "%g" prints at most six significant digits, so precision can be lost.

For example, in the standard Aura data files, Time is stored in TAI93 units as a double-precision number. An example data value is 411459578.239311, which h5dump displays with only six significant digits, as 4.1146e+08.
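To make the difference concrete, here is a small sketch that formats the example Time value with %g and with a few alternatives. The results noted in the comments assume glibc; older Microsoft C runtimes print three exponent digits (e+008), as in the sample output that follows. The names compare_formats and formats are illustrative only.

```c
#include <stdio.h>
#include <string.h>

#define N_FORMATS 4
#define OUT_LEN   64

/* Conversions to compare; "%g" is h5dump's current default.
 * Results for x = 411459578.239311 (glibc) in the comments. */
static const char *const formats[N_FORMATS] = {
    "%g",    /* 4.1146e+08       */
    "%e",    /* 4.114596e+08     */
    "%f",    /* 411459578.239311 */
    "%.3f",  /* 411459578.239    */
};

/* Format x once with each conversion in formats[]. */
void compare_formats(double x, char out[N_FORMATS][OUT_LEN])
{
    for (int i = 0; i < N_FORMATS; i++)
        snprintf(out[i], OUT_LEN, formats[i], x);
}
```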

Here is a sample of the current h5dump output for an Aura dataset:

HDF5 "HIRPROF-OneOrb_b091_2006d131.he5" {
DATASET "/HDFEOS/SWATHS/HIRDLS/Geolocation Fields/Time" {
   DATATYPE H5T_IEEE_F64LE
   DATASPACE SIMPLE { ( 612 ) / ( 612 ) }
   DATA {
   (0): 4.21459e+008, 4.21459e+008, 4.21459e+008, 4.21459e+008, 4.21459e+008,
   (5): 4.21459e+008, 4.21459e+008, 4.21459e+008, 4.21459e+008, 4.21459e+008,

Changing the format from %g to %f gives:

HDF5 "HIRPROF-OneOrb_b091_2006d131.he5" {
DATASET "/HDFEOS/SWATHS/HIRDLS/Geolocation Fields/Time" {
   DATATYPE H5T_IEEE_F64LE
   DATASPACE SIMPLE { ( 612 ) / ( 612 ) }
   DATA {
   (0): 421459205.506481, 421459224.754325, 421459236.430163,
   (3): 421459255.749970, 421459267.353832, 421459286.745660,
   (6): 421459298.265514, 421459317.753278, 421459329.189183,
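%f keeps far more of these times than %g, but it is not a general guarantee of full precision: near 4.2e8, adjacent doubles are about 6e-8 apart, finer than the 1e-6 step that %f shows, and very small values collapse to 0.000000. A more general check is whether the printed text reads back as the identical double. A minimal sketch, using strtod for the read-back; 17 significant digits ("%.17g", DBL_DECIMAL_DIG in C11's <float.h>) are always enough for an IEEE 754 double. The helper name round_trips is illustrative only.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Print x with fmt, parse the text back with strtod, and report
 * whether the exact same double comes back. */
bool round_trips(const char *fmt, double x)
{
    char buf[64];
    snprintf(buf, sizeof buf, fmt, x);
    return strtod(buf, NULL) == x;
}
```

With the first value above, "%g" fails the check (it reads back as 421459000) while "%.17g" passes; with 1e-7, "%f" fails because it prints 0.000000.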

For your reference, here is the documentation for the printf modifiers, copied verbatim from the printf man pages:

A format specification, which consists of optional and required fields, has the following form:

%[flags] [width] [.precision] [{h | l | I64 | L}]type

Each field of the format specification is a single character or a number signifying a particular format option. The simplest format specification contains only the percent sign and a type character (for example, %s). If a percent sign is followed by a character that has no meaning as a format field, the character is copied to stdout. For example, to print a percent-sign character, use %%.
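The "%%" rule matters for validating user-supplied formats, because "%%" is a conversion that consumes no argument. A small sketch (format_percent is an illustrative name) that prints a number followed by a literal percent sign:

```c
#include <stdio.h>
#include <string.h>

/* "%5.1f" prints pct in a 5-character field with one decimal;
 * the following "%%" emits a single literal '%'. */
int format_percent(char *buf, size_t size, double pct)
{
    return snprintf(buf, size, "%5.1f%%", pct);
}
```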

The optional fields, which appear before the type character, control other aspects of the formatting, as follows:

type

Required character that determines whether the associated argument is interpreted as a character, a string, or a number (see Table R.3).

flags

Optional character or characters that control justification of output and printing of signs, blanks, decimal points, and octal and hexadecimal prefixes (see Table R.4). More than one flag can appear in a format specification.

width

Optional number that specifies the minimum number of characters output. (See printf Width Specification.)

precision

Optional number that specifies the maximum number of characters printed for all or part of the output field, or the minimum number of digits printed for integer values (see Table R.5).

Regarding "type", for doubles:

Type  Argument  Output format
e     double    Signed value having the form [ - ]d.dddd e [sign]ddd where d is a single decimal digit, dddd is one or more decimal digits, ddd is exactly three decimal digits, and sign is + or -.
E     double    Identical to the e format except that E rather than e introduces the exponent.
f     double    Signed value having the form [ - ]dddd.dddd, where dddd is one or more decimal digits. The number of digits before the decimal point depends on the magnitude of the number, and the number of digits after the decimal point depends on the requested precision.
g     double    Signed value printed in f or e format, whichever is more compact for the given value and precision. The e format is used only when the exponent of the value is less than -4 or greater than or equal to the precision argument. Trailing zeros are truncated, and the decimal point appears only if one or more digits follow it.
G     double    Identical to the g format, except that E, rather than e, introduces the exponent (where appropriate).
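The rule for choosing between f and e style under %g is easy to check with values on either side of the two thresholds. A sketch, assuming the default precision of 6 (format_g is an illustrative name):

```c
#include <stdio.h>
#include <string.h>

/* Format x with plain "%g" (precision 6). With X the decimal exponent
 * of x, e-style is used when X < -4 or X >= 6, f-style otherwise;
 * trailing zeros and a bare decimal point are removed. */
void format_g(char *buf, size_t size, double x)
{
    snprintf(buf, size, "%g", x);
}
```

So 0.0001 prints as "0.0001" but 0.00001 as "1e-05"; 123456 prints as "123456" but 1234567 as "1.23457e+06". The Aura times (X = 8) therefore always fall into e style under plain %g.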

Flag Directives

The first optional field of the format specification is flags. A flag directive is a character that justifies output and prints signs, blanks, decimal points, and octal and hexadecimal prefixes. More than one flag directive may appear in a format specification.

Table R.4 Flag Characters

Flag  Meaning / Default
···
#     Meaning: When used with the e, E, or f format, the # flag forces the output value to contain a decimal point in all cases.
      Default: Decimal point appears only if digits follow it.
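The effect of # is easiest to see at precision 0, where no digits follow the decimal point. A sketch (format_no_decimals is an illustrative name) that formats a value with and without the flag:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Format x with zero digits after the decimal point, in e or f style.
 * With alt set, the '#' flag keeps the decimal point even though no
 * digits follow it. */
void format_no_decimals(char *buf, size_t size, double x,
                        bool e_style, bool alt)
{
    const char *fmt = e_style ? (alt ? "%#.0e" : "%.0e")
                              : (alt ? "%#.0f" : "%.0f");
    snprintf(buf, size, fmt, x);
}
```

For 3.0 this yields "3" and "3." in f style, and "3e+00" and "3.e+00" in e style (glibc exponent form).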

Precision Specification

The third optional field of the format specification is the precision specification. It specifies a nonnegative decimal integer, preceded by a period (.), which specifies the number of characters to be printed, the number of decimal places, or the number of significant digits (see Table R.5). Unlike the width specification, the precision specification can cause either truncation of the output value or rounding of a floating-point value. If precision is specified as 0 and the value to be converted is 0, the result is no characters output, as shown below:

printf( "%.0d", 0 ); /* No characters output */

If the precision specification is an asterisk (*), an int argument from the argument list supplies the value. The precision argument must precede the value being formatted in the argument list.
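A run-time precision supplied through "*" is one conceivable alternative to accepting a whole format string from the user: the tool would take only a digit count and keep the conversion letter fixed. A sketch, assuming a hypothetical helper format_fixed() (not h5dump code):

```c
#include <stdio.h>
#include <string.h>

/* Format x with a number of decimals chosen at run time.
 * The int consumed by '*' is passed before the value it applies to. */
int format_fixed(char *buf, size_t size, int decimals, double x)
{
    return snprintf(buf, size, "%.*f", decimals, x);
}
```

For instance, format_fixed(buf, sizeof buf, 2, 3.14159) produces "3.14".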

The type determines the interpretation of precision and the default when precision is omitted, as shown in Table R.5.

Table R.5 How Precision Values Affect Type

Type  Meaning / Default
e, E  Meaning: The precision specifies the number of digits to be printed after the decimal point. The last printed digit is rounded.
      Default: Default precision is 6; if precision is 0 or the period (.) appears without a number following it, no decimal point is printed.
f     Meaning: The precision value specifies the number of digits after the decimal point. If a decimal point appears, at least one digit appears before it. The value is rounded to the appropriate number of digits.
      Default: Default precision is 6; if precision is 0, or if the period (.) appears without a number following it, no decimal point is printed.
g, G  Meaning: The precision specifies the maximum number of significant digits printed.
      Default: Six significant digits are printed, with any trailing zeros truncated.
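The same precision number therefore means decimals for e and f but significant digits for g. A sketch applying one precision to all three types (format_each_type is an illustrative name), using the Aura example value:

```c
#include <stdio.h>
#include <string.h>

/* Format x in e, f, and g style with the same precision prec.
 * For e and f, prec counts digits after the decimal point;
 * for g, it counts significant digits. */
void format_each_type(char out[3][64], int prec, double x)
{
    snprintf(out[0], 64, "%.*e", prec, x);
    snprintf(out[1], 64, "%.*f", prec, x);
    snprintf(out[2], 64, "%.*g", prec, x);
}
```

With x = 411459578.239311 and precision 3, this gives "4.115e+08", "411459578.239", and "4.11e+08". Under g, 9 significant digits reach only the integer part ("411459578"), and 15 are needed to show all six decimals.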

printf Width Specification

The second optional field of the format specification is the width specification. The width argument is a nonnegative decimal integer controlling the minimum number of characters printed. If the number of characters in the output value is less than the specified width, blanks are added to the left or the right of the values, depending on whether the - flag (for left alignment) is specified, until the minimum width is reached. If width is prefixed with 0, zeros are added until the minimum width is reached (not useful for left-aligned numbers).

The width specification never causes a value to be truncated. If the number of characters in the output value is greater than the specified width, or if width is not given, all characters of the value are printed (subject to the precision specification).

If the width specification is an asterisk (*), an int argument from the argument list supplies the value. The width argument must precede the value being formatted in the argument list. A nonexistent or small field width does not cause the truncation of a field; if the result of a conversion is wider than the field width, the field expands to contain the conversion result.
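A fixed width is what keeps columns of dumped values aligned. A sketch using "*" for both width and precision (format_padded is an illustrative name):

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Right- or left-align x in a field at least `width` characters wide,
 * with `decimals` digits after the point. The field grows if the
 * number needs more room; width never truncates the value. */
int format_padded(char *buf, size_t size, int width, int decimals,
                  bool left, double x)
{
    return snprintf(buf, size, left ? "%-*.*f" : "%*.*f",
                    width, decimals, x);
}
```

With width 12 and 3 decimals, 3.14159 becomes seven blanks followed by "3.142" (or "3.142" followed by seven blanks when left-aligned), while 12345.6 in a width-3 field still prints in full as "12345.6".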

--------------------------------------------------------------
Pedro Vicente (T) 217.265-0311
pvn@hdfgroup.org
The HDF Group. 1901 S. First. Champaign, IL 61820

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.

Hello Chris,

The 2008 message was regarding an 'RFC' (Request for Comments), and it appears that the -c option was not actually implemented in h5dump.

I can successfully use h5dump with the -m option to dump data in a floating-point format that I specify, where N is the number of digits after the decimal point:

  h5dump -m "%.Nf" -d 'datasetname' file.h5

For example:
  $ h5dump -m "%.15f" -d "/HDFEOS/GRIDS/HIRDLS/Data Fields/NO2DayColumn" HIRDLS-Aura_L3SCOL_v06-00-00-c02_2005d022-2008d077.he5 | more
HDF5 "HIRDLS-Aura_L3SCOL_v06-00-00-c02_2005d022-2008d077.he5" {
DATASET "/HDFEOS/GRIDS/HIRDLS/Data Fields/NO2DayColumn" {
   DATATYPE H5T_IEEE_F32LE
   DATASPACE SIMPLE { ( 1151, 73, 180 ) / ( 1151, 73, 180 ) }
   DATA {
   (0,0,0): 3219228849078272.000000000000000,
   (0,0,1): 3232982139666432.000000000000000,
   (0,0,2): 3246586985447424.000000000000000,
   (0,0,3): 3259953728978944.000000000000000,
   (0,0,4): 3273001839624192.000000000000000,
   (0,0,5): 3285655618584576.000000000000000

  --- >8 --- cut --- >8 ---
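Note that NO2DayColumn is H5T_IEEE_F32LE. A 32-bit float carries only about 7 significant decimal digits, so most of the 15 decimals requested above are padding; 9 significant digits ("%.9g", FLT_DECIMAL_DIG in C11's <float.h>) are enough to read back the identical float. A minimal sketch checking this with strtof on the first value printed above (float_round_trips is an illustrative name):

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Print x with fmt (x is promoted to double for the call), parse the
 * text back with strtof, and report whether the same float returns. */
bool float_round_trips(const char *fmt, float x)
{
    char buf[128];
    snprintf(buf, sizeof buf, fmt, x);
    return strtof(buf, NULL) == x;
}
```

Both "%.9g" (which prints 3.21922885e+15 here) and "%.15f" recover the value exactly, while plain "%g" (3.21923e+15) does not.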

Could you try that? (The -m option appears to work with HDF5-1.8.3.)

If you would like to join the HDF-Forum mailing list, you can do that from this page:

  http://www.hdfgroup.org/services/community_support.html

...or go directly here:
  
  http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

-Barbara
help@hdfgroup.org


--
View this message in context: http://hdf-forum.184993.n3.nabble.com/hdf-forum-RFC-h5dump-specify-precision-of-output-for-floating-point-numbers-tp193113p4024489.html
Sent from the hdf-forum mailing list archive at Nabble.com.