n-bit compression example on big-endian machine

Somewhat related to my last question: the examples I gave before were run on an
Intel Xeon, a little-endian machine. Consider once again the Example 11 code
at http://www.hdfgroup.org/HDF5/doc/UG/10_Datasets.html, which I have modified
to remove extraneous stuff. I am leaving the comment block in because it gives
the memory layout that determines the parameters for the function calls:

/* Define single-precision floating-point type for dataset

···

*-------------------------------------------------------------------
   * size=4 byte, precision=20 bits, offset=7 bits,
   * mantissa size=13 bits, mantissa position=7,
   * exponent size=6 bits, exponent position=20,
   * exponent bias=31.
   * It can be illustrated in little-endian order as:
   * (S - sign bit, E - exponent bit, M - mantissa bit,
   * ? - padding bit)
   *
   *          3        2        1        0
   *   ?????SEE EEEEMMMM MMMMMMMM M???????
   *
   * To create a new floating-point type, the following
   * properties must be set in the order of
   * set fields -> set offset -> set precision -> set size.
   * All these properties must be set before the type can function.
   * Other properties can be set anytime. Derived type size cannot
   * be expanded bigger than original size but can be decreased.
   * There should be no holes among the significant bits. Exponent
   * bias usually is set 2^(n-1)-1, where n is the exponent size.

*-------------------------------------------------------------------*/

/* I removed variable declarations */

msize = 13;
spos = 26;
epos = 20;
esize = 6;
mpos = 7;

precision = 20;
offset = 7;

datatype = H5Tcopy(H5T_IEEE_F32BE);
H5Tset_fields(datatype, spos, epos, esize, mpos, msize);
H5Tset_offset(datatype, offset);
H5Tset_precision(datatype, precision);
H5Tset_size(datatype, 4);
H5Tset_ebias(datatype, 31);

On a little-endian machine, I get expected behavior. If I want to further
reduce precision (and hence compressed file size) I can do this:

msize -= 4;
spos -= 4;
epos -= 4;
precision -= 4;

Each time I decrement the above, I end up with less precision and smaller
file sizes when the n-bit filter is followed by gzip compression. (Am I doing
this right? I haven't changed offset, and it occurs to me that I probably should.)

Questions:

1. Why H5Tcopy(H5T_IEEE_F32BE) and not H5Tcopy(H5T_IEEE_F32LE)? After all,
this is a little-endian machine, and the example is for a little-endian
memory layout?

2. When I apply the above code on a big-endian machine (IBM Power5) I get
screwed up data. It appears I somehow have to fiddle with spos, epos, and
offset for a big-endian machine, perhaps?

3. Why H5Tset_size(datatype, 4) and not H5Tset_size(datatype, 2) - after
all, haven't we reduced the precision to 16 bits, i.e., 2 bytes?

My ultimate goal here is to get the proper behavior on a big-endian machine,
since that's what I'm running my model on. I want fine-grained control over
the lossiness of the final compressed data. Perhaps if someone could redo
Example 11 for a big-endian machine, things would become clearer to me. And
I'm still puzzled about why a pure n-bit filter doesn't reduce file size
(previous email).

Leigh

--
Leigh Orf
Associate Professor of Atmospheric Science
Department of Geology and Meteorology
Central Michigan University
Currently on sabbatical at the National Center for Atmospheric Research
in Boulder, CO
NCAR office phone: (303) 497-8200

Hi Leigh,

I will try to answer some questions.

Questions:

1. Why H5Tcopy(H5T_IEEE_F32BE) and not H5Tcopy(H5T_IEEE_F32LE)? After all, this is a little endian machine, and the example is for a little endian memory layout?

It seems that only the big-endian IEEE floating-point type is widely accepted. I may be wrong.

2. When I apply the above code on a big-endian machine (IBM Power5) I get screwed up data. It appears I somehow have to fiddle with spos, epos, and offset for a big endian machine perhaps?

Yes, you have to figure out how to create your own n-bit datatype to match your own data's layout.

3. Why H5Tset_size(datatype, 4) and not H5Tset_size(datatype, 2) - after all, haven't we reduced the precision to 16 bits, i.e., 2 bytes?

An n-bit datatype (here, 16 bits) by itself still occupies the full size of the original type, unused bits included. That's why you need to apply the n-bit filter to pack the used bits.

My ultimate goal here is to get the proper behavior on a big-endian machine since that's what I'm running my model on. I want to have fine-grained control over the lossiness of the final compressed data. Perhaps if someone could re-do Example 11 for a big endian machine things would become clearer to me. And I'm still puzzled about why a pure n-bit filter doesn't reduce file size (previous email).

Correctly applying the n-bit filter with the n-bit datatype should reduce the file size.

Kent

···


_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org