Three n-bit filter questions

matthieu.schaller · April 16, 2020, 7:49am

Hi all,

I have three questions relating to the n-bits filter applied to floating-point numbers.

What is the point of the offset? i.e. is there a reason not to always set the offset to 0 since I don’t care
where in my original 32-bits my final n-bits will land.
All my data is made of positive numbers. Is there a way to get rid of the sign-bit ?
Similarly, if all my data is in [0,1] (but with a large dynamic range), is there a way to get rid of the sign-bit of the exponent?

The documentation of the filter is really good but does not cover explicitly these aspects.

Thanks!

gheber · April 17, 2020, 11:13pm

Maybe instead of “original” and “final” we should be more specific and talk about what’s “in memory” and what’s “on disk”. The sole point of the offset is to describe what you have in memory. On your architecture, in your applications, it might be the case that it is always 0, but that was not the case historically and might still be a concern with certain “exotic” hardware.

I don’t know, but I can’t think of a way to do that.

I’m not sure I understand what you mean by that. The exponent doesn’t have a sign bit. In IEEE, it’s 8 or 11 bits with a bias of 127 or 1023.

On the whole, I think you might be better off using an opaque type and a suitable compression method on top of that.

G.

matthieu.schaller · April 18, 2020, 11:01am

Hi,

Thanks for taking the time to respond and the clarification on the working of the exponent sign/bias.

I am trying to avoid any method not entirely embedded in the library such as a home-baked compression method as it would make the data harder to use for external users not having that specific filter in their version of the library. All in all, I think I can achieve nearly what I want with the n-bits.

To be more specific, I have in memory an array of IEEE 4-byte floats. I know I can afford to store them only has IEEE 2-byte floats. I know my numbers are all positive so could skip the sign bit. And I know they are all in the range [0, 1], meaning that I don’t need large exponents.

My numbers, on disk, with an offset of 0 and just a simple reduction in the number of mantissa and exponent bits, would look like:

3        2        1        0
???????? ???????? SEEEEEMM MMMMMMMM

With an exponent bias of 15 this would be identical to an IEEE half-float.

For the (overall) sign bit, would it work to “bit-shift” it out of the result by setting an offset such that there isn’t enough space “on the left” for the sign bit? Could I set an offset of 17 and get:

3        2        1        0
EEEEEMMM MMMMMMM? ???????? ????????

i.e. would the library be able to decompress this correctly? (and make me gain 1 bit out of every 16)

As for the sign of the exponent, your correction of my statement about the sign being instead represented as a bias, means that I should be able to play with the bias value and trim off another bit.

Going for

3        2        1        0
EEEEMMMM MMMMMM?? ???????? ????????

and a bias of 0.
If I am not mistaken that would decompress into numbers in the range of binary exponents [-14, 0], which is all the precision I need for my numbers in the (decimal) range [0,1]. And would save me an extra bit.

Would you expect this to be supported correctly by the library’s n-bit filter?

Thanks again,
M.

steven · April 18, 2020, 12:43pm

Being on the practical side, here is the link to nbit-filer examples in C++. Why don’t you just try your idea, wiggle it until you like what you see?

steve

matthieu.schaller · April 20, 2020, 9:55am

Thanks,

Yes, I have tried that in my own code. As a matter of principle though, I am always happier if a feature is officially supported rather than the result of a potentially successful dirty hack which may break in corner cases.

Anyway, the library is not happy when bit-shifting the sign bit out of the number by setting the offset as I mention above. This leads to an error message from hdf5 when attempting to construct the type:

HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 139622726840128:
#000: ../../../src/H5Tfloat.c line 128 in H5Tset_fields(): sign location is not valid
major: Invalid arguments to routine
minor: Bad value

So at least in the way I suggested above this does work. Bringing me back to my original question of whether there is a suitable way of doing this.

Regarding, the use of a different exponent bias, that seems to work as intended with the remaining question being whether this is officially supported. The user guide only gives the example of setting the bias to the “standard” value it should take.

Cheers,
M.

gheber · April 20, 2020, 12:39pm

I would have to look at the code, but I believe the library would create an error because of a precision mismatch (between the in-file and in-memory representations).

I think it’s important to keep two things separate: 1) The N-bit filter will do whatever you tell it to do (via offset, and precision) and it doesn’t really care about specific atomic types. 2) From the library’s perspective, the representation of a floating-point number has the structure SEEE…MMM… (plus ebias and mantissa normalization). I have never set the ebias to 0, but there’s a good chance that that will just work. You can tweak the other parameters via H5Tset_fields, but I don’t see how you can get rid off the sign-bit (pattern violation). In other words, if you want the library to recognize your custom atomic type as a member of the floating-point family, that’s the pattern to follow.

Best, G.

Attention! https://support.hdfgroup.org is the NEW home for documentation from The HDF Group. (Details)

Three n-bit filter questions