Odd caching behavior with scale-offset filter


A user recently contributed a patch adding support for the
scale-offset filter in h5py, and we are seeing some odd behavior which
seems to be related to data caching. When the filter is set up for
lossy encoding (e.g. storing 32-bit ints with 1 bit of precision)
and a small dataset is written, subsequent reads return the
original, non-compressed data. Closing and reopening the file, or
writing larger datasets, seems to produce the expected
lossily-compressed data.

Is this expected behavior? Is there any way to get HDF5 not to cache
chunks when a lossy filter is used, or, preferably, to only cache
chunks after the transformation has been applied?

Andrew Collette

Here's a C file demonstrating this behavior for floating-point data.
The output of the program is (on my machine):

Data written:
1.129840 5.123983 2.129993 -2.199330
After immediate read:
1.129840 5.123983 2.129993 -2.199330
After reopening:
1.130670 5.120670 2.130670 -2.199330
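
For reference, the values after reopening are what one would expect
if each value is offset by the dataset minimum, scaled by a power of
ten, rounded to an integer, and then restored. The sketch below only
reproduces that arithmetic; it does not call HDF5 and is not the
attached so_test.c. The helper name dscale_roundtrip is hypothetical,
and the scale factor of 2 is an assumption inferred from the printed
numbers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of HDF5): emulate a lossy
 * scale-offset style round trip on n values.
 * scale_factor is the number of decimal digits kept after the point.
 */
void dscale_roundtrip(const double *in, double *out, size_t n,
                      int scale_factor)
{
    assert(n > 0 && scale_factor >= 0);

    /* The offset is the minimum of the data. */
    double min = in[0];
    for (size_t i = 1; i < n; i++) {
        if (in[i] < min)
            min = in[i];
    }

    /* scale = 10^scale_factor, computed without libm. */
    double scale = 1.0;
    for (int k = 0; k < scale_factor; k++)
        scale *= 10.0;

    for (size_t i = 0; i < n; i++) {
        /* in[i] - min is never negative, so adding 0.5 and
         * truncating rounds to the nearest integer. */
        long long q = (long long)((in[i] - min) * scale + 0.5);

        /* Undo the scaling and offset; the rounding loss remains. */
        out[i] = (double)q / scale + min;
    }
}
```

Applied to the four values written above with a scale factor of 2,
this gives 1.130670 5.120670 2.130670 -2.199330, which matches the
"After reopening" line. That suggests the immediate read is returning
data that never went through the filter.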


so_test.c (1.28 KB)