I am using h5py to store 2D and 3D data sets. When I combine the
fletcher32 checksum with szip compression on 64-bit data (float or int
NumPy arrays), I get a checksum error when I try to read the data back.
I created an issue with h5py, but it seems like this could be a bug in HDF5:
Here is the example code:
--------------
import h5py
import numpy as np

with h5py.File("test.h5", "w") as h5:
    h5.create_dataset("image_A",
                      data=np.zeros(10000, dtype=np.float64),
                      fletcher32=True,
                      compression="szip",
                      )

with h5py.File("test.h5", "r") as h5:
    print(h5["image_A"][0])
--------------
and here is the error message:
--------------
Traceback (most recent call last):
  File "error.py", line 12, in <module>
    print(h5["image_A"][0])
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/build/h5py-nQFNYZ/h5py-2.6.0/h5py/_objects.c:2577)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/build/h5py-nQFNYZ/h5py-2.6.0/h5py/_objects.c:2536)
  File "/usr/lib/python3/dist-packages/h5py/_hl/dataset.py", line 482, in __getitem__
    self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/build/h5py-nQFNYZ/h5py-2.6.0/h5py/_objects.c:2577)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/build/h5py-nQFNYZ/h5py-2.6.0/h5py/_objects.c:2536)
  File "h5py/h5d.pyx", line 181, in h5py.h5d.DatasetID.read (/build/h5py-nQFNYZ/h5py-2.6.0/h5py/h5d.c:3123)
  File "h5py/_proxy.pyx", line 130, in h5py._proxy.dset_rw (/build/h5py-nQFNYZ/h5py-2.6.0/h5py/_proxy.c:1769)
  File "h5py/_proxy.pyx", line 84, in h5py._proxy.H5PY_H5Dread (/build/h5py-nQFNYZ/h5py-2.6.0/h5py/_proxy.c:1411)
OSError: Can't read data (Data error detected by fletcher32 checksum)
--------------
and this is my setup:
* Operating System: Ubuntu 16.04.3 LTS
* Python versions: 2.7.12 and 3.5.2
* Where Python was acquired: system Python (apt-get)
* h5py version: 2.7.1
* HDF5 version: 1.8.18
* The full traceback/stack trace shown above is from Python 3.
I was not able to find anything in previous posts in the mailing list
archive.
I would really like to use szip compression, because the resulting file
sizes are considerably smaller.
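To narrow down which filter and dtype combinations are affected, the example above can be wrapped in a small helper. This is only a sketch: the name roundtrip_ok is made up for illustration, and the check against h5py.filters.encode assumes that tuple lists the compression filters this h5py build can write.

```python
import os
import tempfile

import h5py
import numpy as np


def roundtrip_ok(dtype, compression, fletcher32=True):
    """Write zeros with the given filters, then read one element back.

    Returns True if the read succeeds, False if HDF5 reports an error
    on read, and None if the compression filter cannot be used for
    writing in this build (hypothetical helper, not part of h5py).
    """
    if compression is not None and compression not in h5py.filters.encode:
        return None
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "test.h5")
        with h5py.File(path, "w") as h5:
            h5.create_dataset("image_A",
                              data=np.zeros(10000, dtype=dtype),
                              fletcher32=fletcher32,
                              compression=compression)
        try:
            with h5py.File(path, "r") as h5:
                h5["image_A"][0]
        except (OSError, IOError):
            # This is where the fletcher32 checksum error surfaces.
            return False
    return True


if __name__ == "__main__":
    for dtype in (np.float32, np.float64, np.int32, np.int64):
        for compression in (None, "gzip", "szip"):
            print(np.dtype(dtype).name, compression,
                  roundtrip_ok(dtype, compression))
```

A result of None for szip means the installed HDF5 library was built without szip encoding, so the combination cannot be tested on that machine.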
On Dec 13, 2017, at 10:55 AM, Paul Müller <paul_mueller@tu-dresden.de> wrote:
[...]
Thanks again for your thorough answer. I created a PR, which has just been merged: https://github.com/h5py/h5py/pull/989
It looks like this bug will not be present in h5py >= 2.7.2.
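For scripts that have to run on several machines, a version guard along these lines might help. It is a rough sketch: the threshold 2.7.2 is taken from the statement above and has not been verified against a release, and the helper names are made up for illustration.

```python
# First h5py version expected to contain the fix (assumption taken
# from the message above, not checked against an actual release).
FIRST_FIXED = (2, 7, 2)


def parse_version(version):
    """Turn a string like '2.7.1' into (2, 7, 1).

    Only the leading digits of the first three dot-separated parts are
    used, so '3.0.0rc1' becomes (3, 0, 0).
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_szip_fletcher32_fix(version):
    """Return True if the given h5py version is at or past FIRST_FIXED."""
    return parse_version(version) >= FIRST_FIXED
```

With h5py installed, the function can be called as has_szip_fletcher32_fix(h5py.version.version), and fletcher32 can be switched off for szip-compressed data sets when it returns False.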