64-bit integers written with 32-bit precision

Hi,

Jakob van Santen, a PyTables user, has been bitten by a problem with HDF5 type
precisions. He explains the problem as follows:

"""
This turned out to be due to the way the file was written: 64-bit integers
were being written as 8-byte integers with 32-bit precision. The HDF5 library
noticed that the type only had 4 significant bytes and so only wrote out the
lower word in H5TBread_records(). Since PyTables prepares the data area with
numpy.empty and not numpy.zeros, the memory is not zeroed. This is fine
as long as types always have precision==8*width, but it breaks otherwise.

This is more of a pseudo-bug in HDF5; it would seem more logical to pad out
the field with zeroes than simply leave the padding bytes unwritten.
"""
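
The effect Jakob describes can be reproduced without HDF5 at all. The sketch
below is only an illustration: read_low_word is a hypothetical stand-in for a
read that fills just the 4 low-order bytes of each 8-byte field, and a buffer
pre-filled with 0xFF bytes plays the role of whatever stale memory numpy.empty
happens to return.

```python
import numpy as np


def read_low_word(dest):
    """Hypothetical stand-in for a read that fills only the low 4 bytes
    of every 8-byte little-endian integer in dest."""
    values = np.array([1, 2, 3], dtype="<u4")
    # View each 8-byte slot as two 32-bit words; word 0 is the low word.
    dest.view("<u4").reshape(-1, 2)[:, 0] = values
    return dest


# Stale memory, made deterministic here: every byte is 0xFF.
dirty = np.full(3, -1, dtype="<i8")
# Zeroed memory, as numpy.zeros would provide.
clean = np.zeros(3, dtype="<i8")

dirty_result = read_low_word(dirty)
clean_result = read_low_word(clean)

print(dirty_result)  # high words still hold the stale 0xFF bytes
print(clean_result)  # the intended values
```

With a zeroed destination the values come back intact; with stale bytes in the
upper word they do not, which matches the symptom reported above.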

I'm reluctant to zero the memory of the data container before data is read
from HDF5, for performance reasons. I'm wondering whether this issue can be
solved on the HDF5 side (i.e. by zeroing the padding when writing into the
buffer), or whether users have to be forced to zero the memory of their data
containers prior to reading data (which would be bad news for performance).

Thanks,

--
Francesc Alted

Hi Francesc,

> I'm reluctant to zero the memory of the data container before data is read
> from HDF5, for performance reasons. I'm wondering whether this issue can be
> solved on the HDF5 side (i.e. by zeroing the padding when writing into the
> buffer), or whether users have to be forced to zero the memory of their data
> containers prior to reading data (which would be bad news for performance).

Can this be addressed with the function H5Tset_pad? The H5T
documentation suggests that if you set the padding of the destination
type to H5T_PAD_ZERO, the parts of the type not occupied by
"meaningful" data should be set to zero. However, I've never tried
this...
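
A minimal sketch of what this suggestion might look like, assuming h5py's
low-level h5t wrappers around H5Tcopy, H5Tset_precision and H5Tset_pad (h5py
is not mentioned in this thread; it is used here only as a convenient way to
reach the C API from Python). It merely builds and inspects the memory type.
Whether HDF5 then zeroes the padding during a read is exactly the part that
remains untested.

```python
from h5py import h5t

# Copy the predefined native 64-bit integer type; predefined types are
# locked, so a copy is needed before changing any property.
mem_type = h5t.NATIVE_INT64.copy()

# Reproduce the layout from the report: 8 bytes of storage, of which
# only 32 bits are significant.
mem_type.set_precision(32)

# Request zero padding for the bits below and above the significant
# ones (the H5Tset_pad call, with H5T_PAD_ZERO for lsb and msb).
mem_type.set_pad(h5t.PAD_ZERO, h5t.PAD_ZERO)

print(mem_type.get_size(), mem_type.get_precision(), mem_type.get_pad())
```

The resulting type would then be passed as the memory (destination) type of
the read call.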

Andrew

On Friday 26 March 2010 19:43:57, Andrew Collette wrote:

> Hi Francesc,
>
> > I'm reluctant to zero the memory of the data container before data is
> > read from HDF5, for performance reasons. I'm wondering whether this
> > issue can be solved on the HDF5 side (i.e. by zeroing the padding when
> > writing into the buffer), or whether users have to be forced to zero
> > the memory of their data containers prior to reading data (which would
> > be bad news for performance).
>
> Can this be addressed with the function H5Tset_pad? The H5T
> documentation suggests that if you set the padding of the destination
> type to H5T_PAD_ZERO, the parts of the type not occupied by
> "meaningful" data should be set to zero. However, I've never tried
> this...

Excellent, I think I'll try this avenue.

Thanks!

--
Francesc Alted