setting dataset string field results in TypeError


#1

I’m using Ubuntu 18.04.5 LTS with Python 3.8.0 from apt. I created a fresh virtualenv and pip-installed h5py for testing.

h5py version info:

>>> print(h5py.version.info)
Summary of the h5py configuration
---------------------------------

h5py    3.2.1
HDF5    1.12.0
Python  3.8.0 (default, Oct 28 2019, 16:14:01) 
[GCC 8.3.0]
sys.platform    linux
sys.maxsize     9223372036854775807
numpy   1.20.1
cython (built with) 0.29.22
numpy (built against) 1.17.5
HDF5 (built against) 1.12.0

I have a simple script where I’m trying to set the individual fields of a compound dataset.

import h5py
f = h5py.File('test.hdf5','w')
ds = f.create_dataset('test_dataset',1,dtype=[('a_float','<f8'),('a_string',h5py.string_dtype(encoding='utf-8'))])
ds[0] = (1, "some string")

test_float = 2
ds['a_float'] = test_float
assert ds[0][0] == test_float

test_string = "test string"
ds['a_string'] = test_string

In this script, setting the float field works as expected, but ds['a_string'] = test_string fails with:

Traceback (most recent call last):
  File "/softdev/akuurstr/python/h5py_test.py", line 11, in <module>
    ds['a_string'] = test_string
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/softdev/akuurstr/python/virtualenvs/h5py_test/lib/python3.8/site-packages/h5py/_hl/dataset.py", line 845, in __setitem__
    val = val.view(numpy.dtype([(names[0], dtype)]))
  File "/softdev/akuurstr/python/virtualenvs/h5py_test/lib/python3.8/site-packages/numpy/core/_internal.py", line 459, in _view_is_safe
    raise TypeError("Cannot change data-type for object array.")
TypeError: Cannot change data-type for object array.

The only way I can set the dataset’s string is to replace the entire record. However, it seems like I should be able to set the string field on its own. I checked the GitHub issues but couldn’t find anything, so I’m not sure whether this is expected behaviour. Should I report this as a bug on GitHub?


#2

Hi @akuurstr,

Although the error is raised by a NumPy method, it is triggered from h5py’s Dataset.__setitem__ (dataset.py, line 845 in the traceback), so I think it is best to report this issue in the h5py GitHub repository.
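The failure can be reproduced with NumPy alone, which helps show why only the string field is affected. This is a minimal sketch under stated assumptions: vlen_str approximates what h5py.string_dtype(encoding='utf-8') returns (an object dtype tagged with h5py metadata), and view_as_single_field is a hypothetical helper that mimics the asarray-then-view step visible in the traceback; it is not h5py’s actual code.

```python
import numpy as np

# Assumption: h5py.string_dtype(encoding='utf-8') is an object dtype carrying
# h5py metadata; approximated here with plain NumPy.
vlen_str = np.dtype("O", metadata={"vlen": str})


def view_as_single_field(value, field_name, field_dtype):
    """Hypothetical helper mimicking the single-field write path seen in the
    traceback: convert the value, then view it as a one-field compound."""
    arr = np.asarray(value, dtype=field_dtype.base, order="C")
    return arr.view(np.dtype([(field_name, field_dtype)]))


# A numeric field can be reinterpreted as a one-field compound without trouble ...
numeric_view = view_as_single_field(2, "a_float", np.dtype("<f8"))
print(numeric_view.dtype)

# ... but NumPy refuses to reinterpret an array holding Python objects,
# which is how variable-length strings are stored in memory.
error_message = None
try:
    view_as_single_field("test string", "a_string", vlen_str)
except TypeError as exc:
    error_message = str(exc)
print("TypeError:", error_message)
```

The numeric view succeeds while the object-backed string view raises the same "Cannot change data-type for object array." error, which matches setting the float working and setting the string failing.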

h5py is fundamentally a converter of bytes between the HDF5 library and NumPy objects in memory, and Unicode strings in HDF5 need special treatment to work with NumPy (variable-length strings are held in memory as NumPy object arrays). If it is not too difficult, I recommend storing/replacing the entire record in one go rather than writing each compound field separately.
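A minimal sketch of the whole-record approach, assuming h5py 3.x and the compound layout from the question (with a_float stored as float64). set_field_by_record is a hypothetical helper, not part of h5py: it reads one record, swaps a single field in Python, and writes the record back as a tuple, the same kind of write as ds[0] = (1, "some string"), which works.

```python
import os
import tempfile

import h5py
import numpy as np


def set_field_by_record(ds, index, field, value):
    """Hypothetical helper: emulate ds[field] = value for one record by
    reading the full record, replacing one field, and writing it back."""
    values = list(ds[index].tolist())           # full record as Python values
    values[ds.dtype.names.index(field)] = value  # swap only the requested field
    ds[index] = tuple(values)                    # whole-record write


record_dt = np.dtype([
    ("a_float", "<f8"),
    ("a_string", h5py.string_dtype(encoding="utf-8")),
])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "test.hdf5")
    with h5py.File(path, "w") as f:
        ds = f.create_dataset("test_dataset", (1,), dtype=record_dt)
        ds[0] = (1, "some string")
        set_field_by_record(ds, 0, "a_string", "test string")

        stored = ds[0]["a_string"]
        # h5py 3.x may return variable-length strings as bytes; normalise to str.
        if isinstance(stored, bytes):
            stored = stored.decode("utf-8")
        stored_float = float(ds[0]["a_float"])

print(stored, stored_float)
```

This trades one extra read per write for avoiding the single-field write path. If a record had other string fields, they would be written back as the bytes that were read; I am assuming h5py accepts bytes for UTF-8 string fields, so check that against your version.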

Take care,

Aleksandar