Hello all,
I’m having difficulty incorporating an array of strings into a Virtual Dataset (VDS). I’m not sure whether I should be creating them differently in the original .h5 file or processing them differently when building the VDS.
My actual case is more complicated, but the root of the problem lies in this simple example.
```python
import h5py

# Source file with a 1-D dataset of two strings
f1 = h5py.File('teststrings.h5', 'w', libver='latest')
strings = ['hello', 'worlds']
f1.create_dataset('text', data=strings, shape=(2,))
# this creates a dataset of h5py special type '|O'
# the same thing happens if I explicitly set dtype=h5py.string_dtype(encoding='utf-8')
f1.close()

# Virtual dataset that maps the whole source dataset
f3 = h5py.File('VDStest.h5', 'w', libver='latest')
layout = h5py.VirtualLayout(shape=(2,), dtype='str')
vsource = h5py.VirtualSource('teststrings.h5', 'text', shape=(2,), dtype='str')
layout[:2] = vsource
f3.create_virtual_dataset('text', layout, fillvalue=-1)
```
This results in a (truncated) error:
```
File "h5py/h5t.pyx", line 1754, in h5py.h5t.py_create
TypeError: No conversion path for dtype: dtype('<U')
```
If I use dtype='|O' in the layout, I get:
```
File "h5py/h5t.pyx", line 1748, in h5py.h5t.py_create
TypeError: Object dtype dtype('O') has no native HDF5 equivalent
```
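The '|O' case looks like a related distinction: a bare object dtype tells h5py nothing about what the objects are, while h5py.string_dtype() is also an object dtype but carries metadata marking it as a variable-length str. A small check, using h5py's public check_string_dtype helper and the low-level h5py.h5t.py_create (the function named in the traceback); vlen_str and plain_obj are just illustrative names:

```python
import h5py
import numpy as np

vlen_str = h5py.string_dtype(encoding='utf-8')   # h5py's annotated str dtype
plain_obj = np.dtype('|O')                        # what dtype='|O' gives

# Both are object dtypes to NumPy; only h5py's version carries string metadata
print(vlen_str.kind, plain_obj.kind)
print(h5py.check_string_dtype(vlen_str))
print(h5py.check_string_dtype(plain_obj))

# h5py can map the annotated dtype to an HDF5 variable-length string type...
vlen_tid = h5py.h5t.py_create(vlen_str, logical=1)
print(vlen_tid.is_variable_str())

# ...but not the bare object dtype (same TypeError as in the traceback)
try:
    h5py.h5t.py_create(plain_obj, logical=1)
except TypeError as err:
    print('bare object dtype:', err)
```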
If I use dtype='bytes', I get:
```
tid = h5t.py_create(dtype, logical=1)
File "h5py/h5t.pyx", line 1539, in h5py.h5t._c_string
ValueError: Size must be positive (size must be positive)
```
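The 'str' and 'bytes' errors seem to come from how NumPy interprets the bare names: both become flexible dtypes with no length (itemsize 0). h5py deliberately refuses NumPy's 'U' (fixed-width UTF-32) strings, which fits "No conversion path for dtype('<U')", and a zero-length byte string presumably explains "Size must be positive". A NumPy-only check:

```python
import numpy as np

# Bare type names become flexible dtypes without a width (itemsize 0)
for name in ('str', 'bytes'):
    dt = np.dtype(name)
    print(f'{name!r:8} -> kind {dt.kind!r}, itemsize {dt.itemsize}')

# With an explicit width, the byte-string dtype has a real size
print(np.dtype('S6').itemsize)   # 6
```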
If I try to use .asstr() to decode the strings before building the layout, the VirtualSource can't even be created; it complains:
```
/compat.py", line 19, in filename_encode
filename = fspath(filename)
^^^^^^^^^^^^^^^^
TypeError: expected str, bytes or os.PathLike object, not AsStrWrapper
```
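As far as I can tell, .asstr() returns a read-side wrapper (AsStrWrapper), not a Dataset, so VirtualSource falls back to treating its first argument as a file path. VirtualSource also accepts a Dataset object directly, taking the file name, dataset name, shape and dtype from it. A sketch along those lines, assuming HDF5 accepts variable-length strings in a VDS (not verified against HDF5 1.12.1), using h5py.string_dtype() for the layout and leaving out fillvalue=-1, since an integer fill value probably has no string equivalent:

```python
import h5py

vlen_str = h5py.string_dtype(encoding='utf-8')

# Source file: variable-length UTF-8 strings, as in the example above
with h5py.File('teststrings.h5', 'w', libver='latest') as f1:
    f1.create_dataset('text', data=['hello', 'worlds'], dtype=vlen_str)

with h5py.File('teststrings.h5', 'r') as f1:
    print(type(f1['text'].asstr()).__name__)  # the wrapper, not a Dataset
    # Pass the Dataset itself; VirtualSource reads path, name, shape, dtype from it
    vsource = h5py.VirtualSource(f1['text'])

# Layout uses the same variable-length string dtype as the source
layout = h5py.VirtualLayout(shape=(2,), dtype=vlen_str)
layout[:] = vsource

# No fillvalue given (an integer like -1 is not a valid string fill value)
with h5py.File('VDStest.h5', 'w', libver='latest') as f3:
    f3.create_virtual_dataset('text', layout)

# .asstr() belongs on the read side, when pulling data out of the VDS
with h5py.File('VDStest.h5', 'r') as f3:
    values = list(f3['text'].asstr()[:])
print(values)
```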
I’ve tried a number of permutations of the above while reading the forums, the documentation, and Stack Exchange, but something isn’t clicking for me. I’m not sure whether I’m just getting caught up in the complicated world of string representations or whether this is a bug in VDS creation. Any input welcome!
If it matters, I am working in a recently created conda environment:
```
>>> print(h5py.version.info)
Summary of the h5py configuration
---------------------------------
h5py 3.9.0
HDF5 1.12.1
Python 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 18:10:28) [Clang 15.0.7 ]
sys.platform darwin
sys.maxsize 9223372036854775807
numpy 1.24.4
cython (built with) 0.29.36
numpy (built against) 1.23.5
HDF5 (built against) 1.12.1
```
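On the other option raised at the top (storing the strings differently in the original file), fixed-length byte strings avoid variable-length types altogether. A sketch, assuming every entry fits in the computed width and that decoding to str can happen after reading; it reuses the file names from the example above and overwrites them:

```python
import h5py
import numpy as np

strings = ['hello', 'worlds']
encoded = [s.encode('utf-8') for s in strings]
width = max(len(b) for b in encoded)      # longest entry in bytes (6 here)
fixed = np.dtype(f'S{width}')             # fixed-length byte-string dtype

with h5py.File('teststrings.h5', 'w', libver='latest') as f1:
    f1.create_dataset('text', data=np.array(encoded, dtype=fixed))

# Layout and source share the same fixed-length dtype
layout = h5py.VirtualLayout(shape=(2,), dtype=fixed)
vsource = h5py.VirtualSource('teststrings.h5', 'text', shape=(2,), dtype=fixed)
layout[:2] = vsource

with h5py.File('VDStest.h5', 'w', libver='latest') as f3:
    # fillvalue omitted: an integer like -1 is not a byte string
    f3.create_virtual_dataset('text', layout)

# Reading returns bytes; decode them back to str
with h5py.File('VDStest.h5', 'r') as f3:
    decoded = [b.decode('utf-8') for b in f3['text'][:]]
print(decoded)
```

The trade-off is that the width is fixed when the source file is written, so longer strings added later would need a wider dtype.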