I'm not sure whether this is a bug, a feature request, or simply misuse on my part, hence this post.
We use a lot of HDF5 files with h5py, as well as other binary files that we read with numpy.memmap. Until now we were stuck on h5py 2.8, but I am looking at upgrading to 3. However, I have run into the new behavior when reading string datasets, which now return bytes. As our workflow specifies that we always stick to UTF-8, I'd like to always decode the strings, so I want to use the new AsStrWrapper. However, I am not sure how to use Dataset.asstr() properly, as it only affects __getitem__.
Currently, we have one simple load action valid for both memmap and Dataset:
data = numpy.array(unloaded_data)
where unloaded_data is either a numpy.memmap or an h5py.Dataset. However, the AsStrWrapper does not implement __array__, so this still leaves me with bytes.
Why does the wrapper not implement __array__? Is this a design choice, or is it not possible? Is our approach actually wrong, and should we always go through __getitem__ instead?
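If indexing is indeed the intended route, I suppose our loader would have to become something like the sketch below. This is only my guess at a workaround, not something I found in the docs as the recommended pattern. The name load is made up for illustration. It assumes h5py 3, that every string dataset is UTF-8, and that the whole dataset fits in memory.

```python
import h5py
import numpy as np


def load(unloaded_data):
    """Read a numpy.memmap or an h5py.Dataset fully into memory.

    HDF5 string datasets are decoded to str (assumption: always UTF-8);
    everything else goes through numpy.array() as before.
    """
    if isinstance(unloaded_data, h5py.Dataset):
        # check_string_dtype() returns None for non-string dtypes.
        if h5py.check_string_dtype(unloaded_data.dtype) is not None:
            # asstr() only decodes on indexing, so index the whole dataset.
            return unloaded_data.asstr(encoding="utf-8")[()]
    return np.array(unloaded_data)


# Usage with a throwaway in-memory file (nothing is written to disk).
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    names = f.create_dataset(
        "names",
        data=np.array(["alpha", "beta"], dtype=object),
        dtype=h5py.string_dtype(encoding="utf-8"),
    )
    decoded = load(names)  # elements are str, not bytes
```

This works, but it means the loader has to know about h5py, which is exactly what the single numpy.array() call let us avoid.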
Thanks in advance.