Working around Numpy's Unicode type

daniel.uzodimma · November 20, 2024, 8:29pm

Hello,

I have a list of variables of different types, including long strings, and I want to write this list to an HDF5 file as a dataset. To my understanding, h5py converts the long strings to NumPy ‘<U’ type values and then throws an error when trying to write them to the file. After searching the internet, I could only find fixes for lists with a single datatype, so I tried splitting the list and appending each item to the same dataset, but to no avail. Is there a way to work around this?

A simplified version of my code:

The error it throws:
error

Some other useful info:

hyoklee · November 21, 2024, 10:37pm

Hi, @daniel.uzodimma !

I modified your code and could run it successfully on Google Colab.

import h5py
# All items are strings; every backslash in the Windows paths is escaped.
some_list = ['C13', '6', 'Not Available', 'E:\\NominationDesk\\Thoughputs\\Temp\\ewa21\\621863404.a21', '23', 'No',
'C:\\Model_67\\PlantModelaComposite_C19\\MRET_T4F_C175_20_SCRAirAssist\\Stimulation.Current\\mret_c175_1_scrairassist.sdf']
# Write the list as a single dataset.
with h5py.File('Test.hdf5', "w") as f:
    f.create_dataset('list', data=some_list)
# Read the dataset back and print its contents.
with h5py.File('Test.hdf5', 'r') as hdfid:
    print(hdfid.keys())
    print(hdfid['list'][()])

Here’s the output:

<KeysViewHDF5 ['list']>
[b'C13' b'6' b'Not Available'
 b'E:\\NominationDesk\\Thoughputs\\Temp\\ewa21\\621863404.a21' b'23' b'No'
 b'C:\\Model_67\\PlantModelaComposite_C19\\MRET_T4F_C175_20_SCRAirAssist\\Stimulation.Current\\mret_c175_1_scrairassist.sdf']

Is there a reason that you cannot use the same type?
That is, using ‘6’ instead of 6 and ‘23’ instead of 23.
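
A minimal sketch of that same-type option, assuming h5py 3.x (needed for `asstr()`). The list `mixed`, the dataset name, and the temporary file path are made up for illustration and are not taken from the original code:

```python
import os
import tempfile

import h5py

# Hypothetical mixed-type list standing in for the one in the question.
mixed = ['C13', 6, 'Not Available', 23, 'No']

# Convert every item to str so the list has one uniform type.
as_text = [str(item) for item in mixed]

# Write to a throwaway file so the sketch leaves nothing behind in the working directory.
path = os.path.join(tempfile.mkdtemp(), 'same_type.hdf5')
with h5py.File(path, 'w') as f:
    # Ask for variable-length UTF-8 strings explicitly instead of relying on dtype guessing.
    f.create_dataset('list', data=as_text, dtype=h5py.string_dtype(encoding='utf-8'))

with h5py.File(path, 'r') as f:
    # asstr() decodes the stored bytes back into Python str objects.
    restored = f['list'].asstr()[()].tolist()

print(restored)
```

The numbers come back as text, so they would need an explicit `int(...)` conversion after reading if the numeric values matter.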

If you do need to keep the original types, how about splitting the list by type and writing the data to two different datasets?
For example, 6 and 23 go to an integer dataset dset1, and the strings go to a string dataset dset2.
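
The split could look roughly like the sketch below. It assumes h5py 3.x and that the list holds only integers and strings; `mixed` and the temporary file path are hypothetical, while `dset1` and `dset2` follow the names used above:

```python
import os
import tempfile

import h5py

# Hypothetical mixed-type list standing in for the one in the question.
mixed = ['C13', 6, 'Not Available', 23, 'No']

# Split by type. bool is a subclass of int, so it is excluded explicitly.
ints = [x for x in mixed if isinstance(x, int) and not isinstance(x, bool)]
strs = [x for x in mixed if isinstance(x, str)]

path = os.path.join(tempfile.mkdtemp(), 'split_by_type.hdf5')
with h5py.File(path, 'w') as f:
    # Integers go to dset1 with a fixed numeric dtype.
    f.create_dataset('dset1', data=ints, dtype='int64')
    # Strings go to dset2 as variable-length UTF-8 strings.
    f.create_dataset('dset2', data=strs, dtype=h5py.string_dtype(encoding='utf-8'))

with h5py.File(path, 'r') as f:
    ints_back = f['dset1'][()].tolist()
    strs_back = f['dset2'].asstr()[()].tolist()

print(ints_back)
print(strs_back)
```

Splitting this way drops the original ordering of the items. If the order matters, the positions would have to be stored as well, for example in a third dataset.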
