Writing byte strings to attributes using h5py


I have HDF5 files produced by some software, and I read them with the h5py package (which makes life much easier!). In these files, all string attributes appear to be written as ASCII data, so when I read them with h5py they come back as byte strings. Not a huge deal, as I can call string.decode('utf8') to get a Python string, so that’s what I’ve done everywhere I read the attributes.

However, I’m now at the stage where I want to edit some of these attributes in the HDF5 file with h5py before sending the files back to the original software, but I’m unable to write the attributes as byte strings. I’ve tried a few things:
file['data_group'].attrs['title'] = data.encode('utf-8')
file['data_group'].attrs['title'] = bytes(data.encode('utf-8'))

but both of these write a standard string, not a byte string, so the attribute is not read by the original software. Reading the file back with h5py, I also find that the attribute I’ve just changed is no longer a byte string, as I no longer need to decode it before use.

How should I write the string so that it matches the original files’ format?

numpy.bytes_(data) should do it.
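To make that suggestion concrete, here is a minimal sketch, not a definitive recipe. The file label "example.h5", the attribute name "title_plain" and the value of data are made up for illustration, and the file is kept in memory so nothing touches the disk. The sketch encodes the string explicitly before wrapping it in numpy.bytes_, which is an assumption on my part to keep non-ASCII handling explicit.

```python
import numpy as np
import h5py

data = "Example title"  # hypothetical value, not taken from the original files

# In-memory file: "example.h5" is only a label, nothing is written to disk.
with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
    grp = f.create_group("data_group")

    # Plain Python bytes: h5py chooses the HDF5 string type itself.
    grp.attrs["title_plain"] = data.encode("utf-8")

    # numpy.bytes_: a NumPy fixed-length byte string scalar.
    grp.attrs["title"] = np.bytes_(data.encode("utf-8"))

    # Read both attributes back and look at what was stored.
    value = grp.attrs["title"]
    plain_type = type(grp.attrs["title_plain"])
    fixed_dtype = grp.attrs.get_id("title").dtype
    fixed_is_vlen = grp.attrs.get_id("title").get_type().is_variable_str()

print(type(value), value.decode("utf-8"))
print(fixed_dtype, fixed_is_vlen, plain_type)
```

The attribute written through numpy.bytes_ should come back as a bytes-like value that still needs .decode(), like the attributes in the original files. What the plain bytes version comes back as may depend on your h5py version, so it is printed rather than assumed.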



@ajelenak Thanks. This appears to be what I was looking for.

One follow-up question. When I inspect the file structure using HDFView 3.3.1, I see that the attribute that was previously stored as

is now stored as:

I’m not quite sure what the difference between H5T_STR_NULLTERM and H5T_STR_NULLPAD is for the padding. Given that everything still seems to be working everywhere, is it even important?

In this case there are 26 bytes reserved for the bytes of your string. H5T_STR_NULLTERM means that a NULL byte follows the bytes of the string; H5T_STR_NULLPAD means that all the bytes after the bytes of the string are NULL. It’s a low-level detail, and since “everything seems to be working everywhere” it is not really important in your case.
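As a rough illustration of the two layouts described above, here is a pure-Python sketch that does not use HDF5 at all. The string b"My title", the "?" filler and the two reader functions are hypothetical and exist only to show how a 26-byte buffer could be interpreted under each convention.

```python
SIZE = 26            # bytes reserved for the string, as in the case above
text = b"My title"   # hypothetical string content

# NULLPAD: every byte after the string content is NUL.
nullpad = text.ljust(SIZE, b"\x00")

# NULLTERM: a single NUL ends the string. The bytes after it carry no
# meaning; they are filled with "?" here purely to make that visible.
nullterm = (text + b"\x00").ljust(SIZE, b"?")

def read_nullpad(buf: bytes) -> bytes:
    """Recover the string by dropping the trailing NUL padding."""
    return buf.rstrip(b"\x00")

def read_nullterm(buf: bytes) -> bytes:
    """Recover the string by stopping at the first NUL byte."""
    return buf.split(b"\x00", 1)[0]

print(nullpad)
print(nullterm)
print(read_nullpad(nullpad), read_nullterm(nullterm))
```

Both readers recover the same content, which is consistent with the two files behaving the same for ordinary strings.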

h5py stores fixed-length strings, which is what numpy.bytes_ produces, with H5T_STR_NULLPAD, and that’s why you are seeing this difference.
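If you want to confirm the padding without HDFView, h5py’s low-level API can report it. This is a sketch under a few assumptions: the attribute name "title", its value and the in-memory file label are made up, and PAD_NAMES is just a helper dictionary for printing.

```python
import numpy as np
import h5py
from h5py import h5t

# Helper mapping from the low-level constants to their HDF5 names.
PAD_NAMES = {
    h5t.STR_NULLTERM: "H5T_STR_NULLTERM",
    h5t.STR_NULLPAD: "H5T_STR_NULLPAD",
    h5t.STR_SPACEPAD: "H5T_STR_SPACEPAD",
}

# In-memory file: "example.h5" is only a label, nothing is written to disk.
with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
    f.attrs["title"] = np.bytes_(b"Example title")  # hypothetical attribute

    # Low-level HDF5 datatype of the attribute that was just written.
    tid = f.attrs.get_id("title").get_type()
    size = tid.get_size()    # number of bytes reserved for the string
    pad = tid.get_strpad()   # padding convention as an h5t constant

print(size, PAD_NAMES[pad])
```

The same get_id(...).get_type() calls on a file opened read-only should work for checking what the original software wrote.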

Take care,

OK, I understand now. Thanks again. Fortunately, I don’t think I’m ever going to be affected by this subtle difference.