Hello, dear HDF5 team.
I’m wondering if you could clarify whether h5py supports np.ma arrays (i.e., masked NumPy arrays). In other words, when I create a dataset, for example as
dataSet = groupName.create_dataset("dataSetName", data=np.ma.masked_array([1,2],mask=[True,False]))
will it store the data as an np.ma array or as a regular np array? The documentation does not specify that explicitly (or I don’t understand how to interpret the table in the referenced link).
If follow-up questions are allowed on this website: how do I retrieve a masked array that I previously saved to an .hdf5 file?
Thank you in advance.
Ivan
Hi,
Would you please try reporting an issue at the h5py GitHub repository (h5py/h5py: HDF5 for Python)?
I can’t claim to give an authoritative answer, but my understanding is that there is no mechanism to store masked arrays. I think it would only be implemented explicitly in h5py if HDF5 had a convention for storing dataset masks (which I doubt). I think you would have to implement your own convention by storing the field and the mask as separate datasets, with some way of connecting them in your own code, e.g., by specifying the path to the mask as a dataset attribute or by naming the mask dataset by adding “_mask” to the dataset name. We use the former in the nexusformat package, which reads and writes HDF5 files written using the NeXus standard.
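As a rough illustration of that convention in plain h5py, here is a minimal sketch rather than anything h5py provides itself. The helper names write_masked and read_masked, the attribute name "mask", and the "_mask" suffix are all assumptions made for this example; only create_dataset, attrs, and the numpy.ma calls are existing API.

```python
import os
import tempfile

import h5py
import numpy as np


def write_masked(group, name, marr):
    """Hypothetical helper: store values and mask as two linked datasets."""
    marr = np.ma.asarray(marr)
    # .data is the underlying array, including the values hidden by the mask.
    dset = group.create_dataset(name, data=marr.data)
    mask_name = name + "_mask"  # assumed naming convention
    # getmaskarray always returns a full boolean array, even if nothing is masked.
    group.create_dataset(mask_name, data=np.ma.getmaskarray(marr))
    # Record the name of the mask dataset as an attribute of the data dataset.
    dset.attrs["mask"] = mask_name
    return dset


def read_masked(group, name):
    """Hypothetical helper: rebuild the masked array written by write_masked."""
    dset = group[name]
    mask_name = dset.attrs.get("mask")
    if mask_name is None:
        # No mask was recorded, so return the values with nothing masked.
        return np.ma.masked_array(dset[()])
    return np.ma.masked_array(dset[()], mask=group[mask_name][()])


# Round trip through a temporary file.
path = os.path.join(tempfile.mkdtemp(), "masked_demo.h5")
with h5py.File(path, "w") as f:
    write_masked(f, "data", np.ma.masked_array([1, 2], mask=[True, False]))
with h5py.File(path, "r") as f:
    restored = read_masked(f, "data")
print(restored)
```

Storing the mask as its own boolean dataset keeps the file readable by tools that know nothing about this convention; they simply see two ordinary datasets and one string attribute.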
By the way, the nexusformat package can be used to read and write HDF5 files that do not conform to the NeXus standard. To implement your example, you could use the following code:
import numpy as np
from nexusformat.nexus import NXfield, nxopen

# Wrap the masked array in an NXfield and write it to a new file.
data = NXfield(np.ma.masked_array([1,2],mask=[True,False]))
with nxopen('mydata.h5', 'w') as root:
    root['data'] = data
    print(root.tree)
This produces an HDF5 file with the following tree:
root:NXroot
  data = [-- 2]
    @mask = 'data_mask'
  data_mask = [ True False]
If you read it again, the datasets ‘data’ and ‘data_mask’ are returned as a single NXfield, which just wraps the NumPy array with its attributes.
with nxopen('mydata.h5') as root:
    input_data = root['data']
The masked array is contained within the NXfield and is accessible as its nxvalue attribute.
masked_array(data=[--, 2],
             mask=[ True, False],
       fill_value=999999)