I’m trying to analyze the effect of compression on our datasets’ disk usage. h5dump can tell me the size of a dataset on disk. However, h5dump is a bit of a blunt instrument; I’d like to be able to compare the sizes of corresponding objects in different files, among other things. Is there a way to get the size information programmatically, in particular from within Python?
H5Dget_storage_size would be the HDF5 library function. I believe it’s exposed in the low-level h5py API. Something like this should work:
dsid = dset.id                        # dset is a high-level h5py.Dataset
size_bytes = dsid.get_storage_size()  # bytes actually allocated on disk
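If the end goal is a compression ratio, one approach (just a sketch, not the only way) is to compare the allocated storage size against the dataset’s logical, uncompressed size. The file name data.h5 and dataset path /mydata below are placeholders:

import h5py

with h5py.File("data.h5", "r") as f:           # "data.h5" / "/mydata" are hypothetical
    dset = f["/mydata"]
    stored = dset.id.get_storage_size()         # bytes allocated on disk
    logical = dset.size * dset.dtype.itemsize   # element count * bytes per element
    ratio = logical / stored if stored else float("inf")
    print(f"{stored} bytes on disk, {logical} bytes logical, ratio {ratio:.2f}")

Note that get_storage_size() returns 0 for a dataset with no allocated storage (e.g., nothing written yet), which is why the ratio is guarded above.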
G.
Thanks! I’ll give that a try.
