When I create a virtual dataset (VDS), the chunk shape doesn't seem to be available on the resulting dataset, and I can't find how this is handled in the documentation.
I'm using h5py, but I think this is a VDS issue rather than an h5py issue:
```python
import h5py
import numpy as np

layout = h5py.VirtualLayout((1000, 16, 512, 128), dtype=np.uint32)
for i in range(16):
    arr = np.full((1000, 512, 128), i, dtype=np.uint32)
    with h5py.File(f'{i}.h5', 'w') as f:
        # each source file is stored with chunks of (1, 512, 128)
        ds = f.create_dataset('a', data=arr, chunks=(1, 512, 128))
        layout[:, i] = h5py.VirtualSource(ds)

with h5py.File('vds.h5', 'w') as f:
    f.create_virtual_dataset('a', layout)

f = h5py.File('vds.h5', 'r')
ds = f['a']
ds.chunks  # returns None for the virtual dataset
```
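One way to recover the chunking is to look at the source datasets: a VDS has a virtual layout rather than a chunked one, which is why `ds.chunks` returns `None`, while each source keeps its own chunk shape. The sketch below uses h5py's `Dataset.is_virtual` and `Dataset.virtual_sources()`, then opens each source file and reads its `.chunks`. `source_chunks` is a hypothetical helper, and it assumes relative source file names are relative to the VDS file's directory (HDF5 itself can also resolve them through the `HDF5_VDS_PREFIX` environment variable). The demo is a shrunken version of the example above.

```python
import os
import tempfile

import h5py
import numpy as np


def source_chunks(ds):
    """Hypothetical helper: list (file_name, dset_name, chunks) for each
    source of a virtual dataset. Assumes relative source file names are
    relative to the directory of the VDS file."""
    if not ds.is_virtual:
        # Ordinary dataset: .chunks is a tuple, or None if contiguous.
        return [(ds.file.filename, ds.name, ds.chunks)]
    vds_dir = os.path.dirname(os.path.abspath(ds.file.filename))
    result = []
    for vmap in ds.virtual_sources():
        if vmap.file_name == '.':
            # '.' means the source lives in the same file as the VDS.
            chunks = ds.file[vmap.dset_name].chunks
        else:
            path = os.path.join(vds_dir, vmap.file_name)
            with h5py.File(path, 'r') as src_file:
                chunks = src_file[vmap.dset_name].chunks
        result.append((vmap.file_name, vmap.dset_name, chunks))
    return result


# Small version of the example above, built in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    layout = h5py.VirtualLayout((10, 4, 8, 6), dtype=np.uint32)
    for i in range(4):
        with h5py.File(os.path.join(tmp, f'{i}.h5'), 'w') as f:
            src = f.create_dataset('a', shape=(10, 8, 6), dtype=np.uint32,
                                   chunks=(1, 8, 6))
            layout[:, i] = h5py.VirtualSource(src)
    with h5py.File(os.path.join(tmp, 'vds.h5'), 'w') as f:
        f.create_virtual_dataset('a', layout)
    with h5py.File(os.path.join(tmp, 'vds.h5'), 'r') as f:
        vds_chunks = f['a'].chunks  # None: the layout is virtual, not chunked
        found = source_chunks(f['a'])

print(vds_chunks)  # None
for entry in found:
    print(entry)
```

Sources in a VDS are not required to share a chunk shape, so returning one entry per source (rather than a single tuple) keeps that information visible.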
When opening lazily with packages such as dask, where you need to specify the chunk size, I would normally read the chunk sizes directly from `ds.chunks`, but I can't see an easy way to determine the chunks for a VDS.
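For dask, a source chunk shape still has to be expressed in the virtual dataset's axis order, since in this example each 3-D source fills a 4-D block of the VDS. A minimal sketch, assuming the block a source fills can be computed as `end - start + 1` from the inclusive bounds returned by `vmap.vspace.get_select_bounds()` for each entry of `ds.virtual_sources()`, and assuming (as a heuristic) that when a source has fewer dimensions than the VDS, the extra virtual axes are the ones with extent 1. `map_chunks_to_virtual` is a hypothetical helper, not part of h5py.

```python
def map_chunks_to_virtual(block, src_chunks):
    """Hypothetical helper: express a source chunk shape in VDS axis order.

    block:      extent of the region one source fills in the VDS.
    src_chunks: the source dataset's .chunks tuple.
    Heuristic (assumption): extra virtual axes are those with extent 1,
    and the remaining axes match the source axes in order.
    """
    n_extra = len(block) - len(src_chunks)
    if n_extra < 0 or sum(1 for b in block if b == 1) < n_extra:
        raise ValueError("cannot align source chunks with the virtual block")
    result = []
    src_iter = iter(src_chunks)
    for b in block:
        if b == 1 and n_extra > 0:
            # A virtual axis the source does not have: one element wide.
            result.append(1)
            n_extra -= 1
        else:
            # Clip so a chunk never exceeds the block the source fills.
            result.append(min(next(src_iter), b))
    return tuple(result)


# Numbers from the example above: each source fills a (1000, 1, 512, 128)
# block of the VDS and is stored with chunks=(1, 512, 128).
print(map_chunks_to_virtual((1000, 1, 512, 128), (1, 512, 128)))
# (1, 1, 512, 128)
```

The resulting tuple could be passed as `chunks=` to `dask.array.from_array(ds, chunks=...)`. A (1, 1, 512, 128) `uint32` chunk is only 256 KiB, so grouping several storage chunks into one dask chunk (for example along axis 0) is usually worth considering.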
Is this a bug, or is there a way to get the chunking of a VDS?