Determine chunk size from a VDS dataset


#1

When creating a VDS dataset, the chunk size doesn’t seem to be available on the created dataset, and I
can’t see how this is handled in the documentation.

I’m using h5py, but I think this is a VDS issue rather than an h5py issue:

import h5py
import numpy as np
layout = h5py.VirtualLayout((1000, 16, 512, 128), dtype=np.uint32)

for i in range(16):
    arr = np.full((1000, 512, 128), i, dtype=np.uint32)
    with h5py.File(f'{i}.h5', 'w') as f:
        ds = f.create_dataset('a', data=arr, chunks=(1, 512, 128))
        layout[:, i] = h5py.VirtualSource(ds)

with h5py.File('vds.h5', 'w') as f:
    f.create_virtual_dataset('a', layout)
f = h5py.File('vds.h5', 'r')
ds = f['a']
ds.chunks  # returns None for the virtual dataset

When lazily opening data with packages such as dask, you need to specify the chunk size.
I would typically read the chunk sizes directly from ds.chunks to set this, but I can’t see an easy way to determine the chunks for VDS datasets.

Is this a bug, or is there a way to get the chunking of a VDS dataset?


#2

Solved.
The approach is to get the virtual sources, check their chunks (which should all be the same), and take the chunks from one of them.

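import h5py

if d.is_virtual:
    # virtual_sources() returns (vspace, file_name, dset_name, src_space) tuples,
    # so open the first source dataset and read its chunks.
    src = d.virtual_sources()[0]
    with h5py.File(src.file_name, 'r') as sf:
        chunks = sf[src.dset_name].chunks
else:
    chunks = d.chunks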
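
For anyone finding this later, here is a slightly fuller sketch of the same idea as a reusable function. The name source_chunks is just something I made up for illustration. It assumes all sources share one chunk shape (as in the example in #1). It also assumes that a relative source file name should be resolved against the directory of the VDS file, and that a source name of '.' means the source lives in the same file as the VDS.

```python
import os
import h5py


def source_chunks(dset):
    """Return the chunk shape of the first source dataset behind a VDS.

    For an ordinary (non-virtual) dataset this is just dset.chunks.
    Assumes every source uses the same chunk shape.
    """
    if not dset.is_virtual:
        return dset.chunks

    # Each entry is a VDSmap(vspace, file_name, dset_name, src_space) tuple.
    src = dset.virtual_sources()[0]

    # Assumption: '.' refers to the file that holds the virtual dataset itself.
    if src.file_name == '.':
        return dset.file[src.dset_name].chunks

    # Assumption: relative source paths are relative to the VDS file's directory.
    path = src.file_name
    if not os.path.isabs(path):
        vds_dir = os.path.dirname(os.path.abspath(dset.file.filename))
        path = os.path.join(vds_dir, path)

    with h5py.File(path, 'r') as sf:
        return sf[src.dset_name].chunks
```

Note that this returns the chunk shape of the source, which can have fewer dimensions than the virtual dataset. With the layout from #1, each source is chunked as (1, 512, 128) and is mapped into layout[:, i], so the matching chunk shape for the 4D virtual dataset would be (1, 1, 512, 128).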