H5Dchunk_iter is a great addition to the API and really helps make the chunked data within an HDF5 file available via other schemes. For example, an HDF5 file could serve as a Zarr shard by appending a dataset containing the linear offset (addr) and the number of bytes (nbytes) of each chunk. H5Dchunk_iter appears to be the fastest way to obtain this information.
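To make the shard idea a bit more concrete, here is a minimal sketch in C of packing per-chunk (addr, nbytes) pairs into a flat index. It makes no HDF5 calls. The `chunk_entry` type and `encode_chunk_index` function are hypothetical names of mine, and the layout (little-endian 64-bit offset followed by little-endian 64-bit length, one pair per chunk) is an assumption loosely modeled on a Zarr-style shard index; any checksum and the exact placement of the index are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one chunk, as a chunk iterator might report it. */
typedef struct {
    uint64_t addr;   /* byte offset of the chunk within the file */
    uint64_t nbytes; /* stored size of the chunk in bytes */
} chunk_entry;

/* Store v at dst as 8 bytes, least significant byte first. */
static void put_u64_le(uint8_t *dst, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)(v >> (8 * i));
}

/* Encode n entries as consecutive (addr, nbytes) pairs, 16 bytes each.
 * Returns the number of bytes written, or 0 if `out` is too small
 * (0 is also returned when n is 0, since nothing is written). */
size_t encode_chunk_index(const chunk_entry *entries, size_t n,
                          uint8_t *out, size_t cap)
{
    assert(n == 0 || (entries != NULL && out != NULL));
    if (n > SIZE_MAX / 16 || cap < n * 16)
        return 0;
    for (size_t i = 0; i < n; i++) {
        put_u64_le(out + 16 * i, entries[i].addr);
        put_u64_le(out + 16 * i + 8, entries[i].nbytes);
    }
    return n * 16;
}
```

The resulting buffer could then be written as the contents of the appended dataset, so a reader that understands the index can locate each chunk without walking the HDF5 chunk index.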
Another opportunity here would be to have HDF5 allocate the space for the chunks early, obtain the chunk addresses efficiently via H5Dchunk_iter, and then use parallel I/O or memory mapping to fill in the data for the chunks at the specified addresses.
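As a rough sketch of the fill step, assuming the chunk addresses and sizes have already been collected, the chunks are unfiltered, and the file is not open through HDF5 while this runs, each chunk can be written with a positional write. The `chunk_slot` type and both functions below are hypothetical helpers, not part of HDF5; only POSIX `open`, `pwrite`, and `close` are used.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record: where a preallocated chunk lives in the file. */
typedef struct {
    uint64_t addr;   /* byte offset of the chunk within the file */
    uint64_t nbytes; /* number of bytes reserved for the chunk */
} chunk_slot;

/* Write slot->nbytes bytes from buf at slot->addr, retrying on short writes.
 * pwrite does not use the shared file position, so calls for different
 * slots do not interfere with each other. Returns 0 on success, -1 on error. */
int fill_chunk(int fd, const chunk_slot *slot, const void *buf)
{
    assert(slot != NULL && buf != NULL);
    const unsigned char *p = buf;
    uint64_t done = 0;
    while (done < slot->nbytes) {
        ssize_t n = pwrite(fd, p + done, (size_t)(slot->nbytes - done),
                           (off_t)(slot->addr + done));
        if (n <= 0)
            return -1;
        done += (uint64_t)n;
    }
    return 0;
}

/* Open an existing file and fill every slot from the matching buffer.
 * Returns 0 on success, -1 on the first failure. */
int fill_chunks(const char *path, const chunk_slot *slots,
                const void *const *bufs, size_t n)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    int rc = 0;
    for (size_t i = 0; i < n && rc == 0; i++)
        rc = fill_chunk(fd, &slots[i], bufs[i]);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

The loop here is sequential, but because every write targets its own byte range, the iterations could be handed to separate threads or processes.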
As far as I can tell, the architecture for H5Dchunk_iter exists in 1.12. For example, the internal API call H5Ddebug appears to iterate through the B-tree to print this information for h5ls.
Would it be possible for this API feature to be backported to the 1.12 branch in order to increase its availability? If so, I would be interested in contributing to that effort.