H5Dchunk_iter is a great addition to the API and really helps to make the chunked data within an HDF5 file available via other schemes. For example, an HDF5 file could serve as a Zarr shard by appending a dataset containing the linear offset (addr) and the number of bytes (nbytes) of each chunk. H5Dchunk_iter appears to be the fastest way to obtain this information.
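To make the idea of an (addr, nbytes) index concrete, here is a minimal sketch in Python using only the standard library. The chunk values are hypothetical placeholders for what an H5Dchunk_iter callback would report, and the layout (consecutive little-endian unsigned 64-bit pairs) is an assumption for illustration, not the full Zarr sharding format, which has further requirements not covered here.

```python
import struct

# Hypothetical (addr, nbytes) pairs, standing in for what an
# H5Dchunk_iter callback would report for a four-chunk dataset.
chunk_index = [(4096, 1024), (5120, 1024), (6144, 980), (7124, 1024)]


def pack_index(entries):
    """Serialize (addr, nbytes) pairs as consecutive little-endian uint64 values."""
    flat = [value for entry in entries for value in entry]
    return struct.pack(f"<{len(flat)}Q", *flat)


def unpack_index(blob):
    """Recover the list of (addr, nbytes) pairs from the packed bytes."""
    values = struct.unpack(f"<{len(blob) // 8}Q", blob)
    return list(zip(values[0::2], values[1::2]))


packed = pack_index(chunk_index)
restored = unpack_index(packed)
```

Each chunk costs 16 bytes in this layout, so the index stays small compared with the chunk data it describes.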
Another opportunity here would be to have HDF5 allocate the space for the chunks early, obtain the chunk addresses efficiently via H5Dchunk_iter, and then use parallel I/O or memory mapping to fill in the data for the chunks at the specified addresses.
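The fill-in step could look roughly like the following Python sketch, which uses only the standard library. It is an illustration under several assumptions: a plain scratch file stands in for an HDF5 file whose chunks were allocated early (for example with H5Pset_alloc_time and H5D_ALLOC_TIME_EARLY), the chunk index is a hypothetical list rather than real H5Dchunk_iter output, the chunks are unfiltered so the stored bytes are the raw data, and the addresses can be used directly as byte offsets into the file (no user block).

```python
import mmap
import os
import tempfile

CHUNK_NBYTES = 16

# Hypothetical (addr, nbytes) pairs; in practice these would come from
# iterating over the chunks of a dataset whose space is already allocated.
chunk_index = [(64, CHUNK_NBYTES), (80, CHUNK_NBYTES), (96, CHUNK_NBYTES)]


def fill_chunks(path, index, payloads):
    """Write each payload at its chunk address through a shared memory map."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:
            for (addr, nbytes), payload in zip(index, payloads):
                if len(payload) != nbytes:
                    raise ValueError("payload size must match the allocated chunk size")
                mm[addr:addr + nbytes] = payload
            mm.flush()


def read_chunk(path, addr, nbytes):
    """Read one chunk back with an ordinary seek and read."""
    with open(path, "rb") as f:
        f.seek(addr)
        return f.read(nbytes)


# Scratch file standing in for an HDF5 file with pre-allocated chunks.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.truncate(112)

payloads = [bytes([i + 1]) * CHUNK_NBYTES for i in range(len(chunk_index))]
fill_chunks(path, chunk_index, payloads)
first = read_chunk(path, *chunk_index[0])
last = read_chunk(path, *chunk_index[-1])
os.remove(path)
```

Because each chunk occupies its own byte range, separate processes could map the same file and fill disjoint chunks independently, which is where the parallel I/O benefit would come from.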
As far as I can tell, the architecture for H5Dchunk_iter exists in 1.12. For example, the internal API call H5Ddebug appears to iterate through the B-tree to print this information for h5ls.
Would it be possible for this API feature to be backported to the 1.12 branch in order to increase its availability? If so, I would be interested in contributing to that effort.
Hey @kittisopikulm, thanks for offering to contribute. I'll let @derobins chime in, but looking at the RELEASE SCHEDULE at https://github.com/HDFGroup/hdf5 , HDF5 1.12 will have one more release and then be superseded by 1.14. In other words, I'm not sure if it'd be worth the effort. There might be value in backporting it to HDF5 1.10, which has plenty of "release life" remaining.
Gerd is correct. 1.12 has a limited lifespan due to the incompatible VOL interface. We will do one more maintenance release early next year and then that will be the end of the line for that branch. At that point, the 1.14.0 release will be out, so there'd be little point in updating 1.12.
I'll consider it, if time permits, but 1.14 features will take precedence.
You can see our release timeline right in the README.md for the HDF5 repo on GitHub. 1.8 and 1.12 will get their final releases at the end of this year or early next year.
Along the way, I noticed that there was an attempt by @lrknox to incorporate H5Dchunk_iter into 1.12.1, which was reverted due to a failure in testflushrefresh.sh:
That seems related to the concern raised by @nfortne2 here regarding the use of H5D__chunk_flush_entry.
We can add the call to 1.10, but it needs to go to 1.12 first and we should address the issue in #1419 in develop before moving the feature downstream. Other than that, I have no objections.
Yes, 1.12 will be retired at the end of the year, but we also strive to keep the maintenance branches as supersets of each other. It's confusing to have features randomly implemented across maintenance branches.
I've completed the backports through 1.12 and 1.10 for H5Dchunk_iter. Additionally, the arguments are now consistent with the types used by H5Dget_chunk_info.