Backporting H5Dchunk_iter to 1.12 and 1.10?


#1

H5Dchunk_iter is a great addition to the API and really helps to make the chunked data within an HDF5 file available via other schemes. For example, an HDF5 file could be a Zarr shard by appending a dataset containing the linear offsets (addr) and the number of bytes (nbytes). H5Dchunk_iter appears to be the fastest way to obtain this information.
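To make the idea concrete, here is a minimal sketch of how a reader could use such an (addr, nbytes) index to fetch a chunk without going through the HDF5 library. It uses only the Python standard library. The `chunk_index` dictionary, the `read_chunk` helper, and the stand-in file layout are all hypothetical; in practice the index values would be collected by a callback passed to H5Dchunk_iter, and the returned bytes would still need any filters (e.g. compression) undone.

```python
import os
import tempfile


def read_chunk(path, addr, nbytes):
    """Return the raw (possibly still filtered) bytes of one chunk."""
    with open(path, "rb") as f:
        f.seek(addr)
        data = f.read(nbytes)
    if len(data) != nbytes:
        raise ValueError("chunk extends past the end of the file")
    return data


# Stand-in for an HDF5 file (assumption for illustration only):
# 16 bytes of "metadata" followed by two 8-byte chunks.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"M" * 16 + b"chunk-00" + b"chunk-01")
    demo_path = tmp.name

# Hypothetical index: chunk offset within the dataset -> (addr, nbytes).
chunk_index = {(0, 0): (16, 8), (0, 4): (24, 8)}

first_chunk = read_chunk(demo_path, *chunk_index[(0, 0)])
second_chunk = read_chunk(demo_path, *chunk_index[(0, 4)])

os.remove(demo_path)
```

Storing the index keyed by chunk offset mirrors what a Zarr-style shard index needs: a lookup from chunk coordinates to a byte range.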

Another opportunity here would be to have HDF5 allocate the space for the chunks early, obtain the chunk addresses efficiently via H5Dchunk_iter, and then use parallel I/O or memory mapping to fill in the data for the chunks at the specified addresses.
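The fill step could look roughly like the sketch below, which writes payloads into pre-allocated regions through a memory map. It assumes the chunks are unfiltered, that their space was already allocated (for example with H5Pset_alloc_time and H5D_ALLOC_TIME_EARLY), and that the file is not open through the HDF5 library while it is being modified. The `fill_chunks` helper, the addresses, and the stand-in file are hypothetical and use only the Python standard library.

```python
import mmap
import os
import tempfile


def fill_chunks(path, chunk_locations, payloads):
    """Write each payload into its pre-allocated (addr, nbytes) region."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        for (addr, nbytes), payload in zip(chunk_locations, payloads):
            if len(payload) != nbytes:
                raise ValueError("payload must match the allocated chunk size")
            mm[addr:addr + nbytes] = payload
        mm.flush()  # push the modified pages back to the file


# Stand-in for a file with early-allocated chunks (assumption for
# illustration only): 16 bytes of "metadata", then two zeroed 8-byte chunks.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"M" * 16 + b"\x00" * 16)
    fill_path = tmp.name

# Hypothetical addresses, as they might be reported by H5Dchunk_iter.
locations = [(16, 8), (24, 8)]
fill_chunks(fill_path, locations, [b"AAAAAAAA", b"BBBBBBBB"])

with open(fill_path, "rb") as f:
    filled = f.read()

os.remove(fill_path)
```

Because each chunk occupies a disjoint byte range, separate workers could fill different chunks independently, which is what makes the parallel I/O variant attractive.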

As far as I can tell, the architecture for H5Dchunk_iter exists in 1.12. For example, the internal API call H5Ddebug appears to iterate through the B-tree to print this information for h5ls.

Would it be possible for this API feature to be backported to the 1.12 branch in order to increase its availability? If so, I would be interested in contributing to that effort.


#2

Hey @kittisopikulm, thanks for offering to contribute. I’ll let @derobins chime in, but looking at the RELEASE SCHEDULE at https://github.com/HDFGroup/hdf5 , HDF5 1.12 will have one more release and then be superseded by 1.14. In other words, I’m not sure if it’d be worth the effort. There might be value in backporting it to HDF5 1.10, which has plenty of “release life” remaining.

Best, G.


#3

Gerd is correct. 1.12 has a limited lifespan due to the incompatible VOL interface. We will do one more maintenance release early next year and then that will be the end of the line for that branch. At that point, the 1.14.0 release will be out, so there’d be little point in updating 1.12.

I’ll consider it, if time permits, but 1.14 features will take precedence.


#4

Thank you. This is very insightful. I did not quite appreciate that 1.12 was coming to an end so soon.


#5

You can see our release timeline right in the README.md for the HDF5 repo on GitHub. 1.8 and 1.12 will get their final releases at the end of this year or early next year.


#6

I’ve attempted a backport to HDF5 1.10 in

Along the way, I noticed that there was an attempt by @lrknox to incorporate H5Dchunk_iter into 1.12.1, which was reverted due to a failure in testflushrefresh.sh:

That seems related to the concern raised by @nfortne2 here regarding the use of H5D__chunk_flush_entry.


#7

We can add the call to 1.10, but it needs to go to 1.12 first and we should address the issue in #1419 in develop before moving the feature downstream. Other than that, I have no objections.


#8

I’m getting mixed messages. But, yes, let’s address #1419 first.


#9

Yes, 1.12 will be retired at the end of the year, but we also strive to keep the maintenance branches as supersets of each other. It’s confusing to have features randomly implemented across maintenance branches.


#10

I’ve addressed the issue in #1419 via


#11

Draft pull request for 1.12 branch created by reverting #733:

ctest passes locally for me:

100% tests passed, 0 tests failed out of 2101