Backporting H5Dchunk_iter to 1.12 and 1.10?

kittisopikulm · July 15, 2022, 4:57am

H5Dchunk_iter is a great addition to the API and really helps to make the chunked data within an HDF5 file available via other schemes. For example, an HDF5 file could serve as a Zarr shard by appending a dataset containing the linear offsets (addr) and the number of bytes (nbytes). H5Dchunk_iter appears to be the fastest way to obtain this information.
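To illustrate why an (addr, nbytes) index is useful, here is a minimal sketch in Python that reads raw chunk bytes with plain file I/O once such an index is known. It does not call HDF5 at all: the file is a stand-in binary file, and `build_demo_file` and `read_chunk` are hypothetical helper names. In practice the index entries would come from an H5Dchunk_iter callback, and the bytes returned would still be filtered (for example compressed) if the dataset uses filters.

```python
import os
import tempfile


def read_chunk(path, addr, nbytes):
    """Read one raw chunk given its byte offset (addr) and length (nbytes)."""
    with open(path, "rb") as f:
        f.seek(addr)
        data = f.read(nbytes)
    if len(data) != nbytes:
        raise ValueError("index entry extends past the end of the file")
    return data


def build_demo_file(path):
    """Write a stand-in file (NOT a real HDF5 file): 16 header bytes, then two chunks.

    Returns a hypothetical chunk index as a list of (addr, nbytes) tuples,
    mimicking what a chunk iteration callback could collect.
    """
    chunks = [b"AAAAAAAA", b"BBBB"]
    index = []
    with open(path, "wb") as f:
        f.write(b"\x00" * 16)  # placeholder for file metadata
        for chunk in chunks:
            index.append((f.tell(), len(chunk)))
            f.write(chunk)
    return index


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo.bin")
        for addr, nbytes in build_demo_file(demo_path):
            print(addr, nbytes, read_chunk(demo_path, addr, nbytes))
```

Because each read needs only an offset and a length, any reader that can seek in the file (or issue a ranged request against object storage) could consume the chunks without linking against the HDF5 library.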

Another opportunity here would be to have HDF5 allocate the space for the chunks early, obtain the chunk addresses efficiently via H5Dchunk_iter, and then use parallel I/O or memory mapping to fill in the data for the chunks at the specified addresses.
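The memory-mapping half of that idea can be sketched with Python's standard `mmap` module. This is only an illustration under stated assumptions: the file below is a zero-filled stand-in rather than a real HDF5 file, the (addr, nbytes) pairs are made up where a real workflow would take them from H5Dchunk_iter after early allocation, and the chunks are assumed to be unfiltered so their on-disk size is fixed. `preallocate` and `fill_chunks` are hypothetical helper names.

```python
import mmap
import os
import tempfile


def preallocate(path, size):
    """Create a zero-filled stand-in file of `size` bytes (NOT a real HDF5 file)."""
    with open(path, "wb") as f:
        f.write(b"\x00" * size)


def fill_chunks(path, index, payloads):
    """Write each payload into the file at the address given by the index.

    `index` is a list of (addr, nbytes) tuples and `payloads` is a list of
    bytes objects; each payload must exactly match its chunk's size.
    """
    with open(path, "r+b") as f:
        # Length 0 maps the whole (non-empty) file.
        with mmap.mmap(f.fileno(), 0) as mm:
            for (addr, nbytes), payload in zip(index, payloads):
                if len(payload) != nbytes:
                    raise ValueError("payload size does not match chunk size")
                mm[addr:addr + nbytes] = payload
            mm.flush()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo.bin")
        preallocate(demo_path, 32)
        fill_chunks(demo_path, [(16, 8), (24, 4)], [b"AAAAAAAA", b"BBBB"])
        with open(demo_path, "rb") as f:
            print(f.read())
```

Since the chunk regions do not overlap, separate workers could in principle each map the file and fill a disjoint subset of the index, which is the parallel variant of the same approach.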

As far as I can tell, the architecture for H5Dchunk_iter exists in 1.12. For example, the internal API call H5Ddebug appears to iterate through the B-tree to print this information for h5ls.

Would it be possible for this API feature to be backported to the 1.12 branch in order to increase its availability? If so, I would be interested in contributing to that effort.

gheber · July 15, 2022, 11:57am

Hey @kittisopikulm, thanks for offering to contribute. I’ll let @derobins chime in, but looking at the RELEASE SCHEDULE at https://github.com/HDFGroup/hdf5, HDF5 1.12 will have one more release and then be superseded by 1.14. In other words, I’m not sure if it’d be worth the effort. There might be value in backporting it to HDF5 1.10, which has plenty of “release life” remaining.

Best, G.

derobins · July 15, 2022, 3:56pm

Gerd is correct. 1.12 has a limited lifespan due to the incompatible VOL interface. We will do one more maintenance release early next year and then that will be the end of the line for that branch. At that point, the 1.14.0 release will be out, so there’d be little point in updating 1.12.

I’ll consider it, if time permits, but 1.14 features will take precedence.

kittisopikulm · July 15, 2022, 4:10pm

Thank you. This is very insightful. I did not quite appreciate that 1.12 was coming to an end so soon.

derobins · July 15, 2022, 6:13pm

You can see our release timeline right in the README.md for the HDF5 repo on GitHub. 1.8 and 1.12 will get their final releases at the end of this year or early next year.

kittisopikulm · August 4, 2022, 12:52am

I’ve attempted a backport to HDF5 1.10 in

github.com/HDFGroup/hdf5

Backport H5Dchunk_iter to 1.10 branch

hdf5_1_10 ← mkitti:mkitti/h5dchunk_iter_1_10

opened 12:44AM - 04 Aug 22 UTC

mkitti

+659 -1

This backports H5Dchunk_iter #6 to the 1.10 branch from the 1.13 development branch. xref: https://forum.hdfgroup.org/t/possibility-of-backporting-h5dchunk-iter-to-1-12/9971/2 To do: - [x] Backport tests from `test/chunk_iter.c` - [x] Backport #1969 to fix offsets that were not multiplied by the chunk dimensions (see #1419) - [x] Finish backport of `H5Dchunk_iter` to 1.12: #1970

Along the way, I noticed that there was an attempt by @lrknox to incorporate H5Dchunk_iter into 1.12.1, which was reverted due to a failure in testflushrefresh.sh:

github.com/HDFGroup/hdf5

Hdf5 1 12 1 - revert H5Dchunk_iter()

hdf5_1_12_1 ← lrknox:hdf5_1_12_1

opened 01:43PM - 20 May 21 UTC

lrknox

+369 -358

Test failures: testflushrefresh.sh - segfaulted in make check-passthrough-vol … chunk_info test with pgcc 19.10 and --enable-internal-debug-all on platypus had this error: Error: PGC-F-Subscript out of range for array chunk_infos (/home/hdftest/snapshots-hdf5_1_12_1/current/test/chunk_info.c: 1681) subscript=2, upper bound=1, dimension=1

That seems related to the concern raised by @nfortne2 here regarding the use of H5D__chunk_flush_entry.

derobins · August 4, 2022, 6:02pm

We can add the call to 1.10, but it needs to go to 1.12 first and we should address the issue in #1419 in develop before moving the feature downstream. Other than that, I have no objections.

kittisopikulm · August 4, 2022, 7:59pm

I’m getting mixed messages. But, yes, let’s address #1419 first.

derobins · August 4, 2022, 8:12pm

Yes, 1.12 will be retired at the end of the year, but we also strive to keep the maintenance branches as supersets of each other. It’s confusing to have features randomly implemented across maintenance branches.

kittisopikulm · August 5, 2022, 12:24am

I’ve addressed the issue in #1419 via

kittisopikulm · August 5, 2022, 3:49am

Draft pull request for 1.12 branch created by reverting #733:

ctest passes locally for me:

100% tests passed, 0 tests failed out of 2101

kittisopikulm · December 21, 2022, 2:28am

I’ve completed the backports through 1.12 and 1.10 for H5Dchunk_iter. Additionally, the arguments are now consistent with the types used by H5Dget_chunk_info.
