Merge 2 groups from the same h5 file

hamdaouihassane · December 17, 2021, 4:05pm

Is there a way to merge 2 groups from the same h5 file? The 2 groups have the same structure.
The output could be a group in the same file or in a new h5 file.

gheber · December 18, 2021, 1:24am

What do you mean by that? Groups are collections of (named) links. If you mean by “same structure” that the links have the same names in both groups, then, no, they can’t be merged without renaming some of the links, because link names must be unique in the scope of a group.

Assuming the link names are different, then yes, you can just create a new group and copy the links into the new group via `H5Lcopy`.
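As a rough illustration of the "new group plus links" idea, here is a minimal sketch in Python with h5py. It does not call `H5Lcopy` itself. It assumes that creating a hard link by assignment is an acceptable stand-in, which gives the same result only for hard links (soft and external links would be resolved rather than copied as links). The function name and the group names are made up for the example.

```python
import h5py


def link_into_new_group(f, src_group_names, dst_group_name):
    """Create dst_group_name and hard-link every member of the source groups into it.

    No dataset elements are copied; each new link points at the existing object.
    Raises ValueError when two source groups use the same link name.
    """
    dst = f.create_group(dst_group_name)
    for src_name in src_group_names:
        src = f[src_name]
        for link_name in src:
            if link_name in dst:
                raise ValueError(f"link name {link_name!r} occurs more than once")
            # Assigning an existing object creates a new hard link to it.
            dst[link_name] = src[link_name]
    return dst
```

With a file opened in a writable mode, `link_into_new_group(f, ["a", "b"], "merged")` would leave `/merged` holding links to everything under `/a` and `/b`.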

Best, G.

hamdaouihassane · December 20, 2021, 5:24pm

Thank you for your reply. I mean merging in such a way that each dataset under the first group is appended to the dataset with the same name in the second group, like when merging ROOT files.

contact · December 21, 2021, 1:03am

Hi @hamdaouihassane,

The merging you are describing is possible if certain conditions are met/respected, one of them being that the dataset of the second group is extendible (so that it can be extended to store the dataset of the first group).
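To make the extendibility condition concrete, here is a small sketch in Python with h5py. It assumes the datasets are merged along their first axis and that the destination was created chunked with a large enough (or unlimited) `maxshape`. The helper name and the `block_rows` parameter are hypothetical.

```python
import h5py


def append_dataset(dst, src, block_rows=100_000):
    """Append the rows of src to dst along axis 0, block by block."""
    old_len = dst.shape[0]
    new_len = old_len + src.shape[0]
    limit = dst.maxshape[0]  # None means unlimited along axis 0
    if limit is not None and limit < new_len:
        raise ValueError("destination dataset cannot be extended far enough")
    dst.resize(new_len, axis=0)
    # Copy in blocks so that src never has to fit in memory as a whole.
    for start in range(0, src.shape[0], block_rows):
        stop = min(start + block_rows, src.shape[0])
        dst[old_len + start:old_len + stop] = src[start:stop]
```

A destination created with something like `maxshape=(None,)` passes the check, while a fixed-size dataset is rejected before anything is written.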

gheber · December 21, 2021, 12:18pm

There are perhaps two different approaches:

1. You don’t want to copy any data
2. Copying data is not an issue

(There is no such thing as moving dataset elements in HDF5.)

Under 1. you would just create a new dataset whose layout is virtual and then map the (existing) constituent datasets. It’s just a metadata operation. No data is copied, but the virtual dataset looks and feels like the merger of its constituents.
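The virtual dataset approach could look roughly like the following sketch, written in Python with h5py. It assumes both groups contain only datasets, that each name in the first group also exists in the second, and that matching datasets agree in dtype and in every dimension except the first. The function and group names are invented for the example, and `"."` is used as the source file name to refer to the file that holds the virtual dataset.

```python
import h5py


def merge_groups_virtual(path, group_a, group_b, merged):
    """For each dataset name in group_a, create a virtual dataset under `merged`
    that presents group_a's rows followed by group_b's rows. No data is copied."""
    with h5py.File(path, "r+") as f:
        ga, gb = f[group_a], f[group_b]
        out = f.require_group(merged)
        for name in ga:
            a, b = ga[name], gb[name]
            n_a, n_b = a.shape[0], b.shape[0]
            layout = h5py.VirtualLayout(shape=(n_a + n_b,) + a.shape[1:],
                                        dtype=a.dtype)
            # Map each constituent dataset onto its slice of the virtual dataset.
            layout[:n_a] = h5py.VirtualSource(".", a.name, shape=a.shape)
            layout[n_a:] = h5py.VirtualSource(".", b.name, shape=b.shape)
            out.create_virtual_dataset(name, layout)
```

Reading `f["merged/x"][...]` afterwards should return the elements of both source datasets in order, while the sources themselves stay where they are.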

Option 2. is a little more labor-intensive because there is no merge operation in HDF5. (There is an append function in the high-level library.) In this case, just create a dataset that can accommodate the constituents, and then read and write the constituent datasets. Again, this one copies data, and if your datasets are large they may not fit into memory and you’d have to page through them, etc.
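A sketch of the copying approach with paging, again in Python with h5py. It assumes the constituents are concatenated along the first axis and share dtype and trailing dimensions. The function name, dataset paths, and `block_rows` are placeholders, not anything HDF5 prescribes.

```python
import h5py


def merge_by_copy(f, source_paths, merged_path, block_rows=100_000):
    """Create one dataset sized for all sources and copy them in, block by block."""
    sources = [f[p] for p in source_paths]
    total = sum(s.shape[0] for s in sources)
    first = sources[0]
    out = f.create_dataset(merged_path,
                           shape=(total,) + first.shape[1:],
                           dtype=first.dtype)
    offset = 0
    for src in sources:
        # Page through each source so only block_rows rows are in memory at once.
        for start in range(0, src.shape[0], block_rows):
            stop = min(start + block_rows, src.shape[0])
            out[offset + start:offset + stop] = src[start:stop]
        offset += src.shape[0]
    return out
```

The destination here lives in the same file as the sources, but nothing in the loop depends on that; the sources could just as well be opened from other files.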

In both cases, the source and destination datasets can be in different files. The only reason I can think of where you don’t want virtual datasets is when your constituent datasets are tiny, and you’ve got lots of them. The overhead of indirection and more metadata will lead to poorer performance, especially for selections. Consolidating them into a larger dataset should improve performance.

Does that answer your question? (I don’t know anything about ROOT.)

Best, G.

hamdaouihassane · December 28, 2021, 5:25pm

The datasets are large, so I will try the first option. Thanks greatly for your help.
