Merge 2 groups from the same h5 file


#1

Is there a way to merge 2 groups from the same h5 file? The 2 groups have the same structure.
The output could be a group in the same file or in a new h5 file.


#2

What do you mean by that? Groups are collections of (named) links. If you mean by “same structure” that the links have the same names in both groups, then, no, they can’t be merged without renaming some of the links, because link names must be unique in the scope of a group.

Assuming the link names are different, then yes, you can just create a new group and copy the links into the new group via `H5Lcopy`.
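For readers working in Python, here is a rough h5py sketch of the link approach. It does not call `H5Lcopy` itself; it relies on h5py creating a new hard link when an existing object is assigned to a new name, which should give a similar result for hard links. The group and dataset names (`run1`, `run2`, `merged`, ...) and the helper `link_into_new_group` are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np


def link_into_new_group(h5file, src_paths, dst_path):
    """Create dst_path and hard-link every member of the source groups into it."""
    dst = h5file.create_group(dst_path)
    for src_path in src_paths:
        for name, obj in h5file[src_path].items():
            if name in dst:
                # Link names must be unique within a group.
                raise ValueError(f"link name {name!r} already exists in {dst_path}")
            # Assigning an existing object creates a new hard link; no data is copied.
            dst[name] = obj
    return dst


# Demo with hypothetical names; the two groups have different link names.
path = os.path.join(tempfile.mkdtemp(), "links.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("run1/energy", data=np.arange(3))
    f.create_dataset("run2/momentum", data=np.arange(4))
    link_into_new_group(f, ["run1", "run2"], "merged")
    merged_names = sorted(f["merged"])
    # Both links point at the same dataset, so a write through one
    # is visible through the other.
    f["merged/energy"][0] = 99
    seen_via_old_link = int(f["run1/energy"][0])
```

If both groups contain a link with the same name, the helper raises instead of overwriting, which mirrors the uniqueness constraint described above.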

Best, G.


#3

Thank you for your reply. I mean merging in the sense that each dataset under the first group is appended to the dataset with the same name in the second group, as when merging ROOT files.


#4

Hi @hamdaouihassane,

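The merging you are describing is possible if certain conditions are met, one of them being that the dataset in the second group is extendible (created with a chunked layout and a maximum size that is large enough, or unlimited, along the dimension being extended), so that it can grow to hold the data from the first group's dataset.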
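As an illustration only, the append could look roughly like this in Python with h5py, assuming the datasets are concatenated along their first axis and the destination was created as resizable. The helpers `append_dataset` and `append_group` and the names `g1`, `g2`, `x` are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np


def append_dataset(dst, src):
    """Append src to the end of dst along axis 0; dst must be extendible."""
    if dst.chunks is None:
        raise TypeError(f"{dst.name} is not chunked, so it cannot be resized")
    old = dst.shape[0]
    new = old + src.shape[0]
    if dst.maxshape[0] is not None and new > dst.maxshape[0]:
        raise ValueError(f"{dst.name} cannot grow to {new} rows")
    dst.resize(new, axis=0)
    # Reads all of src into memory; fine for small data only.
    dst[old:new] = src[...]


def append_group(src_group, dst_group):
    """Append each dataset in src_group to the same-named dataset in dst_group."""
    for name, src in src_group.items():
        if isinstance(src, h5py.Dataset) and name in dst_group:
            append_dataset(dst_group[name], src)


# Demo with hypothetical names: g1/x is appended to g2/x.
path = os.path.join(tempfile.mkdtemp(), "append.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("g1/x", data=np.array([10, 11]))
    # maxshape=(None,) makes the dataset chunked and unlimited along axis 0.
    f.create_dataset("g2/x", data=np.array([0, 1, 2]), maxshape=(None,))
    append_group(f["g1"], f["g2"])
    appended = f["g2/x"][...].tolist()
```

If the destination dataset was not created with a suitable `maxshape`, it cannot be extended after the fact, and a new dataset has to be created instead.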


#5

There are perhaps two different approaches:

  1. You don’t want to copy any data
  2. Copying data is not an issue
    (There is no such thing as moving dataset elements in HDF5.)

Under 1. you would just create a new dataset whose layout is virtual and then map the (existing) constituent datasets. It’s just a metadata operation. No data is copied, but the virtual dataset looks and feels like the merger of its constituents.
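A minimal h5py sketch of the virtual dataset approach, assuming the constituents are concatenated along their first axis and share dtype and trailing dimensions. The helper `make_virtual_concat` and the names `g1`, `g2`, `merged`, `x` are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np


def make_virtual_concat(sources, dst_group, name):
    """Create a virtual dataset that presents the sources end to end along axis 0."""
    total = sum(s.shape[0] for s in sources)
    shape = (total,) + sources[0].shape[1:]
    layout = h5py.VirtualLayout(shape=shape, dtype=sources[0].dtype)
    offset = 0
    for src in sources:
        n = src.shape[0]
        # Map this region of the virtual dataset onto an existing dataset.
        layout[offset:offset + n] = h5py.VirtualSource(src)
        offset += n
    # Only metadata is written here; no dataset elements are copied.
    return dst_group.create_virtual_dataset(name, layout)


# Demo with hypothetical names, everything in one file.
path = os.path.join(tempfile.mkdtemp(), "vds.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("g1/x", data=np.array([0, 1, 2]))
    f.create_dataset("g2/x", data=np.array([10, 11]))
    merged_group = f.require_group("merged")
    make_virtual_concat([f["g1/x"], f["g2/x"]], merged_group, "x")

# Reopen read-only and read through the virtual dataset.
with h5py.File(path, "r") as f:
    merged = f["merged/x"][...].tolist()
```

The source datasets stay where they are, so removing or renaming them later would leave gaps (fill values) in the virtual dataset.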

Option 2 is a little more labor-intensive because there is no merge operation in HDF5. (There is an append function, `H5DOappend`, in the high-level library.) In this case, just create a dataset that can accommodate the constituents, and then read and write the constituent datasets. Again, this one copies data, and if your datasets are large they may not fit into memory, so you’d have to page through them block by block.
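A rough h5py sketch of the copying approach, reading and writing one block at a time so that large datasets never have to fit into memory at once. It assumes concatenation along the first axis; the helper `concat_copy`, the `block` parameter, and the names `g1`, `g2`, `merged`, `x` are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np


def concat_copy(sources, dst_group, name, block=1_000_000):
    """Copy the sources end to end into a new dataset, `block` rows at a time."""
    total = sum(s.shape[0] for s in sources)
    shape = (total,) + sources[0].shape[1:]
    dst = dst_group.create_dataset(name, shape=shape, dtype=sources[0].dtype)
    offset = 0
    for src in sources:
        rows = src.shape[0]
        for start in range(0, rows, block):
            stop = min(start + block, rows)
            # Only one block is held in memory at any time.
            dst[offset + start:offset + stop] = src[start:stop]
        offset += rows
    return dst


# Demo with hypothetical names; block=2 forces several read/write passes.
path = os.path.join(tempfile.mkdtemp(), "copy.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("g1/x", data=np.arange(5))
    f.create_dataset("g2/x", data=np.arange(100, 103))
    out = f.require_group("merged")
    concat_copy([f["g1/x"], f["g2/x"]], out, "x", block=2)
    copied = f["merged/x"][...].tolist()
```

The destination group could just as well belong to a different file opened for writing; the block size is a tuning knob and ideally lines up with the chunking of the source datasets.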

In both cases, the source and destination datasets can be in different files. The only case I can think of where you wouldn’t want virtual datasets is when your constituent datasets are tiny and you’ve got lots of them. The overhead of indirection and the extra metadata will lead to poorer performance, especially for selections. Consolidating them into a larger dataset should improve performance.

Does that answer your question? (I don’t know anything about ROOT.)

Best, G.


#6

The datasets are large, so I will try the first option. Thanks greatly for your help.