Incorrect returned name of an External link


#1

Hi,

Let's suppose that I have two files: file1.h5 and file2.h5.
In file1.h5 I create an empty group, myGroup.
Then I create an external link to that group in file2.h5.
After that I want to get the name of that external link:

#include <iostream>

#include "hdf5.h"

#define FILE1_NAME "file1.h5"
#define FILE2_NAME "file2.h5"
#define GROUP_NAME "myGroup"
#define LINK_NAME "myLink"

int main(void) {
  hid_t file1, file2, group, link;

  file1 = H5Fcreate(FILE1_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
  file2 = H5Fcreate(FILE2_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

  group = H5Gcreate2(file1, GROUP_NAME, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

  H5Lcreate_external(
        FILE1_NAME,
        GROUP_NAME,
        file2,
        LINK_NAME, H5P_DEFAULT, H5P_DEFAULT);

  // open created external link to a group as a group, is it correct?
  link = H5Gopen2(file2, LINK_NAME, H5P_DEFAULT);

  char link_name_out[20] = {0};
  H5Iget_name(link, link_name_out, sizeof(link_name_out));

  // returned name of a link is `myGroup` and not `myLink` as I expected
  // (print the NUL-terminated string rather than all 20 bytes of the buffer)
  std::cout << link_name_out << std::endl;

  return 0; // successfully terminated
}

The output of the example is:
[image: program output]

I expected the returned name to be the one given when the link was created (LINK_NAME), but I get GROUP_NAME instead, even though I opened this group (link) by the name LINK_NAME.
At the same time, if I create a soft link instead, the returned name is LINK_NAME (as expected), which means there is no such problem with soft links.

Moreover, HDFView shows the name I expect.


#2

Comments inline:

#include "hdf5.h"

#include <iostream>
#include <string>

const std::string FILE1_NAME{"file1.h5"};
const std::string FILE2_NAME{"file2.h5"};
const std::string GROUP_NAME{"myGroup"};
const std::string LINK_NAME{"myLink"};

#define H5P_DEFAULTx2 H5P_DEFAULT, H5P_DEFAULT
#define H5P_DEFAULTx3 H5P_DEFAULT, H5P_DEFAULTx2

int main(void)
{
  // FILE1_NAME
  hid_t file = H5Fcreate(FILE1_NAME.c_str(), H5F_ACC_TRUNC, H5P_DEFAULTx2);
  // Create a group linked as "myGroup" in file1.h5.
  // Don't forget to close the acquired handle!
  // "You acquire a handle, you own it." (Colin Powell)
  H5Gclose(H5Gcreate2(file, GROUP_NAME.c_str(), H5P_DEFAULTx3));
  // No need to keep the file open. Let's close it!
  H5Fclose(file);

  // FILE2_NAME
  file = H5Fcreate(FILE2_NAME.c_str(), H5F_ACC_TRUNC, H5P_DEFAULTx2);
  // Create an external link in file2.h5 called "myLink" that
  // refers to (file1.h5, "myGroup")
  H5Lcreate_external(FILE1_NAME.c_str(), GROUP_NAME.c_str(),
                     file, LINK_NAME.c_str(), H5P_DEFAULTx2);
  // Create an external link in file2.h5 called "myLink1" that
  // refers to (file1.h5, "theGroupThatWasn'tThere"). The target path
  // is the second argument and the link name the fourth; this link dangles.
  H5Lcreate_external(FILE1_NAME.c_str(), "theGroupThatWasn'tThere",
                     file, "myLink1", H5P_DEFAULTx2);

  // The following works, but just by dumb luck.
  // We make two (untested) assumptions:
  // 1. The external link can be resolved to an object in the same
  //    or another HDF5 file.
  // 2. The object (externally) linked that way is a group.
  // Assumptions 1 and 2 are valid, and the HDF5 library will go ahead,
  // open the file "mentioned" in the external link, traverse
  // the link according to the path name embedded in the external link,
  // and open the linked object (group).
  hid_t group = H5Gopen2(file, LINK_NAME.c_str(), H5P_DEFAULT);

  char link_name_out[20];
  // This is perhaps where the confusion kicks in. =group= is a handle
  // on an object in "file1.h5". To determine a(!) path name for this object,
  // H5Iget_name first determines the file that contains the object,
  // which happens to be "file1.h5". The ONLY path name for the group in that
  // file is indeed "/myGroup". In other words, H5Iget_name does NOT know
  // how we got a hold of that handle, that it involved the traversal of
  // an external link, etc. OK?
  H5Iget_name(group, link_name_out, 20);

  // returned name of a link is `myGroup` and not `myLink` as I expected
  // Does that make more sense now?
  std::cout << std::string(link_name_out) << std::endl;

  H5Gclose(group);
  H5Fclose(file);

  return 0;
}

G.


#3

@gheber thank you for the explanation.
In that case, is there a way to retrieve three pieces of information from that external link:

  1. the real link name, which is equal to LINK_NAME
  2. the path that the link points to (in this example, GROUP_NAME)
  3. the name of the file in which the linked object resides (FILE1_NAME)

I thought I should use a combination of H5Iget_name, H5Lget_val and H5Lunpack_elink_val.
But H5Lget_val works with the link name (LINK_NAME), while H5Iget_name retrieves the path that the link points to.


#4

Yes, almost there:

  1. First, H5Gopen the group (= a collection of links) in which the link that you want to examine resides.
  2. If you know the link name, use H5Lget_info to determine the link type (you are looking for H5L_TYPE_EXTERNAL)
    2a. If you don’t know the link name or want to examine all links in a group, use H5Literate to examine them one-by-one and, in the callback, call H5Lget_info to pick out the one(s) you like.
  3. Call H5Lget_val to retrieve the link “value.”
  4. Use H5Lunpack_elink_val to retrieve the file and path names.

CAUTION External links behave like symbolic links. There’s no guarantee that you can traverse such a link, i.e., resolve it to an object. Also, there are so-called link access properties, one of them being an external link prefix. This potentially affects how the file name is interpreted, i.e., as a file in the current directory or the file path prefixed by something else. OK?

G.


#5

@gheber thank you, I understand the idea.
It’s a pity that H5Iget_name returns the path to the link for hard and soft links, while for an external link it returns the path where the target object resides. That broke my logic :)

Could you please explain a little how to use H5Pset_elink_prefix? As I don’t fully understand what exactly it does (what it affects), it is difficult to test. For example, how could I apply it in our example above?

#include "hdf5.h"

#include <iostream>
#include <string>

const std::string FILE1_NAME{"file1.h5"};
const std::string FILE2_NAME{"file2.h5"};
const std::string GROUP_NAME{"myGroup"};
const std::string LINK_NAME{"myLink"};

#define H5P_DEFAULTx2 H5P_DEFAULT, H5P_DEFAULT
#define H5P_DEFAULTx3 H5P_DEFAULT, H5P_DEFAULTx2

int main(void)
{
  // FILE1_NAME
  hid_t file = H5Fcreate(FILE1_NAME.c_str(), H5F_ACC_TRUNC, H5P_DEFAULTx2);
  // Create a group linked as "myGroup" in file1.h5.
  // Don't forget to close the acquired handle!
  // "You acquire a handle, you own it." (Colin Powell)
  H5Gclose(H5Gcreate2(file, GROUP_NAME.c_str(), H5P_DEFAULTx3));
  // No need to keep the file open. Let's close it!
  H5Fclose(file);

  // FILE2_NAME
  file = H5Fcreate(FILE2_NAME.c_str(), H5F_ACC_TRUNC, H5P_DEFAULTx2);
  // Create an external link in file2.h5 called "myLink" that
  // refers to (file1.h5, "myGroup")
  H5Lcreate_external(FILE1_NAME.c_str(), GROUP_NAME.c_str(),
                     file, LINK_NAME.c_str(), H5P_DEFAULTx2);
  // Create an external link in file2.h5 called "myLink1" that
  // refers to (file1.h5, "theGroupThatWasn'tThere"). The target path
  // is the second argument and the link name the fourth; this link dangles.
  H5Lcreate_external(FILE1_NAME.c_str(), "theGroupThatWasn'tThere",
                     file, "myLink1", H5P_DEFAULTx2);

  // ----------Here I'm trying to set external prefix----------
  hid_t groupAccess = H5Pcreate(H5P_GROUP_ACCESS);
  H5Pset_elink_prefix(groupAccess, "my/prefix");
  hid_t group = H5Gopen2(file, LINK_NAME.c_str(), groupAccess);

  char link_name_out[20];
  // This is perhaps where the confusion kicks in. =group= is a handle
  // on an object in "file1.h5". To determine a(!) path name for this object,
  // H5Iget_name first determines the file that contains the object,
  // which happens to be "file1.h5". The ONLY path name for the group in that
  // file is indeed "/myGroup". In other words, H5Iget_name does NOT know
  // how we got a hold of that handle, that it involved the traversal of
  // an external link, etc. OK?
  H5Iget_name(group, link_name_out, 20);

  // returned name of a link is `myGroup` and not `myLink` as I expected
  // Does that make more sense now?
  std::cout << std::string(link_name_out) << std::endl;

  H5Gclose(group);
  H5Pclose(groupAccess);
  H5Fclose(file);

  return 0;
}

I understand that opening the external link as a group is not the best way, but I work with HDF5 and an external HDF5 C++ wrapper, so I can’t establish a connection between an external link and the corresponding C++ object-oriented model.

If I have only an external link ID, is it possible to get the file name and path where this external link resides (not the target, but the external link itself)?

And how can I test whether a link is dangling (dead)?


#6

Let’s say you’ve created an external link in foo.h5 and the external link refers to a path name in file bar.h5. If (by intention or accident) the relative location of foo.h5 and bar.h5 changes, the external link becomes invalid, because the HDF5 library can’t locate the file bar.h5. To deal with a situation like that, you can create a so-called link access property list, which contains a prefix that you’d like to apply to the file names of external links before the library attempts to traverse them. That prefix is added to the property list via H5Pset_elink_prefix (https://portal.hdfgroup.org/display/HDF5/H5P_SET_ELINK_PREFIX). In your modified example, just move the file referenced in the external link and see it break. Then adjust the prefix, and see it return to the expected behavior. OK?

I’m not sure I understand what you mean by an “external link ID.” There is no such thing. If you have an identifier/handle (of type hid_t), it’s either invalid (H5I_INVALID_HID) or refers to something, such as a dataset, a group, an attribute, a dataspace, etc.

There’s some good news and some not-so-good news. The good news is that so-called hard links (non-symbolic and non-user-defined links) are reference counted, so dangling is not an option for them. All bets are off otherwise. How many times a day do you encounter broken links on the interweb? It’s trial and error, step-by-step, if you want to make it robust. Depending on your use case, you can optimize your strategy, i.e., if you expect objects, you can always try H5Oopen (notice the lapl argument?) https://portal.hdfgroup.org/display/HDF5/H5O_OPEN, but be prepared for an H5I_INVALID_HID return value. Combined with proper handle discipline, you can’t really “break” anything that way. OK?

G.


#7

@gheber thank you for the explanation, now most things have become clear.

I will try to ask more precisely, because this is important for me to understand.

Let’s suppose I have a dataset data in file1.h5. Then I create an external link to that dataset in file2.h5 as "myLink". After that I open this link as a dataset: hid_t data_link = H5Dopen2(file2, "myLink", H5P_DEFAULT);.
So now I have data_link, and let’s suppose that I have forgotten all information about this link (the link name myLink, file1.h5, file2.h5 and data: all of this I forgot). I want to get the parent group and the file where this link resides. As was said before, through H5Iget_name and H5Lget_val I can get the path where the data exists and then its file, so there is no problem getting information about data and file1.h5. But how can I get the parent object ID (the file or group where the link resides) and the link name if I only have the ID of the dataset obtained through that link, data_link (in this case I want to get back "myLink" and file2)?


#8

data_link is a handle to a dataset. Internally, it’s just an integer and it has no information on how it came about. It’s ephemeral, in the sense that it goes away when you close the dataset or your application shuts down, and it’s very likely to be different the next time you open the same dataset.

That’s an impossible task for several reasons. For one, data_link has nothing to do with links or an HDF5 file’s link structure. The information you are looking for is not contained in data_link. Secondly, this is a bit like asking which pages on the internet are referencing a given URL. While that’s possible to track in principle (via bi-directional links), it’s not really practical. It may appear that way, but an HDF5 object does NOT have a “parent object.” This concept makes sense only in special types of graphs, e.g., trees. An HDF5 file’s link structure is generally a (directed) multi-graph, i.e., there can be multiple edges between two nodes (different link names, though), a given node can have multiple different “parents”, and there can even be cycles. (Who’s the parent and who’s the child in a graph that contains cycles?)

HDF5 links are single-source/single-destination and uni-directional, and the source is implicitly the hosting group. There is a designated root group from which that web can be “disentangled”. If you want to think of groups as boxes containing something, then they contain links. They do NOT contain objects. That (conceptual) simplification is only possible for tree-like graph structures. Since HDF5 is not built with that assumption, there are no built-in facilities for this special case.
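
To make the "groups contain links, not objects" picture concrete, here is a toy model in plain C++. It is not the HDF5 API: ObjectId, Group, File, make_example and links_to are hypothetical names, and real HDF5 groups are not std::map objects. It only illustrates why an object has no single parent and why finding the links that point to it means searching the link structure.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model: an object is just an integer ID, and a group is a table of
// named, outgoing links.
using ObjectId = int;
using Group = std::map<std::string, ObjectId>;  // link name -> target object
using File = std::map<ObjectId, Group>;         // group ID  -> its links

// Object 0 is the root group, 1 is another group, 2 is a dataset.
File make_example()
{
  File f;
  f[0]["a"] = 1;      // /a       : root -> group 1
  f[0]["data"] = 2;   // /data    : root -> dataset 2
  f[1]["alias"] = 2;  // /a/alias : group 1 -> the same dataset 2
  f[1]["up"] = 0;     // /a/up    : group 1 -> root, which closes a cycle
  return f;
}

// Find every (group, link name) pair that points at `target`. The target
// keeps no back-pointers, so the only option is to scan all groups.
std::vector<std::pair<ObjectId, std::string>>
links_to(const File& file, ObjectId target)
{
  std::vector<std::pair<ObjectId, std::string>> found;
  for (const auto& group : file)
    for (const auto& link : group.second)
      if (link.second == target)
        found.emplace_back(group.first, link.first);
  return found;
}
```

In this model, links_to(make_example(), 2) finds two links, (0, "data") and (1, "alias"), so the dataset has two equally valid "parents", and the root group is itself the target of a link from group 1.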

OK? G.


#9

@gheber thank you for such an informative answer.
Now I understand the HDF5 logic a little better.