External links & VFDs

As a follow-up to the discussion in this week’s HUG21, I'd like to elaborate on the relationship between external links and VFDs, and the issues involved. All is fine if the external links should be opened with the same VFD as the main file. But that is not always the case, nor always desirable.

In theory, an external link should work like an HDF5 filter for a dataset: the application (e.g. h5ls) can just open the object (dataset, group) without knowing that it refers to an external location. However, data in that external location may require a different VFD. For instance, the dataset may be stored in an HDF5 file that was created using the split file driver, or the link may even refer to an online location that requires access via the S3 VFD.

Currently, to handle such cases, the external link needs to be opened with a link access property list on which a file access property list (FAPL) carrying the right VFD has been set via H5Pset_elink_fapl(). That means the application needs to check every dataset and every group in every HDF5 file to find out whether it is an external link, and if it is, do some special VFD FAPL handling to find out which VFD to use.

That is utterly unsatisfying and not practical. Particularly with VFDs becoming runtime plugins, like H5Z filters, there ought to be a better, more automatic way, so that any application can follow external links without such special handling.
