Can we use external links to read-only data in writable files?

(Just posted to Stack Overflow; perhaps someone here can help. Apologies for cross-posting.)

https://stackoverflow.com/questions/70967898/how-can-we-create-an-externallink-to-data-in-read-only-files-using-h5py-hdf5

The problem is to create an external link to data in a read-only file and then read that data through a file where we are writing results. I think my code is failing because the external source data files are opened with the same access mode as the results file.

Something like H5Pset_elink_fapl looks like it might help, but I couldn't figure out how to use it from Python. Is this even possible, or should we be doing things differently anyway?

Thanks for your help!

I think you want H5Pset_elink_acc_flags. I do not know whether that's exposed in h5py, but it is very likely present in the low-level interface.

Best, G.

I could not find this function in h5py, so I tried to create an external link using some C code (below). This gives the same problem, so I am probably missing a step somewhere. The external data in read-only files only shows up as readable if the file that contains the link is opened read-only too.

Is this link access property list saved into the HDF5 file itself? I could not easily find it in the detailed description of the file format. If this flag needs to be set every time the file is opened, then it might be easier to just store the filename + dataset path as strings…

Many thanks for your help!

```python
import os
import stat
import subprocess

import h5py

# Create the target data as a read-only file
dname = 'target.h5'

if os.path.exists(dname):
    # Restore write permission so the old copy can be removed
    os.chmod(dname, stat.S_IRUSR | stat.S_IWUSR)
    os.remove(dname)
# Create a target file with some data in it
with h5py.File(dname, "w") as h:
    h['/root/data'] = list(range(20))
# Make this file read-only
os.chmod(dname, stat.S_IRUSR)

# Create a file that links to the read-only data
with h5py.File("linker.h5", "w") as h:
    h['/root/pointer'] = h5py.ExternalLink(dname, '/root/data')

# Now try to add a read-only external link using H5Pset_elink_acc_flags
with open('elink_create.c', 'wb') as ccode:
    ccode.write(b"""
#include <stdio.h>
#include "hdf5.h"
int main(int argc, char* argv[]){
    if (argc != 5) return -1;
    printf("Writing in %s::%s a link pointing to %s::%s\\n", argv[1], argv[2], argv[3], argv[4]);
    hid_t fout_id = H5Fopen(argv[1], H5F_ACC_RDWR, H5P_DEFAULT); /* write to this file */
    if (fout_id < 0) return 1;
    hid_t lapl_id = H5Pcreate(H5P_LINK_ACCESS);
    herr_t err = H5Pset_elink_acc_flags(lapl_id, H5F_ACC_RDONLY);
    if (err >= 0)
        err = H5Lcreate_external(argv[3], argv[4], fout_id, argv[2], H5P_DEFAULT, lapl_id);
    H5Pclose(lapl_id);
    H5Fclose(fout_id);
    return err < 0 ? 1 : 0;
}
""")

# Compile the helper and run it from the current directory
subprocess.run(['h5cc', 'elink_create.c', '-o', 'elink_create'], check=True)
result = subprocess.run(['./elink_create', 'linker.h5', 'readonlylink', 'target.h5', '/root/data'],
                        capture_output=True, text=True)
print(result.stdout)

# Attempt to open and read the data
for mode in "r", "r+", "a":
    for linkname in '/readonlylink', '/root/pointer':
        try:
            # Now retrieve the read-only data
            with h5py.File("linker.h5", mode) as h:
                data = h[linkname][:]
                print(linkname, 'mode', mode, data.shape)
        except Exception as e:
            print(linkname, 'mode', mode, 'fails', str(e))
```

This outputs:

```
Writing in linker.h5::readonlylink a link pointing to target.h5::/root/data

/readonlylink mode r (20,)
/root/pointer mode r (20,)
/readonlylink mode r+ fails "Unable to open object (unable to open file, file name = 'target.h5', temp_file_name = 'target.h5')"
/root/pointer mode r+ fails "Unable to open object (unable to open file, file name = 'target.h5', temp_file_name = 'target.h5')"
/readonlylink mode a fails "Unable to open object (unable to open file, file name = 'target.h5', temp_file_name = 'target.h5')"
/root/pointer mode a fails "Unable to open object (unable to open file, file name = 'target.h5', temp_file_name = 'target.h5')"
```

OK, maybe there is some confusion here. To create the external link, you don't need to do anything special, because the library does not attempt to traverse the link as part of the creation. As with symbolic links, you can create unresolvable (dangling) external links. You need the link access property list only on traversal, e.g., when you call H5Oopen on the external object.
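To see that creation never traverses the link, here is a minimal h5py sketch (the file names and the helper function are hypothetical). Creating an external link to a file that does not exist succeeds, the stored link holds only the target filename and object path, and the failure appears only when the link is followed.

```python
import os
import tempfile

import h5py


def demo_dangling_external_link(workdir):
    """Create an external link whose target file does not exist, then try to follow it."""
    parent = os.path.join(workdir, "parent.h5")  # hypothetical file name
    with h5py.File(parent, "w") as f:
        # Only (filename, object path) is recorded; the target is not opened here
        f["dangling"] = h5py.ExternalLink("no_such_file.h5", "/root/data")

    with h5py.File(parent, "r") as f:
        # getlink=True inspects the link itself without traversing it
        link = f.get("dangling", getlink=True)
        try:
            f["dangling"]  # traversal: only now does HDF5 try to open the target file
            resolved = True
        except (KeyError, OSError):
            resolved = False
    return link.filename, link.path, resolved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo_dangling_external_link(d))
```

Since nothing about access flags is stored with the link, any read-only behaviour has to be requested again by whoever traverses it.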

Since it is an access property list, it is not stored in the file.

OK? G.

Thanks for clarifying! The short answer is "don't use links for this".
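One way to avoid links entirely, along the lines of storing the filename + dataset path as strings, is sketched below. The attribute names `source_file` and `source_path` and both helper functions are hypothetical. Unlike HDF5's external-link search, a relative `source_file` here is resolved against the current working directory.

```python
import h5py


def record_source(results, name, src_file, src_path):
    """Store a reference to external data as two string attributes (hypothetical layout)."""
    grp = results.require_group(name)
    grp.attrs["source_file"] = src_file
    grp.attrs["source_path"] = src_path
    return grp


def read_source(results, name):
    """Read the referenced data, always opening the source file read-only."""
    attrs = results[name].attrs
    with h5py.File(attrs["source_file"], "r") as src:
        return src[attrs["source_path"]][()]
```

Because the source file is always opened with mode "r", the results file can stay open for writing.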

For debugging, it would help if more of the error status bubbled up when opening an external file fails. I spent a while confused because the links worked with h5dump but not when I tried to write results elsewhere in the file. strace shows the difference for "exists but you can't write":

```
openat(AT_FDCWD, "target.h5", O_RDWR)   = -1 EACCES (Permission denied)
```

versus when the file is missing:

```
openat(AT_FDCWD, "target.h5", O_RDWR)   = -1 ENOENT (No such file or directory)
```

I get the same error stack from HDF5 (via C or h5py) in both cases. This seems strange, because it looks like H5FD__sec2_open sees what the problem was and wants to tell me. Perhaps the information gets lost somewhere in here:

```
  #008: ../../../src/H5Fint.c line 738 in H5F_prefix_open_file(): unable to open file, file name = 'target.h5', temp_file_name = 'target.h5'
    major: File accessibilty
    minor: Unable to open file
```

Perhaps H5E_clear_stack is the culprit?
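Until the error stack reports the cause, one way to recover the errno that strace shows is to resolve the external link's target yourself and try `os.open` with the same read/write intent as the parent file. This is a rough diagnostic sketch. The helper name is hypothetical, and resolving relative target names only against the parent file's directory is a simplification: HDF5's own search also consults link prefixes and the current directory.

```python
import errno
import os

import h5py


def diagnose_external_link(parent, name):
    """Report why the target of an external link might fail to open.

    Returns None if the target file can be opened with the same read/write
    intent as the parent file, otherwise the errno name (e.g. 'EACCES').
    """
    link = parent.get(name, getlink=True)  # inspect the link without traversing it
    if not isinstance(link, h5py.ExternalLink):
        raise TypeError(f"{name!r} is not an external link")
    target = link.filename
    if not os.path.isabs(target):
        # Simplification: resolve relative to the parent file's directory only
        base = os.path.dirname(os.path.abspath(parent.file.filename))
        target = os.path.join(base, target)
    # Mirror the parent's access mode, as the strace output above suggests HDF5 does
    flags = os.O_RDONLY if parent.file.mode == "r" else os.O_RDWR
    try:
        fd = os.open(target, flags)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return None
```

With the parent opened "r+", a read-only target should report `EACCES` and a missing one `ENOENT`, matching the two strace lines. Running as root is an exception: root bypasses file permissions, so a read-only target will open successfully.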

Thank you for the detective work. I’ve created this GitHub issue.

Best, G.

It doesn’t look like h5py exposes set_elink_acc_flags even at the low level. Adding that would be easy enough, but you’d have to drop down to the low-level API to open the object - more work would be needed to expose a way to use it from the high-level API.
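Until then, a workaround that stays in the high-level API is to inspect the link without traversing it and open the target file read-only yourself. This is a minimal sketch. `read_external_readonly` is a hypothetical helper, not part of h5py. It resolves relative target names only against the parent file's directory, which simplifies HDF5's own search rules.

```python
import os

import h5py


def read_external_readonly(parent, name):
    """Read the data behind `name`, opening external-link targets read-only."""
    link = parent.get(name, getlink=True)  # does not traverse the link
    if not isinstance(link, h5py.ExternalLink):
        return parent[name][()]  # ordinary object: read it directly
    target = link.filename
    if not os.path.isabs(target):
        # Simplification: resolve relative to the parent file's directory only
        base = os.path.dirname(os.path.abspath(parent.file.filename))
        target = os.path.join(base, target)
    with h5py.File(target, "r") as src:
        return src[link.path][()]
```

For example, with `linker.h5` open in "r+" mode, `read_external_readonly(h, '/root/pointer')` reads the read-only data while results are still written to `linker.h5`.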

This also reminds me of a similar issue we had with virtual datasets pointing to read-only files. In that case, you don't even get a generic error message; you just get data that appears to be missing: