Authenticated AWS S3


#1

The ros3 driver is very slick. It works great for public h5 files.

I am trying to use ros3 to read an H5 file that requires authentication credentials.
I have successfully used h5ls/h5dump to access an h5 file in an authenticated AWS S3 bucket with --vfd=ros3 and --s3-cred=(<region>,<key_id>,<key_secret>).
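For comparison, the same command-line invocation can be driven from Python. This is a minimal sketch: `h5ls_ros3_command` and `run_h5ls_ros3` are hypothetical helper names, the flags are the `--vfd=ros3` and `--s3-cred` options quoted above, and the URL and keys in any call are placeholders you would supply.

```python
import subprocess


def h5ls_ros3_command(url, region, key_id, key_secret, tool="h5ls"):
    """Build an argv list for h5ls/h5dump using the ros3 VFD with S3 credentials."""
    # --s3-cred takes a parenthesized, comma-separated triple:
    # (<region>,<key_id>,<key_secret>)
    cred = f"({region},{key_id},{key_secret})"
    return [tool, "--vfd=ros3", f"--s3-cred={cred}", url]


def run_h5ls_ros3(url, region, key_id, key_secret, tool="h5ls"):
    """Run the tool and return its stdout; raises CalledProcessError on failure."""
    argv = h5ls_ros3_command(url, region, key_id, key_secret, tool)
    # No shell is involved, so the parentheses need no quoting.
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout
```

Building an argument list rather than a shell string sidesteps quoting the parentheses in `--s3-cred`, which most shells would otherwise try to interpret.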

However, I cannot get this to work with h5py.
I see that in h5p.pyx, set_fapl_ros3 appears to accept byte strings for aws_region, secret_id, and secret_key. These appear to be the same three parameters that h5ls/h5dump accept via --s3-cred. However, I have not been able to get this to work.

My call looks like:
h5f = h5py.File(<url>, driver='ros3', aws_region='<region>'.encode('utf-8'), secret_id='<secret_id>'.encode('utf-8'), secret_key='<secret_key>'.encode('utf-8'))

The error is not very informative: OSError: Unable to open file (curl cannot perform request). It fails on the h5f.open() call.

Can anyone share a minimal working example? I can’t seem to find this documented anywhere. Any help would be much appreciated. Thank you.


#2

Add this to your code:

h5py._errors.unsilence_errors()

and you should get the complete HDF5 library error stack. That may give some useful information about the problem.
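One way to apply this only around the failing open is sketched below. It assumes the private `h5py._errors` module also provides `silence_errors()` to restore h5py's default quiet behavior; since `_errors` is private, its contents may change between h5py versions. `open_ros3_verbose` is a hypothetical helper name.

```python
def open_ros3_verbose(url, **ros3_kwargs):
    """Open a file with the ros3 driver, printing HDF5's full error stack on failure."""
    import h5py  # imported lazily so this module loads even without h5py

    # h5py normally suppresses HDF5's own error printing; this private helper
    # turns it back on so the complete HDF5 error stack goes to stderr.
    h5py._errors.unsilence_errors()
    try:
        return h5py.File(url, "r", driver="ros3", **ros3_kwargs)
    finally:
        # Assumption: the verbose stack is only wanted for this one call,
        # so restore h5py's default quiet behavior whether or not it failed.
        h5py._errors.silence_errors()
```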

-Aleksandar


#3

Thank you, that was very informative, and I was able to get it working. My example syntax above was correct; I had an error in the URL. This works now!
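Since the original call turned out to be correct, a minimal working example is essentially that call wrapped in a function. This is a sketch: `open_s3_authenticated` is a hypothetical name, the URL and credentials in the comments are placeholders, and the `.encode("utf-8")` calls follow the bytes-valued keywords used in this thread.

```python
def open_s3_authenticated(url, region, key_id, key_secret):
    """Minimal sketch: open an S3 object with h5py's ros3 driver and credentials."""
    import h5py  # lazy import; needs h5py built against an HDF5 with ros3 enabled

    # The ros3 keywords are passed as bytes, so the str values are encoded here.
    return h5py.File(
        url,
        "r",
        driver="ros3",
        aws_region=region.encode("utf-8"),
        secret_id=key_id.encode("utf-8"),
        secret_key=key_secret.encode("utf-8"),
    )


# Example (placeholders; requires network access and a ros3-enabled HDF5):
# with open_s3_authenticated("https://<bucket>.s3.<region>.amazonaws.com/<key>.h5",
#                            "<region>", "<key_id>", "<key_secret>") as h5f:
#     print(list(h5f.keys()))
```

As this thread shows, a typo in the URL produces the same generic "curl cannot perform request" error as bad credentials, so checking the URL first is worthwhile.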


#4

A couple follow-up notes/observations:

I was wrong: the authentication parameters are documented under the ros3 subsection of the File Drivers section of the File Objects page.

This S3 connector (in both HDF5 and h5py) is not specific to AWS’s S3 implementation; it appears to support generic S3-compatible providers in both public and authenticated modes. For example, I was able to get the h5py ros3 driver working with GCP Cloud Storage (public and authenticated) through its S3-compatible interoperability API. Authenticated calls are supported with HMAC keys.
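The Cloud Storage setup described above can be sketched as follows, assuming the XML API endpoint `storage.googleapis.com` with path-style URLs. Both helper names are hypothetical, and the region string is a placeholder because the post does not say which value was used alongside the HMAC key.

```python
from urllib.parse import quote


def gcs_interop_url(bucket, object_name):
    """Build an HTTPS URL for a Cloud Storage object on the S3-compatible XML API."""
    # Path-style URL: '/' separators in the object name are kept and other
    # reserved characters (e.g. spaces) are percent-encoded.
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name, safe='/')}"


def gcs_hmac_ros3_kwargs(region, hmac_access_id, hmac_secret):
    """Map a Cloud Storage HMAC key onto h5py's ros3 keyword arguments."""
    # The HMAC access ID and secret take the place of the AWS key ID and secret.
    return {
        "driver": "ros3",
        "aws_region": region.encode("utf-8"),
        "secret_id": hmac_access_id.encode("utf-8"),
        "secret_key": hmac_secret.encode("utf-8"),
    }
```

These would then be combined as `h5py.File(gcs_interop_url(bucket, name), "r", **gcs_hmac_ros3_kwargs(...))`; public objects would use only `driver="ros3"`.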

Thank you again!


#5

I can’t get this to work by specifying only the aws_region keyword (in addition to the driver).

This works:

>>> url = "https://...h5"  # Pointing to an S3 object
>>> h5f = h5py.File(url, driver="ros3")
>>> h5f
<HDF5 file "...h5" (mode r)>

but this fails:

>>> h5f = h5py.File(url, driver="ros3", aws_region="us-west-2".encode("utf-8"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/envs/gedi_subset/lib/python3.10/site-packages/h5py/_hl/files.py", line 502, in __init__
    fapl = make_fapl(driver, libver, rdcc_nslots, rdcc_nbytes, rdcc_w0,
  File "/opt/conda/envs/gedi_subset/lib/python3.10/site-packages/h5py/_hl/files.py", line 166, in make_fapl
    set_fapl(plist, **kwds)
  File "/opt/conda/envs/gedi_subset/lib/python3.10/site-packages/h5py/_hl/files.py", line 78, in <lambda>
    _drivers['ros3'] = lambda plist, **kwargs: plist.set_fapl_ros3(**kwargs)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5p.pyx", line 1072, in h5py.h5p.PropFAID.set_fapl_ros3
ValueError: Invalid ros3 config (Inconsistent authentication information)

Should aws_region be specified only when I also supply secret_id and secret_key? In other words, is aws_region not allowed for unauthenticated access?
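Going only by the behavior shown above (no extra keywords works, `aws_region` alone raises `Inconsistent authentication information`), one workaround on the affected h5py version is to pass `aws_region` only together with both secrets. This is a hypothetical helper, not part of h5py, and newer h5py releases may relax the rule.

```python
def ros3_kwargs(region=None, key_id=None, key_secret=None):
    """Build h5py.File keyword arguments for the ros3 driver (hypothetical helper)."""
    kwargs = {"driver": "ros3"}
    if key_id is None and key_secret is None:
        # Anonymous access: leave out aws_region so the request is not treated
        # as incomplete authenticated access.
        return kwargs
    if key_id is None or key_secret is None or region is None:
        raise ValueError("authenticated ros3 access needs region, key_id and key_secret")
    kwargs.update(
        aws_region=region.encode("utf-8"),
        secret_id=key_id.encode("utf-8"),
        secret_key=key_secret.encode("utf-8"),
    )
    return kwargs
```

Failing early in Python gives a clearer message than the generic HDF5 error when only part of the credentials is supplied.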

I suspected that aws_region might be unnecessary with the https URL, so I tried the corresponding s3:// URL instead, and that failed with the same error as above.

Further, when I used the s3:// url without specifying aws_region, I got the following error instead:

>>> h5 = h5py.File("s3://...h5", driver="ros3")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/envs/gedi_subset/lib/python3.10/site-packages/h5py/_hl/files.py", line 507, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
  File "/opt/conda/envs/gedi_subset/lib/python3.10/site-packages/h5py/_hl/files.py", line 220, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 106, in h5py.h5f.open
OSError: Unable to open file (invalid SCHEME construction)
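The `invalid SCHEME construction` error suggests that this ros3 build does not accept the `s3://` form, while the https URL worked. Below is a minimal sketch that rewrites `s3://bucket/key` into an AWS virtual-hosted-style HTTPS URL, assuming the object lives on AWS S3 in a known region. `s3_to_https` is a hypothetical helper.

```python
from urllib.parse import urlparse


def s3_to_https(s3_url, region):
    """Convert s3://bucket/key into a virtual-hosted-style AWS HTTPS URL."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {s3_url!r}")
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")  # urlparse keeps the leading '/' of the key
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"
```

Bucket names containing dots can break TLS hostname checks with virtual-hosted-style URLs; a path-style URL (`https://s3.<region>.amazonaws.com/<bucket>/<key>`) is the usual alternative.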

Unfortunately, the official docs are not particularly informative regarding such details, so I’m having a hard time unraveling this mystery.


#6

Hi @chuck,

This is in the process of being fixed in h5py. See h5py issue #2133 and PR #2140. My first PR, which fixes your authentication error, has already been merged, and the requirements are (hopefully) better explained in the latest version of the documentation.

-Aleksandar


#7

Perfect! Thanks for the prompt reply.