Authenticated AWS S3


#1

The ros3 driver is very slick. It works great for public h5 files.

I am trying to perform a read of an H5 file using ros3 that requires authentication credentials.
I have been successful using h5ls/h5dump to access an h5 file over an authenticated AWS S3 call with --vfd=ros3 and --s3-cred=(<region>,<key_id>,<key_secret>).

However, I cannot get this to work with h5py.
I see that in h5p.pyx, set_fapl_ros3 appears to accept strings for aws_region, secret_id, and secret_key. These look like the same three parameters that h5ls/h5dump accept for --s3-cred, but passing them through h5py has not worked for me.

My call looks like:
h5f = h5py.File(<url>, driver='ros3', aws_region='<region>'.encode('utf-8'), secret_id='<secret_id>'.encode('utf-8'), secret_key='<secret_key>'.encode('utf-8'))

The error is not very informative ("OSError: Unable to open file (curl cannot perform request)"), and it is raised on the h5py.File() open call.

Can anyone share a minimum working example? I can’t seem to find this documented anywhere. Any help would be much appreciated. Thank you.


#2

Add this to your code:

h5py._errors.unsilence_errors()

and you should get the complete HDF5 library error stack. That may give some useful information about the problem.
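For context, a small sketch of how this fits around a failing open (the URL and credential values in the comment are placeholders, not real values):

```python
import h5py

# Re-enable the HDF5 library's native error reporting; h5py silences it by
# default, which is why only the one-line OSError summary appears.
h5py._errors.unsilence_errors()

# A subsequent failing open then prints the full HDF5 error stack to stderr:
# try:
#     h5py.File("https://<bucket>.s3.<region>.amazonaws.com/<file>.h5", "r",
#               driver="ros3", aws_region=b"<region>",
#               secret_id=b"<key_id>", secret_key=b"<key_secret>")
# except OSError:
#     pass  # the detailed stack was already written to stderr
```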

-Aleksandar


#3

Thank you, that was very informative. I was able to get it working: my example syntax above was correct, but I had an error in the URL. This works now!
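For anyone searching later, here is a minimal sketch of the working pattern. The helper name, bucket URL, and credentials are placeholders, and the open call assumes an h5py build with the ros3 VFD enabled:

```python
# Hypothetical helper: bundle the ros3 credential arguments that h5py.File
# accepts. The driver expects the region and key pair as bytes, so encode here.
def ros3_kwargs(region: str, key_id: str, secret: str) -> dict:
    return {
        "driver": "ros3",
        "aws_region": region.encode("utf-8"),
        "secret_id": key_id.encode("utf-8"),
        "secret_key": secret.encode("utf-8"),
    }

# Usage (placeholder URL/credentials; requires ros3 support in h5py):
# import h5py
# with h5py.File("https://<bucket>.s3.<region>.amazonaws.com/<file>.h5", "r",
#                **ros3_kwargs("<region>", "<key_id>", "<key_secret>")) as f:
#     print(list(f))
```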


#4

A couple of follow-up notes and observations:

I was wrong: the authentication parameters are documented under the ros3 subsection of the File Drivers section of the File Objects page.

This S3 connector (both HDF5's and h5py's) is not specific to AWS's S3 implementation; it appears to support generic S3 providers in both public and authenticated modes. For example, I was able to get the h5py ros3 driver working with Google Cloud Storage (public and authenticated) through its S3-compatible interoperability API. Authenticated calls are supported via HMAC keys.
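As a sketch of what that looked like on my side (bucket name, region, HMAC access ID, and secret are all placeholders):

```python
# Hypothetical sketch: the same ros3 arguments pointed at Google Cloud
# Storage's S3-compatible XML API endpoint. The HMAC access ID and secret
# from the bucket's interoperability settings stand in for the AWS key pair.
GCS_URL = "https://storage.googleapis.com/<bucket>/<file>.h5"

gcs_ros3_kwargs = {
    "driver": "ros3",
    "aws_region": b"<region>",          # the driver still requires a region string
    "secret_id": b"<hmac_access_id>",
    "secret_key": b"<hmac_secret>",
}

# import h5py
# with h5py.File(GCS_URL, "r", **gcs_ros3_kwargs) as f:
#     print(list(f))
```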

Thank you again!