ROS3: Temporary security credentials

Hello,

The other way, I extended the ROS3 implementation by also allowing to specify a session/security token needed to gain access to S3 when using temporary security credentials. This implementation is a proof-of-concept implementation to demonstrate it works and yes I got it working. To make it all work I needed to modify the C struct H5FD_ros3_fapl_t by added another char array to be able to store the session token.

Now, I would like to change my proof-of-concept implementation such I can upstream it to HDF5 github repo. When reading the documentation I noticed there several compatibility macros when a struct is changed.

My question is this the desired implementation?

It will mean I need rename old H5FD_ros3_fapl_t struct to H5FD_ros3_fapl1_t and create a new H5FD_ros3_fapl2_t with the additional char array for storing then session token. Furthermore, create similar functions which make use of H5FD_ros3_fapl2_t instead of the old struct. It is all a bit more involved than I was expecting so I want to be sure that this is the desired implementation.

1 Like

A moment ago, I noticed that ROS3 implementation is a Virtual File Driver implementation which makes it even more complex or even impossible to use some kind of compatibility macros.

One option that would work is create another Virtual File Driver implementation based on the current ROS3 implementation. The unfortunate thing about this approach is almost a copy of the original implementation with a few tweaks to handle the additional session token char array.

Second option is to extend the current ROS3 implementation resulting most likely an API breaking change. This is definitely not in the spirit of HDF5 library since you all put a lot of effort to keep everything backwards compatible.

The last option I can think of is not supporting the temporary security credentials at all and leave the current as is. This would be vey unfortunate for me and other users which deal with these temporary credentials.

If you have any other suggestions. Please let me know.

Of course, it is highly desired especially if it works with NASA cloud data [1].
The Step 2 in [1] explains how to get the temporary token for NASA cloud data access.
If you haven’t tried your implementation with [1], please try it as a test case and let us know how it went.

[1] https://nsidc.org/data/user-resources/help-center/nasa-earthdata-cloud-data-access-guide

Hi @jan-willem.blokland,

Typically, any past changes to a VFD’s API appear to have been done in an API-breaking manner, with the changes being released in the next major release of HDF5. While that could work in this case, it would mean that you might be waiting a while before HDF5 1.16 appears. Also, I think it would be good to start integrating API-breaking changes to VFDs with the backward compatibility macro infrastructure, which would complicate things a bit as you noted.

In the spirit of easier software maintenance, I would generally prefer not to duplicate the existing ROS3 VFD, but there are at least two other implementation strategies that I can think of which might be used to work around the versioning issue without needing to wait for the next major release of HDF5:

  • Allow the session token to be specified in an environment variable

For better or worse, usage of environment variables has become more prevalent in HDF5, mostly to support VOL connectors. I don’t know the exact security implication of using an environment variable to specify your session token, but this seems like an easy approach that would only add new code without needing versioning.

  • Add a separate FAPL API routine for specifying session tokens

This is a typical way that has been used to extend VFD/VOL functionality without making API-breaking changes. HDF5’s property list interface is already fairly large and growing, but its intended purpose was to make HDF5 flexible so I think it makes sense to use here.

It can be fairly cumbersome if you haven’t worked with HDF5’s property list interface before, but the way you’d usually implement this is to have the new API routine accept a FAPL ID parameter and then have the routine check for and set a new property on the property list that contains your session token using H5Pexist, H5Pset and H5Pinsert2. You’d also need to implement a minimum set of callbacks that handle working with the new property, but those are usually fairly small. Then, the ROS3 VFD code would be updated to check for the property on the FAPL during file open and react accordingly.

Assuming the code would be upstreamed into HDF5, an example implementation (quickly copy and pasted, so might need some work) might be something like:

#define ROS3_CRED_PROP_NAME "ros3_cred_prop"

herr_t
H5Pset_fapl_ros3_creds(hid_t fapl_id, const char *cred)
{
    H5P_genplist_t *plist = NULL;
    htri_t          prop_exists;
    herr_t          ret_value = FAIL;

    FUNC_ENTER_API(FAIL)
    H5TRACE2("e", "i*s", fapl_id, cred);

    if (fapl_id == H5P_DEFAULT)
        HGOTO_ERROR(H5E_PLIST, H5E_BADVALUE, FAIL, "can't set values in default property list")
    if (NULL == (plist = H5P_object_verify(fapl_id, H5P_FILE_ACCESS)))
        HGOTO_ERROR(H5E_PLIST, H5E_BADTYPE, FAIL, "not a file access property list")

    if ((prop_exists = H5P_exist_plist(plist, ROS3_CRED_PROP_NAME)) < 0)
        HGOTO_ERROR(H5E_PLIST, H5E_CANTGET, FAIL, "failed to check if property exists in plist")

    if (prop_exists) {
        if (H5P_set(plist, ROS3_CRED_PROP_NAME, &cred) < 0)
            HGOTO_ERROR(H5E_PLIST, H5E_CANTSET, FAIL, "unable to set value")
    }
    else {
        if (H5P_insert(plist, ROS3_CRED_PROP_NAME, sizeof(char *), &cred, NULL, NULL, NULL, NULL,
            H5FD__ros3_str_prop_delete, H5FD__ros3_str_prop_copy, H5FD__ros3_str_prop_cmp, H5FD__ros3_str_prop_close) < 0)
            HGOTO_ERROR(H5E_PLIST, H5E_CANTREGISTER, FAIL, "unable to register property in plist")
    }

done:
    FUNC_LEAVE_API(ret_value)
}

static herr_t
H5FD__ros3_str_prop_copy(const char *name, size_t size, void *_value)
{
    char **value     = (char **)_value;
    herr_t ret_value = SUCCEED;

    FUNC_ENTER_PACKAGE

    if (*value)
        if (NULL == (*value = HDstrdup(*value)))
            HGOTO_ERROR(H5E_RESOURCE, H5E_CANTALLOC, FAIL, "can't copy string property")

done:
    FUNC_LEAVE_NOAPI(ret_value);
}

static herr_t
H5FD__ros3_str_prop_cmp(const void *_value1, const void *_value2, size_t size)
{
    char *const *value1 = (char *const *)_value1;
    char *const *value2 = (char *const *)_value2;
    int          ret_value;

    FUNC_ENTER_PACKAGE

    if (*value1) {
        if (*value2)
            ret_value = HDstrcmp(*value1, *value2);
        else
            ret_value = 1;
    }
    else {
        if (*value2)
            ret_value = -1;
        else
            ret_value = 0;
    }

done:
    FUNC_LEAVE_NOAPI(ret_value)
}

static herr_t
H5FD__ros3_str_prop_close(const char *name, size_t size, void *_value)
{
    char **value = (char **)_value;
    herr_t ret_value = SUCCEED;

    FUNC_ENTER_PACKAGE

    if (*value)
        HDfree(*value);

done:
    FUNC_LEAVE_NOAPI(ret_value)
}

static herr_t
H5FD__ros3_str_prop_delete(hid_t prop_id, const char *name, size_t size, void *_value)
{
    char **value = (char **)_value;
    herr_t ret_value = SUCCEED;

    FUNC_ENTER_PACKAGE

    if (*value)
        HDfree(*value);

done:
    FUNC_LEAVE_NOAPI(ret_value)
}

Then, you would update H5FD__ros3_open to check the FAPL for your new property using something like:

H5P_genplist_t *plist = NULL;
char           *cred  = NULL;

if (NULL == (plist = H5P_object_verify(fapl_id, H5P_FILE_ACCESS)))
    HGOTO_ERROR(H5E_PLIST, H5E_BADTYPE, NULL, "not a file access property list")

if ((prop_exists = H5P_exist_plist(plist, ROS3_CRED_PROP_NAME)) < 0)
    HGOTO_ERROR(H5E_PLIST, H5E_CANTGET, NULL, "failed to check if property exists in plist")

if (prop_exists) {
    if (H5P_get(plist, ROS3_CRED_PROP_NAME, &cred) < 0)
        HGOTO_ERROR(H5E_PLIST, H5E_CANTGET, NULL, "unable to get value")

    /* 'cred' now points to internal memory for string stored on FAPL. Do something with 'cred', but don't free or modify it! */
}
2 Likes

Hi @hyoklee,

The use case I am working on also makes use of the temporary credentials of AWS S3. Therefore, I am pretty confident that my solution will also work for NASA cloud data access.

1 Like

Hi @jhenderson,

Thanks for your suggestions. Reading your 2 suggestions, I think the best solution is the “add a separate FAPL API routine for specifying session tokens”. It does not break the API and avoids the usage of an environment variable. Personally, I consider the usage of environment variables a last resort solution.

I did not know one could extend a property list like this. For me it demonstrates that HDF5 is more flexible than I expected to be. Very nice to have this flexibility.

2 Likes