How to issue requests to HSDS Lambda?


#1

I successfully deployed the HSDS Lambda function but have no idea how to send requests to it. I used the following example event in the Lambda test page, but it still returned the default response (e.g. "welcome to hsds").

Here is a log line from CloudWatch:
2022-08-24T17:48:08.682-04:00 REQ> GET: /about [/home/admin/hsds.hdf5]
It seems my request was interpreted as /about instead of the path I supplied.

{
  "method": "GET",
  "request": "/datasets/test/value",
  "params": {
    "domain": "/home/admin/hsds.hdf5",
    "select": "[0:100,620,1401]",
    "bucket": "test-bucket"
  }
}


#2

Hi, thanks for trying out HSDS lambda functions!

For the Lambda event, you’ll need to provide a JSON payload that tells the code how to fetch the data. The structure is basically what’s documented here: https://github.com/HDFGroup/hdf-rest-api, wrapped up in a JSON with the keys "method", "path", and "params". (The example in the docs uses an outdated format that had "request" instead of "path".)

method will be one of "GET", "POST", "PUT", or "DELETE" (the standard HTTP actions)
path is just the HTTP path (i.e. the part after the service endpoint)
params are the query parameters for the request

You can also have a "body" key (for POST requests) and a "headers" key for any HTTP headers (typically you don’t need this).

In the AWS Management Console for Lambda, it’s convenient to set up some test events like this one:

{
  "method": "GET",
  "path": "/datasets/d-d29fda32-85f3-11e7-bf89-0242ac110008/value",
  "params": {
    "domain": "/nrel/wtk-us.h5",
    "select": "[:5000,620,1401]",
    "bucket": "nrel-pds-hsds"
  }
}

With these you should get output equivalent to the following h5pyd code:

import h5pyd

f = h5pyd.File("/nrel/wtk-us.h5", bucket="nrel-pds-hsds")
dset = f["windspeed_80m"]  # same dataset as the id in the event above
data = dset[:5000, 620, 1401]
print(data)
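If you want to invoke the function programmatically rather than from the console, a minimal boto3 sketch looks like this (the helper function and the function name "hslambda" are my own illustration, not part of HSDS):

```python
import json

# Build the Lambda event for an HSDS request. The event shape
# ("method" / "path" / "params", plus optional "headers" and "body")
# follows the structure described above.
def make_hsds_event(method, path, params, headers=None, body=None):
    event = {"method": method, "path": path, "params": params}
    if headers:
        event["headers"] = headers
    if body is not None:
        event["body"] = body
    return event

event = make_hsds_event(
    "GET",
    "/datasets/d-d29fda32-85f3-11e7-bf89-0242ac110008/value",
    {"domain": "/nrel/wtk-us.h5",
     "select": "[:5000,620,1401]",
     "bucket": "nrel-pds-hsds"},
)
payload = json.dumps(event)

# Invoking the function (untested sketch; requires AWS credentials):
# import boto3
# client = boto3.client("lambda")
# resp = client.invoke(FunctionName="hslambda", Payload=payload)
# result = json.loads(resp["Payload"].read())
```

The commented-out boto3 call is the standard synchronous `invoke` API; the response payload is the same JSON you see in the console test.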

Let me know if that works for you.

In your example, I’m guessing the Lambda code isn’t able to find any relevant data. It should return a 404 (not found) or 403 (forbidden), but instead it’s just returning the /about response. I’ll look into this.


#3

Hi John,

I replaced "request" with "path" in the JSON and changed the dataset name to its id, as follows. It seems the request was parsed correctly, but I still got a 500 status code. The log is also attached.

{
  "method": "GET",
  "path": "/datasets/d-2f22962d-e0592840-e15c-625d94-9e4cbf/value",
  "params": {
    "domain": "/home/admin/hsds.hdf5",
    "select": "[0:100,0:100]",
    "bucket": "hsds-hyperslab-test"
  }
}

REQ> GET: /datasets/d-2f22962d-e0592840-e15c-625d94-9e4cbf/value [/home/admin/hsds.hdf5]
sn INFO> getObjectJson d-2f22962d-e0592840-e15c-625d94-9e4cbf
sn INFO> http_get('http+unix://%2Ftmp%2Fhs161b1acd%2Fdn_1.sock/datasets/d-2f22962d-e0592840-e15c-625d94-9e4cbf')
sn INFO> Initiating UnixConnector with path: /tmp/hs161b1acd/dn_1.sock
sn INFO> Socket Ready: /tmp/hs161b1acd/dn_1.sock
sn INFO> http_get status: 500 for req: http://127.0.0.1/datasets/d-2f22962d-e0592840-e15c-625d94-9e4cbf
sn ERROR> request to http://127.0.0.1/datasets/d-2f22962d-e0592840-e15c-625d94-9e4cbf failed with code: 500
INFO: hsds app stop at 1661538878.1163843
INFO: sending SIGINT to hsds-servicenode
INFO: sending SIGINT to hsds-rangeget
INFO: sending SIGINT to hsds-datanode


#4

Do you see any ERROR lines in the CloudWatch logs that start with "dn"?


#5

There is one line:

dn1 ERROR> FileClient init: root_dir config not set


#6

Ah - you’ll need to set the AWS_S3_GATEWAY environment variable in the Lambda configuration to "http://s3.region.amazonaws.com", where region is the region you are running Lambda in (e.g. us-east-1).
Also set BUCKET_NAME if you want a default bucket location.
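For reference, here’s a sketch of applying those settings with boto3 (the region, bucket, and function name below are placeholders — substitute your own):

```python
# Environment variables the HSDS Lambda expects. The values here are
# placeholders: use your own region and bucket name.
env_vars = {
    "AWS_S3_GATEWAY": "http://s3.us-east-1.amazonaws.com",
    "BUCKET_NAME": "my-default-bucket",  # optional default bucket
}

# Applying them programmatically (untested sketch; requires AWS credentials):
# import boto3
# boto3.client("lambda").update_function_configuration(
#     FunctionName="hslambda",
#     Environment={"Variables": env_vars},
# )
```

The same variables can of course be set by hand on the function’s Configuration tab in the console.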


#7

I see. Let me give it a try.


#8

dn no longer reports an error, but sn still fails with the same 500 code.

sn INFO> http_get status: 500 for req: http://127.0.0.1/datasets/d-2f22962d-e0592840-e15c-625d94-9e4cbf
sn ERROR> request to http://127.0.0.1/datasets/d-2f22962d-e0592840-e15c-625d94-9e4cbf failed with code: 500


#9

Are you sure there isn’t an ERROR line from the dn node? Typically the SN errors reflect an error from the DN node. In CloudWatch, you’ll want to click "Load more" to see older events.

Also, try setting the LOG_LEVEL to DEBUG in the configuration. This might provide more clues to what is going on.

If nothing obvious shows up, please download the logs and send them to help@hdfgroup.org. If you use an event like the NREL one I posted earlier, I can run it myself and compare the results.


#10

I solved the previous error by building the image from the repo. However, I got a 403 error when sending the value request, even though I’ve set read permission for the Lambda function using hsacl.

2022-08-29T16:18:59.094-04:00 make_request: http+unix://%2Ftmp%2Fhs91fa38fa%2Fsn_1.sock/datasets/d-2f22962d-e0592840-e15c-625d94-9e4cbf/value
2022-08-29T16:18:59.403-04:00 got status_code: 403 from req: http+unix://%2Ftmp%2Fhs91fa38fa%2Fsn_1.sock/datasets/d-2f22962d-e0592840-e15c-625d94-9e4cbf/value
2022-08-29T16:18:59.405-04:00 request done

I’ve sent the whole log to your email.


#11

I tested a smaller selection and it finally worked.


#12

To make a larger selection (about 1 GB of data), I changed max_request_size and client_max_body_size to larger values. But when I ran the Lambda, it failed with an out-of-memory error (I set the limit to 10 GB). How does reading 1 GB of data exhaust 10 GB of memory? Is there anything I did wrong here?


#13

Glad it worked with the smaller selection!

One problem with the Lambda implementation for HSDS has been that it only supported JSON responses. For data selections, converting binary data to JSON adds a lot of overhead and memory usage.

To improve this, I pushed out an HSDS update yesterday that enables hex-encoded responses. You just need to add an accept header specifying "application/octet-stream". Here’s an example event:

{
  "method": "GET",
  "path": "/datasets/d-096b7930-5dc5b556-dbc8-00c5ad-8aca89/value",
  "headers": {
    "accept": "application/octet-stream"
  },
  "params": {
    "domain": "/nrel/nsrdb/v3/nsrdb_2000.h5",
    "select": "[0:1000,0:1000]",
    "bucket": "nrel-pds-hsds"
  }
}

You’ll still get a JSON response from Lambda, but the body key will have a hex-encoded value (i.e. it will use twice as many bytes as the binary equivalent).
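Decoding the hex body on the client side is straightforward with the standard library. A small sketch (the sample values and the float32 dtype are made up for illustration — use your dataset’s actual dtype):

```python
import struct

# A hex-encoded body uses two ASCII characters per byte, so it is
# twice the size of the binary data it encodes.
hex_body = "0000803f00000040"          # example: two little-endian float32s
raw = bytes.fromhex(hex_body)          # back to binary -- half the size
assert len(raw) == len(hex_body) // 2

# Interpret the bytes with the dataset's dtype (float32 assumed here)
values = struct.unpack("<%df" % (len(raw) // 4), raw)
print(values)                          # -> (1.0, 2.0)
```

With numpy installed, `np.frombuffer(raw, dtype="<f4")` would do the same conversion and can be reshaped to match the selection.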

The above request took 6.3 seconds to run and consumed 268 MB of memory.

I was hopeful that a larger selection would work as well, but with a [1000,10000] selection I get an AWS error:

"Response payload size exceeded maximum allowed payload size (6291556 bytes)."

So it looks like there’s no support yet for responses larger than 6MB.

It would be nice if AWS Lambda supported true binary responses and HTTP streaming, as discussed in my blog post from last week: https://www.hdfgroup.org/2022/08/hsds-streaming/. Amazon has been adding new features to Lambda each year, so we can be hopeful!

BTW, you can now use h5pyd with Lambda. You just need to setup your .hscfg like this:

hs_endpoint = http+lambda://hslambda
hs_username = hslambda
hs_password = lambda
hs_api_key = None

Here the endpoint is "http+lambda://" plus the name of your Lambda function ("hslambda" in my case). Other than that, h5pyd programs should work the same as with a regular HSDS server (if not as fast).

Anyway, would just setting up an HSDS server be the best approach in your case? In my view, Lambda works best for moderately sized selections where there is a large reduction in the amount of data returned vs. the amount of chunk data touched. E.g. with the NSRDB data above, a selection of [1234, 0:2018392] reads 10 GB of chunk data to return a 4 MB response. In that case it’s a big advantage to run Lambda in the same AWS region as the S3 store, versus moving the entire 10 GB out of Amazon (say, if you were using the ros3 VFD on your laptop).
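A quick back-of-envelope for that example (the 2-byte sample size is my assumption about the dataset’s dtype; the 10 GB figure is the one quoted above):

```python
# Rough arithmetic for the [1234, 0:2018392] selection.
n_samples = 2018392                    # elements in the selection
sample_bytes = 2                       # assumed dtype (e.g. int16)
response_mb = n_samples * sample_bytes / 2**20
chunk_gb = 10                          # chunk data touched, per the post

print(f"response ~{response_mb:.1f} MB")   # close to the 4 MB quoted
print(f"reduction ~{chunk_gb * 2**30 / (n_samples * sample_bytes):.0f}x")
```

So the selection returns roughly a 2500x reduction over the chunk data read, which is exactly the regime where running the reads next to S3 pays off.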