Glad it worked with the smaller selection!
One problem with the Lambda implementation for HSDS is that it only supported JSON responses. For data selections, converting binary data to JSON adds a lot of overhead and memory usage.
To improve this I pushed out an HSDS update yesterday that enables hex-encoded responses. You just need to add a header specifying “octet-stream”. Here’s an example event:
{
    "method": "GET",
    "path": "/datasets/d-096b7930-5dc5b556-dbc8-00c5ad-8aca89/value",
    "headers": {
        "accept": "application/octet-stream"
    },
    "params": {
        "domain": "/nrel/nsrdb/v3/nsrdb_2000.h5",
        "select": "[0:1000,0:1000]",
        "bucket": "nrel-pds-hsds"
    }
}
You’ll still get a JSON response from Lambda, but the body key will have a hex-encoded value (i.e. it will use twice as many bytes as the binary equivalent).
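In case it helps, here is a minimal Python sketch of building that event and turning the hex body back into raw bytes. The helper names (build_event, decode_body) are just mine for illustration and not part of HSDS or h5pyd, and I'm assuming the parsed Lambda response is a dict with the hex string under "body" as described above.

```python
import json


def build_event(dataset_id, domain, select, bucket):
    """Build a Lambda event asking HSDS for a hex-encoded data selection."""
    return {
        "method": "GET",
        "path": f"/datasets/{dataset_id}/value",
        # This header is what switches the body from JSON to hex-encoded data
        "headers": {"accept": "application/octet-stream"},
        "params": {"domain": domain, "select": select, "bucket": bucket},
    }


def decode_body(response):
    """Convert the hex-encoded "body" of a parsed Lambda response to bytes.

    Assumes `response` is the already-parsed JSON response (a dict).
    """
    return bytes.fromhex(response["body"])


if __name__ == "__main__":
    event = build_event(
        "d-096b7930-5dc5b556-dbc8-00c5ad-8aca89",
        "/nrel/nsrdb/v3/nsrdb_2000.h5",
        "[0:1000,0:1000]",
        "nrel-pds-hsds",
    )
    print(json.dumps(event, indent=4))
```

If you invoke the function with boto3, the event would normally go in as the JSON-encoded Payload of the Lambda client's invoke call. The bytes you get back from decode_body still need to be interpreted with the dataset's dtype and the selection shape (e.g. with numpy.frombuffer).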
The above request took 6.3 seconds to run and consumed 268 MB of memory.
I was hopeful that a larger selection would work as well, but with a [1000,10000] selection I get an AWS error:
“Response payload size exceeded maximum allowed payload size (6291556 bytes).”
So it looks like there’s no support yet for responses larger than 6 MB.
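One workaround would be to split a large selection into row blocks whose hex-encoded size stays under the limit and issue one request per block. Here's a rough sketch of the arithmetic. The 2-byte element size is an assumption on my part (it is consistent with the 4 MB quoted for a 2,018,392-element row later in this post, but check the dataset's dtype), the 4096-byte headroom for the JSON envelope is a guess, and row_blocks is a hypothetical helper, not something HSDS provides.

```python
from math import prod

# Limit quoted in the AWS error message above
MAX_PAYLOAD_BYTES = 6291556


def hex_payload_bytes(shape, itemsize):
    """Size of the hex-encoded body: two hex characters per binary byte."""
    return prod(shape) * itemsize * 2


def row_blocks(n_rows, n_cols, itemsize, limit=MAX_PAYLOAD_BYTES, headroom=4096):
    """Yield select strings that cover [0:n_rows, 0:n_cols] in row blocks.

    `headroom` reserves space for the rest of the JSON response
    (a guessed value, not a documented figure).
    """
    bytes_per_row = hex_payload_bytes((1, n_cols), itemsize)
    rows_per_block = (limit - headroom) // bytes_per_row
    if rows_per_block < 1:
        raise ValueError("a single row already exceeds the payload limit")
    for start in range(0, n_rows, rows_per_block):
        stop = min(start + rows_per_block, n_rows)
        yield f"[{start}:{stop},0:{n_cols}]"


if __name__ == "__main__":
    itemsize = 2  # assumed element size in bytes
    print(hex_payload_bytes((1000, 1000), itemsize))   # fits under the limit
    print(hex_payload_bytes((1000, 10000), itemsize))  # well over the limit
    for select in row_blocks(1000, 10000, itemsize):
        print(select)
```

Each select string can go into the "select" param of an event like the one above. The cost is more Lambda invocations, each with its own startup time, so this only makes sense if the total is still reasonable for your use case.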
What would be nice would be if AWS Lambda supported true binary responses and HTTP streaming as discussed in my blog from last week: https://www.hdfgroup.org/2022/08/hsds-streaming/. Amazon has been adding new features to Lambda each year, so we can be hopeful!
BTW, you can now use h5pyd with Lambda. You just need to set up your .hscfg like this:
hs_endpoint = http+lambda://hslambda
hs_username = hslambda
hs_password = lambda
hs_api_key = None
Where the endpoint is “http+lambda://” plus the name of your lambda function (“hslambda” in my case). Other than that h5pyd programs should work the same as with a regular HSDS server (if not as fast).
Anyway, would just setting up an HSDS server be the best approach in your case? In my view, Lambda works best for moderately sized selections where the data returned is much smaller than the chunk data touched. E.g. with the NSRDB data above, a selection of [1234, 0:2018392] hits 10 GB of chunk data to return a 4 MB response. In this case, it’s a big advantage to run Lambda in the same AWS region as the S3 store vs. having to move the entire 10 GB out of Amazon (say, if you were using the ros3 VFD on your laptop).