Hi! We are trying to use the NREL sup3r data but are having no luck. When I try to run the "using the data" example (sup3r/examples/sup3rcc/using_the_data.ipynb in the NREL/sup3r GitHub repo), I get the same error discussed elsewhere, namely "Error retrieving data: None". Is it possible that even to run this example I would have to have a local HSDS server set up? Is there any way/setting to increase the capacity of the API?
In analyzing this example, I am able to see data labels, such as:
dsets = handler_2015.dsets
print(dsets)
But I get an error and 0 bytes read when it tries to access the actual data, such as with:
coord = (39.741, -105.171)
gid = handler_2015.lat_lon_gid(coord)
It does not seem like I should have to set up a local HSDS server just for this small example. Thanks for your help!
Colin
Hi,
I hadn’t come across the sup3r package before, thanks for bringing it to my attention.
The NREL HSDS service can be heavily loaded at times. Also, each user's API key has a request limit; if that gets exceeded, I don't think rex surfaces any indication that this is the reason.
I'd suggest trying to run your own HSDS server, at least as an experiment. You can just do: ./runall.sh --no-docker-tcp
and HSDS will be accessible on http://localhost:5101. Is there a way to set up the HSDS endpoint in sup3r?
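As a quick sanity check once it's running, you can hit the server's /about route (just a sketch, using the requests package):

import requests

# the /about route reports the server state and version
resp = requests.get("http://localhost:5101/about")
print(resp.json())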
Thanks, jready, for your response. I started to set up my own HSDS server. I am actually running a Windows machine, so I am trying to run runall.bat. I am struggling a little here because the rex documentation (Highly Scalable Data Service (HSDS) — rex 0.2.83 documentation, nrel.github.io) has a set of instructions for setting up the server that differ from those on the HDF Group HSDS site.
For example, will a Windows machine refer to the file override.yml?
Yes, you can still use override.yml on Windows. The runall.bat script is just:
hsds --root_dir %ROOT_DIR% --host localhost --port 5101 --password_file admin/config/passwd.txt --logfile hs.log --loglevel DEBUG --config_dir=admin/config --count=4
The --config_dir argument gives the directory HSDS looks in for the config.yml and override.yml files.
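For reference, the layout is something like this (illustrative):

admin/config/
    config.yml      # defaults that ship with HSDS
    override.yml    # just the settings you want to change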
I did find a bug when testing on Windows today with Anaconda Python… the HSDS binaries weren’t found in the expected place. I have a fix in the branch “winfix” that you can try out if you like.
There's another issue that needs to get sorted out, though, before you can use the HSDS package to fetch the NREL data from AWS S3…
When the --root_dir argument (or the ROOT_DIR environment variable) is set, HSDS expects to use POSIX for both reading and writing data. In this case, though, we'd want to use S3 for reading from the NREL bucket, but POSIX (i.e. a directory on the machine's drive) for default read/write operations.
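To illustrate the tension (a sketch of the two override.yml settings involved):

aws_s3_gateway: http://s3.us-west-2.amazonaws.com/  # read the NREL bucket from S3
root_dir: /path/to/hsds_data                        # read/write locally via POSIX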
Let me take a look at this and see what I can come up with.
Thanks! This is excellent information.
I will try your winfix branch. I haven't cloned from a branch before, so I am looking for the right command.
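(For reference, it looks like a specific branch can be cloned in one step, e.g. git clone -b winfix https://github.com/HDFGroup/hsds.git - assuming that's the right repo URL.)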
Yes, let me know when you have the reading issue resolved. I will be ready to use it.
So far, I did get a local HSDS server going with runall.bat and I had success using testall.py.
Then I ran hsconfigure as follows:
Updated endpoint [http://localhost:5101]:
Username [test_user1]:
Password [test]:
API Key [None]:
Testing connection…
connection ok
But after that, I was not able to access data with simple code like this:
import h5pyd
with h5pyd.Folder('/nrel/') as f:
    print(list(f))
Thanks so much for your help! I look forward to hearing more.
If you are able to get the server running with runall.bat, then there’s no urgency to get the winfix branch - I think the issue it resolves only comes up in certain Python setups.
The way POSIX data is accessed in HSDS could use some explanation…
For S3, everything is just referred to by the bucket name. So if there's a folder myfolder at s3://mybucket/myfolder/, you would open it in h5pyd with h5pyd.Folder("/myfolder/", bucket="mybucket"). If the bucket name is the default bucket (i.e. the same as the BUCKET_NAME config), you can just do: h5pyd.Folder("/myfolder/").
For POSIX, the equivalent of buckets are the top-level directories under ROOT_DIR. So the equivalent scenario would be a directory structure like ROOT_DIR/mybucket/myfolder/.
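Putting that together, a minimal sketch (myfolder and mybucket are placeholder names):

import h5pyd

# open a folder on a named bucket (an S3 bucket, or a top-level directory under ROOT_DIR)
with h5pyd.Folder('/myfolder/', bucket='mybucket') as f:
    print(list(f))

# if mybucket matches the server's BUCKET_NAME config, the bucket argument can be omitted
with h5pyd.Folder('/myfolder/') as f:
    print(list(f))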
If you can copy the NREL files to your disk under ROOT_DIR, you can access them as above (and it will be quite fast, as everything is local).
As I mentioned in my last post, I expect many people would like to run an HSDS server using POSIX while also accessing S3 data. That will need a bit of coding work though!
Thanks. I will start experimenting with commands based on your explanation to see if I can get it working.
One thing I noticed when testing my local server with testall.py is that I had to use the original config.yml file (copied into override.yml). The instructions in the rex documentation (Highly Scalable Data Service (HSDS) — rex 0.2.83 documentation, nrel.github.io) suggest changing the parameters of override.yml as follows:
aws_region: us-west-2 (original was us-east-1)
aws_s3_gateway: http://s3.us-west-2.amazonaws.com/ (original was null)
aws_s3_no_sign_request: True (original was false)
hsds_endpoint: local (original was http://hsds.hdf.test)
root_dir: /<your_hsds_repo_directory>/hsds_data/ (original was null)
bucket_name: nrel-pds-hsds (original was hsdstest)
Any comments on the other parameters in override.yml?
I can see on my computer that testall.py put the data in the bucket_name (i.e. folder) hsdstest.
Also, I had set root_dir and set hsds_endpoint prior to starting the server with runall.bat.
In the case of h5pyd.Folder("/myfolder/", bucket="mybucket"), both myfolder and mybucket are on my own computer (or is that drive what you are calling S3? Sorry, that part is confusing). So, just to clarify, are you saying that I should manually download the files I want to use before I start accessing them with the h5pyd.Folder command?
Thanks, Colin
I think at least the root_dir setting is wrong. As I was saying, that will cause POSIX to be used for everything, but you'll want S3 for reading from the nrel-pds-hsds bucket at least.
Let me try some things out and get back on this.
Thanks so much for your follow-up! In the meantime, I got my local HSDS server up and running and I am able to run the little snippet of code listed above and other code, such as the notebooks in the NREL/hsds-examples GitHub repo (Examples of using the HSDS Service to Access NREL WIND Toolkit data).
In terms of the root directory, I placed my root directory in the override.yml file as listed above, but I also did a set root_dir=c:\users.…etc… at the command prompt prior to running runall.bat.
But that said, I seem to be able to access the S3 data from the Amazon server.
Right, the way the logic works, if AWS_S3_GATEWAY is set, HSDS will always use S3 to read and write data. So setting ROOT_DIR doesn't actually have any effect - i.e. you can't, say, read data from the NREL bucket and write to a directory under your root dir.
That's probably okay for you to get going - it would be nice to have some way to combine S3 and local access!
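In rough pseudocode, the behavior described above is (a sketch of the logic as described, not the actual HSDS source):

if config.get('aws_s3_gateway'):
    storage = s3      # all reads and writes go through S3
elif config.get('root_dir'):
    storage = posix   # all reads and writes go to the local directory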
One other issue I notice is that while you have aws_s3_no_sign_request set, I get errors if I've set AWS_S3_GATEWAY but not AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID. Do you have these set?
I left those unchanged. So they are:
aws_secret_access_key: xxx
aws_access_key_id: xxx
and my .hscfg file is:
# HDFCloud configuration file
hs_endpoint = http://localhost:5101
hs_username = test_user1
hs_password = test
hs_api_key =
Ok, I see what was going on. I had copied and pasted this from your post above:
aws_region: us-west-2 (original was us-east-1)
aws_s3_gateway: http://s3.us-west-2.amazonaws.com/ (original was null)
aws_s3_no_sign_request: True (original was false)
hsds_endpoint: local (original was http://hsds.hdf.test)
root_dir: /<your_hsds_repo_directory>/hsds_data/ (original was null)
bucket_name: nrel-pds-hsds (original was hsdstest)
But the comments got added to the config values!
If you'd like to add comments in override.yml, you can preface them with a hash like so:
aws_region: us-west-2 # (original was us-east-1)
aws_s3_gateway: http://s3.us-west-2.amazonaws.com/ # (original was null)
aws_s3_no_sign_request: True # (original was false)
bucket_name: nrel-pds-hsds # NREL's HSDS S3 bucket
I just used these three and am able to start HSDS and get output like:
$ hsls /nrel/
nrel_admin folder 2017-08-20 23:46:58 /nrel/
nrel_admin folder 2020-08-12 22:31:43 /nrel/US_wave
nrel_admin folder 2021-02-01 23:42:38 /nrel/building_synthetic_dataset
nrel_admin folder 2021-09-23 23:11:05 /nrel/dsgrid-2018-efs
nrel_admin folder 2019-10-01 16:08:23 /nrel/nsrdb
nrel_admin folder 2020-09-03 16:51:36 /nrel/porotomo
nrel_admin folder 2023-04-21 00:56:35 /nrel/sup3rcc
nrel_admin domain 2023-05-05 17:37:59 /nrel/sup3rcc_conus_mriesm20_ssp585_r1i1p1f1_pressure_2015.h5
nrel_admin folder 2020-08-31 19:36:19 /nrel/umcm
nrel_admin domain 2017-08-21 00:06:29 /nrel/wtk-us.h5
nrel_admin folder 2019-10-01 16:07:38 /nrel/wtk
So, the anonymous S3 access looks to be working.
I’ll put out some updates soon that make it easier to run hsds on the command line (without docker). Stay tuned!
Now that you have HSDS running on your machine, did your sup3r problems get resolved?
Yes, I got my sup3r problem solved. Once my hsds server was running, I used the following:
hsds_kwargs = {'endpoint': 'http://localhost:5101',
               'hs_username': 'test_user1',
               'hs_password': 'test'}
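For reference, the same connection settings can also be passed directly to h5pyd (a sketch, using the sup3rcc domain from the hsls listing above):

import h5pyd

with h5pyd.File('/nrel/sup3rcc_conus_mriesm20_ssp585_r1i1p1f1_pressure_2015.h5', 'r',
                endpoint='http://localhost:5101',
                username='test_user1', password='test') as f:
    print(list(f))  # list the datasets in the domain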
Also, in getting the hsds server running, I had to use:
python setup.py install
because pip install hsds did not seem to work for me.
One thing I found in testing the hsds-examples repo, 01_WTK_introduction.ipynb, was that cell 23 (# Fetch full timeseries data for all seven years) times out and gives a timeout error.
What would cause that? Is there an hsds server setting to solve it?
Thanks!!
I’m in the process of modifying the setup script. Once that’s done, I’ll push to PyPI and verify everything is working.
For running the hsds-examples, I was able to run it (the cell execution took 37 sec), but I was running HSDS on an EC2 instance in the us-west-2 region. Running HSDS on a laptop would be problematic in this case. Here's the issue: to get the seven-year time series, HSDS needs to read a huge amount of data from S3. If this has to be transmitted across a normal broadband connection (e.g. between the AWS data center and your notebook), there are likely to be issues getting all the data through. On the other hand, HSDS running on an EC2 instance (particularly if it's in the same region as the NREL S3 bucket: us-west-2) will have a very fast connection, and there's no problem.
In this setup, I can run the notebook on my laptop (I had opened the notebook file in VSCode), access the HSDS instance on EC2, and everything runs fine. The amount of data returned for the actual timeseries is fairly small. It's the traffic between HSDS and S3 that's the bottleneck.
If you'd rather not go through the bother of running an EC2 instance, try modifying the notebook cell to use a smaller timespan. If a one-year interval works, you can download year by year and then concatenate the results.
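Something along these lines might work (a sketch only - the dataset name and grid indices are assumptions based on the WTK notebook, and leap years are ignored):

import numpy as np
import h5pyd

f = h5pyd.File('/nrel/wtk-us.h5', 'r')
dset = f['windspeed_100m']  # hypothetical dataset name

hours_per_year = 8760
chunks = []
for year in range(7):
    start = year * hours_per_year
    # fetch one year for a single grid cell, then stitch the years together
    chunks.append(dset[start:start + hours_per_year, 500, 500])
timeseries = np.concatenate(chunks)
print(timeseries.shape)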
Please report back on what works or doesn’t work for you!