Error using HSDS to download NSRDB data

Hello, I recently posted this issue on GitHub, but I am posting here as well in case this forum is monitored more frequently.

I have set up a local HSDS server using a version of these steps. I am trying to run SAM using NSRDB data with the get_SAM_gid function. A small sample of my code looks like this:

from rex import NSRDBX

with NSRDBX(f'/nrel/nsrdb/current/nsrdb_{year}.h5', hsds=True) as f:
    data_segment = f.get_SAM_gid(gid_set)

where gid_set is a set of gids in the NSRDB data. Recently, however, when I run this code I get a connection error with the HSDS server. This is a snippet of the output from the terminal where I start the HSDS server:

(hsds) annacheyette@b0-be-83-57-dd-ce hsds % sh runall.sh --no-docker-tcp
use tcp with no-docker option
using password file: admin/config/passwd.default
AWS_S3_GATEWAY set, using nrel-pds-hsds S3 Bucket (verify that this bucket exists)
no docker startup
Using S3 Gateway
set logging to:: 20
logfile: /Users/annacheyette/hsds/hs.log
got command line arg for config_dir: admin/config
INFO:root:using cmd_dir: /opt/homebrew/bin
got command line arg for config_dir: admin/config
got command line arg for config_dir: admin/config
got command line arg for config_dir: admin/config
got command line arg for config_dir: admin/config
got command line arg for config_dir: admin/config
INFO:root:all processes ready!
INFO:root:Ready after: 5.03 s

READY! use endpoint: http://localhost:5101

Exception in callback RequestHandler.connection_made(<_SelectorSoc...e, bufsize=0>>)
handle: <Handle RequestHandler.connection_made(<_SelectorSoc...e, bufsize=0>>)>
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/opt/homebrew/lib/python3.11/site-packages/aiohttp/web_protocol.py", line 275, in connection_made
    tcp_keepalive(real_transport)
  File "/opt/homebrew/lib/python3.11/site-packages/aiohttp/tcp_helpers.py", line 16, in tcp_keepalive
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
  File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/trsock.py", line 74, in setsockopt
    self._sock.setsockopt(*args, **kwargs)
OSError: [Errno 22] Invalid argument
Exception in callback RequestHandler.connection_made(<_SelectorSoc...e, bufsize=0>>)
handle: <Handle RequestHandler.connection_made(<_SelectorSoc...e, bufsize=0>>)>

and here is a snippet from the hs.log file:

sn INFO> read_chunk_hyperslab, chunk_id: c-02780ea8-a9348af3-ddd3-ed384f-f74652_14, bucket: nrel-pds-hsds
dn2 WARN> expected to find c-02780ea8-a9348af3-ddd3-ed384f-f74652_5 in pending_s3_read map
dn3 WARN> expected to find c-02780ea8-a9348af3-ddd3-ed384f-f74652_9 in pending_s3_read map
dn4 INFO> s3Client.get_object(current/nsrdb_2021.h5[86539472:90639152] bucket=nrel-pds-nsrdb) start=1702587810.5875 finish=1702587874.0361 elapsed=63.4486 bytes=4099680
sn INFO> http_get('http://localhost:6102/chunks/c-02780ea8-a9348af3-ddd3-ed384f-f74652_14')
dn2 ERROR> Exception during binary data write: Cannot write to closing transport
dn3 ERROR> Exception during binary data write: Cannot write to closing transport
dn4 INFO> read: 4099680 bytes for key: current/nsrdb_2021.h5
sn INFO> http_get status: 200 for req: http://localhost:6102/chunks/c-02780ea8-a9348af3-ddd3-ed384f-f74652_14
dn2 INFO> s3Client.get_object(current/nsrdb_2021.h5[66041072:70140752] bucket=nrel-pds-nsrdb) start=1702587811.0324 finish=1702587873.8726 elapsed=62.8402 bytes=4099680
dn3 INFO> s3sync nothing to update
dn4 WARN> expected to find c-02780ea8-a9348af3-ddd3-ed384f-f74652_21 in pending_s3_read map
sn INFO> chunk_arr shape: (31536,)
dn2 INFO> read: 4099680 bytes for key: current/nsrdb_2021.h5
dn3 INFO> s3syncCheck no objects to write, sleeping for 1.00
dn4 ERROR> Exception during binary data write: Cannot write to closing transport
sn INFO> data_sel: (slice(441504, 473040, 1),)
dn2 WARN> expected to find c-02780ea8-a9348af3-ddd3-ed384f-f74652_16 in pending_s3_read map
dn4 INFO> s3Client.get_object(current/nsrdb_2021.h5[61941392:66041072] bucket=nrel-pds-nsrdb) start=1702587780.8125 finish=1702587874.2291 elapsed=93.4166 bytes=4099680
sn INFO> np_arr shape: (672755,)
dn2 ERROR> Exception during binary data write: Cannot write to closing transport
dn4 INFO> read: 4099680 bytes for key: current/nsrdb_2021.h5
sn INFO> ChunkCrawler - worker status for chunk c-02780ea8-a9348af3-ddd3-ed384f-f74652_14: 200
dn2 REQ> GET: /chunks/c-02780ea8-a9348af3-ddd3-ed384f-f74652_14 [localhost:6102]
dn4 WARN> expected to find c-02780ea8-a9348af3-ddd3-ed384f-f74652_15 in pending_s3_read map
sn INFO> ChunkCrawler - join complete - count: 22
dn2 INFO> get_metadata_obj: d-02780ea8-a9348af3-ddd3-ed384f-f74652 bucket: nrel-pds-hsds
dn4 ERROR> Exception during binary data write: Cannot write to closing transport
sn INFO> doReadSelection complete - status:  200
dn2 REQ> GET: /chunks/c-02780ea8-a9348af3-ddd3-ed384f-f74652_14 [localhost:6102]
dn4 INFO> s3Client.get_object(current/nsrdb_2021.h5[61941392:66041072] bucket=nrel-pds-nsrdb) 

It appears there is some sort of connection error with the server where the NSRDB data is stored. Do you have any ideas about what could be causing this issue? Thank you!
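To make the slow S3 reads in the hs.log excerpt easier to spot, here is a minimal sketch that pulls the `elapsed` value out of `s3Client.get_object` lines. The function name `slow_s3_reads` and the 30-second threshold are my own assumptions for illustration, and the regular expression is based only on the line format shown in the excerpt above, not on any documented HSDS log format.

```python
import re

# Pattern based only on the get_object lines in the excerpt above
# (assumed format, not a documented HSDS log specification).
ELAPSED_RE = re.compile(
    r"^(?P<node>\S+) INFO> s3Client\.get_object\(.*\belapsed=(?P<elapsed>\d+(?:\.\d+)?)"
)


def slow_s3_reads(lines, threshold_s=30.0):
    """Return (node, elapsed_seconds) for get_object lines slower than threshold_s."""
    hits = []
    for line in lines:
        match = ELAPSED_RE.search(line)
        if match is None:
            continue  # not a completed get_object line
        elapsed = float(match.group("elapsed"))
        if elapsed > threshold_s:
            hits.append((match.group("node"), elapsed))
    return hits


# Two lines copied from the hs.log excerpt above.
sample = [
    "dn4 INFO> s3Client.get_object(current/nsrdb_2021.h5[86539472:90639152] "
    "bucket=nrel-pds-nsrdb) start=1702587810.5875 finish=1702587874.0361 "
    "elapsed=63.4486 bytes=4099680",
    "dn3 INFO> s3sync nothing to update",
]
print(slow_s3_reads(sample))
```

To scan the real file, the same function could be called with `open("hs.log")` in place of `sample`.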

It appears you’re using Python 3.11. HSDS is only CI-tested on Python 3.8-3.10, so this may be a version incompatibility with HSDS itself or with one of its dependencies (possibly aiohttp). @jreadey Any input?

It might be advisable to back off to Python 3.10, since we haven’t done any testing with 3.11…
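As a quick way to confirm which interpreter is actually in use, here is a small sketch that compares the running Python version with the 3.8-3.10 range mentioned earlier in this thread. The helper name `is_tested_python` is hypothetical, and the range is taken from this thread rather than from any HSDS API.

```python
import sys

# Range quoted in this thread: HSDS CI covers Python 3.8 through 3.10.
TESTED_MIN = (3, 8)
TESTED_MAX = (3, 10)


def is_tested_python(version_info=None):
    """Return True if the (major, minor) version lies in the tested range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return TESTED_MIN <= major_minor <= TESTED_MAX


current = "{}.{}".format(*sys.version_info[:2])
if is_tested_python():
    print(f"Python {current} is within the tested range.")
else:
    print(f"Python {current} is outside the tested range 3.8-3.10.")
```

Run it with the same interpreter that launches HSDS (the traceback above shows a Homebrew Python 3.11), since a different environment may report a different version.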

I see some writes in the log. You are not trying to write to the NREL S3 bucket, are you? (That should fail with a permission error.)

Looks like you have set up the AWS_S3_GATEWAY environment variable at least.
Do you have AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID as well? It seems HSDS isn’t doing the right thing for anonymous S3 access.
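One way to answer that without pasting any secrets is to report only whether each variable is set. This is a minimal sketch: the helper name `report_aws_env` is made up for illustration, and it has to be run from the same shell session that starts HSDS, because it only sees the environment of the process it runs in.

```python
import os

# The three variables mentioned in this thread.
AWS_VARS = ("AWS_S3_GATEWAY", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def report_aws_env(environ=None):
    """Map each variable name to True if it is set to a non-empty value."""
    if environ is None:
        environ = os.environ
    return {name: bool(environ.get(name)) for name in AWS_VARS}


# Print only set / not set, never the values themselves.
for name, is_set in report_aws_env().items():
    print(f"{name}: {'set' if is_set else 'not set'}")
```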

How does the following short script work?


import h5pyd
domain_path = "/nrel/nsrdb/conus/nsrdb_conus_2021.h5"
bucket = "nrel-pds-hsds"
h5path = "wind_speed"
f = h5pyd.File(domain_path, bucket=bucket)
dset = f[h5path]
print(dset)
arr = dset[12345, :1000]
print(arr.min(), arr.max(), arr.mean())

Thank you, I went down to Python 3.10 and it seems to be working fine now. I am now wondering, though, whether the local HSDS server on my laptop will shut down if I close my laptop. This seems like an obvious possibility now (and I am probably showing how much I don’t understand about computing here), but I had been under the impression that if I set my script to run using nohup and & through the terminal, it was fine to let my computer sleep. I have since realized that I start getting run errors with HSDS shortly after I close my laptop.

Is it the case that HSDS will only run if my laptop is not in sleep mode? And is there a way to disable this, or should I just change the settings on my computer to not sleep while I am running code that uses HSDS?

Thanks!

Glad to hear it is working with Python 3.10. I’ll investigate the issues with Python 3.11 when I get a chance.

HSDS won’t be running while your laptop is in sleep mode. There shouldn’t be a problem with it starting up again when the laptop comes out of sleep mode.

It depends on your particular brand of laptop, but there should be a setting that lets you close the lid without the computer going to sleep.
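If the laptop does go to sleep partway through a long run, the client script could also retry a failed read once the machine wakes up. The sketch below is a generic retry-with-backoff helper, not part of HSDS, h5pyd, or rex; the name `retry`, its defaults, and the choice to catch `OSError` are assumptions (Python's built-in `ConnectionError` is a subclass of `OSError`, but the exact exception your code sees may differ).

```python
import time


def retry(func, attempts=5, base_delay_s=1.0, exceptions=(OSError,), sleep=time.sleep):
    """Call func(); on a listed exception, wait and try again.

    The delay doubles after each failure (1 s, 2 s, 4 s, ...). The last
    failure is re-raised so the caller still sees the original error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the error
            sleep(base_delay_s * 2 ** attempt)


# Demonstration with a stand-in function that fails twice, then succeeds.
calls = {"count": 0}


def flaky_read():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("server not reachable")
    return "data"


delays = []
print(retry(flaky_read, sleep=delays.append), delays)
```

In a real script, `func` would wrap the actual read, for example a lambda around the `get_SAM_gid` call from the first post, and the default `time.sleep` would be left in place.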