How do I solve this HSDS S3 error?

I am installing HSDS on a CentOS 7 system. I have been installing it outside of Docker because my end client is not able to use Docker. I am also pinned to commit 6ad25d3 on master because my client must use Python 3.6, and there have been some changes in the latest master that use features from Python 3.7.

I have done the following (summarized as shell commands after this list):

  • Created a new Python virtual environment for HSDS (Python 3.6.8)
  • Cloned the HSDS repo and checked out commit 6ad25d3.
  • Copied admin/config/passwd.default to admin/config/passwd.txt and added the username and password I’m going to use with HSDS.
  • Installed the requirements in the virtual environment by running pip install . from the top level of the repo folder. Also installed h5pyd.
  • Created an S3 bucket called “hsds-example” in my S3 account, created an IAM user, and created a policy for the IAM user that allows listing, reading, and writing the contents of the hsds-example bucket.
  • Set the following environment variables (sensitive values redacted):
AWS_ACCESS_KEY_ID=<from S3 IAM user, redacted>
AWS_SECRET_ACCESS_KEY=<from S3 IAM user, redacted>
BUCKET_NAME="hsds-example"
AWS_REGION="us-east-1"
AWS_S3_GATEWAY="http://s3.amazonaws.com"
HS_ENDPOINT="http://127.0.0.1:5101"    # Use the machine's DNS name or create a virtual name in /etc/hosts
HS_USERNAME=<redacted>
HS_PASSWORD=<redacted>
CONFIG_DIR="/var/www/html/hsds/admin/config"
PASSWORD_FILE="/var/www/html/hsds/admin/config/passwd.txt"
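
In shell terms, the local setup steps were roughly as follows (values redacted; the line I added to passwd.txt matches the format of the entries already in passwd.default):

python3.6 -m venv venv-hsds && source venv-hsds/bin/activate
git clone https://github.com/HDFGroup/hsds.git && cd hsds
git checkout 6ad25d3
cp admin/config/passwd.default admin/config/passwd.txt
# appended my own user entry to passwd.txt here, matching the existing lines
pip install .
pip install h5pyd
# then exported the environment variables listed above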

The HS_USERNAME and HS_PASSWORD variables match the username and password I wrote in the passwd.txt file, and I got the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY values directly from my S3 IAM user settings.

I am using port 5101 because this setup has the load balancer running at port 5100, one service node at 5101, and one data node at 6101. I assumed I needed to use port 5101 because the service node is responsible for receiving requests and controlling the data node.

To start the server, I activate the virtual environment and run this command: hsds --bucket-name hsds-example.

I decided to test whether the server works by using the command-line programs h5pyd provides, so I activated the virtual environment in another window and ran hsinfo twice; the first attempt could not connect at all, and the second reached the server but failed with repeated 500 responses:

2020-06-26 11:07:47,237 connection error: HTTPConnectionPool(host='127.0.0.1', port=5101): Max retries exceeded with url: /about (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc9d1fdc080>: Failed to establish a new connection: [Errno 111] Connection refused',))
Error: Connection Error
(venv-hsds) [joeykleingers@caboose hsds]$ hsinfo
server name: Highly Scalable Data Service (HSDS)
server state: READY
endpoint: http://127.0.0.1:5101
username: joeykleingers
password: ********
Error: HTTPConnectionPool(host='127.0.0.1', port=5101): Max retries exceeded with url: /?domain=%2Fhome (Caused by ResponseError('too many 500 error responses',))

The output from the server:

REQ> GET: / [hsds-example/home]
INFO> got domain: hsds-example/home
INFO> getDomainJson(hsds-example/home, reload=True)
INFO> http_get('http://localhost:6101/domains')
REQ> GET: /domains [hsds-example/home]
INFO> get_metadata_obj: hsds-example/home bucket: None
INFO> getStorJSONObj(hsds-example)/home/.domain.json
ERROR> Unexpected Exception <class 'AttributeError'> get s3 obj home/.domain.json: 'ClientCreatorContext' object has no attribute 'get_object'
WARN> HTTPInternalServerError error for home/.domain.json bucket:hsds-example
INFO> http_get status: 500
WARN> request to http://localhost:6101/domains failed with code: 500
ERROR> Error for http_get_json(http://localhost:6101/domains): 500

What do I need to do to solve this error?

I am also interested in purchasing paid support so that someone from the HDF Group can assist me in setting this up, so if someone could contact me about that, that would be great.

Hi Joey,

I’ve just checked in some changes that may help with running HSDS outside of Docker: https://github.com/HDFGroup/hsds/commit/701765ddf78068f6dce8537bf8f287405ba8f1b3.

One problem I’ve seen is that the latest versions of aiohttp and aiobotocore are not compatible with the current code. In setup.py I have version restrictions for these packages, but I’m hoping to get those removed soon (see https://github.com/HDFGroup/hsds/issues/60).
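
If you do need to stay on the older commit, one workaround to try (this is my read of the 'ClientCreatorContext' error in your log: aiobotocore 1.0 changed create_client() to return an async context manager rather than a client object) is to pin a pre-1.0 aiobotocore in your virtual env:

$ pip install "aiobotocore<1.0"

The exact version bounds setup.py uses at your commit may differ, so check those first.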

I’d recommend moving to Python 3.8 in your virtual env, since that is what the Docker version is on.
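
E.g. (assuming a python3.8 interpreter is already available on the machine; the venv name is just illustrative):

$ python3.8 -m venv venv-hsds
$ source venv-hsds/bin/activate
$ pip install .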

This is the script I’ve been using to run the server:

hsds --bucket-name ${BUCKET_NAME} --password-file ${PWD}/admin/config/passwd.txt --s3-gateway ${AWS_S3_GATEWAY} --access-key-id ${AWS_ACCESS_KEY_ID} --secret-access-key ${AWS_SECRET_ACCESS_KEY} hsds-example

The head node, data node, and service node all support an /about method, so you can do a curl to verify that they are running on the expected ports. E.g.

$ curl http://localhost:5101/about
{"start_time": 1593649043, "state": "READY", "hsds_version": "0.6_beta", "name": "Highly Scalable Data Service (HSDS)", "greeting": "Welcome to HSDS!", "about": "HSDS is a webservice for HDF data", "node_count": 1}
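
The other ports can be checked the same way (assuming, per your setup, the head node/load balancer is on 5100 and the data node on 6101):

$ curl http://localhost:5100/about
$ curl http://localhost:6101/about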

Why are you using a load balancer? Do you intend to run multiple SN nodes?