HSDS Installation Issues


#1

We are trying to spin up an HSDS server that just uses the local file system. Our machine is a fully up-to-date CentOS 7 x64 machine. We installed the latest Docker and docker-compose as per the instructions. We adjusted the hsds_endpoint and head_endpoint in the config.yml file to be http://192.168.242.121 and copied that file to /etc/hsds/.

When we try to execute the runall.sh file we are getting errors:

(hsds) 252:[mjackson@caboose:hsds]$ sudo ./runall.sh
no persistent storage configured, using OPENIO ephemeral storage, admin/docker/docker-compose.openio.yml
dn cores: 1
AWS_S3_GATEWAY: http://openio:6007
AWS_ACCESS_KEY_ID: demo:demo
AWS_SECRET_ACCESS_KEY: ******
BUCKET_NAME: hsds.test
CORES: 1/1
HSDS_ENDPOINT: http://localhost
PUBLIC_DNS: localhost
no load balancer
setting sn_port to 80
./runall.sh: line 89: docker-compose: command not found
./runall.sh: line 97: pip: command not found
./runall.sh: line 97: pip: command not found
make bucket hsds.test (may need some retries)
./runall.sh: line 104: aws: command not found
./runall.sh: line 104: aws: command not found
./runall.sh: line 104: aws: command not found
./runall.sh: line 104: aws: command not found
./runall.sh: line 104: aws: command not found
./runall.sh: line 104: aws: command not found
./runall.sh: line 104: aws: command not found
./runall.sh: line 104: aws: command not found
./runall.sh: line 104: aws: command not found
./runall.sh: line 104: aws: command not found
./runall.sh: line 107: aws: command not found
failed to create bucket hsds.test
(hsds) 252:[mjackson@caboose:hsds]$

We were trying to configure to just use the local file system for storage but I guess it is using the OpenIO connector? We just did a git clone from today with a git hash of 6b6680aafccbb916419de952e09db41e7632a53d.

We tried some of the tagged releases, but none of those allow local storage, just AWS.

Is there someone at the HDF Group who could help track this down?


Mike Jackson


#2

Hi Mike,

runall.sh is basically a switch that selects between several different docker-compose files that live in hsds/admin/docker/:

  • docker-compose.aws.yml: AWS/S3 storage
  • docker-compose.azure.yml: Azure/Blob storage
  • docker-compose.posix.yml: Local posix storage
  • docker-compose.openio.yml: OpenIO S3 compatible, ephemeral storage

The runall script chooses between these based on the presence or absence of various environment variables. E.g. if AWS_S3_GATEWAY is defined, it assumes you want to use S3 storage. The last resort option is a test version of OpenIO running in a container. This is handy for kicking the tires on the service, but any data stored will go away when the server shuts down!
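As a rough illustration of the selection described above, here is a minimal Python sketch of that kind of switch. It is not the actual runall.sh logic: the order of the checks and the variable name AZURE_CONNECTION_STRING are assumptions, and only AWS_S3_GATEWAY and ROOT_DIR are named in this thread.

```python
import os

COMPOSE_DIR = "admin/docker"


def select_compose_file(env):
    """Return the compose file a runall-style switch might pick for `env`."""
    # A defined AWS_S3_GATEWAY is taken to mean S3 storage.
    if env.get("AWS_S3_GATEWAY"):
        name = "docker-compose.aws.yml"
    # Assumed variable name for the Azure case; not confirmed in this thread.
    elif env.get("AZURE_CONNECTION_STRING"):
        name = "docker-compose.azure.yml"
    # ROOT_DIR points at a local directory, so use POSIX storage.
    elif env.get("ROOT_DIR"):
        name = "docker-compose.posix.yml"
    # Last resort: the ephemeral OpenIO test container.
    else:
        name = "docker-compose.openio.yml"
    return f"{COMPOSE_DIR}/{name}"


if __name__ == "__main__":
    # Show what the current shell environment would select.
    print(select_compose_file(os.environ))
```

With none of these variables visible to the script, the sketch falls through to the OpenIO file, which would be consistent with the "no persistent storage configured" line in the first post.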

If you run docker-compose or pip from the command line, are these commands found? It’s best if you are running Python 3. I’ve found Miniconda the easiest way to set up my Python environment.
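A quick way to answer that question is to ask Python which of the commands it can find on the current PATH. This is only a sketch; the command names are the ones reported as "command not found" in the first post.

```python
import shutil


def missing_commands(names):
    """Return the names from `names` that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Commands that runall.sh could not find in the log from the first post.
    for name in missing_commands(["docker-compose", "pip", "aws"]):
        print(f"not on PATH: {name}")
```

It may be worth running this both as your normal user and under sudo. On many systems sudo resets PATH and drops exported variables, so a command or a ROOT_DIR setting that is visible in your shell may not be visible to a script started with sudo. Whether that is what happened here is a guess.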

The instructions for running with posix are here: https://github.com/HDFGroup/hsds/blob/master/docs/docker_install_posix.md.

Basically you need to create a directory for the data to live in and set the environment variable ROOT_DIR to point to it.

Let us know how this works for you.


#3

We have tried a whole bunch of different scenarios. We ultimately need to run it outside of Docker because our end client is not allowed to run Docker on their systems. I’m down to trying to step through all the various layers and attempting to piece the flow of information together. Right now one of the nodes wants to fire up on 0.0.0.0:5100, which clearly isn’t going to work, but I have no idea where it is getting that address from. We have full DNS name resolution on our internal network.
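A side note on the 0.0.0.0 address: as a listen address it normally means "all local interfaces" rather than a literal host, so a node binding to 0.0.0.0:5100 is not necessarily misconfigured. The sketch below is independent of HSDS. It binds a throwaway listener to 0.0.0.0 on an OS-chosen port and connects to it through the loopback address.

```python
import socket


def reachable_via_loopback():
    """Bind a listener to 0.0.0.0 and connect to it through 127.0.0.1."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("0.0.0.0", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port), timeout=2) as client:
            conn, _ = server.accept()
            conn.close()
            # The client really is connected to the port we bound above.
            return client.getpeername()[1] == port


if __name__ == "__main__":
    print("listener on 0.0.0.0 reachable via 127.0.0.1:", reachable_via_loopback())
```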

I have also found that I need to manually export all the environment variables in my shell. I tried putting them in my .bashrc, but something (VS Code maybe) is doing something to the environment.
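To see what a given shell actually passes on, a small report like the following can be run from each terminal in question (for example the VS Code terminal and a plain login shell). The variable names are the ones mentioned in this thread; the list is only a starting point.

```python
import os

# Variable names taken from this thread; adjust the list to your setup.
NAMES = ["ROOT_DIR", "BUCKET_NAME", "AWS_S3_GATEWAY", "HSDS_ENDPOINT"]


def env_report(names, environ=None):
    """Map each name to its value, or to None when it is unset or empty."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) or None for name in names}


if __name__ == "__main__":
    for name, value in env_report(NAMES).items():
        print(f"{name} = {value if value is not None else '<not set>'}")
```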

We are using Anaconda3 to load up a python virtual environment. We have created /opt/local/hsds and /opt/local/hsds-data for the source and data.

We have gotten as far as trying to run the tests:

(base) 391:[mjackson@caboose:hsds]$ python testall.py
arrayUtilTest

Ran 6 tests in 0.008s

OK
chunkUtilTest
Traceback (most recent call last):
  File "chunkUtilTest.py", line 18, in <module>
    from hsds.util.dsetUtil import getHyperslabSelection
  File "…/…/hsds/util/dsetUtil.py", line 13, in <module>
    from aiohttp.web_exceptions import HTTPBadRequest, HTTPInternalServerError
ModuleNotFoundError: No module named 'aiohttp'
Failed

Thoughts?

We tried to run hsconfigure and got an "authentication" error.


#4

This got us much further down the road.

Thanks
Mike Jackson


#5

It looks like you are not the only one looking to run HSDS outside of docker; this thread is asking for the same thing: "How do I solve this HSDS S3 error?"

I haven’t been running HSDS outside of docker/kubernetes myself - I’ll take a look and get back to you.

Re: the testall error, there are a number of package dependencies that don’t matter much when running from the docker images, but that you’ll need when running without docker. See: https://github.com/HDFGroup/hdf-docker/blob/master/python38/Dockerfile (this is the base image for the hsds image). These should be getting pulled in by https://github.com/HDFGroup/hsds/blob/master/setup.py, so I’m surprised you are getting an aiohttp not found error.
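One way to narrow this down is to check, with the same interpreter that runs testall.py, whether the modules can be imported at all. The sketch below only looks modules up without importing them. The one module name taken from this thread is aiohttp; anything else you add to the list is your own choice.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Shows which Python environment is actually in use.
    print("interpreter:", sys.executable)
    for name in missing_modules(["aiohttp"]):
        print(f"missing module: {name}")
```

Note that the prompt in the failing test run shows (base), while the earlier runall.sh run shows (hsds). It may be that the packages were installed into one conda environment and the tests were run from another.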


#6

Mike,

I’ve updated this thread about running HSDS without docker: "How do I solve this HSDS S3 error?"


#7

Joey is one of my engineers, so we are working toward the same goal.


Mike Jackson


#8

John,

To simplify things a bit, we decided to just set up HSDS in the Docker container and use POSIX storage to take S3 out of the equation so that we can verify we can just get that simple setup working before proceeding to install it without Docker and with S3.

So far we are able to run the Docker image using the ./runall.sh command and then start following the directions in the post_install.md file.

The first issue we ran into was an authentication error because for whatever reason the Docker image wasn’t picking up our passwd.txt file even though it’s located at admin/config in the HSDS repository… we followed the directions and copied the template file to create passwd.txt. I assume that Docker mounts admin/config as /config in the Docker image, and that’s how Docker should be accessing the file. However, that appears to not be happening since we were still getting an authentication error. We got around this by just copying the passwd.txt file into the Docker image itself, but that seems like an insecure/hacky solution.

Next, we ran hsconfigure, created the home directory (hstouch -u admin -p <admin_passwd> /home/), and set up home folders for our user names (hstouch -u admin -p <admin_passwd> -o <username> /home/<username>/).

I was then able to run hsload -v ~/path/to/HDF5/file /home/<username>/ to upload an HDF5 file to my username’s home folder. I verified that it worked by navigating in a terminal to the location of the home folder and the file was sitting there. However, whenever I then try to run hsget -v /home/<username>/ /path/to/created/HDF5/file it gives me this error:

Error opening domain /home/<username>/: [Errno 400] Invalid domain name

As I said, I checked that my domain exists because it’s right there on the file system with the file that I’m trying to download sitting in there. Maybe I’m using the hsget command incorrectly? Is there an example somewhere on how to use that command?

One other question: how do I delete a non-empty user home domain so that I can create it again and re-upload the same file again? I tried using hsdel -v -u admin -p admin --bucket hsds.test /home/<username>/ but it keeps giving me this:

Parent domain: /home/ not found


#9

Getting things going first with Docker makes sense.

When you run runall.sh, do you see this in the first line of output:

ROOT_DIR set, using admin/docker/docker-compose.posix.yml

?

docker-compose.posix.yml mounts $PWD/admin/config to /config in the container, so the password file should be there.

If you do: docker exec -it hsds_sn_1 bash, you can just do: ls /config to verify this.

Not sure what is going on with the domain not found errors, but I’d recommend sorting out the startup issues and then verifying the integration tests pass first.
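On the "Invalid domain name" error from hsget, one thing that might be worth checking is whether the command was given a folder (a path ending in "/") where it expects the domain of a single file. The sketch below assumes that hsload, when given a folder as its target, stores the file under its base name inside that folder. That assumption is not confirmed in this thread, and the user and file names are hypothetical.

```python
import os
import posixpath


def uploaded_domain(folder, local_file):
    """Guess the domain for `local_file` after it was loaded into `folder`."""
    if not folder.endswith("/"):
        raise ValueError("folder domains are expected to end with '/'")
    # Assumption: the file keeps its base name inside the target folder.
    return posixpath.join(folder, os.path.basename(local_file))


if __name__ == "__main__":
    # Hypothetical user name and file path, for illustration only.
    print(uploaded_domain("/home/joey/", "/tmp/data/sample.h5"))
```

If that assumption holds, the first argument to hsget would be the resulting file domain rather than /home/<username>/ itself.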