Hi,
I successfully deployed HSDS on my local machine and connected it with an S3 bucket. I tested reading/writing using h5pyd and both worked. However, one thing that confused me is that on my S3 dashboard, I can only see folders (e.g. /home/admin/test.h5) and json files, without data files. Since HSDS stores each chunk as a separate object, I thought I would see as many data files as the number of chunks. This is the first time I've used S3, so I'm not sure if there is anything wrong here.
Best,
Ruochen
Hey,
Do you see a folder in your bucket named “db”? All the data files will be there.
In the HSDS schema, folders and domain names are stored by their path (e.g. /shared/mydata.h5 will be stored in s3://bucketname/shared/mydata.h5/domain.json). Inside domain.json there will be a key for the root group id which points to an object under the db/ path.
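For example, browsing the bucket with the AWS CLI should show something like the following (the bucket name is illustrative, and the exact key layout under db/ is spelled out in the schema doc linked below):

```shell
# Illustrative bucket name; substitute your own.
BUCKET=mybucket

# Domain metadata is stored under the domain's path:
aws s3 ls "s3://$BUCKET/home/admin/test.h5/"
# expect to see domain.json here

# Group, dataset, and chunk objects all live under the db/ prefix:
aws s3 ls "s3://$BUCKET/db/" --recursive | head
```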
The rationale for this approach is that it makes it easy to move or rename domains without having to move each data object.
If you are curious, this doc describes the storage model in some detail: https://github.com/HDFGroup/hsds/blob/master/docs/design/obj_store_schema/obj_store_schema_v2.md.
BTW, if you run: hsinfo <domain_name>, you’ll get information about the number of json objects, data objects, storage size, etc. for that domain. E.g.:
$ hsinfo /shared/bioconductor/tenx_full.h5
domain: /shared/bioconductor/tenx_full.h5
owner: admin
id: g-eed60fdd-3e56eab3-665e-8755b6-de623b
last modified: 2021-06-19 22:22:58
last scan: 2021-06-19 22:19:22
md5 sum: 4fafc30a05df174ca5cc8f05e4c6e659
total_size: 6112211528
allocated_bytes: 6112210230
metadata_bytes: 875
num objects: 2
num chunks: 105273
Yeah, I see the db folder and the data. Thanks a lot for the help! BTW, I'm curious: does HSDS support multiple datanodes with different configurations, e.g. one using S3 and the others using POSIX?
Best,
Ruochen
No, each DN node in a deployment needs to have the same configuration.
There’s nothing to stop you from having two different HSDS deployments on the same machine though. You’d have one endpoint that would serve S3 data and another for POSIX.
Similarly, you can have two HSDS deployments on a Kubernetes cluster. As long as the deployments are in different namespaces, the HSDS pods will just talk to pods in their own deployment.
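A sketch of what that could look like (the namespace names and yaml file names here are hypothetical; use whatever deployment files your setup has):

```shell
# Two isolated deployments, one per namespace
kubectl create namespace hsds-s3
kubectl create namespace hsds-posix

# Apply each deployment's yaml into its own namespace
# (file names are illustrative)
kubectl apply -n hsds-s3 -f hsds-s3-deployment.yml
kubectl apply -n hsds-posix -f hsds-posix-deployment.yml
```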
Hi, I tried to deploy two different HSDS deployments by running docker-compose on two files separately. I changed the project name (both COMPOSE_PROJECT_NAME and container_name in the yml file) to guarantee the two deployments have different names. However, the service failed to start (503) after I changed those names, although all containers were created successfully. Is there anything wrong here?
Looks like the compose yml is using some hard-coded container names. I’ve updated the compose files on GitHub and made a code fix. Please try it out and let me know if this resolves the issue.
If you are using the runall.sh script, set the COMPOSE_PROJECT_NAME env var to the desired value (otherwise it will default to “hsds”). You’ll also need to set SN_PORT so the public ports don’t clash.
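For example, to bring up two deployments side by side (a sketch, assuming runall.sh picks up these env vars as described; adjust project names and ports to taste):

```shell
# S3-backed deployment, SN exposed on host port 5101
COMPOSE_PROJECT_NAME=aws SN_PORT=5101 ./runall.sh

# POSIX-backed deployment, SN on 8080 so the public ports don't clash
COMPOSE_PROJECT_NAME=posix SN_PORT=8080 ./runall.sh
```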
Grab the latest code from master, or pull the image from Docker Hub: hdfgroup/hsds:v0.7.0beta4
It works perfectly now. Thank you!
Awesome! Glad to hear it.
Hi John, I just hit another problem: after starting the two services, I tried hsinfo and hstouch to create folders, but it only works for one service. For the other one, it keeps returning Error: [Errno 400] Invalid domain name. I used hsconfigure to change the endpoint before doing this and the connection was OK. For these two services, I changed all the ports (head, sn, dn, rangeget) to be different in the yml files. Is there anything I did wrong here?
It also doesn’t work for AWS, even if it is the first service I started.
Hey,
You shouldn’t need to mess with the head port, etc. It’s only if a port is exposed on the host that there’s a potential for conflict. If you look at the port lines in docker-compose, e.g. for the posix version: https://github.com/HDFGroup/hsds/blob/master/admin/docker/docker-compose.posix.yml, you should see only SN_PORT has both external and internal mappings.
(actually I goofed in my last update and forgot to remove the external port for the rangeget proxy. I’ve fixed this now.)
In general, if the COMPOSE_PROJECT_NAME is different, two containers can have the same internal port, but they’ll be on different internal networks, so shouldn’t conflict.
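You can verify the isolation with docker itself (network names below assume project names “aws” and “posix”):

```shell
# Each compose project gets its own default bridge network:
docker network ls | grep -E 'aws|posix'
# e.g. aws_default and posix_default

# Service names like "head" and "dn" resolve only within a project's
# own network, so one deployment never sees the other's containers:
docker network inspect aws_default \
  --format '{{range .Containers}}{{.Name}} {{end}}'
```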
I set up two projects, one using AWS, the other using POSIX. Here’s what my docker ps looks like:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
66e0e9140ef4 hdfgroup/hsds "/bin/bash -c 'sourc…" 12 minutes ago Up 12 minutes 5100-5999/tcp, 0.0.0.0:32777->6101/tcp posix_dn_1
5f2bc6a08d5b hdfgroup/hsds "/bin/bash -c 'sourc…" 13 minutes ago Up 12 minutes 5100-5999/tcp, 0.0.0.0:32776->6900/tcp posix_rangeget_1
74a28db22b8e hdfgroup/hsds "/bin/bash -c 'sourc…" 13 minutes ago Up 12 minutes 5100-5999/tcp, 0.0.0.0:8080->8080/tcp posix_sn_1
89782f803c1f hdfgroup/hsds "/bin/bash -c 'sourc…" 13 minutes ago Up 12 minutes 5101-5999/tcp, 0.0.0.0:32775->5100/tcp posix_head_1
9a57450213f5 hdfgroup/hsds "/bin/bash -c 'sourc…" 36 minutes ago Up 36 minutes 5100-5999/tcp, 0.0.0.0:32769->6101/tcp aws_dn_1
6871638c30ba hdfgroup/hsds "/bin/bash -c 'sourc…" 36 minutes ago Up 36 minutes 5100/tcp, 5102-5999/tcp, 0.0.0.0:5101->5101/tcp aws_sn_1
2d07d7c617b2 hdfgroup/hsds "/bin/bash -c 'sourc…" 36 minutes ago Up 36 minutes 5100-5999/tcp, 0.0.0.0:6900->6900/tcp aws_rangeget_1
6525b17c0f80 hdfgroup/hsds "/bin/bash -c 'sourc…" 36 minutes ago Up 36 minutes 5101-5999/tcp, 0.0.0.0:32768->5100/tcp aws_head_1
By setting HS_ENDPOINT to http://localhost:5101 or http://localhost:8080 I can read/write to AWS S3 or local posix respectively.
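So switching between the two deployments is just a matter of the endpoint, e.g.:

```shell
# Talk to the S3-backed deployment:
HS_ENDPOINT=http://localhost:5101 hsinfo

# Talk to the POSIX-backed deployment:
HS_ENDPOINT=http://localhost:8080 hsinfo
```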
Hope that helps!
I still can’t run hsinfo or hstouch successfully, even on a single AWS project. It still returns Error: [Errno 400] Invalid domain name.
My docker ps looks like the following:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
40cd68ff1173 hdfgroup/hsds "/bin/bash -c 'sourc…" 46 seconds ago Up 45 seconds 5100-5999/tcp, 0.0.0.0:49170->6101/tcp, :::49170->6101/tcp hsds_dn_1
8ab41e338d9e hdfgroup/hsds "/bin/bash -c 'sourc…" 48 seconds ago Up 47 seconds 5100/tcp, 5102-5999/tcp, 0.0.0.0:5101->5101/tcp, :::5101->5101/tcp hsds_sn_1
f87c4f497ff1 hdfgroup/hsds "/bin/bash -c 'sourc…" 48 seconds ago Up 46 seconds 5100-5999/tcp, 0.0.0.0:49169->6900/tcp, :::49169->6900/tcp hsds_rangeget_1
123ada60e5b1 hdfgroup/hsds "/bin/bash -c 'sourc…" 48 seconds ago Up 47 seconds 5101-5999/tcp, 0.0.0.0:49168->5100/tcp, :::49168->5100/tcp hsds_head_1
Is HS_ENDPOINT here the hsds_endpoint in the config file? I set it as http://localhost without any port, while I set the endpoint as http://localhost:5101 in hsconfigure.
I’m still not sure if this error is caused by the AWS connection or the local connection, because this error doesn’t occur when I set up a single POSIX project.
If you set the env variable HS_ENDPOINT it will override what’s in the config file. Your hsds_sn_1 container is exposed on port 5101, so that is what you should use.
Try curl http://localhost:5101/about as a sanity check.