How to add data in Azure Storage container to HSDS server

I have deployed an HSDS server on Azure using Docker, following the docs. I have an Azure storage container that contains .h5 files (from the NREL WTK dataset).

My question is: how do I load/link the data in the container to the HSDS server?

The server starts successfully with runall.sh and I get a response from /about.

/domains returns
{"domains": [], "hrefs": []}
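For reference, this is roughly how I'm hitting the endpoints (the endpoint URL is a placeholder for my deployment):

curl http://<hsds-endpoint>/about
curl http://<hsds-endpoint>/domains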

How can I add domains and link all the .h5 files that are already in a folder in the Azure Storage container?

thank you

Brendon

Hey, thanks for trying out HSDS on Azure!

Did you go through the post-install configuration? (https://github.com/HDFGroup/hsds/blob/master/docs/post_install.md)

Once you do that, you should see home and user folders show up in the storage container.
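In case it saves you a lookup, the post-install steps boil down to something like this with the h5pyd command-line tools (flags from memory, so double-check the doc; <passwd> and <username> are placeholders):

# create the top-level home folder (run with the admin account)
hstouch -u admin -p <passwd> /home/
# create a folder for each user, owned by that user
hstouch -u admin -p <passwd> -o <username> /home/<username>/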

For HSDS to “see” an HDF5 file, the file needs to be loaded using the hsload utility (part of the h5pyd package). On AWS it’s possible to have an HDF5 file on S3 and hsload it from there. With the “hsload --link” option, only the metadata needs to be copied; otherwise, both the chunk data and the metadata need to be copied from the source HDF5 file. The latter option requires more storage space (assuming you want to keep the original file), but is slightly faster to access.
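To make that concrete, the two flavors look roughly like this (bucket and folder names are just placeholders):

# full copy: metadata and chunk data are both ingested into HSDS storage
hsload -v myfile.h5 /home/<username>/
# AWS only for now: copy just the metadata and link to the chunks in place
hsload -v --link s3://<bucket>/myfile.h5 /home/<username>/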

On Azure though, we don’t yet have support for the --link option, so your only choice is to use hsload without --link. Since hsload can’t access files on Azure storage, you’ll need to copy the file to a local disk first. Another option would be to use the NREL S3 URI with hsload directly. That will be somewhat slower, but you’ll save the step of getting a local copy. The NREL files are generally really large, so either way it will take some time. You’ll want to run from an Azure instance using nohup, so the load won’t get interrupted if you log out or lose your connection.
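For example, something along these lines (azcopy is just one way to pull the file down; account, container, and user names are placeholders):

# copy the file from Azure Blob storage to local disk
# (a private container may need a SAS token or azcopy login)
azcopy copy "https://<account>.blob.core.windows.net/<container>/wtk-us.h5" wtk-us.h5
# run the load in the background so it survives a dropped connection
nohup hsload -v -u admin -p <passwd> wtk-us.h5 /home/<username>/ > hsload.log 2>&1 &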

Let me know if this helps!


Thanks very much for the info. I’ll work through the option of using hsload on Azure, since we want the best performance, and report back.
Appreciate your assistance!

Hi, I managed to get the test data loaded and can query it using h5pyd (a quick check that does work is at the end of this post). I still cannot figure out how to make use of the NREL wind data though. For business reasons I have to use Azure. I have tried using hsload with the S3 bucket:

hsload -v -u admin -p XXXXXX s3://nrel-pds-hsds/nrel/wtk-us.h5/ /home/test/

but I get the following:

Error opening file s3://nrel-pds-hsds/nrel/wtk-us.h5/: Forbidden

I can, however, see it with the AWS CLI:
aws s3 ls --no-sign-request nrel-pds-hsds/nrel/wtk-us.h5

Is there a way to use --no-sign-request or something similar with hsload?
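For reference, here is the kind of check that works against the test data I loaded (same credentials as above):

hsinfo -u admin -p XXXXXX
hsls -u admin -p XXXXXX /home/test/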

thank you

No, that won’t work: nrel-pds-hsds is not a publicly accessible bucket.

Let me ping my contact at NREL and see if there’s anything that can be done.

Are there other files you are interested in or is it just wtk-us.h5?

Thank you, John. Just the wtk-us.h5 file.

Hi Brendon, sorry, but NREL doesn’t have any data available on Azure (at least not yet).

I suggest you contact NREL to find out more about their timeline. Alternatively, NREL could grant you temporary access to the nrel-pds-hsds bucket in order for you to copy the objects over.

Once you get access to the data, I’ll be happy to help you get HSDS set up on Azure.