We have an HSDS cluster running on AWS EKS (us-west-2).
We are attempting to link to an OpenData H5 dataset (s3://nrel-pds-wtk/canada/v1.1.0bc/wtk_canada_2014.h5) also in us-west-2.
Our cluster is operating normally: we’ve run the HSDS test suites and have been able to load and link to other datasets.
The issue we are seeing is that hsload --link is slow when reading chunk information from the linked file. In utillib.py, a call to get_chunk_info sometimes takes several minutes; at other times, several hundred calls complete per second.
The H5 file is 1.7 TB. We can download the file to a local server, so S3 access itself isn’t an issue.
We’ve also set up an EC2 instance in us-west-2, in the same VPC as the cluster and with a VPC gateway to S3, but we’re still seeing the same performance issue.
Oddly, downloading the entire H5 file to a local server is faster than running hsload --link within us-west-2.
Is this expected behaviour, or are we doing something wrong?
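
In case it helps anyone reproduce the timings, here is a minimal sketch of the kind of wrapper that could be used to log slow calls. The names `timed` and `fake_get_chunk_info` are hypothetical and only for illustration; they are not part of h5pyd or HSDS. The assumption is that the real get_chunk_info in utillib.py could be wrapped the same way.

```python
import time
from functools import wraps


def timed(threshold_s=1.0, log=print):
    """Record the duration of every call and report calls slower than threshold_s."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                wrapper.durations.append(elapsed)  # keep every sample
                if elapsed >= threshold_s:
                    log(f"{fn.__name__} took {elapsed:.2f}s")
        wrapper.durations = []
        return wrapper
    return decorator


# Hypothetical stand-in for the real call; it just sleeps for `delay` seconds.
@timed(threshold_s=0.05)
def fake_get_chunk_info(delay):
    time.sleep(delay)
    return delay
```

After a run, the collected `durations` list shows whether the slowness is spread evenly across calls or concentrated in a few very long ones, which is the pattern described above.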