Thanks for your question!
The Dockerfile contains the line: EXPOSE 5100-5999. This only documents the range of ports that the SN containers will use for incoming requests. The actual publishing of the ports happens in the docker-compose file, where the number of ports opened equals the number of containers that you are running.
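To make the arithmetic concrete, here is a small Python sketch of how the ports could be assigned, one per SN container. It is only an illustration: the function name sn_host_ports and the choice of 5101 as the first port are assumptions (5101 is taken from the 5101-510n range mentioned in this reply), and the real assignment is whatever your docker-compose file says.

```python
# Mirrors the documented range from the Dockerfile line "EXPOSE 5100-5999".
EXPOSED_RANGE = range(5100, 6000)


def sn_host_ports(count, first_port=5101):
    """Return one port per SN container, starting at first_port.

    Hypothetical helper for illustration; not part of HSDS.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    ports = list(range(first_port, first_port + count))
    # Reject any port that falls outside the range the Dockerfile documents.
    outside = [p for p in ports if p not in EXPOSED_RANGE]
    if outside:
        raise ValueError(f"ports outside the exposed range: {outside}")
    return ports


print(sn_host_ports(4))  # prints [5101, 5102, 5103, 5104]
```

So with four SN containers you would expect four published ports, not the whole 5100-5999 range.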
Most of these ports (especially if you are using nginx for load balancing) are for container-to-container communication. Ports open to the outside world would typically be port 80 (or 5101-510n if you want to send requests directly to the SN nodes). You can control which of these ports are actually accessible from outside the system using iptables or a similar tool. (On AWS, you can control port access to EC2 instances using security groups.)
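If you want to confirm which ports are actually reachable after adjusting your firewall or security group, a quick check from a client machine might look like the sketch below. It uses only the Python standard library; the function names are made up for this example, and a successful TCP connection only tells you the port is reachable, not that HSDS is healthy behind it.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def open_ports(host, ports, timeout=1.0):
    """Return the subset of ports on host that accept TCP connections."""
    return [p for p in ports if port_is_open(host, p, timeout)]
```

For example, calling open_ports with your server's hostname and the list [80, 5101, 5102] from outside the system should return only the ports you intended to leave open.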
On the other hand, if your clients will be running on the same machine as the server, you don’t need to have any external ports open. It’s hard to anticipate every possible scenario, so the docker-compose file (or k8s_deployment.yml for Kubernetes) will likely need some tweaking for your particular situation.
It is possible to run HSDS without Docker. In fact, we took that approach with the OpenIO integration; see https://www.hdfgroup.org/wp-content/uploads/2019/09/OpenIO-HDF-ESRFV-final20190917.pdf. It is a bit more work, however: you need to have the appropriate runtime environment (Python version and packages) and then manage a bunch of sub-processes.
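To give a rough idea of what "managing a bunch of sub-processes" involves, here is a generic Python sketch that starts several child processes and shuts them down cleanly. It is not how HSDS itself is launched: PLACEHOLDER_CMD is a stand-in that just sleeps, and you would need to substitute the real node commands and environment for your installation.

```python
import subprocess
import sys

# Placeholder command (an assumption for illustration): a child that sleeps.
# Replace it with the actual command line for each node process.
PLACEHOLDER_CMD = [sys.executable, "-c", "import time; time.sleep(60)"]


def start_nodes(commands):
    """Start one child process per command and return the Popen handles."""
    return [subprocess.Popen(cmd) for cmd in commands]


def stop_nodes(procs, timeout=5.0):
    """Ask each child to terminate, killing any that do not exit in time."""
    for proc in procs:
        if proc.poll() is None:  # still running
            proc.terminate()
    for proc in procs:
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
```

A typical use would be procs = start_nodes([PLACEHOLDER_CMD] * 4) at startup and stop_nodes(procs) on shutdown. Restarting crashed children and collecting their logs are the parts Docker otherwise handles for you.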
We are planning to support running processes directly as part of our “direct access” project, i.e. server features without the server. See: https://github.com/HDFGroup/hsds/blob/master/docs/design/direct_access/direct_access.md.