Hello, I was recently introduced to HDF (the HDF5 format and HSDS).
At our company we have a small internal cloud (Kubernetes-based) for running internal services, and we also run JupyterHub on it. Reading about HSDS and HDF Lab got me thinking about whether we could (and should) deploy something similar to our cloud. As I understand it, HSDS lets you store and access HDF5 files in the cloud as objects in S3-compatible storage. So I was wondering: can HSDS be self-hosted? I could not find this term in the docs, which is why I'm asking here. HSDS is open source, but how hard would it be to deploy it internally? Has anyone already done it? And how hard would it be to connect it to our JupyterHub instance?
Have a look at the Quick Start section! If your JupyterHub instance is running a Python kernel, all you need is `h5pyd`, which is `h5py`-compatible, and you can `pip install` it. That's about as simple as it gets.
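For example, a notebook cell like this would work. This is a minimal sketch: the domain path and dataset name are made up, and it assumes your HSDS endpoint and credentials have already been configured (e.g. with `hsconfigure`):

```python
import numpy as np
import h5pyd  # h5py-compatible client for HSDS; install with: pip install h5pyd

# Write to a "domain" (HSDS's analogue of an HDF5 file path; path is hypothetical)
with h5pyd.File("/home/myuser/sample.h5", "w") as f:
    dset = f.create_dataset("temperature", (100,), dtype="f8")
    dset[0:10] = np.arange(10.0)

# Read it back -- same API as h5py, but the data lives in object storage
with h5pyd.File("/home/myuser/sample.h5", "r") as f:
    print(f["temperature"][0:10])
```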
G.
Yes - the intent is to make it as easy as possible to host HSDS under your own cloud account or on an on-premise server, and many organizations have done just that. You can catch a video overview of HSDS here: https://www.youtube.com/watch?v=9b5TO7drqqE&t=2161s.
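Once a self-hosted instance is up, clients just point at its endpoint. A minimal sketch (the endpoint URL, domain path, and credentials below are placeholders for whatever your deployment uses):

```python
import h5pyd

# Point h5pyd at an internally hosted HSDS endpoint (placeholder values).
# These can also be configured once via `hsconfigure` or the
# HS_ENDPOINT / HS_USERNAME / HS_PASSWORD environment variables.
f = h5pyd.File(
    "/home/myuser/sample.h5",
    "r",
    endpoint="http://hsds.internal.example.com",
    username="myuser",
    password="mypassword",
)
print(list(f))  # list top-level links, just like h5py
f.close()
```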
It's no big deal to connect it with JupyterHub either - that's what we do with HDF Lab: https://www.hdfgroup.org/hdfkitalab/. HDF Lab runs on a Kubernetes cluster with some pods running HSDS and others running users' Jupyter instances. One nice thing about this setup compared with traditional JupyterHub setups is that it's easy to share content: different users can access the same HDF data hosted by HSDS.
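As a sketch of what that sharing could look like from two users' notebooks (user names, domain path, and passwords are invented, and it assumes the owner has granted the second user read access via HSDS's ACLs):

```python
import h5pyd

# "alice" writes results into her domain (all names hypothetical)
with h5pyd.File("/home/alice/results.h5", "a",
                username="alice", password="...") as f:
    f.attrs["status"] = "ready"

# "bob", in his own notebook pod, reads the very same data -- no file copying;
# both sessions talk to the HSDS service backed by shared object storage
with h5pyd.File("/home/alice/results.h5", "r",
                username="bob", password="...") as f:
    print(f.attrs["status"])
```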