Port HDF5 application program to cloud


#1

I wrote a parallel C program that uses parallel HDF5 to write and read data on a local filesystem, and I wish to port it to the cloud. What resources should I consult on how to port my program to the cloud using HSDS? I am not familiar with containers or how to program for them, so any pointers on that aspect would be helpful as well. I asked this question at the Allotrope webinar, and Kita Lab was mentioned as the best way to get started. However, the only information I can find on it is a brief FAQ.


#2

Hi @lnp, you might be interested in the HDF5 REST VOL Connector, which is under development. The REST VOL connector is a plugin for HDF5 that can allow HDF5 C applications to write to/read from HSDS directly. While the connector is not quite HDF5 feature-complete and has not yet been tested with parallel HDF5, we are certainly interested in having more people test the connector and give feedback on what works and doesn’t work for them.


#3

@jhenderson, I am absolutely interested in testing the REST VOL connector with the H5s scanner. I would be building on Windows 10.

  • Do I build the VOL plugin using the master or hdf5_1_12_update branch? H5s is at HDF5 1.12.0.
  • My working understanding is that existing HDF Servers with preloaded data are provided as part of Kita and they are available for initial testing. Do any have public endpoints or is a Kita account required?

#4

Hi @inp,

For KitaLab, are you referring to this page: https://www.hdfgroup.org/hdfkitalab/, which links to the FAQ? Let us know what additional information might be helpful to add to the FAQ. Since KitaLab uses the popular JupyterLab platform, there’s lots of information on the web if you are new to Jupyter. Example: https://www.youtube.com/watch?v=Gzun8PpyBCo.

KitaLab is primarily designed for running Python notebooks that execute in the AWS datacenter, so latency to the server is lower and throughput is higher than you would get accessing the server from your desktop. To experiment with the REST VOL, probably the easiest method would be to just run HSDS on your desktop using your local disk for storage. Once you get things running there, it will be easy to scale up to code running in the cloud and storing your data in S3.

We have instructions for desktop setup here: https://github.com/HDFGroup/hsds/blob/master/docs/docker_install_posix.md. Unfortunately you’ll need to tweak some of the steps for running on Windows, but I don’t think there should be any major issues. Let us know how it goes!


#5

Hi @rhs,

Great! For now, you should build the VOL plugin with the hdf5_1_12_update branch until it gets merged to master. It contains some things that the master branch is missing for compatibility with the HDF5 1.12.0 release. As @jreadey mentioned, it would probably be easiest to set up HSDS locally if you’re mostly just interested in doing some quick testing first before considering moving to the cloud.
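For anyone trying the local setup from C, a small pre-flight check can save some debugging. The sketch below is not part of the connector. It assumes the REST VOL connector picks up the server location and credentials from the environment variables HSDS_ENDPOINT, HSDS_USERNAME and HSDS_PASSWORD (please confirm the names against the connector's README for the branch you build). The helper names count_missing_hsds_vars and endpoint_looks_valid are hypothetical and exist only for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* ASSUMPTION: these are the variables the REST VOL connector reads.
 * Verify the names against the connector's README before relying on them. */
static const char *const REQUIRED_VARS[] = {
    "HSDS_ENDPOINT",
    "HSDS_USERNAME",
    "HSDS_PASSWORD",
};
#define NUM_REQUIRED_VARS (sizeof(REQUIRED_VARS) / sizeof(REQUIRED_VARS[0]))

/* Hypothetical helper: count required variables that are unset or empty.
 * If report is non-NULL, one line per missing variable is written to it. */
int count_missing_hsds_vars(FILE *report)
{
    int missing = 0;
    for (size_t i = 0; i < NUM_REQUIRED_VARS; i++) {
        const char *value = getenv(REQUIRED_VARS[i]);
        if (value == NULL || value[0] == '\0') {
            if (report != NULL)
                fprintf(report, "missing: %s\n", REQUIRED_VARS[i]);
            missing++;
        }
    }
    return missing;
}

/* Hypothetical helper: return 1 if the endpoint starts with an http(s)
 * scheme, 0 otherwise. This is only a shape check, not a connectivity test. */
int endpoint_looks_valid(const char *endpoint)
{
    if (endpoint == NULL)
        return 0;
    return strncmp(endpoint, "http://", 7) == 0 ||
           strncmp(endpoint, "https://", 8) == 0;
}
```

One way to use it is to call count_missing_hsds_vars(stderr) and endpoint_looks_valid(getenv("HSDS_ENDPOINT")) at the top of main, before any HDF5 calls, and exit early if either check fails. That way a misconfigured environment shows up as a clear message rather than as a failed file open.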


#6

Thank you for the information. Right now I am just collecting information about how to tackle this problem; it seems like the REST VOL connector is a promising place to start.


#7

Yes, that looks like the page. I should have added I am using Linux, but as I am not using Python it seems KitaLab is not the right approach for me, which is disappointing because the reduction of latency is very appealing indeed! Since I don’t have full access to AWS yet on a scale that my program needs, the experimentation on my desktop is the right way to go.

Thanks for the explanation and suggestions.