Presentation: Highly Scalable Data Service (HSDS): an open source reference implementation of HDF5 in the cloud


#1

Hi all,

John Readey (@jreadey) of The HDF Group will be presenting during the May 2020 Allotrope Connect, which is free and accessible for everyone. The abstract for this session follows.

You can join John’s presentation on Wednesday, May 13, at 10:00 a.m. Central by registering.

More info on the entire event can be found at https://www.allotrope.org/may2020-allotrope-connect

Highly Scalable Data Service (HSDS): an open source reference implementation of HDF5 in the cloud

The Hierarchical Data Format (HDF) implements a model for managing and storing data. It comprises an abstract data model, a storage model (the data format), and libraries that implement the abstract model and map the storage model to different storage mechanisms. HDF5 is the current open-source library that provides users a programming interface to a concrete implementation of the abstract models and serves as the basis for the Allotrope Data Format (ADF).

When HDF5 was released, it was designed to address the current and anticipated requirements of modern systems, applications, and storage mechanisms. Unfortunately, these did not include object storage or access as a web service. To meet the HDF community's growing interest in the cloud, The HDF Group has implemented the existing HDF data model as a new, REST-based web service for HDF5 data stores. This new implementation is called the Highly Scalable Data Service (HSDS).

HSDS is an open-source solution for reading and writing HDF data using a storage schema designed to work well with object storage (e.g. AWS S3, Azure Blob Storage) or POSIX-based storage. Developed to make large datasets accessible in a manner that’s both fast and cost-effective, HSDS stores HDF5 files as objects while making the functionality traditionally offered by the HDF5 library accessible to any HTTP client. This presentation provides an overview of HSDS, including how data can be stored either in a POSIX file system or in object-based storage such as AWS S3, Azure Blob Storage, or OpenIO. In addition, we will discuss how HSDS can be deployed on a single machine using Docker or on a cluster using Kubernetes (or AKS on Microsoft Azure).
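To give a rough feel for what "accessible to any HTTP client" can look like, here is a minimal Python sketch that only builds the URL for a dataset read and sends nothing over the network. The `/datasets/<id>/value` path and the `domain` and `select` query parameters follow my understanding of the HDF REST API and should be checked against the HSDS documentation. The endpoint, domain path, and dataset id are hypothetical placeholders, and `build_value_url` is a helper made up for this example.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_value_url(endpoint, domain, dataset_id, select=None):
    """Build a URL for reading dataset values from an HSDS-style REST service.

    Assumed layout (verify against the HSDS docs):
        GET <endpoint>/datasets/<dataset_id>/value?domain=<path>&select=<hyperslab>
    """
    params = {"domain": domain}
    if select is not None:
        # Hyperslab selection, e.g. "[0:10]" for the first ten elements.
        params["select"] = select
    return f"{endpoint.rstrip('/')}/datasets/{dataset_id}/value?{urlencode(params)}"


# Hypothetical values: a local HSDS instance, a domain path, and a dataset id.
url = build_value_url(
    "http://localhost:5101",
    "/home/test_user/example.h5",
    "d-1234",
    select="[0:10]",
)
print(url)

# Decode the query string again to confirm nothing was lost in encoding.
parsed = urlparse(url)
print(parsed.path, parse_qs(parsed.query))
```

Any HTTP client could then issue a GET against a URL like this, with whatever authentication the server requires. In practice most Python users would probably reach for the h5pyd package, which wraps the REST calls behind an h5py-like interface.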


#2

Hi all,

Materials from the 2020 Allotrope Connect have been posted. You can view the recording and the slide deck.

Additional information about the 2020 Allotrope Connect: https://www.allotrope.org/may2020-allotrope-connect

Thanks to John Readey (@jreadey) for the presentation and thanks to everyone who was able to attend live!