Toward Cloud-Optimized HDF5 Files - Poster Session at the AGU Fall Meeting


Aleksandar Jelenak (@ajelenak) will present a poster at the AGU Fall Meeting next week, during the session on Monday, December 12th, from 2:45 p.m. to 6:15 p.m. in McCormick Place, Poster Hall, Hall A (South, Level 3).

IN15B-0284 - Toward Cloud-Optimized HDF5 Files

Earth Science data in the HDF5 format are prevalent, although sometimes under different names (netCDF-4 files, for example, are HDF5 files). As cloud computing is increasingly adopted as the foundation of open science research, the accessibility of geoscience HDF5 data in cloud object storage has become an important factor in developing efficient cloud-based data analysis workflows. For existing file formats, one approach is to reorganize the internal file structures so that cloud-based data access becomes more efficient and therefore faster. This approach has been demonstrated with great success in recent years by the Cloud-Optimized GeoTIFF, a very popular file format for land remote sensing data.

Achieving something similar with the HDF5 file format would bring several benefits: increased usability of HDF5 files across different storage systems and computing frameworks; less data reformatting; minimal data duplication in cloud object stores; and less need for custom HDF5 format readers with limited features. We will present technical information and best practices for producing cloud-optimized HDF5 files using currently available HDF5 library settings. We will demonstrate the benefits of cloud-optimized HDF5 files with a few tasks typical of use cases involving local file access, OPeNDAP web services, and a software stack based on the xarray Python package. Data producers, cloud data managers, DevOps engineers, and geoscientists should be aware of this information to shorten migration time and avoid any loss of data usability.

Complete information is available in the AGU Fall Meeting program:

We will post a link to the poster shortly. If you're not attending the AGU meeting, feel free to post your questions for Aleksandar here, or join his next Call the Doctor session on Tuesday, December 20th, at 12:20 p.m. Central (US & Canada).