HDF5 BoF at SC23

Dear members of the HDF5 community,

On behalf of the organizing committee, I invite you to the SC23 HDF5 BoF session “Building on 25 Years of Success” on Wednesday, 15 November 2023, 5:15-6:45 pm MST, rooms 401-402.

Please watch the BoF website for the session agenda, and contact the organizers if you would like to participate in the BoF panel and share your thoughts on HDF5's past, present, and future.

Thank you and see you at SC23!

Elena Pourmal
Dana Robinson
Suren Byna
Quincey Koziol


Just a reminder: the SC23 BoF, “HDF5: Building on 25 Years of Success,” is tomorrow!

Wednesday, 15 November 2023, 5:15pm-6:45pm MST
Location: 401-402

HDF5 is a unique, open-source, high-performance technology suite that consists of an abstract data model, library, and file format used for storing and managing extremely large and/or complex data collections. It is used worldwide by government, industry, and academia in a wide range of science, engineering, and business disciplines.

The HDF5 community is both deep and broad. Every major HPC system vendor includes HDF5 in its core software because science applications have adopted it so widely and because it improves I/O performance and data organization in HPC environments. In addition, over 1,000 projects on GitHub use HDF5 because of its versatile, self-describing data model, which can represent very complex data objects, the relationships between them, and their metadata; its portable binary file format with no limits on the number or size of data objects; its software library optimized for efficient I/O; and its tools for managing, manipulating, viewing, and analyzing HDF5 data.

The HDF5 community has continued adding features for accessing data in object and cloud storage, as well as for exploiting the storage systems being deployed on today's exascale systems. These features take advantage of the new storage paradigms and require minimal changes to existing HDF5 applications. Over the past decade, the volume of simulation, modeling, experimental, and observational data stored in HDF5, and the rate at which this data is collected, have created new challenges for scientists and prompted requests to use these new storage paradigms. Moreover, AI applications that use HDF5 need to read the same data many times and to shuffle it.

The HDF Group, The Ohio State University, Lawrence Berkeley Lab, Lifeboat, LLC, and Amazon AWS HPC teams have been working on enhancing HDF5 to address these challenges. We will present the latest HDF5 enhancements that help applications run on exascale systems, exploit object storage, migrate to the cloud, and collect and store new types of data. We will demonstrate how the HDF5 Virtual Object Layer (VOL) and Virtual File Driver (VFD) architectures now allow users to tackle scalable I/O on parallel file systems, data access on object stores, asynchronous I/O, multi-threaded access to data, and more.

The target audience of this BoF includes a wide range of HDF5 users: existing users, such as Exascale Computing Project (ECP) application developers and accelerator scientists, and new users, such as the high-energy physics community, which is exploring HDF5 as an alternative file format.

Our session format is designed to encourage HDF5 community members to discuss the challenges they face when using HDF5 and to give feedback to HDF5 developers. We will present a brief HDF5 roadmap, then invite current HDF5 users to share their experiences applying HDF5's many features to real-world problems, and finally solicit feedback on HDF5 improvements and gather requirements from new users.

Agenda

| Time slot (MST) | Presenter | Topic |
|---|---|---|
| 17:15–17:25 | Dana Robinson (The HDF Group) | Introduction and HDF5 Roadmap |
| 17:25–17:35 | Jay Lofstead (Sandia National Lab) | Fast, Searchable Data Annotations for Accelerating Time to Insight |
| 17:35–17:45 | Ravi Madduri (Argonne National Lab) | Advanced Privacy-Preserving Federated Learning as a Service: Challenges and Opportunities |
| 17:45–17:55 | William Godoy (Oak Ridge National Lab) | HDF5 as a critical component in the Julia HPC ecosystem |
| 17:55–18:05 | Johannes Blaschke (Lawrence Berkeley Lab) | Perspectives from Data-Intensive HPC at NERSC |
| 18:15–18:25 | Glenn Lockwood (Microsoft) | I/O middleware for artificial intelligence: real intelligence required |
| 18:25–18:45 | Panel | The next 25 years of HDF5 |

Hi all,

We have posted links to all the talks from our 2023 Birds of a Feather (BoF) session, “HDF5: Building on 25 Years of Success,” at SC23. You can check them out on our site:

Feel free to post here or reach out if you'd like to continue the discussion on any of these topics!