The HDF Group is seeking Beta Testers for HDF5 Spark Connector


The HDF Group is pleased to announce that we are actively developing an HDF5 Spark Connector and are seeking Beta Users for this software. The HDF5 Spark Connector allows users of the Apache Spark open source processing engine to natively query data stored in HDF5 files.

This software is being developed in response to interest from members of the HDF5 user community. Many of them want to use Spark to obtain the same kind of speed, scalability, and reliability in data processing that they look for in I/O from HDF5. To date, they have been hampered by Spark's inability to access HDF5 files directly. As a workaround, they have first had to convert existing data from HDF5 to another storage format that Spark can read directly. We consider this software to be an exciting bridge between two very different but important and influential open source big data technologies:

- In use for more than 30 years, HDF5 (Hierarchical Data Format 5) addresses the problems of how to organize, store, discover, access, analyze, share, and preserve data in the face of enormous growth in size and complexity. Since its release, HDF5 has become the de facto standard for the collection, storage, and provisioning of large, complex scientific datasets. HDF5 and its predecessors have supported mission-critical computing needs for Big Data and NoSQL with open source software since 1989, long before anyone was using the terms Big Data, NoSQL, or open source!

- Apache Spark is a powerful open source processing engine built around speed, ease of use, and sophisticated analytics. Originally developed at UC Berkeley in 2009 and now considered an essential piece of the Hadoop ecosystem, Spark is the largest open source project in data processing. Since its release, Apache Spark has seen rapid adoption by enterprises across a wide range of industries. Internet powerhouses such as Netflix, Yahoo, and eBay have deployed Spark at massive scale, collectively processing multiple petabytes of data on clusters of over 8,000 nodes.

The HDF Group is eager to speak with HDF5 users who are interested in joining the Beta Test program for the HDF5 Spark Connector. As a Beta Tester, you will have the opportunity to begin using this software and, over the next few months, to provide crucial feedback that will help guide the functionality and roadmap for this product.

For more information on the HDF5 Spark Connector:

If you're interested in becoming a beta tester, would like to be kept up-to-date on this product, or have other questions or concerns, you can use this form to communicate with The HDF Group: