Hello HDF5 community,
Over the past few months, we have been trying to identify use cases for how HDF5 data is typically (batch-)processed across different scenarios. We have also been thinking about how HDFql, a high-level (declarative) programming language for managing HDF5 data, could help with this task.
We now have a proposal (attached to this post) that we would like to share: an extension to HDFql's SELECT operation. In essence, it allows the SELECT operation to read and (post-)process multiple datasets/attributes, potentially stored across multiple HDF5 files. The extension would lower the complexity of batch-processing HDF5 data by expressing it as one single HDFql operation, while guaranteeing excellent performance and access to HDF5 functionality.
We would now like to ask for feedback on the proposal and, if possible, for the typical HDF5 (batch-)processing use cases your organization faces. Feel free to post your feedback/use cases here or to contact us through https://www.hdfql.com/#contact.
Hopefully, this post will trigger a wider discussion on (batch-)processing HDF5 data (a topic that does not seem to be discussed much), so that not only HDFql but the HDF5 ecosystem as a whole can benefit.
We would like to thank @gheber deeply for his great support and feedback on this proposal!
Rick (for the HDFql Team)
hdfql_batch_processing_proposal.pdf (69.9 KB)