On behalf of my team, I’m happy to announce the first release of HDF5-UDF: user-defined functions for HDF5. The project enables the embedding of Lua scripts in HDF5 files so that users can programmatically define a dataset whose data is generated on-the-fly each time that dataset is read.
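The idea of a dataset that stores a function instead of values can be illustrated without HDF5 at all. The Python sketch below is a conceptual analogy only: the `DerivedDataset` class, its `read()` method and the Kelvin/Celsius values are hypothetical and are not part of HDF5-UDF, whose user-defined functions are written in Lua. It shows one dataset derived from existing data and one grid with no dependency on stored data, both regenerated on every read.

```python
from typing import Callable, List, Tuple

Grid = List[List[float]]


class DerivedDataset:
    """Hypothetical stand-in for a UDF-backed dataset (not the HDF5-UDF API).

    Only the generating function is kept; the values themselves are never
    stored and are recomputed each time read() is called.
    """

    def __init__(self, shape: Tuple[int, int], udf: Callable[[int, int], float]):
        self.shape = shape
        self._udf = udf  # the "user-defined function" that produces each cell

    def read(self) -> Grid:
        rows, cols = self.shape
        # Regenerate every value on demand instead of loading it from storage.
        return [[self._udf(i, j) for j in range(cols)] for i in range(rows)]


# An existing, stored dataset: temperatures in Kelvin (illustrative values).
kelvin: Grid = [[273.15, 300.0], [310.0, 250.0]]

# A variation of the existing data: the same grid in Celsius, defined by a
# function rather than by a second stored copy of the numbers.
celsius = DerivedDataset((2, 2), lambda i, j: kelvin[i][j] - 273.15)

# A grid with no dependency on existing data: a simple index ramp.
ramp = DerivedDataset((2, 3), lambda i, j: float(i * 3 + j))

if __name__ == "__main__":
    print(celsius.read())
    print(ramp.read())
```

Because the derived grid is computed at read time, a change to the source data is reflected the next time the derived dataset is read, and the only thing that has to be stored for it is the function.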
The primary motivation for this project is to dramatically reduce the disk space used by datasets that are a variation of existing data. We have successfully used HDF5-UDF to virtually eliminate the storage cost of derived data in a number of use cases: grids that used to take a few gigabytes on disk, uncompressed, now require just a couple of kilobytes.
Under the hood, the Lua source code is converted to a bytecode representation that LuaJIT executes when the application reads the dataset. Thanks to Just-In-Time compilation, the overhead of this virtualization is barely noticeable: outputting grids that have no dependency on existing datasets can be an order of magnitude faster than reading the equivalent data from disk.
We invite everyone to try it out and to open pull requests. We hope you find it as useful as we do.