HDF5-UDF 2.0 released: UDF signing, trust profiles, Python bindings, and more!

Thanks! Porting HDF5-UDF to other operating systems should not be too difficult. Given that it’s possible to disable sandboxing, one should be able to get basic functionality and then progressively incorporate security-related features.

The codebase is primarily written in C++. Python is one of the programming languages you can use to write UDFs (the other two are Lua and C/C++) and, starting with this release, the first one to provide bindings for HDF5-UDF’s main library.

On the purpose of UDFs: they are meant to extend HDF5 by letting you generate datasets procedurally. There are several use cases:

  • Data virtualization: use HDF5 as an interface for files in other formats such as CSV and GeoTIFF

  • Gateway for IoT devices: embed the logic to retrieve live data from sensors and arrange it as if it were a static HDF5 dataset

  • Storage and network bandwidth savings: if you have a dataset C that’s produced by combining datasets A and B, you can attach that logic as a compiled UDF instead of storing C, which grows the HDF5 file by only a few KBs

  • Process data where data lives: UDFs bring HDF5 one step closer to computational storage

  • Process data when it’s needed: some data ingestion pipelines preprocess data in advance in the hope that the output will be used at some point. When such preprocessing scripts are attached as UDFs, data is only processed if the application requests it (i.e., when the UDF dataset is read)

  • Keep your scripts next to the data they process: never lose track of which scripts produced a given dataset
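
To make the "dataset C from A and B" and "process data when it's needed" points a bit more concrete, here is a rough conceptual sketch in plain Python. It is not the HDF5-UDF API: `LazyDataset`, `combine`, and `read_count` are hypothetical names made up for illustration, and the datasets are simple lists rather than HDF5 objects. The idea it shows is just that the logic is stored instead of the result, and the result is computed only when someone reads it.

```python
# Conceptual sketch only: this is NOT the HDF5-UDF API.
# LazyDataset, combine and read_count are hypothetical names.

class LazyDataset:
    """A dataset whose values are produced by a function when it is read."""

    def __init__(self, producer):
        self._producer = producer  # the stored logic (stands in for a UDF)
        self.read_count = 0        # number of times read() has been called

    def read(self):
        # Nothing is computed until an application asks for the data.
        self.read_count += 1
        return self._producer()


# Two ordinary datasets that are actually stored.
dataset_a = [1.0, 2.0, 3.0]
dataset_b = [10.0, 20.0, 30.0]


def combine():
    # The logic that derives C from A and B (element-wise sum here).
    return [a + b for a, b in zip(dataset_a, dataset_b)]


# Only the logic is kept; the values of C are never written anywhere.
dataset_c = LazyDataset(combine)

print(dataset_c.read())  # prints [11.0, 22.0, 33.0]
```

In a real HDF5 file the same role would be played by a UDF attached to the dataset, so the file stores a small piece of code rather than a full copy of C.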

These are just some examples that should give you an idea of the power of UDFs. Please let me know if you have any more questions.
