I’m hoping to have an official release of HSDS 0.9 out early this year, so it would be a good idea to take a stab at finalizing what features to include. There’s already a fair amount in the master branch that isn’t in the current 0.8.5 release. On top of that, there are some other things that would be nice to include but are still in the design phase (meaning that including them will likely push out the release a bit).
Here’s a feature list going from those that are already implemented to those that are just an idea at this point:
- Shape reduction: datasets can be reduced in size - implemented
- Broadcasting: Numpy-style broadcasting of values over the entire selection - implemented
- UTF8 fixed size strings: Enable UTF8 with a fixed number of bytes - implemented
- Quickscan: Get domain wide usage (number of objects, storage used, etc.) on demand - implemented
- n-bit and scale-offset filters: Enable these filters to be specified (though they don’t actually do anything!) - implemented
- bitshuffle filter: Support for bitshuffle (similar to byteshuffle, but at the bit level - see https://github.com/kiyo-masui/bitshuffle) - implemented, but has an open issue
- Update for array types: enhanced support for array types - implemented, but still some issues to resolve
- Fieldops: read/write any subset of fields for a compound type dataset - implemented
- Support for long attribute names/non-utf8 encodable attributes - implemented
- Multiop attributes: Read or write multiple attributes from (possibly) multiple objects in one request - implemented
- Support for long link names/non-utf8 encodable link names - WIP
- Multiop links: Read or write multiple links from (possibly) multiple objects in one request - WIP
- Hyperchunking: use efficient chunk shape when linking to HDF5 files that have smaller chunks - implemented for 1D datasets, planned for multi-dimensional
- h5copy/h5move - enable these hdf5 library style operations - not started, but have a design doc here: https://github.com/HDFGroup/hsds/blob/master/docs/design/async_tasks/async_tasks.md
- Use parquet for variable-length chunk storage: will enable better performance for variable-length datasets - not started
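Since the broadcasting feature follows NumPy semantics, a plain NumPy sketch (no HSDS server required) shows what writing a scalar or a lower-rank array over a whole selection will look like; with h5pyd, the same slicing syntax would apply to a server-backed dataset rather than an in-memory array:

```python
import numpy as np

# Stand-in for a dataset selection: a 4x3 int32 array
data = np.zeros((4, 3), dtype=np.int32)

# Broadcast a single scalar over the entire selection
data[...] = 7
assert (data == 7).all()

# Broadcast a 1D row across every row of the 2D selection
data[...] = np.array([1, 2, 3], dtype=np.int32)
assert (data[2] == [1, 2, 3]).all()
```

The point of the feature is that the server expands the value, so the client no longer has to ship a full-sized array for a constant or repeated fill.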
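Similarly, the fieldops feature mirrors what NumPy structured arrays already allow; here is a minimal NumPy sketch of reading and writing a subset of fields of a compound type (the field names are made up for illustration):

```python
import numpy as np

# A compound-style dtype with three fields
dt = np.dtype([("time", "f8"), ("temp", "f4"), ("flag", "i1")])
recs = np.zeros(5, dtype=dt)

# Write just one field, leaving the other fields untouched
recs["temp"] = np.arange(5, dtype=np.float32)

# Read back only a subset of the fields
subset = recs[["time", "temp"]]
assert subset.dtype.names == ("time", "temp")
assert recs["temp"][3] == 3.0
```

With fieldops, the analogous per-field read or write happens server-side, so only the requested fields travel over the wire instead of the full compound records.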
These changes are all backward compatible at the REST API level (meaning existing clients shouldn’t break), but utilizing new features will require some changes. To this end, I’d like to coordinate new releases for h5pyd and the REST-vol so that these features will be available to Python and C users.
And since this release does involve REST API changes, it would be a great time to update the API documentation, which is years out of date (though still useful): the h5serv Developer Documentation.
Anyway, this is the plan! Reply here if you have questions about a particular feature, or if you have something else you’d like to see.
For features that are already implemented, you are free to try them out by building the HSDS image from the master branch. The test suite is fairly robust at this point, so the intent is that any feature should be working as designed. Will be happy to get any feedback on the pre-release code.
BTW, if you are curious why we are still not at a v1.0 of HSDS yet… When the project first got started, the idea was to have a v1.0 once we supported all the major features of the HDF5 library in HSDS. We are still not quite there yet, though getting closer! After 0.9, the two missing items will be support for Opaque types and Region references.