Blosc filter for HDF5

Blosc <http://www.blosc.org> is a compression library that claims to be
faster than standard zlib. There is also a Blosc filter for HDF5
<https://github.com/Blosc/hdf5-blosc>. The Julia language
<http://julialang.org> uses HDF5 as the "default" way to store binary data
on disk <https://github.com/JuliaLang/JLD.jl>.

It so happens that JLD uses Blosc by default; if you install JLD, then HDF5
and Blosc will both be installed automatically.

This also means that the respective HDF5 files are not readable by vanilla
HDF5 implementations, since these do not know about the Blosc filter.

Are you aware of Blosc? What are your thoughts about it -- is it worthwhile
to use it to improve compression speed? Is it feasible to include Blosc
with HDF5 by default, to recommend installing Blosc in HDF5's install
instructions, or to include a Blosc decompressor in HDF5?

-erik


--
Erik Schnetter <schnetter@gmail.com>
http://www.perimeterinstitute.ca/personal/eschnetter/

Erik,

The HDF Group is aware of the Blosc filter. Unfortunately, we cannot support all available filters, so another solution is needed.

As you probably know, we implemented dynamically loaded filters <https://www.hdfgroup.org/HDF5/doc/Advanced/DynamicallyLoadedFilters/> in the HDF5 1.8.11 release. Starting with this release, custom HDF5 filters can be used with vanilla HDF5 implementations. But this is not enough, since there is no centralized place from which users can download the source (or, better, tested binaries) of the registered HDF5 filters. For example, in order to read an HDF5 file created by PyTables using bzip2, one has to use PyTables or reimplement the filter, since the bzip2 plugin is not publicly available.
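To make the dynamic loading mechanism a bit more concrete, here is a minimal Python sketch of where a plugin such as the Blosc filter would have to be placed. It assumes the library looks in the directories listed in the HDF5_PLUGIN_PATH environment variable and otherwise falls back to /usr/local/hdf5/lib/plugin on Unix-like systems; the function names are hypothetical helpers, not part of any HDF5 API, and the script only lists candidate shared libraries without loading them.

```python
import os
from pathlib import Path

# Assumption: Unix default used when HDF5_PLUGIN_PATH is not set.
DEFAULT_PLUGIN_DIR = "/usr/local/hdf5/lib/plugin"


def plugin_search_dirs(env=None):
    """Return the directories that would be searched for filter plugins."""
    env = os.environ if env is None else env
    raw = env.get("HDF5_PLUGIN_PATH", DEFAULT_PLUGIN_DIR)
    # Entries are separated by the platform path separator (":" or ";").
    return [Path(p) for p in raw.split(os.pathsep) if p]


def candidate_plugins(env=None):
    """List shared libraries found in the plugin search directories."""
    suffixes = {".so", ".dylib", ".dll"}
    found = []
    for directory in plugin_search_dirs(env):
        if directory.is_dir():
            found.extend(
                sorted(p for p in directory.iterdir() if p.suffix in suffixes)
            )
    return found


if __name__ == "__main__":
    for lib in candidate_plugins():
        print(lib)
```

If the list comes back empty, no filter plugin is installed in the search path, and a file written with Blosc would still be unreadable with a vanilla HDF5 build.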

The HDF Group and the NeXus organization have started an effort to create such a repository on GitHub. I hope to report more on this effort in the near future.

Thank you!

Elena


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal The HDF Group http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On Nov 21, 2015, at 9:21 PM, Erik Schnetter <schnetter@gmail.com> wrote:

Blosc <http://www.blosc.org> is a compression library that claims to be faster than standard zlib. There is also a Blosc filter for HDF5 <https://github.com/Blosc/hdf5-blosc>. The Julia language <http://julialang.org> uses HDF5 as the "default" way to store binary data on disk <https://github.com/JuliaLang/JLD.jl>.

It so happens that JLD uses Blosc by default; if you install JLD, then HDF5 and Blosc will both be installed automatically.

This also means that the respective HDF5 files are not readable by vanilla HDF5 implementations, since these do not know about the Blosc filter.

Are you aware of Blosc? What are your thoughts about it -- is it worthwhile to use it to improve compression speed? Is it feasible to include Blosc with HDF5 by default, to recommend installing Blosc in HDF5's install instructions, or to include a Blosc decompressor in HDF5?

-erik

--
Erik Schnetter <schnetter@gmail.com>
http://www.perimeterinstitute.ca/personal/eschnetter/