ANN: Blosc 1.0rc1: a blocking, shuffling and loss-less compression library

Hi,

This is just a note to announce a nice compressor that I've been working
on lately and that I think is now ready for public testing. I've included
a small example of how to add support for Blosc as a generic filter in
the HDF5 library (see the hdf5/ directory in the sources).
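
For reference, here is a minimal sketch of how a third-party filter such
as Blosc can be enabled on a chunked dataset through HDF5's generic
filter API. The filter ID and the cd_values layout below are just
assumptions for illustration; please check the code under hdf5/ in the
sources for the actual registration helper and parameters:

  /* Sketch: enabling Blosc as a generic HDF5 filter on a chunked dataset.
   * FILTER_BLOSC and the cd_values layout are assumptions; see the
   * hdf5/ directory in the Blosc sources for the real values. */
  #include "hdf5.h"

  #define FILTER_BLOSC 32001   /* assumed HDF5 filter ID for Blosc */

  hid_t make_blosc_plist(void)
  {
      hsize_t chunkshape[1] = {1024 * 1024};  /* 1 MB chunks */
      unsigned cd_values[2] = {5, 1};         /* assumed: clevel=5, shuffle on */
      hid_t plist = H5Pcreate(H5P_DATASET_CREATE);

      H5Pset_chunk(plist, 1, chunkshape);
      /* Attach the filter to the dataset creation property list. */
      H5Pset_filter(plist, FILTER_BLOSC, H5Z_FLAG_OPTIONAL, 2, cd_values);
      return plist;
  }

The returned property list can then be passed to H5Dcreate() so that
chunks are compressed on write and decompressed transparently on read.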

I must warn you that, unfortunately, HDF5 cannot get the most out of
Blosc because of one additional memcpy() call after / before the
compression / decompression process. However, as this copy generally
takes place in the CPU cache (mostly in L2 on modern CPUs), its effect
is not very important.

The PyTables community has already tested it quite intensively, both
stand-alone and inside PyTables, and I'm happy to say that it seems to
work nicely so far.

Enjoy!


===============================================================
Announcing Blosc 1.0rc1
A blocking, shuffling and lossless compression library

:Author: Francesc Alted i Abad
:Contact: faltet@pytables.org
:URL: http://blosc.pytables.org

What is new?

Everything :-) This is the first public release of a project that
started more than a year ago and that, after very intensive testing
(several hundreds of TB compressed and decompressed without a glitch),
is finally getting ready for public consumption.

This is Release Candidate 1 for the Blosc 1.0 release, so please test it
and report back any problems you may have with it.

What is it?

Blosc [1]_ is a high performance compressor optimized for binary data.
It has been designed to transmit data to the processor cache faster
than the traditional, non-compressed, direct memory fetch approach via
a memcpy() library call. Blosc is the first compressor (that I'm aware of)
that is meant not only to reduce the size of large datasets on-disk or
in-memory, but also to accelerate memory-bound computations.
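
To give an idea of how it is driven from C, here is a minimal sketch of
a compress / decompress round trip, based on the blosc_compress() and
blosc_decompress() calls declared in blosc.h (please check the header
shipped with this release for the exact signatures)::

  /* Round trip a buffer of floats through Blosc.
   * Calls and constants follow blosc.h; check the header for the exact
   * signatures in this release. */
  #include <stdio.h>
  #include <stdlib.h>
  #include "blosc.h"

  int main(void)
  {
      size_t nbytes = 1000 * 1000 * sizeof(float);
      float *src = malloc(nbytes);
      void *packed = malloc(nbytes + BLOSC_MAX_OVERHEAD); /* header room */
      float *back = malloc(nbytes);
      size_t i;
      int csize, dsize;

      for (i = 0; i < nbytes / sizeof(float); i++)
          src[i] = (float)i;            /* highly compressible pattern */

      /* clevel=5, shuffle=1, typesize=sizeof(float) */
      csize = blosc_compress(5, 1, sizeof(float), nbytes, src,
                             packed, nbytes + BLOSC_MAX_OVERHEAD);
      dsize = blosc_decompress(packed, back, nbytes);

      printf("compressed %d -> %d bytes, got back %d bytes\n",
             (int)nbytes, csize, dsize);
      free(src); free(packed); free(back);
      return 0;
  }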

It uses the blocking technique (as described in [2]_) to reduce
activity on the memory bus as much as possible. In short, this
technique works by dividing datasets into blocks that are small enough
to fit in the caches of modern processors and performing compression /
decompression there. It also leverages, if available, SIMD
instructions (SSE2) and the multi-threading capabilities of CPUs in
order to accelerate the compression / decompression process as much as
possible.
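
The number of threads used internally can be selected at run time. Here
is a tiny sketch, assuming the blosc_set_nthreads() call from blosc.h
(it returns the previous setting)::

  /* Let Blosc use up to `ncores` threads for compression/decompression.
   * blosc_set_nthreads() is assumed to return the previous value. */
  #include "blosc.h"

  void run_with_threads(int ncores)
  {
      int previous = blosc_set_nthreads(ncores);
      /* ... calls to blosc_compress()/blosc_decompress() made here
       * will now run in parallel across the worker threads ... */
      blosc_set_nthreads(previous);     /* restore the old setting */
  }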

You can see some recent benchmarks of Blosc performance in [3]_.

Blosc is distributed under the MIT license; see the LICENSES directory
for details.

.. [1] http://blosc.pytables.org
.. [2] http://www.pytables.org/docs/CISE-12-2-ScientificPro.pdf
.. [3] http://blosc.pytables.org/trac/wiki/SyntheticBenchmarks

Download sources

Please go to:

http://blosc.pytables.org/sources/

and download the most stable release from there.

--
Francesc Alted