New HDF5 compression plugin

> I second this request big time and would add zstd, if we are already
> trying out various encoders. :wink:

This may not be of interest, and does not include zstd, but I'm
attaching an excerpt from some of the results I got back when we did
our basic benchmarking of some algorithms (all lossless).

It was based on those results that we settled on Blosc_LZ4HC at level 4:
we were looking for very fast decompression times, while longer
compression times and a slightly larger file size were acceptable, up to
a point. The gzip results are included mostly because that's what we
were using at the time and I wanted them as a comparison, but we knew
we wanted something else. The input for those benchmarks was a
500x300x300 float dataset containing a tomographic 3D image.

Zstd was included in Blosc a while ago:

http://blosc.org/blog/zstd-has-just-landed-in-blosc.html

and its performance really shines, even on real data:

http://alimanfoo.github.io/2016/09/21/genotype-compression-benchmark.html

(although here, since the values are 1-byte integers, only the BITSHUFFLE
filter is used, not the faster SHUFFLE).

As Blosc offers the same API for a number of codecs, trying it in
combination with Zstd should be really easy.
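
With python-blosc, that codec swap really is a matter of changing one
argument. A minimal sketch (assuming a python-blosc build that bundles
C-Blosc >= 1.10, where Zstd support landed):

    import blosc
    import numpy as np

    data = np.random.rand(500, 300, 300).astype(np.float32)

    # Same call as for any other Blosc codec; only cname changes.
    packed = blosc.compress(data.tobytes(), typesize=4,
                            cname="zstd", clevel=3, shuffle=blosc.SHUFFLE)
    print(f"ratio: {data.nbytes / len(packed):.2f}x")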

Zstd indeed looks very well-balanced. The reason I didn't include it
back when I did those benchmarks was that we were really focused on
decompression speed in our application; compression speed was very
much secondary. So I included mostly LZ4 codecs.

I might try to dig up the script I used for the benchmark and see if
we still have the input I used, and do a test with lossy ZFP. It could
be very interesting for creating 3D "thumbnails" in our application.

It would be nice if your benchmark code (and dataset) could be made
publicly available, so that it can serve as a good comparison for others.

The dataset is unfortunately confidential and not something I can
release. I'm attaching the script I used though, it's very simple.

But, a disclaimer: The benchmarks I did were not really thorough. They
were also internal and never really meant to be published. It was
mostly a quick and dirty test to see which of these LZ4 codecs would
be in the right ballpark for us.

Elvis

compression-benchmark.py (3.67 KB)
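
For anyone who wants the general shape of such a test without the
attachment, here is a rough sketch along the lines Elvis describes
(assuming python-blosc and a raw float32 volume; this is not the
attached script):

    # Rough timing loop: compress and decompress with a few Blosc codecs.
    import time
    import blosc
    import numpy as np

    # Hypothetical input file standing in for the confidential dataset.
    data = np.fromfile("volume_500x300x300.raw", dtype=np.float32)
    raw = data.tobytes()

    for cname in ("lz4", "lz4hc", "zlib"):
        for clevel in (1, 4, 9):
            t0 = time.perf_counter()
            comp = blosc.compress(raw, typesize=4, cname=cname,
                                  clevel=clevel, shuffle=blosc.SHUFFLE)
            t1 = time.perf_counter()
            blosc.decompress(comp)
            t2 = time.perf_counter()
            print(f"{cname}_{clevel}: ctime={t1 - t0:.3f}s "
                  f"dtime={t2 - t1:.3f}s size={len(comp)}")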

···

2016-10-28 16:33 GMT+02:00 Francesc Alted <faltet@gmail.com>:
2016-10-28 13:59 GMT+02:00 Elvis Stansvik <elvis.stansvik@orexplore.com>:
2016-10-28 13:23 GMT+02:00 Peter Steinbach <steinbach@scionics.de>:

> On 10/28/2016 01:12 PM, Elvis Stansvik wrote:
>>
>> 2016-10-28 1:53 GMT+02:00 Miller, Mark C. <miller86@llnl.gov>:
>>>
>>> Hi All,
>>>
>>> Just wanted to mention a new HDF5 floating point compression plugin
>>> available on github...
>>>
>>> https://github.com/LLNL/H5Z-ZFP
>>>
>>> This plugin will come embedded in the next release of the Silo library
>>> as well.
>>
>> Thanks for the pointer. That's very interesting. I had not heard about
>> ZFP before. The ability to set a bound on the error in the lossy
>> case seems very useful.
>>
>> Do you know if there have been any comparative benchmarks of ZFP
>> against other compressors?
>>
>> After some basic benchmarking, we recently settled on Blosc_LZ4HC at
>> level 4 for our datasets (3D float tomography data), but maybe it
>> would be worthwhile to look at ZFP as well.
>>
>> Best regards,
>> Elvis

My bad--I saw the table of six integers in README_MORE and assumed those were the six integers in PARAMS. I should have kept reading...

Still, looking at the integer parameters there appear to be some unintentional byte swapping issues. The first four bytes should be 'z', 'f', 'p', \005, and these bytes show up in the second (32-bit) integer parameter (91252346 = 0x0570667a = ((\005 << 24) + ('p' << 16) + ('f' << 8) + ('z' << 0))).
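
That decoding is easy to sanity check (a quick sketch, assuming Python
on a little-endian machine):

    import struct

    param = 91252346                  # second integer from PARAMS
    print(hex(param))                 # 0x570667a
    print(struct.pack("<I", param))   # b'zfp\x05': the ZFP magic is intact
                                      # within the word, just landing in the
                                      # wrong cd_values slot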

Mark and I will work to track down the issue. Elvis, can you confirm that you compiled zfp with -DBIT_STREAM_WORD_TYPE=uint8? See the zfp FAQ and H5Z-ZFP/README_MORE for more information on this.

···

On 10/31/16 11:18, Miller, Mark C. wrote:

Hi Elvis,

The PARAMS dumped from the dataset header are *not* the same cd_values passed in memory between the caller and the plugin.

The dataset header in the file gets something different: ZFP's own header, including mode and metadata, plus H5Z-ZFP plugin version info.

The README_MORE file in the plugin repository on GitHub explains this.

Long story short, you can't deduce much from the values dumped by h5dump/h5ls, etc. You'd have to reverse engineer how the ZFP library encodes magic, version, mode and metadata into its header.

Mark

--
Mark C. Miller, LLNL

On Monday, October 31, 2016 at 11:03 AM, Peter Lindstrom <pl@llnl.gov> wrote:

Elvis,

Your compression parameters look suspicious to me:

PARAMS { 5242928 91252346 313532218 -1043792 -937099264 67112167 }

I would start by debugging those. If I understand Mark's filter
correctly, the first integer should be 1, 2, 3, or 4.

On 10/31/16 09:55, Elvis Stansvik wrote:

On 31 Oct 2016, 5:41 PM, "Miller, Mark C." <miller86@llnl.gov> wrote:

> Hi Elvis,
>
>> I've successfully tried using the plugin through h5py. I did have a
>> problem with the fixed-accuracy mode however, and filed an issue:
>>
>> https://github.com/LLNL/H5Z-ZFP/issues/1
>>
>> It's very likely that I'm doing something wrong though.
>
> Thanks so much!!
>
> I will take a look at the issue you reported later this week.
>
> I may have to rope in the researcher who developed ZFP to help.
>
> AFAICT, I am calling ZFP with the correct parameters, and your
> email confirms you see zfp_stream_set_accuracy() being called.

Alright, no hurry.

I've been in contact with Peter earlier with some other questions. He
seems very helpful.

And yes, zfp_stream_set_accuracy seems to be called correctly with the
parameters I pass through h5py (same as when I do the equivalent with
the zfp command line tool), so I think something else is going on. The
dataset type class and dimensions are also correctly identified by the
filter plugin.

Elvis
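
For anyone wanting to reproduce this kind of test, h5py can attach a
dynamically loaded filter through its low-level API. A minimal sketch:
32013 is H5Z-ZFP's registered filter ID, the plugin must be findable via
HDF5_PLUGIN_PATH, and the fixed-accuracy cd_values layout shown here
(mode integer first, then a double tolerance split across two 32-bit
words) is an assumption to be checked against H5Z-ZFP's README_MORE:

    import struct
    import h5py
    import numpy as np

    H5Z_FILTER_ZFP = 32013  # H5Z-ZFP's registered HDF5 filter ID

    # Assumed fixed-accuracy cd_values: mode first (per Peter, "the first
    # integer should be 1, 2, 3, or 4"), then a packed double tolerance.
    mode = 3  # assumed code for fixed-accuracy mode
    lo, hi = struct.unpack("II", struct.pack("d", 1e-3))
    cd_values = (mode, 0, lo, hi)

    data = np.random.rand(500, 300, 300).astype(np.float32)

    with h5py.File("test.h5", "w") as f:
        space = h5py.h5s.create_simple(data.shape)
        dcpl = h5py.h5p.create(h5py.h5p.DATASET_CREATE)
        dcpl.set_chunk(data.shape)
        dcpl.set_filter(H5Z_FILTER_ZFP, h5py.h5z.FLAG_MANDATORY, cd_values)
        dset = h5py.h5d.create(f.id, b"tomo", h5py.h5t.NATIVE_FLOAT,
                               space, dcpl)
        dset.write(h5py.h5s.ALL, h5py.h5s.ALL, data)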

All,

···

On Oct 28, 2016, at 7:17 AM, Peter Steinbach <steinbach@scionics.de> wrote:

Hi Elvis,

Interesting. I am mostly looking into 3D optical tomography images
(which exclusively use voxels represented by integers).

We've seen a rough factor of 2 +/- 0.5 with lz4 r131 in compression as
well, with unfiltered data. In our case we are mostly interested in high
compression bandwidth and high compression ratio. lz4 so far gives
compression bandwidths of up to 1 GB/s, depending on the quality aspired
to (of course, the compression ratios tend to be lower then).

To be honest, I am still surprised that hdf5 doesn't contain these
state-of-the-art encoders, but rather ships bzip2 et al., which are
painfully slow and take no account of computer architectures (lz4 is
cache aware, AFAIK). But hey, coming up with an hdf5 compressor is
straightforward once you have wrangled with the docs. I just don't know
how contributing to hdf5 works.

The HDF5 library has only two built-in third-party compression methods,
GZIP and SZIP. Support for those goes back to the first releases of HDF5.

To take advantage of the new compression methods, one has to use the
filter plugin mechanism and the corresponding filter. The HDF Group
maintains the plugin functionality and, as Mark Miller mentioned in his
email to the forum, we also support filter registration. We have been
looking into how to make the registered filters available with HDF5
releases without increasing our maintenance cost.

Thank you!

Elena

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal
Client Management Director
Interim Engineering Director
The HDF Group
1800 So. Oak St., Suite 203,
Champaign, IL 61820


(217)531-6112 (office)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
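
As a concrete illustration of the plugin mechanism Elena describes: HDF5
searches the directories in HDF5_PLUGIN_PATH for dynamically loaded
filters at runtime, so applications need no code changes to read data
written with a registered filter. A sketch (the path is an example; the
documented default on Linux is /usr/local/hdf5/lib/plugin):

    import os

    # Must be set before the library first needs the filter.
    os.environ["HDF5_PLUGIN_PATH"] = "/path/to/hdf5/plugins"

    import h5py

    # Reading a dataset written with, e.g., H5Z-ZFP then "just works":
    with h5py.File("compressed.h5", "r") as f:
        data = f["tomo"][...]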

I might try to dig up the script I used for the benchmark and see if
we still have the input I used, and do a test with lossy ZFP. It could
be very interesting for creating 3D "thumbnails" in our application.

Indeed, that would be interesting to see.

Best,
Peter


Hi Elvis,

The PARAMS dumped from the dataset header are *not* the same cd_values
passed in memory between caller and plugin.

Long story short, you can't deduce much from those values dumped by
h5dump/h5ls, etc.

Yes, I am aware :slight_smile: Peter probably got suspicious when he looked at my
h5dump output in the GitHub issue (since that will show the parameters
after they have been transformed by the filter). I'm pretty sure I'm
feeding the right cd_values to the filter though.

But maybe we'd better continue this discussion on the GitHub issue?

Elvis

···

2016-10-31 22:50 GMT+01:00 Peter Lindstrom <pl@llnl.gov>:

My bad--I saw the table of six integers in README_MORE and assumed those
were the six integers in PARAMS. I should have kept reading...

Still, looking at the integer parameters there appear to be some
unintentional byte swapping issues. The first four bytes should be 'z',
'f', 'p', \005, and these bytes show up in the second (32-bit) integer
parameter.

Aha, you have a good eye. Then I bet that's the issue. I'll have a
look at it again tomorrow when I'm back at work.

Mark and I will work to track down the issue. Elvis, can you confirm that
you compiled zfp with -DBIT_STREAM_WORD_TYPE=uint8? See the zfp FAQ and
H5Z-ZFP/README_MORE for more information on this.

Yep, saw that bit in README_MORE and compiled with
-DBIT_STREAM_WORD_TYPE=uint8, but I'll triple check tomorrow.

Thanks a lot for helping me figure this out.

Elvis

···

2016-10-31 22:50 GMT+01:00 Peter Lindstrom <pl@llnl.gov>:

My bad--I saw the table of six integers in README_MORE and assumed those
were the six integers in PARAMS. I should have kept reading...

Still, looking at the integer parameters there appear to be some
unintentional byte swapping issues. The first four bytes should be 'z',
'f', 'p', \005, and these bytes show up in the second (32-bit) integer
parameter (91252346 = 0x0570667a = ((\005 << 24) + ('p' << 16) + ('f' << 8)
+ ('z' << 0))).

I see now that the first entry (5242928 = 0x00500030) gives the versions
of ZFP / H5Z-ZFP (0.5.0 / 0.3.0), and that the filter encodes the ZFP
header from entry 1 onward.

Elvis
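
A quick check of that reading (a sketch; taking the two 16-bit halves as
version numbers is inferred from the values, so treat it as an
assumption rather than the documented encoding):

    param0 = 5242928
    print(hex(param0))                      # 0x500030
    zfp_version = (param0 >> 16) & 0xFFFF   # 0x0050 -> ZFP 0.5.0
    h5zzfp_version = param0 & 0xFFFF        # 0x0030 -> H5Z-ZFP 0.3.0
    print(hex(zfp_version), hex(h5zzfp_version))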

···

2016-10-28 17:53 GMT+02:00 Francesc Alted <faltet@gmail.com>:

>> Zstd indeed looks very well-balanced. The reason I didn't include it
>> back when I did those benchmarks was that we were really focused on
>> decompression speed in our application; compression speed was very
>> much secondary. So I included mostly LZ4 codecs.

Yes, that makes sense, but I think you should give it a try at least at
the lowest compression levels for Blosc+Zstd (1, 2 and probably 3 too).
For these low compression levels Blosc chooses a block size that
comfortably fits in L2. Also, note that the benchmarks above were for
in-memory data, so for a typical disk-based workflow using HDF5,
Blosc+Zstd can still perform well enough.

Alright, thanks for the tip. I read the benchmarks too fast and didn't
realize it was all in-memory. I should definitely look at Zstd.

In our use case it's always from disk (or well, SSD), and sometimes
even slow-ish network mounts.

Elvis

···

Sorry, I somehow missed that you continued the discussion on the GitHub
issue, Peter.

To all others: this discussion is now at

    https://github.com/LLNL/H5Z-ZFP/issues/1

Elvis

···

2016-10-28 18:14 GMT+02:00 Francesc Alted <faltet@gmail.com>:

Cool. Keep us informed. I am definitely interested.

I found the old input file and quickly ran the benchmark again with
Blosc_ZSTD with byte-based shuffling at compression levels 1, 2
and 3:

compressor,ctime_mean(s),ctime_std(s),rtime_mean(s),rtime_std(s),size(B)
blosc_zstd_1,0.73083,0.00104,0.29489,0.00338,116666294
blosc_zstd_2,1.40672,0.00164,0.28097,0.00220,114666454
blosc_zstd_3,1.48507,0.01872,0.26451,0.00208,113485801
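
(For reference, the compression ratios those sizes imply against the
189378052-byte original mentioned below:)

    orig = 189378052  # original file size in bytes
    for name, size in [("blosc_zstd_1", 116666294),
                       ("blosc_zstd_2", 114666454),
                       ("blosc_zstd_3", 113485801)]:
        print(f"{name}: {orig / size:.2f}x")
    # blosc_zstd_1: 1.62x
    # blosc_zstd_2: 1.65x
    # blosc_zstd_3: 1.67x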

Unfortunately I can't find the spreadsheet where I made those
diagrams, so I can't make a new, updated one (at least not easily
right now).

But this shows that Zstd is very competitive. It achieves a slightly
better compression ratio than Blosc_LZ4HC at level 4 (the original file
size was 189378052 bytes, so roughly 1.6:1), which is what we picked,
and the compression is much faster. But Blosc_LZ4HC still wins on
decompression time, so I think in the end we picked the right one.

Our use case is essentially compress once, decompress many, many times,
and during decompression the user will sit there and wait. That's
why decompression time was so important to us.

Anyway, thanks for making me have a look at Zstd; we may yet use it
somewhere else.

And I now remember the real reason I didn't include it the first time
around: We're basing our product on Ubuntu 16.04, where Blosc 1.7 is
the packaged version (1.10 is where Zstd support was added), so I
lazily just skipped it :slight_smile:

Elvis

···

Can I just clarify some of this discussion...

It reads like you are talking about compression ratios around 1.6x, less than 2:1. Is that correct?

FYI, ZFP demonstrates results far beyond that (10-30x and better) at the expense of (some) loss.

However, current efforts indicate that losses are tolerable in many post-processing analysis workflows.

We think the key to achieving good compression on floating point data, going forward, is to allow for some well-controlled loss.

See the ZFP site's page on the effect of ZFP's losses when, for example, taking derivatives, as compared to other compression methods.

We already face loss-like noise in floating point results when dealing with system differences, either across current systems and software stacks or over time as systems and software evolve.

Mark

···

Can I just clarify some of this discussion...

It reads like you are talking about compression ratios around 1.6x, less
than 2:1. Is that correct?

Yes, in our case we only do lossless compression so far, but we have
been talking about lossy. We just haven't taken any steps yet, and I
didn't even know about ZFP before. It looks very interesting.

FYI, ZFP demonstrates results far beyond that (10-30x and better) at the
expense of (some) loss.

Yes, ZFP is of course in a completely different ball game,
compression-ratio-wise, than the codecs I compared in my benchmark
(which are all lossless). It looks very impressive from reading the
material on the site and skimming the paper.

However, current efforts indicate that losses are tolerable in many
post-processing analysis workflows.

Right, we need to investigate, or rather I need to have a discussion
with our physicists on how much error we can tolerate (I'm not doing
any analysis myself, only visualization). Our data is single precision
float to begin with. For the visualization part I'm sure we could get
away with quite a bit of loss.

We think the key to achieving good compression on floating point data, going
forward, is to allow for some well controlled loss.

Yes, and it seems that ZFP has several knobs for controlling that loss
which look really useful.
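From the docs, those knobs come down to ZFP's three modes: fixed rate,
fixed precision and fixed accuracy. A minimal sketch of what setting them
looks like with the ZFP C API (the tolerance and dimensions here are
placeholders, and the exact signatures vary a bit between zfp versions,
so check zfp.h for yours):

#include <stdlib.h>
#include <zfp.h>

/* Sketch: compress a 500x300x300 single-precision volume with a fixed
   absolute error bound. 'data' is assumed to hold the raw floats. */
size_t compress_volume(float *data, void **out)
{
  zfp_field  *field = zfp_field_3d(data, zfp_type_float, 500, 300, 300);
  zfp_stream *zfp   = zfp_stream_open(NULL);

  /* Pick exactly one of ZFP's modes: */
  zfp_stream_set_accuracy(zfp, 1e-3);                        /* bound absolute error */
  /* zfp_stream_set_precision(zfp, 16); */                   /* bound bit-plane precision */
  /* zfp_stream_set_rate(zfp, 8.0, zfp_type_float, 3, 0); */ /* bound bits per value */

  size_t bufsize = zfp_stream_maximum_size(zfp, field);
  void *buffer   = malloc(bufsize);
  bitstream *bs  = stream_open(buffer, bufsize);
  zfp_stream_set_bit_stream(zfp, bs);
  zfp_stream_rewind(zfp);

  size_t zfpsize = zfp_compress(zfp, field);                 /* 0 on failure */

  zfp_stream_close(zfp);
  stream_close(bs);
  zfp_field_free(field);
  *out = buffer;
  return zfpsize;
}

The H5Z-ZFP filter exposes the same modes, so whatever error bound the
physicists settle on should translate directly.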

See this page on the effect of ZFP's losses when, for example, taking derivatives...

http://computation.llnl.gov/projects/floating-point-compression/zfp-and-derivatives

as compared to other compression methods.

Thanks for the pointer.

We already face loss-like noise in floating point results when dealing with
system differences either between current systems and software stacks or
over time as systems and software evolve.

Indeed.

We simply need to have a look at how much error we can tolerate.

Elvis

···

2016-10-28 20:24 GMT+02:00 Miller, Mark C. <miller86@llnl.gov>:


This is something that I am very interested in trying out on Blue Waters,
where, with my current lossless compression (gzip), I would end up making
several PB of data over the next couple of years. There are certain
variables in my data that I would be happy to compress lossily. With the
compression ratios being reported, this could be a game-changer for me.

Is there a Fortran interface consistent with the other Fortranized routines
in HDF?

Leigh

···

On Fri, Oct 28, 2016 at 1:45 PM Elvis Stansvik <elvis.stansvik@orexplore.com> wrote:


Hi Leigh,

Hmm. Fortran interface, eh? You mean to the HDF5 filter we've made available or to the ZFP compression library?

You mentioned "...Fortranized routines in HDF...", so I am assuming the HDF5 filter.

Well, the short answer is: at present, no, we don't have those. But they would be very easy to add.

I've never used HDF5's Fortran interface. Do you have, or can you point me to, example(s) that use filters already?

If so, we could probably come up with what you would need and test it pretty quickly.

Mark

···

--
Mark C. Miller, LLNL

From: Hdf-forum <hdf-forum-bounces@lists.hdfgroup.org> on behalf of Leigh Orf <leigh.orf@gmail.com>
Reply-To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Date: Saturday, October 29, 2016 at 4:47 PM
To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Subject: Re: [Hdf-forum] New HDF5 compression plugin


Mark,

Yes indeed, the model I use is written in Fortran 95, so it would be
convenient to be able to activate the ZFP filter like any of HDF5's
existing filters. I looked at the Fortran interface code and it does look
pretty easy to add Fortran wrappers.

In the past I have played with the scale/offset and N-bit filters, but
ended up just going with lossless gzip for my current work. They all have
Fortran hooks. Here are a few lines from my code that choose gzip with
compression level 1 (the fastest) after turning on chunking:

call h5pcreate_f(H5P_DATASET_CREATE_F,chunk_id,ierr);check_err(ierr)
call h5pset_chunk_f(chunk_id,rank,chunkdims,ierr);check_err(ierr)
call h5pset_deflate_f(chunk_id,1,ierr);check_err(ierr) ! We have chosen gzip level 1
call h5dcreate_f(f_id,trim(varname),H5T_NATIVE_REAL,dspace_id,dset_id,ierr,chunk_id);check_err(ierr)
call h5dwrite_f(dset_id,H5T_NATIVE_REAL,MCM3d,dims,ierr);check_err(ierr)

The Fortran interface for HDF5 is typically very similar to the C
interface, with an extra argument for a return value (ierr above) and a
trailing _f on the subroutine name. Flags and IDs are all integers in the
Fortran interface. Looking at H5Pff.f90 in fortran/src, which contains
the h5pset_deflate_f code:

! Fortran90 Interface:
  SUBROUTINE h5pset_deflate_f(prp_id, level, hdferr)
    IMPLICIT NONE
    INTEGER(HID_T), INTENT(IN) :: prp_id ! Property list identifier
    INTEGER, INTENT(IN) :: level ! Compression level
    INTEGER, INTENT(OUT) :: hdferr ! Error code
                                         ! 0 on success and -1 on failure

  hdferr = h5pset_deflate_c(prp_id, level)

  END SUBROUTINE h5pset_deflate_f

(I took out the interface block required by Windows).

And the C wrapper code is found in fortran/src/H5Pf.c:

int_f
h5pset_deflate_c ( hid_t_f *prp_id , int_f *level)
{
  int ret_value = 0;
  hid_t c_prp_id;
  unsigned c_level;
  herr_t status;

  c_prp_id = (hid_t)*prp_id;
  c_level = (unsigned)*level;
  status = H5Pset_deflate(c_prp_id, c_level);
  if ( status < 0 ) ret_value = -1;
  return ret_value;
}

I think it would be pretty straightforward to just emulate these types of
calls for setting compression parameters for ZFP in the Fortran interface.
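For instance, a rate-mode setter might look something like this on the C
side. This is purely a hypothetical sketch following the h5pset_deflate_c
pattern above: h5pset_zfp_rate_c is a made-up name, and H5Pset_zfp_rate
stands in for whatever property call the H5Z-ZFP plugin actually exposes.

int_f
h5pset_zfp_rate_c(hid_t_f *prp_id, real_f *rate)
{
  /* Hypothetical Fortran-callable wrapper, modeled on h5pset_deflate_c.
     int_f/hid_t_f/real_f are HDF5's Fortran-wrapper typedefs. */
  int ret_value = 0;
  hid_t c_prp_id = (hid_t)*prp_id;
  double c_rate = (double)*rate;
  herr_t status;

  status = H5Pset_zfp_rate(c_prp_id, c_rate);  /* assumed plugin property call */
  if (status < 0) ret_value = -1;
  return ret_value;
}

The Fortran side would then mirror h5pset_deflate_f: an INTEGER(HID_T)
property list, a REAL rate and an INTEGER error code.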

If this filter is all it's cracked up to be (and I have no reason to assume
it isn't, based on the paper that describes it), it would definitely have
a big impact on the data footprint of my output (and that of many others
who choose to use it). I say this as someone who is currently waiting for
about 100 TB of archived data to make its way from tape storage to scratch
on Blue Waters, which will take about a week. With ZFP compression (say
20x instead of gzip's ~1.6x, i.e. roughly a twelfth of the bytes) this
could be done overnight.

I'm fairly certain folks on machines like Blue Waters would be very
interested in ZFP being part of HDF5. I would love to see how it performs
at scale.

I have a conference to prepare for this week but I can work on this myself
the following week.

Leigh

···

On Sun, Oct 30, 2016 at 12:36 AM Miller, Mark C. <miller86@llnl.gov> wrote:
