Hi Leigh,
Hmm. Fortran interface, eh? You mean to the HDF5 filter we've made
available or to the ZFP compression library?
You mentioned "...Fortranized routines in HDF...", so I am assuming the
HDF5 filter.
Well, the short answer is that at present, no, we don't have those. But they
would be very easy to add.
I've never used HDF5's Fortran interface. Do you have, or can you point
me to, example(s) that use filters already?
If so, we could probably come up with what you would need and test it
pretty quickly.
Mark
--
Mark C. Miller, LLNL
From: Hdf-forum <hdf-forum-bounces@lists.hdfgroup.org> on behalf of Leigh
Orf <leigh.orf@gmail.com>
Reply-To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Date: Saturday, October 29, 2016 at 4:47 PM
To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Subject: Re: [Hdf-forum] New HDF5 compression plugin
This is something that I am very interested in trying out on Blue Waters,
where, with my current lossless compression (gzip) I would end up making
several PB of data over the next couple of years. There are certain
variables in my data that I would be happy to compress lossily. With the
compression ratios being reported this could be a gamechanger for me.
Is there a Fortran interface consistent with the other Fortranized
routines in HDF?
Leigh
On Fri, Oct 28, 2016 at 1:45 PM Elvis Stansvik <elvis.stansvik@orexplore.com> wrote:
2016-10-28 20:24 GMT+02:00 Miller, Mark C. <miller86@llnl.gov>:
> Can I just clarify some of this discussion...
>
> It reads like you are talking about compression ratios around 1.6x, less
> than 2:1. Is that correct?
Yes, in our case we only do lossless compression so far, but we have
been talking about lossy. We just haven't taken any steps yet, and I
didn't even know about ZFP before. It looks very interesting.
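
As a rough check of the ~1.6x figure, the ratios can be recomputed from the Blosc_ZSTD sizes and the 189378052-byte original size quoted further down in this thread. A minimal Python sketch; the variable and function names are made up for illustration, only the numbers come from the thread:

```python
# Sizes in bytes, copied from the Blosc_ZSTD benchmark quoted later in this thread.
ORIGINAL_SIZE = 189378052

compressed_sizes = {
    "blosc_zstd_1": 116666294,
    "blosc_zstd_2": 114666454,
    "blosc_zstd_3": 113485801,
}


def compression_ratio(original: int, compressed: int) -> float:
    """Return original/compressed, so a larger value means better compression."""
    return original / compressed


ratios = {name: compression_ratio(ORIGINAL_SIZE, size)
          for name, size in compressed_sizes.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

All three levels land between 1.6x and 1.7x, which is the range the question refers to.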
>
> FYI..ZFP demonstrates results far beyond that (10-30x and better) at the
> expense of (some) loss.
Yes, ZFP is of course in a completely different ball game,
compression-ratio-wise, than the codecs I compared in my benchmark (which
are all lossless). It looks very impressive from reading the material on the
site and skimming the paper.
>
> However, current efforts indicate that losses are tolerable in many
> post-processing analysis workflows.
Right, we need to investigate, or rather I need to have a discussion
with our physicists on how much error we can tolerate (I'm not doing
any analysis myself, only visualization). Our data is single precision
float to begin with. For the visualization part I'm sure we could get
away with quite a bit of loss.
>
> We think the key to achieving good compression on floating point data,
> going forward, is to allow for some well controlled loss.
Yes, and it seems that ZFP has several knobs for controlling that loss
which look really useful.
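
To make "well controlled loss" concrete, here is a minimal Python sketch of an absolute error bound. It is not ZFP's algorithm and does not use the ZFP or H5Z-ZFP API: it swaps in plain uniform quantization followed by zlib, and the function names, the synthetic signal and the tolerance value are all assumptions for illustration. It only shows the idea of a knob that caps the per-value error.

```python
import math
import struct
import zlib


def compress_with_tolerance(values, tolerance):
    """Quantize to multiples of 2*tolerance, then deflate the integer codes."""
    step = 2.0 * tolerance
    codes = [round(v / step) for v in values]
    raw = struct.pack(f"<{len(codes)}q", *codes)
    return zlib.compress(raw)


def decompress_with_tolerance(blob, tolerance):
    """Inverse: inflate, then map the integer codes back to floats."""
    step = 2.0 * tolerance
    raw = zlib.decompress(blob)
    codes = struct.unpack(f"<{len(raw) // 8}q", raw)
    return [c * step for c in codes]


# Synthetic smooth signal standing in for real data (an assumption).
data = [math.sin(0.01 * i) for i in range(10_000)]
tolerance = 1e-3

blob = compress_with_tolerance(data, tolerance)
restored = decompress_with_tolerance(blob, tolerance)

max_error = max(abs(a - b) for a, b in zip(data, restored))
print(f"max abs error: {max_error:.2e} (tolerance {tolerance:.0e})")
print(f"bytes: {len(data) * 4} as float32 -> {len(blob)} compressed")
```

Each value is rounded to the nearest multiple of twice the tolerance, so the reconstruction error stays within the tolerance (up to floating-point rounding), and loosening the tolerance trades accuracy for size.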
>
> See this page on the effect of ZFP losses on, for example, taking derivatives...
>
> http://computation.llnl.gov/projects/floating-point-compression/zfp-and-derivatives
>
> as compared to other compression methods.
Thanks for the pointer.
>
> We already face loss-like noise in floating point results when dealing with
> system differences either between current systems and software stacks or
> over time as systems and software evolve.
Indeed.
We simply need to have a look at how much error we can tolerate.
Elvis
>
> Mark
>
> --
> Mark C. Miller, LLNL
>
> From: Hdf-forum <hdf-forum-bounces@lists.hdfgroup.org> on behalf of Elvis
> Stansvik <elvis.stansvik@orexplore.com>
> Reply-To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
> Date: Friday, October 28, 2016 at 11:08 AM
> To: "faltet@gmail.com" <faltet@gmail.com>
> Cc: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
> Subject: Re: [Hdf-forum] New HDF5 compression plugin
>
> 2016-10-28 18:14 GMT+02:00 Francesc Alted <faltet@gmail.com>:
>
>
>
> 2016-10-28 18:04 GMT+02:00 Elvis Stansvik <elvis.stansvik@orexplore.com>:
>
>
> 2016-10-28 17:53 GMT+02:00 Francesc Alted <faltet@gmail.com>:
>>
>>
>> 2016-10-28 17:20 GMT+02:00 Elvis Stansvik
>> <elvis.stansvik@orexplore.com>:
>>>
>>> 2016-10-28 16:33 GMT+02:00 Francesc Alted <faltet@gmail.com>:
>>> > 2016-10-28 13:59 GMT+02:00 Elvis Stansvik
>>> > <elvis.stansvik@orexplore.com>:
>>> >>
>>> >> 2016-10-28 13:23 GMT+02:00 Peter Steinbach <steinbach@scionics.de>:
>>> >> > I second this request big time and would add zstd, if we are
>>> >> > already
>>> >> > trying
>>> >> > out various encoders.
>>> >>
>>> >> This may not be of interest, and does not include zstd, but I'm
>>> >> attaching an excerpt from some of the results I got back when
>>> >> doing our basic benchmarking of some algorithms (all lossless).
>>> >>
>>> >> It was based on those that we settled on Blosc_LZ4HC at level 4,
>>> >> since we were looking for very fast decompression times, while longer
>>> >> compression times and slightly larger file sizes were acceptable up to
>>> >> a certain point. The gzip results are included mostly because that's
>>> >> what we were using at the time and I wanted them as a comparison,
>>> >> but we knew we wanted something else. The input for those benchmarks
>>> >> was a 500x300x300 float dataset containing a tomographic 3D image.
>>> >
>>> >
>>> > Zstd was included in Blosc a while ago:
>>> >
>>> > http://blosc.org/blog/zstd-has-just-landed-in-blosc.html
>>> >
>>> > and its performance really shines, even on real data:
>>> >
>>> > http://alimanfoo.github.io/2016/09/21/genotype-compression-benchmark.html
>>> >
>>> > (although here, being only integers of 1 byte, only the BITSHUFFLE
>>> > filter is
>>> > used, but not the faster SHUFFLE).
>>> >
>>> > As Blosc offers the same API for a number of codecs, trying it in
>>> > combination with Zstd should be really easy.
>>>
>>> Zstd indeed looks very well-balanced. The reason I didn't include it
>>> back when I did those benchmarks was that we were really focused on
>>> decompression speed in our application, compression speed was very
>>> much secondary. So I included mostly LZ4 codecs.
>>
>>
>> Yes, that makes sense, but I think you should give it a try, at least at
>> the lowest compression levels for Blosc+Zstd (1, 2 and probably 3 too). For
>> these low compression levels Blosc chooses a block size that comfortably
>> fits in the L2 cache. Also, note that the benchmarks above were for
>> in-memory data, so for a typical disk-based workflow using HDF5, Blosc+Zstd
>> can still perform well enough.
>
> Alright, thanks for the tip. I read the benchmarks too fast and didn't
> realize it was all in-memory. I should definitely look at Zstd.
>
> In our use case it's always from disk (or well, SSD), and sometimes
> even slow-ish network mounts.
>
>
>
> Cool. Keep us informed. I am definitely interested.
>
>
> I found the old input file and very quickly I ran the benchmark again
> with Blosc_ZSTD with byte-based shuffling at compression levels 1, 2
> and 3:
>
> compressor,ctime_mean(s),ctime_std(s),rtime_mean(s),rtime_std(s),size(B)
> blosc_zstd_1,0.73083,0.00104,0.29489,0.00338,116666294
> blosc_zstd_2,1.40672,0.00164,0.28097,0.00220,114666454
> blosc_zstd_3,1.48507,0.01872,0.26451,0.00208,113485801
>
> Unfortunately I can't find the spreadsheet where I made those
> diagrams, so can't make a new updated one (at least not easily right
> now).
>
> But this shows that Zstd is very competitive. It achieves slightly
> better compression ratio than Blosc_LZ4HC at level 4 (the original
> file size was 189378052 bytes), which is what we picked, and the
> compression is much faster. But Blosc_LZ4HC still wins out in the
> decompression time, so I think in the end we picked the right one.
>
> Our use case is essentially compress once, decompress many many times.
> And during the decompression the user will sit there and wait. That's
> why decompression time was so important to us.
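
The compress-once, decompress-many trade-off can be made concrete with the Blosc_ZSTD timings in the table above: a higher level costs more at write time but saves a little on every read. A rough Python sketch, with function and variable names chosen for illustration; only the mean timings come from the table, and it ignores I/O and the standard-deviation columns:

```python
import math

# Mean compression (ctime) and decompression (rtime) times in seconds,
# copied from the Blosc_ZSTD table above.
timings = {
    "blosc_zstd_1": {"ctime": 0.73083, "rtime": 0.29489},
    "blosc_zstd_3": {"ctime": 1.48507, "rtime": 0.26451},
}


def total_time(codec: str, reads: int) -> float:
    """Cost of one compression plus `reads` decompressions."""
    t = timings[codec]
    return t["ctime"] + reads * t["rtime"]


def break_even_reads(slow_write: str, fast_write: str) -> int:
    """Smallest read count at which the slower-to-compress codec is no slower overall."""
    extra_write = timings[slow_write]["ctime"] - timings[fast_write]["ctime"]
    saved_per_read = timings[fast_write]["rtime"] - timings[slow_write]["rtime"]
    return math.ceil(extra_write / saved_per_read)


n = break_even_reads("blosc_zstd_3", "blosc_zstd_1")
print(f"level 3 pays off after {n} reads")
```

With these numbers, level 3 overtakes level 1 after 25 reads of the same dataset, which is why decompression time dominates in a read-heavy workflow.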
>
> Anyway, thanks a lot for making me have a look at Zstd; we may yet use it
> somewhere else.
>
> And I now remember the real reason I didn't include it the first time
> around: We're basing our product on Ubuntu 16.04, where Blosc 1.7 is
> the packaged version (1.10 is where Zstd support was added), so I
> lazily just skipped it.
>
> Elvis
>
>
>
>
> Elvis
>
>>
>>
>>>
>>>
>>> >
>>> >>
>>> >> I might try to dig up the script I used for the benchmark and see if
>>> >> we still have the input I used, and do a test with lossy ZFP. It
>>> >> could
>>> >> be very interesting for creating 3D "thumbnails" in our application.
>>> >
>>> >
>>> > It would be nice if your benchmark code (and dataset) can be made
>>> > publicly
>>> > available so as to serve to others as a good comparison.
>>>
>>> The dataset is unfortunately confidential and not something I can
>>> release. I'm attaching the script I used though, it's very simple.
>>>
>>> But, a disclaimer: The benchmarks I did were not really thorough. They
>>> were also internal and never really meant to be published. It was
>>> mostly a quick and dirty test to see which of these LZ4 codecs would
>>> be in the right ballpark for us.
>>
>>
>> Ok. Thanks anyway.
>>
>>>
>>>
>>> Elvis
>>>
>>> >
>>> >>
>>> >>
>>> >> Elvis
>>> >>
>>> >> >
>>> >> > P
>>> >> >
>>> >> >
>>> >> > On 10/28/2016 01:12 PM, Elvis Stansvik wrote:
>>> >> >>
>>> >> >> 2016-10-28 1:53 GMT+02:00 Miller, Mark C. <miller86@llnl.gov>:
>>> >> >>>
>>> >> >>> Hi All,
>>> >> >>>
>>> >> >>> Just wanted to mention a new HDF5 floating point compression
>>> >> >>> plugin available on GitHub...
>>> >> >>>
>>> >> >>> https://github.com/LLNL/H5Z-ZFP
>>> >> >>>
>>> >> >>> This plugin will come embedded in the next release of the Silo
>>> >> >>> library as well.
>>> >> >>
>>> >> >>
>>> >> >> Thanks for the pointer. That's very interesting. I had not heard
>>> >> >> about ZFP before. The ability to set a bound on the error in the
>>> >> >> lossy case seems very useful.
>>> >> >>
>>> >> >> Do you know if there have been any comparative benchmarks of ZFP
>>> >> >> against other compressors?
>>> >> >>
>>> >> >> After some basic benchmarking, we recently settled on Blosc_LZ4HC
>>> >> >> at level 4 for our datasets (3D float tomography data), but maybe it
>>> >> >> would be worthwhile to look at ZFP as well.
>>> >> >>
>>> >> >> Best regards,
>>> >> >> Elvis
>>> >> >>
>>> >> >>>
>>> >> >>> --
>>> >> >>> Mark C. Miller, LLNL
>>> >> >>>
>>> >> >>> _______________________________________________
>>> >> >>> Hdf-forum is for HDF software users discussion.
>>> >> >>> Hdf-forum@lists.hdfgroup.org
>>> >> >>> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
>>> >> >>> Twitter: https://twitter.com/hdf5
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >
>>> >>
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Francesc Alted
>>> >
>>
>>
>>
>>
>> --
>> Francesc Alted
>
>
>
>
>
> --
> Francesc Alted
>
>