New compression plugin based on Snappy-CUDA


#1

Hi, folks!

As part of a validation study related to computational storage with HDF5, I ended up writing an I/O filter for data (de)compression using GPUs. The I/O filter is based on Mohammad Dashti’s wonderful snappy-CUDA project.

The filter works as intended, so perhaps it can be useful to you, too. Please visit the project page at https://github.com/lucasvr/snappy-cuda if you’re interested in giving it a try.

Have fun!
Lucas


#2

Will you register a filter ID with The HDF Group? G.


#3

I’m not sure; Barbara used to handle that.

Allen


#4

Now that you’ve asked me, I realized that I don’t have a filter ID registered for HDF5-UDF. If possible, I’d like to register both. What’s the preferred way to do so?

The filter ID I’ve been using for HDF5-UDF is 31300.
The ID I’ve temporarily assigned to Snappy-CUDA is 31301.
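As a side note on the IDs above: The HDF Group’s filter registration page splits the 0-65535 filter-ID space into ranges. Here is a minimal sketch, assuming the ranges are as I recall them: 0-255 for the library, 256-511 for testing, 512-32767 for registered filters, and 32768-65535 for unregistered/private use. Please verify them against the current page. The sketch shows that 31300 and 31301 fall in the range meant for registered filters, so using them unregistered risks a clash. `classify_filter_id` is a hypothetical helper. The two constants mirror `H5Z_FILTER_RESERVED` and `H5Z_FILTER_MAX` from `H5Zpublic.h`.

```python
# Filter-ID ranges as I understand them from The HDF Group's filter
# registration page (an assumption; check the current page). The two
# constants mirror H5Z_FILTER_RESERVED and H5Z_FILTER_MAX in H5Zpublic.h.
H5Z_FILTER_RESERVED = 256
H5Z_FILTER_MAX = 65535


def classify_filter_id(filter_id: int) -> str:
    """Return which documented range a filter ID falls into (hypothetical helper)."""
    if not 0 <= filter_id <= H5Z_FILTER_MAX:
        raise ValueError(f"filter ID {filter_id} is outside 0..{H5Z_FILTER_MAX}")
    if filter_id < H5Z_FILTER_RESERVED:
        return "library"      # predefined by the HDF5 library
    if filter_id < 512:
        return "testing"      # temporary use, no conflict resolution
    if filter_id < 32768:
        return "registered"   # assigned by The HDF Group on request
    return "private"          # unregistered, free for private use


for fid in (31300, 31301, 32003, 32023):
    print(fid, classify_filter_id(fid))
```

Every ID discussed in this thread, including the temporary ones, lands in the "registered" range, which is why asking The HDF Group for an assignment is the safe route.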

Thanks,
Lucas


#5

If the filter is compatible with Snappy, please use ID 32003 (see https://portal.hdfgroup.org/display/support/Registered+Filter+Plugins).

If not, The HDF Group will issue a new ID, 32023.

By compatibility I mean that the current Snappy filter can be used to decompress Snappy-CUDA compressed data and vice versa.
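That definition of compatibility can be expressed as a two-way round-trip check: each filter must be able to decompress what the other produced. This is a minimal sketch, assuming each filter can be wrapped as a plain pair of compress/decompress functions. The `Codec` type and the function names are hypothetical. Since neither filter’s code is shown here, zlib stands in for both, once as-is and once behind a made-up 8-byte length header.

```python
import struct
import zlib
from typing import Callable, Iterable, NamedTuple


class Codec(NamedTuple):
    """A named compress/decompress pair (hypothetical stand-in for an HDF5 filter)."""
    name: str
    compress: Callable[[bytes], bytes]
    decompress: Callable[[bytes], bytes]


def _decodes(decoder: Codec, encoder: Codec, data: bytes) -> bool:
    """True if `decoder` recovers `data` from what `encoder` produced."""
    try:
        return decoder.decompress(encoder.compress(data)) == data
    except Exception:
        # Any decoding failure (bad header, truncated stream, ...) counts as incompatible.
        return False


def cross_compatible(a: Codec, b: Codec, samples: Iterable[bytes]) -> bool:
    """Check both directions: a decodes b's output and b decodes a's output."""
    return all(_decodes(a, b, d) and _decodes(b, a, d) for d in samples)


# Demo codecs: plain zlib vs. zlib behind a hypothetical 8-byte big-endian length header.
def _headered_compress(data: bytes) -> bytes:
    return struct.pack(">Q", len(data)) + zlib.compress(data)


def _headered_decompress(blob: bytes) -> bytes:
    return zlib.decompress(blob[8:])  # skip the made-up header


plain = Codec("plain", zlib.compress, zlib.decompress)
headered = Codec("headered", _headered_compress, _headered_decompress)
samples = [b"", b"hello " * 100, bytes(range(256))]
```

With these stand-ins, `cross_compatible(plain, plain, samples)` holds but `cross_compatible(plain, headered, samples)` does not. Plain zlib cannot parse the extra header, which is the same kind of mismatch a filter that prepends its own metadata would cause.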


#6

That’s a good question, as I don’t see a link to download the source code of the current Snappy filter. I’ll contact the original author to ask him for a pointer.


#7

#8

This page has the links to different implementations.


#9

Thanks, Elena. Judging from the implementation of cuda-c’s uncompressor alone, the two implementations are equivalent. However, since we don’t have access to the source code of the existing Snappy I/O filter, I can’t tell whether the two filters are compatible. For instance, I’d like to know whether the filter registered as Snappy ID 32003 prepends any metadata to the beginning of each compressed block.
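One way to probe that question from a compressed chunk alone: Snappy’s format description says a raw Snappy stream begins with the uncompressed length, stored as a little-endian varint (at most 2^32 - 1). If a filter writes the bare Snappy stream, the first bytes of each chunk should therefore decode to the chunk’s uncompressed size in bytes, which is the chunk dimensions times the element size. This is a minimal, heuristic sketch with hypothetical function names. A prepended header could in principle start with bytes that happen to decode to the right value, so a True result is suggestive rather than proof.

```python
from typing import Tuple

SNAPPY_MAX_LEN = 0xFFFFFFFF  # the preamble holds a length of at most 2**32 - 1


def read_varint32(buf: bytes) -> Tuple[int, int]:
    """Decode the little-endian base-128 varint at the start of `buf`.

    Returns (value, bytes_consumed). A 32-bit value needs at most 5 bytes.
    """
    value = 0
    for i, byte in enumerate(buf[:5]):
        value |= (byte & 0x7F) << (7 * i)   # low 7 bits carry data
        if (byte & 0x80) == 0:              # high bit clear: last byte
            if value > SNAPPY_MAX_LEN:
                raise ValueError("varint does not fit in 32 bits")
            return value, i + 1
    raise ValueError("truncated or over-long varint")


def looks_like_bare_snappy(block: bytes, expected_size: int) -> bool:
    """Heuristic: does `block` start directly with a Snappy length preamble
    equal to the chunk's uncompressed size? False suggests that something
    (e.g. filter metadata) precedes the Snappy stream."""
    try:
        length, _ = read_varint32(block)
    except ValueError:
        return False
    return length == expected_size
```

For a 300-byte chunk, a bare Snappy stream would start with the bytes `0xAC 0x02`. A block that instead starts with, say, an 8-byte big-endian size field would decode to 0 and fail the check.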

I sent an email to Michael Rissi asking him for directions to download the filter. If I don’t hear from him by the end of the week, I think it’s safer to use a new ID.

Best regards,
Lucas


#10

I know Mike. Please let me know if he doesn’t respond. I’ll try to reach him.

Thank you!
Elena


#11

Hi Elena,

I just got a response from Mike. It looks like the information on the portal is inaccurate, as the filter they developed was built on LZ4 as opposed to Snappy:

Oh that was quite some time ago… I never implemented the one using snappy, as LZ4 has a better compression and higher speed than snappy.
The LZ4 code, we handed over to the HDF5 group:
https://github.com/nexusformat/HDF5-External-Filter-Plugins/tree/master/LZ4

Given that there’s no existing implementation, we can reuse filter ID 32003. I will update the code accordingly.


#12

Great! Please send me the updated links, etc., and I will update the filters table on the portal website.

Thank you!


#13

Hi Elena,

We can reuse most of the wording that describes Snappy already. Here’s a suggested description for the new filter. Please feel free to adjust as needed.

Snappy-CUDA Filter

Filter ID: 32003

Filter Description:

Snappy-CUDA is a library that uses the GPU to compress and decompress data with the Snappy algorithm. The Snappy compression algorithm does not aim for maximum compression or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, the reference implementation of Snappy on the CPU is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger.

Links:
https://github.com/lucasvr/snappy-cuda
https://github.com/google/snappy

Contact Information:

Lucas C. Villa Real
Email: lucasvr at gmail dot com

Thanks!
Lucas


#14

Done. See https://confluence.hdfgroup.org/display/support/Filters


#15

Thank you, Elena! It looks good.


#16

Elena, thank you very much! Everything looks good.