Hi Robert and Jerome,
The sequential HDF5 library can write and read compressed data. Parallel HDF5
can read a compressed dataset using several processes, but it cannot write to
a compressed dataset.
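For reference, reading an already-compressed dataset in parallel is just an
ordinary collective read. A minimal sketch (the file name "compressed.h5", the
dataset name "/data", and the 100-element size are made up for illustration;
error checks are omitted):

#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Open the (already compressed) file with the MPI-IO driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fopen("compressed.h5", H5F_ACC_RDONLY, fapl);

    /* Collective read; each rank could instead select its own hyperslab. */
    hid_t dset = H5Dopen2(file, "/data", H5P_DEFAULT);
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    double buf[100];  /* assumes a 100-element double dataset */
    H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, dxpl, buf);

    H5Pclose(dxpl); H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}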
Writing compressed data in parallel is an often-requested feature, but
unfortunately we do not have funding to implement it. Before (or actually
after) talking about funding, we really need to gather requirements for this
feature.
All,
Enabling writing of compressed data in the parallel HDF5 library will
require a lot of prototyping and a substantial development effort. We would
like to hear from you if you think this feature is absolutely critical for
your application. We would also like to learn more about the write patterns
your application uses.
In Robert's example each process writes one chunk of an HDF5 dataset. This
special case may be a little easier to address than the general case in which
data from a single chunk is distributed among several processes. It would be
good to know whether this particular scenario is common. What are other
commonly used I/O patterns?
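To make the special case concrete, here is a rough sketch of the
one-chunk-per-process pattern, without any compression (the 1-D layout, the
chunk size, and the file name are only illustrative; error checks are
omitted):

#include <mpi.h>
#include <hdf5.h>

#define CHUNK 1024  /* elements per process, also the chunk size */

void write_one_chunk_per_rank(MPI_Comm comm, const double *local)
{
    int rank, nranks;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nranks);

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL);
    hid_t file = H5Fcreate("out.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Chunked dataset: exactly one chunk per process. */
    hsize_t dims[1]  = { (hsize_t)nranks * CHUNK };
    hsize_t cdims[1] = { CHUNK };
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, cdims);
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dset  = H5Dcreate2(file, "data", H5T_NATIVE_DOUBLE, space,
                             H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Sclose(space);

    /* Each rank selects the hyperslab that coincides with its own chunk. */
    hsize_t start[1] = { (hsize_t)rank * CHUNK };
    hsize_t count[1] = { CHUNK };
    hid_t fspace = H5Dget_space(dset);
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t mspace = H5Screate_simple(1, count, NULL);

    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, dxpl, local);

    H5Pclose(dxpl); H5Sclose(mspace); H5Sclose(fspace);
    H5Dclose(dset); H5Pclose(dcpl); H5Fclose(file); H5Pclose(fapl);
}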
Knowing more about the I/O patterns will help us understand the approach we
might take in designing and implementing parallel writes to compressed HDF5
datasets (and the cost, of course!).
Thank you!
Elena
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal The HDF Group http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
On Jan 13, 2013, at 2:09 PM, Jerome BENOIT wrote:
Hello,
On 13/01/13 18:37, Robert Seigel wrote:
Thank you for the response, Jerome. Is this not an HDF5 issue because it is
not possible with HDF5? I would rather not have to compress the .h5 file
after it has been created.
HDF5 can compress data: there is a default compressor (gzip), and you can
use your own via some custom filter code; code examples can be found for
bzip2. If you are a confident C coder, you can easily implement xz
compression, and you can certainly implement a parallel version of those
codes. (I use my own bzip2 and xz compression filters within HDF5, but I have
not yet parallelized them for lack of time.)
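For anyone curious what such a user-defined filter involves, a bare skeleton
follows. The filter id (400 is in the range HDF5 sets aside for testing), the
filter name, and the pass-through body are placeholders; a real filter would
call the bzip2 or xz library where the comments indicate:

#include <hdf5.h>

#define MY_FILTER_ID 400  /* placeholder id from the experimental range */

static size_t my_filter(unsigned int flags, size_t cd_nelmts,
                        const unsigned int cd_values[], size_t nbytes,
                        size_t *buf_size, void **buf)
{
    (void)cd_nelmts; (void)cd_values; (void)buf_size; (void)buf;
    if (flags & H5Z_FLAG_REVERSE) {
        /* decompression path: decode *buf into a freshly allocated buffer */
    } else {
        /* compression path: encode *buf into a freshly allocated buffer */
    }
    /* On success: free the old *buf, install the new one, update *buf_size,
     * and return the number of valid bytes; returning 0 signals failure. */
    return nbytes;  /* placeholder: pass the data through unchanged */
}

static const H5Z_class2_t my_filter_class = {
    H5Z_CLASS_T_VERS, MY_FILTER_ID, 1, 1,
    "example pass-through filter", NULL, NULL, my_filter
};

/* Register the filter and attach it to a dataset creation property list. */
void enable_my_filter(hid_t dcpl)
{
    H5Zregister(&my_filter_class);
    H5Pset_filter(dcpl, MY_FILTER_ID, H5Z_FLAG_MANDATORY, 0, NULL);
}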
I think it is a bad idea to compress .h5 files from the outside: it is better
to compress within them. Note that you can drastically improve the
compression ratio by applying suitable filters to the data.
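For example, a minimal sketch of a dataset creation property list that runs
the shuffle filter before gzip, which often improves the ratio noticeably for
floating-point data (serial HDF5; the 1-D chunk size is arbitrary):

#include <hdf5.h>

hid_t make_compressed_dcpl(void)
{
    hsize_t chunk[1] = { 4096 };
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);   /* filters require a chunked layout */
    H5Pset_shuffle(dcpl);           /* byte-shuffle first ... */
    H5Pset_deflate(dcpl, 6);        /* ... then gzip at level 6 */
    return dcpl;
}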
To pigz and pbzip2, pxz can also be added.
Jerome
Rob
On Sun, Jan 13, 2013 at 11:10 AM, Jerome BENOIT <g6299304p@rezozer.net> wrote:
On 13/01/13 16:38, Robert Seigel wrote:
Hello,
I am currently writing collectively to an HDF5 file in parallel
using chunks, where each processor writes its subdomain as a chunk of a
full dataset. I have this working correctly using hyperslabs; however, the
file size is very large [about 18x larger than if the file were created with
sequential HDF5 and an H5Pset_deflate(plist_id, 6) call]. If I try to apply
this call to the property list while performing parallel I/O, HDF5 says that
the feature is not yet supported (I am using v1.8.10). Is there any way to
compress the file during a parallel write?
This is more a compression issue than an HDF5 one:
you may look for parallel versions of current compressors (pigz,
pbzip2, ...).
hth,
Jerome
Thank you,
Rob
_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org