Parallel Compression

Hi,

the GROMACS software package is looking into whether we should use HDF5 to
store trajectories (=atomic coordinate time series).
The issue is that we need to support parallel compression. The FAQ explains
that parallel compression is not supported and also why.

For us it would be sufficient to write all data of a chunk at once, so
the size of the chunk would be known. It would also be sufficient for
a single chunk to be written in serial - we only require that
different chunks can be written in parallel. Thus, as far as I can see, the
problem described in the FAQ wouldn't apply under our restricted conditions.
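
To make the restricted conditions concrete, here is a minimal sketch of the idea outside of HDF5. It is plain Python with zlib, not HDF5 API, and the function names are hypothetical. Each chunk is compressed as a whole by a single worker, so its compressed size is known before anything is written; an exclusive prefix sum over those sizes then gives every chunk its own non-overlapping file offset.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_chunks(chunks, workers=4):
    """Compress each chunk independently; one worker handles a whole chunk."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, chunks))


def plan_offsets(sizes, base=0):
    """Exclusive prefix sum: byte offset at which each compressed chunk starts."""
    offsets, pos = [], base
    for size in sizes:
        offsets.append(pos)
        pos += size
    return offsets


def write_chunks(path, compressed, offsets):
    """Write every compressed chunk at its precomputed offset.

    The writes are issued one after another here for simplicity; because the
    byte ranges are disjoint, they do not depend on each other.
    """
    with open(path, "wb") as f:
        for blob, offset in zip(compressed, offsets):
            f.seek(offset)
            f.write(blob)


def read_chunk(path, offset, size):
    """Read back and decompress one chunk, given its offset and compressed size."""
    with open(path, "rb") as f:
        f.seek(offset)
        return zlib.decompress(f.read(size))
```

In an MPI setting the same plan could presumably be built by having each rank compress its own chunk and obtain its offset from an exclusive scan (for example MPI_Exscan) over the compressed sizes. How that would map onto HDF5's chunk index is exactly the part that is unknown to us.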

The FAQ only explains that in general parallel compressed writing isn't
supported. Is it somehow possible to have parallel compressed writing under
these more restricted conditions?
If it is currently not supported, is support for it planned? Or how
difficult would it be to add support for it? Is it somehow possible to
work around the limitation without having to modify HDF5 itself?

Thanks
Roland

--
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309

Hi Roland,

On Apr 3, 2012, at 1:16 PM, Roland Schulz wrote:

> Hi,
>
> the GROMACS software package is looking into whether we should use HDF5 to store trajectories (=atomic coordinate time series).
> The issue is that we need to support parallel compression. The FAQ explains that parallel compression is not supported and also why.
>
> For us it would be sufficient to write all data of a chunk at once, so the size of the chunk would be known. It would also be sufficient for a single chunk to be written in serial - we only require that different chunks can be written in parallel. Thus, as far as I can see, the problem described in the FAQ wouldn't apply under our restricted conditions.
>
> The FAQ only explains that in general parallel compressed writing isn't supported. Is it somehow possible to have parallel compressed writing under these more restricted conditions?
> If it is currently not supported, is support for it planned? Or how difficult would it be to add support for it? Is it somehow possible to work around the limitation without having to modify HDF5 itself?

  It would be possible to work around the issue, for that situation, yes. However, it would require some engineering within the HDF5 library itself, and we don't have any funding for taking care of it currently. :-/ If you'd like to work on a patch to submit (or perhaps have some funds you could direct for this task), we'd be happy to work with you on how to make it work.

  Regards,
    Quincey

Can you give me a very rough estimate of how much work it would be? I don't
know yet whether we would have funds available for this. I would therefore be
interested in two estimates: how much work it would be for a programmer not
yet familiar with HDF5 internals to develop a patch, and how much funding
would be required to support the development.
Even a very rough estimate would help us get a first idea of
whether it makes more sense to add parallel compression to HDF5 or to develop
our own solution without HDF5.

Thanks
Roland

On Thu, Apr 5, 2012 at 2:58 PM, Quincey Koziol <koziol@hdfgroup.org> wrote:

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
