round-robin (not parallel) access to single hdf5 file

Mark,

Perhaps - however, there will be a huge range of compression ratios in our
simulations. In many cases we literally have all the same floating point
values for a bunch of the variables in a given file. In other cases the
compression is much less, only 2:1 or 4:1 for scale+offset+gzip. With that
kind of range, I'm not sure it would be worth the effort. I'll mull it over
some more, though; there may be a way to make it worthwhile.
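
(For reference, "scale+offset+gzip" above is just the standard HDF5 filter
pipeline on a chunked dataset creation property list. A minimal sketch in C,
where the chunk dims, the 3-decimal-digit scale factor, and the gzip level
are placeholders for whatever we actually tune:)

    #include <hdf5.h>

    hid_t   dcpl = H5Pcreate(H5P_DATASET_CREATE);
    hsize_t chunk_dims[3] = {20, 128, 128};            /* placeholder chunk dims */
    H5Pset_chunk(dcpl, 3, chunk_dims);                 /* filters need chunking  */
    H5Pset_scaleoffset(dcpl, H5Z_SO_FLOAT_DSCALE, 3);  /* keep ~3 decimal digits */
    H5Pset_deflate(dcpl, 4);                           /* then gzip the result   */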

Leigh

···

On Wed, Dec 15, 2010 at 12:13 PM, Mark Miller <miller86@llnl.gov> wrote:

Hi Leigh,

I guess I am still interested to know whether an approach where you specify
a minimum target compression ratio and then allow HDF5 to (possibly over-)
allocate, assuming a maximum compressed size, would work for you?

Mark

On Wed, 2010-12-15 at 10:59, Leigh Orf wrote:
>
> On Tue, Dec 14, 2010 at 5:42 PM, Quincey Koziol <koziol@hdfgroup.org> wrote:
> Hi Leigh,
>
>
> [snipped for brevity]
>
> > Quincey,
> >
> > Probably a combination of both, namely, an ideal situation
> > would be a group of MPI ranks collectively writing one
> > compressed HDF5 file. On Blue Waters, a 100k-core run with 32
> > cores/MCM could therefore result in, say, around 3000 files,
> > which is not unreasonable.
> >
> > Maybe I'm thinking about this too simply, but couldn't you
> > compress the data on each MPI rank, save it in a buffer,
> > calculate the space required, and then write it? I don't know
> > enough about the internal workings of HDF5 to know whether
> > that would fit in the HDF5 model. In our particular
> > application on Blue Waters, memory is cheap, so there is
> > lots of space in memory for buffering data.
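
(A minimal sketch of the per-rank step described above, with zlib standing in
for whatever filter HDF5 would apply; "data" and "npts" are assumed to already
hold this rank's field and its length:)

    #include <stdlib.h>
    #include <zlib.h>

    /* Compress this rank's block into a memory buffer and measure it. */
    uLong  src_len = (uLong)(npts * sizeof(float));
    uLongf dst_len = compressBound(src_len);         /* worst-case output size */
    Bytef *dst     = malloc(dst_len);
    compress2(dst, &dst_len, (const Bytef *)data, src_len, 4);
    /* dst_len now holds the exact compressed size, known before any I/O. */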
> >
>
>
> What you say above is basically what happens, except that
> space in the file needs to be allocated for each block of
> compressed data. Since each block is not the same size, the
> HDF5 library can't pre-allocate the space or algorithmically
> determine how much to reserve for each process. In the case
> of collective I/O, at least it's theoretically possible for
> all the processes to communicate and work it out, but I'm not
> certain it's going to be solvable for independent I/O, unless
> we reserve some processes to either allocate space (like a
> "free space server") or buffer the "I/O", etc.
>
> Could you make this work by forcing each core to use some specific
> chunking arrangement? For instance, you could make each core's subdomain
> dimensions the same as the chunk dimensions, which actually works out
> pretty well in my application, at least in the horizontal. I typically
> have nxchunk=nx, nychunk=ny, and nzchunk set to something like 20 or so.
> But - now that I think about it, even if that were the case, you don't
> know the size of the compressed chunks until you've compressed them, and
> you'd still need to communicate those sizes amongst the cores writing to
> an individual file.
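
(Concretely, given a dataset creation property list dcpl, the chunk layout
described above would be just the following, with nx/ny as each rank's
hypothetical horizontal dims; each rank's write then maps onto whole chunks,
whose compressed sizes still have to be gathered as discussed:)

    /* One rank's horizontal patch per chunk, ~20 model levels deep;
       dims in C order are (z, y, x). */
    hsize_t chunk_dims[3] = { 20, (hsize_t)ny, (hsize_t)nx };
    H5Pset_chunk(dcpl, 3, chunk_dims);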
>
> I don't know enough about hdf5 to understand how the preallocation
> process works. It sounds like you are allocating a bunch of zeroes (or
> something) on disk first, and then doing I/O straight to that space on
> disk? If this is the case then I can see how this necessitates some
> kind of collective communication if you are splitting up compression
> amongst MPI ranks.
>
> Personally, I am perfectly happy with a bit of overhead that forces
> all cores to share the compressed block sizes amongst themselves before
> writing, if it means we can do compression. Right now I see my choices
> as being (1) compression, but one file per MPI rank and lots of files,
> or (2) no compression and fewer files, perhaps compressing later with
> h5repack, run in parallel with one h5repack per MPI rank as a
> post-processing step (yuck!).
>
> I'm glad you're working on this; personally, I think this is important
> stuff for really huge simulations. In talking to other folks who will
> be using Blue Waters, compression is not much of an issue for many of
> them because of the nature of their data, but cloud data especially
> tends to compress very well. It would be a shame to fill terabytes of
> disk space with zeroes! I am sure we can still carry out our research
> objectives without compression, but the sheer amount of data we will
> be producing is staggering even with compression.
>
> Leigh
>
>
> Quincey
>
> --
> Leigh Orf
> Associate Professor of Atmospheric Science
> Department of Geology and Meteorology
> Central Michigan University
> Currently on sabbatical at the National Center for Atmospheric
> Research in Boulder, CO
> NCAR office phone: (303) 497-8200
--
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
miller86@llnl.gov urgent: miller86@pager.llnl.gov
T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

--
Leigh Orf
Associate Professor of Atmospheric Science
Department of Geology and Meteorology
Central Michigan University
Currently on sabbatical at the National Center for Atmospheric Research
in Boulder, CO
NCAR office phone: (303) 497-8200

Hi Leigh,

Ok, I understand.

Yes, in my world, timesteps near zero compress very well as all the data
is generally just initial conditions; a lot of zeros. So, target
compression ratios for early time in a simulation might be as high as
10:1. But as the simulation evolves, the data gets more 'noisy' and we'd
reduce that to 3:1 or 2:1.

If the dynamic range of possible compression is 2:1 to 4:1, then you could at
least get within a factor of 2 of optimal. If the dynamic range is more like
2:1 to 10:1, then I agree you'd be giving up too much.
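
(In numbers, a quick sketch of that tradeoff with made-up block sizes:)

    double raw_mb    = 100.0;              /* raw block size, illustrative      */
    double min_ratio = 2.0;                /* guaranteed minimum, i.e. 2:1      */
    double reserved  = raw_mb / min_ratio; /* 50 MB allocated up front          */
    /* actual 4:1  -> 25 MB used, half of the reservation wasted                */
    /* actual 10:1 -> 10 MB used, 80% of the reservation wasted                 */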

Mark


--
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
miller86@llnl.gov urgent: miller86@pager.llnl.gov
T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511