speeding up h5repack

Are there any tweaks that can be done to speed up compressing already-created HDF5 files?

For example:

h5repack -v -i rt_3d_71nm_5micron_hdf5_plt_cnt_0010 -o lt_cnt_0010_zipped -f GZIP=1

takes 129 minutes, while:

gzip rt_3d_71nm_5micron_hdf5_plt_cnt_0010

takes 1.5 minutes.

hdf5-1.6.7

We don't have szip enabled, but we would be interested in trying it (this is academic work, so licensing should not be a problem).

It just seemed strange that it took so long; the uncompressed HDF5 file is from FLASH 2.5.

Any insight would be nice.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp@umich.edu
(734)936-1985


Brock,

It is hard to say for sure why performance is bad.

Do you know if the original dataset was chunked?

Try

h5dump -p -H

command on your file and check for the CHUNKED_LAYOUT keyword in the output.
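For example, with your file (the grep is just to narrow the output down to the layout lines):

h5dump -p -H rt_3d_71nm_5micron_hdf5_plt_cnt_0010 | grep -i chunk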

Elena


There is overhead in processing the structural information in HDF5 files, and there is startup overhead for the compression library (setting up its structures) for each chunk to be compressed. It would be interesting to see the time for some trivial h5repack operation (-f NONE; how does it scale?). In principle, h5repack should be able to take advantage of parallel processing, so if you could get 1000 processors going you might beat gzip by a large factor.
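For instance, timing a plain repack against the GZIP=1 run would separate the HDF5/chunking overhead from the compression work itself (the output file names here are arbitrary):

time h5repack -i rt_3d_71nm_5micron_hdf5_plt_cnt_0010 -o cnt_0010_none -f NONE
time h5repack -i rt_3d_71nm_5micron_hdf5_plt_cnt_0010 -o cnt_0010_gz1 -f GZIP=1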

What were the file sizes?

The gzip program supports levels 1-9 (1 = fast, less compression; 9 = slow, more compression), with a default of 6, so your gzip run should have been doing more compression work than h5repack. The question is how much of the overhead comes from dealing with the HDF5 structure and how much from compression library startup. Function call profiles would give you the number of calls to deflate and deflateInit for the two runs.


--
George N. White III <aa056@chebucto.ns.ca>
Head of St. Margarets Bay, Nova Scotia


On Oct 28, 2008, at 7:21 PM, George N. White III wrote:

What were the file sizes?

About 2 GB uncompressed.

The gzip program supports levels 1-9 (fast, less compression, to slow, more compression), with a default of 6, so your gzip run should have been doing more compression work than h5repack.

Yes, that's what I thought.

The question is how much of the overhead is dealing with the HDF5 structure and how much is compression library startup. Function call profiles would give you the number of calls to deflate and deflateInit for the two runs.

I don't have the time to compile it with -pg, but I should point out that 160 minutes to 1.5 minutes is a huge spread.

What does h5repack do if the file is not chunked? These files were written using parallel HDF5, but I did not write the app and am only starting to learn HDF5.


Brock,

For this particular dataset, try specifying the chunk size with the -l CHUNK=64x16x16x16 flag.
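That is, something along the lines of (the output file name is arbitrary):

h5repack -v -i rt_3d_71nm_5micron_hdf5_plt_cnt_0010 -o lt_cnt_0010_chunked -f GZIP=1 -l CHUNK=64x16x16x16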

Explanation:

When the user doesn't specify a chunk size, h5repack uses the dimensions of the dataset (9970x16x16x16 in this case) to set up the chunking parameters. The current implementation sets the chunk dimensions to the dataset dimensions. Therefore, one gets a pretty big chunk that doesn't fit into the chunk cache (1 MB by default; tuning it is not available in h5repack at this point).
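To put rough numbers on it (assuming 4-byte single-precision data, which is typical for FLASH plot files): a full 9970x16x16x16 chunk is about 9970 * 16 * 16 * 16 * 4 bytes, roughly 160 MB, far larger than the 1 MB default chunk cache, while a 64x16x16x16 chunk is 64 * 16 * 16 * 16 * 4 bytes = 1 MB and just fits.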

h5repack writes a dataset by hyperslabs. Since the chunk doesn't fit into the chunk cache, the HDF5 library writes part of the chunk, evicts it from the chunk cache, compresses it, and writes it to the file. When the next hyperslab needs to be written, HDF5 reads the chunk back, uncompresses it, writes the new data, compresses it, writes it to the file, and so on.

This behavior is avoided if each hyperslab corresponds to a chunk, or to several chunks that fit into the chunk cache.

We are aware of the problem and are working on improving the performance of the HDF5 tools, including a better default strategy for choosing chunking parameters and hyperslabs.

Elena


No chunked layout

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp@umich.edu
(734)936-1985


Hi,
I am looking for advice on the optimal way to read/write an HDF5 file in C/C++ under the following circumstances.
Essentially my data is one large array of single-precision data of, say, 1 million rows by, say, 3000 columns. Sometimes it is smaller, of course, and it may be larger, but that would be a typical 'large' problem.

The data is "received" by columns 1, 2, 3, ..., N. Obviously I cannot hold all the data in memory, so I write a column 'hyperslab' one at a time as I get the data.
  
The main requirement is that I need to be able to read a random selection of rows efficiently. So I need to read, say, the data from 1000 rows (random indices) into memory. I do not know beforehand which rows will need to be read. Obviously the implementation is to construct a hyperslab read of those rows, but is that the most efficient way?
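For concreteness, the row read I have in mind looks roughly like this (a sketch only; dset is an already-open dataset handle, and nsel, ncols, row_index, and buf are placeholders; the indices are assumed unique):

hsize_t start[2], count[2];
hsize_t i;
hid_t   fspace = H5Dget_space(dset);                 /* file dataspace of the dataset */
hsize_t mdims[2] = { nsel, ncols };
hid_t   mspace = H5Screate_simple(2, mdims, NULL);   /* memory buffer: nsel x ncols   */

H5Sselect_none(fspace);
for (i = 0; i < nsel; i++) {
    start[0] = row_index[i];  start[1] = 0;          /* select one whole row          */
    count[0] = 1;             count[1] = ncols;
    H5Sselect_hyperslab(fspace, H5S_SELECT_OR, start, NULL, count, NULL);
}
/* note: the selected rows land in buf in increasing row-index order, not in the
   order of row_index, so the indices may need to be sorted/remapped afterwards  */
H5Dread(dset, H5T_NATIVE_FLOAT, mspace, fspace, H5P_DEFAULT, buf);

H5Sclose(mspace);
H5Sclose(fspace);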
  
  Any tips before I dive in?

Andrew


Andrew,

The best way to do this depends on your priorities for performance. The most important parameters influencing read and write performance here are the chunk dimensions and the raw data chunk cache (rdcc) configuration. Are you using any compression on the dataset? The following advice assumes you are not.

To optimize read performance, because the rows are contiguous in memory, you will probably want the cache to be smaller than the size of a chunk (see H5Pset_cache). This forces the library to read only the data you want directly from the disk. You also want the chunks to be wide, to minimize the number of I/Os required. If the rows you read in sequence tend to be close to each other, it may be a good idea to increase the size of the cache so that it can hold an entire "row" (or more) of chunks.

To optimize write performance, you may want to change the cache size to fit a chunk, depending on the width of the chunk you select. This allows the library to write each entire chunk at once (per column), rather than seeking to each individual element. Narrower chunks will be faster in this case.

To change the cache configuration between writing and reading, you will need to close and then reopen the file.

The optimum chunk height depends on how much memory you want the application to use.
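As a rough illustration only (the numbers are placeholders, not recommendations; H5Pset_cache sets the file-level chunk cache on the file access property list, which is the knob available without per-dataset cache control):

hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);

/* args: fapl, mdc_nelmts (metadata cache, left at a typical default), rdcc_nslots,
   rdcc_nbytes, rdcc_w0.  Per the advice above, pick rdcc_nbytes smaller than one
   chunk for the sparse row reads, and at least one chunk for the column writes.   */
H5Pset_cache(fapl, 10000, 521, 8 * 1024 * 1024, 0.75);

hid_t file = H5Fopen("data.h5", H5F_ACC_RDWR, fapl);

/* chunk shape is set on the dataset creation property list when the dataset is
   created; wide chunks minimize the number of I/Os for whole-row reads.         */
hid_t   dcpl = H5Pcreate(H5P_DATASET_CREATE);
hsize_t chunk_dims[2] = { 1024, 3000 };   /* e.g. 1024 rows x the full width     */
H5Pset_chunk(dcpl, 2, chunk_dims);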

Good luck and let me know if you have any further questions.

Neil Fortner
The HDF Group


Andrew,

Hyperslabs are the best way to extract data when working with large datasets and for selective reads. You can use hyperslabs for 2-dimensional as well as 3-dimensional arrays. Choose an appropriate chunk size and indexing strategy when you create the HDF5 dataset for the best read performance.

Thanks,
Brian


Hi Neil,
  The situation is 'write once/read often'. That is, the HDF5 file will be written in column order in one data-translation operation and closed (no compression), then later opened (and kept open) for reading of non-contiguous blocks of rows.

Andrew


Andrew,

In that case, you can probably ignore the advice on write performance and set the chunks to be as wide as you dare (possibly even the width of the dataset). Still, would it be possible to write the data in row order? That would be significantly faster because the rows are contiguous on disk. In that case you would want to write one "row" of chunks at a time (or more).
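Very roughly, in code (the sizes and the names ncols, r0, dset, dcpl, fspace, mspace, and buf are placeholders):

hid_t   dcpl = H5Pcreate(H5P_DATASET_CREATE);
hsize_t chunk[2] = { 256, ncols };                  /* full-width chunks */
H5Pset_chunk(dcpl, 2, chunk);

/* then write one "row" of chunks at a time, aligned to the chunk boundaries: */
hsize_t start[2] = { r0, 0 };
hsize_t count[2] = { 256, ncols };
H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
H5Dwrite(dset, H5T_NATIVE_FLOAT, mspace, fspace, H5P_DEFAULT, buf);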

Thanks,
-Neil
