Which property parameters will affect Parallel HDF5 performance?

Hi Mark,

    Thanks for your suggestion. As you said, I only call H5Pset_fill_time(H5D_FILL_TIME_NEVER) on the dataset creation property list,
and then measure the time H5Dcreate costs. Because I will create lots of datasets, I have to pay attention to the creation cost.
    Do you have any experience with my second problem: which is better, writing a few times with a large amount of data, or writing many times with a smaller data size?

From: "Mark Miller"
To: hdf-forum@hdfgroup.org
Date: 2009-08-26 00:20:44
Hi Tony,

Your email here mentions 'create' behavior. But, the property lists Kent
sent you are related to 'transfer' behavior (e.g. H5Dread/H5Dwrite).
There are different property lists for 'creation' and 'transfer'. So,
just be aware of that.

Based on your comments, I think you want to make a call to
H5Pset_fill_time(H5D_FILL_TIME_NEVER); and explicitly control filling of
the dataset with H5Dfill() yourself.

I have no experience with H5Pset_fill_time(), but I am guessing that if
you do it on the write end of things, then you'll only end up 'filling'
when you actually have to. If you wait and do it on the read end of
things, then I assume that HDF5 will 'automagically' fill on read. But,
maybe I am wrong in which case that places the burden on any reader of
your file to remember to explicitly call H5Dfill. I doubt that is the
case, but it is something I would at least test.


On Tue, 2009-08-25 at 22:47 +0800, Tony wrote:

Hi kent,

    Thank you very much.
    It seems that using the default settings is a better and safer
choice for me; my calculation program will be used on both HP Linux and …
    I just tried setting the fill time to never, and it noticeably
decreased the H5Dcreate time in my parallel program.
I was wondering how the fill-value mechanism is implemented internally,
because it costs a lot of time when a fill value is set.
    Another thing that confuses me is that the total time changes
when I change the data size written each time.
For example, when the data size written to each dataset changed
from 20M to 10M, I got better performance.
So I hesitate to make a decision: when I have 100M to write, should
I write it once, or in 10 pieces, or something else? Are there any rules
for getting higher performance in this situation?


        From: "MuQun Yang"
        To: hdf-forum@hdfgroup.org
        Date: 2009-08-25 21:12:43
        It really depends on your system.
        Check the following APIs:
            * H5Pset_dxpl_mpio
            * H5Pset_dxpl_mpio_chunk_opt
            * H5Pset_dxpl_mpio_chunk_opt_num
            * H5Pset_dxpl_mpio_chunk_opt_ratio
            * H5Pset_dxpl_mpio_collective_opt
        H5Pset_dxpl_mpio selects collective I/O; the others tune related parameters.
        They may help you.
        Be aware that they may make things even worse in many cases, because they
        depend on the underlying parallel file system and MPI-IO implementations.
        Tony wrote:
        > Hi all,
        > Does anyone know which parameter settings will affect parallel I/O
        > performance?
        > There are so many property parameters, such as:
        > H5Pset_meta_block_size, H5Pset_sieve_buf_size,
        > H5Pset_small_data_block_size, H5Pset_cache.
        > Which of them are actually effective for PHDF5 performance, or is using
        > the default values always best? I really cannot understand these parameters.
        > What I am doing is writing data to HDF5 datasets: before writing the
        > data, I create datasets with a fixed length (contiguous layout), and
        > write the data to one-dimensional datasets using collective I/O (4 processes,
        > each process with approximately the same amount of data, noncontiguous access
        > pattern). About 20M of data is written to a single dataset each time, and
        > about 200 datasets undergo the same operation at the same time.
        > I hope to get as good performance as possible.
        > Can anyone give suggestions on improving performance? And some hints on
        > how to use the above property parameters?
        > Thanks in advance.
        > tony
        > ------------------------------------------------------------------------
        > _______________________________________________
        > Hdf-forum is for HDF software users discussion.
        > Hdf-forum@hdfgroup.org
        > http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
        Kent Yang
        The HDF Group
        1901 South First Street, Suite C-2
        Champaign, IL 61820
        (217)265-5129 (office) (217)333-9049 (fax)


Mark C. Miller, Lawrence Livermore National Laboratory
email: miller86@llnl.gov
(M/T/W) (925)-423-5901 (!!LLNL BUSINESS ONLY!!)
(Th/F) (530)-753-8511 (!!LLNL BUSINESS ONLY!!)

