Question about meta data for chunked datasets

Hello,

I have a parallel HDF5 application that writes chunked data to a 3D
dataset and issues a large number of small writes when closing the
file. Below I've attached a trace of POSIX calls on node 0 showing the
file open, then 4 chunks of 1757600 bytes each being written, then a
series of small writes of 40 to 3136 bytes (mostly 3136), and then a
truncate call before the file is closed. The small writes are not ideal
because this is a Lustre file system on a Cray XT at NERSC. Together,
those small writes and the truncate take about 30% of the time from
file open to close.

My hypothesis is that the small writes represent meta data related to
the chunk indexing. Does that sound right? What is the best way for me
to consolidate these small writes into one large write? Should I use
H5Pset_meta_block_size() to set the block size to the Lustre stripe
width of 1MB? I'm a little concerned that the 3136 byte writes do not
go to contiguous offsets, and so perhaps cannot be consolidated into a
single write.

What is the purpose of the truncate? Can it be removed?

Thanks,

Mark Howison
mhowison@lbl.gov
Student Research Assistant
Visualization Group, Lawrence Berkeley National Labs

ifi=5 41 open64("../output/prs.h5part",66,-1) 3.26833e+01 6.02412e-03
ifi=5 0 close(41) 3.26894e+01 1.08004e-04
ifi=5 41 open64("../output/prs.h5part",2,-1) 3.26895e+01 3.85680e-02
ifi=5 0 lseek64(41,0,2) 3.27325e+01 1.83105e-03
ifi=5 0 lseek64(41,0,0) 3.27344e+01 9.53674e-07
ifi=5 96 write(41,0x7fffffffb740,96) 3.27358e+01 3.38793e-04
ifi=5 7304 lseek64(41,7304,0) 3.27391e+01 3.09944e-06
ifi=5 1757600 write(41,0x371cdc80,1757600) 3.27391e+01 9.63148e-01
ifi=5 1757600 write(41,0x3737ae20,1757600) 3.37023e+01 2.74949e-02
ifi=5 1757600 write(41,0x37527fc0,1757600) 3.37299e+01 1.32360e-02
ifi=5 1757600 write(41,0x376d5160,1757600) 3.37432e+01 1.96590e-02
ifi=5 96 lseek64(41,96,0) 3.45493e+01 1.90735e-06
ifi=5 40 write(41,0x5d0b8188,40) 3.45493e+01 1.57619e-03
ifi=5 544 write(41,0x5d0b7ca8,544) 3.45509e+01 1.69277e-05
ifi=5 120 write(41,0x5d0b87f8,120) 3.45510e+01 1.50204e-05
ifi=5 40 write(41,0x5d0b9308,40) 3.45510e+01 1.38283e-05
ifi=5 544 write(41,0x5d0b7ca8,544) 3.45510e+01 1.50204e-05
ifi=5 120 write(41,0x5d0b9858,120) 3.45510e+01 1.38283e-05
ifi=5 328 write(41,0x7fffffffb660,328) 3.45510e+01 1.59740e-05
ifi=5 40 write(41,0x5d0c18d8,40) 3.45511e+01 1.40667e-05
ifi=5 544 write(41,0x5d0b7ca8,544) 3.45511e+01 1.50204e-05
ifi=5 120 write(41,0x5d0c1ed8,120) 3.45511e+01 1.40667e-05
ifi=5 328 write(41,0x7fffffffb660,328) 3.45511e+01 1.50204e-05
ifi=5 40 write(41,0x5d0c4288,40) 3.45511e+01 1.40667e-05
ifi=5 544 write(41,0x5d0b7ca8,544) 3.45512e+01 1.50204e-05
ifi=5 120 write(41,0x5d0c4918,120) 3.45512e+01 1.40667e-05
ifi=5 328 write(41,0x7fffffffb660,328) 3.45512e+01 1.40667e-05
ifi=5 272 write(41,0x5d0c8948,272) 3.45512e+01 3.58105e-03
ifi=5 3136 write(41,0x5d0c7948,3136) 3.45548e+01 1.69277e-05
ifi=5 114251304 lseek64(41,114251304,0) 3.45549e+01 9.53674e-07
ifi=5 3136 write(41,0x5d0c7948,3136) 3.45549e+01 1.46720e-02
ifi=5 3136 write(41,0x5d0c7948,3136) 3.45696e+01 1.78814e-05
ifi=5 214440776 lseek64(41,214440776,0) 3.45696e+01 0.00000e+00
ifi=5 3136 write(41,0x5d0c7948,3136) 3.45696e+01 1.86720e-02
ifi=5 314627112 lseek64(41,314627112,0) 3.45883e+01 0.00000e+00
ifi=5 3136 write(41,0x5d0c7948,3136) 3.45883e+01 1.42689e-02
ifi=5 414813448 lseek64(41,414813448,0) 3.46026e+01 0.00000e+00
ifi=5 3136 write(41,0x5d0c7948,3136) 3.46026e+01 1.24190e-02
ifi=5 514999784 lseek64(41,514999784,0) 3.46150e+01 0.00000e+00
ifi=5 3136 write(41,0x5d0c7948,3136) 3.46150e+01 1.48160e-02
ifi=5 615186120 lseek64(41,615186120,0) 3.46299e+01 0.00000e+00
ifi=5 3136 write(41,0x5d0c7948,3136) 3.46299e+01 3.93460e-02
ifi=5 715372456 lseek64(41,715372456,0) 3.46693e+01 9.53674e-07
ifi=5 3136 write(41,0x5d0c7948,3136) 3.46693e+01 1.76220e-02
ifi=5 815558792 lseek64(41,815558792,0) 3.46869e+01 9.53674e-07
ifi=5 3136 write(41,0x5d0c7948,3136) 3.46869e+01 1.06070e-02
ifi=5 915745128 lseek64(41,915745128,0) 3.46975e+01 1.19209e-06
ifi=5 3136 write(41,0x5d0c7948,3136) 3.46975e+01 1.74150e-02
ifi=5 1015931464 lseek64(41,1015931464,0) 3.47150e+01 0.00000e+00
ifi=5 3136 write(41,0x5d0c7948,3136) 3.47150e+01 1.11501e-02
ifi=5 1116117800 lseek64(41,1116117800,0) 3.47262e+01 0.00000e+00
ifi=5 3136 write(41,0x5d0c7948,3136) 3.47262e+01 1.67122e-02
ifi=5 1216304136 lseek64(41,1216304136,0) 3.47429e+01 0.00000e+00
ifi=5 3136 write(41,0x5d0c7948,3136) 3.47429e+01 5.77402e-03
ifi=5 1316490472 lseek64(41,1316490472,0) 3.47487e+01 9.53674e-07
ifi=5 3136 write(41,0x5d0c7948,3136) 3.47487e+01 1.83940e-02
ifi=5 1416676808 lseek64(41,1416676808,0) 3.47671e+01 0.00000e+00
ifi=5 3136 write(41,0x5d0c7948,3136) 3.47671e+01 1.35159e-02
ifi=5 1516863144 lseek64(41,1516863144,0) 3.47806e+01 0.00000e+00
ifi=5 3136 write(41,0x5d0c7948,3136) 3.47806e+01 1.70491e-02
ifi=5 1617049480 lseek64(41,1617049480,0) 3.47977e+01 0.00000e+00
ifi=5 3136 write(41,0x5d0c7948,3136) 3.47977e+01 9.32908e-03
ifi=5 1717235816 lseek64(41,1717235816,0) 3.48071e+01 0.00000e+00
ifi=5 3136 write(41,0x5d0c7948,3136) 3.48071e+01 1.15631e-02
ifi=5 1817422152 lseek64(41,1817422152,0) 3.48187e+01 9.53674e-07
ifi=5 3136 write(41,0x5d0c7948,3136) 3.48187e+01 8.60000e-03
ifi=5 1917608488 lseek64(41,1917608488,0) 3.48273e+01 9.53674e-07
ifi=5 3136 write(41,0x5d0c7948,3136) 3.48273e+01 6.62398e-03
ifi=5 2017794824 lseek64(41,2017794824,0) 3.48339e+01 1.19209e-06
ifi=5 3136 write(41,0x5d0c7948,3136) 3.48339e+01 7.51495e-03
ifi=5 2117981160 lseek64(41,2117981160,0) 3.48415e+01 0.00000e+00
ifi=5 3136 write(41,0x5d0c7948,3136) 3.48415e+01 1.77360e-02
ifi=5 2218167496 lseek64(41,2218167496,0) 3.48592e+01 9.53674e-07
ifi=5 3136 write(41,0x5d0c7948,3136) 3.48592e+01 1.63181e-02
ifi=5 2249807432 lseek64(41,2249807432,0) 3.48756e+01 0.00000e+00
ifi=5 328 write(41,0x7fffffffb660,328) 3.48756e+01 7.10177e-03
ifi=5 0 lseek64(41,0,0) 3.48828e+01 0.00000e+00
ifi=5 96 write(41,0x7fffffffb510,96) 3.48828e+01 2.69413e-05
ifi=5 0 ftruncate64(41,2249809480) 3.48829e+01 7.08644e-01
ifi=5 0 fsync(41) 3.55917e+01 3.28633e-01
ifi=5 0 lseek64(41,0,0) 3.59472e+01 1.90735e-06
ifi=5 96 write(41,0x7fffffffb4d0,96) 3.59473e+01 5.88894e-05
ifi=5 0 close(41) 3.59477e+01 9.05991e-06
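
As a rough check of the 30% figure, the trace can be tallied per call. The following is a minimal sketch, assuming each line has the form `ifi=<n> <return value> <call>(<args>) <start> <duration>`; that layout is inferred from the trace above, not taken from the tracing tool's documentation. The helper names and the 4096-byte cutoff for a "small" write are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed layout, inferred from the trace rather than from tool documentation:
#   ifi=<n> <return value> <call>(<args>) <start seconds> <duration seconds>
LINE_RE = re.compile(r"^ifi=\d+\s+(-?\d+)\s+(\w+)\((.*)\)\s+(\S+)\s+(\S+)$")

SMALL_WRITE_LIMIT = 4096  # hypothetical cutoff separating metadata-sized writes from chunks


def parse_line(line):
    """Return a dict for one trace line, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    ret, call, args, start, duration = match.groups()
    return {"ret": int(ret), "call": call, "args": args,
            "start": float(start), "duration": float(duration)}


def time_by_category(lines):
    """Sum the duration column per call, splitting writes into small and large."""
    totals = defaultdict(float)
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        name = rec["call"]
        if name == "write":
            name = "small write" if rec["ret"] <= SMALL_WRITE_LIMIT else "large write"
        totals[name] += rec["duration"]
    return dict(totals)


# Four lines copied from the trace above; paste the full trace for real totals.
SAMPLE = [
    'ifi=5 1757600 write(41,0x371cdc80,1757600) 3.27391e+01 9.63148e-01',
    'ifi=5 3136 write(41,0x5d0c7948,3136) 3.45549e+01 1.46720e-02',
    'ifi=5 0 ftruncate64(41,2249809480) 3.48829e+01 7.08644e-01',
    'ifi=5 0 fsync(41) 3.55917e+01 3.28633e-01',
]

for name, seconds in sorted(time_by_category(SAMPLE).items()):
    print(f"{name:12s} {seconds:.6f} s")
```

Run over the full trace instead of `SAMPLE`, the per-category totals can be compared against the interval between the first open64 and the final close.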

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.

Also, here is a graph showing that same activity on node 0 (the first
row of pixels). The color key is:

blue = write
dark purple = truncate
purple = fsync
teal = fflush

Mark

Hi Mark,

My hypothesis is that the small writes represent meta data related to
the chunk indexing. Does that sound right?

  Yes, that's probably correct.

What is the best way for me to consolidate these small writes into one large write? Should I use
H5Pset_meta_block_size() to set the block size to the lustre stripe
width of 1MB?

  Yes, that would probably help.

I'm a little concerned by the fact that the 3136 byte
writes are not to contiguous offsets, and perhaps cannot be
consolidated into a single write.

What is the purpose of the truncate? Can it be removed?

  I think with some analysis we could eliminate the truncate in some/all cases, but we'll need to finish getting funding in place to work on these issues with Lustre.

  Quincey
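
On the concern about non-contiguous offsets, the lseek64 targets in the original trace can be checked with a little arithmetic. This is only a sketch: the offsets and sizes are copied from the trace, the function names are hypothetical, and the reading that each 3136-byte write sits after a run of chunk data is an interpretation of the numbers, not something confirmed in this thread.

```python
CHUNK_BYTES = 1757600  # chunk size reported in the trace
NODE_BYTES = 3136      # size of the repeated small write

# Targets of consecutive lseek64 calls that precede 3136-byte writes (from the trace).
SEEK_OFFSETS = [214440776, 314627112, 414813448, 514999784, 615186120]


def gaps(offsets):
    """Distance in bytes between each pair of consecutive seek targets."""
    return [later - earlier for earlier, later in zip(offsets, offsets[1:])]


def chunks_between(gap, chunk_bytes=CHUNK_BYTES, node_bytes=NODE_BYTES):
    """Whole chunks that fit in a gap after one small write, or None if uneven."""
    remainder = gap - node_bytes
    if remainder < 0 or remainder % chunk_bytes != 0:
        return None
    return remainder // chunk_bytes


for gap in gaps(SEEK_OFFSETS):
    print(gap, chunks_between(gap))
```

For these offsets every gap is 100186336 bytes, which is exactly one 3136-byte write plus 57 chunks of 1757600 bytes. The small writes are therefore about 100 MB apart, which is why they cannot be merged into a single write without first changing where they are allocated.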

Hi Quincey,

I successfully used H5Pset_meta_block_size() to resize the meta block
to 1MB. You can see in the trace below that there is a seek to 1048576
before the data is written, presumably to set aside that first 1MB
block for meta data.

HDF5 is still making a series of small writes into the meta block
(everything after the lseek to 96), but they are going through very
quickly now that the meta block is aligned to a Lustre stripe
boundary.

However, as you can see in the attached plot, the truncate (purple) at
the end still takes up a substantial share of the total I/O time
for this test application. For now, I will probably disable the
truncate directly in the MPI-POSIX VFD code, as Noel Keen has done,
but in the long term we should figure out why it is there and when it
is necessary. Hopefully, the Lustre/HDF5 funding will come through
soon!

Thanks
Mark

ifi=5 -1 open64("../output/prs.h5part",2,-1) 3.47368e+01 2.34790e-02
ifi=5 41 open64("../output/prs.h5part",578,-1) 3.47642e+01 1.86651e-02
ifi=5 0 lseek64(41,0,0) 3.47932e+01 2.14577e-06
ifi=5 96 write(41,0x7fffffffb6e0,96) 3.47932e+01 1.29604e-03
ifi=5 1048576 lseek64(41,1048576,0) 3.48958e+01 3.09944e-06
ifi=5 1757600 write(41,0x37137d40,1757600) 3.48958e+01 1.58372e+00
ifi=5 1757600 write(41,0x372e4ee0,1757600) 3.64796e+01 4.77600e-01
ifi=5 1757600 write(41,0x37492080,1757600) 3.69572e+01 1.23870e-02
ifi=5 1757600 write(41,0x3763f220,1757600) 3.69696e+01 3.26340e-02
ifi=5 96 lseek64(41,96,0) 4.13958e+01 1.90735e-06
ifi=5 40 write(41,0x5d01aaa8,40) 4.13959e+01 4.13990e-03
ifi=5 544 write(41,0x5d01a548,544) 4.14000e+01 1.71661e-05
ifi=5 120 write(41,0x5d01b118,120) 4.14000e+01 1.50204e-05
ifi=5 40 write(41,0x5d01bc78,40) 4.14001e+01 1.50204e-05
ifi=5 544 write(41,0x5d01a548,544) 4.14001e+01 1.47820e-05
ifi=5 120 write(41,0x5d01c1c8,120) 4.14001e+01 1.38283e-05
ifi=5 328 write(41,0x7fffffffb630,328) 4.14001e+01 1.50204e-05
ifi=5 40 write(41,0x5d024248,40) 4.14001e+01 1.50204e-05
ifi=5 544 write(41,0x5d01a548,544) 4.14002e+01 1.50204e-05
ifi=5 120 write(41,0x5d024798,120) 4.14002e+01 1.50204e-05
ifi=5 328 write(41,0x7fffffffb630,328) 4.14002e+01 1.50204e-05
ifi=5 40 write(41,0x5d026ae8,40) 4.14002e+01 1.40667e-05
ifi=5 544 write(41,0x5d01a548,544) 4.14002e+01 1.50204e-05
ifi=5 120 write(41,0x5d0270d8,120) 4.14003e+01 1.50204e-05
ifi=5 328 write(41,0x7fffffffb630,328) 4.14003e+01 1.40667e-05
ifi=5 272 write(41,0x5d02b0d8,272) 4.14003e+01 1.54972e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14005e+01 1.62125e-05
ifi=5 3136 write(41,0x5d02a118,3136) 4.14005e+01 1.54018e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14007e+01 1.49965e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14009e+01 1.52111e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14010e+01 1.69277e-05
ifi=5 3136 write(41,0x5d02a118,3136) 4.14011e+01 1.52111e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14012e+01 1.53065e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14014e+01 1.53065e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14016e+01 1.59740e-05
ifi=5 3136 write(41,0x5d02a118,3136) 4.14016e+01 1.51157e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14018e+01 1.52111e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14020e+01 1.52111e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14021e+01 1.59740e-05
ifi=5 3136 write(41,0x5d02a118,3136) 4.14022e+01 1.49965e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14023e+01 1.50919e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14025e+01 1.53065e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14027e+01 1.50919e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14028e+01 1.59740e-05
ifi=5 3136 write(41,0x5d02a118,3136) 4.14029e+01 1.49965e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14030e+01 1.48058e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14032e+01 1.49965e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14034e+01 1.69277e-05
ifi=5 3136 write(41,0x5d02a118,3136) 4.14034e+01 1.51873e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14036e+01 1.52826e-04
ifi=5 328 write(41,0x7fffffffb630,328) 4.14037e+01 1.59740e-05
ifi=5 0 lseek64(41,0,0) 4.14039e+01 9.53674e-07
ifi=5 96 write(41,0x7fffffffb4e0,96) 4.14039e+01 1.69277e-05
ifi=5 0 ftruncate64(41,2250776576) 4.14039e+01 1.06910e+00
ifi=5 0 lseek64(41,0,0) 4.24736e+01 3.09944e-06
ifi=5 96 write(41,0x7fffffffb4a0,96) 4.24737e+01 4.69685e-05
ifi=5 0 close(41) 4.24739e+01 3.16906e-03
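
The seek targets in the two traces can be compared against the 1 MB stripe width with a short check. This is a sketch under the assumption that the stripe width is exactly 1048576 bytes and that stripes start at offset 0; the function names are hypothetical.

```python
STRIPE_BYTES = 1024 * 1024  # assumed Lustre stripe width (1 MB, as stated in the thread)
CHUNK_BYTES = 1757600       # chunk size reported in both traces


def is_stripe_aligned(offset, stripe=STRIPE_BYTES):
    """True if the offset falls exactly on a stripe boundary."""
    return offset % stripe == 0


def stripe_index(offset, stripe=STRIPE_BYTES):
    """Zero-based index of the stripe that contains the offset."""
    return offset // stripe


# Offset of the first chunk write in each trace.
first_chunk_before = 7304     # original run
first_chunk_after = 1048576   # run with the 1 MB meta block

print(is_stripe_aligned(first_chunk_before))               # False
print(is_stripe_aligned(first_chunk_after))                # True
print(is_stripe_aligned(first_chunk_after + CHUNK_BYTES))  # False
```

The last line shows that only the first chunk starts on a stripe boundary: the chunk size is not a multiple of 1 MB, so later chunks still straddle stripes.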

Hello,

Just wanted to point out that the HDF5 version I modified to comment out the ftruncate
was 1.8.0, and that this ftruncate was done by all procs. (I think it was similar
in 1.6.5 ...)

  I _think_ that in version 1.8.2 (the latest), a change was made to only
do the ftruncate on rank 0.

Whether or not to truncate is a decision with pros and cons. It is safest to truncate,
but it may not be necessary in all cases, and it may be hiding some other issue.

Mark, did you verify that the truncate is truncating to the right length?
(Looks like 2249809480 in your first trace...)

Noel

Mark Howison wrote:

···

Hi Quincey,

I successfully used H5Pset_meta_block_size() to resize the meta block
to 1MB. You can see in the trace below that there is a seek to 1048576
before the data is written, presumably to set aside that first 1MB
block for meta data.

HDF5 is still making a series of small writes into the meta block
(everything after the lseek 96), but they are going through very
quickly now that the meta block is aligned to a lustre stripe
boundary.
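
To make the alignment point concrete, here is a minimal sketch in C (not HDF5 code) that checks whether a file offset sits on a stripe boundary and how many stripes a single write touches. The helper names `is_stripe_aligned`, `stripes_touched` and `report_data_start_offsets` are made up for illustration, and the 1 MiB stripe size is simply the value assumed in this thread. The two offsets are the lseek64 targets that precede the first chunk write in the two traces (7304 with the default meta block, 1048576 after H5Pset_meta_block_size()).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if `offset` lies exactly on a stripe boundary, 0 otherwise. */
int is_stripe_aligned(uint64_t offset, uint64_t stripe_size)
{
    assert(stripe_size > 0);
    return offset % stripe_size == 0;
}

/* Number of stripes touched by a write of `len` bytes starting at `offset`. */
uint64_t stripes_touched(uint64_t offset, uint64_t len, uint64_t stripe_size)
{
    assert(stripe_size > 0 && len > 0);
    uint64_t first = offset / stripe_size;
    uint64_t last = (offset + len - 1) / stripe_size;
    return last - first + 1;
}

/* Prints the alignment of the two data-start offsets seen in the traces. */
void report_data_start_offsets(void)
{
    const uint64_t stripe = 1048576;              /* assumed 1 MiB stripe */
    const uint64_t starts[] = { 7304, 1048576 };  /* lseek64 targets from the traces */
    for (size_t i = 0; i < 2; i++) {
        printf("offset %llu: %s\n", (unsigned long long)starts[i],
               is_stripe_aligned(starts[i], stripe) ? "aligned" : "unaligned");
    }
}
```

Calling `report_data_start_offsets()` from a small driver prints one line per offset. The same helpers can be pointed at any other offset copied out of a trace.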

However, as you can see in the attached plot, the truncate (purple) at
the end is still taking up a substantial amount of the total IO time
for this test application. For now, I will probably disable the
truncate directly in the MPI-POSIX VFD code, like Noel Keen has done,
but in the long term we should figure out why it is there and when it
is necessary. Hopefully, the lustre/HDF5 funding will come through
soon!

Thanks
Mark

ifi=5 -1 open64("../output/prs.h5part",2,-1) 3.47368e+01 2.34790e-02
ifi=5 41 open64("../output/prs.h5part",578,-1) 3.47642e+01 1.86651e-02
ifi=5 0 lseek64(41,0,0) 3.47932e+01 2.14577e-06
ifi=5 96 write(41,0x7fffffffb6e0,96) 3.47932e+01 1.29604e-03
ifi=5 1048576 lseek64(41,1048576,0) 3.48958e+01 3.09944e-06
ifi=5 1757600 write(41,0x37137d40,1757600) 3.48958e+01 1.58372e+00
ifi=5 1757600 write(41,0x372e4ee0,1757600) 3.64796e+01 4.77600e-01
ifi=5 1757600 write(41,0x37492080,1757600) 3.69572e+01 1.23870e-02
ifi=5 1757600 write(41,0x3763f220,1757600) 3.69696e+01 3.26340e-02
ifi=5 96 lseek64(41,96,0) 4.13958e+01 1.90735e-06
ifi=5 40 write(41,0x5d01aaa8,40) 4.13959e+01 4.13990e-03
ifi=5 544 write(41,0x5d01a548,544) 4.14000e+01 1.71661e-05
ifi=5 120 write(41,0x5d01b118,120) 4.14000e+01 1.50204e-05
ifi=5 40 write(41,0x5d01bc78,40) 4.14001e+01 1.50204e-05
ifi=5 544 write(41,0x5d01a548,544) 4.14001e+01 1.47820e-05
ifi=5 120 write(41,0x5d01c1c8,120) 4.14001e+01 1.38283e-05
ifi=5 328 write(41,0x7fffffffb630,328) 4.14001e+01 1.50204e-05
ifi=5 40 write(41,0x5d024248,40) 4.14001e+01 1.50204e-05
ifi=5 544 write(41,0x5d01a548,544) 4.14002e+01 1.50204e-05
ifi=5 120 write(41,0x5d024798,120) 4.14002e+01 1.50204e-05
ifi=5 328 write(41,0x7fffffffb630,328) 4.14002e+01 1.50204e-05
ifi=5 40 write(41,0x5d026ae8,40) 4.14002e+01 1.40667e-05
ifi=5 544 write(41,0x5d01a548,544) 4.14002e+01 1.50204e-05
ifi=5 120 write(41,0x5d0270d8,120) 4.14003e+01 1.50204e-05
ifi=5 328 write(41,0x7fffffffb630,328) 4.14003e+01 1.40667e-05
ifi=5 272 write(41,0x5d02b0d8,272) 4.14003e+01 1.54972e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14005e+01 1.62125e-05
ifi=5 3136 write(41,0x5d02a118,3136) 4.14005e+01 1.54018e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14007e+01 1.49965e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14009e+01 1.52111e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14010e+01 1.69277e-05
ifi=5 3136 write(41,0x5d02a118,3136) 4.14011e+01 1.52111e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14012e+01 1.53065e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14014e+01 1.53065e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14016e+01 1.59740e-05
ifi=5 3136 write(41,0x5d02a118,3136) 4.14016e+01 1.51157e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14018e+01 1.52111e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14020e+01 1.52111e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14021e+01 1.59740e-05
ifi=5 3136 write(41,0x5d02a118,3136) 4.14022e+01 1.49965e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14023e+01 1.50919e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14025e+01 1.53065e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14027e+01 1.50919e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14028e+01 1.59740e-05
ifi=5 3136 write(41,0x5d02a118,3136) 4.14029e+01 1.49965e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14030e+01 1.48058e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14032e+01 1.49965e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14034e+01 1.69277e-05
ifi=5 3136 write(41,0x5d02a118,3136) 4.14034e+01 1.51873e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14036e+01 1.52826e-04
ifi=5 328 write(41,0x7fffffffb630,328) 4.14037e+01 1.59740e-05
ifi=5 0 lseek64(41,0,0) 4.14039e+01 9.53674e-07
ifi=5 96 write(41,0x7fffffffb4e0,96) 4.14039e+01 1.69277e-05
ifi=5 0 ftruncate64(41,2250776576) 4.14039e+01 1.06910e+00
ifi=5 0 lseek64(41,0,0) 4.24736e+01 3.09944e-06
ifi=5 96 write(41,0x7fffffffb4a0,96) 4.24737e+01 4.69685e-05
ifi=5 0 close(41) 4.24739e+01 3.16906e-03

On Tue, Feb 17, 2009 at 9:19 AM, Quincey Koziol <koziol@hdfgroup.org> wrote:
> Hi Mark,
>
> On Feb 13, 2009, at 3:41 PM, Mark Howison wrote:
>
>> Also, here is a graph showing that same activity on node 0 (the first
>> row of pixels). The color key is:
>>
>> blue = write
>> dark purple = truncate
>> purple = fsync
>> teal = fflush
>>
>> Mark
>>
>> On Fri, Feb 13, 2009 at 12:03 PM, Mark Howison <MHowison@lbl.gov> wrote:
>>>
>>> Hello,
>>>
>>> I have a parallel HDF5 application that is writing out chunked data to
>>> a 3D dataset and is exhibiting a large number of small writes upon
>>> closing the file. Below I've attached a trace of POSIX calls on node 0
>>> showing the file open, then 4 chunks of size 1757600 bytes being
>>> written, then a series of 40 - 3136 byte writes (mostly 3136), and
>>> then a truncate call before the file is closed. The small writes are
>>> not ideal because this is a lustre file system on a Cray XT at NERSC.
>>> Together, those small writes and truncate take about 30% of the time
>>> from file open to close.
>>>
>>> My hypothesis is that the small writes represent meta data related to
>>> the chunk indexing. Does that sound right?
>
>        Yes, that's probably correct.
>
>>> What is the best way for me to consolidate these small writes into one
>>> large write? Should I use
>>> H5Pset_meta_block_size() to set the block size to the lustre stripe
>>> width of 1MB?
>
>        Yes, that would probably help.
>
>>> I'm a little concerned by the fact that the 3136 byte
>>> writes are not to contiguous offsets, and perhaps cannot be
>>> consolidated into a single write.
>>>
>>> What is the purpose of the truncate? Can it be removed?
>
>        I think with some analysis we could eliminate the truncate in
> some/all cases, but we'll need to finish getting funding in place to work on
> these issues with Lustre.
>
>        Quincey

      
------------------------------------------------------------------------

> However, as you can see in the attached plot, the truncate (purple) at
> the end is still taking up a substantial amount of the total IO time
> for this test application. For now, I will probably disable the
> truncate directly in the MPI-POSIX VFD code, like Noel Keen has done,
> but in the long term we should figure out why it is there and when it
> is necessary. Hopefully, the lustre/HDF5 funding will come through
> soon!

Maybe HDF5 needs to truncate, maybe it doesn't. But if I'm reading
your plot right, only one process is calling truncate. Sounds to me
like you've found a Lustre issue.

What does lustre do if you run a standalone program that calls
ftruncate to create a 2GB file? To create a 2250776576 byte file? If
you do a few writes before calling ftruncate?

==rob
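
A standalone test along the lines Rob describes might look like the sketch below. It is only an illustration under assumptions: the function name `time_ftruncate` and the choice of 3136-byte warm-up writes (borrowed from the traces) are not from HDF5 or from anyone's actual test program. It opens a file, optionally issues a few small writes, then times the ftruncate call on its own.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t so sizes above 2 GiB also work on 32-bit builds */

#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Creates (or empties) `path`, issues `nwrites` sequential writes of 3136
 * bytes each, then calls ftruncate(fd, size) and returns the wall-clock
 * seconds spent inside ftruncate alone. Returns -1.0 on any error.
 */
double time_ftruncate(const char *path, off_t size, int nwrites)
{
    assert(path != NULL && size >= 0 && nwrites >= 0);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1.0;

    char buf[3136];                       /* same size as the small writes in the traces */
    memset(buf, 0, sizeof buf);
    for (int i = 0; i < nwrites; i++) {
        if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf) {
            close(fd);
            return -1.0;
        }
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    int rc = ftruncate(fd, size);         /* extends the file when size exceeds what was written */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);
    if (rc != 0)
        return -1.0;

    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

For the three cases in Rob's message, a driver could call `time_ftruncate("testfile", 2147483648LL, 0)`, `time_ftruncate("testfile", 2250776576LL, 0)`, and `time_ftruncate("testfile", 2250776576LL, 24)`, and compare the returned times against the roughly 0.7 to 1.1 seconds seen for ftruncate64 in the traces.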

···

On Mon, Mar 23, 2009 at 12:05:20PM -0700, Mark Howison wrote:



--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

Hi Noel,

I am working with 1.8.1, and it only calls ftruncate on the meta data
proc (which defaults to proc 0):

/* Use the round-robin process to truncate (extend) the file */
        if(file->mpi_rank == H5_PAR_META_WRITE) {

I'm not sure what the "round-robin" process is, but it probably refers
to whatever 1.8.0 was doing with all procs issuing ftruncate.

The truncate amount seems to be correct. The final file looks like:

-rw------- 1 mhowison mhowison 2250776576 Mar 20 2009 prs.h5part

and the truncate call has argument 2250776576. (I think you found
2249809480 in the trace from a previous email, all the way at the
bottom.)

Mark
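
The same size check can be scripted rather than read off a directory listing. A minimal sketch, assuming a hypothetical helper name `file_size_matches` (nothing here comes from HDF5): it compares the on-disk size reported by stat() with the argument passed to ftruncate64 in the trace.

```c
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t so files above 2 GiB report correctly */

#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Returns 1 if the size of `path` equals `expected_bytes`, 0 if it differs,
 * and -1 if the file cannot be examined. A mismatch is reported on stderr.
 */
int file_size_matches(const char *path, long long expected_bytes)
{
    assert(path != NULL);

    struct stat st;
    if (stat(path, &st) != 0)
        return -1;

    if ((long long)st.st_size != expected_bytes) {
        fprintf(stderr, "%s: size is %lld bytes, expected %lld\n",
                path, (long long)st.st_size, expected_bytes);
        return 0;
    }
    return 1;
}
```

For the file above, the call would be `file_size_matches("../output/prs.h5part", 2250776576LL)`.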

···

On Mon, Mar 23, 2009 at 12:21 PM, noel keen <noel@hpcrd.lbl.gov> wrote:


ifi=5 3136 write(41,0x5d0c7948,3136) 3.46150e+01 1.48160e-02
ifi=5 615186120 lseek64(41,615186120,0) 3.46299e+01 0.00000e+00
ifi=5 3136 write(41,0x5d0c7948,3136) 3.46299e+01 3.93460e-02
ifi=5 715372456 lseek64(41,715372456,0) 3.46693e+01 9.53674e-07
ifi=5 3136 write(41,0x5d0c7948,3136) 3.46693e+01 1.76220e-02
ifi=5 815558792 lseek64(41,815558792,0) 3.46869e+01 9.53674e-07
ifi=5 3136 write(41,0x5d0c7948,3136) 3.46869e+01 1.06070e-02
ifi=5 915745128 lseek64(41,915745128,0) 3.46975e+01 1.19209e-06
ifi=5 3136 write(41,0x5d0c7948,3136) 3.46975e+01 1.74150e-02
ifi=5 1015931464 lseek64(41,1015931464,0) 3.47150e+01 0.00000e+00
ifi=5 3136 write(41,0x5d0c7948,3136) 3.47150e+01 1.11501e-02
ifi=5 1116117800 lseek64(41,1116117800,0) 3.47262e+01 0.00000e+00
ifi=5 3136 write(41,0x5d0c7948,3136) 3.47262e+01 1.67122e-02
ifi=5 1216304136 lseek64(41,1216304136,0) 3.47429e+01 0.00000e+00
ifi=5 3136 write(41,0x5d0c7948,3136) 3.47429e+01 5.77402e-03
ifi=5 1316490472 lseek64(41,1316490472,0) 3.47487e+01 9.53674e-07
ifi=5 3136 write(41,0x5d0c7948,3136) 3.47487e+01 1.83940e-02
ifi=5 1416676808 lseek64(41,1416676808,0) 3.47671e+01 0.00000e+00
ifi=5 3136 write(41,0x5d0c7948,3136) 3.47671e+01 1.35159e-02
ifi=5 1516863144 lseek64(41,1516863144,0) 3.47806e+01 0.00000e+00
ifi=5 3136 write(41,0x5d0c7948,3136) 3.47806e+01 1.70491e-02
ifi=5 1617049480 lseek64(41,1617049480,0) 3.47977e+01 0.00000e+00
ifi=5 3136 write(41,0x5d0c7948,3136) 3.47977e+01 9.32908e-03
ifi=5 1717235816 lseek64(41,1717235816,0) 3.48071e+01 0.00000e+00
ifi=5 3136 write(41,0x5d0c7948,3136) 3.48071e+01 1.15631e-02
ifi=5 1817422152 lseek64(41,1817422152,0) 3.48187e+01 9.53674e-07
ifi=5 3136 write(41,0x5d0c7948,3136) 3.48187e+01 8.60000e-03
ifi=5 1917608488 lseek64(41,1917608488,0) 3.48273e+01 9.53674e-07
ifi=5 3136 write(41,0x5d0c7948,3136) 3.48273e+01 6.62398e-03
ifi=5 2017794824 lseek64(41,2017794824,0) 3.48339e+01 1.19209e-06
ifi=5 3136 write(41,0x5d0c7948,3136) 3.48339e+01 7.51495e-03
ifi=5 2117981160 lseek64(41,2117981160,0) 3.48415e+01 0.00000e+00
ifi=5 3136 write(41,0x5d0c7948,3136) 3.48415e+01 1.77360e-02
ifi=5 2218167496 lseek64(41,2218167496,0) 3.48592e+01 9.53674e-07
ifi=5 3136 write(41,0x5d0c7948,3136) 3.48592e+01 1.63181e-02
ifi=5 2249807432 lseek64(41,2249807432,0) 3.48756e+01 0.00000e+00
ifi=5 328 write(41,0x7fffffffb660,328) 3.48756e+01 7.10177e-03
ifi=5 0 lseek64(41,0,0) 3.48828e+01 0.00000e+00
ifi=5 96 write(41,0x7fffffffb510,96) 3.48828e+01 2.69413e-05
ifi=5 0 ftruncate64(41,2249809480) 3.48829e+01 7.08644e-01
ifi=5 0 fsync(41) 3.55917e+01 3.28633e-01
ifi=5 0 lseek64(41,0,0) 3.59472e+01 1.90735e-06
ifi=5 96 write(41,0x7fffffffb4d0,96) 3.59473e+01 5.88894e-05
ifi=5 0 close(41) 3.59477e+01 9.05991e-06

<node0-meta-data.png>
----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to
hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.

------------------------------------------------------------------------

I think they traced the need for ftruncate() down to economizing on file size; otherwise, there didn't seem to be a correctness reason for it. Also, the ADIOS folks claim that ftruncate() hits the Lustre metadata server (MDS), which is part of why it is such an expensive call. So perhaps a first-order optimization is to allow it to be disabled on Lustre, to get good performance at the cost of slightly larger files.

-john

On Mar 23, 2009, at 12:46 PM, Rob Latham wrote:

On Mon, Mar 23, 2009 at 12:05:20PM -0700, Mark Howison wrote:

However, as you can see in the attached plot, the truncate (purple) at
the end is still taking up a substantial amount of the total IO time
for this test application. For now, I will probably disable the
truncate directly in the MPI-POSIX VFD code, like Noel Keen has done,
but in the long term we should figure out why it is there and when it
is necessary. Hopefully, the lustre/HDF5 funding will come through
soon!

Maybe HDF5 needs to truncate, maybe it doesn't. But if I'm reading
your plot right, only one process is calling truncate. Sounds to me
like you've found a Lustre issue.

What does lustre do if you run a standalone program that calls
ftruncate to create a 2GB file? To create a 2250776576 byte file? If
you do a few writes before calling ftruncate?

==rob

Thanks
Mark

ifi=5 -1 open64("../output/prs.h5part",2,-1) 3.47368e+01 2.34790e-02
ifi=5 41 open64("../output/prs.h5part",578,-1) 3.47642e+01 1.86651e-02
ifi=5 0 lseek64(41,0,0) 3.47932e+01 2.14577e-06
ifi=5 96 write(41,0x7fffffffb6e0,96) 3.47932e+01 1.29604e-03
ifi=5 1048576 lseek64(41,1048576,0) 3.48958e+01 3.09944e-06
ifi=5 1757600 write(41,0x37137d40,1757600) 3.48958e+01 1.58372e+00
ifi=5 1757600 write(41,0x372e4ee0,1757600) 3.64796e+01 4.77600e-01
ifi=5 1757600 write(41,0x37492080,1757600) 3.69572e+01 1.23870e-02
ifi=5 1757600 write(41,0x3763f220,1757600) 3.69696e+01 3.26340e-02
ifi=5 96 lseek64(41,96,0) 4.13958e+01 1.90735e-06
ifi=5 40 write(41,0x5d01aaa8,40) 4.13959e+01 4.13990e-03
ifi=5 544 write(41,0x5d01a548,544) 4.14000e+01 1.71661e-05
ifi=5 120 write(41,0x5d01b118,120) 4.14000e+01 1.50204e-05
ifi=5 40 write(41,0x5d01bc78,40) 4.14001e+01 1.50204e-05
ifi=5 544 write(41,0x5d01a548,544) 4.14001e+01 1.47820e-05
ifi=5 120 write(41,0x5d01c1c8,120) 4.14001e+01 1.38283e-05
ifi=5 328 write(41,0x7fffffffb630,328) 4.14001e+01 1.50204e-05
ifi=5 40 write(41,0x5d024248,40) 4.14001e+01 1.50204e-05
ifi=5 544 write(41,0x5d01a548,544) 4.14002e+01 1.50204e-05
ifi=5 120 write(41,0x5d024798,120) 4.14002e+01 1.50204e-05
ifi=5 328 write(41,0x7fffffffb630,328) 4.14002e+01 1.50204e-05
ifi=5 40 write(41,0x5d026ae8,40) 4.14002e+01 1.40667e-05
ifi=5 544 write(41,0x5d01a548,544) 4.14002e+01 1.50204e-05
ifi=5 120 write(41,0x5d0270d8,120) 4.14003e+01 1.50204e-05
ifi=5 328 write(41,0x7fffffffb630,328) 4.14003e+01 1.40667e-05
ifi=5 272 write(41,0x5d02b0d8,272) 4.14003e+01 1.54972e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14005e+01 1.62125e-05
ifi=5 3136 write(41,0x5d02a118,3136) 4.14005e+01 1.54018e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14007e+01 1.49965e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14009e+01 1.52111e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14010e+01 1.69277e-05
ifi=5 3136 write(41,0x5d02a118,3136) 4.14011e+01 1.52111e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14012e+01 1.53065e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14014e+01 1.53065e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14016e+01 1.59740e-05
ifi=5 3136 write(41,0x5d02a118,3136) 4.14016e+01 1.51157e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14018e+01 1.52111e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14020e+01 1.52111e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14021e+01 1.59740e-05
ifi=5 3136 write(41,0x5d02a118,3136) 4.14022e+01 1.49965e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14023e+01 1.50919e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14025e+01 1.53065e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14027e+01 1.50919e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14028e+01 1.59740e-05
ifi=5 3136 write(41,0x5d02a118,3136) 4.14029e+01 1.49965e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14030e+01 1.48058e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14032e+01 1.49965e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14034e+01 1.69277e-05
ifi=5 3136 write(41,0x5d02a118,3136) 4.14034e+01 1.51873e-04
ifi=5 3136 write(41,0x5d02a118,3136) 4.14036e+01 1.52826e-04
ifi=5 328 write(41,0x7fffffffb630,328) 4.14037e+01 1.59740e-05
ifi=5 0 lseek64(41,0,0) 4.14039e+01 9.53674e-07
ifi=5 96 write(41,0x7fffffffb4e0,96) 4.14039e+01 1.69277e-05
ifi=5 0 ftruncate64(41,2250776576) 4.14039e+01 1.06910e+00
ifi=5 0 lseek64(41,0,0) 4.24736e+01 3.09944e-06
ifi=5 96 write(41,0x7fffffffb4a0,96) 4.24737e+01 4.69685e-05
ifi=5 0 close(41) 4.24739e+01 3.16906e-03

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

Hi Rob,

Here is a plot of 4 procs showing three tests (source code attached):

1) truncate to 2GB
2) truncate to 2250776576
3) each proc writes a byte, then truncate to 2250776576

The truncate (purple) seems to take about the same time for each, but
is on the same time scale as the opens and closes (green and brown).

In my other case, the truncate is taking orders of magnitude longer
than the open/close, which must be a peculiarity of that IO pattern.

Mark

main.c (2.27 KB)
