Hi,
Well, the only reason I am using independent I/O is that collective I/O was
performing slower than independent I/O. I don't think there is anything
specific to the data layout that would make independent I/O faster, but still,
to my surprise, independent I/O was faster.
The implementation is quite simple, and I expected better results with
collective I/O. (I am basically writing out a big 1D array of eigenvectors, so
the hyperslab selection was straightforward.) The timings were measured on the
cluster Franklin at NERSC.
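For reference, the write I'm describing has roughly this shape (a sketch, not my actual code; the element count, file name, and dataset name are placeholders):

```c
/* Sketch of a parallel 1-D hyperslab write: each rank writes one
 * contiguous block of a shared dataset. Sizes and names are made up. */
#include <stdlib.h>
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    hsize_t local_n = 1024;                 /* elements per process (assumed) */
    hsize_t total_n = local_n * nprocs;
    hsize_t offset  = local_n * rank;       /* each rank's contiguous block */

    /* Open the file through the MPI-IO driver */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("eigvecs.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    hid_t filespace = H5Screate_simple(1, &total_n, NULL);
    hid_t dset = H5Dcreate(file, "eigenvectors", H5T_NATIVE_DOUBLE,
                           filespace, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Select this rank's slab of the 1-D dataset */
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &offset, NULL,
                        &local_n, NULL);
    hid_t memspace = H5Screate_simple(1, &local_n, NULL);

    /* The independent-vs-collective choice lives on the transfer plist */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_INDEPENDENT); /* or H5FD_MPIO_COLLECTIVE */

    double *buf = malloc(local_n * sizeof(double));
    for (hsize_t i = 0; i < local_n; i++)
        buf[i] = (double)(offset + i);      /* dummy data */
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

    free(buf);
    H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
    H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}
```

Switching the H5Pset_dxpl_mpio argument between H5FD_MPIO_INDEPENDENT and H5FD_MPIO_COLLECTIVE is the only change between the two runs I timed.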
Oh! I didn't notice it was 1D. Then yes, at this scale (15
processors) and this access pattern (essentially contiguous),
collective I/O would only introduce overhead with no benefit. You
could set some hints to tune this, but they won't get to the real
issue here:
http://www.nersc.gov/nusers/resources/franklin/
The file system on Franklin is Lustre, and the MPI implementation is MPICH2.
Ok, the somewhat-detailed technical answer follows, but the short
answer is that parallel reads from a single lustre file are much
faster than parallel writes. (I consider this a defect in the
MPI-IO/Lustre interface -- one which groups are working to address,
fortunately, but it will take some time for those efforts to make
their way onto Franklin I'm afraid.)
Here's one workaround you can apply in your application. Do you know
how to set MPI-IO hints through HDF5? One thing you can do to speed
up writes is to turn on collective I/O but then force all I/O through
a single processor. Do so by setting the hint "cb_nodes" to "1" (the
string "1").
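Through HDF5 that looks roughly like the following (a sketch; the helper name and the idea of wrapping it in a function are mine):

```c
/* Sketch: pass the MPI-IO hint cb_nodes=1 to the file via the HDF5
 * file-access property list, so collective buffering uses one aggregator. */
#include <mpi.h>
#include <hdf5.h>

hid_t create_file_one_aggregator(const char *filename)
{
    MPI_Info info;
    MPI_Info_create(&info);
    /* Hint values are strings: route all collective I/O through 1 node */
    MPI_Info_set(info, "cb_nodes", "1");

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, info);  /* hints ride on the FAPL */

    hid_t file = H5Fcreate(filename, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    H5Pclose(fapl);
    MPI_Info_free(&info);
    return file;
}
```

Remember that the hint only matters if the writes actually go through the collective path, i.e. you also need H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE) on the dataset transfer property list.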
So, what's going on with your code? Here's that more-detailed
answer:
In the read case, the data does not change and so you can have all 15
processes read at the same time and Lustre will not attempt to
serialize those operations.
In the write case, however, the first process to reach "write" will
acquire a lock on the entire file. When the second process hits
"write", it forces the first process to revoke most of its lock, and
then process 2 takes *its* lock.
This process goes on and on for these N processes. A writer comes in,
forces a lock revocation, and then acquires a lock. All very costly
operations.
There's not much the HDF5 library can do in this case. This is a file
system defect -- one that the MPI-IO library can address, but not one
that HDF5 can fix very well.
I would suggest contacting the NERSC support staff about this issue.
They are good people and know more about how to coax performance out
of Lustre than I do.
Sorry I don't have better news for you, but I bet the HDF5 guys are
happy I'm giving them a pass on this :>
==rob
On Mon, Nov 03, 2008 at 05:11:38PM -0600, Nikhil Laghave wrote:
--
Rob Latham
Mathematics and Computer Science Division A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA B29D F333 664A 4280 315B
----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.