> Hmm. Not sure whether you were poking fun or down right insulted by my
> comment, quoted below.
Hey no offense at all. I was indeed poking fun, but at myself.
Clearly this approach works for a large class of problems.
Thanks for clarification.
This discussion, though, is actually pretty exciting to me, and it
parallels a frequent discussion in the file system domain: is it
better to write one file per process, or to have all the processes
coordinate access to a single file?
So, there is an all-or-nothing assumption here that has become
commonplace but leads to a completely artificial constraint: namely,
that the choice is either a file per process or one file for all
processes.
Poor Man's Parallel I/O (PMPIO) provides you with a knob to 'dial-in'
the number of files that are written COMPLETELY INDEPENDENTLY of the
number of processors doing the writing. We typically use values of 32,
64, 128 or 256 files depending upon how many real I/O channels we have
from compute nodes to the file-system. But, we'll write to these numbers
of files from tens of thousands of cpus. I think so far, we've scaled to
256,000 cpus writing to 1024 files.
If you are interested, I have attached two source code files,
pmpio_hdf5_test.c and pmpio.h. If you build a SERIAL HDF5 and then
compile pmpio_hdf5_test.c and link it with MPI, you can run it and see
an example of a VERY SIMPLE Poor Man's Parallel I/O test client. It is
really unceremoniously simple, but it demonstrates the basic approach.
The PMPIO routines defined in pmpio.h can be easily integrated into any
application currently using a file-per-process approach.
The HDF5 team has worked on the SAP approach. I liked the SAP idea on
the surface, but a single set-aside processor simply isn't going to
scale well to 10,000+ cpus. So, it probably needs to at least be a
SAPs approach, as in 'Set Aside ProcessorS', and then you still have
the issue of coordinating concurrent metadata distributed across the
SAPs. In addition, I don't like the idea of an application having to
take into account an increased processor allocation for the SAPs.
So, a long while back a colleague of mine and I worked on a 'deferred
object creation' strategy where each processor is required to segregate
'HDF5 file metadata-changing' operations to specific regions of
execution. All processors are interacting with a single HDF5 file.
However, in these segregated regions of execution, processors make
requests for HDF5 object creation (H5Dcreate, H5Gcreate, etc.) that they
intend to use shortly thereafter. The requests are queued locally and
the object creation is actually deferred; until the sync, attempts to
operate on the requested objects will fail. Then, collectively, after
all processors have
completed their 'metadata changing' operations, they call a 'sync-my-
pending-requests-with-hdf5' function. It is a collective function. Upon
return, each processor can then, again, proceed independently operating
on the objects they created. The actual implementation would involve
adding a new property to various object creation property lists
indicating 'deferred creation' is being requested. This enables the
calls to H5<whatever>create to return immediately and simply queue the
request. The object ids returned would contain information indicating
they are 'not yet created'. A new call such as H5FcreateComplete() would
have to be called to sync everything across all processors. Calling this
function would 'clear' the 'not yet created' info in all the pending
objects. I think such an approach would be a substantial improvement
over the existing collective interface and would avoid the problems
with the SAP approach. I think we implemented some of this in a layer
on top of HDF5 back in 2003/04, but it never had enough interest to
make it into HDF5 proper.
Mark
pmpio_hdf5_test.c (5.13 KB)
pmpio.h (19.5 KB)
On Wed, 2009-08-12 at 13:46 -0500, Rob Latham wrote:
On Wed, Aug 12, 2009 at 09:11:49AM -0700, Mark Miller wrote:
--
Mark C. Miller, Lawrence Livermore National Laboratory
email: mailto:miller86@llnl.gov
(M/T/W) (925)-423-5901 (!!LLNL BUSINESS ONLY!!)
(Th/F) (530)-753-8511 (!!LLNL BUSINESS ONLY!!)