
Hi Mark,

  Ah, yes, that may be a good segue into this two-pass feature. I've

been thinking about this feature and wondering about how to implement
it. Something that occurs to me would be to construct it like a
"transaction", where the application opens a transaction, the HDF5
library just records those operations performed with API routines,
then when the application closes the transaction, they are replayed
twice: once to record the results of all the operations, and then a
second pass that actually performs all the I/O. That would also help
reduce the overhead from collective metadata modifications.
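
  To make that concrete, here's a rough sketch of what the application-facing side might look like. The H5TRstart()/H5TRfinish() names below are purely hypothetical, invented for this discussion and not existing (or proposed) HDF5 API; the idea is just that everything between them gets recorded and then replayed twice when the transaction is finished:

    #include <hdf5.h>

    /* Hypothetical routines, for illustration only -- not part of HDF5: */
    herr_t H5TRstart(hid_t file_id);    /* begin recording API operations     */
    herr_t H5TRfinish(hid_t file_id);   /* replay: pass 1 = metadata/space,   */
                                        /*         pass 2 = perform the I/O   */

    void write_checkpoint(const double *temp, hsize_t npoints)
    {
        hid_t file = H5Fcreate("checkpoint.h5", H5F_ACC_TRUNC,
                               H5P_DEFAULT, H5P_DEFAULT);

        H5TRstart(file);

        /* These calls would be recorded rather than executed immediately.  */
        hid_t space = H5Screate_simple(1, &npoints, NULL);
        hid_t dset  = H5Dcreate2(file, "temperature", H5T_NATIVE_DOUBLE, space,
                                 H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, temp);
        H5Dclose(dset);
        H5Sclose(space);

        H5TRfinish(file);   /* both passes happen here; afterwards the file is consistent */

        H5Fclose(file);
    }

  (One wrinkle: until the replay actually happens, routines like H5Dcreate2() could presumably only hand back placeholder identifiers, which is part of what would make this interesting to design.)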

  BTW, if we go down this "transaction" path, it allows the HDF5
library to push the fault tolerance up to the application level - the
library could guarantee that the atomicity of what was "visible" in
the file was an entire checkpoint, rather than the atomicity being on
a per-API call basis.

Hmm. That's only true if 'transaction' is whole-file scope, right? I mean,
aren't you going to allow the application to decide what 'granularity' a
transaction should be: a single dataset, a bunch of datasets in a group
in the file, etc.?

  Yes, it would be whole-file scope. (Although the modifications within the transaction could be limited to changes to a single dataset, of course.)

If scope of 'transaction' is only a whole-file, then...

I may be misunderstanding your notions here, but I don't think you'd want
to design this around the assumption that a 'transaction' could embody
something that included all buffer pointers passed into HDF5 by the
caller, and that HDF5 could then automagically FINISH the transaction on
behalf of the application without returning control to it.

I think there are going to be too many situations where applications
unwind their own internal data structures, placing data into temporary
buffers that are then handed off to HDF5 for I/O and freed. And, for a
given HDF5 file, this likely happens again and again as different parts
of the application's internal data are spit out to HDF5. But not to
worry.

  Hmm, I think you are saying that the application would re-use the buffer(s) passed to HDF5 for more than one call to H5Dwrite(), is that the case? If so, that would complicate the transaction idea. Hmm... Perhaps a callback for the application to free/re-allocate the memory buffers? Maybe we could use the transaction idea for just metadata modifications, and the two-pass idea for the raw data writes?
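
  Something like the sketch below is what I have in mind for the callback; the callback type and the registration routine are invented names for illustration, not HDF5 API. The library would invoke the callback between the "record" pass and the I/O pass, so the application could hand back a buffer with valid contents for each recorded H5Dwrite():

    #include <stdlib.h>
    #include <hdf5.h>

    /* Hypothetical callback type -- invoked once per recorded H5Dwrite()
     * before the actual I/O pass, so the application can supply a buffer
     * with valid contents again.                                          */
    typedef void *(*H5TR_refill_t)(hid_t dset_id, size_t nbytes, void *app_ctx);

    /* Hypothetical registration routine -- not part of HDF5.              */
    herr_t H5TRset_refill_callback(hid_t file_id, H5TR_refill_t cb, void *app_ctx);

    /* Example application callback: repack internal data into a fresh
     * scratch buffer for the dataset about to be written.                 */
    static void *my_refill(hid_t dset_id, size_t nbytes, void *app_ctx)
    {
        (void)dset_id;
        (void)app_ctx;
        void *scratch = malloc(nbytes);
        /* ... application re-runs its "data prep" for this dataset ...    */
        return scratch;
    }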

My idea included the notion that the application would have to re-engage
in all such 'data prep for I/O' processes a second time. I assume the time
to complete such a process, relative to actual I/O time, is small enough
that it doesn't matter to the application that it has to do it twice. I
think for most applications that would be true, and it would be relatively
easy to engineer the work to happen in two passes.

  Yes, running through the whole process of copying data into the temporary buffers would be necessary, if the temporary buffers are re-used.
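
  Roughly, the application-side structure would look like the sketch below. H5TRset_pass() and the pack_mesh()/mesh_size() helpers are made-up names for illustration; the point is just that the application's "data prep" code runs once per pass, with the library only gathering sizes and metadata on the first pass and doing the real I/O on the second:

    #include <stdlib.h>
    #include <hdf5.h>

    struct mesh;                                   /* application's internal data      */
    double  *pack_mesh(const struct mesh *m);      /* hypothetical application helpers */
    hsize_t  mesh_size(const struct mesh *m);

    herr_t H5TRset_pass(hid_t file_id, int pass);  /* hypothetical -- not in HDF5      */

    static void write_mesh(hid_t file, const struct mesh *m)
    {
        /* Unwind internal structures into a temporary buffer...           */
        double *tmp = pack_mesh(m);
        hsize_t  n  = mesh_size(m);

        hid_t space = H5Screate_simple(1, &n, NULL);
        hid_t dset  = H5Dcreate2(file, "mesh", H5T_NATIVE_DOUBLE, space,
                                 H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, tmp);
        H5Dclose(dset);
        H5Sclose(space);

        free(tmp);      /* ...and the buffer is gone again before the next pass */
    }

    void checkpoint(hid_t file, const struct mesh *m)
    {
        for (int pass = 0; pass < 2; pass++) {
            H5TRset_pass(file, pass);   /* 0: sizes/metadata only, 1: real I/O */
            write_mesh(file, m);        /* data prep is simply repeated        */
        }
    }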

  Something else to consider - I'm just wrapping up an exascale-oriented workshop this morning, and after listening to the presentations and talking to people here, I'm concerned that the exascale-class machines are not going to want to double-copy/compress the data they are writing. Yes, they will have more compute than I/O bandwidth, but the cost of moving data through memory is going to be very high... :-/

  Quincey

···

On Feb 23, 2011, at 5:43 PM, Mark Miller wrote:

On Wed, 2011-02-23 at 14:41, Quincey Koziol wrote:

Hi Rhys,

···

On Feb 23, 2011, at 9:17 PM, Rhys Ulerich wrote:

      BTW, if we go down this "transaction" path, it allows the HDF5
library to push the fault tolerance up to the application level - the
library could guarantee that the atomicity of what was "visible" in
the file was an entire checkpoint, rather than the atomicity being on
a per-API call basis.

Hmm. That's only true if 'transaction' is whole-file scope, right? I mean,
aren't you going to allow the application to decide what 'granularity' a
transaction should be: a single dataset, a bunch of datasets in a group
in the file, etc.?

Careful fellas... you'll end up implementing a good part of
conventional database transactions and their ACID guarantees before
you're done. And you won't have the benefit of SQL as a lingua
franca. If you want fancy transaction semantics, why not just use a
database vendor with a particularly rich BLOB API?

  I'm definitely not advocating going whole-hog for ACID semantics, but I think there are certain useful pieces of ACID that can be leveraged. :-)

  Quincey