Predictability of file disk layout?

Hi,

  I'm wondering to what extent it is predictable whether the file size
will change on disk or parts of the file will be moved. I would expect
that if an uncompressed dataset is read, modified, and written back
with the same datatype as before, then the file is left more or less
untouched, and the data end up on the same sectors of the hard disk as
before. Meaning, if the HDF5 file was unfragmented in the file system
before, then it remains unfragmented after such an update operation.
However, if data are appended, or compressed, then the file size might
change, and new regions elsewhere on the hard disk might become involved
in the same logical HDF5 file.

How valid are such assumptions? This should affect I/O performance.
I'm thus also wondering whether - at least in theory - something like a
"locking" mode could be added to HDF5, such that any operation that
would modify the disk layout results in failure. Perhaps this could even
be implemented easily via the virtual file driver, by having it refuse
to resize the file once such a "locking" mode has been enabled after
initial creation. I would think that is sufficient to prevent changes
in the hard disk layout.

  Werner

--
___________________________________________________________________________
Dr. Werner Benger                                   Visualization Research
Laboratory for Creative Arts and Technology (LCAT)
Center for Computation & Technology at Louisiana State University (CCT/LSU)
211 Johnston Hall, Baton Rouge, Louisiana 70803
Tel.: +1 225 578-4809                               Fax: +1 225 578-5362

Hi Werner,

On Dec 7, 2009, at 12:17 PM, Werner Benger wrote:

> Hi,
>
> I'm wondering to what extent it is predictable whether the file size
> will change on disk or parts of the file will be moved. I would expect
> that if an uncompressed dataset is read, modified, and written back
> with the same datatype as before, then the file is left more or less
> untouched, and the data end up on the same sectors of the hard disk as
> before. Meaning, if the HDF5 file was unfragmented in the file system
> before, then it remains unfragmented after such an update operation.
> However, if data are appended, or compressed, then the file size might
> change, and new regions elsewhere on the hard disk might become involved
> in the same logical HDF5 file.

  This is almost true currently. As long as the datatypes of the datasets being written don't have a variable-length component (VL-sequences, VL-strings, or region references), your statements hold.
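
  For illustration, an update of that kind might look like the following minimal sketch (the file name "data.h5", the dataset name "dset", and the extent of 100 doubles are placeholders, and error checking is omitted). Because the write requests no new file space, the bytes are rewritten at the same file offsets:

#include "hdf5.h"

/* Read a contiguous, uncompressed dataset, modify it in memory, and
 * write it back with the same datatype and extent.  Since no new file
 * space is allocated, the data land on the same file offsets (and thus
 * the same disk sectors) as before. */
int main(void)
{
    hid_t  file = H5Fopen("data.h5", H5F_ACC_RDWR, H5P_DEFAULT);
    hid_t  dset = H5Dopen2(file, "dset", H5P_DEFAULT);
    double buf[100];    /* assumed to match the dataset's extent */
    int    i;

    H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
    for (i = 0; i < 100; i++)
        buf[i] *= 2.0;  /* modify the values in place */

    /* Same datatype, same dataspace: overwrites the existing allocation. */
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}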

> How valid are such assumptions? This should affect I/O performance.
> I'm thus also wondering whether - at least in theory - something like a
> "locking" mode could be added to HDF5, such that any operation that
> would modify the disk layout results in failure. Perhaps this could even
> be implemented easily via the virtual file driver, by having it refuse
> to resize the file once such a "locking" mode has been enabled after
> initial creation. I would think that is sufficient to prevent changes
> in the hard disk layout.

  I'm guessing that if you overrode the 'alloc' callback in the VFD layer to always return an error, you could get this behavior (modulo the variable-length datatype issue above).
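
  As a rough, hypothetical sketch of that idea (a real driver would have to fill in the full H5FD_class_t, copying everything except 'alloc' from e.g. the built-in sec2 driver):

#include "hdf5.h"

/* Hypothetical 'alloc' callback for a "locked" VFD: every request for
 * new file space fails, so any operation that would grow the file or
 * relocate data reports an error instead. */
static haddr_t
locked_alloc(H5FD_t *file, H5FD_mem_t type, hid_t dxpl_id, hsize_t size)
{
    (void)file; (void)type; (void)dxpl_id; (void)size;
    return HADDR_UNDEF;    /* refuse all new allocations */
}

/* Wiring it up (class members abridged; all other callbacks would
 * mirror the sec2 driver):
 *
 *   static H5FD_class_t locked_vfd_g = { ... , locked_alloc, ... };
 *
 *   hid_t driver = H5FDregister(&locked_vfd_g);
 *   hid_t fapl   = H5Pcreate(H5P_FILE_ACCESS);
 *   H5Pset_driver(fapl, driver, NULL);
 *   hid_t file_id = H5Fopen("data.h5", H5F_ACC_RDWR, fapl);
 */

  Note that writing variable-length data allocates global-heap space, so under such a driver those writes would simply fail, consistent with the caveat above.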

  Quincey
