File Image Operations RFC -- docx version



Hi All,

   Attached please find a docx version of an RFC concerning operations
on file images in HDF5. A pdf version follows in my next message.

   For now at least, the driving use case behind the RFC is the desire
to use the HDF5 library as a packetizing/depacketizing tool. Simply put,
in some cases it is cheaper to construct an HDF5 file in a buffer and
then transmit the buffer to another machine than it is to transfer the
same data by writing a file to a commonly accessible file system.

   If this is something you are interested in doing, please give the RFC a
read, and let us know what you think. Implementation hasn't started,
so now is the time to suggest tweaks to the API that would make your life
easier.

   More generally, if you can see any use for opening an HDF5 file that
is stored in an in-memory buffer, or for creating a new file with an
initial image, or for any other operations involving in-memory images of
HDF5 files, please take a look and let us know if we have missed any use
cases.

   The RFC is 30 pages long, so the following conceptual summary may
help you skim through and find the points of interest.

   The core file driver can already be used to construct buffers containing
HDF5 files, and we already have facilities (albeit nasty ones) that allow the
user access to these buffers. However, on the receiving end, there is no
way to open a buffer containing an HDF5 file without first writing the buffer
to the file system.

   In essence, the purpose of the RFC is to propose ways of remedying
this situation.

   At the level of the HDF5 library proper, the RFC proposes that this be done
with two new sets of property list calls, along with supporting code, and
modifications to the core file driver.

   The first set of property list calls inserts and extracts copies of buffers
containing HDF5 file images into and out of file access property lists. Note
that since the property list structure is designed to support call by value,
this results in two buffer copies -- one when the buffer is copied into a
FAPL, and a second when the buffer is copied from the FAPL into the core
file driver.
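
   As a rough illustration of where the two copies come from, here is a
minimal sketch in plain C. It does not use the HDF5 library, and every name
in it (toy_fapl_t, toy_core_file_t, toy_fapl_set_image, toy_core_open,
copy_count) is a hypothetical stand-in rather than anything taken from the
RFC. It only models the call-by-value behavior described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins: none of these names come from HDF5 or the RFC. */
typedef struct {
    void  *image;   /* private copy owned by the toy property list */
    size_t size;
} toy_fapl_t;

typedef struct {
    void  *image;   /* private copy owned by the toy core driver */
    size_t size;
} toy_core_file_t;

int copy_count = 0; /* number of full-buffer copies made so far */

/* Allocate a new buffer and copy the source into it. */
void *dup_buffer(const void *src, size_t size)
{
    void *dst = malloc(size);
    if (dst != NULL) {
        memcpy(dst, src, size);
        copy_count++;
    }
    return dst;
}

/* Copy 1: the application's buffer is copied into the property list. */
int toy_fapl_set_image(toy_fapl_t *fapl, const void *buf, size_t size)
{
    assert(fapl != NULL && buf != NULL);
    free(fapl->image);              /* drop any previously stored image */
    fapl->size  = 0;
    fapl->image = dup_buffer(buf, size);
    if (fapl->image == NULL)
        return -1;
    fapl->size = size;
    return 0;
}

/* Copy 2: the property list's copy is copied into the driver on open. */
int toy_core_open(toy_core_file_t *file, const toy_fapl_t *fapl)
{
    assert(file != NULL && fapl != NULL);
    if (fapl->image == NULL)
        return -1;
    file->image = dup_buffer(fapl->image, fapl->size);
    if (file->image == NULL)
        return -1;
    file->size = fapl->size;
    return 0;
}
```

   In this model the application, the property list, and the driver each
end up owning a separate buffer, which is simple to reason about but costs
two full copies of the image.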

   The second set of property list calls allows the application to specify
callbacks that are used by the property list code and the core file driver
when allocating, copying, reallocating and freeing buffers containing file
images. From the HDF5 library's perspective, these calls must be
functionally identical to the standard malloc, memcpy, realloc, and free C
library calls -- although as you will see in the RFC, they can be designed in
such a way as to avoid the buffer copies mentioned above.
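
   To make the idea concrete, here is a minimal sketch in plain C of how
such callbacks might sidestep the copy. It is an assumption-laden toy, not
the RFC's API: the callback table toy_image_callbacks_t, the shared_* 
functions, shared_image_t, and toy_copy_image are all hypothetical names,
and the signatures in the RFC may well differ. The trick shown is that the
"allocate" callback hands back the application's own buffer, so the
subsequent "copy" finds source and destination identical and moves no bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback table: not the RFC's or HDF5's actual type. */
typedef struct {
    void *(*image_malloc)(size_t size, void *udata);
    void *(*image_memcpy)(void *dest, const void *src, size_t size, void *udata);
    void *(*image_realloc)(void *ptr, size_t size, void *udata);
    void  (*image_free)(void *ptr, void *udata);
    void  *udata;                /* passed back to every callback */
} toy_image_callbacks_t;

/* User data describing one application-owned buffer that is shared. */
typedef struct {
    void  *app_buf;
    size_t app_size;
    int    refs;                 /* outstanding "allocations" of app_buf */
} shared_image_t;

/* "Allocate" by handing out the application's buffer instead of new memory. */
void *shared_malloc(size_t size, void *udata)
{
    shared_image_t *sh = udata;
    if (size != sh->app_size)
        return NULL;             /* this sketch only serves exact-size requests */
    sh->refs++;
    return sh->app_buf;
}

/* Skip the copy when source and destination are the same buffer. */
void *shared_memcpy(void *dest, const void *src, size_t size, void *udata)
{
    (void)udata;
    if (dest != src)
        memmove(dest, src, size);
    return dest;
}

/* Resizing succeeds only if the request still fits in the shared buffer. */
void *shared_realloc(void *ptr, size_t size, void *udata)
{
    shared_image_t *sh = udata;
    return (ptr == sh->app_buf && size <= sh->app_size) ? ptr : NULL;
}

/* "Free" just drops a reference; the application still owns the memory. */
void shared_free(void *ptr, void *udata)
{
    shared_image_t *sh = udata;
    if (ptr == sh->app_buf && sh->refs > 0)
        sh->refs--;
}

/* What library-side code might do when it needs its own copy of an image. */
void *toy_copy_image(const toy_image_callbacks_t *cb, const void *src, size_t size)
{
    assert(cb != NULL && src != NULL);
    void *dst = cb->image_malloc(size, cb->udata);
    if (dst == NULL)
        return NULL;
    return cb->image_memcpy(dst, src, size, cb->udata);
}
```

   With these callbacks installed, toy_copy_image() called on the
application's buffer returns that same buffer, so no bytes are moved. The
price is that the application must keep the buffer alive until the
reference count drops back to zero, and growth beyond the original size
fails in this sketch.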

   I'm sure you can envision the supporting code modifications to the core
file driver and property list facilities implied by the above outline.

   In addition, the RFC proposes a couple of high-level library calls to
facilitate the more common operations on file images.

   There is much more in the RFC, including numerous examples, but this
should give you the gist of it.

   Again, if this sort of functionality is of potential interest to you,
please give it a read and let me know what you think. Note that I will be
out of town and out of email contact from Monday July 25 through Tuesday
Aug 2 or Wednesday Aug 3. Expect my replies to be delayed accordingly.

                                                      Best regards,

                                                      John Mainzer

file_image_ops_RFC_v10.docx (213 KB)