Assembling HDF5 file piece by piece

Hi all,

We are planning to develop a web service that assembles potentially
very large HDF5 files on the fly and sends them over HTTP. We cannot
create the files first and send them once they are complete: for one,
the required disk space might not be available on the server, and the
latency until the first byte is sent would not be acceptable either.
Thus we would need to assemble the files piece by piece and send each
piece as soon as it is available.

Is this possible at all with the HDF5 format? We don't mind creating the
binary files by hand but the final results must be readable by libhdf5.

Initial tries with some simple scripts and a hex editor already failed
at the superblock level, namely at the "End of File Address". If it is
set to 0 (or some other wrong number that is not the file size), libhdf5
raises an error and refuses to read the file (note that I did adjust the
superblock checksum every time). If we assemble the file piece by piece,
we don't know the final file size at the time the superblock is created
and streamed over HTTP. We also don't know what will end up in the final
file before we have to send the first chunk. It might be possible to
determine an upper bound ahead of time, but that would add significant
complexity, to the point where we might not be able to execute the project.
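
To make the field in question concrete, here is a minimal sketch of where the "End of File Address" sits, assuming a version 2 or 3 superblock located at offset 0 of the file (the layout in the format spec is: 8-byte signature, version, size of offsets, size of lengths, consistency flags, then base address, superblock extension address, end of file address and root group object header address, followed by a 4-byte checksum). The helper names `read_eof_address` and `build_fake_superblock` are made up for illustration. The fake header carries a placeholder checksum, so libhdf5 would not accept it; it only exercises the parser.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def read_eof_address(header: bytes) -> int:
    """Return the End of File Address of a v2/v3 superblock at offset 0.

    Assumption: the superblock starts at byte 0 (the spec also allows
    offsets 512, 1024, ...), and the checksum is NOT verified here.
    """
    if header[:8] != HDF5_SIGNATURE:
        raise ValueError("no HDF5 signature at offset 0")
    version = header[8]
    size_of_offsets = header[9]
    if version not in (2, 3):
        raise ValueError(f"unsupported superblock version {version}")
    # 12 fixed bytes, then base address and superblock extension address
    # come before the end of file address.
    start = 12 + 2 * size_of_offsets
    return int.from_bytes(header[start:start + size_of_offsets], "little")


def build_fake_superblock(eof_address: int, size_of_offsets: int = 8) -> bytes:
    """Build superblock-shaped bytes for testing the parser only."""
    fixed = HDF5_SIGNATURE + bytes([2, size_of_offsets, 8, 0])
    undefined = 2 ** (8 * size_of_offsets) - 1  # "undefined address" marker
    # base address, extension address, end of file address, root header address
    addresses = [0, undefined, eof_address, 48]
    body = b"".join(a.to_bytes(size_of_offsets, "little") for a in addresses)
    return fixed + body + b"\x00\x00\x00\x00"  # placeholder, not a valid checksum


header = build_fake_superblock(eof_address=4096)
print(read_eof_address(header))
```

Because this value sits in the very first bytes of the file, it has to be written before anything else is streamed, which is exactly the problem described above.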

Is there any way around this? Some magic "End of File Address" number,
or some way to move the superblock to the end of the file? The HDF5 spec
is really long and I have not read all of it yet, so maybe I am missing
something.

Assuming this can be dealt with: are there any other potential
roadblocks we might stumble into? If not: is there any chance of changing
the HDF5 spec/libhdf5 so that it interprets an "End of File Address" of 0
as "unspecified"?

Looking forward to your thoughts and advice. Thanks a lot!

Lion

Why does it have to be *one single* HDF5 file? It might be possible if you had a sort of 'master' or 'root' HDF5 file and then a number of other HDF5 files that get 'mounted' into the master (much like the unix fs 'mount' command), so that the libhdf5 caller thinks it's only ever opening the master file. But that master file would still need to know a lot ahead of time. If you don't know enough ahead of time about at least *some* of the contents of the resulting assembly (like the number of datasets and their names or something), then I think it's going to be difficult.

The boot block is just part of the problem. After the boot block is successfully read, libhdf5 is going to want to read some metadata about the file's contents (group names, dataset names, etc.). That metadata can be scattered all over the "file". So, you'd need to construct things such that at least the *initial* metadata is in the first chunk of stuff you send.

Here is a simpler problem: can you make this work for a real file that is "growing" locally *without* having to re-open the file each time new stuff is added to it? I mean, forget the HTTP part of the problem and see if you can get a libhdf5 caller to *behave* the way you want when the underlying file is being "assembled" in the way you plan. I think there may be ways of getting it to work if you really treat it as multiple HDF5 files, by using things like a) external datasets, b) mounting files or c) the 'family' virtual file driver (vfd). However, all of these approaches *will* have some requirement for at least *some* a priori knowledge of the file's contents.
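
As a rough illustration of option c): the family driver splits one logical HDF5 address space into fixed-size member files named from a printf-style pattern, so each member is a self-contained piece on disk. The sketch below only shows the address arithmetic, assuming a member size `memb_size` that is fixed up front; `locate` and `member_name` are hypothetical helpers, not libhdf5 calls. In h5py the driver itself is selected with `driver='family'` and a `memb_size` argument.

```python
from typing import Tuple


def locate(address: int, memb_size: int) -> Tuple[int, int]:
    """Map a logical file address to (member index, offset in that member)."""
    if address < 0 or memb_size <= 0:
        raise ValueError("address must be >= 0 and memb_size must be > 0")
    return divmod(address, memb_size)


def member_name(pattern: str, index: int) -> str:
    """Expand a printf-style family pattern such as 'part_%05d.h5'."""
    return pattern % index


# Logical address 2500 with 1024-byte members lands in the third member.
index, offset = locate(2500, 1024)
print(member_name("part_%05d.h5", index), offset)
```

Note that metadata can still land in any member, including early ones that would already have been sent, so this does not remove the need for a priori knowledge.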

Hope that helps.

Mark

···

From: Hdf-forum <hdf-forum-bounces@lists.hdfgroup.org> on behalf of Lion Krischer <lion.krischer@gmail.com>
Reply-To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Date: Monday, March 28, 2016 6:03 AM
To: "hdf-forum@lists.hdfgroup.org" <hdf-forum@lists.hdfgroup.org>
Subject: [Hdf-forum] Assembling HDF5 file piece by piece
