Writing (direct) chunk piece by piece

Hello,
the HDF5 API provides H5Dwrite_chunk to write a full chunk, but I wonder if there is a way to write chunks piece by piece. The idea is to aggregate multiple pieces of data (compressed or not) into one chunk without the need for an intermediate memory buffer and extra memory copies.
I remember a suggestion to use H5Dget_chunk_info to get the chunk’s offset in the file and then use direct-access pwrite (given that the structure of the file is already there). But I am not sure how that plays with compressed chunks… or whether there is a better alternative.

Assuming there’s no compression, this could be done as follows:

  • Allocate the chunk early
  • Obtain the address of the chunk
  • Do unbuffered I/O (w/o the HDF5 library) to the chunk

This is “running w/ scissors,” but it can be effective.
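A minimal sketch of those three steps, assuming an uncompressed chunked dataset, the default (sec2) file driver, and no user block, so that the chunk address reported by HDF5 equals the raw file offset; the file name and sizes are illustrative and error checking is omitted:

```c
#include <hdf5.h>
#include <fcntl.h>
#include <unistd.h>

#define FILENAME "piecewise.h5"   /* illustrative name */

int main(void)
{
    hsize_t dims[1] = {1024}, chunk[1] = {256};
    hid_t   space = H5Screate_simple(1, dims, NULL);
    hid_t   dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);
    /* Step 1: allocate all chunks at dataset creation time */
    H5Pset_alloc_time(dcpl, H5D_ALLOC_TIME_EARLY);

    hid_t file = H5Fcreate(FILENAME, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t dset = H5Dcreate2(file, "dset", H5T_NATIVE_INT, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    /* Step 2: obtain the file address and size of chunk 0 */
    hsize_t  chunk_offset[1];
    unsigned filter_mask;
    haddr_t  addr;
    hsize_t  size;
    H5Dget_chunk_info(dset, H5S_ALL, 0, chunk_offset, &filter_mask,
                      &addr, &size);

    H5Dclose(dset); H5Pclose(dcpl); H5Sclose(space); H5Fclose(file);

    /* Step 3: write pieces of the chunk outside the HDF5 library */
    int fd = open(FILENAME, O_WRONLY);
    int piece[64];                          /* 64 ints = 256 bytes */
    for (int i = 0; i < 64; i++) piece[i] = i;
    /* first piece at the start of the chunk ... */
    pwrite(fd, piece, sizeof(piece), (off_t)addr);
    /* ... next piece right after it, and so on */
    pwrite(fd, piece, sizeof(piece), (off_t)(addr + sizeof(piece)));
    close(fd);
    return 0;
}
```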

G.


Something similar has been done: see h5cpp::packet_table. FYI, I sneaked an alignment bug into the pipeline; thanks to Bin Dong @ Berkeley Lab for spotting it. (It should not affect you, as long as your chunks are aligned, as they should be.)

You should be able to modify h5::append so that it aggregates data from different queues and flushes them to disk. See the chunk-packing part and the filter chain.
Also look in this repository, where I collect examples answered on this C++ thread. If I recall correctly, there should be a lock-free queue and a ZMQ example with C++ and Fortran.

steve

Hi Steve,

Something similar has been done: see h5cpp::packet_table, …

I did a git grep pwrite and a git grep H5Dget_chunk_info on your repo and could not find anything. Could you be more specific about where something similar (i.e., “writing a chunk piece by piece”) has been done?

Thank you, Gerd.
I guess there is no option for compressed chunks, since allocating chunks early is not possible with variable-size chunks?

The idea of a chunk (compressed or otherwise) being treated as an indivisible unit is baked in. If you knew the size of the compressed chunk in advance, you could H5Dwrite_chunk dummy bytes and then take the same approach, but that is perhaps unlikely, and you would be writing the data multiple times, albeit the second time unbuffered. G.
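A hypothetical sketch of that dummy-byte idea, assuming the compressed size of a chunk is somehow known in advance; the helper name reserve_compressed_chunk is illustrative, not an established recipe, and error checking is omitted:

```c
#include <hdf5.h>
#include <stdlib.h>

void reserve_compressed_chunk(hid_t dset, const hsize_t *chunk_offset,
                              size_t comp_size)
{
    /* Write comp_size placeholder bytes as the (already "filtered")
     * chunk; filter_mask = 0 means all pipeline filters are recorded
     * as applied, so the library will not touch the bytes. */
    void *dummy = calloc(1, comp_size);
    H5Dwrite_chunk(dset, H5P_DEFAULT, 0 /* filter_mask */,
                   chunk_offset, comp_size, dummy);
    free(dummy);
    /* The chunk's file address can now be looked up with
     * H5Dget_chunk_info_by_coord() and overwritten in place with the
     * real compressed bytes of the same size via pwrite(). */
}
```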


If you are trying to iterate all the chunks, then H5Dchunk_iter may be advantageous over H5Dget_chunk_info.
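A minimal sketch of such an iteration (H5Dchunk_iter is only available in recent releases, I believe 1.12.3/1.14; with H5Dget_chunk_info you would instead loop over H5Dget_num_chunks indices). The callback below just prints each chunk’s address and size:

```c
#include <hdf5.h>
#include <stdio.h>

static int print_chunk(const hsize_t *offset, unsigned filter_mask,
                       haddr_t addr, hsize_t size, void *op_data)
{
    (void)op_data;
    printf("chunk at logical offset %llu: addr=%llu size=%llu mask=0x%x\n",
           (unsigned long long)offset[0], (unsigned long long)addr,
           (unsigned long long)size, filter_mask);
    return H5_ITER_CONT;    /* keep iterating */
}

/* usage: H5Dchunk_iter(dset, H5P_DEFAULT, print_chunk, NULL); */
```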

ImarisWriter is a piece of software that uses H5Dwrite_chunk to write a file chunk by chunk.

For uncompressed chunks, here’s one approach I’ve used in the past, along the same lines as @gheber’s suggestion (a sketch follows the list):

  1. Create an HDF5 file template, using H5Pset_meta_block_size with a block size large enough to hold all of the metadata for the file. This consolidates all of the metadata into a single block at the beginning of the file.
  2. Allocate the chunks early as @gheber suggests.
  3. Verify the template by confirming via H5Dchunk_iter that the chunks are stored consecutively.
  4. Read in the single consolidated metadata block and use it as your template header.
  5. To write uncompressed blocks manually, without the HDF5 library, write the metadata block header followed by the chunks in the order reported by H5Dchunk_iter.
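A sketch of steps 1-3, under the assumption of a single 1-D dataset; the file name, metadata block size, and dimensions are illustrative, and error checking is omitted:

```c
#include <hdf5.h>
#include <stdio.h>

static int check_consecutive(const hsize_t *offset, unsigned filter_mask,
                             haddr_t addr, hsize_t size, void *op_data)
{
    haddr_t *next_expected = (haddr_t *)op_data;
    if (*next_expected != HADDR_UNDEF && addr != *next_expected)
        printf("gap before chunk at logical offset %llu\n",
               (unsigned long long)offset[0]);
    *next_expected = addr + size;
    (void)filter_mask;
    return H5_ITER_CONT;
}

int main(void)
{
    /* 1. Reserve one large metadata block at the start of the file */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_meta_block_size(fapl, 1 << 20);       /* 1 MiB, assumed ample */
    hid_t file = H5Fcreate("template.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* 2. Create the dataset with early chunk allocation */
    hsize_t dims[1] = {1 << 20}, chunk[1] = {1 << 16};
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);
    H5Pset_alloc_time(dcpl, H5D_ALLOC_TIME_EARLY);
    hid_t dset = H5Dcreate2(file, "dset", H5T_NATIVE_FLOAT, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    /* 3. Verify that the chunks were laid out consecutively */
    haddr_t next_expected = HADDR_UNDEF;
    H5Dchunk_iter(dset, H5P_DEFAULT, check_consecutive, &next_expected);

    H5Dclose(dset); H5Pclose(dcpl); H5Sclose(space);
    H5Fclose(file); H5Pclose(fapl);
    return 0;
}
```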

I’m not quite sure how to extend this to compressed chunks, but I do have some thoughts.

A. You can configure a dataset with a filter but then write all the chunks without the filter applied, indicating this by setting the appropriate bit of the filter_mask argument of H5Dwrite_chunk to 1. This might allow the chunks to be preallocated via the template approach (see the sketch below).
B. If we could later modify the filter_mask to indicate that the chunk is indeed compressed, then we could insert compressed chunks into the file at a later time. This would not necessarily save space on disk, but it could reduce the I/O required to read the chunks.
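A small sketch of idea A, assuming a dataset whose pipeline contains a single filter (e.g. deflate); bit 0 of filter_mask marks that first filter as skipped for the chunk being written. The helper name is illustrative:

```c
#include <hdf5.h>
#include <stdint.h>

void write_unfiltered_chunk(hid_t dset, const hsize_t *chunk_offset,
                            const void *raw_bytes, size_t raw_size)
{
    /* bit 0 set: the first (and here only) pipeline filter was skipped,
     * so raw_bytes are stored exactly as given */
    uint32_t filter_mask = 0x1;
    H5Dwrite_chunk(dset, H5P_DEFAULT, filter_mask,
                   chunk_offset, raw_size, raw_bytes);
}
```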

Since we have already invoked running w/ scissors thinking, perhaps we could investigate the chunk data structure itself.

I. Create the template HDF5 file with the appropriate filter pipeline, a large metadata block size, and preallocated chunks (either allocated early, if possible, or manually via H5Dwrite_chunk).
II. Read in the consolidated metadata block header.
III. Identify the non-paged Fixed Array Index in the metadata.
IV. Modify the Data Block Elements for the filtered dataset chunk.
V. Recompute the Fixed Array Datablock Checksum.

Maybe some GNU poke recipes would be useful here, @gheber ?

By similar I don’t mean an exact match. In my interpretation, a chunk is the elementary unit of I/O operations (I am not discussing its size, but its nature); therefore, writing partial chunks is not possible in this model. This interpretation matches the I/O caching mechanisms inside various storage devices, including memory architectures.

Now let’s see what it would take to do this with a chunk that has been encrypted with a block cipher (pseudo-code):

1. `pread(fd, data, partial_size, offset)`
2. goto #1 until the full chunk has been read, because the chunk is atomic with respect to the block cipher
3. with the chunk fully loaded, call the filter chain

Instead of this, the preferred method is to read the entire chunk and then decode it. You can do this by getting the chunk offset and size, or just use a direct chunk read (see the sketch below); profile the code, and if it reaches ~90% of the I/O bandwidth of an NVMe device, call it a day.
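A minimal sketch of that whole-chunk path, assuming a filtered dataset and leaving the decode step as a placeholder; error checking is omitted:

```c
#include <hdf5.h>
#include <stdint.h>
#include <stdlib.h>

void read_whole_chunk(hid_t dset, const hsize_t *chunk_offset)
{
    hsize_t stored_size = 0;
    H5Dget_chunk_storage_size(dset, chunk_offset, &stored_size);

    void    *raw = malloc(stored_size);
    uint32_t filter_mask = 0;
    /* one read for the whole chunk -- the chunk is the atomic unit */
    H5Dread_chunk(dset, H5P_DEFAULT, chunk_offset, &filter_mask, raw);

    /* decode_chunk(raw, stored_size, filter_mask);  placeholder for
     * your own filter chain / block-cipher decode */
    free(raw);
}
```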

Can you please outline how one can do better by reading partial chunks, when a chunk is atomic from the filter pipeline’s perspective?