Is there a way to crop a subregion of a 3D matrix dataset stored in HDF5?


#1

Hi,
I have an HDF5 file that stores a 3D matrix in \Path\Data1item.

Is there a way (in code) to crop the 3D matrix by operating directly on the HDF5 file itself, instead of reading a sub-block from this file and writing it to a new HDF5 file, so that the file stores only a sub-block of the 3D matrix?

If there is such a way and the crop has been executed, is there then a way to restore the original block directly in this file?

Thanks.


#2

You may be interested in this h5cpp armadillo example. The idea is to model the problem with arma::Cube<T>, then set the chunk size h5::chunk{x,y,z} so that partial IO requests give maximum bandwidth for your queries.

Each chunk is the sub-block you are referring to. Notice that sparse blocks (chunks that are never written, and so read back as all zeros) don’t take up space, allowing you to have block diagonals or other sparse patterns, if that is what you are after.
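The chunking idea above can be sketched in Python with h5py, which is swapped in here for h5cpp purely for brevity. This is a minimal sketch, not the h5cpp example itself: the dataset name "Data1", the volume size, and the chunk shape are all illustrative assumptions. It writes a single chunk, then reads sub-blocks back through ordinary slicing.

```python
import os
import tempfile

import h5py
import numpy as np

SHAPE = (64, 64, 64)   # full 3D volume (illustrative size)
CHUNK = (16, 16, 16)   # chunk shape, chosen to match the expected sub-block queries


def write_one_block(path):
    """Create a chunked 3D dataset and write data into a single chunk only."""
    with h5py.File(path, "w") as f:
        # "Data1" is a hypothetical dataset name used for illustration.
        dset = f.create_dataset("Data1", shape=SHAPE, dtype="f4", chunks=CHUNK)
        # Only this region is written; the remaining chunks are left untouched.
        dset[0:16, 0:16, 0:16] = np.ones(CHUNK, dtype="f4")


def storage_bytes(path):
    """Return the number of bytes allocated for the dataset's raw data."""
    with h5py.File(path, "r") as f:
        return f["Data1"].id.get_storage_size()


def read_block(path, start, size):
    """Read a sub-block (hyperslab) without loading the whole volume."""
    z, y, x = start
    dz, dy, dx = size
    with h5py.File(path, "r") as f:
        return f["Data1"][z:z + dz, y:y + dy, x:x + dx]


if __name__ == "__main__":
    demo = os.path.join(tempfile.mkdtemp(), "volume.h5")
    write_one_block(demo)
    print("allocated bytes:", storage_bytes(demo))
    print("full volume bytes:", int(np.prod(SHAPE)) * 4)
    print("block shape:", read_block(demo, (0, 0, 0), (16, 16, 16)).shape)
```

Reading a region that was never written returns the fill value (zero by default), and the allocated size reported for the dataset should stay well below the size of the full volume.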

Storing the data in a different file/dataset is trivial: you can model the dataset with arma::mat or arma::vec, depending on your needs or the rank of the sub-view.

Other linear algebra packages such as eigen3, blitz, … are also supported, though the most popular choices are eigen3 and armadillo.

best: steve


#3

There is currently no explicit mechanism in HDF5 to achieve this behavior. However, there are several ways to mimic it.

  1. For the special case that one corner of the block is the origin (0,0,0), you could use a so-called extendible dataset and shrink its extent to the block to achieve this behavior. However, there would be potential storage savings only if the underlying storage layout is chunked.
  2. You could “fake” this behavior with a chunked dataset where some kind of compression is applied. To do the trimming and save some storage, you could select the complement of your block and write zeros or some masking default value. The underlying “trimmed” chunks would be compressed to nearly nothing.
  3. You can explicitly manage some kind of block structure for your 3D dataset as a 3D array of object references. This is similar to Steven’s suggestion.
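
Options 1 and 2 above can be sketched in Python with h5py. This is a rough illustration under stated assumptions, not a definitive recipe: the dataset name "Data1", the sizes, and the helper names (`make_volume`, `crop_by_resize`, `mask_outside`) are hypothetical. Option 1 shrinks a resizable dataset to an origin-anchored block; option 2 overwrites everything outside a box with a fill value so that gzip can shrink those chunks.

```python
import os
import tempfile

import h5py
import numpy as np

SHAPE = (64, 64, 64)   # illustrative volume size
CHUNK = (16, 16, 16)   # illustrative chunk shape


def make_volume(path):
    """Create a chunked, gzip-compressed, resizable dataset of random data."""
    rng = np.random.default_rng(0)
    with h5py.File(path, "w") as f:
        f.create_dataset(
            "Data1",                      # hypothetical dataset name
            data=rng.random(SHAPE, dtype=np.float32),
            chunks=CHUNK,
            maxshape=SHAPE,               # allows shrinking and growing back up to SHAPE
            compression="gzip",
        )


def storage_bytes(path):
    """Return the number of bytes allocated for the dataset's raw data."""
    with h5py.File(path, "r") as f:
        return f["Data1"].id.get_storage_size()


def read_volume(path):
    """Load the whole dataset into memory (fine for this small demo)."""
    with h5py.File(path, "r") as f:
        return f["Data1"][...]


def crop_by_resize(path, new_shape):
    """Option 1: keep the origin-anchored block; data outside it is discarded."""
    with h5py.File(path, "r+") as f:
        f["Data1"].resize(new_shape)


def mask_outside(path, lo, hi, fill=0):
    """Option 2: overwrite everything outside the box [lo, hi) with `fill`."""
    with h5py.File(path, "r+") as f:
        dset = f["Data1"]
        inner = []  # slices for axes already handled, restricted to the kept range
        for axis in range(dset.ndim):
            # Slabs below and above the kept range along this axis.
            for start, stop in ((0, lo[axis]), (hi[axis], dset.shape[axis])):
                if start < stop:  # skip empty slabs
                    dset[tuple(inner) + (slice(start, stop),)] = fill
            inner.append(slice(lo[axis], hi[axis]))


if __name__ == "__main__":
    demo = os.path.join(tempfile.mkdtemp(), "volume.h5")
    make_volume(demo)
    print("before:", storage_bytes(demo))
    mask_outside(demo, (16, 16, 16), (32, 32, 32))
    print("after masking:", storage_bytes(demo))
```

Neither approach keeps the discarded values: growing a shrunk dataset back to its old extent yields fill values rather than the original data, so restoring the block would need a separate copy. The file itself may also not get smaller until it is repacked, for example with h5repack.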

G.