Needs clarification: stride in HDF

If I have a non-contiguous 2D memory buffer (one with a row stride), I think that I can only write it to a dataset by using stride values in hyperslabs.
However, that can only work if the original stride (usually expressed in bytes in APIs like OpenCV) is a multiple of the element size.

Are both assumptions correct?

The unit of the hyperslab parameters is [element]. In other words, start, stride, count, and block are all expressed in element counts. A hyperslab selection is a pattern, regardless of what the datatype of an associated dataset might be.
Element size [bytes] is a storage concept and doesn’t apply in the (logical) realm of dataspaces and selections. OK?
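
A minimal sketch of what that means in code (a fragment, assuming hdf5.h is included; the numbers are just an illustration, not from this thread). Every parameter below is an element count, never a byte count:

    hsize_t dims[2]  = {10, 20};          /* dataspace extent, in elements          */
    hid_t   space_id = H5Screate_simple(2, dims, NULL);

    hsize_t start[2] = {1, 4};            /* first selected element                  */
    hsize_t count[2] = {3, 5};            /* number of blocks in each dimension      */
    /* stride and block default to 1 element when passed as NULL                     */
    H5Sselect_hyperslab(space_id, H5S_SELECT_SET, start, NULL, count, NULL);
    H5Sclose(space_id);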

G.

I’m not sure I’m following 100%. Yes, if you have a regular pattern, a hyperslab is a compact way of describing a selection. You can also combine multiple hyperslab selections using set operations (see H5S_seloper_t).
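
For example (an illustrative fragment, not taken from the thread), two hyperslabs can be OR-ed into a single selection:

    hsize_t dims[2]   = {10, 20};
    hid_t   space_id  = H5Screate_simple(2, dims, NULL);
    hsize_t start1[2] = {0, 0}, count1[2] = {2, 4};
    hsize_t start2[2] = {6, 0}, count2[2] = {2, 4};

    H5Sselect_hyperslab(space_id, H5S_SELECT_SET, start1, NULL, count1, NULL);
    /* H5S_SELECT_OR is one of the H5S_seloper_t set operations */
    H5Sselect_hyperslab(space_id, H5S_SELECT_OR,  start2, NULL, count2, NULL);
    H5Sclose(space_id);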

OK? G.

I just wanted to think through the following scenario:
on input, an 8b array of (let's say) 2x1000 elements, laid out with 1280 bytes per row (so there are 280 excess bytes per row).
If I want to write it to a 2x1000 dataset, I only have two options:

  • copy the input into a contiguous memory buffer (removing the stride; see the copy sketch just after this list)
  • use a hyperslab with block/count/stride describing the layout, in order to exclude the 280 excess elements per row
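
A rough sketch of the first option, assuming 8-bit elements, 2 rows of 1000 valid bytes, a 1280-byte row stride, and an (illustrative) pointer src to the strided buffer; string.h is assumed for memcpy:

    const size_t rows = 2, width = 1000, stride_bytes = 1280;
    unsigned char packed[2 * 1000];              /* contiguous destination            */

    for (size_t r = 0; r < rows; r++)            /* drop the 280 padding bytes per row */
        memcpy(packed + r * width, src + r * stride_bytes, width);

    /* 'packed' can now be written with a plain H5Dwrite and no memory hyperslab */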

I’m sorry, I’m a little slow today. What is an 8b array?

By “8b” I just meant “an array of bytes”; I chose the simplest kind of element for my question.

OK, I’m still confused. Isn’t your input (in-memory) dataset 2x1280? And you want to write a subset of elements to a 2x1000 dataset. Right?

You would do the copy by hand, if the pattern were not easily described by hyperslabs. You could also use H5Dgather to do things in memory.
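
For reference, a hedged and untested sketch of the H5Dgather route, reusing the 2x1000 / 1280-bytes-per-row example: describe the strided buffer with a memory dataspace, select the valid elements, and gather them into a contiguous buffer (src is again an assumed pointer to the strided buffer):

    hsize_t mem_dims[2] = {2, 1280};             /* the buffer as it sits in memory */
    hid_t   mem_space   = H5Screate_simple(2, mem_dims, NULL);

    hsize_t start[2] = {0, 0};
    hsize_t count[2] = {2, 1000};                /* keep only the valid bytes       */
    H5Sselect_hyperslab(mem_space, H5S_SELECT_SET, start, NULL, count, NULL);

    unsigned char packed[2 * 1000];
    /* With no callback, the destination buffer must be large enough to hold
       the whole selection (2000 bytes here). */
    H5Dgather(mem_space, src, H5T_NATIVE_UCHAR, sizeof(packed), packed, NULL, NULL);
    H5Sclose(mem_space);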

Ideally, you don’t need a separate hyperslab for each row, unless the pattern is row dependent.

Returning to your original questions:

I don’t understand the concept of a “non-contiguous 2D memory buffer.” Do you have an array of pointers (to multiple buffers) or do you have a single buffer, but you want to skip certain elements? Hyperslabs won’t work with the former.

I still don’t understand this. The stride in a hyperslab selection has nothing to do with the element size, which is a datatype concept. Are you saying the datatype (size) of the elements in memory is different from the elements in the file? That would be fine as long as there is a pre-defined or user-defined datatype conversion function. The latter would need to be registered before you call H5Dwrite or H5Tconvert. In other words, you can have a hyperslab selection and a datatype conversion going on at the same time, but they are separate considerations.
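
A small sketch of that separation (dset_id, mem_space, file_space and buf are placeholder handles, not from the thread): the selections and the datatype conversion are specified independently in the same call.

    /* Memory elements are native doubles, the file dataset stores 32-bit floats:
       the pre-defined double->float conversion is applied during the write,
       independently of whatever hyperslabs are selected on mem_space/file_space. */
    H5Dwrite(dset_id, H5T_NATIVE_DOUBLE, mem_space, file_space, H5P_DEFAULT, buf);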

OK? G.

I think I have the answer I needed, but I will still restate my question for future readers who might be confused by the same thing.

Usually, in image-manipulation APIs like OpenCV, a 2D array can have extra bytes at the end of each row, so that each row of the image starts at an address that is favorable for the hardware (alignment).
For instance, an image of width 1000 and height 123 might be represented by a buffer of size 1280x123, each row starting at address start + row_index*1280. Even though the image width is 1000, each row occupies 1280 bytes, which is the stride value.
Beyond hardware optimizations, this is also tremendously useful for representing rectangular sub-images (ROI: region of interest) with the same data structure (start, dimensions, stride) within the same buffer.
If an image has a “good” size, let’s say 1280x123 here, the actual width matches the stride, there are no extra bytes, and the whole content of the buffer is contiguous.
One last subtlety: the width is usually given as a number of elements, while the stride is given in bytes.
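
In plain C terms (a generic sketch, not an OpenCV API; base, row, col, elem_size and stride_bytes are illustrative names), the element at (row, col) lives at:

    /* base: pointer to the first byte of the buffer
       elem_size, stride_bytes: element size and row stride, both in bytes */
    unsigned char *p = base + (size_t)row * stride_bytes + (size_t)col * elem_size;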

In HDF5, the “stride” is somewhat different and describes gaps between blocks in hyperslabs.

My question was really: should I use the HDF5 hyperslab “strides” to mimic the “native stride” of a 2D buffer when exchanging data between an HDF5 dataset and a memory buffer with stride, or is there a better practice?

@pierre, do not use strides for this scenario. The count spec in the memory hyperslab should create the desired gap between rows, along with block size = 1. Note that the dataset hyperslab (in the file) is specified separately.

There is a similar example in the OLD users guide. Please see section 2.2.5.1, Reading Data into a Differently Shaped Memory Block, in particular Code Example 2-6 which specifies the memory hyperslab.

I will hazard a guess for your 2-D scenario. Untested:

Dataset dimensions = 1000, 123 (in file)
Dataset starts = 0, 0
Dataset strides = NULL (defaults to 1, 1)
Dataset counts = 1000, 123
Dataset blocks = NULL (defaults to 1, 1)

Memory dimensions = 1280, 123
Memory starts = 0, 0
Memory strides = NULL (defaults to 1, 1)
Memory counts = 1000, 123
Memory blocks = NULL (defaults to 1, 1)

If the valid image inside the memory buffer is not left justified, then adjust with the first element of memory starts above.
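
A minimal, untested C sketch of the above for 8-bit elements. One adaptation on my side: HDF5 lists dimensions with the slowest-varying (row/height) dimension first, so the 123 appears before the 1000/1280 below; file_id and buf are placeholders for an open file and the strided memory buffer.

    /* file-side dataset: 123 rows x 1000 columns of bytes, written in full */
    hsize_t file_dims[2] = {123, 1000};
    hid_t   file_space   = H5Screate_simple(2, file_dims, NULL);

    /* memory-side dataspace describes the real buffer: 123 rows x 1280 bytes */
    hsize_t mem_dims[2]  = {123, 1280};
    hid_t   mem_space    = H5Screate_simple(2, mem_dims, NULL);

    /* select only the 1000 valid elements of each row (stride/block default to 1) */
    hsize_t start[2] = {0, 0};
    hsize_t count[2] = {123, 1000};
    H5Sselect_hyperslab(mem_space, H5S_SELECT_SET, start, NULL, count, NULL);

    hid_t dset = H5Dcreate2(file_id, "image", H5T_STD_U8LE, file_space,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_UCHAR, mem_space, file_space, H5P_DEFAULT, buf);

    H5Dclose(dset); H5Sclose(mem_space); H5Sclose(file_space);
    /* if the valid image is not left-justified in the buffer, adjust start[1] */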