Type confusion for chunk sizes


#1

@ajelenak recently brought up “removing the contiguous storage layout and only have chunked and compact” in the thread What do you want to see in “HDF5 2.0”?, which made me think about how chunks are exposed in the public API versus how the library uses them internally.

I realized that much of the public chunk API wisely exposes chunk sizes as hsize_t, an unsigned 64-bit integer, for example in H5Dget_chunk_info, H5Dget_chunk_info_by_coord, and H5Dget_chunk_storage_size. That would make such a change possible.

H5Dwrite_chunk uses size_t rather than hsize_t. For the most part, I think this is also an unsigned 64-bit integer, but I suppose size_t and hsize_t could differ, for example on a 32-bit platform.
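To check whether the two types actually differ on a given platform, a minimal sketch like the following could help. It does not include hdf5.h; `hsize_standin_t` is a hypothetical stand-in for hsize_t, assumed here to be an unsigned 64-bit integer as described above, and the function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ASSUMPTION: stand-in for HDF5's hsize_t, taken to be an unsigned
 * 64-bit integer. Real code would include hdf5.h and use hsize_t. */
typedef uint64_t hsize_standin_t;

/* Returns 1 if every size_t value fits in the stand-in type. */
int size_t_fits_in_hsize(void)
{
    return SIZE_MAX <= UINT64_MAX;
}

/* Returns 1 if both types have the same width on this platform. */
int size_t_matches_hsize(void)
{
    return sizeof(size_t) == sizeof(hsize_standin_t);
}

/* Prints the width of each candidate chunk-size type in bytes. */
void report_widths(void)
{
    printf("size_t: %zu, hsize stand-in: %zu, uint32_t: %zu\n",
           sizeof(size_t), sizeof(hsize_standin_t), sizeof(uint32_t));
}
```

On a typical 64-bit build both checks should return 1; on a 32-bit build I would expect `size_t_matches_hsize()` to return 0 while `size_t_fits_in_hsize()` still returns 1.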

The recent H5Dchunk_iter in 1.13 uses uint32_t, which I understand to be the current internal storage type for HDF5 chunks. I created an issue and potential fix for this.

Should the chunk size be canonically hsize_t (unsigned 64-bit integer) in all places? Is there another reason that the size in H5Dwrite_chunk may differ? Does a change from size_t to hsize_t require a versioned function to avoid breaking backwards compatibility?


#2

The problem is that “size” means different things in different APIs. In the case of H5Dwrite_chunk, chunk size is specified in bytes (in storage). In other places, chunk size is represented as the number of elements with an implicit reference to datatype element “size.”

Yes, if the unit is a datatype element count. No, if the unit is storage (or memory) byte size.
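To make the two units concrete, here is a rough sketch of how an element-count chunk shape relates to an in-memory byte size. The function `chunk_nbytes` is hypothetical and not part of HDF5, and it assumes an unfiltered chunk; the size in storage can differ once filters such as compression are applied.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an HDF5 function.
 * Computes elem_size * dims[0] * ... * dims[rank-1], i.e. the byte size
 * of one unfiltered chunk whose shape is given in elements.
 * Returns 0 on success, -1 if the product overflows uint64_t. */
int chunk_nbytes(const uint64_t *dims, size_t rank, size_t elem_size,
                 uint64_t *out)
{
    uint64_t total = elem_size;

    for (size_t i = 0; i < rank; i++) {
        /* Refuse to multiply if the result would wrap around. */
        if (dims[i] != 0 && total > UINT64_MAX / dims[i])
            return -1;
        total *= dims[i];
    }
    *out = total;
    return 0;
}
```

For example, a 100 x 200 chunk of 8-byte elements has an element count of 20000 but a byte size of 160000, so the same chunk needs a different range depending on which unit an API uses.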

G.


#3

All the functions I cited above are discussing bytes.

The first three are hsize_t. The fourth is size_t. The fifth is uint32_t.

@derobins makes the point that they should all be uint32_t since chunk sizes are currently limited to 32 bits within the library.
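If the public types stay wider than the internal one, callers would need some kind of range check before a size is narrowed. A minimal sketch, assuming the limit is simply whatever fits in uint32_t; `chunk_size_to_u32` is a hypothetical helper, not a library function:

```c
#include <stdint.h>

/* Hypothetical helper, not an HDF5 function.
 * Narrows a 64-bit chunk byte size to 32 bits.
 * Returns 0 on success, -1 if the value does not fit in uint32_t. */
int chunk_size_to_u32(uint64_t nbytes, uint32_t *out)
{
    if (nbytes > UINT32_MAX)
        return -1;          /* would be silently truncated by a plain cast */
    *out = (uint32_t)nbytes;
    return 0;
}
```

Without a check like this, a plain cast from a 64-bit value would silently keep only the low 32 bits.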


#4

Yikes! :slightly_frowning_face:


#5

Please add any further comments to this GitHub issue: https://github.com/HDFGroup/hdf5/issues/2131.
