What is the maximum payload size that can be stored in compact layout?

It isn’t clear to me whether the documentation limits the payload to 64 KB, or whether header = book-keeping + payload is 64 KB. If the latter, how much space is used for book-keeping (if any)? Put the other way: what is the maximum size of the payload?

Excerpt: The compact layout can improve storage and access performance for files that have many very tiny datasets. With one I/O access both the header and data values can be read. The compact layout reduces the size of a file, as the data is stored with the header which will always be allocated for a dataset. However, the object header is 64 KB in size, so this layout can only be used for very small datasets.

Motivation: H5CPP computes the layout information with respect to the C++ object being saved (and also provides full manual control). In the former case, to take full advantage of HDF5 features, one has to know the upper bound of the payload in order to choose the optimal layout.
Most advanced linear algebra systems support tiny, fixed-size matrices, often stored on the stack, so providing optimised I/O for them can improve the overall end-user experience.

Update: the H5Pset_layout documentation says "The raw data size limit is 64K (65520 bytes)." However, 64K is 2^16 = 65536 bytes, which leaves 16 bytes for book-keeping.
best: steve

Steve, have a look at https://portal.hdfgroup.org/display/HDF5/File+Format+Specification and see "Layout: Compact Storage Property Description". In the latest version, I can see 4 bytes being used for book-keeping
(Version: 1 byte, Layout Class: 1 byte, Size: 2 bytes). ‘Size’ is the size of the “payload.” Note that, for example, the datatype information is stored in a separate message and doesn’t count against the payload size. There is some variability between specification versions. G.
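For illustration only, the three fixed fields mentioned above could be mirrored in C++ like this. The struct and constant names are hypothetical (they are not part of HDF5 or H5CPP), and this is a sketch of the field widths, not a parser for the on-disk format. It only shows that a 2-byte Size field cannot describe a payload larger than 65535 bytes, whatever else the header holds.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical mirror of the fixed book-keeping fields listed above
// (Version, Layout Class, Size). Field widths follow the reply;
// the names are made up for this sketch.
struct compact_layout_prefix_t {
    std::uint8_t  version;       // layout message version, 1 byte
    std::uint8_t  layout_class;  // storage class, 1 byte
    std::uint16_t size;          // payload size in bytes, 2 bytes
};

// 1 + 1 + 2 bytes of book-keeping, as counted in the reply.
static_assert(sizeof(compact_layout_prefix_t) == 4,
              "expected 4 bytes of fixed book-keeping");

// Largest payload size a 2-byte unsigned field can express.
constexpr std::size_t size_field_max =
    std::numeric_limits<std::uint16_t>::max();

static_assert(size_field_max == 65535, "2-byte size field tops out at 65535");
```

The effective limit is lower than this, because other header messages (datatype, dataspace, and so on) share the same object header.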


@gheber Thanks! I interpret the diagram as:

address   | description
----------|-----------------------------------------------------
  4 byte  | version, dims, layout class, reserved
  8 byte  | reserved
     ...  | 4 byte * rank; max 32
136 byte  | optional
140 byte  | optional
144 byte  | COMPACT DATA

The value I am looking for is the greatest safe number (the worst case with respect to storage); after all, it is always safe to fall back to the contiguous layout. Considering all of the above (if my interpretation is correct), I get 2^16 - 140 = 65396 as the maximum safe size of a compact layout payload.
Is my interpretation of the referenced diagram wrong?
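The arithmetic can be checked at compile time. This is a minimal sketch with hypothetical constant names; the 140-byte overhead is the worst-case figure read off the table above, and 65520 is the limit quoted from the H5Pset_layout documentation.

```cpp
#include <cstddef>

// 64K object header, i.e. 2^16 bytes.
constexpr std::size_t object_header_size = std::size_t{1} << 16;

// Worst-case book-keeping taken from the table above
// (an interpretation made in this thread, not an official figure).
constexpr std::size_t worst_case_overhead = 140;

// Raw data size limit quoted from the H5Pset_layout documentation.
constexpr std::size_t documented_limit = 65520;

// Conservative upper bound for a compact payload.
constexpr std::size_t safe_payload_max =
    object_header_size - worst_case_overhead;

static_assert(object_header_size == 65536, "2^16");
static_assert(safe_payload_max == 65396, "65536 - 140");
static_assert(safe_payload_max <= documented_limit,
              "the conservative bound stays below the documented limit");
```

The conservative bound sits 124 bytes below the documented limit, so it gives up very little capacity.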

steve

Yes, that works for versions 1, 2, and 3 of the format spec. (Subject to change in the future!)
You can gain 4 bytes by taking out one of the optional fields:

Dataset Element Size
The size of a dataset element, in bytes. This field is only present for chunked storage.

G.


OK, let’s set the maximum safe compact layout payload to 64K - 140 = 65396.
Therefore H5CPP_COMPACT_PAYLOAD_MAX_SIZE = 65396, with the possibility to reconfigure it with a different cut-off value should that be required; this is documented here.

All objects that have no explicit dcpl_t with a layout, are smaller than H5CPP_COMPACT_PAYLOAD_MAX_SIZE, and have no filters or chunks specified shall implicitly be saved with compact layout.
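As a rough sketch of that rule (not the actual H5CPP implementation): `layout_t` and `pick_layout` are hypothetical names, and the fallbacks for the non-compact cases are my assumption. Chunked is assumed whenever filters or chunks are requested, since HDF5 filters require chunked storage, and contiguous is assumed otherwise.

```cpp
#include <cstddef>

// Cut-off agreed in this thread; can be overridden at compile time.
#ifndef H5CPP_COMPACT_PAYLOAD_MAX_SIZE
#define H5CPP_COMPACT_PAYLOAD_MAX_SIZE 65396
#endif

// Hypothetical names, for illustration only (not the H5CPP API).
enum class layout_t { compact, contiguous, chunked, user_defined };

inline layout_t pick_layout(std::size_t payload_bytes,
                            bool explicit_layout,  // dcpl_t with a layout was passed
                            bool has_filters,
                            bool has_chunks) {
    // An explicit layout in the property list always wins.
    if (explicit_layout) return layout_t::user_defined;
    // Assumption: filters or chunks imply chunked storage.
    if (has_filters || has_chunks) return layout_t::chunked;
    // "Less than" follows the wording of the rule above.
    if (payload_bytes < H5CPP_COMPACT_PAYLOAD_MAX_SIZE) return layout_t::compact;
    // Assumption: everything else falls back to contiguous.
    return layout_t::contiguous;
}
```

With this, a 4x4 matrix of doubles (128 bytes) with no property list maps to compact, while anything at or above the cut-off falls back to contiguous.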

steve

Yes, indeed.
