It isn’t clear to me whether the documentation limits the payload to 64 KB, or whether header = book-keeping + payload is limited to 64 KB. If the latter, how much space is used for book-keeping (if any)? Put the other way: what is the maximum size of the payload?
Excerpt: The compact layout can improve storage and access performance for files that have many very tiny datasets. With one I/O access both the header and data values can be read. The compact layout reduces the size of a file, as the data is stored with the header which will always be allocated for a dataset. However, the object header is 64 KB in size, so this layout can only be used for very small datasets.
Motivation: H5CPP computes the layout information with respect to the C++ object being saved (and also provides full manual control). In the former case, to take full advantage of HDF5 features, one has to know the upper bound of the payload in order to decide on the optimal layout.
Most advanced linear algebra systems provide support for tiny, fixed-size matrices, often stored on the stack; providing optimised IO for them can lead to an improved end-user experience overall.
Update note: the H5Pset_layout documentation says: The raw data size limit is 64K (65520 bytes). However, 64K is 2^16 = 65536, which leaves 16 bytes for book-keeping.
best: steve
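The 16-byte gap mentioned in the update note can be checked at compile time. This is only a minimal C++ sketch; the constant names are made up for illustration and are not defined by HDF5 or H5CPP.

```cpp
#include <cstddef>

// Illustrative names only; neither HDF5 nor H5CPP defines these constants.
constexpr std::size_t header_limit_64k   = std::size_t(1) << 16; // 2^16 = 65536
constexpr std::size_t documented_raw_max = 65520;                // figure quoted from H5Pset_layout
constexpr std::size_t implied_overhead   = header_limit_64k - documented_raw_max;

static_assert(implied_overhead == 16,
              "64K minus the documented raw data limit leaves 16 bytes");
```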
Steve, have a look at https://portal.hdfgroup.org/display/HDF5/File+Format+Specification and see Layout: Compact Storage Property Description. In the latest version, I can see 4 bytes being used for book-keeping:
(Version - 1 byte, Layout class - 1 byte, and Size - 2 bytes). ‘Size’ is the size of the “payload.” Note that, for example, the datatype information is stored in a separate message and doesn’t count against the payload size. There is some variability between specification versions. G.
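For orientation, the four book-keeping bytes listed above can be pictured as a small fixed prefix in front of the raw data. The struct below is an illustrative C++ sketch with hypothetical names. It is not the HDF5 library's internal type, and the actual on-disk byte order and packing are governed by the file format specification, not by this struct.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical mirror of the compact-layout fields named in the post above.
struct compact_layout_prefix_t {
    std::uint8_t  version;       // Version, 1 byte
    std::uint8_t  layout_class;  // Layout class, 1 byte
    std::uint16_t size;          // Size of the payload, 2 bytes
};

static_assert(sizeof(compact_layout_prefix_t) == 4,
              "the three fields add up to 4 bytes of book-keeping");

// Largest value a 2-byte Size field can represent.
constexpr unsigned max_size_field = std::numeric_limits<std::uint16_t>::max();
```

A 2-byte Size field can represent at most 65535, so the payload cannot exceed that value regardless of any other overhead.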
The value I am looking for is the greatest safe number (the worst case with respect to storage); after all, it is always safe to place the payload into a contiguous layout. Considering all the above (if my interpretation is correct), I get 2^16 - 140 = 65396 for the maximum safe size of a compact layout payload.
Am I wrong in my interpretation of the referenced diagram?
Yes, that works for versions 1,2,3 of the format spec. (Subject to change in the future!)
You can gain 4 bytes by taking out one of the optional fields:
Dataset Element Size
The size of a dataset element, in bytes. This field is only present for chunked storage.
OK, let’s set the maximum safe compact layout payload to 64K - 140 = 65396,
therefore: H5CPP_COMPACT_PAYLOAD_MAX_SIZE = 65396, with the possibility to reconfigure it with another cut-off value should that be required; this is documented here.
All objects that have no explicit dcpl_t specifying a layout, have no filters or chunks specified, and are smaller than H5CPP_COMPACT_PAYLOAD_MAX_SIZE shall implicitly be saved with compact layout.
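The rule stated above could be sketched roughly as follows. This is not H5CPP's actual implementation: the struct, its field names and the function name are hypothetical, and exposing the cut-off as an overridable macro is an assumption about how the reconfiguration might work. Only the constant's name and value come from this thread.

```cpp
#include <cstddef>

// Name and value taken from the thread; making it an overridable macro is an assumption.
#ifndef H5CPP_COMPACT_PAYLOAD_MAX_SIZE
#define H5CPP_COMPACT_PAYLOAD_MAX_SIZE 65396
#endif

// 2^16 - 140, the worst-case estimate agreed on in the discussion.
static_assert((1u << 16) - 140u == 65396u, "64K - 140 = 65396");

// Hypothetical description of what the caller asked for.
struct dataset_request_t {
    std::size_t payload_bytes;        // raw data size of the object being saved
    bool        has_explicit_layout;  // caller passed a dcpl_t that sets a layout
    bool        has_filters;          // any filter requested
    bool        has_chunks;           // chunk dimensions requested
};

// True when the object should implicitly be stored with compact layout.
// The strict "less than" follows the wording of the rule above.
constexpr bool implicitly_compact(const dataset_request_t& r) {
    return !r.has_explicit_layout && !r.has_filters && !r.has_chunks
        && r.payload_bytes < static_cast<std::size_t>(H5CPP_COMPACT_PAYLOAD_MAX_SIZE);
}
```

Under this sketch a 64x64 matrix of doubles (32768 bytes) with no other properties would be stored compact, while a 128x128 matrix of floats (65536 bytes) would not.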