I am reading the HDF5 File Format specification and looking at data files created with the 1.8 compatibility settings by the HDF5 library version 1.10 wrapped in the Java layer.
The file has a version 2 superblock (i.e. no Group Leaf Node K and Group Internal Node K) in the header.
The superblock does not refer to a Superblock extension, i.e. there is no reference to a “B-tree ‘K’ Values Message”.
The file contains a chunked dataset which uses version 1 B-trees. How would a reader know the K value for this B-tree? No defaults are established in the file format specification.
Q1: Could you please clarify via the file format specification how the K values should be determined in this case?
The Data Layout message for this dataset is a version 3 layout message. While the dataset was created with a single dimension, its Dimensionality value is set at 2.
The specification says: "This (Dimensionality) specifies the number of dimension size fields later in the message."
That is, one would read two dimension size fields from the message. But this does not leave enough message data to read the Dataset Element Size field at the end.
Q2: Could you please clarify why the dimensionality value is one higher than the number of dimensions? Should a reader subtract 1 from the value in the dimensionality field to be able to decode the layout together with the dataset element size?
Q1. The “B-tree ‘K’ Values Message” is intended for non-default K values. If there isn’t one, the defaults apply. You have a point. (I can’t find them there either.) The defaults should be documented in the file format spec. The defaults are mentioned in the documentation of H5Pset_sym_k (16 and 4) and H5Pset_istore_k (32).
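For a reader implementation, the fallback described above could look roughly like the sketch below. It is only an illustration: the dictionary keys and the function names are made up here, and `k_message` stands for a hypothetical, already decoded “B-tree ‘K’ Values Message”. The numbers 16, 4 and 32 are the defaults quoted from the H5Pset_sym_k and H5Pset_istore_k documentation. The node capacity helper assumes the version 1 B-tree node layout of 2K child pointers and 2K+1 keys.

```python
# Defaults as documented for H5Pset_sym_k (16 and 4) and H5Pset_istore_k (32).
# The key names are hypothetical, chosen only for this sketch.
DEFAULT_K = {
    "group_internal": 16,   # Group Internal Node K
    "group_leaf": 4,        # Group Leaf Node K
    "chunk_internal": 32,   # Indexed Storage (chunk B-tree) Internal Node K
}


def resolve_k_values(k_message=None):
    """Return the K values to use for version 1 B-trees.

    k_message is a hypothetical dict holding the fields of a decoded
    "B-tree 'K' Values Message", or None if the file has no such message
    (for example, no superblock extension at all).
    """
    k = dict(DEFAULT_K)
    if k_message:
        # Only override the entries we know about; ignore anything else.
        for name in DEFAULT_K:
            if name in k_message:
                k[name] = k_message[name]
    return k


def v1_btree_node_capacity(k):
    """Maximum children and keys of a v1 B-tree node for a given K
    (assumption: 2K child pointers and 2K+1 keys per node)."""
    return {"max_children": 2 * k, "max_keys": 2 * k + 1}
```

With no message present, `resolve_k_values()` simply returns the three defaults, so a chunk B-tree node would be sized for K = 32.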
Q2. Can you send us an example? There is no internal adjustment to “dimensionality” or rank of a dataspace. If it says 2, that’s what it is.
Hi Gerd,
thank you for the answer.
Ad Q1: as you agree that this should be part of the spec, could you please point out the best place to report these improvement requests? Do you have a tracker for this?
That leads to my original question: should I subtract 1 from the dimensionality field? This is specified for version 1 and version 2 of the data layout message, but not for version 3, i.e. again I think this is a flaw in the documentation.
I’m warming up to the idea that there is a mistake in the specification. The description of the Dimensionality field changed from versions 1 and 2 to version 3 as follows:
An array has a fixed dimensionality. This field specifies the number of dimension size fields later in the message. The value stored for chunked storage is 1 greater than the number of dimensions in the dataset’s dataspace. For example, 2 is stored for a 1 dimensional dataset.
turned into
A chunk has a fixed dimensionality. This field specifies the number of dimension size fields later in the message.
I think the 3rd 4 is actually the dataset element size which the v3 layout message spec calls for.
I think the confusion can also be seen in the h5debug output:
% h5debug h5ex_d_unlimmod.h5 800
Reading signature at address 800 (rel)
Object Header...
Dirty: FALSE
Version: 1
Header size (in bytes): 16
Number of links: 1
Number of messages (allocated): 6 (8)
Number of chunks (allocated): 1 (2)
Chunk 0...
Address: 800
Size in bytes: 256
Gap: 0
Message 0...
Message ID (sequence number): 0x0005 `fill_new' (0)
Dirty: FALSE
Message flags: <C>
Chunk number: 0
Raw message data (offset, size) in chunk: (24, 8) bytes
Message Information:
Space Allocation Time: Incremental
Fill Time: If Set
Fill Value Defined: Default
Size: 0
Data type: <dataset type>
Message 1...
Message ID (sequence number): 0x0003 `datatype' (0)
Dirty: FALSE
Message flags: <C>
Chunk number: 0
Raw message data (offset, size) in chunk: (40, 16) bytes
Message Information:
Type class: integer
Size: 4 bytes
Version: 1
Byte order: little endian
Precision: 32 bits
Offset: 0 bits
Low pad type: zero
High pad type: zero
Sign scheme: 2's comp
Message 2...
Message ID (sequence number): 0x0001 `dataspace' (0)
Dirty: FALSE
Message flags: <none>
Chunk number: 0
Raw message data (offset, size) in chunk: (64, 40) bytes
Message Information:
Rank: 2
Dim Size: {6, 10}
Dim Max: {UNLIM, UNLIM}
Message 3...
Message ID (sequence number): 0x0008 `layout' (0)
Dirty: FALSE
Message flags: <C>
Chunk number: 0
Raw message data (offset, size) in chunk: (112, 24) bytes
Message Information:
Version: 3
Type: Chunked
Number of dimensions: 3
Size: {4, 4, 4}
Index Type: v1 B-tree
Index address: 1400
Message 4...
Message ID (sequence number): 0x0012 `mtime_new' (0)
Dirty: FALSE
Message flags: <none>
Chunk number: 0
Raw message data (offset, size) in chunk: (144, 8) bytes
Message Information:
Time: 2010-03-18 08:36:44 CDT
Message 5...
Message ID (sequence number): 0x0000 `null' (0)
Dirty: FALSE
Message flags: <none>
Chunk number: 0
Raw message data (offset, size) in chunk: (160, 112) bytes
Message Information:
<No info for this message>
Yes, I believe you are right on both points (subtract one, and the documentation is in error).
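To make the "subtract one" reading concrete, here is a minimal sketch of decoding a version 3 chunked layout message under that interpretation. It is not taken from the library: the function name, the result keys and the 8-byte size of offsets are assumptions, and the test bytes are hand-built to mirror the h5debug dump above (dimensionality 3, sizes {4, 4, 4}, index address 1400). Note that 1 + 1 + 1 + 8 + 3*4 = 23 bytes, which fits the 24-byte raw message size in the dump and leaves no room for a separate element size field after three dimension sizes.

```python
import struct


def parse_chunked_layout_v3(msg, offset_size=8):
    """Decode a version 3 chunked Data Layout message.

    Assumption: the last of the 'dimensionality' 4-byte size fields is the
    dataset element size, so the dataset rank is dimensionality - 1.
    offset_size is the superblock's "Size of Offsets" (assumed 8 here).
    """
    version, layout_class, dimensionality = struct.unpack_from("<BBB", msg, 0)
    if version != 3 or layout_class != 2:
        raise ValueError("not a version 3 chunked layout message")
    if dimensionality < 2:
        raise ValueError("dimensionality must be at least 2")

    pos = 3
    # Address of the v1 B-tree that indexes the chunks.
    btree_address = int.from_bytes(msg[pos:pos + offset_size], "little")
    pos += offset_size

    # 'dimensionality' little-endian 32-bit values: chunk dims + element size.
    sizes = struct.unpack_from("<%dI" % dimensionality, msg, pos)
    return {
        "rank": dimensionality - 1,
        "chunk_dims": sizes[:-1],
        "element_size": sizes[-1],
        "btree_address": btree_address,
    }


# Hand-built example bytes mirroring the dump above (padded to 24 bytes).
example = (bytes([3, 2, 3]) + (1400).to_bytes(8, "little")
           + struct.pack("<3I", 4, 4, 4) + b"\x00")
layout = parse_chunked_layout_v3(example)
```

For the example bytes this yields a rank of 2, chunk dimensions (4, 4) and an element size of 4, which matches the dataspace rank of 2 and the 4-byte integer datatype shown in the same object header.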