I could use some help parsing HDF5 files, in particular with Data Object Headers

Hi Marc,
What are the messages in the object header? If one of them is a “continuation” message, it’ll point at another block of space in the file where more messages are stored for that object. I would also recommend getting familiar with the ‘h5debug’ tool that’s part of the distribution, as it will help you parse through the internal file format details.

	Quincey

Quincey,

thanx a lot for replying.

that’s my point: there don’t seem to be ANY valid messages, the entire object header is this:

01000F00 01000000 18000000 00000000 10000800 00000000 34070000 D0020000 00000000

if I “pretend” that that data is messages and just read all 15, I get what would be nonsensical data:

reading <ObjectHeader @638, 15 messages, size 18/>
reading <ObjectHeaderMessage @644, Type Nil Flags 10 Size 0/>
reading <ObjectHeaderMessage @64C, Type Nil Flags 34 Size 0/>
reading <ObjectHeaderMessage @654, Type 720 Flags 00 Size 0/>
reading <ObjectHeaderMessage @65C, Type Nil Flags 54 Size 0/>
reading <ObjectHeaderMessage @664, Type Nil Flags FF Size 0/>
reading <ObjectHeaderMessage @670, Type Nil Flags 24 Size 0/>
reading <ObjectHeaderMessage @678, Type DataLayout Flags 00 Size 0/>
reading <ObjectHeaderMessage @680, Type ObjectHeaderContinuation Flags 48 Size 0/>
reading <ObjectHeaderMessage @688, Type Nil Flags 50 Size 0/>
reading <ObjectHeaderMessage @690, Type DataLayout Flags 9C Size 0/>
reading <ObjectHeaderMessage @698, Type Nil Flags 00 Size 0/>
reading <ObjectHeaderMessage @6A0, Type Nil Flags 01 Size 0/>
reading <ObjectHeaderMessage @6A8, Type 72 Flags 00 Size 0/>
reading <ObjectHeaderMessage @6B0, Type Nil Flags 00 Size 0/>
reading <ObjectHeaderMessage @6B8, Type Nil Flags 00 Size 0/>

(but note that by the fourth or so message I’m already into the space of the next “TREE” and “HEAP” objects, so this is garbage data. Even the ObjectHeaderContinuation message it shows is not really one; it’s long past the data that’s valid, and just happens to hit a 0x10.)

Hi Marc,
Could you post the file on Dropbox? It would be easier to see what’s going on with the full context.

	Quincey

The file is available here: https://www.dropbox.com/s/5tt4riai3qw8vyb/Curacao180420162506.PPI8166.h5?dl=0

The Group Symbol Table for the sample I showed above is at 0x01B4, with the first Group Symbol Table Entry at 0x01BC which points to the Object Header at 0x0638.

thanx!

Got it. What’s the path to the object, in the group hierarchy? (Or the address of the object header)

Quincey

the path/name is “how”, for this one.

my processing starts with the root entry

  • GroupSymbolTableEntry @0x002C (Header at 0x004C)

which is cached in the scratchpad (and I just ignore what’s at 4C, for now, but even that one already has the same “problems”) and points to the start tree and heap at

  • LocalHeap @0x0098
  • Tree @0x0074

which has a sub-tree

  • Tree @0x0B24

which contains

  • GroupSymbolTable @0x01B4, 2 symbols

the first of which is the one I mentioned above:

  • GroupSymbolTableEntry @0x01BC, ‘how’ Header at 0x0638 />
  • ObjectHeader @0x0638, 15 messages, size 18 />

The ObjectHeader at 0x0638 (but really all the other ones too) is what confuses me, as it doesn’t match up (15 messages, but only size 0x18, and none of those bytes make sense) :wink:

OK, I’ve found it. Here’s my process of getting there:

tools/src/misc/h5debug /Users/koziol/Downloads/Web/Curacao180420162506.PPI8166.h5

Reading signature at address 0 (rel)

File Super Block…

File name (as opened): /Users/koziol/Downloads/Web/Curacao180420162506.PPI8166.h5

File name (after resolving symlinks): /Users/koziol/Downloads/Web/Curacao180420162506.PPI8166.h5

File access flags 0x00000000

File open reference count: 1

Address of super block: 0 (abs)

Size of userblock: 0 bytes

Superblock version number: 1

Free list version number: 0

Root group symbol table entry version number: 0

Shared header version number: 0

Size of file offsets (haddr_t type): 4 bytes

Size of file lengths (hsize_t type): 4 bytes

Symbol table leaf node 1/2 rank: 1

Symbol table internal node 1/2 rank: 1

Indexed storage internal node 1/2 rank: 1

File status flags: 0x01

Superblock extension address: UNDEF (rel)

Shared object header message table address: UNDEF (rel)

Shared object header message version number: 0

Number of shared object header message indexes: 0

Address of driver information block: UNDEF (rel)

Root group symbol table entry:

Name offset into private heap: 0

Object header address: 76

Cache info type: Symbol Table

Cached entry information:

  B-tree address:                              116

  Heap address:                                152

tools/src/misc/h5debug /Users/koziol/Downloads/Web/Curacao180420162506.PPI8166.h5 76

Reading signature at address 76 (rel)

Object Header…

Dirty: FALSE

Version: 1

Header size (in bytes): 16

Number of links: 1

Number of messages (allocated): 2 (2)

Number of chunks (allocated): 1 (2)

Chunk 0…

Address: 76

Size in bytes: 24

Gap: 0

Message 0…

Message ID (sequence number): 0x0011 `stab’ (0)

Dirty: FALSE

Message flags:

Chunk number: 0

Raw message data (offset, size) in chunk: (24, 8) bytes

Message Information:

  B-tree address:                              116

  Name heap address:                           152

Message 1…

Message ID (sequence number): 0x0000 `null’ (0)

Dirty: FALSE

Message flags:

Chunk number: 0

Raw message data (offset, size) in chunk: (40, 0) bytes

Message Information:

  <No info for this message>

tools/src/misc/h5debug /Users/koziol/Downloads/Web/Curacao180420162506.PPI8166.h5 116 152

Reading signature at address 116 (rel)

Tree type ID: H5B_SNODE_ID

Size of node: 36

Size of raw (disk) key: 4

Dirty flag: False

Level: 1

Address of left sibling: UNDEF

Address of right sibling: UNDEF

Number of children (max): 2 (2)

Child 0…

Address: 2852

Left Key:

  Heap offset:                                 0

  Name:

Right Key:

  Heap offset:                                 8

  Name:                                        what

Child 1…

Address: 2816

Left Key:

  Heap offset:                                 8

  Name:                                        what

Right Key:

  Heap offset:                                 16

  Name:                                        where

tools/src/misc/h5debug /Users/koziol/Downloads/Web/Curacao180420162506.PPI8166.h5 2852 152

Reading signature at address 2852 (rel)

Tree type ID: H5B_SNODE_ID

Size of node: 36

Size of raw (disk) key: 4

Dirty flag: False

Level: 0

Address of left sibling: UNDEF

Address of right sibling: 2816

Number of children (max): 2 (2)

Child 0…

Address: 436

Left Key:

  Heap offset:                                 0

  Name:

Right Key:

  Heap offset:                                 32

  Name:                                        image1

Child 1…

Address: 2744

Left Key:

  Heap offset:                                 32

  Name:                                        image1

Right Key:

  Heap offset:                                 8

  Name:                                        what

tools/src/misc/h5debug /Users/koziol/Downloads/Web/Curacao180420162506.PPI8166.h5 436 152

Reading signature at address 436 (rel)

Symbol Table Node…

Dirty: No

Size of Node (in bytes): 72

Number of Symbols: 2 of 2

Symbol 0:

Name: `how’

Name offset into private heap: 24

Object header address: 1592

Cache info type: Nothing Cached

Symbol 1:

Name: `image1’

Name offset into private heap: 32

Object header address: 2564

Cache info type: Nothing Cached

tools/src/misc/h5debug /Users/koziol/Downloads/Web/Curacao180420162506.PPI8166.h5 1592

Reading signature at address 1592 (rel)

Object Header…

Dirty: FALSE

Version: 1

Header size (in bytes): 16

Number of links: 1

Number of messages (allocated): 15 (16)

Number of chunks (allocated): 2 (2)

Chunk 0…

Address: 1592

Size in bytes: 24

Gap: 0

Chunk 1…

Address: 1844

Size in bytes: 720

Gap: 0

Message 0…

Message ID (sequence number): 0x0010 `hdr continuation’ (0)

Dirty: FALSE

Message flags:

Chunk number: 0

Raw message data (offset, size) in chunk: (24, 8) bytes

Message Information:

  Continuation address:                        1844

  Continuation size in bytes:                  720

  Points to chunk number:                      1

Message 1…

Message ID (sequence number): 0x0000 `null’ (0)

Dirty: FALSE

Message flags:

Chunk number: 0

Raw message data (offset, size) in chunk: (40, 0) bytes

Message Information:

  <No info for this message>

Message 2…

Message ID (sequence number): 0x0011 `stab’ (0)

Dirty: FALSE

Message flags:

Chunk number: 1

Raw message data (offset, size) in chunk: (8, 8) bytes

Message Information:

  B-tree address:                              1632

  Name heap address:                           1668

Message 3…

Message ID (sequence number): 0x000c `attribute’ (0)

Dirty: FALSE

Message flags:

Chunk number: 1

Raw message data (offset, size) in chunk: (24, 48) bytes

Message Information:

  Name:                                        "WMO"

  Character Set of Name:                       ASCII

  Object opened:                               FALSE

  Object:                                      0

  Creation Index:                              0

  Datatype...

     Encoded Size:                             12

     Type class:                               integer

     Size:                                     4 bytes

     Version:                                  1

     Byte order:                               little endian

     Precision:                                32 bits

     Offset:                                   0 bits

     Low pad type:                             zero

     High pad type:                            zero

     Sign scheme:                              2's comp

  Dataspace...

     Encoded Size:                             8

     Space class:                              H5S_SCALAR

Message 4…

Message ID (sequence number): 0x000c `attribute’ (1)

Dirty: FALSE

Message flags:

Chunk number: 1

Raw message data (offset, size) in chunk: (80, 40) bytes

Message Information:

  Name:                                        "place"

  Character Set of Name:                       ASCII

  Object opened:                               FALSE

  Object:                                      0

  Creation Index:                              0

  Datatype...

     Encoded Size:                             8

     Type class:                               text string

     Size:                                     8 bytes

     Version:                                  1

     Character Set:                            ASCII

     String Padding:                           NULL Terminated

  Dataspace...

     Encoded Size:                             8

     Space class:                              H5S_SCALAR

Message 7…

Message ID (sequence number): 0x000c `attribute’ (4)

Dirty: FALSE

Message flags:

Chunk number: 1

Raw message data (offset, size) in chunk: (248, 56) bytes

Message Information:

  Name:                                        "endepochs"

  Character Set of Name:                       ASCII

  Object opened:                               FALSE

  Object:                                      0

  Creation Index:                              0

  Datatype...

     Encoded Size:                             12

     Type class:                               integer

     Size:                                     4 bytes

     Version:                                  1

     Byte order:                               little endian

     Precision:                                32 bits

     Offset:                                   0 bits

     Low pad type:                             zero

     High pad type:                            zero

     Sign scheme:                              2's comp

  Dataspace...

     Encoded Size:                             8

     Space class:                              H5S_SCALAR

Message 8…

Message ID (sequence number): 0x000c `attribute’ (5)

Dirty: FALSE

Message flags:

Chunk number: 1

Raw message data (offset, size) in chunk: (312, 40) bytes

Message Information:

  Name:                                        "system"

  Character Set of Name:                       ASCII

  Object opened:                               FALSE

  Object:                                      0

  Creation Index:                              0

  Datatype...

     Encoded Size:                             8

     Type class:                               text string

     Size:                                     6 bytes

     Version:                                  1

     Character Set:                            ASCII

     String Padding:                           NULL Terminated

  Dataspace...

     Encoded Size:                             8

     Space class:                              H5S_SCALAR

Message 9…

Message ID (sequence number): 0x000c `attribute’ (6)

Dirty: FALSE

Message flags:

Chunk number: 1

Raw message data (offset, size) in chunk: (360, 48) bytes

Message Information:

  Name:                                        "software"

  Character Set of Name:                       ASCII

  Object opened:                               FALSE

  Object:                                      0

  Creation Index:                              0

  Datatype...

     Encoded Size:                             8

     Type class:                               text string

     Size:                                     5 bytes

     Version:                                  1

     Character Set:                            ASCII

     String Padding:                           NULL Terminated

  Dataspace...

     Encoded Size:                             8

     Space class:                              H5S_SCALAR

Message 10…

Message ID (sequence number): 0x000c `attribute’ (7)

Dirty: FALSE

Message flags:

Chunk number: 1

Raw message data (offset, size) in chunk: (416, 64) bytes

Message Information:

  Name:                                        "wavelength"

  Character Set of Name:                       ASCII

  Object opened:                               FALSE

  Object:                                      0

  Creation Index:                              0

  Datatype...

     Encoded Size:                             20

     Type class:                               floating-point

     Size:                                     4 bytes

     Version:                                  1

     Byte order:                               little endian

     Precision:                                32 bits

     Offset:                                   0 bits

     Low pad type:                             zero

     High pad type:                            zero

     Internal pad type:                        zero

     Normalization:                            implied

     Sign bit location:                        31

     Exponent location:                        23

     Exponent bias:                            0x0000007f

     Exponent size:                            8

     Mantissa location:                        0

     Mantissa size:                            23

  Dataspace...

     Encoded Size:                             8

     Space class:                              H5S_SCALAR

Message 11…

Message ID (sequence number): 0x000c `attribute’ (8)

Dirty: FALSE

Message flags:

Chunk number: 1

Raw message data (offset, size) in chunk: (488, 64) bytes

Message Information:

  Name:                                        "pulsewidth"

  Character Set of Name:                       ASCII

  Object opened:                               FALSE

  Object:                                      0

  Creation Index:                              0

  Datatype...

     Encoded Size:                             20

     Type class:                               floating-point

     Size:                                     4 bytes

     Version:                                  1

     Byte order:                               little endian

     Precision:                                32 bits

     Offset:                                   0 bits

     Low pad type:                             zero

     High pad type:                            zero

     Internal pad type:                        zero

     Normalization:                            implied

     Sign bit location:                        31

     Exponent location:                        23

     Exponent bias:                            0x0000007f

     Exponent size:                            8

     Mantissa location:                        0

     Mantissa size:                            23

  Dataspace...

     Encoded Size:                             8

     Space class:                              H5S_SCALAR

Message 12…

Message ID (sequence number): 0x000c `attribute’ (9)

Dirty: FALSE

Message flags:

Chunk number: 1

Raw message data (offset, size) in chunk: (560, 48) bytes

Message Information:

  Name:                                        "lowprf"

  Character Set of Name:                       ASCII

  Object opened:                               FALSE

  Object:                                      0

  Creation Index:                              0

  Datatype...

     Encoded Size:                             12

     Type class:                               integer

     Size:                                     4 bytes

     Version:                                  1

     Byte order:                               little endian

     Precision:                                32 bits

     Offset:                                   0 bits

     Low pad type:                             zero

     High pad type:                            zero

     Sign scheme:                              2's comp

  Dataspace...

     Encoded Size:                             8

     Space class:                              H5S_SCALAR

Message 13…

Message ID (sequence number): 0x000c `attribute’ (10)

Dirty: FALSE

Message flags:

Chunk number: 1

Raw message data (offset, size) in chunk: (616, 48) bytes

Message Information:

  Name:                                        "highprf"

  Character Set of Name:                       ASCII

  Object opened:                               FALSE

  Object:                                      0

  Creation Index:                              0

  Datatype...

     Encoded Size:                             12

     Type class:                               integer

     Size:                                     4 bytes

     Version:                                  1

     Byte order:                               little endian

     Precision:                                32 bits

     Offset:                                   0 bits

     Low pad type:                             zero

     High pad type:                            zero

     Sign scheme:                              2's comp

  Dataspace...

     Encoded Size:                             8

     Space class:                              H5S_SCALAR

Message 14…

Message ID (sequence number): 0x000c `attribute’ (11)

Dirty: FALSE

Message flags:

Chunk number: 1

Raw message data (offset, size) in chunk: (672, 48) bytes

Message Information:

  Name:                                        "nodes"

  Character Set of Name:                       ASCII

  Object opened:                               FALSE

  Object:                                      0

  Creation Index:                              0

  Datatype...

     Encoded Size:                             8

     Type class:                               text string

     Size:                                     4 bytes

     Version:                                  1

     Character Set:                            ASCII

     String Padding:                           NULL Terminated

  Dataspace...

     Encoded Size:                             12

     Space class:                              H5S_SIMPLE

        Rank:                                  1

        Dim Size:                              {1}

        Dim Max:                               CONSTANT

So, it looks like the only message in the first “chunk” of the object’s header is the continuation message (message #0, above).  I haven’t looked at the bytes in particular, but the file is correct and should be parseable, using the file format spec.  Could you be more specific about which part of the file format spec is confusing?  (And I can improve that piece)
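For readers following along, here is a minimal Python sketch of how the data of that continuation message decodes. It uses the two words `34070000 D0020000` from the hex dump posted earlier in the thread and assumes 4-byte offsets and 4-byte lengths, as reported in the superblock dump above. The function name `decode_continuation` is made up for illustration and is not part of any HDF5 API.

```python
# Data bytes of the continuation message in the 'how' object header,
# copied from the hex dump earlier in this thread.
CONTINUATION_DATA = bytes.fromhex("34070000" "D0020000")


def decode_continuation(data, offset_size=4, length_size=4):
    """Return (address, length) of the next object header chunk.

    offset_size and length_size are taken from the superblock
    ("Size of file offsets" / "Size of file lengths"); both are 4 here.
    """
    address = int.from_bytes(data[:offset_size], "little")
    length = int.from_bytes(
        data[offset_size:offset_size + length_size], "little"
    )
    return address, length


address, length = decode_continuation(CONTINUATION_DATA)
print(address, length)  # 1844 720
```

The result matches the "Continuation address: 1844" and "Continuation size in bytes: 720" lines in the h5debug output, so a reader would continue parsing messages at file offset 1844 (0x734).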

Quincey

To maybe explain better what my issue is, let’s look at the very first (root) entry at 0x004C. Here is the data (which incidentally overlaps with the scratchpad for the GroupSymbolTableEntry from @002C, but I assume that’s on purpose/an optimization):

01000200
01000000
18000000

00000000
11000800
00000000
74000000
98000000
00000000

I’m going by this part of the spec. so what I expect to see is:

  • 4 bytes “Version, Total Number of Header Messages”. those make sense “01 00 0200” - Version 1, 2 messages
  • 4 bytes “Object Reference Count”. 01000000, again makes sense, count 1
  • 4 bytes “Object Header Size”. 18000000. all the headers seem to have exactly a size of 0x18, i.e. 6 more 4-byte words.

here’s where stuff falls apart for me, because next I expect:

  • 4 bytes “Header Message Type #1, Size of Header Message Data #1”. I have 00000000, so that’d be a NIL message without size. seems odd, but fair enough, could be valid.
  • 4 bytes “Header Message #1 Flags, Reserved (zero)”. 11000800. so that’d be 11 for flags (which makes little sense) and the 000800 is where the spec says it should be reserved/zero. so that’s a red flag.
  • 0 bytes “Header Message Data #1” because size was 0. so that’s fine.
  • 4 bytes “Header Message Type #2, Size of Header Message Data #2”. 00000000. again a NIL message, ok.
  • 4 bytes “Header Message #2 Flags, Reserved (zero)”. 74000000. ‘74’ makes no sense as a flag (and also happens to be the location of the first tree). double fishy.

it seems to me that the remaining 0x18 bytes of an object header don’t actually contain header message infos as spec’ed.


Looking at your output, I do recognize the values, but I don’t see how they match the spec. I’m adding // comments inline:

Reading signature at address 76 (rel). // that's 0x4C
Object Header…
Dirty: FALSE
Version: 1
Header size (in bytes): 16
Number of links: 1
Number of messages (allocated): 2 (2). // ok, that matches what I read, 2 messages
Number of chunks (allocated): 1 (2). // where does this info come from?

Chunk 0…
Address: 76.   // again/still that's 0x4C
Size in bytes: 24.  // that's 0x18, so it matches the size I see. so far so good...

Gap: 0
Message 0…
Message ID (sequence number): 0x0011 `stab’ (0) // I assume this comes from `11000800`? but the spec mentioned no "Message ID", it has Message "flags" here!?
Dirty: FALSE
Message flags:

Chunk number: 0
Raw message data (offset, size) in chunk: (24, 8) bytes
Message Information:
  B-tree address:                              116.   // thats 0x74 — again I can see where the value comes from
  Name heap address:                           152. // that's 0x98, ditto.

Message 1…

IOW, it seems that these 24 bytes:

00000000
11000800
00000000
74000000
98000000
00000000

get interpreted VASTLY differently from what I’d expect from what the spec says in “IV.A.1.a. Version 1 Data Object Header Prefix”?

I think I got it. I missed that the entries are 8-byte aligned, so I must skip 4-byte words where necessary!
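To make the fix concrete, here is a small Python sketch that parses the root object header bytes quoted above with the padding taken into account. It is only an illustration under stated assumptions: the function name `parse_v1_header` is invented, the field sizes follow the version 1 prefix as discussed in this thread, and the dump stops after 36 bytes, so only the first message can be read in full.

```python
# Root object header at 0x004C, copied from the hex words quoted above.
ROOT_HEADER = bytes.fromhex(
    "01000200" "01000000" "18000000"  # version/reserved/count, ref count, size
    "00000000"                        # padding up to the next 8-byte boundary
    "11000800" "00000000"             # type 0x0011, size 8; flags + reserved
    "74000000" "98000000"             # message data: B-tree and heap addresses
    "00000000"                        # start of message 2 (dump ends here)
)


def parse_v1_header(buf):
    """Parse a version 1 object header prefix and its first chunk."""
    version = buf[0]
    n_messages = int.from_bytes(buf[2:4], "little")
    ref_count = int.from_bytes(buf[4:8], "little")
    chunk_size = int.from_bytes(buf[8:12], "little")

    pos = 16                      # 12-byte prefix rounded up to a multiple of 8
    end = min(pos + chunk_size, len(buf))
    messages = []
    # Each message has an 8-byte header; stop if the buffer is cut short.
    while pos + 8 <= end and len(messages) < n_messages:
        msg_type = int.from_bytes(buf[pos:pos + 2], "little")
        size = int.from_bytes(buf[pos + 2:pos + 4], "little")
        flags = buf[pos + 4]
        data = buf[pos + 8:pos + 8 + size]
        messages.append((msg_type, flags, data))
        pos += 8 + size           # v1 sizes are already padded to 8 bytes
    return version, n_messages, ref_count, chunk_size, messages


version, count, refs, size, msgs = parse_v1_header(ROOT_HEADER)
msg_type, flags, data = msgs[0]
print(hex(msg_type), flags)                       # 0x11 0
print(int.from_bytes(data[:4], "little"))         # 116
print(int.from_bytes(data[4:8], "little"))        # 152
```

With the 4 padding bytes skipped, `11000800` reads as message type 0x0011 (symbol table) with size 8, and the data holds the B-tree address 116 and heap address 152 that h5debug reports.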

Got it all working. thanks so much for your help!

Super! Glad to help you out. :slight_smile: Would you suggest any text that could be improved in the format spec?

Quincey

I’d say just make it a bit more clear that any of the message blocks might be preceded by padding bytes.

Also, I noticed that the 8-byte alignment is (oddly) not relative to the file, but relative to the start of the header itself. Both the header and any continuation blocks, it appears, can be “misaligned” (e.g. at x4 or xC addresses), and if they are, the individual messages apparently follow that alignment (vs. x0 or x8, if they were globally aligned). Seems weird and was a bit unexpected.
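A short Python sketch of this observation, using the two header addresses from this thread. The helper name `first_message_address` is hypothetical, and the 12-byte prefix size is the version 1 prefix as read above.

```python
def first_message_address(header_address, prefix_size=12, alignment=8):
    """Absolute address of the first message in a version 1 object header.

    The padding is computed from the start of the header, not from the
    start of the file.
    """
    padded_prefix = -(-prefix_size // alignment) * alignment  # 12 -> 16
    return header_address + padded_prefix


for header in (0x004C, 0x0638):
    first = first_message_address(header)
    print(f"header @{header:#06x}: first message @{first:#06x}, "
          f"address mod 8 = {first % 8}")
```

For the root header at 0x004C the first message lands at 0x005C, which is 4 modulo 8 in file terms but exactly 16 bytes from the header start. That agrees with the h5debug line "Raw message data (offset, size) in chunk: (24, 8)", since the 8-byte message header sits directly before its data.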

I would also be surprised. I think the “Header messages are aligned on
8-byte boundaries for version 1 object headers.” in the spec could be made
more clear and mention this.

I also think this information could be repeated in the description of the
Header Message Type #1 field in the field description table below,
something like “[…]. This field is preceded by 0-7 padding bytes in order
to align the header message to 8-byte boundaries relative to the start of
the header.” (wording could probably be better). That would help those who
happen to miss the introductory sentence and just go by the field description table.

/Innocent bystander Elvis

I’m stumped on something else in the file, again. I’m reading the header messages of /image1/data with start at 0x0b58 and they all look sensible in general:

  <DataSpaceObjectHeaderMessage @B58: <2 Dimension(s), [720x720]>>
  <DataTypeObjectHeaderMessage @B70: <FixedPoint Size 1 0/8>>
  <ObjectHeaderMessage @B88, Type DataStorageFillValue Flags 01 Size 8/>
  <ObjectHeaderMessage @B98, Type DataStorageFilterPipeline Flags 01 Size 32/>
  <ObjectHeaderMessage @BC0, Type DataLayout Flags 01 Size 24/>
  <ObjectHeaderMessage @BE0, Type ObjectModificationTime Flags 00 Size 8/>
  <ObjectHeaderMessage @BF0, Type Nil Flags 00 Size 96/>

except I cannot make sense of the DataLayout message at 0x0bc0:

08001800 // DataLayout, size 24
01000000 // Flags 1: "Constant"
030203A0 // Version 3, Layout Class 2 (Chunked), but the "03A0" looks out of place, should be reserved/0? (although the docs only say "This space inserted only to align table nicely" and don't mention "0", for once, which seems inconsistent with other padding fields)
0C0000D0 // Dimensionality "12" seems wrong, and the "D0" is out of place (should be reserved/0)
020000D0 // supposed address of the B-Tree — but D000000C is not a valid address

I’m probably missing something silly again?

A second oddity: the object header at 0x6910 claims to have 14 messages, yet the file cleanly ends after message 12; the remaining two messages simply don’t exist (and yet the official HDF5 tools don’t complain). The 12 messages look valid, so this isn’t total garbage data.

reading <ObjectHeader @6910, 14 messages, size 18>
reading <ObjectHeaderContinuationObjectHeaderMessage @6920 => 69C4>
reading <SymbolTableObjectHeaderMessage @69C4, 0 child object(s)>
reading <AttributeObjectHeaderMessage @69D4, 'product' <String Size 4> <0 Dimension(s), []>, <Data 50504900 ("PPI")>>
reading <AttributeObjectHeaderMessage @6A04, 'prodpar' <FloatingPoint Size 4> <0 Dimension(s), []>, <Data 0000003F ("")>>
reading <AttributeObjectHeaderMessage @6A44, 'quantity' <String Size 4> <0 Dimension(s), []>, <Data 44425A00 ("DBZ")>>
reading <AttributeObjectHeaderMessage @6A7C, 'startdate' <String Size 9> <0 Dimension(s), []>, <Data 323031383034323000 ("20180420")>>
reading <AttributeObjectHeaderMessage @6ABC, 'starttime' <String Size 7> <0 Dimension(s), []>, <Data 31363235303600 ("162506")>>
reading <AttributeObjectHeaderMessage @6AF4, 'enddate' <String Size 9> <0 Dimension(s), []>, <Data 323031383034323000 ("20180420")>>
reading <AttributeObjectHeaderMessage @6B2C, 'endtime' <String Size 7> <0 Dimension(s), []>, <Data 31363235303600 ("162506")>>
reading <AttributeObjectHeaderMessage @6B5C, 'gain' <FloatingPoint Size 4> <0 Dimension(s), []>, <Data 0000003F ("")>>
reading <AttributeObjectHeaderMessage @6B9C, 'offset' <FloatingPoint Size 4> <0 Dimension(s), []>, <Data 000000C2 ("")>>
reading <AttributeObjectHeaderMessage @6BDC, 'nodata' <FloatingPoint Size 4> <0 Dimension(s), []>, <Data 00007F43 ("")>>
reading <AttributeObjectHeaderMessage @6C1C, 'undetect' <FloatingPoint Size 4> <0 Dimension(s), []>, <Data 00000000 ("0")>>
WARNING: ObjectHeader <ObjectHeader @6910, 14 messages, size 18> trying to read beyond EOF for message #13.
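As a side note on the log above, the floating-point attributes print an empty string for their data. A minimal Python sketch for decoding them follows. It assumes the values are little-endian 32-bit IEEE floats: the log only says "FloatingPoint Size 4", and the byte order is an assumption based on the h5debug datatype dumps of other float attributes in this file. The helper name `decode_f32_le` is made up.

```python
import struct

# Raw attribute data copied from the log above.
RAW_VALUES = {
    "prodpar": "0000003F",
    "gain": "0000003F",
    "offset": "000000C2",
    "nodata": "00007F43",
    "undetect": "00000000",
}


def decode_f32_le(hex_bytes):
    """Decode 4 bytes (given as hex) as a little-endian 32-bit float."""
    return struct.unpack("<f", bytes.fromhex(hex_bytes))[0]


for name, raw in RAW_VALUES.items():
    print(name, decode_f32_le(raw))
```

Under that assumption, gain decodes to 0.5, offset to -32.0, nodata to 255.0 and undetect to 0.0.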

If you look under “This Document” in the spec, you’ll see:

“Various format tables in this document have cells with ‘This space
inserted only to align table nicely’. These entries in the table are just
to make the table presentation nicer and do not represent any values or
padding in the file.”

So those “fields” are simply a visual formatting thing in the spec (to
avoid getting a ragged right side of the tables), not some padding or
reserved space in the actual file.

This means that the 03A0 that you see is the beginning of the Properties of
the Data Layout Message. Since it’s chunked (class 2), this means the 03 is
the dimensionality (3 dimensions) and A0 is the first byte of the address
of the b-tree.

Hope that makes sense.
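Here is a hedged Python sketch of that reading, applied to the twelve body bytes quoted in the question. The function name `parse_chunked_layout_v3` is invented, 4-byte offsets are assumed from the superblock dump earlier in the thread, and because the thread shows only 12 of the 24 body bytes, the sketch reads only as many dimension sizes as are fully present.

```python
# Body of the DataLayout message at 0x0BC0 (after its 8-byte message header).
# Only these 12 bytes are quoted in the thread; the rest is not shown.
LAYOUT_BODY = bytes.fromhex("030203A0" "0C0000D0" "020000D0")


def parse_chunked_layout_v3(body, offset_size=4):
    """Decode the leading fields of a version 3, class 2 (chunked) layout."""
    version, layout_class, dimensionality = body[0], body[1], body[2]
    if version != 3 or layout_class != 2:
        raise ValueError("not a version 3 chunked layout message")

    pos = 3  # no padding here: the address follows the dimensionality byte
    btree_address = int.from_bytes(body[pos:pos + offset_size], "little")
    pos += offset_size

    # Dimension sizes are 4 bytes each; stop early if the buffer is cut short.
    dims = []
    while len(dims) < dimensionality and pos + 4 <= len(body):
        dims.append(int.from_bytes(body[pos:pos + 4], "little"))
        pos += 4
    return dimensionality, btree_address, dims


dimensionality, address, dims = parse_chunked_layout_v3(LAYOUT_BODY)
print(dimensionality, hex(address), dims)  # 3 0xca0 [720]
```

Read this way, the B-tree address is 0x0CA0 and the first dimension size is 720, which fits the 720x720 dataspace. As far as I understand the spec, the dimensionality of 3 counts the two dataset dimensions plus a trailing element-size entry, but the bytes for those last fields are not in the thread, so treat that part as an assumption.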

Elvis

dwarfland https://forum.hdfgroup.org/u/dwarfland
May 12

I’m stumped on something else in the file, again. I’m reading the header
messages of /image1/data with start at 0x0b58 and they all look sensible
in general:

<DataSpaceObjectHeaderMessage @B58: <2 Dimension(s), [720x720]>>
<DataTypeObjectHeaderMessage @B70: <FixedPoint Size 1 0/8>>
<ObjectHeaderMessage @B88, Type DataStorageFillValue Flags 01 Size 8/>
<ObjectHeaderMessage @B98, Type DataStorageFilterPipeline Flags 01 Size 32/>
<ObjectHeaderMessage @BC0, Type DataLayout Flags 01 Size 24/>
<ObjectHeaderMessage @BE0, Type ObjectModificationTime Flags 00 Size 8/>
<ObjectHeaderMessage @BF0, Type Nil Flags 00 Size 96/>

except I cannot make sense of the DataLayout message at 0x0bc0:

08001800 // DataLayout, size 24
01000000 // Flags 1: “Constant”
030203A0 // Version 3, Layout Class 2 (Chunked), but the “03A0” looks out of place; should it be reserved/0? (The docs only say “This space inserted only to align table nicely” and don’t mention “0”, for once, which seems inconsistent with other padding fields.)
0C0000D0 // Dimensionality “12” seems wrong, and the “D0” is out of place (should be reserved/0)
020000D0 // supposed address of the B-Tree — but D000000C is not a valid address

I’m probably missing something silly again?

If you look under “This Document” in the spec, you’ll see:

"Various format tables in this document have cells with “This space
inserted only to align table nicely”. These entries in the table are just
to make the table presentation nicer and do not represent any values or
padding in the file."

So those “fields” are simply a visual formatting thing in the spec (to
avoid getting a ragged right side of the tables), not some padding or
reserved space in the actual file.

This means that the 03A0 that you see is the beginning of the Properties
of the Data Layout Message. Since it’s chunked (class 2), this means the 03
is the dimensionality (3 dimensions) and A0 is the first byte of the
address of the b-tree.
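
To make the byte layout concrete, here is a rough Python sketch of how the
body of that message could be decoded. It is only an illustration, not code
from the HDF5 library: the function name is made up, and the 4-byte
"Size of Offsets" is an assumption (the real value comes from the file's
superblock). With 4-byte offsets the quoted bytes happen to decode to a
first size of 720, which matches the 720x720 dataspace. As far as I read
the spec, the last of the 4-byte sizes is the dataset element size, which
would explain a dimensionality of 3 for a 2-D dataset. Only 12 of the 24
bytes were quoted, so the sketch decodes just the complete values it finds.

```python
def parse_chunked_layout_v3(body: bytes, offset_size: int = 4) -> dict:
    """Decode the body of a version 3 Data Layout message (chunked class).

    `body` is the message data that follows the 8-byte message header.
    `offset_size` is the file's "Size of Offsets"; 4 is an assumption here.
    """
    if len(body) < 3 + offset_size:
        raise ValueError("message body too short")
    version, layout_class, dimensionality = body[0], body[1], body[2]
    if version != 3 or layout_class != 2:
        raise ValueError("not a version 3 chunked Data Layout message")

    # The B-tree address follows the dimensionality byte directly:
    # there is no alignment padding in the file.
    pos = 3
    btree_address = int.from_bytes(body[pos:pos + offset_size], "little")
    pos += offset_size

    # Then come `dimensionality` 4-byte little-endian sizes. Only complete
    # values are decoded, so a truncated dump yields a shorter list.
    sizes = []
    for _ in range(dimensionality):
        if pos + 4 > len(body):
            break
        sizes.append(int.from_bytes(body[pos:pos + 4], "little"))
        pos += 4

    return {
        "version": version,
        "layout_class": layout_class,
        "dimensionality": dimensionality,
        "btree_address": btree_address,
        "sizes": sizes,
    }


# The 12 bytes quoted in the post (the rest of the 24-byte message was not shown).
quoted = bytes.fromhex("030203A00C0000D0020000D0")
info = parse_chunked_layout_v3(quoted, offset_size=4)
print(hex(info["btree_address"]), info["sizes"])  # 0xca0 [720]
```

Under that assumption the b-tree would sit at 0x0CA0. If the file uses
8-byte offsets instead, pass offset_size=8 and the sizes shift accordingly.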

Hope that makes sense.

BTW when you’re done with your library it would be great if you posted a
link.

Side note: From time to time I’ve thought about writing a minimal C library
to read a subset of HDF5 myself. The reason being that we are doing
multi-threaded reading of multiple HDF5 files in a GUI app (to avoid
blocking the UI), but the fact that the HDF5 library handles thread safety
by simply taking a global lock (so is not thread efficient) is becoming an
annoyance, since large reads can block small ones for quite some time. Our
requirements are not big, we simply need to be able to read chunked
compressed datasets, and we’re only using the “old” pre-1.10 format. So I
don’t think a minimal thread-safe/thread-efficient C library to do just
that would be that big an effort. We have no need for writing files, and we
have no need for HPC features like MPI etc.

If anyone else reading this has already written such a library (I’m
thinking maybe for embedded applications?), please shout out!

Elvis

Definitely will do, once it’s (a) done and (b) the context of the project I’m writing this for is far enough along that I’ll be able to.

FWIW, I’m writing this in Elements. While I’m focused on testing on .NET right now, the final code should compile for .NET, Java and Cocoa, as well as native Windows, Linux and WebAssembly :wink:

Looks like my reply via email to this part didn’t arrive. Thanx, this helped a lot. I completely missed that bit of info. I’m past this now — next on my list, later this week, handling Raw Data Chunk tree nodes :wink:

Andrey_Paramonov https://forum.hdfgroup.org/u/andrey_paramonov
May 14

Hello Elvis!

Indeed, there are several alternative implementations of the HDF5 data
interface:

  1. Forum thread by Markus Krug:
    http://hdf-forum.184993.n3.nabble.com/HDF-lib-incompatible-with-HDF-file-spec-td4029881.html
  2. libmysofa (for embedded devices):
    https://github.com/hoene/libmysofa
  3. pyfive (pure Python HDF5 reader):
    https://github.com/jjhelmus/pyfive

I collect these links because I believe alternative implementations of a
data format are crucial, for the purpose of long-term “sustainability”;
but also for optimized performance (e.g. improved concurrency) and
exploring brave possibilities beyond initial intent (think pyfive +
http://brython.info/).

Do you, reader of this message, happen to develop another one? Please share!

Best wishes,
Andrey Paramonov

Thanks Andrey, I did not know about libmysofa. Will have a look.

Elvis

I collect these links because I believe alternative implementations of a