I’m pulling this out to a new thread from here…
I’m still stuck a bit with my HDF5 reader implementation. I’m trying to read the actual data block in this file (dropbox link), which according to the messages is (supposedly) GZIP-deflated.
My DataLayoutObjectHeaderMessage is at 0x0BC0
and points to a tree at 0x0CA0
<DataLayoutObjectHeaderMessage @BC0: 2 Dimension(s), [], Size 1>
<Tree @CA0, 0 children, siblings (-, -)>
54524545 //sig
01000100 // flags & co
FFFFFFFF //sibling
FFFFFFFF // sibling
F85B0000 // size of chunk
00000000 // filter mask (i.e. none are skipped)
00000000 // dim1 (64 bits)
00000000 // dim1 (64 bits)
00000000 // dim2 (64 bits)
00000000 // dim2 (64 bits)
00000000 // dim extra (64 bits)
00000000 // dim extra (64 bits)
180D0000 // address => 0d18
which looks correct to me. The full list of messages on the object is:
<DataSpaceObjectHeaderMessage @B58: <2 Dimension(s), [720x720]>>
<DataTypeObjectHeaderMessage @B70: <FixedPoint Size 1 0/8>>
<DataStorageFillValueObjectHeaderMessage @B88: >
<DataStorageFilterPipelineObjectHeaderMessage @B98: Deflate> // GZIP!
<DataLayoutObjectHeaderMessage @BC0: 2 Dimension(s), [], Size 1>
<ObjectHeaderContinuationObjectHeaderMessage @BE0, >
<NilObjectHeaderMessage @BF0>
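For reference, the tree node dump above can be decoded mechanically. This is a minimal Python sketch, not a general HDF5 parser: it assumes little-endian fields, 4-byte file offsets (as the two 4-byte sibling words suggest), a single key with three 64-bit chunk offsets, and that the second sibling word reads FFFFFFFF. The names `NODE_HEX`, `NODE_FORMAT` and `parse_chunk_tree_node` are made up for illustration.

```python
import struct

# Bytes copied from the tree node dump above
# (assumption: the second sibling word is FFFFFFFF).
NODE_HEX = (
    "54524545" "01000100" "FFFFFFFF" "FFFFFFFF"
    "F85B0000" "00000000"
    "00000000" "00000000" "00000000" "00000000" "00000000" "00000000"
    "180D0000"
)

# Assumed layout: signature, node type, level, entries used, two 4-byte
# sibling addresses, chunk size, filter mask, three 64-bit chunk offsets,
# and one 4-byte child (chunk) address. "<" means little-endian, no padding.
NODE_FORMAT = "<4sBBHIIII3QI"


def parse_chunk_tree_node(raw):
    """Decode one chunk B-tree node with exactly one entry (hypothetical helper)."""
    (signature, node_type, level, entries_used,
     left_sibling, right_sibling,
     chunk_size, filter_mask,
     offset_dim1, offset_dim2, offset_extra,
     child_address) = struct.unpack(NODE_FORMAT, raw)
    return {
        "signature": signature,
        "node_type": node_type,
        "level": level,
        "entries_used": entries_used,
        "left_sibling": left_sibling,
        "right_sibling": right_sibling,
        "chunk_size": chunk_size,
        "filter_mask": filter_mask,
        "chunk_offsets": (offset_dim1, offset_dim2, offset_extra),
        "child_address": child_address,
    }


node = parse_chunk_tree_node(bytes.fromhex(NODE_HEX))
print(node["signature"], hex(node["chunk_size"]), hex(node["child_address"]))
```

Under those assumptions the chunk is 0x5BF8 bytes long and starts at 0xD18, which matches the reading in the dump.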
So the only relevant data transformation seems to be the GZIP Deflate filter. Yet the data does not seem to be a valid GZIP stream; I tried several different implementations and they all refuse the data. The data starts in my file at 0xD18 and looks like this:
789CECBDF777E2D89680AB1FDE5A6FBDB9333775
DF4E151D48069C730E6424A104220703CE95ABABA
AF39D3B71CDCCFFDC6F6F09BBB08D6DB08123C4
FEBACB0113E4A38FED7DD2D6EFBF13DD85EB04D
6074B102DE8C861929BB026DDD198C426D8D22B8
...
It’s the beginning of a block of data that looks “random” (preceded by lots of zeros and more structured “HDF5-looking” bytes), so I’m tempted to believe it’s the correct address. But maybe the raw chunk doesn’t contain the GZIP data right away and has another preamble or header? I’m not sure what a GZIP-compressed stream should look like, but decompression fails right on the first byte(s) it reads (with the slightly more precise error “Message: Bad state (invalid stored block lengths)” when using the open source gzip implementation).
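One way to narrow this down is to look at the first two bytes of the chunk, since the common deflate containers announce themselves there. A gzip stream (RFC 1952) starts with the magic bytes 1F 8B. A zlib stream (RFC 1950) starts with a two-byte header whose low nibble is 8 and whose 16-bit value is a multiple of 31. A raw deflate stream has no header at all. The sketch below is only a diagnostic under those assumptions; `classify_stream_header` and `inflate` are hypothetical helper names, and only Python's standard `zlib` module is used.

```python
import zlib


def classify_stream_header(data):
    """Guess the container format from the first two bytes (heuristic only)."""
    if len(data) < 2:
        return "too short"
    if data[0] == 0x1F and data[1] == 0x8B:
        return "gzip"  # RFC 1952 magic number
    cmf, flg = data[0], data[1]
    # RFC 1950: compression method 8 (deflate), window nibble <= 7,
    # and the big-endian 16-bit header must be divisible by 31.
    if (cmf & 0x0F) == 8 and (cmf >> 4) <= 7 and ((cmf << 8) | flg) % 31 == 0:
        return "zlib"
    return "unknown (possibly raw deflate)"


def inflate(data, kind):
    """Decompress with the wbits value matching the given container kind."""
    if kind == "zlib":
        wbits = zlib.MAX_WBITS        # expect a zlib header and Adler-32 trailer
    elif kind == "gzip":
        wbits = zlib.MAX_WBITS | 16   # expect a gzip header and CRC-32 trailer
    else:
        wbits = -zlib.MAX_WBITS       # raw deflate, no header or trailer
    return zlib.decompress(data, wbits)


# First two bytes of the chunk shown in the dump above.
print(classify_stream_header(bytes.fromhex("789C")))
```

Applied to the dump above, which begins with 78 9C, this check reports a zlib-style header rather than a gzip one. A decoder expecting raw deflate would interpret 0x78 as the start of a stored block, which may be where the “invalid stored block lengths” message comes from. Whether the whole chunk then inflates cleanly with the zlib setting would still need to be confirmed against the actual file.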
Unfortunately, the spec just says “Filters supported by The HDF Group are documented immediately below.” but does not really define how the filters work in more detail, aside from stating that ID “01” means “GZIP deflate compression”…
I’m sure I’m missing something: does the filter add some sort of preamble or other processing around the actual GZIP data?
Any help would be greatly appreciated, as I need to move forward with this project… (I’ll open source the reader implementation when done.)
thanx,
marc