Problems Accessing Attribute Data

anthony.j.ashford · January 5, 2023, 8:00pm

Hi,

I am having trouble accessing the data for Attributes found in the Object Header for items. I am building a parser in C# from the “HDF5 File Format Specification 3.0”. I am able to parse a document, and the structure matches the structure displayed by the “HDFView 3.2.0” application. The problem is that most of the Attributes in the document have a Datatype of “Variable-Length” and a Dataspace of zero dimensions. Most of these Datatypes are variable-length strings. According to the specification, the attribute's data follows the Dataspace in the message, but since the Dataspace has zero dimensions, I am expecting zero data for the attribute, and when I view the file in a hex editor, there are no meaningful strings there.

I have also built the HDF5 API and tried to read the data, and it also appears to find that the Dataspace has zero dimensions. (It looks like a check for a zero dimension is needed: it tries to allocate a zero-sized buffer and crashes hard because the buffer pointer is 0x0, but anyway…)

My problem is that, according to the HDFView application, one of these Attributes in the HDF5 document should have a value. I have searched the document in the hex editor, and the string HDFView associates with the Attribute is not near the location where the message was parsed; it is somewhere else in the file.

I know this is a shot in the dark, but do you have any ideas on what is going wrong in my parser and with the HDF5 API, given that the HDFView application is able to resolve the data for the Attribute?

Thanks in advance for your time.

gheber · January 6, 2023, 12:06pm

strings_attached.c (1.6 KB)

The attached snippet produces two HDF5 files, each with a string attribute attached to the root group, stored as a fixed-length and a variable-length string, respectively.

fixed.h5 looks like this:

00000000: 8948 4446 0d0a 1a0a 0308 0800 0000 0000  .HDF............
00000010: 0000 0000 ffff ffff ffff ffff c300 0000  ................
00000020: 0000 0000 3000 0000 0000 0000 061a 6137  ....0.........a7
00000030: 4f48 4452 0220 2a06 b863 2a06 b863 2a06  OHDR. *..c*..c*.
00000040: b863 2a06 b863 7802 1200 0000 00ff ffff  .c*..cx.........
00000050: ffff ffff ffff ffff ffff ffff ff0a 0200  ................
00000060: 0100 0015 1200 0400 00ff ffff ffff ffff  ................
00000070: ffff ffff ffff ffff ff0c 2800 0003 0006  ..........(.....
00000080: 0008 0004 0000 6174 7472 3100 1300 0000  ......attr1.....
00000090: 0d00 0000 0200 0000 4865 6c6c 6f2c 2057  ........Hello, W
000000a0: 6f72 6c64 2100 1600 0000 0000 0000 0000  orld!...........
000000b0: 0000 0000 0000 0000 0000 0000 0000 0065  ...............e
000000c0: 5ce2 95                                  \..

In this case, the value of the attribute attr1 (Hello, World!) is stored in the object header itself, which begins at address 0x00000030.
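To make the byte layout concrete, here is a minimal C sketch that decodes the attr1 attribute message body copied from the fixed.h5 dump (bytes 0x7d through 0xa4; the message type 0x0c, size 0x0028, and flags precede it at 0x79). It assumes a version 3 attribute message (no padding between fields), a version 2 dataspace message, little-endian fields, and a "size of lengths" of 8 (superblock byte 0x0a). The names parse_attr_message, struct attr_info, and FIXED_ATTR_MSG are hypothetical and not part of the HDF5 library. Note the dataspace bytes 02 00 00 00: version 2, dimensionality 0, type 0 (scalar). A scalar dataspace has rank 0 but holds exactly one element, so "zero dimensions" does not mean "no data".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Body of the attr1 attribute message copied from fixed.h5 (bytes 0x7d..0xa4). */
static const uint8_t FIXED_ATTR_MSG[] = {
    0x03, 0x00, 0x06, 0x00, 0x08, 0x00, 0x04, 0x00, 0x00, /* version, flags, sizes, encoding */
    'a', 't', 't', 'r', '1', 0x00,                       /* name, NUL-terminated */
    0x13, 0x00, 0x00, 0x00, 0x0d, 0x00, 0x00, 0x00,      /* datatype: string, 13 bytes */
    0x02, 0x00, 0x00, 0x00,                              /* dataspace: v2, rank 0, scalar */
    'H', 'e', 'l', 'l', 'o', ',', ' ', 'W', 'o', 'r', 'l', 'd', '!' /* value */
};

/* Read an n-byte little-endian unsigned integer. */
static uint64_t read_le(const uint8_t *p, size_t n) {
    uint64_t v = 0;
    while (n-- > 0)
        v = (v << 8) | p[n];
    return v;
}

struct attr_info {
    char name[64];
    unsigned type_class;   /* low nibble of the datatype's first byte: 3 = string, 9 = vlen */
    uint32_t type_size;    /* size of one element in bytes */
    uint64_t n_elements;   /* 1 for a scalar dataspace, even though its rank is 0 */
    const uint8_t *data;   /* first byte of the raw value */
    size_t data_len;
};

/* Hypothetical helper: decode a version 3 attribute message whose dataspace is
   a version 2 dataspace message. Returns 0 on success, -1 otherwise. */
static int parse_attr_message(const uint8_t *m, size_t len, size_t size_of_lengths,
                              struct attr_info *out) {
    if (len < 9 || m[0] != 3)
        return -1;
    size_t name_size = (size_t)read_le(m + 2, 2);   /* includes the NUL */
    size_t dtype_size = (size_t)read_le(m + 4, 2);
    size_t dspace_size = (size_t)read_le(m + 6, 2);
    if (name_size == 0 || name_size > sizeof out->name || dtype_size < 8 ||
        dspace_size < 4 || 9 + name_size + dtype_size + dspace_size > len)
        return -1;

    const uint8_t *p = m + 9;                       /* version 3: fields are not padded */
    memcpy(out->name, p, name_size);
    out->name[name_size - 1] = '\0';
    p += name_size;

    out->type_class = p[0] & 0x0f;                  /* high nibble is the datatype version */
    out->type_size = (uint32_t)read_le(p + 4, 4);
    p += dtype_size;

    if (p[0] != 2)
        return -1;                                  /* only dataspace version 2 handled */
    unsigned rank = p[1], flags = p[2], kind = p[3];
    if (kind == 0) {
        out->n_elements = 1;                        /* scalar: rank 0, one element */
    } else if (kind == 2) {
        out->n_elements = 0;                        /* null dataspace: no elements */
    } else {
        /* each dimension also has a max-size entry when flag bit 0 is set */
        size_t per_dim = size_of_lengths * ((flags & 1) ? 2 : 1);
        if (4 + rank * per_dim > dspace_size)
            return -1;
        out->n_elements = 1;
        for (unsigned i = 0; i < rank; i++)
            out->n_elements *= read_le(p + 4 + i * size_of_lengths, size_of_lengths);
    }
    p += dspace_size;

    out->data = p;
    out->data_len = len - (size_t)(p - m);
    return 0;
}
```

Fed the corresponding bytes from vlen.h5 (0x7d through 0xb3), the same function would report datatype class 9 with a 16-byte value. Those 16 bytes are a locator into the global heap rather than the string itself.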

vlen.h5 looks like this (truncated):

00000000: 8948 4446 0d0a 1a0a 0308 0800 0000 0000  .HDF............
00000010: 0000 0000 ffff ffff ffff ffff 0018 0000  ................
00000020: 0000 0000 3000 0000 0000 0000 4189 8907  ....0.......A...
00000030: 4f48 4452 0220 8009 b863 8009 b863 8009  OHDR. ...c...c..
00000040: b863 8009 b863 7802 1200 0000 00ff ffff  .c...cx.........
00000050: ffff ffff ffff ffff ffff ffff ff0a 0200  ................
00000060: 0100 0015 1200 0400 00ff ffff ffff ffff  ................
00000070: ffff ffff ffff ffff ff0c 3700 0003 0006  ..........7.....
00000080: 0014 0004 0000 6174 7472 3100 1901 0000  ......attr1.....
00000090: 1000 0000 1000 0000 0100 0000 0000 0800  ................
000000a0: 0200 0000 0d00 0000 0008 0000 0000 0000  ................
000000b0: 0100 0000 0007 0000 0000 0000 0000 0008  ................
000000c0: 38b3 6700 0000 0000 0000 0000 0000 0000  8.g.............
...
00000800: 4743 4f4c 0100 0000 0010 0000 0000 0000  GCOL............
00000810: 0100 0000 0000 0000 0d00 0000 0000 0000  ................
00000820: 4865 6c6c 6f2c 2057 6f72 6c64 2100 0000  Hello, World!...
...

In this case, the value is stored in a global heap collection (GCOL), which begins at address 0x00000800. The object header contains only the attribute metadata and a GCOL descriptor/locator (the string length, the collection address, and the object index within the collection).
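Following that locator can be sketched in C as below, using bytes copied from the vlen.h5 dump: the 16-byte attribute value at 0xa4 through 0xb3 and the first 48 bytes of the collection at 0x800 (the rest of the 4096-byte collection is elided above). It assumes "size of offsets" and "size of lengths" are both 8 (superblock bytes 0x09 and 0x0a) and ignores heap reference counts. The names decode_vlen_element, gcol_find, and struct heap_id are hypothetical, not HDF5 library functions. A real reader would seek to the decoded collection address in the file instead of using an in-memory copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The 16-byte value of attr1 in vlen.h5 (bytes 0xa4..0xb3). */
static const uint8_t VLEN_ATTR_VALUE[16] = {
    0x0d, 0x00, 0x00, 0x00,                          /* sequence length: 13 */
    0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  /* collection address: 0x800 */
    0x01, 0x00, 0x00, 0x00                           /* object index: 1 */
};

/* First 48 bytes of the global heap collection at 0x800. */
static const uint8_t GCOL_BYTES[48] = {
    'G', 'C', 'O', 'L', 0x01, 0x00, 0x00, 0x00,      /* signature, version, reserved */
    0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  /* collection size: 4096 */
    0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  /* index 1, refcount, reserved */
    0x0d, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  /* object size: 13 */
    'H', 'e', 'l', 'l', 'o', ',', ' ', 'W', 'o', 'r', 'l', 'd', '!', 0x00, 0x00, 0x00
};

/* Read an n-byte little-endian unsigned integer. */
static uint64_t read_le(const uint8_t *p, size_t n) {
    uint64_t v = 0;
    while (n-- > 0)
        v = (v << 8) | p[n];
    return v;
}

struct heap_id {
    uint32_t seq_len;     /* number of base-type elements (characters for a string) */
    uint64_t collection;  /* file address of the GCOL collection */
    uint32_t index;       /* object index within that collection */
};

/* Decode one variable-length element: 4-byte length, then a global heap ID
   (size_of_offsets-byte collection address + 4-byte object index). */
static struct heap_id decode_vlen_element(const uint8_t *p, size_t size_of_offsets) {
    struct heap_id id;
    id.seq_len = (uint32_t)read_le(p, 4);
    id.collection = read_le(p + 4, size_of_offsets);
    id.index = (uint32_t)read_le(p + 4 + size_of_offsets, 4);
    return id;
}

/* Find object `index` in a collection. `col` points at the "GCOL" signature and
   `avail` is how many bytes of it are in memory. Returns the object data and
   sets *obj_size, or returns NULL if the object is not found. */
static const uint8_t *gcol_find(const uint8_t *col, size_t avail, size_t size_of_lengths,
                                uint32_t index, uint64_t *obj_size) {
    if (avail < 8 + size_of_lengths || memcmp(col, "GCOL", 4) != 0 || col[4] != 1)
        return NULL;
    uint64_t col_size = read_le(col + 8, size_of_lengths);
    size_t limit = col_size < avail ? (size_t)col_size : avail;
    size_t off = 8 + size_of_lengths;                /* first heap object header */
    while (off + 8 + size_of_lengths <= limit) {
        uint32_t idx = (uint32_t)read_le(col + off, 2);
        uint64_t size = read_le(col + off + 8, size_of_lengths);
        if (idx == 0)
            break;                                   /* object 0 describes free space */
        if (idx == index) {
            if (off + 8 + size_of_lengths + size > limit)
                return NULL;
            *obj_size = size;
            return col + off + 8 + size_of_lengths;
        }
        /* skip header and data; object data is padded to a multiple of 8 bytes */
        off += 8 + size_of_lengths + (size_t)((size + 7) & ~(uint64_t)7);
    }
    return NULL;
}
```

This is why the string shows up "in another location within the file": the attribute message only carries the length and the (collection address, object index) pair, and the characters themselves live in the GCOL object.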

Why the different treatment? A fixed-size attribute value is easy to update in place. A variable-length attribute value can be changed to almost anything, so with this design the core metadata structure (OHDR) stays mostly intact (only the GCOL reference is updated), while reallocating the value is delegated to the low-level file space infrastructure and happens elsewhere.

OK?
G.

gheber · January 6, 2023, 12:08pm

We’d love to hear more about your application!

gheber · January 6, 2023, 12:19pm

I forgot to mention that we make no claims about the “efficiency” of the current variable-length element storage implementation. We (and others) have plenty of ideas for improving the HDF5 library implementation, e.g., see here. Contributors of any kind are welcome!

Finally, the HSDS implementation of HDF5 has already incorporated that lesson into its store layout.

G.

anthony.j.ashford · January 18, 2023, 4:25pm

Hello Again,

Sorry for the delay; I replied to your query instead of posting here. Here is what I wrote last week:

Let me first give an overview of the project I am working on. I am on a team that provides a suite of tools that various entities use to collect and analyze data from training and live events. The data are gathered by “Collectors”, which save them to a database for further analysis, and can come from sea, air, and ground assets.

The tools we are creating, however, were developed for Windows using C#.

In addition to the live data being collected, many sites have files containing data from previous events that they want imported into the database and included in the analysis alongside all the other collected data.

To that end, one of the sites would like to have their HDF5 files imported. Since the files are generic, I need to be able to parse any HDF5 file and extract the data using C#, because our import tool is written in it (it has various plug-ins to import many differently encoded file types). Fortunately, I do not have to take into account the locking provided by the HDF5 API, since no writes will occur. I am currently at the point where I can parse the sample document provided to me, and all the Groups/Datasets/Datatypes match what HDFView displays. Now comes the fun part of actually extracting the data.

I definitely appreciate all the assistance the HDF5 team has provided me. I realize I am approaching things from a different angle than the typical user base, since I am creating a parser from the specification document instead of just using the API. Some parts of the documentation are lacking, and the support you guys are providing is outstanding, as is having a debug version of “h5dump” that I can use in Visual Studio.

Once again, Thanks for all the assistance!
