I am having trouble accessing the data for Attributes found in the Object Header for items. I am building a parser from the “HDF5 File Format Specification 3.0” using C#. I am able to parse a document, and the structure matches the structure displayed by the “HDFView 3.2.0” application. The problem is that most of the Attributes in the document have a Datatype of “Variable-Length” and a Dataspace of zero dimensions. The majority of these Datatypes are variable-length strings. According to the specification document, the data for the attribute follows the Dataspace message, but since the Dataspace has zero dimensions, I am expecting zero data for the attribute, and when I view the file in a hex editor, there are no meaningful strings there.
I have also built the HDF5 API and tried to get the data, and it looks like it also finds the Dataspace to have zero dimensions. (It looks like a check is needed for when it gets zero for the dimension, since it tries to allocate a zero-sized buffer and crashes badly because the pointer to the buffer references 0x0. Anyway…)
My problem is that one of these Attributes in the HDF5 document should have a value according to the HDFView application. I have searched the document in the hex editor, and the string associated with the Attribute is not in the area surrounding where the message was parsed; it is in another location within the file.
I know this is a shot in the dark, but any ideas on what is going wrong in my parser and the HDF5 API, given that the HDFView application is able to resolve the data for the Attribute?
In this case, the value is stored in a global heap collection (GCOL), which begins at address 0x00000800. The object header contains only the attribute metadata and a GCOL descriptor/locator.
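To make the lookup concrete, here is a minimal sketch in Python of how a parser might follow that locator, based on my reading of the Global Heap and Variable-Length Datatype sections of the file format specification. It assumes little-endian encoding and 8-byte "size of offsets" and "size of lengths" (the real values come from the superblock). The function names and the synthetic demo file are hypothetical, purely for illustration, and are not part of the HDF5 API. As far as I understand it, a Dataspace with zero dimensions is a scalar holding exactly one element, so the attribute data is a single descriptor: a 4-byte length, the address of the heap collection, and a 4-byte object index.

```python
import struct

GCOL_SIGNATURE = b"GCOL"


def parse_vlen_descriptor(buf, offset=0, size_of_offsets=8):
    """Decode one variable-length element as stored in the attribute data.

    Assumed layout: length (4 bytes), collection address (size_of_offsets
    bytes), heap object index (4 bytes), all little-endian.
    """
    (length,) = struct.unpack_from("<I", buf, offset)
    addr_start = offset + 4
    address = int.from_bytes(buf[addr_start:addr_start + size_of_offsets], "little")
    (index,) = struct.unpack_from("<I", buf, addr_start + size_of_offsets)
    return length, address, index


def read_global_heap_object(file_bytes, collection_address, object_index,
                            size_of_lengths=8):
    """Walk a global heap collection and return the bytes of one object."""
    pos = collection_address
    if bytes(file_bytes[pos:pos + 4]) != GCOL_SIGNATURE:
        raise ValueError("no GCOL signature at address 0x%x" % collection_address)
    # Header: signature (4), version (1), reserved (3), collection size.
    # Assumption: the collection size counts the header itself.
    collection_size = int.from_bytes(
        file_bytes[pos + 8:pos + 8 + size_of_lengths], "little")
    end = collection_address + collection_size
    pos += 8 + size_of_lengths
    # Object header: index (2), reference count (2), reserved (4), size.
    object_header_size = 8 + size_of_lengths
    while pos + object_header_size <= end:
        (index,) = struct.unpack_from("<H", file_bytes, pos)
        size = int.from_bytes(
            file_bytes[pos + 8:pos + 8 + size_of_lengths], "little")
        if index == 0:
            break  # object 0 marks the free space at the end of the collection
        data_start = pos + object_header_size
        if index == object_index:
            return bytes(file_bytes[data_start:data_start + size])
        # Object data is padded to a multiple of 8 bytes.
        pos = data_start + ((size + 7) & ~7)
    raise KeyError("heap object %d not found" % object_index)


def read_vlen_string(file_bytes, descriptor):
    """Resolve a variable-length string descriptor to text."""
    length, address, index = parse_vlen_descriptor(descriptor)
    raw = read_global_heap_object(file_bytes, address, index)
    return raw[:length].rstrip(b"\0").decode("utf-8")


def build_demo_file(payload=b"hello vlen", collection_address=0x800):
    """Build a synthetic in-memory 'file' with one GCOL (illustration only)."""
    collection_size = 4096
    padded = payload + b"\0" * (-len(payload) % 8)
    obj = struct.pack("<HHIQ", 1, 0, 0, len(payload)) + padded
    free_size = collection_size - 16 - len(obj)
    free = struct.pack("<HHIQ", 0, 0, 0, free_size)
    header = GCOL_SIGNATURE + bytes([1, 0, 0, 0]) + struct.pack("<Q", collection_size)
    image = bytearray(collection_address + collection_size)
    blob = header + obj + free
    image[collection_address:collection_address + len(blob)] = blob
    descriptor = struct.pack("<IQI", len(payload), collection_address, 1)
    return bytes(image), descriptor
```

Calling `read_vlen_string(*build_demo_file())` resolves the 16-byte descriptor to the string stored at 0x800. In a real file, the descriptor bytes would be the attribute's data field that follows the Datatype and Dataspace in the Attribute message, and the string would sit wherever the library placed the heap collection, which matches seeing it "in another location within the file".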
Why the different treatment? It’s easy to update a fixed-size attribute value in place without hassle. A variable-length attribute value, on the other hand, can be changed to almost anything. Keeping it out of the object header means the core metadata structure (OHDR) remains mostly intact (only the GCOL reference is updated), while the reallocation of the value is delegated to the low-level file space infrastructure and happens elsewhere.
I forgot to mention that there is no claim to the “efficiency” of the current variable-length element storage implementation. We (and others) have plenty of ideas to improve the HDF5 library implementation, e.g., see here. Contributors of any kind are welcome!
Finally, the HSDS implementation of HDF5 already incorporated that lesson in its store layout.
Sorry for the delay; I responded to your query instead of posting here. Here is what I responded with last week:
Let me first give an overview of the project I am working on. I am on a team that provides a suite of tools that various entities use to collect and analyze data from training/live events. The data is gathered by “Collectors” that save it to a database for further analysis, and it can come from Sea/Air/Ground assets.
However, the tools that we are creating were developed for Windows using C#.
Along with the live data being collected, many of the sites have files containing data from previous events that they want imported into the database, so that the information is included in the analysis along with all the other collected data.
To that end, one of the sites would like to have their HDF5 files imported. Since the files are generic, I need to be able to parse any HDF5 file and extract the data using C#, since our import tool was written in that language (it has various plug-ins to import many differently encoded file types). Fortunately, I do not have to take into account the locking provided by the HDF5 API, since no writes will be occurring. I am currently at the point where I am able to parse the sample document provided to me, and all the Groups/Datasets/Datatypes match what HDFView is displaying. Now comes the fun part of actually extracting the data.
I definitely appreciate all the assistance the HDF5 team has provided me. I realize I am approaching things from a different angle than the typical programmer user base, since I am creating a parser from the specification document instead of just using the API. There are some parts of the documentation that are lacking, but the support you guys are providing is outstanding, as is having a debug version of “h5dump” that I can use in Visual Studio.