Reference Bit Field

gheber · January 25, 2023, 2:31pm

I’m not sure I understand the question. Let’s try to circle back. You have a chunked dataset C of a compound datatype, with one of the fields being a dataset region reference. You have succeeded in locating and reading the chunks, and you can parse all fields of interest. (No small feat. Congratulations!) The field that appears a little “mysterious” is

H5T_REFERENCE { H5T_STD_REF_DSETREG } "VarDataRef";

We know each reference logically represents a pair (dataset ID, selection). You’ve established that the dataset R referenced is of rank 1, of an opaque datatype of size one, and that the selection S is an irregular hyperslab. Is your question how to apply selection S to dataset R?

G.

gheber · January 25, 2023, 2:36pm

Assuming that that’s the question, the answer is easy. Logically, your dataset R is just a 1D byte array (each element is just a byte) and you’d just walk through that dataset and pick out the blocks listed in the hyperslab selection. And that’s it.
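As a minimal sketch of that walk (not library code): assume the dataset has already been read into one contiguous `bytes` object and the selection is available as a list of (start, end) blocks with inclusive end offsets. The function name `apply_selection` and the sample data are made up for illustration.

```python
def apply_selection(data: bytes, blocks):
    """Return the bytes covered by each block of an irregular 1D selection.

    `blocks` is a list of (start, end) element offsets; `end` is inclusive,
    so a block covers end - start + 1 elements.
    """
    return [data[start:end + 1] for start, end in blocks]


# Made-up stand-in for a rank-1 opaque<1> dataset and a two-block selection.
dataset = b"ABCDEFGHIJ"
selection = [(0, 2), (5, 6)]

pieces = apply_selection(dataset, selection)
print(pieces)  # [b'ABC', b'FG']
```

The result of dereferencing is every byte inside the listed blocks, not one byte per block.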

A dataset region reference is like a pointer to a dataset region. Dereferencing such a “pointer” is locating and retrieving the elements of that region (selection).

OK?

G.

anthony.j.ashford · January 25, 2023, 3:03pm

Hi,

Yes, I am trying to wrap my head around how the Selection applies to the Dataset comprised of one-byte values. Just for my clarification and so we are on the same page: in the statement, “each reference logically represents a pair (dataset ID, selection)”, does the “dataset ID” imply the “Object Header” for the Dataset?

Also, as I stated in a previous post, the Compound Datatype has an element labeled “VarDataOffset” that is an unsigned 64-bit integer. I mention this because the Starting Offset of the Selection for each Reference Type matches this value in each Compound Type, i.e. the first row has an offset of 0, the second has one of 208, the third has one of 500, and so on.

The problem I am having is for each of these starting offsets applied to the Dataset, they have the same value as the first Reference with an offset of 0. I checked the list of bytes that comprise the Dataset and for some reason, for each of the starting offsets, the value pointed to is the same as the first one. This leads me to believe I am definitely doing something wrong regarding the Dataset and using the Selection to find the correct element in the Dataset. I am fairly positive that the Selection is getting processed correctly since the starting offsets match the, “VarDataOffset” value in the Compound records.

Other than these last bits of observation, I think we are both on the same page as far as what I have and what I am having problems with. As an aside, I could probably dump the Dataset for the Reference Type along with the important information from the Selections of the first few entries so you can better see what I have to work with.

Looking forward to hearing back from you again. Thanks as always.

gheber · January 26, 2023, 12:25am

I should be more careful in choosing my words: As you have seen already, the value of each VarDataRef field is a global heap ID, which stores the encoded dataset region reference. According to the specification, this is the object address (=dataset object header address) followed by the dataspace selection information. When I said dataset ID, I meant the object address. OK?
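A rough sketch of decoding such a heap object, for illustration only. It assumes 8-byte file offsets, little-endian encoding, and my reading of the version 1 hyperslab selection layout (selection type, version, reserved, length, rank and block count as 4-byte fields, followed by per-block start and end offsets of 4 bytes per dimension). Please check the field layout against the file format specification before relying on it; the function name and the sample buffer are hypothetical.

```python
import struct

H5S_SEL_HYPERSLABS = 2  # selection type code for hyperslabs (assumed from the spec)


def decode_region_reference(buf: bytes, offset_size: int = 8):
    """Split an encoded region reference into (object address, blocks).

    Each block is a (start, end) pair of coordinate tuples, end inclusive.
    Assumes a version 1 hyperslab selection; anything else raises ValueError.
    """
    obj_addr = int.from_bytes(buf[:offset_size], "little")
    pos = offset_size
    sel_type, version, _reserved, _length, rank, num_blocks = struct.unpack_from(
        "<6I", buf, pos
    )
    pos += 6 * 4
    if sel_type != H5S_SEL_HYPERSLABS or version != 1:
        raise ValueError("only version 1 hyperslab selections are handled here")
    blocks = []
    for _ in range(num_blocks):
        start = struct.unpack_from(f"<{rank}I", buf, pos)
        pos += 4 * rank
        end = struct.unpack_from(f"<{rank}I", buf, pos)
        pos += 4 * rank
        blocks.append((start, end))
    return obj_addr, blocks


# Synthetic buffer: object header at a made-up address, one rank-1 block 208..499.
sample = (
    (0x1234).to_bytes(8, "little")
    + struct.pack("<6I", H5S_SEL_HYPERSLABS, 1, 0, 0, 1, 1)
    + struct.pack("<2I", 208, 499)
)
address, blocks = decode_region_reference(sample)
```

The object address leads to the object header of the referenced dataset, and the blocks are the part to apply to that dataset’s dataspace.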

OK, I didn’t pick up on that. So you say this information duplicates the offset part of the hyperslab selection (start, stride, count, block) in the dataset region reference. Redundant but fine.

The offset gives us the first dataset element (opaque<1> = byte) of a selection. What about the other selection parameters? Does the VarDataSize field shadow the block size or stride of the hyperslab? I think you said that we are looking at an irregular hyperslab selection, which is a list of blocks. If there is more than one block in a selection, I don’t see how the producer could have captured that in a single number.

Maybe the first element in all these selections is the same byte value, but that doesn’t mean that applies to the rest of the selection. They are different copies of the same opaque value, but they are not the same dataset element, because their positions (offsets) in the dataset are different.

An example would help.

G.

anthony.j.ashford · January 26, 2023, 9:57am

Hi,

Yes, it is strange that the producer of the document would know the offsets from the Selections and be able to store them as a field, but that is what I am seeing for the first several elements.

I will try to get a dump of the Dataset of the References, a brief screen shot of the HDFView screen showing what it is interpreting for the first batch of rows, and the information that the Dataspace Selection is coming back with for the first dozen or so items.

If you can think of any other information I can provide, please let me know. I asked about letting you have a copy of the entire document, but we have not heard back from the owner yet.

Thanks for the input and I will package up the pertinent information that can hopefully better illustrate the problem.

anthony.j.ashford · January 26, 2023, 4:47pm

Hi Again,

Attached is some hopefully useful information regarding the Dataset and the Selection. I have included a diagram of how the V1 Btree has the Chunks positioned for the Dataset in question, a CSV file with the data contained in the Dataset (rather large, but Notepad++ is able to handle it), a text file containing the first twenty or so Dataspace Selections for the first elements of the Dataset, and a file containing a partial view of what the HDFView application interprets the Reference Type field value to be.

If there is any other information I can provide, feel free to let me know. Thanks.

Reference Datatype Information.7z (746.3 KB)

gheber · January 27, 2023, 12:57pm

Just a few comments/questions on the inventory:

Reference Dataset V1 Btree.jpg

This is the B-tree indexing the chunks of the dataset referenced in the dataset region references, right? The offsets of the chunks look sensible, but what’s more important are the keys of the B-tree nodes. In general, the offsets of the chunks have nothing to do with the offsets in the hyperslabs. There is a correlation here because we are dealing with an opaque<1> datatype, but that’s just a coincidence. The offsets in the hyperslabs are to be resolved/interpreted via the B-tree keys.

Depending on the acquisition order, it is very likely that the chunks were allocated in a monotonically increasing (dataspace) order, but that would rely on an implementation detail.
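To make the key lookup concrete, here is a small sketch under simplifying assumptions: a rank-1 dataset with a fixed chunk length, chunks already read and unfiltered, and a dictionary keyed by the dataspace offset taken from each B-tree key. The names `read_range` and `chunks` and the chunk length of 100 are invented for the example.

```python
def read_range(chunks, chunk_len, start, end):
    """Collect dataspace elements start..end (inclusive) from 1D chunks.

    `chunks` maps the dataspace offset stored in a B-tree key to the decoded
    bytes of that chunk. File addresses of the chunks play no role here.
    """
    out = bytearray()
    pos = start
    while pos <= end:
        key = (pos // chunk_len) * chunk_len   # key offset of the chunk holding pos
        inner = pos - key                      # position of pos inside that chunk
        take = min(chunk_len - inner, end - pos + 1)
        out += chunks[key][inner:inner + take]
        pos += take
    return bytes(out)


# Made-up dataset of 256 one-byte elements stored in chunks of 100 elements.
chunks = {
    0: bytes(range(0, 100)),
    100: bytes(range(100, 200)),
    200: bytes(range(200, 256)) + bytes(44),  # last chunk padded to full length
}
block = read_range(chunks, 100, 0, 207)
```

A block that crosses a chunk boundary is simply stitched together from consecutive chunks, in dataspace order.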

Reference Datasource Selection.txt

The Start Offset and End Offset fields look sensible. Remember, they refer to offsets in the dataspace and not chunk offsets or offsets inside chunks! The HDFView Reference Value is bogus: it’s a 12-byte struct (8 bytes for the GCOL offset and 4 bytes for the object index) cast in some weird way to an integer.
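For illustration, the 12 bytes can be split like this. The sketch assumes 8-byte file offsets and little-endian encoding; the function name and the sample values are made up.

```python
import struct


def parse_global_heap_id(raw: bytes):
    """Split a 12-byte global heap ID into (collection address, object index)."""
    if len(raw) != 12:
        raise ValueError("expected 12 bytes: 8-byte address + 4-byte index")
    collection_address, object_index = struct.unpack("<QI", raw)
    return collection_address, object_index


# Hypothetical field value: heap collection at 0x1F40, object index 3.
raw = struct.pack("<QI", 0x1F40, 3)
heap_id = parse_global_heap_id(raw)
```

Reading the two parts separately avoids the meaningless single integer that HDFView displays.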

Reference Dataset.csv

These are the bytes (as uint<8>) of the opaque<1> dataset?
If yes, the first selection

Start Offset [1,1] = 0
End Offset [1,1] = 207

represents just the first 208 rows of that file.
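In code, with a stand-in for the CSV (one value per line, generated here rather than taken from the attachment), the inclusive offsets translate into a slice as follows. The variable names are only for illustration.

```python
import io

# Stand-in for a one-value-per-line CSV; the values are generated, not real data.
csv_text = "\n".join(str(i % 256) for i in range(600))

rows = [int(line) for line in io.StringIO(csv_text) if line.strip()]
data = bytes(rows)

start, end = 0, 207             # Start Offset / End Offset of the first selection
first = data[start:end + 1]     # end is inclusive, hence the + 1
count = end - start + 1
```

Here `count` is 208, which is where the 208 rows come from.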

OK?
G.

anthony.j.ashford · January 30, 2023, 10:03am

Hi,

Thanks for the input. For the, “Reference Dataset V1 BTree.jpg”, I will examine the Keys and provide more information to you.

For, “Reference Datasource Selection.txt”, that is the part that is throwing me. After checking references to Hyperslabs online, from what I can gather, it is like a window into the Dataspace. In the case I have presented, that would be “frames” that range in size from the high 200s to the low 300s of bytes. Given this range of bytes, I am having trouble figuring out how to extract the single bytes of the Opaque type that is being referenced.

The, “Reference Dataset.csv”, is the actual Dataset that describes the Dataspace for the Opaque Type. It is one byte per line since the Datatype that the Reference Type is referencing is an Opaque Type of Size = 1.

Given that I am looking for a single byte (the Opaque data) in a section of the Referenced Dataspace that is a couple hundred bytes in size, I need to figure out how to extract that single byte, i.e. what is its offset in the specified Hyperslab. From your first statement about the V1 BTree, I will look into the Keys; however, I was under the impression that the Keys provide the size of the chunk, a mask describing the filters to skip when processing the data, and the offset of the chunk within the Dataset that the chunk is a member of.

I guess I am really missing the final piece: how do I get the actual offset of the Opaque data (single byte), given the Hyperslab describing a section of the Dataset of the Opaque Type that the Reference Type is referencing? Thanks.
