Hi,
I am unable to find a satisfactory way to model object hierarchies in HDF5, specifically something like the following:
class B { ... };
class A {
    std::vector<B> m_b;
};
Here are some of my requirements:
* The number of B's is relatively small, probably fewer than 10
* There's no upper bound
* It needs to be possible to append data later
* The order in which the items are written needs to be preserved
My initial thought was to have a new dataset for each "list" of sub-objects and then reference it from the owning object. In this model, each 'A' object has its own list of indices to B objects:
[A DATASET]
data1, "DATASET REF A_B_1"
data2, "DATASET REF A_B_2"
data3, "DATASET REF A_B_3"
[B DATASET]
B1
B2
B3
B4
B5
[A_B_1 DATASET]
Index to B1
Index to B3
Index to B4
[A_B_2 DATASET]
Index to B2
Index to B5
[A_B_3 DATASET]
Index to B3
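In case it helps clarify the intent, the first model can be sketched in memory roughly like this (the names and types are illustrative only, not the actual HDF5 layout):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stands in for the shared [B DATASET].
std::vector<std::string> b_table = {"B1", "B2", "B3", "B4", "B5"};

// Each A record carries its own list of indices into b_table,
// standing in for the per-A index datasets A_B_1, A_B_2, A_B_3.
struct ARecord {
    std::string data;                    // payload, e.g. "data1"
    std::vector<std::size_t> b_indices;  // stands in for dataset A_B_n
};

// Resolve one A record's children, preserving write order.
std::string expand(const ARecord& a) {
    std::string out;
    for (std::size_t i : a.b_indices) {
        if (!out.empty()) out += " ";
        out += b_table[i];
    }
    return out;
}
```

So each A becomes something like `{"data1", {0, 2, 3}}`, and the slow part in my implementation was creating one small HDF5 dataset per `b_indices` list.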
Unfortunately, at least the way I implemented it, this approach was too slow. Performance also seemed to degrade significantly as more and more datasets were added.
My current approach is to use region references. To preserve the write order, I append each written index to a LIST dataset and, once all B's have been written, I create region references into that list.
The above therefore looks like:
[A's DATASET]
data1, "DATASET REGION REF 0,2"
data2, "DATASET REGION REF 3,4"
data3, "DATASET REGION REF 5"
[B DATASET]
B1
B2
B3
B4
B5
[A_B_LIST]
Index to B1
Index to B3
Index to B4
Index to B2
Index to B5
Index to B3
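In memory, this second model amounts to one flat index list plus a contiguous span per A record, which is effectively what each of my region references selects (a sketch with illustrative names; note the layout above writes inclusive "start,end" pairs, while this sketch uses start + count):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stands in for the shared [B DATASET].
std::vector<std::string> b_table = {"B1", "B2", "B3", "B4", "B5"};

// Stands in for [A_B_LIST]: indices into b_table, in write order.
std::vector<std::size_t> a_b_list = {0, 2, 3, 1, 4, 2};

// Stands in for one "DATASET REGION REF" over a contiguous run of a_b_list.
struct Span {
    std::size_t start;  // first element of the run
    std::size_t count;  // number of elements in the run
};

// Resolve one A record's children via its span into a_b_list.
std::string expand(Span s) {
    std::string out;
    for (std::size_t i = 0; i < s.count; ++i) {
        if (!out.empty()) out += " ";
        out += b_table[a_b_list[s.start + i]];
    }
    return out;
}
```

Since every reference here is just a contiguous run, I wonder whether storing a plain pair of integers per record (as in `Span`) would carry the same information far more compactly than a full region reference.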
Time-wise, this performs significantly better than the previous model (and meets my requirements); however, the resulting HDF5 file is now extremely large, mostly due to the region references.
I have a test case with about 10k records, each with 4 region references. If I create and write out all the data with the correct region references, the resulting file is about 2.5 MB. If I write the data with "empty" region references, the file is about 250 KB.
Is this expected? Is there a way I can optimise this?
Finally, is there a standard approach for modelling this kind of data in HDF5 that I should be using?
Many thanks for your time,
Richard
----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.