B-tree terminology in reference manual.

Hi All,

I have to admit I've been totally baffled by the meaning of the 1/2 rank and
1/2 node size parameters controlling B-tree storage for groups and
chunked datasets. What do these parameters mean in terms of the
arrangement of groups in an HDF5 file, the number of items, on average, in
a group, and/or the average depth of the hierarchy of groups? I've even
googled these terms and didn't find any useful information.

If my files are typically between 2 and 5 groups deep, with between 5
and 100 objects in a group, what should I set these parameters to?

Any guidance appreciated.

Mark

--
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
miller86@llnl.gov urgent: miller86@pager.llnl.gov
T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511

Hi Mark,

Hi All,

I have to admit I've been totally baffled by the meaning of the 1/2 rank and
1/2 node size parameters controlling B-tree storage for groups and
chunked datasets. What do these parameters mean in terms of the
arrangement of groups in an HDF5 file, the number of items, on average, in
a group, and/or the average depth of the hierarchy of groups? I've even
googled these terms and didn't find any useful information.

  Another place we should improve our documentation... *sigh* B-tree nodes are allowed to hold between the "1/2 rank" and twice that value in entries, except for the root of the B-tree, which can have fewer. (Call the upper bound the "full rank", maybe? The Wikipedia page for B-trees, http://en.wikipedia.org/wiki/B-tree, calls this the "order" of the B-tree.)
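  To make the bounds above concrete, here is a minimal Python sketch. It does not call HDF5 at all; the function name `entry_bounds` and the choice of 0 as the root's lower bound are assumptions made purely for illustration.

```python
def entry_bounds(half_rank, is_root=False):
    """Return (min_entries, max_entries) for one B-tree node.

    half_rank is the "1/2 rank" value discussed above. The name of this
    function and the root lower bound of 0 are illustrative assumptions.
    """
    if half_rank < 1:
        raise ValueError("half_rank must be at least 1")
    # A node may hold up to twice the 1/2 rank.
    max_entries = 2 * half_rank
    # Every node except the root must hold at least the 1/2 rank.
    min_entries = 0 if is_root else half_rank
    return min_entries, max_entries


if __name__ == "__main__":
    for k in (4, 16, 50):
        print(k, entry_bounds(k), entry_bounds(k, is_root=True))
```

  So with a 1/2 rank of 16, for example, a non-root node in this model holds between 16 and 32 entries.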

If my files are typically between 2 and 5 groups deep, with between 5
and 100 objects in a group, what should I set these parameters to?

  These parameters won't affect the depth of your group hierarchy, but if you have small numbers of links in a group (or chunks in a dataset), you can reduce the 1/2 rank value to roughly half the number of links in the largest group. Basically, these parameters set the maximum "fan out" from each node in the B-tree. If your B-trees hold only a small number of entries, reducing the parameter values should make your file smaller and may improve performance. However, if you reduce the values too far, the depth of the B-tree will increase and things will get worse again. I don't normally suggest tweaking these values unless you have unusually weird file layouts, like 10,000 links in a group or every group having only one link in it.
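  The trade-off between fan-out and B-tree depth can be sketched with a deliberately simplified model. It assumes every node is completely full (2 x the 1/2 rank) and ignores the separate 1/2 node size parameter for leaves, so the numbers are a best-case illustration, not what the HDF5 library actually computes. The names `suggested_half_rank` and `min_levels` are hypothetical. In the HDF5 C API the real values are set on a file creation property list with `H5Pset_sym_k` (groups) and `H5Pset_istore_k` (chunked datasets); this sketch does not call either.

```python
def suggested_half_rank(largest_group_links):
    """Rule of thumb from the text: about half the links in the largest group."""
    if largest_group_links < 1:
        raise ValueError("need at least one link")
    # Round up so that 2 * half_rank always covers the largest group.
    return max(1, (largest_group_links + 1) // 2)


def min_levels(n_entries, half_rank):
    """Fewest B-tree levels that can index n_entries if every node is full.

    Simplified model (an assumption): each node fans out to exactly
    2 * half_rank children or entries.
    """
    if n_entries < 1 or half_rank < 1:
        raise ValueError("n_entries and half_rank must be at least 1")
    fan_out = 2 * half_rank
    levels = 1
    capacity = fan_out
    # Add a level until the tree can hold all entries.
    while capacity < n_entries:
        capacity *= fan_out
        levels += 1
    return levels


if __name__ == "__main__":
    links = 100  # largest group in the question above
    for k in (suggested_half_rank(links), 16, 4, 1):
        print(f"1/2 rank {k:3d}: at least {min_levels(links, k)} level(s)")
```

  In this model a 100-link group fits in a single node when the 1/2 rank is 50, while pushing the value down to 4 already needs three levels, which is the "too far" case described above.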

  Quincey

On Jun 23, 2010, at 5:54 PM, Mark Miller wrote: