Creating/storing data of variable length compound members

Hi,

  I would like to store a collection of the following as a compound datatype:

struct MyBox {
  double minX;
  double minY;
  double minZ;
  double maxX;
  double maxY;
  double maxZ;
  std::vector<size_t> offsets;
};

  I have no problem with the POD min* and max* members.

  However, I am unsure how best to handle the std::vector<size_t> member.

  I read about variable-length strings and was wondering whether that
information is applicable here.

  Which example code should I consult to get a better understanding of
variable length as used within a compound type?

Cheers

--
Nicholas Yue
Graphics - RenderMan, Visualization, OpenGL, HDF5
Custom Dev - C++ porting, OSX, Linux, Windows
http://au.linkedin.com/in/nicholasyue

Nick, there's, as usual, no single right answer.
What's the average length of 'offsets' and what's
the length variability?
('size_t' is not a good starting point for a machine-
independent representation, since its width varies by platform.)

Representing 'offsets' as a VLEN has its price: you'll lose
some performance and the ability to use compression on the data set.

If there's a sensible upper bound on the length and only slight
variation in the length, you might stick with a (fixed-size) ARRAY
component.
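
A minimal sketch of the fixed-size ARRAY idea, offered as an illustration rather than a definitive layout. The upper bound `kMaxOffsets`, the record name `FixedBox`, and the helper `makeFixedBox` are hypothetical and not from this thread. The sketch uses `std::uint64_t` instead of `size_t` so the element width does not depend on the platform. It shows only the in-memory record; describing that record to HDF5 as a compound type is not shown.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical upper bound on offsets per box; choose it from your own data.
constexpr std::size_t kMaxOffsets = 8;

// Every member has a fixed size, so the record has a fixed layout.
struct FixedBox {
    double min[3];
    double max[3];
    std::uint32_t count;                             // number of valid entries
    std::array<std::uint64_t, kMaxOffsets> offsets;  // unused slots stay zero
};

static_assert(std::is_trivially_copyable<FixedBox>::value,
              "FixedBox must be a plain fixed-layout record");

// Pack a variable-length list into the fixed-size record.
// Throws std::length_error if the list exceeds the chosen upper bound.
inline FixedBox makeFixedBox(const double (&mn)[3], const double (&mx)[3],
                             const std::vector<std::size_t>& offsets) {
    if (offsets.size() > kMaxOffsets) {
        throw std::length_error("too many offsets for FixedBox");
    }
    FixedBox box{};  // value-initialization zero-fills every member
    for (std::size_t i = 0; i < 3; ++i) {
        box.min[i] = mn[i];
        box.max[i] = mx[i];
    }
    box.count = static_cast<std::uint32_t>(offsets.size());
    for (std::size_t i = 0; i < offsets.size(); ++i) {
        box.offsets[i] = static_cast<std::uint64_t>(offsets[i]);
    }
    assert(box.count <= kMaxOffsets);
    return box;
}
```

The cost is padding: every record occupies space for `kMaxOffsets` entries, which is why this only pays off when the lengths vary slightly.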

A compromise would be to separate the two parts of your compound
and have a 'bounding boxes' (+ HDF5 reference) dataset and an 'offsets'
dataset. Entries in the former would be compounds of your
bounding boxes and an HDF5 region reference into a global
'offsets' dataset. (In this simple case, you can think of a region
reference as an (offset, count) pair that references a contiguous
region in a global 'offsets' dataset.)
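
The split described above can be sketched in plain C++, using an explicit (offset, count) pair in place of an actual HDF5 region reference. This is a sketch under assumptions: `BoxRecord`, `FlatBoxes`, `flatten`, and `offsetsOf` are hypothetical names introduced for illustration, and writing the two arrays out as HDF5 datasets is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// The struct from the original question.
struct MyBox {
    double minX, minY, minZ;
    double maxX, maxY, maxZ;
    std::vector<std::size_t> offsets;
};

// Fixed-size record: bounds plus an (offset, count) pair that stands in
// for a region reference into the global offsets array.
struct BoxRecord {
    double min[3];
    double max[3];
    std::uint64_t first;  // index of this box's first entry in the global array
    std::uint64_t count;  // number of entries belonging to this box
};

// The two "datasets": one row per box, plus all offsets back to back.
struct FlatBoxes {
    std::vector<BoxRecord> boxes;
    std::vector<std::uint64_t> offsets;
};

inline FlatBoxes flatten(const std::vector<MyBox>& in) {
    FlatBoxes out;
    out.boxes.reserve(in.size());
    for (const MyBox& b : in) {
        BoxRecord r{{b.minX, b.minY, b.minZ},
                    {b.maxX, b.maxY, b.maxZ},
                    static_cast<std::uint64_t>(out.offsets.size()),
                    static_cast<std::uint64_t>(b.offsets.size())};
        out.boxes.push_back(r);
        for (std::size_t v : b.offsets) {
            out.offsets.push_back(static_cast<std::uint64_t>(v));
        }
    }
    return out;
}

// Recover the offsets of box i by slicing the contiguous region.
inline std::vector<std::uint64_t> offsetsOf(const FlatBoxes& flat, std::size_t i) {
    const BoxRecord& r = flat.boxes.at(i);
    assert(r.first + r.count <= flat.offsets.size());
    auto begin = flat.offsets.begin() + static_cast<std::ptrdiff_t>(r.first);
    return std::vector<std::uint64_t>(begin, begin + static_cast<std::ptrdiff_t>(r.count));
}
```

Both arrays now hold fixed-size elements, so neither needs a variable-length member.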

G.

···

From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Nicholas Yue
Sent: Monday, October 28, 2013 3:53 PM
To: HDF Users Discussion List
Subject: [Hdf-forum] Creating/storing data of variable length compound members


Another option would be to play the relational database game and split 'offsets' into a different dataset. Connect the items with some key (an index?) and perhaps a length. Of course, connecting the dots is then a manual procedure.

Depending on what you're doing this might not be terribly efficient. I've used this pattern a fair amount. Most of my accesses are sequential, so I can just maintain an index for each dataset. In this case, getting the next set of offsets is an increment while keys match.
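
A rough sketch of that sequential pattern, assuming the offsets rows are stored sorted by key and the boxes are visited in increasing key order. `OffsetRow` and `OffsetCursor` are hypothetical names for illustration; reading the rows from an HDF5 dataset is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One row of the separate 'offsets' dataset: the key of the owning box
// and a single offset value.
struct OffsetRow {
    std::uint64_t boxKey;
    std::uint64_t offset;
};

// Walks the rows once, front to back. Assumes rows are sorted by boxKey
// and that next() is called with non-decreasing keys.
class OffsetCursor {
public:
    explicit OffsetCursor(const std::vector<OffsetRow>& rows) : rows_(rows), pos_(0) {
        assert(std::is_sorted(rows_.begin(), rows_.end(),
                              [](const OffsetRow& a, const OffsetRow& b) {
                                  return a.boxKey < b.boxKey;
                              }));
    }

    // Return all offsets for the given key and advance past them.
    std::vector<std::uint64_t> next(std::uint64_t key) {
        std::vector<std::uint64_t> out;
        // Skip rows that belong to boxes the caller did not ask for.
        while (pos_ < rows_.size() && rows_[pos_].boxKey < key) {
            ++pos_;
        }
        // Collect rows while the key matches.
        while (pos_ < rows_.size() && rows_[pos_].boxKey == key) {
            out.push_back(rows_[pos_].offset);
            ++pos_;
        }
        return out;
    }

private:
    const std::vector<OffsetRow>& rows_;  // caller keeps the rows alive
    std::size_t pos_;                     // index of the next unread row
};
```

Random access would need a search or a stored (offset, count) per box instead; the cursor is only cheap because the reads are sequential.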

Scott

···

From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Gerd Heber
Sent: Monday, October 28, 2013 5:13 PM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] Creating/storing data of variable length compound members


Hi Gerd,

     I have taken your advice and reduced the use of VLEN to only the most essential part (in my case, a single component).

     With regard to the variability of the length: in my use case it varies from 1 to N x 10^5. This single use of VLEN, even taking the performance penalty into account, should still be beneficial because it lets me represent sparse data that I plan to bring into a photorealistic renderer, where memory footprint is high on the list of requirements for rendering out-of-core geometry.

Cheers


--
Nicholas Yue
Graphics - RenderMan, Visualization, OpenGL, HDF5
Custom Dev - C++ porting, OSX, Linux, Windows
http://au.linkedin.com/in/nicholasyue