Dynamically building a compound datatype

Hi folks -
I'm investigating HDF5 as a solution to a problem I've got. I'm working on
an application that reads a variety of message types, converts them to a
canonical form, and writes out the data (primarily numeric) in various
formats. I'm attracted by the idea that HDF would provide a well-supported
way of presenting the hierarchies inherent in the data. Complicating the
picture is the fact that many of the data types are variable-length and
various of the message types have conditional fields. In short, sizes and
offsets can be pretty fluid.

I've been looking at the documentation for compound datatype, and to my
inexperienced eye it seems like the most natural way to represent a
message. I have some questions, though:
1) The message structure I'm dealing with is discovered at runtime, while
the example code uses structs that are defined at compile time. Is there a
common idiom for specifying compound datatypes dynamically?
2) The documentation indicates that datatype members can't be variable
length? Are there common workarounds (e.g. allocating the max that a member
might need)?
3) The documentation mentions that members can be small arrays - how small
is small?

Thanks for your attention.

-Josiah

Josiah, great questions. Compound datatypes make sense for dense data and
may not be the right vehicle to approach a sparse situation. (How sparse is
your data?)

> 1) The message structure I'm dealing with is discovered at runtime,
> while the example code uses structs that are defined at compile time.
> Is there a common idiom for specifying compound datatypes dynamically?

Nothing prevents you from calling H5Tcreate(H5T_COMPOUND, size) at runtime
and then adding the appropriate members dynamically (via H5Tinsert).
You might write some kind of type information parser, e.g., based on a JSON
or XML representation. Remember, though, that all elements in a dataset (or
attribute) must be of the same type. (Unless you make things totally opaque
and lose all visibility...)
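As a rough illustration of the bookkeeping behind that approach, here is a minimal C++ sketch that turns a field list discovered at runtime into packed member offsets and a total record size. FieldSpec, Member, Layout and packed_layout are hypothetical names invented for this example, and the packed (no padding) layout is an assumption. The HDF5 calls appear only in comments, so the sketch builds without the library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one field discovered at runtime.
struct FieldSpec {
    std::string name;
    std::size_t size;  // size of the field's element type in bytes
};

// One member of the computed layout.
struct Member {
    std::string name;
    std::size_t offset;  // byte offset of the member inside a record
    std::size_t size;
};

struct Layout {
    std::vector<Member> members;
    std::size_t total_size = 0;  // size of one whole record in bytes
};

// Assign consecutive offsets (packed, no padding) in field order.
Layout packed_layout(const std::vector<FieldSpec>& fields) {
    Layout layout;
    for (const FieldSpec& f : fields) {
        assert(f.size > 0);
        layout.members.push_back({f.name, layout.total_size, f.size});
        layout.total_size += f.size;
    }
    return layout;
}

// With HDF5 available, the layout would be used roughly like this:
//   hid_t t = H5Tcreate(H5T_COMPOUND, layout.total_size);
//   for each member m:
//       H5Tinsert(t, m.name.c_str(), m.offset, member_type);
// where member_type is the hid_t for that field (for example
// H5T_NATIVE_DOUBLE), which this sketch does not track.
```

Computing the total size first matters because H5Tcreate needs the record size before any member is inserted.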

> 2) The documentation indicates that datatype members can't be variable
> length? Are there common workarounds (e.g. allocating the max that a
> member might need)?

Where did you read that? You can have VLEN (sequence) or variable-length
string members, no problem there. How efficient that can be is another
story. (There's no support for compression if you have VLENs & Co. in the
mix.)
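For a sense of what a VLEN member looks like in memory: HDF5 describes each variable-length element with an hvl_t, a struct holding a length and a pointer. The sketch below builds such descriptors for a set of C++ vectors. VlenDesc is a hypothetical stand-in that mirrors hvl_t's two fields so the example compiles without the HDF5 headers, and describe_rows is likewise an invented name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in mirroring the two fields of HDF5's hvl_t.
struct VlenDesc {
    std::size_t len;  // number of elements in this sequence
    void* p;          // pointer to the first element (not owned)
};

// Build one descriptor per row. The descriptors only borrow the row
// storage, so `rows` must outlive the returned vector and must not be
// resized while the descriptors are in use.
std::vector<VlenDesc> describe_rows(std::vector<std::vector<double>>& rows) {
    std::vector<VlenDesc> out;
    out.reserve(rows.size());
    for (std::vector<double>& row : rows) {
        out.push_back({row.size(), row.empty() ? nullptr : row.data()});
    }
    return out;
}

// With HDF5 available, the matching datatype would come from
//   hid_t vl = H5Tvlen_create(H5T_NATIVE_DOUBLE);
// and an array of hvl_t built this way would be the write buffer.
```

No data is copied here; each descriptor simply points into the vector it was built from.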

> 3) The documentation mentions that members can be small arrays - how small
> is small?

Small or big arrays. As small as an array where each element is just one
byte. For fixed-size arrays, have a look at the H5T_ARRAY class (up to 32
dimensions).
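When an array member goes into a runtime-built compound, its byte size is needed to compute the offset of the next member. A small sketch of that arithmetic follows, with the rank check taken from the 32-dimension limit above. array_member_size and kMaxRank are hypothetical names for this example, and overflow of the product is not checked.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Rank limit for array datatypes, per the 32-dimension figure above.
constexpr std::size_t kMaxRank = 32;

// Bytes occupied by one fixed-size array member:
// elem_size multiplied by every dimension extent.
std::size_t array_member_size(std::size_t elem_size,
                              const std::vector<std::size_t>& dims) {
    if (dims.empty() || dims.size() > kMaxRank) {
        throw std::invalid_argument("rank must be between 1 and 32");
    }
    std::size_t bytes = elem_size;
    for (std::size_t extent : dims) {
        if (extent == 0) {
            throw std::invalid_argument("zero-length dimension");
        }
        bytes *= extent;
    }
    return bytes;
}

// With HDF5 available, the member type itself would come from
//   H5Tarray_create2(base_type, rank, dims)
// where dims is an array of hsize_t.
```

For example, a 3 x 4 array of 8-byte doubles takes 96 bytes, and a one-element array of single bytes takes 1.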

Best, G.

Josiah,

I dynamically build compound types all the time based upon .NET type information. There are some details that have to be addressed, but the correspondence for basic structures is roughly 1:1. I haven't managed to automate array creation within a structure just yet. So yeah, it can be done.

Gerd,

Wasn't there something about the VLENs being turned off by default because they caused a big memory leak? I've brute-forced my way around variable length issues because I didn't want to experiment.

Scott

-----Original Message-----
From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Gerd Heber
Sent: Tuesday, July 16, 2013 3:34 PM
To: 'HDF Users Discussion List'
Subject: Re: [Hdf-forum] Dynamically building a compound datatype

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org

________________________________

This e-mail and any files transmitted with it may be proprietary and are intended solely for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error please notify the sender. Please note that any views or opinions presented in this e-mail are solely those of the author and do not necessarily represent those of Exelis Inc. The recipient should check this e-mail and any attachments for the presence of viruses. Exelis Inc. accepts no liability for any damage caused by any virus transmitted by this e-mail.

Josiah, you can write individual components of a compound type.
You could write one vector after the other to the respective component.
This might be faster than reshuffling the deck completely.
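To make the trade-off concrete, here is a minimal sketch of the "reshuffling" alternative: copying per-field vectors into one buffer of packed records that could be passed to a single write call. Column and interleave are hypothetical names for this example, and the packed layout with fields in column order is an assumption. The per-component route instead leaves each vector where it is and, as far as I understand it, uses a memory datatype containing only the one member being written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical view of one column: where its elements start and how
// large each element is.
struct Column {
    const void* data;       // first element of the source vector
    std::size_t elem_size;  // bytes per element
};

// Copy n_records elements from each column into one buffer of packed
// records: fields in column order, no padding between them.
std::vector<unsigned char> interleave(const std::vector<Column>& cols,
                                      std::size_t n_records) {
    std::size_t record_size = 0;
    for (const Column& c : cols) {
        assert(c.data != nullptr || n_records == 0);
        record_size += c.elem_size;
    }
    std::vector<unsigned char> buf(record_size * n_records);
    for (std::size_t r = 0; r < n_records; ++r) {
        std::size_t offset = 0;
        for (const Column& c : cols) {
            const unsigned char* src =
                static_cast<const unsigned char*>(c.data) + r * c.elem_size;
            std::memcpy(buf.data() + r * record_size + offset, src, c.elem_size);
            offset += c.elem_size;
        }
    }
    return buf;
}
```

The copy touches every byte once per write, which is the cost that writing component by component avoids.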

G.

Thanks for the responses - they're encouraging. The data is generally
pretty dense, which means that I don't need to rule out compound types yet :)

Let's assume that I'm building a compound data type on the fly, using the
H5Tcreate/H5Tinsert approach mentioned above. When it comes time to
actually write the data, there's an additional complication - each field of
the data is in the form of a C++ vector (meaning that I don't immediately
have a pointer to a block of data to pass to H5Dwrite).

Perfect, that's good news!

-Josiah

It might be time to start a new thread here - but this question does follow
from the previous discussion. Let's say that I'm iteratively building my
compound data type, and (as a simplifying assumption) the fields in the
message are all numeric scalars.

I start off by calling H5Tcreate with the size of the message about to be
processed. Then I loop over the fields in the message, keeping track of the
offset and calling H5Tinsert() with the name of the field and its size. Do
I now create a dataset for that field and write it? The examples that come
with the distribution generally have the dataset being associated with the
entire compound type, which wouldn't be ideal for me.

Thanks.

-Josiah
