Strings in Compound type?

Hi,

  how would one create a compound datatype that consists of two strings?
A loose equivalent in C:

struct Type
{
  char *first, *second;
};

In HDF5, it would be a compound type built from two type IDs, each either
the result of "hid_t H5Tvlen_create( hid_t base_type_id )" or H5T_C_S1, to
be inserted as members "first" and "second" into a

hid_t TypeID = H5Tcreate( H5T_COMPOUND, size_t size );
H5Tinsert( TypeID, "first" , 0 , H5T_C_S1 );
H5Tinsert( TypeID, "second", size_t offset, H5T_C_S1 );

However, what would be the "size" of this compound data type and the
offset of the second member?

It seems this apparently simple approach cannot work. If so, it would be
nice to document that near H5Tcreate() or H5Tinsert().

Alternatively, it should be possible to create an array of strings, using

hsize_t dims[1] = { 2 };
H5Tarray_create( H5T_C_S1, 1, dims);

but this would be less descriptive, as the members of the datatype would not be named.

Any hints or recommendations on what would work best?

Thanks,
  Werner

--
___________________________________________________________________________
Dr. Werner Benger Visualization Research
Laboratory for Creative Arts and Technology (LCAT)
Center for Computation & Technology at Louisiana State University (CCT/LSU)
211 Johnston Hall, Baton Rouge, Louisiana 70803
Tel.: +1 225 578 4809 Fax.: +1 225 578-5362

Hi Werner,

  Nope, having two variable-length (VL) strings as fields in a compound datatype is fine. Just create a variable-length string datatype (tid = H5Tcopy(H5T_C_S1); H5Tset_size(tid, H5T_VARIABLE);) and then insert it for each field at the appropriate offset.

  Quincey

Hi Quincey,

Excellent, that works well! So the size of the compound data type is
the sum of the lengths of both strings, and the offset of the second member
is the length of the first one plus one (for the 0-byte), correct?

The only disadvantage seems to be that it's required to copy both
strings into a contiguous memory location, concatenating both into
one, inserting a 0-byte between them. Is there a way to specify two
pointers instead, similar to the variable length data type? Is it
possible to have an attribute of variable length data type? It seems
functions such as H5Dvlen_reclaim() are only for data sets, not for
attributes?

  Werner

Hi Werner,

Excellent, that works well! So the size of the compound data type is
the sum of the lengths of both strings, and the offset of the second member
is the length of the first one plus one (for the 0-byte), correct?

  No, the size of the compound datatype (in memory) is just the size of the struct, i.e. sizeof(struct Type): two pointers plus any alignment padding.
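
As a small illustration of that point, the size and offsets can be taken straight from the compiler without touching HDF5 at all. This is only a sketch; the helper names compound_size, second_offset and print_layout are made up for the example, and the actual numbers depend on the platform's pointer size and alignment rules.

```c
#include <stddef.h>
#include <stdio.h>

struct Type {
    char *first;
    char *second;
};

/* Value to pass as the size argument of H5Tcreate(H5T_COMPOUND, ...). */
size_t compound_size(void)
{
    return sizeof(struct Type);
}

/* Value to pass as the offset argument of H5Tinsert() for "second". */
size_t second_offset(void)
{
    return offsetof(struct Type, second);
}

/* Prints the layout. The lengths of the strings never enter the
 * calculation, because the struct holds only pointers to them. */
void print_layout(void)
{
    printf("size=%zu first@%zu second@%zu\n",
           compound_size(),
           offsetof(struct Type, first),
           second_offset());
}
```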

The only disadvantage seems to be that it's required to copy both
strings into a contiguous memory location, concatenating both into
one, inserting a 0-byte between them. Is there a way to specify two
pointers instead, similar to the variable length data type?

  Yes, a variable-length string type is explicitly meant for pointers: each field in memory is a char * pointing to its own string.

Is it possible to have an attribute of variable length data type?

  Yes, that works fine.

It seems functions such as H5Dvlen_reclaim() are only for data sets, not for
attributes?

  That function will work fine for buffers from attributes.

  I recommend reading through the test/tvlstr.c code in the distribution.

  Quincey

Hi Quincey,

Okay, using H5T_VARIABLE works fine with pointers, though using variable-length
strings increases the file size by a factor of about two as compared to
fixed-length strings in my case (not a big deal, though).

May I suggest mentioning H5T_VARIABLE in the documentation of
H5Tset_size() at

http://www.hdfgroup.org/HDF5/doc/RM/RM_H5T.html#Datatype-SetSize

One would expect to find it mentioned there, though the tvlstr.c code
demonstrates its usage as well.

Thanks!

  Werner

Hi Werner,

  Ah, yes, definitely! I've filed an issue to correct this.

  Thanks,
    Quincey
