size of variable length string

Quick question, is there a way to find the length of a variable length
string without first reading it?

Thanks,
Ken

Yes. But the procedure you'd use depends a bit on what kind of object
you are talking about (dataset or attribute) and how you chose to define
the object. Did you use H5T_C_S1 as the datatype for the string? Did you
maybe just define a dataset/attribute of type H5T_NATIVE_CHAR and then
a 1D dataspace of some size? I think those choices affect how you'd go
about obtaining the length.

A general algorithm might look like...

     1. whether it's a dataset or attribute, you need to open it first
        (H5Dopen/H5Aopen)
     2. then get its datatype (H5Dget_type/H5Aget_type)
     3. then get the size of that datatype (H5Tget_size)
     4. then get the size of the dataspace (number of elements)
     5. multiply the results from steps 3 and 4 to get the total size.

Mark

On Wed, 2010-06-02 at 18:55, Ken Sullivan wrote:

Quick question, is there a way to find the length of a variable length
string without first reading it?

Thanks,
Ken

--
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
miller86@llnl.gov urgent: miller86@pager.llnl.gov
T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-851

Hi Mark,

On Jun 2, 2010, at 9:36 PM, Mark Miller wrote:

> A general algorithm might look like...
>
> 1. whether it's a dataset or attribute, you need to open it first
> (H5Dopen/H5Aopen)
> 2. then get its datatype (H5Dget_type/H5Aget_type)
> 3. then get the size of that datatype (H5Tget_size)
> 4. Then get the size of the dataspace
> 5. Use results from steps 3 and 4 to determine total size.

  Actually, I think that Ken is trying to determine the actual string length for an individual element of a dataset or attribute, and he's correct that there's no way to retrieve the length before reading it in. The method you describe above is correct for determining the size of the array an application will need to allocate for reading the strings, but for variable-length strings the size returned from H5Tget_size() should be the same as sizeof(char *) (for variable-length sequences of another datatype, it should be sizeof(hvl_t)). The memory for the actual strings themselves will be allocated at read time (and the malloc/free routines used can be controlled with H5Pset_vlen_mem_manager).

  Quincey
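
To make the distinction concrete, here is a plain-C sketch of the memory layout only. It makes no HDF5 calls: `fake_read` is a hypothetical stand-in for H5Dread() on a variable-length string type, and the element count stands in for what the dataspace query would return. The read buffer holds one `char *` per element, so its size says nothing about the string lengths; those only become available through strlen() once the buffer has been filled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Bytes the application allocates up front: one pointer per element.
 * This mirrors "datatype size x number of elements" for a
 * variable-length string type, where the datatype size is sizeof(char *). */
size_t read_buffer_bytes(size_t n_elems)
{
    return n_elems * sizeof(char *);
}

/* Hypothetical stand-in for the library's read: allocates each string
 * and stores its pointer in the caller's buffer. Returns 0 on success. */
int fake_read(char **buf, const char *const *stored, size_t n_elems)
{
    for (size_t i = 0; i < n_elems; i++) {
        size_t len = strlen(stored[i]);
        buf[i] = malloc(len + 1);
        if (buf[i] == NULL) {
            /* Undo the allocations made so far in this sketch. */
            while (i > 0)
                free(buf[--i]);
            return -1;
        }
        memcpy(buf[i], stored[i], len + 1);
    }
    return 0;
}

/* Length of one element, known only once the buffer has been filled. */
size_t element_length(char *const *buf, size_t index)
{
    assert(buf[index] != NULL);
    return strlen(buf[index]);
}

/* Release the per-element strings. */
void release_strings(char **buf, size_t n_elems)
{
    for (size_t i = 0; i < n_elems; i++) {
        free(buf[i]);
        buf[i] = NULL;
    }
}
```

In real code the fill step would be the library read with a variable-length string memory type, and `release_strings` corresponds to handing the library-allocated memory back (for example with H5Dvlen_reclaim()).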

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

Ah, I see. Well, that is definitely different from what I was thinking
he was asking. And, requires a different answer than I gave. :wink:

Mark

Hi Quincey,

On Thu, 03 Jun 2010 06:39:27 -0500, Quincey Koziol <koziol@hdfgroup.org> wrote:

> The memory for the actual strings themselves will be allocated at read
> time (and the malloc/free routines used can be controlled with
> H5Pset_vlen_mem_manager).

Is the H5Dread() routine robust against H5Pset_vlen_mem_manager's allocate function returning NULL?

If so, it might be possible to provide an allocation function

      typedef void *(*H5MM_allocate_t)(size_t size, void *alloc_info) ;

that divides the size parameter by H5Tget_size( H5Dget_type() ) to get
the number of elements that would be allocated for a specific element
(if that one is sufficiently simple).
This function would also need to keep count of which element within the
dataspace is currently being read, so there would need to be one
allocate function per dataspace to stay in sync.

But this would only work if H5Dread() deals with null pointers by just
skipping the read rather than exiting with an error. I haven't tried
this yet.

  Werner

--
___________________________________________________________________________
Dr. Werner Benger Visualization Research
Laboratory for Creative Arts and Technology (LCAT)
Center for Computation & Technology at Louisiana State University (CCT/LSU)
211 Johnston Hall, Baton Rouge, Louisiana 70803
Tel.: +1 225 578 4809 Fax.: +1 225 578-5362

Hi Werner,

On Jun 3, 2010, at 8:46 AM, Werner Benger wrote:

> Is the H5Dread() routine robust against H5Pset_vlen_mem_manager's allocate function returning NULL?

  Yes, I'm confident that it is. Getting NULL back from the allocate function will cause the H5Dread() call to fail and return an error. It won't release the memory from a partially completed I/O, however.

> If so, it might be possible to provide an allocation function
>
>     typedef void *(*H5MM_allocate_t)(size_t size, void *alloc_info) ;
>
> that divides the size parameter by H5Tget_size( H5Dget_type() ) to get
> the number of elements that would be allocated for a specific element
> (if that one is sufficiently simple).
> This function would also need to count which element within the dataspace
> it is just reading, so needs to be one allocate-function per dataspace
> to be in sync.

  Yes, I thought about suggesting this, but decided it was pretty hackish. :slight_smile:

  Quincey
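
Since a NULL return makes H5Dread() fail, one variant of Werner's idea is an allocator that does allocate but also records every size it is asked for. The sketch below is plain C with no HDF5 calls. The callback type repeats the signature quoted above; `size_record`, `recording_alloc` and `simulate_read` are hypothetical names, and the assumption that the library requests exactly one allocation per element, in element order, is not verified here. It does not avoid the read; it only shows how the sizes could be captured while the read happens.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Same shape as the H5MM_allocate_t signature quoted above. */
typedef void *(*alloc_fn)(size_t size, void *alloc_info);

/* Hypothetical bookkeeping handed to the callback through alloc_info. */
#define SIZE_RECORD_CAPACITY 64
typedef struct {
    size_t sizes[SIZE_RECORD_CAPACITY]; /* bytes requested per call */
    size_t count;                       /* number of calls seen so far */
} size_record;

/* Allocates normally, but remembers how many bytes each call asked for. */
void *recording_alloc(size_t size, void *alloc_info)
{
    size_record *rec = alloc_info;
    assert(rec != NULL);
    if (rec->count < SIZE_RECORD_CAPACITY)
        rec->sizes[rec->count] = size;
    rec->count++;
    return malloc(size);
}

/* Hypothetical stand-in for the library: one allocation per element.
 * On failure, earlier allocations are left in place for the caller,
 * in line with the remark above about partially completed I/O. */
int simulate_read(alloc_fn alloc, void *alloc_info,
                  const size_t *elem_bytes, size_t n_elems, void **out)
{
    for (size_t i = 0; i < n_elems; i++) {
        out[i] = alloc(elem_bytes[i], alloc_info);
        if (out[i] == NULL)
            return -1; /* a NULL return is treated as an error */
    }
    return 0;
}
```

With HDF5, a callback of this shape would be registered through H5Pset_vlen_mem_manager() on the transfer property list. Whether the size passed for a string includes a terminating NUL is not stated in this thread and would need checking before relying on the recorded values.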

On Jun 3, 2010, at 9:23 AM, Mark Miller wrote:

> Ah, I see. Well, that is definitely different from what I was thinking
> he was asking. And, requires a different answer than I gave. :wink:

  Well, now he knows both pieces of information. :slight_smile:

    Quincey

Thanks all for the help, and yes I was looking to find a string element
length. I've been using fixed-length strings previously, and so have been
passing in buffers allocated in my code to H5A/Dread(). It seemed the least
perturbation to my code now that I'm writing/reading variable length strings
would be to just copy the string from the HDF allocated string to my
pre-allocated buffer, and immediately reclaim the HDF memory. Anyhow, at
read time I can just check if it's variable-length or not, and do my own
alloc either before or after the read depending.

-Ken
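
The copy step in this plan could be sketched as below. This is plain C with no HDF5 calls; `copy_to_fixed` is a hypothetical helper, and the closing comment marks where the read and reclaim would sit. It copies a library-allocated string into a caller-owned fixed buffer, truncating if needed, and reports the full length so the caller can detect truncation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst (capacity dst_cap bytes, including the terminator).
 * Always NUL-terminates when dst_cap > 0. Returns strlen(src), so a
 * return value >= dst_cap means the copy was truncated. */
size_t copy_to_fixed(char *dst, size_t dst_cap, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);
    if (dst_cap > 0) {
        size_t n = len < dst_cap - 1 ? len : dst_cap - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}

/* Intended call order in the variable-length case:
 *   1. read the element into a char * that the library fills in
 *   2. copy_to_fixed(my_buffer, sizeof my_buffer, that_pointer)
 *   3. hand the library-allocated memory back (reclaim)
 * In the fixed-length case the read goes straight into my_buffer. */
```

With HDF5, H5Tis_variable_str() on the datatype is one way to tell the two cases apart at read time.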

On Thu, Jun 3, 2010 at 7:38 AM, Quincey Koziol <koziol@hdfgroup.org> wrote:

On Jun 3, 2010, at 9:23 AM, Mark Miller wrote:

> Ah, I see. Well, that is definitely different from what I was thinking
> he was asking. And, requires a different answer than I gave. :wink:

        Well, now he knows both pieces of information. :slight_smile:

               Quincey
