only read the length of variable-length array without reading the elements

Hi Everyone,

Is it possible to do this?

I have an array containing Nvl variable length arrays already written to a
hdf5 file. I can read in the entire data by (e.g., in c++)

hvl_t vl[Nvl];
dset.read(vl, VarLenType(&PredType::NATIVE_INT));

However, if I only want to read the length of each variable-length object,
i.e., vl[].len, how can I do it without reading in the data vl[].p as well?
Thanks!

Jiaxin

Hi,

Personal experience here, but I think you'll come to the same conclusion if
you go through the mailing lists long enough: vlens are not first-class
citizens in HDF. There are all kinds of exceptions and requirements for
using them that don't apply to other types.

But what are you going to do, not have variable length data? Sheesh, right?

So you need to do something that some dub "linearization": keep the counts
of the variable-length lists in one dataset, and keep all of those lists
concatenated, in the same order, in another dataset. Take an exclusive
prefix sum (a variant of the cumulative sum) of the counts and you'll have
the starting index of each list.
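
To make the exclusive prefix sum concrete, here is a minimal sketch in plain C++ with no HDF5 calls. The function name exclusive_prefix_sum and the variable names are made up for illustration; the counts would be whatever was read from the counts dataset.

```cpp
#include <cstddef>
#include <vector>

// Exclusive prefix sum: offsets[i] = counts[0] + ... + counts[i-1].
// offsets[0] is always 0, and offsets has the same size as counts.
// (Hypothetical helper name, not part of any HDF5 API.)
std::vector<std::size_t> exclusive_prefix_sum(const std::vector<std::size_t>& counts) {
    std::vector<std::size_t> offsets(counts.size());
    std::size_t running = 0;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        offsets[i] = running;   // start of list i in the concatenated dataset
        running += counts[i];   // the next list starts after this one
    }
    return offsets;
}
```

For counts {2, 0, 3} this gives offsets {0, 2, 2}: list i occupies elements offsets[i] up to (but not including) offsets[i] + counts[i] of the concatenated dataset.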

It's a little awkward - but it works, is supported in all the
bindings/languages, and filters work on it - all of which can't be said for
vlen types.

Of course, this isn't exactly what you asked, but I'll quickly point out
that with the above approach you'd have a dataset of the lengths that you
could load directly, if the file layout is something you control.

-Jason

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Thank you Jason!

Actually, my previous implementation was as you outlined: three flat arrays
holding the data, the lengths, and the offsets separately. Later I was
attracted to vlen arrays because I can easily view them in HDFView and can
load them as record arrays in h5py, which is more user-friendly when I
inspect the data. Saving them as separate arrays requires some wrapping
later on to get convenient access.

Since this post has probably been held for quite a while by the moderation
process, I have meanwhile found a work-around, which may not be optimal:

hsize_t dim[1];
VarLenType vl_t(&PredType::NATIVE_INT);
DataSpace dspace = dset.getSpace();
dspace.getSimpleExtentDims(dim);
hsize_t count[] = {1}, offset[] = {0}, stride[] = {1}, block[] = {1};
// Select one element at a time; its vlen buffer size divided by the
// base type size is the length of that variable-length object.
for (offset[0] = 0; offset[0] < dim[0]; offset[0]++) {
      dspace.selectHyperslab(H5S_SELECT_SET, count, offset, stride, block);
      cout << dset.getVlenBufSize(vl_t, dspace) / vl_t.getSuper().getSize() << " ";
}

Jiaxin
