H5T conversion and VLENs

I encountered a problem when attempting to register a conversion function to convert VLENs to an opaque data type. After some digging, I came across this three-year-old thread

http://hdf-forum.184993.n3.nabble.com/hdf-forum-VLEN-conversion-problems-td193957.html

which describes my problem exactly. To sum up, the memory buffer passed to my conversion function appears to point to heap ids rather than to valid memory. Also, and this was not mentioned in the older thread, the buf_stride parameter is set to zero. Supposedly a bug report was filed (where?).
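For concreteness, the callback in question has this shape (just a sketch; the name vlen2opaque and the registration line are mine, not anything from the library):

static herr_t vlen2opaque(hid_t src_id, hid_t dst_id, H5T_cdata_t *cdata, size_t nelmts, size_t buf_stride, size_t bkg_stride, void *buf, void *bkg, hid_t dxpl)
{
     switch(cdata->command)
     {
     case H5T_CONV_INIT:
         //accept only vlen -> opaque
         return (H5Tget_class(src_id)==H5T_VLEN && H5Tget_class(dst_id)==H5T_OPAQUE) ? 0 : -1;
     case H5T_CONV_CONV:
         //Expected: buf holds nelmts packed hvl_t elements.
         //Observed: buf appears to hold internal heap ids, and buf_stride arrives as 0.
         return -1;
     case H5T_CONV_FREE:
         return 0;
     default:
         return -1;
     }
}
//registered with, e.g.: H5Tregister(H5T_PERS_SOFT, "vlen->opaque", some_vlen_type, some_opaque_type, vlen2opaque);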

1) Has any progress been made on this issue?
2) Is there another way around this problem?

One idea that occurred to me: it is conceivable that under normal operation a built-in conversion function is called to convert from the heap representation of VLENs to the memory representation. If so, is there a way to call that function from my conversion?

Thanks,
Jason

Jason,

Could you explain more about your intention? Why do you want to convert VLEN to opaque data? Thanks.

Ray

Sure.

Actually, my intention is, for all intents and purposes, identical to the old poster's. I need to read from stored VLENs and write to an object that has a different memory layout. (It's less critical, but it would also be nice to go the other way. I haven't tried that yet.) I'm using an opaque datatype for the memory representation because that seemed to be the best way to describe the data as far as HDF5 was concerned. Basically, all I'm really trying to say is "Convert from a VLEN to this 'other' type of data, and I'll tell you how to do it."

Specifically, I'm poking around at updating the LabVIEW <-> HDF5 library I wrote several years back. The natural representation of the HDF5 VLEN is a LabVIEW array (nearly all arrays in LabVIEW are variable length). However, the representations of the two are different. As you know, a dataset containing HDF5 VLENs looks like this in memory:

len 0
pointer 0 -> {data 0,0, data 0,1, ...}
len 1
pointer 1 -> {data 1,0, data 1,1, ...}
...

However, LabVIEW's data looks (for the purposes of this discussion, anyway) like this:

pointer 0 -> {len 0, possible padding, data 0,0, data 0,1, ...}
pointer 1 -> {len 1, possible padding, data 1,0, data 1,1, ...}
...
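In struct terms (hvl_t is HDF5's public type from H5Tpublic.h; the LabVIEW side is my sketch of the handle layout above):

//HDF5's in-memory representation of one VLEN element:
typedef struct {
     size_t len; //number of base-type elements
     void *p;    //points at the elements themselves
} hvl_t;

//A LabVIEW 1-D array, roughly: a handle (pointer to a pointer) to a block
//that starts with an int32 length, possible padding, then the data.
struct LVArray {
     int32_t dimSize;
     //possible padding, then element data, aligned per the element type
};
//a LabVIEW array "handle" is then LVArray**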

So, I create an opaque type to represent the LabVIEW pointer and register a conversion function from the VLEN datatype to the opaque type. The theory was that when the conversion function got called it would allocate the LabVIEW arrays and convert the buffer from the vector of hvl_ts to a vector of LabVIEW pointers.
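The setup looks roughly like this (a sketch; the tag string and the function name vlen2lv are placeholders):

//Describe one LabVIEW handle to HDF5 as an opaque type the size of a pointer.
hid_t lv_type = H5Tcreate(H5T_OPAQUE, sizeof(void*));
H5Tset_tag(lv_type, "lv_handle");

//Soft registration: only the classes (H5T_VLEN -> H5T_OPAQUE) matter,
//so any example vlen type will do here.
hid_t example_vlen = H5Tvlen_create(H5T_NATIVE_DOUBLE);
H5Tregister(H5T_PERS_SOFT, "vlen->lv", example_vlen, lv_type, vlen2lv);
H5Tclose(example_vlen);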

I hadn't exactly figured out what would happen next. For atomic base types one could simply copy the data; it's more complicated if the base type is compound or something like that. In any case, it didn't matter, because the buffer passed to the conversion function didn't contain the correct (or at least expected) data when the function was called.

Obviously one could call the read (e.g. H5Dread, H5Aread) function and then after the fact convert from the hvl_t dataset to the LabVIEW dataset. However, that has numerous downsides, e.g. what if this VLEN is a member of a compound several levels down? How do I handle the multiple calls (attributes and datasets, primarily) which could require this conversion? How do I efficiently handle LabVIEW memory allocation?

Thanks,
Jason

Hi Jason,
  Hmm, so you are registering your datatype conversion routine (with H5Tregister) and then calling H5Dread on your dataset, expecting to get the opaque values in your buffer?

  Quincey

Well, let me be precise.

1) I expect that after the call to H5Dread completes, yes, the opaque (at least opaque to HDF5) values will be in the buffer passed to H5Dread.

2a) I expected that the buffer* parameter as passed to my conversion routine would contain a vector of hvl_ts.
2b) My conversion routine would modify this buffer into a vector of opaques.

The part that is currently problematic is that the buffer passed to the conversion routine does not contain a vector of hvl_ts but rather a vector of some sort of internal HDF5 types.

*Not necessarily the same as the buffer parameter passed to H5Dread.

Jason

Hi Jason,

  Yes, I agree with you, sending a buffer full of internal info to the type conversion framework isn't the right behavior for the library.

  I've filed an issue for this, and we'll try to tackle it when possible. (A well-written, well-tested patch addressing it would be welcome also! :-)

  Quincey

Hi Jason & Quincey,

The part that is currently problematic is that the buffer passed to the conversion routine does not have a vector of hvl_ts but rather a vector of some sort of internal hdf5 types.

I'm the original person who asked about this (I'm the main author of h5py). We have to convert from HDF5 vlen strings to an opaque object (a Python string). Since h5py has to deal with this behavior in released versions of HDF5, we implemented a workaround, which I briefly described in that thread:

1. Read from the dataset selection into a contiguous conversion buffer with exactly the same type as the dataset.
2. Call H5Tconvert to go from the dataset type to your destination type. The correct data is supplied to the custom converter when you do this (for some reason).
3. Scatter the converted points from the buffer to your memory destination (sketched below).
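Roughly, in code (a sketch with error checking omitted; dset, file_space, mem_type and friends are placeholder names):

hid_t file_type = H5Dget_type(dset);                  /* read with no conversion */
size_t src_size = H5Tget_size(file_type);
size_t dst_size = H5Tget_size(mem_type);              /* the registered opaque type */
size_t elt_size = src_size > dst_size ? src_size : dst_size;
hsize_t npoints = (hsize_t)H5Sget_select_npoints(file_space);

void *conv_buf = malloc((size_t)npoints * elt_size);  /* room for either representation */
hid_t buf_space = H5Screate_simple(1, &npoints, NULL);

/* 1. memory type == dataset type, so no custom conversion happens here */
H5Dread(dset, file_type, buf_space, file_space, H5P_DEFAULT, conv_buf);

/* 2. in-place conversion; this time the custom converter sees real hvl_t data */
H5Tconvert(file_type, mem_type, (size_t)npoints, conv_buf, NULL, H5P_DEFAULT);

/* 3. scatter conv_buf's packed elements to the real destination */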

This is kind of annoying because the gather/scatter process is time-intensive, and getting everything correct w.r.t. backing buffers, etc. is a real headache. You can see our implementation here (in Cython):

https://code.google.com/p/h5py/source/browse/h5py/_proxy.pyx (starts at line 102)

I agree it would be great if this were fixed (in HDF5 1.10?). Along with identifier recycling, this is one of the biggest sources of pain in the h5py codebase.

Parenthetically, are you aware of h5labview (http://sourceforge.net/p/h5labview)? Maybe you and Martijn could join forces.

Andrew

Alright, I've gotten the kinks worked out. I'm going to post the code relevant to the discussion below, but let me sum up in English first, so that you don't have to sort it out for yourself.

It seems that internally there are two different types for vlens, one for the file representation and one for the memory representation, which I will creatively proceed to call "file-type vlen" and "memory-type vlen". Users can only create memory-type vlen ids. When the library is called upon to convert from a vlen, it passes the file-type id to the conversion function.

(This also explains why you can't register a hard conversion function on a VLEN and have it work. There is no way that I know of for a user to get a file-type vlen id outside of the conversion function itself, so you can't register a hard conversion path for vlen-to-whatever. Soft conversion functions still work, since only the class is inspected at registration time.)

So, the basic approach is to use HDF5's built-in conversion function to convert from the file-type vlen to the memory-type vlen inside our custom conversion. We'll create some private data that we'll store in cdata->priv during init to help things along. I'll call this location pState to show what is being held there.

In H5T_CONV_INIT we do the following:

1) Get a memory-type vlen type id. Interestingly, we can do this by calling pState->mem_id=H5Tcopy(src_id). (Even more interestingly, H5Tequal(pState->mem_id, src_id) will return false, while H5Tequal(pState->mem_id, <dataset datatype>) will return true.)

2) Call pState->fn = H5Tfind(src_id, pState->mem_id, &pState->cdata) to get the conversion function

In H5T_CONV_CONV we:

1) Set up custom memory allocators with H5Pset_vlen_mem_manager.
Quincey, because I know what type I'm converting to at this point, I can create the correct memory allocator routine.
If the vlen conversion code were actually working (passing memory-type and not file-type vlens to this call), I'd be screwed, because I would not have a chance to modify the allocator based on the data type.

2) Call pState->fn(src_id, pState->mem_id, cdata.command=H5T_CONV_CONV, ...)
This converts the file-type vlens to memory-type vlens using the custom allocator set up in step 1.

3) Iterate through what are now hvl_ts and convert them to the proper "opaque" format.

And of course we clean up all our allocations in H5T_CONV_FREE.

Quincey, one other thing. I'm using the opaque tag to hold the string representation of the hex value of the memory datatype of the contained data. This seems a little hokey. Perhaps we could get void* tags added to opaques? Callbacks on creation and deletion of the opaque type would also be awesome, but that might be getting a little too specialized to my use for the library.

Actual code below if you want to see the details...

#include <cstdio>    // sscanf
#include <cstdint>   // int32_t
#include <cstdlib>   // free
#include <cstring>   // memcpy, memset
#include <new>       // std::nothrow
#include <map>
#include "hdf5.h"
#include "extcode.h" // LabVIEW's external-code header: UHandle, DSNewHandle, DSDisposeHandle

struct HDF5Vlen2LVConverterState
{
     hid_t mem_id; ///<- The memory version of the vlen
     H5T_conv_t chain_fn; ///<- The file-to-memory converter function
     H5T_cdata_t *pchain_cdata; ///<- The cdata for the file-to-memory vlen converter
     hid_t elt_id; ///<- The hid_t id of the destination element
     size_t offset; ///<- The offset from the beginning of the handle to the first element of data
};

typedef std::map<void*, UHandle> PtrHandleMap;

struct H5NewData
{
     PtrHandleMap *pPtrHdlMap;
     size_t offset;
     bool dofree;
};

void* H5NewHandle(size_t len, H5NewData* pdata)
{
     UHandle hdl=DSNewHandle(len+pdata->offset);
     if(!hdl)
         return NULL;
     memset(*hdl, 0, pdata->offset);
     void* data = (void*)((char*)(*hdl)+pdata->offset);
     pdata->pPtrHdlMap->insert(PtrHandleMap::value_type(data, hdl));
     return data;
}

void H5FreeHandle(void* buffer, H5NewData* pdata)
{
     if(pdata->dofree)
     {
         PtrHandleMap::iterator iter=pdata->pPtrHdlMap->find(buffer);
         if(iter!=pdata->pPtrHdlMap->end())
         {
             UHandle hdl = iter->second;
             DSDisposeHandle(hdl);
         }
     }
}

/* Returns the required alignment of the type inside
   a LabVIEW compound, handle, or array */
size_t GetLVAlignmentForH5T(hid_t htype, size_t pack=0xFFFFFFFF);

herr_t HDF5Vlen2LVConversion(hid_t src_id, hid_t dst_id, H5T_cdata_t *cdata, size_t nelmts, size_t buf_stride, size_t bkg_stride, void* buffer, void* bkg_buffer, hid_t dset_xfer_plist)
{
     switch(cdata->command)
     {
     case H5T_CONV_INIT: {
         if(H5Tget_class(dst_id)==H5T_OPAQUE)
         {
             if(H5Tget_size(dst_id)==sizeof(void*))
             {
                 hid_t mem_id=-1;
                 char* tag=NULL; //declared up front so the gotos below don't jump over an initialization
                 HDF5Vlen2LVConverterState *pState=NULL;

                 mem_id=H5Tcopy(src_id);
                 if(mem_id<0) goto errout; //H5Tcopy returns a negative id on failure
                 pState = new(std::nothrow) HDF5Vlen2LVConverterState(); //value-initialized, so members start at zero
                 if(!pState)
                 {
                     //H5Epush_sim is a local error-reporting helper, not an HDF5 API
                     H5Epush_sim("HDF5Vlen2LVConversion", H5E_ERR_CLS, H5E_RESOURCE, H5E_CANTALLOC, NULL);
                     goto errout;
                 }
                 pState->mem_id = mem_id;
                 pState->chain_fn = H5Tfind(src_id, mem_id, &pState->pchain_cdata);
                 if(!pState->chain_fn)
                 {
                     H5Epush_sim("HDF5Vlen2LVConversion", H5E_ERR_CLS, H5E_DATATYPE, H5E_CANTCONVERT, "Can't locate disk vlen to memory vlen converter");
                     goto errout;
                 }
                 tag = H5Tget_tag(dst_id);
                 //tag is the hex string of the element type id
                 sscanf(tag, "%x", (unsigned int*)&pState->elt_id);
                 free(tag); //H5Tget_tag returns an allocated copy that the caller must free
                 if(pState->elt_id)
                 {
                     size_t alignment = GetLVAlignmentForH5T(pState->elt_id);
                     if(sizeof(int32_t)%alignment != 0)
                         pState->offset=alignment; //pad the int32 length up to the element alignment
                     else
                         pState->offset=sizeof(int32_t);
                 }
                 else
                 {
                     //This should only happen during registration
                     pState->offset=4;
                 }
                 cdata->priv=(void*)pState;
                 return 0;
errout:
                 if(mem_id>=0)
                     H5Tclose(mem_id);
                 if(pState)
                     delete pState;
                 return -1;
             }
         }
         return -1;
         }

     case H5T_CONV_FREE:
         if(cdata->priv)
         {
             HDF5Vlen2LVConverterState *pState=(HDF5Vlen2LVConverterState*)cdata->priv;
             if(pState->mem_id>=0)
                 H5Tclose(pState->mem_id);
             delete pState;
         }
         return 0;

     case H5T_CONV_CONV: {
         // First, set up vlen allocators
         PtrHandleMap handleMap;
         HDF5Vlen2LVConverterState* pState = (HDF5Vlen2LVConverterState*)cdata->priv;
         H5NewData newdata = {&handleMap, pState->offset, false};
         hid_t chain_xfer_plist = H5Pcopy(dset_xfer_plist);
         if(chain_xfer_plist<0)
             return -1;
         herr_t err=H5Pset_vlen_mem_manager(chain_xfer_plist, (H5MM_allocate_t)H5NewHandle, &newdata, (H5MM_free_t)H5FreeHandle, &newdata);
         if(err<0)
         {
             H5Pclose(chain_xfer_plist); //don't leak the copied plist on error
             return err;
         }

         //This will convert the file vlens into memory vlens, allocating memory with DSNewHandle. However, we'll still have to fix up the handles
         H5T_cdata_t chain_cdata;
         memcpy(&chain_cdata, pState->pchain_cdata, sizeof(chain_cdata));
         chain_cdata.command = cdata->command;
         err=pState->chain_fn(src_id, pState->mem_id, &chain_cdata, nelmts, buf_stride, bkg_stride, buffer, bkg_buffer, chain_xfer_plist);
         if(err<0)
         {
             H5Pclose(chain_xfer_plist);
             return err;
         }

         //Fix up handles, i.e. change pointers to handles
         if(buf_stride==0)
             buf_stride=H5Tget_size(src_id); //work around the buf_stride==0 bug in HDF5
         void *src, *dst;
         src=dst=buffer;
         for(size_t i=0; i<nelmts; ++i)
         {
             UHandle hdl=handleMap[((hvl_t*)src)->p];
             *((int32_t*)(*hdl)) = (int32_t)((hvl_t*)src)->len; //Set the LV 1-D array length
             *((UHandle*)dst)=hdl;
             //source and destination elements are walked in place at the same stride
             src=(char*)src+buf_stride;
             dst=(char*)dst+buf_stride;
         }
         H5Pclose(chain_xfer_plist);
         return 0;
     }
     default:
         return -1;
     }
}

Hi Andrew,
  Excellent, I'm glad it's something that'll help h5py. BTW, we're also planning a new API routine to perform in-memory scatter/gather operations between two memory buffers. (But you might not need that if you don't have to work around this issue :-)

  Quincey

Hi Andrew and Quincey,

Andrew, thanks for your input.

Where do you do these operations (read to a conversion buffer, etc.)? Do you do this as a separate call, e.g.

a) H5Dread returns buffer followed by n calls to H5Tconvert

or do you do it inside a custom conversion function

b) H5Dread calls custom conversion which calls H5Tconvert

BTW, I've got another workaround in progress. I'll post details once I've gotten all the kinks worked out, but a hint is that I can convince HDF5 to convert from the "file-type" vlen that we've encountered to a "memory-type" vlen (i.e. hvl_t) by calling H5Tconvert inside my custom conversion function.

Quincey,

Something to think about in planning for the next revision. As will become apparent once I post my workaround, the reason it works is that I'm able to adjust my vlen memory allocation routine after I know what the destination type is, but before the vlen is read into memory. Effectively, I'm adjusting the way my vlen memory allocator behaves based on the datatype of the vlen element. This would be much simpler if that information were simply passed to the allocation routine in the first place, e.g. if src_id and dst_id were passed to the allocation routine. (Or, alternatively, and perhaps more efficiently, if one could register different allocators depending on the explicit conversion taking place.)
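For illustration, something like this (a hypothetical signature, not an existing HDF5 API; the current H5MM_allocate_t is void *(*)(size_t, void*)):

//hypothetical: let the allocator see the datatype it is allocating for
typedef void *(*H5MM_allocate_typed_t)(size_t size, hid_t elem_type, void *alloc_info);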

Jason

Hi,

Where do you do these operations (read to a conversion buffer, etc.)? Do you do this as a separate call, e.g.

a) H5Dread returns buffer followed by n calls to H5Tconvert

or do you do it inside a custom conversion function

b) H5Dread calls custom conversion which calls H5Tconvert

It's close to (a), but even simpler than that. First I do an H5Dread into a contiguous buffer with the same datatype as the dataset; this seems to get rid of the odd internal structures. Then I convert the entire buffer in-place from the dataset type to the (opaque) memory type with a single call to H5Tconvert (you can specify the number of elements to be converted). As part of the normal course of conversion, H5Tconvert calls my custom conversion function. Then I scatter the points from the conversion buffer to the destination in memory. I use H5Diterate to do the scattering and gathering.
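The scatter step looks roughly like this (a sketch; scatter_info and scatter_op are names I've made up, H5Diterate is the real API):

typedef struct {
    const char *src; /* walks the packed conversion buffer */
    size_t elt;      /* size of one destination element */
} scatter_info;

static herr_t scatter_op(void *elem, hid_t type_id, unsigned ndim, const hsize_t *point, void *op_data)
{
    scatter_info *si = (scatter_info*)op_data;
    memcpy(elem, si->src, si->elt); /* elem points into the destination buffer */
    si->src += si->elt;
    return 0;
}

/* usage: scatter_info si = { conv_buf, H5Tget_size(mem_type) };
   H5Diterate(dest_buf, mem_type, dest_space, scatter_op, &si); */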

This has the advantage that my custom conversion callback can be written as normal. All the workarounds are in the read/write layer, not the conversion layer.

With regard to your followup post... I do have a hard conversion path registered from vlen string to Python string, and with this system it's called as normal. I have not tried generic vlen data myself, but one of my users just contributed a patch which does more or less the same thing with hvl_t's; I don't know whether this has to be registered as hard or soft to work.

Andrew

Hi Andrew,

Thanks for the clarification. Thank you also for the introduction to H5Diterate. I had never noticed that function before.

In the most general case, your solution will not work for me, at least without some level of recursion, I think. This goes back to needing different allocators depending on the type of data being converted. So, for instance, if I have a vlen of vlens of doubles, the memory allocator I need for the inner vlen is different from the allocator I need for the outer. You can only set one allocator for any given convert (or read) call, so the implicit recursive conversion of the inner vlen would end up calling the wrong allocator. If the allocators knew which datatype they were allocating for, this would not be an issue. One might also be able to do something really whacked out, like predicting the order in which the different allocators would be called for a complete conversion (e.g. outer allocator, inner, inner, inner, out, 6*in, out, 3*in, or whatever) and passing that information into a higher-level allocator, but that seems really nasty.
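To make the vlen-of-vlens case concrete (a sketch; my_alloc/my_free and their info pointers are placeholders):

//A vlen of vlens of doubles:
hid_t inner = H5Tvlen_create(H5T_NATIVE_DOUBLE);
hid_t outer = H5Tvlen_create(inner);

//Only one allocator pair can be installed per transfer:
hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
H5Pset_vlen_mem_manager(dxpl, my_alloc, my_alloc_info, my_free, my_free_info);
//During a read of "outer", my_alloc is called for the outer sequence and for
//every inner sequence alike, with nothing telling it which type it is serving.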

If I understand correctly, it also has the drawback of requiring an extra buffer, namely the one that you allocate before calling H5Dread.

On the plus side, your solution, unlike my solution, is not dependent on a bug in the library!

Thanks for the input.

Jason
