VOL Datatype Handling

Dear all,
I was trying to provide an alternative implementation in the VOL to deal
with datatypes.
For testing purposes, it will simply keep the data structures in memory.

In particular, the following functions are problematic:
- VOL_datatype_open()
- VOL_datatype_get()

Internally, the H5T_construct_datatype() function calls datatype_get
to fetch the serialized form.
This requires the datatype to be serialized in a particular form. In
the Dev guide I was reading "8. The Named Datatype Pickle"

I do not completely understand the rationale for differentiating between
datatype_open and datatype_commit and for relying on the serialized form:
Since VOL_datatype_commit() receives the user-space datatype as "hid_t",
I would have expected datatype_open to return a hid_t to a properly
created datatype,
i.e., I could internally create the datatype with end-user functions.
This would give me the opportunity to store it internally in any
format that suits me.

In a primitive in-memory implementation, I believe the following
should work: the datatype_commit() function could store a copy of the
datatype in memory, then return this copy again when datatype_open() is
called.

Looking forward to the discussion.

Thanks & regards,
Julian

···

--
http://wr.informatik.uni-hamburg.de/people/julian_kunkel

Hi Julian,

Apologies for the late reply.

Dear all,
I was trying to provide an alternative implementation in VOL to deal
with the datatypes.

HDF5 datatypes are in-memory objects that are implemented in the HDF5 library. The VOL plugin class for datatypes only deals with named/committed datatypes that are stored in the HDF5 file; it is in no way designed to change how the library implements and uses the datatypes themselves. The VOL class just provides a different way to store the named datatype in the file.
So I don't understand what you are trying to accomplish when you say you want to provide an alternative implementation in the VOL to deal with datatypes. If it's just for storing the datatype in the file, then we can move on to your next questions; otherwise, you need to stop here and realize that the VOL is for nothing more than this :slight_smile:

For testing purposes, it will simply keep the data structures in memory.

In particular, the following functions are problematic:
- VOL_datatype_open()
- VOL_datatype_get()

Internally, the H5T_construct_datatype() function calls datatype_get
to fetch the serialized form.
This requires the datatype to be serialized in a particular form. In
the Dev guide I was reading "8. The Named Datatype Pickle"
https://svn.hdfgroup.org/hdf5doc/trunk/RFCs/HDF5/VOL/developer_guide/main.pdf

I do not completely understand the rationale for differentiating between
datatype_open and datatype_commit and for relying on the serialized form:
Since VOL_datatype_commit() receives the user-space datatype as "hid_t",
I would have expected datatype_open to return a hid_t to a properly
created datatype,
i.e., I could internally create the datatype with end-user functions.
This would give me the opportunity to store it internally in any
format that suits me.

In a primitive in-memory implementation, I believe the following
should work: the datatype_commit() function could store a copy of the
datatype in memory, then return this copy again when datatype_open() is
called.

I still don't understand your use case, but I will try to explain the situation again as I did in the developer guide that you pointed to.
In order for the library to function properly with datatypes, it will always need the NATIVE implementation of HDF5 datatypes, not any other implementation. Again, the VOL only gives you the flexibility to store the native HDF5 datatype in whatever way you want. You could serialize it and store it in a binary blob, or whatever. BUT the library will always need the native (internal) HDF5 H5T_t struct for datatypes to function properly in all the pieces of code for datasets, attributes, etc.
In order to do that, the VOL plugin must preserve a native representation of the datatype, which can be accomplished by serializing and deserializing the datatype struct using H5Tencode/H5Tdecode.

Again, I don't know if I understood what you are aiming to do here, or if I addressed your comments, but there is only so much I can explain over email, since this is a pretty complex issue in the first place.

Thanks,
Mohamad

···

On 4/22/16, 4:03 AM, "Hdf-forum on behalf of Julian Kunkel" <hdf-forum-bounces@lists.hdfgroup.org on behalf of juliankunkel@googlemail.com> wrote:

Looking forward to the discussion.

Thanks & regards,
Julian

--
http://wr.informatik.uni-hamburg.de/people/julian_kunkel

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Dear Mohamad,
thanks for your reply;

Sorry for expressing myself poorly:
Absolutely, the use case you describe is what I have in mind.
I was thinking of storing the committed datatypes in a different way, in
fact potentially in a database and not in any file.
But for the first implementation I wanted to store data only in memory
(for performance testing and backwards compatibility).

I still don't understand your use case here, but I will try to explain the situation here again as I did in the developer guide that you pointed to.
In order for the library to function properly with datatypes, it will always need the NATIVE implementation of HDF5 datatypes, not any other implementation.
Again, the VOL only gives you the flexibility to store the native HDF5 datatype in whatever way you want. You could serialize it and store in a binary blob, or whatever.. BUT the Library will always need the native (internal) HDF5 H5T_t struct for datatypes to function properly in all the pieces of code for datasets, attributes, etc...

Yes, that is absolutely clear, and changing the native datatype
implementation is not what I am trying to accomplish (see the obstacle below).

In order to do that, the VOL plugin must preserve a native representation for the native datatype which can be accomplished by serializing and deserializing the datatype struct using H5Tencode/decode.

Looking at the current implementation in the native plugin:
H5VL_native_datatype_get() calls H5T_encode(), which is defined as a
private function and should not be used by any other VOL
implementation, yet datatype_get is expected to return the serialized form.

I propose to move out this logic and return a completely constructed
datatype as "hid_t".
Since VOL_type_commit() also receives such a type it feels natural
that datatype_get() returns a userspace datatype as hid_t as well.
Since apparently nobody else is interested in this detailed
discussion, we can continue the discussion offline from the list.

Regards,
Julian

···

--
http://wr.informatik.uni-hamburg.de/people/julian_kunkel

Hi Julian,

Looking at the current implementation in the native plugin:
H5VL_native_datatype_get() calls H5T_encode(), which is defined as a
private function and should not be used by any other VOL
implementation, yet datatype_get is expected to return the serialized form.

Right, but you can use the public version of this function:
https://www.hdfgroup.org/HDF5/doc/RM/RM_H5T.html#Datatype-Encode

And the decode:
https://www.hdfgroup.org/HDF5/doc/RM/RM_H5T.html#Datatype-Decode

I propose to move out this logic and return a completely constructed
datatype as "hid_t".

We can't return an hid_t directly from the plugin to the user. The HDF5 library (the higher-level VOL layer) takes whatever structure a plugin returns for an object, wraps it in another structure that records which plugin the object was created with plus a reference count on that object, and then creates the hid_t on that higher-level struct. So we can't have the plugin do all of that.
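
To make the wrapping idea concrete, here is a purely illustrative sketch in plain C. None of these names exist in HDF5: plugin_object_t, wrapped_object_t, wrap_object() and unwrap_object() are hypothetical stand-ins, and the real internal structures contain more than is shown here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for whatever a plugin returns for an object. */
typedef struct {
    const char *name;
} plugin_object_t;

/* Hypothetical library-side wrapper; NOT the real HDF5 internals. */
typedef struct {
    void       *plugin_obj;   /* opaque pointer returned by the plugin */
    const char *plugin_name;  /* which plugin created the object */
    unsigned    ref_count;    /* references held on the object */
} wrapped_object_t;

/* Library side: wrap the plugin's object. In the real library an ID
 * would then be registered on the wrapper, not on the plugin object. */
static wrapped_object_t *wrap_object(void *plugin_obj, const char *plugin_name)
{
    wrapped_object_t *w = malloc(sizeof *w);
    if (w == NULL)
        return NULL;
    w->plugin_obj  = plugin_obj;
    w->plugin_name = plugin_name;
    w->ref_count   = 1;
    return w;
}

/* Drop one reference and free the wrapper when the count reaches zero.
 * Returns the number of references that remain. */
static unsigned unwrap_object(wrapped_object_t *w)
{
    unsigned remaining;

    assert(w != NULL && w->ref_count > 0);
    remaining = --w->ref_count;
    if (remaining == 0)
        free(w);
    return remaining;
}
```

The point of the sketch is only the division of labour: the plugin hands back an opaque pointer, and the layer above owns the bookkeeping and the identifier.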

Since VOL_type_commit() also receives such a type it feels natural
that datatype_get() returns a userspace datatype as hid_t as well.
Since apparently nobody else is interested in this detailed
discussion, we can continue the discussion offline from the list.

Sure, I'm happy to continue this discussion offline, and maybe schedule a quick call to explain the quirks with this :slight_smile:

Thanks,
Mohamad

···

Regards,
Julian

--
http://wr.informatik.uni-hamburg.de/people/julian_kunkel
