HDF5 C++ API: Finding the datatype of a DataSet?

On Wednesday 30 January 2008, you wrote:

Hello Francesc,

Thanks for your input.
So basically that would mean that whatever format we use while writing
data, native char, double, int or string (e.g. write(d0,
PredType::NATIVE_INT, ...)), it is all broadly stored as an
H5T_INTEGER, H5T_FLOAT or H5T_STRING type, and HDF5 provides no exact
way of identifying the low-level datatype (whether it is an int,
double, float, string or char), except that by analysing the element
size you can come to a conclusion about which datatype it might be.
However, the size of a datatype is really platform-specific data. The
only issue is when a source needs to be supported on multiple
platforms (you will probably end up with a lot of #ifdefs for each
platform).

Yes, I agree that this is platform-dependent. My solution has been to
use fixed-size datatypes in memory, so that you don't have to worry
about platform dependency. This comes for free when using NumPy, a
package for dealing with multidimensional data in Python; for C++ you
would have to find an alternative solution. In any case, on all the
platforms that I'm aware of, a float takes 4 bytes and a double takes
8 bytes. And when I said 'most platforms' I wanted to exclude more
exotic architectures (Cray or other kinds of supercomputers); if you
are not using them, then I think you are safe assuming the 4/8 sizes
for float/double.

Earlier, I had a question on writing/reading string datasets. Would
it be OK if you could provide me some suggestions on this?

I'm sorry, but I'm more used to the C interface to HDF5, so I'm not sure
about what's happening in this case. Somebody more knowledgeable will
have to help you here.

Regards,


--

>0,0<  Francesc Altet     http://www.carabos.com/
 V  V  Cárabos Coop. V.   Enjoy Data
  "-"

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.

Hello Francesc,

I did not find an easy solution for finding the low-level datatype
from the abstract datatype class.
One solution was to come to a conclusion about the low-level datatype
based on .getSize(), as you had suggested.

Another option I tried was:

// Create static datatype objects.
DataType dt1(PredType::NATIVE_CHAR);
DataType dt2(PredType::NATIVE_FLOAT);
DataType dt3(PredType::NATIVE_DOUBLE);

// Get the datatype object from the dataset - this returns the
// abstract datatype (int/string/float); it does not tell you
// char, double or long.
DataType dt_org = dataset.getDataType();

// Compare the datatypes using the ==() operator of the DataType class.
if (dt_org == dt1)
{
    // OK, it's a native char - though dt_org.getClass() returns H5T_INTEGER.
}
else if (dt_org == dt2)
{
    // OK, it's a native float.
}
else if (dt_org == dt3)
{
    // OK, it's a native double - though dt_org.getClass() returns H5T_FLOAT.
}

Since I know that x, y and z are the datatypes I would be using, I
would create static objects and use them for comparison. Not a good
way, though.

Is there any other recommended, elegant way of getting the low-level
datatype from a DataType object?

Thanks again

Hi Kip,

Yes, I follow the same approach as you. However, you should
recognize that there is always an ambiguity in recognizing a native
datatype based on its size. The most apparent example would be an
H5T_INTEGER with a size of 4 (or 8), which maps onto int or long
(long long for size 8) on a 32-bit platform and onto int (long or
long long for size 8) on a 64-bit one. In this case you can select
the common denominator for both (int for size 4 and long long for
size 8), but in general you could find cases (mainly on exotic
architectures) where it is not possible to make a unique mapping
between size and native type.

The solution that I found for this is using size-based datatypes for
in-memory variables (i.e. not based on native types). But, in
general, Quincey's proposal of using H5Tget_native_type() would be a
more sensible approach.

Regards,

On Thursday 31 January 2008, you wrote:


I don't know if this is correct according to the HDF5 gods, but in my
application we deduce the C type by using H5Tget_class() along with
H5Tget_sign() and H5Tget_size().

So, for instance, a class of H5T_INTEGER with a size of 1 and a sign
of H5T_SGN_NONE would be equivalent to the C type unsigned char in
most compilers. A class of H5T_INTEGER with a size of 2 and a sign
of H5T_SGN_NONE would be equivalent to the C type unsigned short in
most compilers. As another example, a class of H5T_FLOAT with a size
of 4 would be equivalent to the C type float, whereas a class of
H5T_FLOAT with a size of 8 would be equivalent to the C type double.
To deal with memory alignment issues during read or write, you only
need to set the memory and file datatypes correctly; look at
H5Tpack(). To deal with endianness issues during read or write, you
need to set the byte order on the memory and file datatypes
correctly; look at H5Tset_order().

Thanks,
Kip Streithorst
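[Kip's deduction scheme can be sketched as a small lookup function. The enum values below are local stand-ins chosen to match H5T_class_t (H5T_INTEGER = 0, H5T_FLOAT = 1) and H5T_sign_t (H5T_SGN_NONE = 0, H5T_SGN_2 = 1) so the sketch stays self-contained; real code would include hdf5.h and feed it the results of H5Tget_class(), H5Tget_size() and H5Tget_sign().]

```cpp
#include <cstddef>
#include <string>

// Local stand-ins for HDF5's enums (values match hdf5.h).
enum { CLS_INTEGER = 0, CLS_FLOAT = 1, CLS_STRING = 3 };
enum { SGN_NONE = 0, SGN_2 = 1 };

// Map (class, size, sign) onto a C type name, assuming a typical
// ILP32/LP64 platform. As discussed in the thread, the mapping is
// ambiguous in general; size 8 picks long long as common denominator.
std::string deduce_ctype(int cls, std::size_t size, int sign) {
    if (cls == CLS_FLOAT)
        return size == 4 ? "float" : size == 8 ? "double" : "unknown";
    if (cls == CLS_INTEGER) {
        const std::string u = (sign == SGN_NONE) ? "unsigned " : "";
        switch (size) {
            case 1: return u + "char";
            case 2: return u + "short";
            case 4: return u + "int";
            case 8: return u + "long long";
        }
    }
    return "unknown";
}
```

For example, deduce_ctype(CLS_INTEGER, 2, SGN_NONE) returns "unsigned short", matching Kip's second example above.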

-----Original Message-----
From: Francesc Altet [mailto:faltet@carabos.com]
Sent: Thursday, January 31, 2008 2:48 PM
To: hdf-forum@hdfgroup.org
Subject: Re: HDF5 C++ API's -Finding datatype of a DataSet ?


Hi Anish,


  Would the H5Tget_native_type() API routine help you?

    Quincey


On Feb 1, 2008, at 2:29 PM, Anish Anto wrote:


Hi Quincey,

I did try using H5Tget_native_type(); however, the results were just
the same.

DataType data_type = dataset.getDataType();

// Returns a value from the enum H5T_class_t: 0 for integer (could be
// short, char, unsigned int), 1 for float (could be double, ...),
// 3 for string.
int val = data_type.getClass();

hid_t native_type_datatype = H5Tget_native_type(data_type.getId(),
                                                H5T_DIR_DEFAULT);

My assumption was that native_type_datatype would be set to a native
datatype value such as H5T_NATIVE_CHAR, H5T_NATIVE_SCHAR,
H5T_NATIVE_UCHAR, H5T_NATIVE_SHORT, H5T_NATIVE_USHORT, ...
But it was always returning H5T_INTEGER, even if the type was char,
uchar, short or int - the same value returned by data_type.getClass().

Thanks
Anish

Quincey Koziol <koziol@hdfgroup.org>
02/01/2008 01:00 PM

To
Anish Anto <anto.anish@yahoo.com>
cc
hdf-forum@hdfgroup.org
Subject
Re: HDF5 C++ API's -Finding datatype of a DataSet ?


Hi Anish,


  Ah! You are confusing the class of a datatype with an instance of that datatype class. There are five or six classes of datatypes in HDF5 (integer, float, array, variable-length, compound, etc.). Within each class there are specific instances of a datatype, like "32-bit, little-endian integer". The H5Tget_native_type() call returns an instance of a datatype class.

  Does that help?

    Quincey
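[Quincey's class-versus-instance distinction can be illustrated with a toy descriptor; this is a self-contained model, not real HDF5 code. Two datatypes can share a class code, which is all getClass()/H5Tget_class() reports, while still being different instances, which is what H5Tequal() distinguishes.]

```cpp
#include <cstddef>

// Toy datatype descriptor: cls is the datatype *class* (what
// H5Tget_class() reports); the full triple identifies the specific
// *instance* (what H5Tequal() compares).
struct Dtype { int cls; std::size_t size; int sign; };

constexpr int INTEGER = 0, FLOAT = 1;              // class codes
constexpr Dtype NATIVE_INT   {INTEGER, 4, 1};      // two instances of
constexpr Dtype NATIVE_UCHAR {INTEGER, 1, 0};      // the integer class
constexpr Dtype NATIVE_DOUBLE{FLOAT,   8, 1};

// Plays the role of H5Tequal(): compares instances, not classes.
bool equal(const Dtype& a, const Dtype& b) {
    return a.cls == b.cls && a.size == b.size && a.sign == b.sign;
}
```

NATIVE_INT and NATIVE_UCHAR share the class code but are unequal as instances; in real code, compare the id returned by H5Tget_native_type() against the H5T_NATIVE_* instances with H5Tequal() to recover the exact low-level type.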


On Feb 1, 2008, at 5:10 PM, Anish Anto wrote:
