Possible to pack bit types into compound data types?

I am working on an application to record data in HDF5 format, and I'm
completely new to it.
The data is in the form of packets, each of which has an associated
timestamp and class.
Therefore, it would seem appropriate to use the FL_PacketTable class (99% of
the packets are fixed length, so this is my core use case).
The class of the packet indicates the packet contents, and each class
appears to map naturally to the HDF5 "Compound" data type, with a struct for
each class of packet.
Note also that data is retrieved from a legacy file format that uses
individual bits to represent certain data.

So far, so good. I can produce an HDF5 file with the following code
(C++/Win32/Visual Studio 2005); assume that the file object and the group V3
are already defined.

//structured data - "compound" in the HDF5 terminology.
struct _my_type {
   double t;//e.g. time.
   int a;
   float b;
};
CompType mtype1( sizeof(_my_type) );
mtype1.insertMember( "time", HOFFSET(_my_type, t), PredType::NATIVE_DOUBLE);
mtype1.insertMember( "alt", HOFFSET(_my_type, a), PredType::NATIVE_INT);
mtype1.insertMember( "math", HOFFSET(_my_type, b), PredType::NATIVE_FLOAT);

FL_PacketTable pt(V3.getId(),"Packets",mtype1.getId(),500,6);
_my_type s1;
for (int i = 0; i< 400000; i++)
{
  s1.t = i/10.f;//monotonic time
  s1.a = i % 10;//sawtooth integer data
  s1.b = 100.f/(i+1);//math function
  pt.AppendPacket(&s1);
}

The resulting file is maximally self-describing, in that when opened with
HDFView, I see a packet table with columns headed time, alt, math, and my
"packets" in the records below.

Now what I would like to do is achieve the same maximally self describing
file for the amended compound type:

struct _my_type {
   double t;//e.g. time.
   int a;
   float b;//so far, so easy....
   //BUT, we would also like...
   union {
     struct {
       //ideally, should be able to map each bit's value to one of a pair
       //of strings, e.g. "VALVE_OPEN" / "VALVE_CLOSED", by using, perhaps,
       //something like the ENUMERATION feature of HDF5.
       unsigned char bit0 : 1;
       unsigned char bit1 : 1;
       unsigned char bit2 : 1;
       unsigned char bit3 : 1;
       unsigned char bit4 : 1;
       unsigned char bit5 : 1;
       unsigned char bit6 : 1;
       unsigned char bit7 : 1;
     };
     //..and ideally would ALSO like to be able to retrieve the entire field,
     //as below....
     unsigned char wholebyte;
   };
};

If I now amend my code to do:

mtype1.insertMember( "wholebyte", HOFFSET(_my_type, wholebyte),
PredType::NATIVE_UCHAR);
s1.wholebyte = 0;

for (int i = 0; i< 400000; i++)
{
  s1.t = i/10.f;//monotonic time
  s1.a = i % 10;//sawtooth integer data
  s1.b = 100.f/(i+1);//math function
  s1.bit1 = ( (0 == (i % 20)) ? 1 : 0);//bit1 goes true every 20th element
  s1.bit2 = ( (10 < (i % 20)) ? 1 : 0);//bit2 goes true about 1/2 the time
  s1.bit3 = ( (10 > (i % 30)) ? 1 : 0);//bit3 goes true about 1/3 the time
  pt.AppendPacket(&s1);
}

then I do indeed see "wholebyte" and its data as an extra column in HDFView.
But end-users will certainly want to see individual bit values, rather than
the entire byte.

So - and this is my problem - if I do this instead (i.e. I do not insert
wholebyte):

//Create single bit transient types, then commit them to the dataset.
//Q: are these types modifying the original types, or are they "copies" in
//the H5Tcopy sense?
//Not yet clear without examining c++ library behaviour further.....
IntType mySingleBit1Type(PredType::STD_B8LE);
mySingleBit1Type.setPrecision(1);
mySingleBit1Type.setOffset(1);
mySingleBit1Type.commit(V3,"Bit1Type");

mtype1.insertMember( "bit1", HOFFSET(_my_type,wholebyte), mySingleBit1Type);

Then I do NOT see “bit1” as a field in the packet table using hdfview – that
is, the “self describing” aspect fails.

Worse, if I attempt to define and insert another bit type, as below:

IntType mySingleBit2Type(PredType::STD_B8LE);
mySingleBit2Type.setPrecision(1);
mySingleBit2Type.setOffset(2);
mySingleBit2Type.commit(V3,"Bit2Type");
mtype1.insertMember( "bit2", HOFFSET(_my_type,wholebyte), mySingleBit2Type);

Then I get a "member overlaps with another member" exception from
H5Tcompound.c. This is not surprising, since the API only appears to allow
BYTE offsets.
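
One way to picture why the second insert is rejected is to treat each compound member as a range of whole bytes. The sketch below only illustrates that idea and is not HDF5's actual code; `Member` and `overlaps` are hypothetical names. Both bit types are one byte wide and are inserted at the same byte offset, so their ranges coincide whatever bit offset and precision were set on the member types.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a compound member: where it starts in the
// record and how many whole bytes its datatype occupies.
struct Member {
    std::size_t offset; // byte offset within the record
    std::size_t size;   // size of the member's datatype in bytes
};

// Two members overlap when their half-open byte ranges intersect.
// Bit-level precision and bit offset play no part in this check.
inline bool overlaps(const Member& a, const Member& b) {
    assert(a.size > 0 && b.size > 0);
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}
```

If `wholebyte` happened to sit at byte 16 (an assumed value), "bit1" and "bit2" would both be `Member{16, 1}` and `overlaps` would return true, whereas one-byte members at offsets 16 and 17 would not overlap.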

Some obvious but ugly workarounds exist. I could, for example, store each
of my original bits as a whole byte. But this would be very inefficient in
terms of storage, unless the magic of compression reduces the problem...
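
A minimal sketch of the byte-per-bit workaround, assuming bit 0 is the least significant bit of the legacy byte. The function names `get_bit`, `unpack_bits` and `pack_bits` are hypothetical and are not part of HDF5. Shifts and masks are used rather than a bit-field union, so the result does not depend on how the compiler lays out bit-fields.

```cpp
#include <array>
#include <cassert>

// Extract one bit (0 = least significant) from a byte using a shift and a
// mask, which avoids relying on the compiler's bit-field layout.
inline unsigned char get_bit(unsigned char byte, unsigned bit) {
    assert(bit < 8);
    return static_cast<unsigned char>((byte >> bit) & 1u);
}

// Expand a packed byte into eight one-byte flags (each 0 or 1), so that
// every flag could be given its own byte offset in a record.
inline std::array<unsigned char, 8> unpack_bits(unsigned char byte) {
    std::array<unsigned char, 8> flags{};
    for (unsigned bit = 0; bit < 8; ++bit) {
        flags[bit] = get_bit(byte, bit);
    }
    return flags;
}

// Reverse operation: rebuild the packed byte from the eight flags.
inline unsigned char pack_bits(const std::array<unsigned char, 8>& flags) {
    unsigned char byte = 0;
    for (unsigned bit = 0; bit < 8; ++bit) {
        byte = static_cast<unsigned char>(byte | ((flags[bit] & 1u) << bit));
    }
    return byte;
}
```

Each element of the unpacked array could then be described as its own one-byte compound member at a distinct byte offset, at the cost of eight bytes per record instead of one.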

I can’t believe I’m the first person to encounter this issue; much more
likely, I’m still too stupid to understand how best to define the bit
fields. Does anyone have any ideas? I'm aware that the above code may not be
completely portable in theory, because the C specification does not say
exactly where bits are placed within the machine word, but this isn't an
issue in our case (at the moment!)
Thanks!

--
View this message in context: http://hdf-forum.184993.n3.nabble.com/Possible-to-pack-bit-types-into-compound-data-types-tp1131024p1131024.html
Sent from the hdf-forum mailing list archive at Nabble.com.

Hi,

  While I don't have experience with bitfields in HDF5 myself, an enum type might help in your case. You can create an enum type that associates a name with each value and store it as a named datatype in the file; h5ls will then show the possible values and the current value, as in this case:

     Attribute: TypeInfo scalar
         Type: shared-1:19496 enum native int {
                    UnknownArrayType = 0
                    Contiguous = 1
                    SeparatedCompound = 2
                    Constant = 3
                    FragmentedContiguous = 4
                    FragmentedSeparatedCompound = 5
                    DirectProduct = 6
                    IndexPermutation = 7
                    UniformSampling = 8
                    FragmentedUniformSampling = 9
                }
         Data: UniformSampling

It appears to be mostly a question of whether HDFView or the h5ls tools can display a bitfield by its components. The file would still be self-describing if you use your bitfield, but h5ls/h5dump don't resolve it the way you'd like. So an easier workaround might be to modify h5ls/h5dump to show the information as you see fit. For instance, I also find it annoying that for an enum type h5ls always shows the entire list of possible enum values for each data element, whereas it would be sufficient to show these values only where the enum is defined as a named datatype.

      Werner

On Fri, 13 Aug 2010 14:04:16 +0200, Steve Bissell <stephen.bissell@airbus.com> wrote:

--
___________________________________________________________________________
Dr. Werner Benger Visualization Research
Laboratory for Creative Arts and Technology (LCAT)
Center for Computation & Technology at Louisiana State University (CCT/LSU)
211 Johnston Hall, Baton Rouge, Louisiana 70803
Tel.: +1 225 578 4809 Fax.: +1 225 578-5362

Hi Steve,
  HDF5 does allow creating bitfield datatypes, but the underlying type must currently be an integral number of bytes in size. It sounds like a reasonable extension to allow some way to pack bitfields into one underlying byte, but we haven't explored it seriously. If you'd like to think about it and propose an interface that you think would work, that would kick off the discussion nicely. :-)

  Quincey

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

Thanks for the quick reply, Quincey. I won't be able to investigate this
further for a while, as I'm away on holiday as of tomorrow :-) and have
been tagged for an urgent job on my return :-( - that should be of
short duration, so I'll be able to look into this properly when that's
done. My worst case fallback would be to store each bit as a single-byte
bitfield type of precision 1 (to preserve the information that the type
is a single bit); I'll do some tests to see what the file size penalty
is.
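
As a rough worked estimate of that penalty before compression, the sketch below counts payload bytes per record for the example struct from the first post (time, alt, math) plus either one packed flag byte or eight separate flag bytes. Alignment padding and HDF5 chunk compression are ignored, and `payload_bytes` and `extra_bytes` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>

// Bytes of payload per record when the flags occupy `flag_bytes` bytes,
// ignoring any alignment padding: time (double), alt (int), math (float).
constexpr std::size_t payload_bytes(std::size_t flag_bytes) {
    return sizeof(double) + sizeof(int) + sizeof(float) + flag_bytes;
}

// Extra uncompressed bytes needed to store `records` packets when eight
// flags take one byte each instead of sharing a single byte.
inline std::size_t extra_bytes(std::size_t records) {
    assert(payload_bytes(8) > payload_bytes(1));
    return records * (payload_bytes(8) - payload_bytes(1));
}
```

For the 400000 records written in the test loop this comes to 7 extra bytes per record, about 2.8 MB uncompressed; how much of that survives compression would still have to be measured.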

On a related note, I notice that there is an enumeration type, but it
does not seem possible to define a string for "any OTHER value". This is
necessary to be able to use your enumeration scheme to map LOGICAL
values that need to be mapped to one of exactly two strings, e.g.
== 0 ==> VALVE_OPEN
== ANY OTHER VALUE ==> VALVE_CLOSED.
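
Handled outside HDF5, in a middleware layer, the two-way mapping above could look something like the sketch below. It is only an illustration of the mapping as described; `valve_label` and `valve_label_for_bit` are hypothetical names, and bit 0 is assumed to be the least significant bit.

```cpp
#include <cassert>
#include <string>

// Map a logical value to one of exactly two labels: zero is one state and
// any other value is the other, mirroring the mapping described above.
inline std::string valve_label(unsigned char value) {
    return value == 0 ? "VALVE_OPEN" : "VALVE_CLOSED";
}

// Same mapping applied to a single bit (0 = least significant) of a
// packed byte read back from the file.
inline std::string valve_label_for_bit(unsigned char byte, unsigned bit) {
    assert(bit < 8);
    return valve_label(static_cast<unsigned char>((byte >> bit) & 1u));
}
```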

Further, the enumeration scheme of HDF5 seems to be tightly coupled to
the integer datatype, whereas on our system, "enumeration" is just a
view/transform applicable to any datatype for which you can define a
transform (e.g. we might choose to map float data to an enumeration set
via a rounding transform).
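
A minimal sketch of such a view, assuming the transform is "round to the nearest integer, then look the code up in a table". The names `LabelTable` and `label_for`, and the fallback label, are hypothetical and are not HDF5 features.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>

// Hypothetical label table: integer code -> display string.
using LabelTable = std::map<long, std::string>;

// View a float as an enumeration by rounding it to the nearest integer and
// looking that code up; codes missing from the table get a fallback label.
inline std::string label_for(float value, const LabelTable& table,
                             const std::string& fallback = "UNKNOWN") {
    assert(!std::isnan(value));
    const long code = std::lround(value);
    const auto it = table.find(code);
    return it != table.end() ? it->second : fallback;
}
```

The fallback argument also gives a place to put an "any other value" string, which the integer-keyed table alone cannot express.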

This can all be handled at my middleware layer between HDF5 and my
client interface via attributes and/or an XML document in the user-block,
I guess, but I will have a quick look at what might be involved in making
bit types, at least, "native" to HDF5.

Finally, I notice from other posts that "HDF5 does not currently support
attaching attributes to fields of compound types". But if the fields
within a compound type are user-defined types, and those user-defined
types themselves have attributes attached, doesn't that achieve the
required end of defining meta-data for compound fields?

Thanks again,
Steve

Hi Steve,

> Thanks for the quick reply, Quincey. I won't be able to investigate this
> further for a while, as I'm away on holiday as of tomorrow :-) and have
> been tagged for an urgent job on my return :-( - that should be of
> short duration, so I'll be able to look into this properly when that's
> done.

  OK, we'll wait to hear from you then. Happy holiday! :-)

> My worst case fallback would be to store each bit as a single byte
> bitfield type of precision 1 (to preserve the information that the type
> is a single bit); I'll do some tests to see what the file size penalty
> is.

  Yes, that'll definitely work.

> On a related note, I notice that there is an enumeration type, but it
> does not seem possible to define a string for "any OTHER value". This is
> necessary to be able to use your enumeration scheme to map LOGICAL
> values that need to be mapped to one of exactly two strings, e.g.
> == 0 ==> VALVE_OPEN
> == ANY OTHER VALUE ==> VALVE_CLOSED.

  Interesting idea, I'll add it to our issue tracker.

> Further, the enumeration scheme of HDF5 seems to be tightly coupled to
> the integer datatype, whereas on our system, "enumeration" is just a
> view/transform applicable to any datatype for which you can define a
> transform (e.g. we might choose to map float data to an enumeration set
> via a rounding transform).

  Yes, this is already in our issue tracker. :-)

> This can all be handled at my middleware layer between HDF5 and my
> client interface via attributes and/or a XML in the user-block, I guess,
> but I will have a quick look at what might be involved in making bit
> types, at least, "native" to HDF5.

  OK, let me know how it goes.

> Finally, I notice from other posts that "HDF5 does not currently support
> attaching attributes to fields of compound types". But if the fields
> within a compound type are user-defined types, and those user-defined
> types themselves have attributes attached, doesn't that achieve the
> required end of defining meta-data for compound fields?

  Yes, that's a potential work-around, although it requires using committed datatypes in the file, which may not work for some use cases and might be pretty complicated for others. I'd like to have a more obvious, self-describing implementation that would work in all situations.

  Quincey

On Aug 16, 2010, at 5:52 AM, BISSELL, Stephen wrote:

Thanks again,
Steve

-----Original Message-----
From: hdf-forum-bounces@hdfgroup.org
[mailto:hdf-forum-bounces@hdfgroup.org] On Behalf Of Quincey Koziol
Sent: 13 August 2010 21:42
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] Possible to pack bit types into compound data
types?

Hi Steve,
  HDF5 does allow creating bitfield datatypes, but the underlying
type must currently be an integral number of bytes in size. It sounds
like a reasonable extension to allow some way to pack bitfields into one
underlying byte, but we haven't explored it seriously. If you'd like to
think about it and propose an interface that you think would work, that
would kick off the discussion nicely. :slight_smile:

  Quincey

On Aug 13, 2010, at 7:04 AM, Steve Bissell wrote:

I am working on an application to record data in HDF5 format, and I'm
completely new to it.
The data is in the form of packets, each of which has an associated
timestamp and class.
Therefore, it would seem appropriate to use the FL_PacketTable class

(99% of

the packets are fixed length, so this is my core use case).
The class of the packet indicates the packet contents, and each class
appears to map naturally to the HDF5 "Compound" data type, with a

struct for

each class of packet.
Note also that data is retrieved from a legacy file format that uses
individual bits to represent certain data.

So far, so good. I can produce an hdf5 file with the following code
(C++/win32/VisStudio2005); assume that the file object and the group

V3 are

defined.

//structured data - "compound" in the HDF5 terminology.
struct _my_type {
double t;//e.g. time.
int a;
float b;
};
CompType mtype1( sizeof(_my_type) );
mtype1.insertMember( "time", HOFFSET(_my_type, t),

PredType::NATIVE_DOUBLE);

mtype1.insertMember( "alt", HOFFSET(_my_type, a),

PredType::NATIVE_INT);

mtype1.insertMember( "math", HOFFSET(_my_type, b),

PredType::NATIVE_FLOAT);

FL_PacketTable pt(V3.getId(),"Packets",mtype1.getId(),500,6);
_my_type s1;
for (int i = 0; i< 400000; i++)
{
  s1.t = i/10.f;//monotonic time
  s1.a = i % 10;//sawtooth integer data
  s1.b = 100.f/(i+1);//math function
  pt.AppendPacket(&s1);
}

The resulting file is maximally self describing, in that when opened

with

hdfView, I see a packet table with columns headed time, alt, math, and

my

"packets" in the records below.

Now what I would like to do is achieve the same maximally self

describing

file for the amended compound type:

struct _my_type {
   double t;//e.g. time.
   int a;
   float b;//so far, so easy....
   //BUT, we would also like...
   union {
     struct {
       unsigned char bit0 : 1;//ideally, should be able to map each bit's value
       unsigned char bit1 : 1;//to one of a pair of strings, e.g. "VALVE_OPEN" / "VALVE_CLOSED"
       unsigned char bit2 : 1;//by using, perhaps, something like the ENUMERATION feature of
       unsigned char bit3 : 1;//HDF5.
       unsigned char bit4 : 1;
       unsigned char bit5 : 1;
       unsigned char bit6 : 1;
       unsigned char bit7 : 1;
     };
     //..and ideally would ALSO like to be able to retrieve the entire field, as below....
     unsigned char wholebyte;
   };
};

If I now amend my code to do:

mtype1.insertMember( "wholebyte", HOFFSET(_my_type, wholebyte),
PredType::NATIVE_UCHAR);
s1.wholebyte = 0;

for (int i = 0; i< 400000; i++)
{
  s1.t = i/10.f;//monotonic time
  s1.a = i % 10;//sawtooth integer data
  s1.b = 100.f/(i+1);//math function
  s1.bit1 = ( (0 == (i % 20)) ? 1 : 0);//bit1 goes true every 20th element
  s1.bit2 = ( (10 < (i % 20)) ? 1 : 0);//bit2 goes true about 1/2 the time
  s1.bit3 = ( (10 > (i % 30)) ? 1 : 0);//bit3 goes true about 1/3 the time
  pt.AppendPacket(&s1);
}

then I do indeed see "wholebyte" and its data as an extra column in hdfview.

But end-users will certainly want to see individual bit values, rather than
the entire byte.
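
For what it's worth, if only "wholebyte" ends up in the file, the individual bits can still be recovered on the reading side with shifts and masks. The following is only a minimal sketch, not part of the code above: the helper names bit_is_set and valve_label are hypothetical, and it assumes bit 0 is the least significant bit of the byte, which depends on how the compiler lays out the bit-field.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if bit number `bit` is set in the packed byte.
// Assumption: bit 0 is the least significant bit.
inline bool bit_is_set(unsigned char wholebyte, unsigned bit)
{
    assert(bit < 8);  // only 8 flags fit in one byte
    return ((wholebyte >> bit) & 1u) != 0;
}

// Hypothetical helper: map a single bit to one of a pair of strings,
// in the spirit of the "VALVE_OPEN" / "VALVE_CLOSED" idea.
inline std::string valve_label(unsigned char wholebyte, unsigned bit)
{
    return bit_is_set(wholebyte, bit) ? "VALVE_OPEN" : "VALVE_CLOSED";
}
```

For example, valve_label(s1.wholebyte, 1) would give the label for bit1 of a record that has been read back. This does nothing for the self-describing aspect in hdfview; it only shows that the packed byte loses no information.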

So - and this is my problem - if I do this instead (i.e. I do not insert
wholebyte):

//Create single bit transient types, then commit them to the dataset.
//Q: are these types modifying the original types, or are they "copies" in
//the H5Tcopy sense?
//Not yet clear without examining c++ library behaviour further.....
IntType mySingleBit1Type(PredType::STD_B8LE);
mySingleBit1Type.setPrecision(1);
mySingleBit1Type.setOffset(1);
mySingleBit1Type.commit(V3,"Bit1Type");

mtype1.insertMember( "bit1", HOFFSET(_my_type,wholebyte), mySingleBit1Type);

Then I do NOT see "bit1" as a field in the packet table using hdfview - that
is, the "self describing" aspect fails.

Worse, if I attempt to define and insert another bit type, as below:

IntType mySingleBit2Type(PredType::STD_B8LE);
mySingleBit2Type.setPrecision(1);
mySingleBit2Type.setOffset(2);
mySingleBit2Type.commit(V3,"Bit2Type");
mtype1.insertMember( "bit2", HOFFSET(_my_type,wholebyte), mySingleBit2Type);

Then I get a "member overlaps with another member" exception from
H5Tcompound.c. This is not surprising, since the API only appears to allow
BYTE offsets.
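
To illustrate why two members at the same byte offset collide, here is a rough sketch of a byte-range overlap test. It is an illustration of the idea only, not the library's actual code; MemberRange and members_overlap are hypothetical names, and the offsets in the example are made up.

```cpp
#include <cstddef>

// Hypothetical description of one compound member as a byte range.
struct MemberRange {
    std::size_t offset;  // byte offset of the member within the struct
    std::size_t size;    // size of the member in bytes
};

// Two members overlap if their half-open byte ranges
// [offset, offset + size) intersect.
inline bool members_overlap(const MemberRange& a, const MemberRange& b)
{
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}
```

With offsets counted in whole bytes, "bit1" and "bit2" both describe the one-byte range starting at the offset of wholebyte, so a test of this kind reports an overlap however the precision and bit offset of each type are set.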

Now some obvious, but ugly workarounds exist. I could, for example, store my
original bit data as bytes. But this would be very inefficient, in terms of
storage, unless the magic of compression would reduce the problem .....
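
As a rough feel for the cost of that workaround before any compression, the arithmetic can be sketched as below. The function names are hypothetical, and the figures cover only the flag data itself, ignoring any struct padding and file overhead.

```cpp
#include <cstddef>

// Bytes used for the flags when every one-bit flag is stored in its own byte.
inline std::size_t bytes_unpacked(std::size_t records, std::size_t flags)
{
    return records * flags;
}

// Bytes used for the flags when they are packed eight to a byte,
// rounded up to whole bytes per record.
inline std::size_t bytes_packed(std::size_t records, std::size_t flags)
{
    return records * ((flags + 7) / 8);
}
```

For the 400000 records and 8 flags used in the test loop, that is 3200000 bytes unpacked against 400000 bytes packed, i.e. a factor of 8 on the flag data alone.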

I can't believe I'm the first person to encounter this issue, much more
likely is that I'm still too stupid to understand how best to define the bit
fields. Does anyone have any ideas? I'm aware that the above code may not be
completely platform portable in theory due to the C specification not
specifying exactly where bits might be put within the machine word, but this
isn't an issue in our case (at the moment!)
Thanks!

--
View this message in context:
http://hdf-forum.184993.n3.nabble.com/Possible-to-pack-bit-types-into-compound-data-types-tp1131024p1131024.html

Sent from the hdf-forum mailing list archive at Nabble.com.

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
