C structs in public interface

Hello HDF5 developers!

The HDF5 library exposes a number of functions that take parameters in the form of C structs (H5E_error2_t, H5G_info_t, H5FD_class_t, etc.). This means that C structs are effectively part of the binary interface. However, such an interface causes some problems:

1) C struct layout is not guaranteed to be binary compatible among C compilers. Moreover, the same compiler may produce binary-incompatible struct layouts depending on compiler flags.

2) Languages other than C, although they often have a notion of tuple/record/struct, rarely have an exact, binary-compatible equivalent of a C struct (given point 1, it's clear why). For example, neither the Delphi "record" nor the Delphi "packed record" is guaranteed to be binary compatible with the public structs from an official HDF5 library release. This makes using advanced HDF5 features from non-C languages harder.

It would be of great help if public HDF5 C structs had some documented, easy-to-support format, preferably without field alignment (like a Delphi "packed record"). Or maybe there is at least some compiler flag to force no alignment for *public* structs (forcing no alignment for all structs might be bad for performance, I presume)?

Best wishes,
Andrey Paramonov

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

Privet, Andrey!

I'm wondering, on this occasion: if using structs in a C API is an issue, how do system libraries do it? For instance, int fstat(int fd, struct stat *buf); where "struct stat" is a header-defined C struct as well. Shouldn't the same mechanism be applicable to HDF5 as well, if there are problems?

Poka,

Werner

--
___________________________________________________________________________
Dr. Werner Benger Visualization Research
Center for Computation & Technology at Louisiana State University (CCT/LSU)
2019 Digital Media Center, Baton Rouge, Louisiana 70803
Tel.: +1 225 578 4809 Fax.: +1 225 578-5362

05.05.2014 15:57, Werner Benger wrote:

> Privet, Andrey!
>
> I'm wondering on this occasion, if using structs in a C API is an
> issue, how do system libraries do it, for instance int fstat(int fd,
> struct stat *buf); where "struct stat" is a header-defined C struct as
> well. The same mechanism should be applicable to HDF5 as well, if there
> are problems?

As far as I know, there is no universal solution. For example, some time ago I tried to use the ucontext interface, as described in http://pubs.opengroup.org/onlinepubs/007908799/xsh/ucontext.h.html, from FreePascal. It turned out that the actual binary C structs are very different on Linux and FreeBSD (and I think they also differ between Linux distributions), let alone the compiler alignment issues. I had to give up.

On Windows, all WinAPI public structs are declared as "packed", i.e. they do not have alignment.

Here are my considerations from a user's perspective:

1) C struct layout is part of the binary interface.

2) The binary interface should be clearly and completely defined by a specification.

3) Currently, C struct layout is not completely defined. Thus, the binary specification cannot be considered complete.

4) Of all the possibilities, non-aligned records are the easiest to use.
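
One way a layout could be made part of a written specification, sketched here under assumptions: use only fixed-width fields, write the padding out as explicit reserved fields, and let the compiler check the documented offsets. This is a different technique from the packed records proposed above. The name example_info_t and its field widths are hypothetical and are not taken from the HDF5 headers; static_assert needs a C11 compiler.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* fixed-width integer types */

/* Hypothetical public struct: every field has a fixed width and the
 * padding is spelled out, so nothing is left for the compiler to decide. */
typedef struct example_info_t {
    int32_t  storage_type;  /* enum value stored as a fixed-width integer */
    uint32_t reserved0;     /* explicit padding, always zero */
    uint64_t nlinks;
    int64_t  max_corder;
    uint32_t mounted;       /* boolean stored as 0 or 1 */
    uint32_t reserved1;     /* explicit tail padding, always zero */
} example_info_t;

/* The documented layout, enforced at compile time. */
static_assert(offsetof(example_info_t, storage_type) == 0,  "storage_type offset");
static_assert(offsetof(example_info_t, nlinks)       == 8,  "nlinks offset");
static_assert(offsetof(example_info_t, max_corder)   == 16, "max_corder offset");
static_assert(offsetof(example_info_t, mounted)      == 24, "mounted offset");
static_assert(sizeof(example_info_t)                 == 32, "total size");
```

If a compiler or a compiler flag changed the layout, the build would fail instead of silently producing an incompatible binary.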

Best wishes,
Andrey Paramonov

05.05.2014 16:52, Elena Pourmal wrote:

> Privet, Werner and Andrey!
>
> I cannot comment on the topic, except to ask what Andrey meant
> by documenting HDF5 C structures… But I was impressed with your Russian :)

I'll try to clarify, with an example.

Consider public function H5Gget_info, which takes 2 arguments:

H5_DLL herr_t H5Gget_info(hid_t loc_id, H5G_info_t *ginfo);

H5G_info_t is described as

/* Information struct for group (for H5Gget_info/H5Gget_info_by_name/H5Gget_info_by_idx) */
typedef struct H5G_info_t {
     H5G_storage_type_t storage_type; /* Type of storage for links in group */
     hsize_t nlinks; /* Number of links in group */
     int64_t max_corder; /* Current max. creation order value for group */
     hbool_t mounted; /* Whether group has a file mounted on it */
} H5G_info_t;

So, ginfo is a pointer to an H5G_info_t structure somewhere on the heap or stack. Please note that H5G_info_t is not a primitive type. I'm not calling H5Gget_info from a C program, so I need to know exactly how the H5G_info_t members are laid out in memory in order to access them. It might seem straightforward that the layout is (on my w32 machine):

  Offset   Field
   0 byte: storage_type
   4 byte: nlinks
  12 byte: max_corder
  20 byte: mounted

However, I discovered that the actual binary layout is:

  Offset   Field
   0 byte: storage_type
   8 byte: nlinks
  16 byte: max_corder
  24 byte: mounted

So, if I tried to access nlinks at offset 4, I would get an incorrect value.
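
The effect of using the wrong offset can be sketched without HDF5 at all. The helper read_u64_at below is hypothetical: it copies an 8-byte field out of a raw struct image at a caller-supplied offset, which is roughly what a non-C binding ends up doing.

```c
#include <stdint.h>
#include <string.h>

/* Copy an 8-byte unsigned field out of a raw struct image.
 * memcpy is used instead of a pointer cast so that the read is valid
 * even when the offset is not 8-byte aligned. */
static uint64_t read_u64_at(const unsigned char *image, size_t offset)
{
    uint64_t value;
    memcpy(&value, image + offset, sizeof value);
    return value;
}
```

Given a 32-byte image in which nlinks was stored at offset 8, read_u64_at(image, 8) returns the stored value, while read_u64_at(image, 4) combines four padding bytes with half of the field and returns something else.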

The reason for the layout difference is so-called data structure alignment (http://en.wikipedia.org/wiki/Data_structure_alignment). It is specific to the compiler and even to compiler flags. The alignment settings are not specified explicitly in the HDF5 documentation, nor is it possible to determine them at run time.

When the HDF5 library is linked statically, the same compiler settings are used for structs, so they end up binary compatible. Thus, for many C users it's not a big problem. However, when calling from non-C code, the same field alignment cannot be applied automatically.

In any case, applying *any* alignment greatly complicates the use of HDF5 functions from non-C code. If no alignment is applied, the struct declarations in the .h files can be parsed and converted to native code constructs. If some alignment algorithm comes into play, it is an additional non-trivial step.
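
The additional step is roughly the following, assuming the common "natural alignment" rule that mainstream C ABIs follow (the C standard itself does not mandate it). The names member_desc and compute_layout are hypothetical, and the member sizes and alignments passed in would still have to be known for the target compiler.

```c
#include <stddef.h>

/* Size and required alignment of one struct member, in bytes. */
typedef struct member_desc {
    size_t size;
    size_t align;   /* must be a power of two */
} member_desc;

/* Round offset up to the next multiple of align (a power of two). */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Fill offsets[i] for each member and return the total struct size,
 * including tail padding. max_align caps every member's alignment,
 * the way a pack setting would (pass 1 for a fully packed layout). */
static size_t compute_layout(const member_desc *members, size_t count,
                             size_t max_align, size_t *offsets)
{
    size_t offset = 0;
    size_t struct_align = 1;

    for (size_t i = 0; i < count; ++i) {
        size_t a = members[i].align < max_align ? members[i].align : max_align;
        offset = align_up(offset, a);   /* insert padding before the member */
        offsets[i] = offset;
        offset += members[i].size;
        if (a > struct_align)
            struct_align = a;           /* struct is aligned like its strictest member */
    }
    return align_up(offset, struct_align);
}
```

For four members of sizes 4, 8, 8 and 4 with matching alignments, max_align = 8 gives offsets 0, 8, 16, 24 and a total size of 32, while max_align = 1 gives offsets 0, 4, 12, 20 and a size of 24. These are the two layouts shown in the tables above, under the assumption that the enum and hbool_t fields are 4 bytes wide.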

The above considerations matter for public structs only. For private structs, there is no problem with using alignment, for performance or whatever other reasons.

Best wishes,
Andrey Paramonov

It would seem that alignment attributes would help in case this is implemented in a future version of HDF5, like in GCC

http://gcc.gnu.org/onlinedocs/gcc/Variable-Attributes.html#Variable-Attributes

or C++11: http://en.cppreference.com/w/cpp/language/alignas

But it's probably not possible to do this with every compiler.

          Werner
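
A minimal sketch of what the standard C11 alignment specifier can and cannot do, using a hypothetical stand-in struct (aligned_info_t is not an HDF5 type). alignas can only make alignment stricter: here it forces 8-byte boundaries for the 64-bit members even on ABIs that would otherwise align them to 4 bytes inside a struct. C11 does not allow a specifier to request less than the natural alignment of the member's type, so alignas(1) cannot be used to pack a struct.

```c
#include <stdalign.h>  /* alignas, alignof (C11) */
#include <stddef.h>    /* offsetof */
#include <stdint.h>    /* fixed-width integer types */

/* Hypothetical stand-in for a public info struct. The alignas
 * specifiers pin the 64-bit members to 8-byte boundaries regardless
 * of the platform's default alignment for those types. */
typedef struct aligned_info_t {
    int32_t             storage_type;
    alignas(8) uint64_t nlinks;
    alignas(8) int64_t  max_corder;
    int32_t             mounted;
} aligned_info_t;
```

With these specifiers the offsets are 0, 8, 16 and 24 and the struct size is 32, because the struct as a whole inherits the 8-byte alignment of its strictest member.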

> When HDF5 library is linked statically, the same compiler settings are
> used for structs, so they end up binary compatible. Thus, for many C users
> it's not a big problem. However, when calling from non-C code, the same
> field alignment cannot be applied automatically.

Statically linked or dynamically, it doesn't matter. If the library is
compiled as an rpm, for instance, and then you compile your application
with PGI or Intel compiler, you may have the same issue.

Matthieu

I don't think it's an alignment problem. I think this happens
because H5G_storage_type_t is an enum, and the size of an enum in C is not
fixed (although it is guaranteed to be big enough to contain an int).

See e.g.
http://stackoverflow.com/questions/366017/what-is-the-size-of-an-enum-in-c
my 2c,
Andrea

--
Andrea Bedini <andrea.bedini@gmail.com>

05.05.2014 19:52, Matthieu Brucher wrote:

>> When HDF5 library is linked statically, the same compiler settings
>> are used for structs, so they end up binary compatible. Thus, for many C
>> users it's not a big problem. However, when calling from non-C code, the
>> same field alignment cannot be applied automatically.
>
> Statically linked or dynamically, it doesn't matter. If the library is
> compiled as an rpm, for instance, and then you compile your application
> with PGI or Intel compiler, you may have the same issue.

By static linking I meant the setup where the HDF5 code is compiled into the client application. Sorry if I wasn't clear.

As you correctly stated (and as I mentioned earlier), C users may also be in trouble if they use a different compiler or different compiler flags.

Best wishes,
Andrey Paramonov

05.05.2014 18:41, Werner Benger wrote:

> It would seem that alignment attributes would help in case this is
> implemented in a future version of HDF5, like in GCC
>
> http://gcc.gnu.org/onlinedocs/gcc/Variable-Attributes.html#Variable-Attributes
>
> or C++11: http://en.cppreference.com/w/cpp/language/alignas
>
> But it's probably not possible with every compiler to do.

There is also something called "pragma pack":

#pragma pack(push, 1)
struct Foo
{
     // ...
};
#pragma pack(pop)

Not sure how widely it is supported, but both GCC and Microsoft compilers are aware of it.
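
A sketch of the pragma applied to a stand-in for the struct discussed earlier. The type names natural_info_t and packed_info_t and the fixed-width field types are assumptions made for illustration; the real HDF5 field types may have other widths. #pragma pack is a compiler extension rather than standard C.

```c
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* fixed-width integer types */

/* Default layout: the compiler is free to insert padding for alignment. */
typedef struct natural_info_t {
    int32_t  storage_type;
    uint64_t nlinks;
    int64_t  max_corder;
    int32_t  mounted;
} natural_info_t;

/* Same fields with padding disabled: members follow each other directly. */
#pragma pack(push, 1)
typedef struct packed_info_t {
    int32_t  storage_type;
    uint64_t nlinks;
    int64_t  max_corder;
    int32_t  mounted;
} packed_info_t;
#pragma pack(pop)
```

The packed variant has offsets 0, 4, 12 and 20 and a size of 24 bytes, whereas the size of the natural variant depends on the ABI. Taking the address of a packed member can yield a misaligned pointer, so copying fields out by value is the safer habit.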

Best wishes,
Andrey Paramonov

06.05.2014 3:22, Andrea Bedini wrote:

> I don't think it's an alignment problem. I think this happens
> because H5G_storage_type_t is an enum and the size of an enum in C is
> not fixed (although is guaranteed to be big enough to contain an int).
>
> See e.g.
> http://stackoverflow.com/questions/366017/what-is-the-size-of-an-enum-in-c

Thank you for your comment!

Could someone with a working C compiler at hand please check the sizes of H5G_storage_type_t and H5G_info_t, and also check whether using alignas(1) changes anything?

Best wishes,
Andrey Paramonov

That pragma will not deal with the issue of variable type sizes (enum, int, long, etc.).
If you want structures to be compatible across compilers, you must also use only fixed-size types (int64_t & friends).
Of course, such a change would definitely break ABI compatibility with older versions of the library,
so it would require an .so-name change.
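
A minimal sketch of the fixed-size idea for an enum field. The enum example_storage_t and the two conversion helpers are hypothetical and are not part of HDF5; the point is only that the field which crosses the binary interface is an int32_t, whose width does not depend on the compiler or on flags such as GCC's -fshort-enums.

```c
#include <stdint.h>

/* Hypothetical enum standing in for a public library enum. Its size is
 * chosen by the compiler and may change with compiler flags. */
typedef enum example_storage_t {
    EXAMPLE_STORAGE_UNKNOWN = -1,
    EXAMPLE_STORAGE_COMPACT = 1,
    EXAMPLE_STORAGE_DENSE   = 2
} example_storage_t;

/* Convert to the fixed-width representation used in the binary interface. */
static int32_t storage_to_wire(example_storage_t s)
{
    return (int32_t)s;
}

/* Convert a fixed-width value back to the enum for use inside C code. */
static example_storage_t storage_from_wire(int32_t w)
{
    return (example_storage_t)w;
}
```

C code can keep using the enum internally, while a struct field declared as int32_t has the same width for every caller.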

Cheers,
Nathanael Hübbe

Andrey,

it turns out you were correct. I ran a simple test:

#include <hdf5.h>
#include <stddef.h>
#include <stdio.h>

int main(void)
{
    /* sizeof and offsetof yield size_t, so print them with %zu */
    printf("sizeof(H5G_storage_type_t) = %zu\n", sizeof(H5G_storage_type_t));
    printf("sizeof(H5G_info_t) = %zu\n", sizeof(H5G_info_t));

    printf("offsetof(H5G_info_t, storage_type) = %zu\n", offsetof(H5G_info_t, storage_type));
    printf("offsetof(H5G_info_t, nlinks) = %zu\n", offsetof(H5G_info_t, nlinks));
    printf("offsetof(H5G_info_t, max_corder) = %zu\n", offsetof(H5G_info_t, max_corder));
    printf("offsetof(H5G_info_t, mounted) = %zu\n", offsetof(H5G_info_t, mounted));
    return 0;
}

and obtained the following output:

sizeof(H5G_storage_type_t) = 4
sizeof(H5G_info_t) = 32
offsetof(H5G_info_t, storage_type) = 0
offsetof(H5G_info_t, nlinks) = 8
offsetof(H5G_info_t, max_corder) = 16
offsetof(H5G_info_t, mounted) = 24

It's an alignment problem.

Andrea

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org

06.05.2014 12:23, huebbe wrote:

> That pragma will not deal with the issue of variable type sizes (enum, int, long, etc).

This is a different issue, which shouldn't bother us right now.
I propose we stay focused on the alignment issue.

> Of course, such a change would definitely break ABI compatibility with older versions of the library,
> so it would require an .so-name change.

Indeed, the ABI would change. But in fact, the ABI has not been completely defined so far. I'm not sure such an ABI refinement qualifies as a "break".

Best wishes,
Andrey Paramonov
