This is Ray from The HDF Group. I'd like to ask for your suggestions on how to improve our alignment detection algorithm.
Alignment restriction can be described in this manner: “Some computers allow data objects to reside in storage at any address regardless of the data’s type. Others impose alignment restrictions on certain data types, requiring that objects of those types occupy only certain addresses. It is not unusual for a byte-addressed computer, for example, to require that 32-bit integers be located on addresses that are a multiple of four. In this case, we say that the ‘alignment modulus’ of those integers is four.” (Harbison and Steele, C: A Reference Manual, 6.1.3 Alignment Restrictions)
The HDF5 library uses a datatype's alignment in its data conversion operations. Imagine a user has a memory buffer of H5T_NATIVE_UCHAR data. On most machines, the memory alignment for char is 1, meaning there is no alignment restriction. If the user converts the data in place to H5T_NATIVE_INT using H5Tconvert, our library has to know whether the platform requires "int" data in that buffer to be aligned.
To show this in pseudocode: a memory buffer holds two unsigned char elements with the values 1 and 2, and is sized at 8 bytes so that the two converted ints fit (assuming a 4-byte int):
unsigned char buf[8] = {1, 2, 0, 0, 0, 0, 0, 0};
:
H5Tconvert(H5T_NATIVE_UCHAR, H5T_NATIVE_INT, 2, buf, NULL, H5P_DEFAULT);
After the conversion to int, the bytes in the buffer are {0, 0, 0, 1, 0, 0, 0, 2} on a big-endian machine with a 4-byte int. When the platform doesn't require alignment, the library casts the data directly and writes it into the buffer. But when alignment is required, the library has to memcpy each element to an aligned location, cast it there, and then memcpy the result back to the user's original buffer. These extra steps can be expensive because every data element has to be handled this way.
The HDF5 library has its own alignment detection algorithm. I have attached an excerpt of it as a standalone C program. There is a problem in this algorithm: the pointer cast to an integer type at line 30 may cause undefined behavior with some compilers. For example, a user pointed out that with the Clang compiler (version 4.2) on Mac OS Darwin 12.5, the program aborts with an "Illegal Instruction" error when it is compiled with the -fcatch-undefined-behavior flag. Harbison and Steele note: "C does give the programmer the ability to violate the alignment restrictions by casting pointers to different types." (Harbison and Steele, C: A Reference Manual, 6.1.3 Alignment Restrictions) Since the HDF5 library is probing the memory alignment of a datatype, it's no surprise that it violates the restriction. However, to improve the quality of our software, we would like to find out whether anybody in the community knows a better algorithm for detecting an integer datatype's alignment in memory. It should NOT trigger undefined behavior.
To be clear, we're NOT interested in the alignment of a datatype inside a structure, that is, the value expressed as COMP_ALIGN in the following pseudocode:
struct {
char c;
TYPE x;
} s;
COMP_ALIGN = (char*)(&(s.x)) - (char*)(&s);
On Linux, the value of COMP_ALIGN for the "int" type is 4. The __alignof__ operator (a GCC extension, not a standard C keyword) returns the alignment of the type in a structure, not the alignment of the type in memory. On Linux, our library's memory alignment algorithm finds that the alignment for "int" is 1, meaning there is no alignment restriction.
We'd appreciate your comments and suggestions. Thanks in advance.
Ray
align.c (912 Bytes)