Thursday, June 17, 2010

Becoming a console programmer : Structure Alignment

This is an area that all programmers should be familiar with; however, I've met and worked with a fair few who never had to care... until they had to care.
Consider:
struct foo1
{
    char m_char;
};

struct foo2
{
    int m_int;
    char m_char;
};

struct foo3
{
    long long m_longlong;
    int m_int;
    char m_char1;
    char m_char2;
};

Using default settings under x86, these three structs are 1, 8 and 16 bytes respectively. foo1 needs no padding; for the other two, what is actually allocated is:

struct foo2
{
    int m_int;            // offset 0 : size 4 bytes
    char m_char;          // offset 4 : size 1 byte
    char m_pad[3];        // 3 bytes padding
                          // 8 bytes total
};

struct foo3
{
    long long m_longlong; // offset 0 : size 8 bytes
    int m_int;            // offset 8 : size 4 bytes
    char m_char1;         // offset 12 : size 1 byte
    char m_char2;         // offset 13 : size 1 byte
    char m_pad[2];        // 2 bytes padding
                          // 16 bytes total
};
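
One way to confirm these numbers is to let the compiler check them. This is only a sketch: it assumes a C++11 compiler and a target where int is 4 bytes and long long is 8 bytes with 8-byte alignment (true for the Microsoft compiler on x86 and for typical x86-64 compilers; other targets may differ).

```cpp
#include <cstddef>

struct foo1
{
    char m_char;
};

struct foo2
{
    int  m_int;
    char m_char;
};

struct foo3
{
    long long m_longlong;
    int       m_int;
    char      m_char1;
    char      m_char2;
};

// The sizes quoted in the text.
static_assert(sizeof(foo1) == 1,  "foo1: a single char, no padding");
static_assert(sizeof(foo2) == 8,  "foo2: 5 bytes of data + 3 bytes of tail padding");
static_assert(sizeof(foo3) == 16, "foo3: 14 bytes of data + 2 bytes of tail padding");

// The tail padding exists because the size must be a multiple of the alignment.
static_assert(alignof(foo2) == 4, "foo2 takes the alignment of its int");
static_assert(alignof(foo3) == 8, "foo3 takes the alignment of its long long");
static_assert(sizeof(foo2) % alignof(foo2) == 0, "size is a multiple of alignment");
static_assert(sizeof(foo3) % alignof(foo3) == 0, "size is a multiple of alignment");
```

If any assumption does not hold on your platform, the build fails at the offending line rather than the struct silently changing size.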

Now let's take foo3 and jumble those members a little.
struct foo4
{
    char m_char1;
    long long m_longlong;
    char m_char2;
    int m_int;
};

One might expect foo4 to be the same size as foo3; however, it is not. It comes out to 24 bytes, and the padding is as follows:
struct foo4
{
    char m_char1;         // offset 0 : size 1 byte
    char m_pad7[7];       // 7 bytes padding
    long long m_longlong; // offset 8 : 8 bytes
    char m_char2;         // offset 16 : 1 byte
    char m_pad[3];        // 3 bytes padding
    int m_int;            // offset 20 : 4 bytes
};                        // 24 bytes total

Each member of a struct must begin at an offset that is a multiple of its natural alignment, so the 8-byte "long long" must start at an 8-byte offset within the struct. Because it must also come after m_char1, the compiler inserts 7 bytes of "padding", which the user cannot directly access, to ensure this. The same rule places 3 bytes of padding in front of m_int.
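
The rule can be worked through by hand. The sketch below is not how any particular compiler is implemented; it simply applies "round the offset up to the member's alignment, then step past the member" to foo4. The helper names align_up and place are made up for illustration, and the member alignments (1, 8, 1, 4) are the assumed x86 values.

```cpp
#include <cstddef>

// Round `offset` up to the next multiple of `alignment` (a power of two).
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Place one member: pad up to its alignment, then step past its size.
// Returns the offset of the first byte after the member.
constexpr std::size_t place(std::size_t offset, std::size_t size, std::size_t alignment)
{
    return align_up(offset, alignment) + size;
}

// foo4 by hand: char, long long, char, int.
constexpr std::size_t after_char1    = place(0,              1, 1); // ends at 1
constexpr std::size_t after_longlong = place(after_char1,    8, 8); // padded to 8, ends at 16
constexpr std::size_t after_char2    = place(after_longlong, 1, 1); // ends at 17
constexpr std::size_t after_int      = place(after_char2,    4, 4); // padded to 20, ends at 24

// The total is rounded up to the struct's own alignment (8, from the long long).
constexpr std::size_t foo4_size = align_up(after_int, 8);

static_assert(align_up(1, 8) == 8,   "m_longlong starts at offset 8");
static_assert(align_up(17, 4) == 20, "m_int starts at offset 20");
static_assert(foo4_size == 24,       "matches the 24 bytes quoted above");
```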

To inspect this for yourself, use the code:

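// needs <stdio.h> and <stddef.h>; offsetof and sizeof yield size_t, so cast for %d
printf("\n(%d, ", (int)offsetof(foo4, m_char1));
printf("%d, ", (int)offsetof(foo4, m_longlong));
printf("%d, ", (int)offsetof(foo4, m_char2));
printf("%d)", (int)offsetof(foo4, m_int));
printf(": %d", (int)sizeof(foo4));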


output : "(0, 8, 16, 20): 24"

Manual Alignment
It is often beneficial to align the base of a structure beyond its native requirements. I won't go into the reasons for this here (that's a later subject), but doing so does change the internals of a struct.
Consider:
struct _CRT_ALIGN(16) foo_aligned4
{
    char m_char1;
    long long m_longlong;
    char m_char2;
    int m_int;
};

The _CRT_ALIGN(16) prefix informs the compiler that this structure should be 16-byte aligned (and therefore a multiple of 16 bytes in size). It is actually a wrapper around __declspec(align(#)) (see below for further reading).

In the case of foo4 this doesn't change the offsets of any of the members, but it does change the size of the struct:

struct _CRT_ALIGN(16) foo_aligned4
{
    char m_char1;         // offset 0 : size 1 byte
                          // 7 bytes padding
    long long m_longlong; // offset 8 : 8 bytes
    char m_char2;         // offset 16 : 1 byte
                          // 3 bytes padding
    int m_int;            // offset 20 : 4 bytes
                          // 8 bytes padding
};                        // 32 bytes total
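
_CRT_ALIGN is specific to the Microsoft toolchain. As a portable stand-in, the sketch below uses the C++11 alignas specifier (my substitution, not what the original code uses) to check both claims: the member offsets stay put and only the total size grows. It assumes the same x86-style type sizes as above.

```cpp
#include <cstddef>

struct foo4
{
    char      m_char1;
    long long m_longlong;
    char      m_char2;
    int       m_int;
};

// Same members, but the struct itself is requested to be 16-byte aligned.
struct alignas(16) foo_aligned4
{
    char      m_char1;
    long long m_longlong;
    char      m_char2;
    int       m_int;
};

// Member offsets are unaffected by the stricter alignment...
static_assert(offsetof(foo_aligned4, m_longlong) == offsetof(foo4, m_longlong), "offset 8 in both");
static_assert(offsetof(foo_aligned4, m_char2) == offsetof(foo4, m_char2), "offset 16 in both");
static_assert(offsetof(foo_aligned4, m_int) == offsetof(foo4, m_int), "offset 20 in both");

// ...but the size is rounded up to the next multiple of 16.
static_assert(alignof(foo_aligned4) == 16, "requested alignment");
static_assert(sizeof(foo4) == 24, "natural size");
static_assert(sizeof(foo_aligned4) == 32, "24 rounded up to a multiple of 16");
```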


If we were to then include an instance of foo_aligned4 inside another struct, that struct would inherit the same alignment. In general, a structure is always aligned to the largest alignment of its members.

struct foo_composite
{
    char m_char1;                // offset 0: size 1 byte
                                 // 15 bytes padding
    foo_aligned4 m_foo_aligned4; // offset 16: size 32 bytes
    long long m_longlong;        // offset 48: size 8 bytes
    char m_char2;                // offset 56: size 1 byte
                                 // 3 bytes padding
    int m_int;                   // offset 60: size 4 bytes
                                 // total 64 bytes
};
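
The inherited alignment can be checked the same way. Again this is a sketch using C++11 alignas in place of _CRT_ALIGN and assuming x86-style type sizes; the point is that foo_composite never asks for 16-byte alignment itself, yet ends up with it.

```cpp
#include <cstddef>

struct alignas(16) foo_aligned4
{
    char      m_char1;
    long long m_longlong;
    char      m_char2;
    int       m_int;
};

// No alignment specifier here: it comes from the m_foo_aligned4 member.
struct foo_composite
{
    char         m_char1;
    foo_aligned4 m_foo_aligned4;
    long long    m_longlong;
    char         m_char2;
    int          m_int;
};

static_assert(alignof(foo_composite) == alignof(foo_aligned4), "alignment is inherited");
static_assert(offsetof(foo_composite, m_foo_aligned4) == 16, "15 bytes of padding after m_char1");
static_assert(offsetof(foo_composite, m_longlong) == 48, "follows the 32-byte member");
static_assert(offsetof(foo_composite, m_char2) == 56, "directly after m_longlong");
static_assert(offsetof(foo_composite, m_int) == 60, "3 bytes of padding after m_char2");
static_assert(sizeof(foo_composite) == 64, "already a multiple of 16");
```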


Now, given our composite structure of 64 bytes, let's see how small we can make it without altering the client-facing elements.
struct _CRT_ALIGN(16) foo_aligned4_opt
{
    long long m_longlong; // offset 0: size 8 bytes
    int m_int;            // offset 8: size 4 bytes
    char m_char1;         // offset 12: size 1 byte
    char m_char2;         // offset 13: size 1 byte
    char m_pad[2];        // offset 14: size 2 bytes
};                        // 16 bytes total


struct foo_composite_opt
{
    foo_aligned4_opt m_foo_aligned4; // offset 0: size 16 bytes
    long long m_longlong;            // offset 16: size 8 bytes
    int m_int;                       // offset 24: size 4 bytes
    char m_char1;                    // offset 28: size 1 byte
    char m_char2;                    // offset 29: size 1 byte
    char m_pad[2];                   // offset 30: size 2 bytes
                                     // 32 bytes total
};


Our new composite struct is 32 bytes rather than 64; every member still exists, just in a different configuration.
You may notice that I manually inserted the padding into the struct. This isn't required, but personally I prefer it in cases where I require specific alignment or optimal structure size.
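
When a structure's size matters this much, it can help to pin it down so that a later edit cannot quietly grow it. A minimal sketch, assuming a C++11 compiler (alignas and static_assert stand in for _CRT_ALIGN and any project-specific assert macro) and x86-style type sizes:

```cpp
#include <cstddef>

struct alignas(16) foo_aligned4_opt
{
    long long m_longlong;
    int       m_int;
    char      m_char1;
    char      m_char2;
    char      m_pad[2];   // explicit tail padding
};

struct foo_composite_opt
{
    foo_aligned4_opt m_foo_aligned4;
    long long        m_longlong;
    int              m_int;
    char             m_char1;
    char             m_char2;
    char             m_pad[2];   // explicit tail padding
};

// Adding or reordering a member later will trip these at compile time.
static_assert(sizeof(foo_aligned4_opt) == 16, "foo_aligned4_opt must stay at 16 bytes");
static_assert(sizeof(foo_composite_opt) == 32, "foo_composite_opt must stay at 32 bytes");

// With explicit padding the last member ends exactly at the end of the struct,
// so the compiler has nothing left to insert.
static_assert(offsetof(foo_composite_opt, m_pad) + sizeof(char[2]) == sizeof(foo_composite_opt),
              "no hidden tail padding");
```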

So the key points are:

  • structures inherit the largest alignment of their elements
  • elements will always begin at an offset that is a multiple of their alignment
  • the compiler will insert padding into your structure to achieve alignment requirements
  • structure size will always be a multiple of its alignment
  • to reduce alignment padding, arrange structure members from largest to smallest alignment requirement.

Note that normal allocators, such as malloc, C++ operator new, and the Win32 allocators, return memory that will most likely not be sufficiently aligned for __declspec(align(#)) structures or arrays of structures. I recommend writing your own allocator, either as a wrapper around these or as a region allocator, to achieve alignment above the default (usually 4 or 8 bytes).
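
As a rough idea of what the "wrapper" option can look like, here is a minimal sketch built on malloc. The function names are hypothetical (they are not from any library), and the approach shown, over-allocating and stashing the original pointer just in front of the aligned block, is one common technique rather than the only one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: returns `size` bytes aligned to `alignment`,
// which must be a power of two. Release with aligned_free_sketch().
void* aligned_alloc_sketch(std::size_t size, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    // Keep the stashed pointer itself properly aligned.
    if (alignment < alignof(void*))
        alignment = alignof(void*);

    // Over-allocate: payload + worst-case alignment slack + one pointer
    // in which to remember what malloc actually returned.
    void* raw = std::malloc(size + alignment - 1 + sizeof(void*));
    if (raw == nullptr)
        return nullptr;

    // First address that leaves room for the stashed pointer, rounded up.
    const std::uintptr_t start   = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    const std::uintptr_t mask    = static_cast<std::uintptr_t>(alignment) - 1;
    const std::uintptr_t aligned = (start + mask) & ~mask;

    // Store the original pointer immediately before the aligned block.
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

// Hypothetical helper: frees a block obtained from aligned_alloc_sketch().
void aligned_free_sketch(void* ptr)
{
    if (ptr != nullptr)
        std::free(static_cast<void**>(ptr)[-1]);
}
```

The cost is up to alignment - 1 + sizeof(void*) wasted bytes per allocation, which is one reason a region allocator can be the better fit for many small aligned objects. The Microsoft CRT also ships _aligned_malloc and _aligned_free, which are worth checking before rolling your own.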


Further reading: _CRT_ALIGN, __alignof

Becoming a console programmer : LHS

Zak Whaley pointed me at an article on "Some Assembly Required" discussing float -> int conversion and what occurs when it is done. It leads on to the LHS article on the same site, which describes relatively well what a "Load Hits Store" (LHS) is and how to avoid it using the restrict keyword.
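
As a rough illustration of the kind of pattern involved (this sketch is mine, not taken from those articles): when a loop updates a value through a pointer and also writes through another pointer of the same type, the compiler has to assume the two may overlap and re-read the value from memory straight after storing it. The function names are hypothetical, and __restrict is a compiler extension accepted by the Microsoft compiler, GCC and Clang rather than standard C++. Whether a reload is actually emitted depends on the compiler and its settings.

```cpp
#include <cassert>
#include <cstddef>

// out[i] might overlap *total, so after each store to out[i] the compiler
// must assume *total changed and load it again, very shortly after it was
// itself stored. That store-then-load of the same address is the LHS pattern.
void running_sum_aliased(const float* in, float* out, std::size_t count, float* total)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        *total += in[i];  // load *total, add, store *total
        out[i] = *total;  // store that may alias *total
    }
}

// Same result when the buffers really are distinct: the running value lives
// in a local (so it can stay in a register) and is written back once.
// __restrict additionally promises the compiler that nothing overlaps.
void running_sum_local(const float* __restrict in, float* __restrict out,
                       std::size_t count, float* __restrict total)
{
    // Cheap sanity check only; it does not detect partial overlap.
    assert(in != out && in != total && out != total);

    float sum = *total;
    for (std::size_t i = 0; i < count; ++i)
    {
        sum += in[i];
        out[i] = sum;
    }
    *total = sum;
}
```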

Later, when I discuss math libraries, I will come back to this with respect to vector <=> float interchange and how to effectively remove it from transient use.

Wednesday, June 16, 2010

Becoming a console programmer

Recently on a games industry forum, a friend who teaches over in the Netherlands (not Denmark :p) was asking us (programmers) what we would ensure a PC programmer moving to console development should know before considering themselves a "console programmer".

None of these are specific to console programming; however, I've found over the years that programmers who come from a pure PC background are often lacking in these areas, so I personally consider them important.

Programmers new to console should strive to achieve:
  • an awareness of memory consumption - PC programmers tend to be somewhat liberal with memory allocation and stack usage. On consoles we're often faced with very limited memory resources that shrink over time (as the game gets larger), the opposite of PC, where virtual memory allows incredibly large memory footprints.
  • knowledge of alignment - how it works inside structures, how it affects their size AND how not paying attention can massively increase cache misses.
  • an appreciation of the cost of type conversions, both float <=> integer and float <=> SIMD, as well as the supposedly simpler integer-to-integer conversions that can cost significantly more than is assumed.
  • a full understanding of what a Load Hits Store (LHS) is, how it occurs, how to avoid it and how to spot it in everyday code.
  • a full understanding of the implications of a cache miss: what it is, how to avoid them and how to fill the time around the unavoidable ones.
  • a full understanding of what a branch is, how it affects the CPU, how to avoid using them and what patterns should be avoided to minimize them in general-purpose code.
  • an understanding of what select functions are (the fsel and vsel family) and how/when to use them.
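
To give a flavour of the select idea without any platform intrinsics, here is a sketch on plain 32-bit integers. The real fsel and vsel are hardware instructions reached through compiler intrinsics, which are not shown; the function names below are made up, and whether the compiler ends up emitting a branch is still its decision.

```cpp
#include <cstdint>

// Bitwise select: take bits from `a` where `mask` is 1 and from `b` where it is 0.
constexpr std::uint32_t select_bits(std::uint32_t mask, std::uint32_t a, std::uint32_t b)
{
    return (a & mask) | (b & ~mask);
}

// All ones if lhs < rhs, otherwise all zeros. The comparison yields 0 or 1;
// subtracting it from zero in unsigned arithmetic gives 0x00000000 or 0xFFFFFFFF.
constexpr std::uint32_t less_than_mask(std::int32_t lhs, std::int32_t rhs)
{
    return 0u - static_cast<std::uint32_t>(lhs < rhs);
}

// max() written as compare + select, with no if/else in the source.
// Assumes the usual two's-complement conversion back to a signed value.
constexpr std::int32_t branchless_max(std::int32_t x, std::int32_t y)
{
    return static_cast<std::int32_t>(
        select_bits(less_than_mask(x, y),
                    static_cast<std::uint32_t>(y),
                    static_cast<std::uint32_t>(x)));
}

static_assert(select_bits(0xFFFF0000u, 0x12345678u, 0xAABBCCDDu) == 0x1234CCDDu, "mixes halves");
static_assert(branchless_max(-5, 3) == 3, "picks y when x < y");
static_assert(branchless_max(7, 2) == 7, "picks x otherwise");
```

The shape is the useful part: compute both candidates, build a mask from the comparison, then pick with the mask instead of jumping.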
SIMD is important; however, there are MANY companies that write a math lib once and never look back. I've been guilty of this myself, but I now know I was wrong back then, and no doubt what I've written most recently is still wrong in many ways.

Writing it once implies that when you wrote the lib you knew everything there was to know, and that nothing changed in the years afterwards... Yes, many games will be fine with an "ok" math lib, but a "good" math lib can change the performance of a game and an entire team.

Rewrite it as often as you can afford, learning from the mistakes each time. So much is built upon it that this is one of the few systems where I'd recommend this behavior.

So, a final point:
  • learn SIMD math in all its forms, learn the extra instructions that most don't use and what they provide, learn how to use #.INF to your advantage and, most of all, use instructions such as vsel / shuffle / permute to their fullest advantage.
I will be posting expansions on the above points over the next few weeks, depending on demand.