Recently, I was discussing the memory alignment of structure members with other developers at my company, and I thought: why not write a short post about it? To be honest, I think it’s an underrated topic nowadays, with machines that have practically unlimited memory and run billions of cycles per second. Even if it’s not important for many developers, it’s still good to know, and it matters if you develop for devices with limited resources.
Data Models and Resulting Sizes of Fundamental Types
The C++ standard guarantees a width of at least 16 bits for short, unsigned short, int, and unsigned int, at least 32 bits for long and unsigned long, and at least 64 bits for long long and unsigned long long. The actual bit size of each fundamental type is determined by the data model of the implementation. Four data models are widely used: LP32 and ILP32 on 32-bit machines, and LLP64 and LP64 on 64-bit machines. LLP64 is used by 64-bit Windows, while 64-bit Unix-like systems such as Linux and macOS use LP64.
Type (width in bits) | Standard (minimum) | LP32 | ILP32 | LLP64 | LP64 |
---|---|---|---|---|---|
short/unsigned short | 16 | 16 | 16 | 16 | 16 |
int/unsigned int | 16 | 16 | 32 | 32 | 32 |
long/unsigned long | 32 | 32 | 32 | 32 | 64 |
long long/unsigned long long | 64 | 64 | 64 | 64 | 64 |
Let’s look at the sizes of some fundamental types in a specific implementation: Ubuntu Linux with the clang 7 compiler, which follows the LP64 data model (note the 64-bit long).
Size of char = sizeof(char) = 1 byte -> 8bit
Size of short = sizeof(short) = 2 byte -> 16bit
Size of int = sizeof(int) = 4 byte -> 32bit
Size of long = sizeof(long) = 8 byte -> 64bit
Size of Structures
Let’s assume we have the following C++ structure:
struct S1 { char a; int b; char c; short d; };
Naively, we would assume that the structure occupies 8 bytes in memory, the sum of its members’ sizes. Let’s check this:
Size of S1 = 1 + 4 + 1 + 2 = 8 byte, real size is sizeof(S1) = 12 byte
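That’s interesting. Why is the size of S1 12 bytes instead of the expected 8 bytes? To find an answer, we have to dig a little deeper into how processors access memory. Roughly explained, a processor transfers data to and from memory in fixed-size, aligned chunks (a word, which can be 4 bytes on 32-bit and 8 bytes on 64-bit machines; modern processors actually fetch whole cache lines). An element that straddles such a boundary may need extra work to load or store, and on some architectures misaligned access is not allowed at all. To avoid this, the compiler stores each element of a fundamental type at an address that is a multiple of its byte size (strictly speaking, of its alignment requirement, which depends on the compiler and platform). Let’s call it the 1/2/4 rule: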
- Types with a size of 1 byte can be stored at addresses that are multiples of 1 byte
- Types with a size of 2 bytes can be stored at addresses that are multiples of 2 bytes
- Types with a size of 4 bytes can be stored at addresses that are multiples of 4 bytes
- and so on
As a result, the memory layout of our structure S1 probably looks like the following, with memory addresses (relative to the start of the structure) from 0 to f. The resulting size of 12 bytes includes 4 bytes of unused memory; this is called padding. Printing the real memory address of each element (the first byte of its allocated memory) confirms it.
Memory Address | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Allocated Memory | a | – | – | – | b | b | b | b | c | – | d | d |
Address a: 0x7ffe2b97da30
Address b: 0x7ffe2b97da34
Address c: 0x7ffe2b97da38
Address d: 0x7ffe2b97da3a
Now it might strike you that it should be possible to save memory simply by rearranging the members of S1 and leveraging how the compiler aligns elements in memory.
struct S2 { char a; char c; short d; int b; };
Size of S2 = 1 + 1 + 2 + 4 = 8 byte, real size is sizeof(S2) = 8 byte
Address a: 0x7fff3ae69ab8
Address c: 0x7fff3ae69ab9
Address d: 0x7fff3ae69aba
Address b: 0x7fff3ae69abc
Memory Address | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Allocated Memory | a | c | d | d | b | b | b | b |
This is much better now. But be aware: simply sorting structure members in ascending order of size (smallest to largest) is not a recipe that always works. You always have to keep in mind what we earlier called the 1/2/4 rule. What is the size of S3? Does it have a size of 5 bytes?
struct S3 { char a; int b; };
Size of S3 = 1 + 4 = 5 byte, real size is sizeof(S3) = 8 byte
Again, keep the 1/2/4 rule in mind: b has to start at an address that is a multiple of 4, so 3 bytes of padding follow a. The memory layout of the elements now looks like this:
Address a: 0x7ffe2e8dc7a8
Address b: 0x7ffe2e8dc7ac
Memory Address | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Allocated Memory | a | – | – | – | b | b | b | b |
tl;dr
Nowadays, micro-optimizations like the padding and memory alignment of structures might be irrelevant in many fields of software engineering. Still, I think it is an important topic wherever resources are limited, and it is something every C++ developer should know. It’s good to know your memory.
What are your thoughts? Did you like the post?
Feel free to comment and share this post.
The information is somewhat correct (and probably was correct on older processors), but the number of cycles and the assumption that the word size is the atomic memory fetch are WRONG on modern processors. The processors cycle faster than the memory fetch time, so it can take several cycles for a memory fetch to become available. And many fetch an entire cache line, which may be multiple words.
Thanks, Ron, for clarifying that.