Today, we are going to go through the following code.
The very interesting thing about this code is that, when compiled and run, it seems to work just fine. But it is actually not working properly: there is a big problem with it 🙂 By going through it, we are going to learn a ton of things.
At Holberton, we have a strict coding style for each programming language we are using. Let’s start by applying our coding style for C.
– `main()` should be written `main(void)`. Is that a huge mistake? No. But this forces us to be more structured and to always write everything explicitly.
– We don't want to initialize our variables at the same time as we declare them (exceptions can apply for arrays).
– We also want to group our variable declarations together.
– No empty lines within the code, and exactly one empty line between the declarations and the code.
So the code should look like this:
It's cleaner and more professional, and it follows the style of the school (remember that in any company we have to follow the company's coding style, so it's important to get into the habit of strictly following one particular style).
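Because the original listing isn't reproduced here, the rules above are illustrated with a small hypothetical function instead. The name `count_chars` is ours and does not come from the original code. This is a sketch of the layout only: declarations are grouped at the top, nothing is initialized in a declaration, and one empty line separates the declarations from the code.

```c
/*
 * count_chars - hypothetical example laid out following the rules above:
 * declarations grouped at the top, no initialization in the declarations,
 * exactly one empty line between the declarations and the code.
 */
int count_chars(const char *s)
{
	const char *p;
	int n;

	p = s;
	n = 0;
	while (*p != '\0')
	{
		p++;
		n++;
	}
	return (n);
}
```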
We are using the function `printf` without including its prototype. So when we compile, we will get a warning from the compiler, even without additional flags.
To find the prototype of a function, we can always look at its man page. In this case: `man 3 printf`.
The man page gives us both the prototype and the header to include. We can use either one to tell the compiler what the prototype of `printf` is (and make the warning go away). We don't strictly have to include the header; we can simply declare the prototype ourselves, like so:
But it’s a good habit to include the header (which includes the prototype).
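Here is a minimal sketch of declaring the prototype by hand instead of including `<stdio.h>`. It uses the prototype from `man 3 printf`. Newer man pages write the parameter as `const char *restrict format`. Leaving out `restrict` still gives a compatible declaration, because top-level qualifiers on parameters are ignored when function types are compared. The helper `greet` is hypothetical.

```c
/* Prototype written by hand, as given by `man 3 printf`,
 * instead of #include <stdio.h>. */
int printf(const char *format, ...);

/* Hypothetical helper: prints a greeting and returns the number of
 * characters printf reports having written. */
int greet(void)
{
	return (printf("Hello, %s\n", "Holberton"));
}
```

Calling `greet()` prints `Hello, Holberton` and returns 17 (7 + 9 characters plus the newline).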
Now let’s see what is happening in the program and why it is working but not really. We will go step by step through the code and look at what happens in memory. Let’s start with the declarations:
At this point, this is what the virtual memory looks like (we are going to assume we are working on a 64-bit Linux machine):
– The string literals are copied into the arrays. The arrays have been automatically sized (the compiler can do that because it knows the size of the string literals to copy).
– The variables `a` and `b` are pointers, so on a 64-bit machine they each take 8 bytes in memory.
– The variable `aa` is an array of chars of size 14 bytes (14 chars, so 14 * `sizeof(char)` = 14 bytes).
– At this point the variables `a` and `b` have a value, but we do not know what it is. The next two lines of code will initialize them.
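The difference between an array sized by the compiler and a pointer can be checked with `sizeof`. In this sketch the string `"Hello, World!"` is a hypothetical stand-in, not the original content. It was chosen only because it is 13 chars long, which gives a 14-byte array like `aa`.

```c
#include <stddef.h>

/* The compiler sizes aa from the (hypothetical) literal:
 * 13 chars + the terminating '\0' = 14 bytes. */
size_t array_size(void)
{
	char aa[] = "Hello, World!";

	return (sizeof(aa));
}

/* A pointer such as a or b: 8 bytes on 64-bit Linux,
 * whatever it points to. */
size_t pointer_size(void)
{
	return (sizeof(char *));
}
```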
After these two lines of code, `a` points to the first letter of the array `aa` (so it contains the address of the first letter of `aa`, which is also the address of the array `aa` itself), and `b` points to the first letter of the array `bb`. This is what the virtual memory looks like:
So far so good. The next lines of code go to the end of the "string" stored in `aa` (remember there is no string type in C). This code is correct, so at the end of the `while` loop, `a` points to the `\0` of the array `aa`.
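Here is a sketch of these two steps: point `a` at `aa`, then walk it to the terminating `\0`. As before, the contents of `aa` are a hypothetical 13-char string. The function returns how far `a` travelled.

```c
#include <stddef.h>

/* Returns the index of the '\0' that ends aa (hypothetical contents). */
size_t end_of_aa(void)
{
	char aa[] = "Hello, World!";
	char *a;

	a = aa;			/* a now holds the address of aa[0] */
	while (*a != '\0')	/* stop on the terminating '\0' */
		a++;
	return ((size_t)(a - aa));
}
```

The result is 13, so `a` stops on `aa[13]`, the last byte that belongs to the 14-byte array. Anything written at `a + 1` or beyond is already outside `aa`.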
At the end of this `while` loop, the virtual memory looks like this:
The next lines of code are the following:
The above loop copies the content of the array `bb` (`b` points to the first char contained in `bb`) to the end of the array `aa` (since `a`, at the beginning of the loop, points to the last char of the array `aa`, its `\0`). And that is both what we wanted the code to do, AND the problem 🙂
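Below is a sketch of the same two loops written as a hypothetical `concat` function that copies the `\0` as well. The loop itself is fine. What matters is the contract in the comment: `dest` must have room for both strings. It behaves like the standard `strcat`, which has the same requirement. In the original program, `aa` was sized by the compiler to fit exactly its own string, so that contract was broken.

```c
/* Appends src to the end of dest, including the terminating '\0'.
 * The caller must guarantee that dest is large enough for both strings. */
char *concat(char *dest, const char *src)
{
	char *a;
	const char *b;

	a = dest;
	b = src;
	while (*a != '\0')	/* go to the '\0' of dest */
		a++;
	while (*b != '\0')	/* copy src one char at a time */
	{
		*a = *b;
		a++;
		b++;
	}
	*a = '\0';		/* terminate the result */
	return (dest);
}
```

For example, with `char buf[32] = "Hello, ";` (a buffer declared with spare room), `concat(buf, "World!")` leaves `"Hello, World!"` in `buf`, entirely inside its 32 bytes.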
The content of `bb` is copied, one char at a time, starting from memory address 19 (in our example). But our variable `aa` ENDS at 19 too. That means that, apart from the first char (which overwrites the `\0`), we are writing the content of `bb` AFTER the variable `aa`, not inside it. After 12 iterations, the virtual memory (in our example) looks like this:
In red, we have written 11 bytes outside of the memory reserved for `aa`, and the loop will keep doing so for another 10 bytes. The problem, of course, is that we are probably overwriting the values of other variables, or writing to a memory address we do not have write access to (and will get a beautiful `Segmentation fault`). In this particular case, the program still runs "properly" and without any warning (because we are unlucky), and as a result, we don't realize that we are making a mistake.
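The size of the overflow can be computed before copying anything. This sketch uses a hypothetical helper, `bytes_past_end`. It reports how many bytes appending `src` (and its `\0`) to `dst` would write beyond a buffer of `cap` bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes that appending src (plus its '\0')
 * to the string in dst would write past a buffer of cap bytes.
 * 0 means the result fits. */
size_t bytes_past_end(size_t cap, const char *dst, const char *src)
{
	size_t needed;

	needed = strlen(dst) + strlen(src) + 1;
	if (needed <= cap)
		return (0);
	return (needed - cap);
}
```

When `cap` is `sizeof(aa)`, as in the original where `aa` is exactly one byte longer than its string, the result is just `strlen(src)`. The first char of `bb` lands on `aa`'s `\0`, and every remaining char plus the copied `\0` lands outside `aa`.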
In fact, in this example, we are actually "destroying" our array `bb`. Let's modify the program a bit to check that out:
It seems like we changed `bb` by concatenating it to `aa`. Yet `bb` is not 1 char "shorter": it still takes the same size in memory, but its content has changed. This happens because, in the actual virtual memory of our running process, the two arrays are next to each other, like so:
So when we concatenate `bb` to `aa`, we are doing this (concatenated letters in pink):
After this concatenation, the size of `bb` doesn't change, but its content does, and it "seems" to have been shifted to the left by 1 char. That's because the `-` at its beginning is now part of `aa`, as the last letter in the memory reserved for `aa`. Note that `bb` now has two `\0`: the one copied, and the initial one.
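The shift can be reproduced without undefined behavior. This sketch carves both arrays out of one buffer, which guarantees the layout described above (`aa` immediately followed by `bb`). The out-of-bounds write of the original then becomes an ordinary write inside `mem`. The contents `"abc"` and `"-xyz"` are hypothetical. Only the leading `-` mirrors the text.

```c
#include <string.h>

/* Hypothetical layout: aa = "abc" (4 bytes) immediately followed by
 * bb = "-xyz" (5 bytes), both inside mem, which must hold 9 bytes. */
void simulate_overflow(char *mem)
{
	char *aa;
	char *bb;
	char *a;
	char *b;

	memcpy(mem, "abc\0-xyz", 9);	/* 8 chars + the implicit '\0' */
	aa = mem;
	bb = mem + 4;
	a = aa;
	b = bb;
	while (*a != '\0')	/* a stops on aa's '\0' (mem[3]) */
		a++;
	while (*b != '\0')	/* each char of bb lands one byte to its left */
	{
		*a = *b;
		a++;
		b++;
	}
	*a = *b;		/* copy bb's '\0' too */
}
```

After the call, `mem` holds `a b c - x y z \0 \0`. The 4 bytes of `aa` read `"abc-"` with no `\0` left inside `aa`. `bb` (at `mem + 4`) now reads `"xyz"`, followed by two `\0`: the copied one and the original one.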
THE END 🙂 If you would like to learn more about virtual memory, you can read these articles:
- Chapter 0: Hack The Virtual Memory: C strings & /proc
- Chapter 1: Hack The Virtual Memory: Python bytes
- Chapter 2: Hack The Virtual Memory: Drawing the VM diagram
- Chapter 3: Hack the Virtual Memory: malloc, the heap & the program break
To finish, I would like to thank the author of this code: thanks to them, we learned a ton of things!
“Experience is simply the name we give our mistakes.” Oscar Wilde