Tutorial – Buffer Overflows Part 1
ORIGINALLY POSTED BY NOKIA FOR THETAZZONE/TAZFORUM HERE
Do not use, republish, in whole or in part, without the consent of the Author. TheTAZZone policy is that Authors retain the rights to the work they submit and/or post…we do not sell, publish, transmit, or have the right to give permission for such…TheTAZZone merely retains the right to use, retain, and publish submitted work within its Network.
Buffer Overflows – what they are and how they work.
This can be quite a complicated issue, so I will try to break it down into different parts and put it into everyday language.
I will assume that if you are reading this you understand a little programming (functions, integers etc.).
To understand buffer overflows it helps to know a bit about how a program utilizes memory.
First it will help to understand what an EIP is:
[it is essential to understand an EIP to understand how a buffer overflow works]
Extended Instruction Pointer.
The processor has a very small chunk of memory of its own, divided into what are called registers.
One of the most important registers is the EIP; it tells the processor where to look in system memory for the next instruction (or piece of code) that it has to execute.
For example, the code could be to print the word TAZZone on your monitor, and it has been written to memory at the address 0x12345678. (Memory addresses are usually written in hex.)
The EIP would now tell the processor to go to 0x12345678 and do whatever the code there is telling it to do, hence the word “TAZZone” will be printed on the screen.
There are five types of program memory: text, heap, stack, bss and data.
Each one of these is a special piece of memory reserved for a certain type of purpose.
I will cover text and stack for the purpose of this paper.
The text segment is where the compiled machine language is stored. Write permissions are disabled here, as it is used only to store the code that is being executed.
(When you compile a program, what you are doing is converting it from human-readable form into a language the computer understands; it is the output of a compiled program that is stored in the text segment.)
So for a very simplified example, say you wanted to print the words Hello, goodbye, thank you, and Microsoft rules.
(For ease I will use 1, 2, 3, 4, 5 etc. for memory addresses instead of the correct addresses.)
So hello is stored at 1, goodbye at 2, thank you at 3 and Microsoft rules at 4.
Here is what the processor will do:
1. Get the address of the next instruction to execute from the EIP and go there.
2. Add the number of bytes in the instruction to the EIP.
3. Do whatever the piece of code is telling it to do (print Hello).
4. Go back to the EIP to get the next address.
The EIP will already be pointing at the next instruction by the time the current one has completed, because in step 2 the processor added the number of bytes in the instruction to it.
The stack segment is used as temporary storage space for functions.
When a function (print) is called by a program it will have its own variables (hello, goodbye, thank you etc.)
and its code will be at a different place in the text segment of memory.
(I.e. hello cannot be at the same memory address as goodbye, otherwise they would overwrite each other.)
So the function is to print Hello (1), Goodbye (2) and Thank you (3).
When the function is called, what it needs in order to run is read from the text segment and passed to the stack segment.
The stack segment will remember the addresses (1, 2, 3) of each variable, and also the return address: the memory address that is handed back to the EIP to tell it where to return to when the function is finished.
There is a lot more to the stack segment but it’s not really relevant at this point!
OK, so the programmer has specified that the word Hello will need 5 bytes of memory, but what happens when 7 characters try to write themselves to this piece of memory instead, the word goodbye for example?
|H|E|L|L|O| - No probs here
 1 2 3 4 5
|G|O|O|D|B| -- |Y|E|-|-|-| - They overflow into memory held for something else
 1 2 3 4 5  --  6 7 8 9 0
5 bytes are allocated but the variable was 7 bytes long. Now it can’t just disappear; it has to be written somewhere, so a buffer overflow occurs. If the data that was overwritten in 6 + 7 were a critical part of the program, the program would crash.
Here is a well-known piece of code to cause a buffer overflow (it’s very well known and is in most books about the subject, so no one jump on my back for posting it, please):
void overflow_function(char *str)
    char buffer[20];              // size of the buffer is 20 bytes
for (i = 0; i < 128; i++)         // this makes it loop 128 times
    big_string[i] = 'A';          // fill big_string with 'A' characters
There are a few subtle things left out here (unlucky skiddies); it should be easy for someone with a basic knowledge of C to fix.
This program should crash as a result of the overflow.
How can this be utilized to take control of a program?
Refer to the sample code above: 128 bytes were written to a space 20 bytes big.
The remaining 108 bytes overflow, overwriting amongst other things the return address for the EIP, so now the EIP ain’t got a scooby doo where to go and the program can no longer carry on, so it just stops.
BUT, what if the return address was overwritten with an address of your choosing?
An address that contained code to be run?
The program wouldn’t crash because, as far as the processor is concerned, it has just gone to the next part of the program. It doesn’t know what’s meant to come next; it just reads what’s there and does what it says.
What if the return address that you have given to the EIP points to information that a user has entered into your program?
So your program could have been written to get someone to type their name and then print it out on the screen.
But the user doesn’t type his name; instead he types a small piece of shellcode!
Say you know where this input is stored, so you cause a buffer overflow in your code on purpose, one that gives the return address for the EIP as the address where the user input (the user name) is stored.
The EIP will now tell the processor to go there and execute the shellcode that is there.
Shellcode is a small piece of machine code that, in its classic form, just spawns a shell.
Now say a SUID program with root privileges was made to spawn a shell: the shell that was spawned would have the same privileges as the program (#root!).
How do you find out which SUID files run as root?
find / -type f -perm -04000 -ls - this will list all SUID programs on *nix systems; add -user root to list only the ones owned by root, which are the ones that run with root privileges.
The following code is taken from the book Hacking: The Art of Exploitation:
int main(int argc, char *argv[])
This code just mismanages memory, but if its ownership is changed to root and the SUID bit is set, it becomes a SUID root program that is susceptible to buffer overflows (Bugtraq!!).
$ sudo chown root vuln
$ sudo chmod +s vuln
$ ls -l vuln
-rwsr-sr-x 1 root users etc
vuln is now a root suid program.
Now all you need is code to make a buffer containing the shellcode to be fed to the vuln program that will overwrite the return address to execute the shellcode.
Obviously the address of the shellcode must be known in advance – this is a tutorial in itself though.
I do have the code required to create a buffer that, when fed to the vuln program, will on some systems trick it into executing shellcode when it crashes, hence spawning a #root shell for the user to do as he wishes.
However, I am not going to post it here; if you would like it, PM me!
*I did take a few liberties with how code, functions, memory addressing etc. work, just for the sake of keeping it simple.
There is a hell of a lot more to buffer overflows and loads of different types; I have covered “Stack Based Overflows” here.