xiphos,
a security engine with rules of its own.
A Rust security engine for OSS supply chains. SAST, SCA/SBOM, secrets, IaC, WASM plugins, SARIF 2.1 out, CWE/OWASP/CVSS on every finding.
Xiphos is a single-binary scanner that fans out to AST-based static analysis (tree-sitter), SCA + SPDX/CycloneDX SBOM, secret detection, IaC misconfig, and a Wasmtime-hosted plugin surface for community rules. Output is SARIF 2.1.0 with CWE / OWASP / CVSS metadata on every finding. As of writing, the workspace is an architectural skeleton: the layering rules and RFCs exist before the crates do, on purpose.
I. Tech scope
- End-user binary xiphos-cli drives every subcommand (scan, sbom, plugin, verify) and is the only crate in the workspace with a bin target. Everything else is a library, which keeps the dependency graph honest and makes "what runs in the user’s process" trivially inspectable.
- Capability scanners ship as separate crates: xiphos-sast, -sca, -secrets, -heuristic, -iac. Each is blind to the others; coordination happens only in xiphos-scanner, which means a bug in the secret detector cannot leak signals into the SAST result set or vice versa.
- Three rule flavors, one host. Built-in Rust rules for performance-critical paths, declarative YAML in rules/ for the long tail of community contributions, and WebAssembly components run by xiphos-plugin against the xiphos:plugin@0.1.0 WIT world. The WIT world is versioned; a plugin compiled against an older world will be refused at load rather than crash mid-scan.
- Plugin capabilities (read-source, read-metadata, read-findings, emit-finding, decorate-finding, http-egress) are default-deny. The host enforces them per-call with explicit grants in xiphos.toml; a plugin that asks for http-egress without a grant gets Err(Forbidden) at the WIT boundary, not silent egress.
- Optional xiphos-llm can add remediation prose and downgrade severity with a written justification, but it cannot create new findings. The asymmetry is deliberate: an LLM mistake is allowed to be quieter than a static analyzer, never louder.
- Tree-sitter grammars drive AST queries for SAST. Each language pack is its own dependency and the grammar version is pinned in Cargo.lock, so a parser update can’t land without an explicit bump and a determinism rerun.
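The default-deny capability check described in the list can be sketched as follows. This is a minimal illustration, not the xiphos-plugin implementation: the Grants type, its methods, and the plugin name used in the usage note are hypothetical, and the grant table is assumed to have been loaded from xiphos.toml already.

```rust
use std::collections::{HashMap, HashSet};

/// The six capabilities named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Capability {
    ReadSource,
    ReadMetadata,
    ReadFindings,
    EmitFinding,
    DecorateFinding,
    HttpEgress,
}

#[derive(Debug, PartialEq, Eq)]
pub enum HostError {
    Forbidden,
}

/// Hypothetical in-memory form of the grants in xiphos.toml:
/// plugin name -> explicitly granted capabilities.
#[derive(Default)]
pub struct Grants {
    by_plugin: HashMap<String, HashSet<Capability>>,
}

impl Grants {
    /// Record an explicit grant for one plugin.
    pub fn grant(&mut self, plugin: &str, cap: Capability) {
        self.by_plugin
            .entry(plugin.to_string())
            .or_default()
            .insert(cap);
    }

    /// Meant to run on every host call. Anything that was not
    /// explicitly granted is refused, including unknown plugins.
    pub fn check(&self, plugin: &str, cap: Capability) -> Result<(), HostError> {
        match self.by_plugin.get(plugin) {
            Some(caps) if caps.contains(&cap) => Ok(()),
            _ => Err(HostError::Forbidden),
        }
    }
}
```

With a single ReadSource grant for a plugin called "license-audit", check returns Ok(()) for ReadSource and Err(HostError::Forbidden) for HttpEgress. There is no "allow all" state to fall back to, which is the point of default-deny.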
II. Architecture
The pipeline runs in one direction. xiphos-cli builds a ScanRequest from flags + config; xiphos-scanner walks the source tree using the ignore crate for .gitignore / .xiphosignore semantics, classifies files by language and content sniff, and schedules AnalysisUnits onto a Rayon pool sized to min(num_cpus, --jobs). Capability scanners consume units and produce Vec<Finding> into a deduplicating FindingSet; the WASM plugin host runs after that and may add or decorate findings but never delete them; the LLM decorator runs last and may downgrade severity with justification; finally, xiphos-sarif emits canonical SARIF 2.1 with the CWE / OWASP / CVSS metadata attached.
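A deduplicating FindingSet with "add or decorate, never delete" semantics might look like the sketch below. The field names, the notes vector, and the method names are assumptions made for illustration; clamping the worker count to at least one is also an assumption, not something the text states.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Finding {
    pub id: String, // content hash, assumed to be computed upstream
    pub rule_id: String,
    pub notes: Vec<String>, // hypothetical slot for plugin decorations
}

/// Deduplicating set keyed by Finding.id. A BTreeMap iterates in key
/// order, so output order does not depend on which worker finished first.
#[derive(Default)]
pub struct FindingSet {
    by_id: BTreeMap<String, Finding>,
}

impl FindingSet {
    /// Insert a finding; returns false if the id was already present.
    pub fn insert(&mut self, finding: Finding) -> bool {
        if self.by_id.contains_key(&finding.id) {
            return false;
        }
        self.by_id.insert(finding.id.clone(), finding);
        true
    }

    /// Plugin-stage decoration: append a note to an existing finding.
    /// The type deliberately exposes no removal method.
    pub fn decorate(&mut self, id: &str, note: &str) -> bool {
        match self.by_id.get_mut(id) {
            Some(finding) => {
                finding.notes.push(note.to_string());
                true
            }
            None => false,
        }
    }

    /// Findings in ascending id order.
    pub fn iter(&self) -> impl Iterator<Item = &Finding> {
        self.by_id.values()
    }

    pub fn len(&self) -> usize {
        self.by_id.len()
    }
}

/// Worker count as described in the text: min(num_cpus, --jobs).
/// The floor of one worker is an added assumption.
pub fn pool_size(num_cpus: usize, jobs: usize) -> usize {
    num_cpus.min(jobs).max(1)
}
```

Keying on the content hash means two scanners that report the same rule at the same location collapse into one entry, and sorted iteration gives the SARIF emitter a stable order without a separate sort pass.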
Determinism is a hard contract. Same inputs → byte-identical canonical JSON, regardless of thread count, OS, or wall-clock time. Finding.id is a content hash over (rule_id, location_fingerprint, message_template) only — time, env vars, machine identifiers, and absolute paths are explicitly excluded by the schema. tests/determinism.rs runs every scanner twice and byte-diffs the output; CI rejects any change that breaks the diff. The canonical-JSON encoder sorts object keys, freezes float formatting to 17-digit shortest-round-trip, and rejects NaN / Infinity, so the contract holds across serde versions.
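The Finding.id rule can be illustrated with a small sketch. FNV-1a is used here only as a dependency-free stand-in; the text does not say which hash function the project uses, and the function names and the length-prefix framing are assumptions.

```rust
/// FNV-1a 64-bit over a byte slice, continuing from `state`.
/// Stand-in only: the actual content-hash function is not specified.
fn fnv1a64(bytes: &[u8], mut state: u64) -> u64 {
    for b in bytes {
        state ^= u64::from(*b);
        state = state.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    state
}

/// Derive an id from exactly the three fields the text lists.
/// Nothing else (time, env vars, hostnames, absolute paths) is an input,
/// so nothing else can change the result.
pub fn finding_id(rule_id: &str, location_fingerprint: &str, message_template: &str) -> String {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for field in [rule_id, location_fingerprint, message_template] {
        // Length-prefix each field so ("ab", "c") and ("a", "bc")
        // feed different byte streams into the hash.
        h = fnv1a64(&(field.len() as u64).to_le_bytes(), h);
        h = fnv1a64(field.as_bytes(), h);
    }
    format!("{h:016x}")
}
```

Because the function is pure and takes only those three strings, calling it twice with the same arguments always yields the same 16-character hex id, which is the property a byte-diff test over two runs relies on.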
Layering is enforced, not aspirational. xiphos-core has zero I/O, zero async, zero workspace-internal deps, and zero std::time calls. cargo xtask check-layers walks Cargo.toml across the workspace and refuses a violation; CI runs it on every PR. Crates above core in the layering DAG can use I/O and async freely, which is what allows the scanner orchestrator and the plugin host to stay readable while the core remains a pure library.
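One way a layer check of this kind could work is to assign each workspace crate a rank and reject any workspace-internal edge that does not point strictly downward. The sketch below assumes the dependency lists have already been extracted from the Cargo.toml files; the rank table and the function name are hypothetical, not the behavior of cargo xtask check-layers.

```rust
use std::collections::BTreeMap;

/// Return every workspace-internal dependency edge that points sideways
/// or upward in the layering. `ranks` maps crate name -> layer (0 = core);
/// `deps` maps crate name -> declared dependencies.
pub fn check_layers(
    ranks: &BTreeMap<&str, u32>,
    deps: &BTreeMap<&str, Vec<&str>>,
) -> Vec<String> {
    let mut violations = Vec::new();
    for (krate, krate_deps) in deps {
        for dep in krate_deps {
            // Crates without a rank are external and not subject to the rule.
            let (Some(from), Some(to)) = (ranks.get(krate), ranks.get(dep)) else {
                continue;
            };
            // A crate may only depend on strictly lower layers.
            if to >= from {
                violations.push(format!("{krate} -> {dep}"));
            }
        }
    }
    violations
}
```

With xiphos-core at rank 0, any workspace-internal dependency it declares is reported, so "zero workspace-internal deps" falls out of the general rule rather than needing a special case.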
III. Threat model and constraints
- Scanned source is treated as hostile from the first byte. Each scan runs under a per-file size cap (default 10 MiB, configurable), a per-scanner timeout, and an out-of-memory guard that converts the OOM into a non-zero exit rather than a panic. A panic mid-scan would discard partial findings; a clean exit lets CI report what was scanned before the budget tripped.
- xiphos.toml is read only from the path given to --config, never auto-discovered from the scanned tree. A repo cannot tell xiphos to scan with a relaxed ruleset by checking in its own config.
- --output is canonicalized before any write. Symlinks in the output path are disallowed; an attacker cannot use a symlinked output target to overwrite an arbitrary file in the operator’s home.
- LLM passes redact and truncate freeform code excerpts before they leave the process. The default redactor strips strings longer than 256 chars and any line matching the secret-detector regexes, which makes prompt injection from a hostile source file harder to land.
- Severity is never set directly. A scanner emits a base CVSS score and a confidence value; the orchestrator computes final_cvss = base_cvss × environmental_modifiers × confidence_modifier, where environmental modifiers come from xiphos.toml (think "internet-facing service" vs "internal batch job"). A rule writer cannot mark every finding "critical" out of an abundance of caution; the math will downgrade it.
- unsafe_code = "deny" applies workspace-wide with one fenced exception: the Wasmtime glue, which needs unsafe for component-model calls. That exception is annotated, scoped to a single module, and audited on every Wasmtime upgrade.
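The severity computation can be written out as a short sketch. It reads the operator in the formula as multiplication, uses the confidence value directly as the confidence modifier, and clamps the result to the 0 to 10 CVSS range; all three are assumptions, as are the type and field names.

```rust
/// What a scanner is allowed to emit: a base score and a confidence,
/// never a final severity. Field names are hypothetical.
pub struct ScannerSignal {
    pub base_cvss: f64,  // expected in 0.0..=10.0
    pub confidence: f64, // expected in 0.0..=1.0
}

/// Orchestrator-side computation:
/// final_cvss = base_cvss * product(environmental_modifiers) * confidence.
/// An empty modifier list multiplies by 1.0 and leaves the score unchanged.
pub fn final_cvss(signal: &ScannerSignal, environmental_modifiers: &[f64]) -> f64 {
    let env: f64 = environmental_modifiers.iter().product();
    let raw = signal.base_cvss * env * signal.confidence;
    // Keep the result inside the CVSS range even if modifiers exceed 1.0.
    raw.clamp(0.0, 10.0)
}
```

For example, a base score of 8.0 with one environmental modifier of 0.5 and a confidence of 0.75 comes out at 3.0. A rule that reports everything at the top of the scale with low confidence still lands low, because the confidence term scales the product.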
IV. Layering and crate map
The 12-crate workspace exists today as Cargo.toml declarations and three accepted RFCs — the architectural skeleton on top of which the implementation will be poured. Every crate has a written purpose statement and a layering position before any code lands, which is the only way to keep determinism and the layering rule honest once contributors arrive.
- xiphos-core: Finding, Severity, Location, the canonical-JSON encoder, the content-hash function. No I/O, no async, no other workspace deps.
- xiphos-scanner: tree walker, file classifier, Rayon scheduler, dedup FindingSet. The only crate that talks to the filesystem during a scan.
- xiphos-sast / -sca / -secrets / -iac / -heuristic: capability scanners, each blind to the others.
- xiphos-plugin: Wasmtime-hosted component runtime against xiphos:plugin@0.1.0, with per-call capability enforcement.
- xiphos-llm: optional decorator. Adds prose, may downgrade severity with justification, cannot emit new findings.
- xiphos-sarif: SARIF 2.1 emitter. Canonical-JSON, deterministic ordering, CWE / OWASP / CVSS embedded in properties.
- xiphos-rules: declarative YAML rule loader; validates against a JSON Schema before the scanner sees a rule.
- xiphos-cli: the only bin crate; argument parsing, config discovery, exit-code mapping.
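The xiphos-llm constraint (quieter is allowed, louder is not) lends itself to a type-level sketch. The Severity variants, the justification field, and the error names below are hypothetical; the real types in xiphos-core may differ.

```rust
/// Severity ordered from quietest to loudest. The derived Ord follows
/// declaration order, so Info < Low < Medium < High < Critical.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Info,
    Low,
    Medium,
    High,
    Critical,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Finding {
    pub rule_id: String,
    pub severity: Severity,
    pub justification: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DecorateError {
    /// The decorator tried to raise severity.
    WouldEscalate,
    /// A downgrade arrived without a written justification.
    MissingJustification,
}

/// Apply a severity proposed by the decorator. Raising is rejected,
/// lowering requires non-empty justification text, and proposing the
/// current severity is a no-op.
pub fn apply_llm_severity(
    finding: &mut Finding,
    proposed: Severity,
    justification: &str,
) -> Result<(), DecorateError> {
    if proposed > finding.severity {
        return Err(DecorateError::WouldEscalate);
    }
    if proposed < finding.severity {
        let text = justification.trim();
        if text.is_empty() {
            return Err(DecorateError::MissingJustification);
        }
        finding.severity = proposed;
        finding.justification = Some(text.to_string());
    }
    Ok(())
}
```

The function takes an existing finding by mutable reference and has no path that constructs a new one, which mirrors the "cannot emit new findings" half of the rule.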
V. Surface
Today the surface is the workspace and the architecture document — the crates declared in Cargo.toml are intent, not yet implementation. The eventual end-user binary is xiphos-cli, the only crate that will carry a bin target. xiphos scan, xiphos sbom, and xiphos plugin compose into one binary; capability scanners (sast, sca, secrets, iac, heuristic) live in their own crates and only see the work the orchestrator hands them. Plugins are WebAssembly components against the xiphos:plugin@0.1.0 WIT world; capabilities are default-deny and per-call enforced inside the host, not in the scanner that called the plugin — which is what makes a “the plugin asked for HTTP egress” finding traceable to a specific grant rather than buried in a transitive dependency.
VI. Numbers
The numbers above are honest about the stage. 12 crates declared, 0 implemented — the workspace is the architectural skeleton on top of which the code will be poured, and the layering rules and RFCs exist before the crates do, on purpose. MSRV 1.76 is firm; cargo features that landed in 1.77+ are not available, which keeps the toolchain story compatible with corporate Rust toolchains that lag a few releases. unsafe_code = "deny" applies workspace-wide with one fenced exception in the Wasmtime glue, where the component-model FFI requires it. Determinism is the load-bearing contract: same inputs → byte-identical canonical SARIF, regardless of thread count, OS, or wall-clock time. Finding.id is a content hash over (rule_id, location_fingerprint, message_template) only; time, env vars, machine identifiers, and absolute paths are excluded by schema, and tests/determinism.rs byte-diffs every scanner’s output across two runs to keep that contract honest.