№ 03 · product · TypeScript / Node 20 · 2025—

gapr,
SEO and GEO in one ledger.

A self-hostable, AGPL-licensed alternative to Ahrefs and SEMrush. First-class GEO --- Generative Engine Optimization --- scoring across the major LLMs.

seo geo agpl

gapr is built around the premise that ranking inside an LLM's generated answer matters as much as ranking on a Google SERP. Beyond traditional SEO (crawls, on-page audits, SERP tracking, backlinks, content briefs, keyword difficulty) it fans a tracked prompt out to ChatGPT, Claude, Perplexity, and Gemini, captures the answer + citations, and computes a transparent GEO Presence Score. Every score in the system exposes its factor breakdown via the Score.breakdown JSON column.

ITech scope

Three-process runtime, deployed independently. apps/api is a Fastify server with JWT auth and an auto-generated Swagger surface at /docs. apps/web is the Next.js 15 dashboard (App Router, React 19 RC, server components for the heavy data screens). apps/worker is the BullMQ consumer that owns the only network-heavy work in the system: Playwright sessions, LLM provider calls, and SerpAPI fetches.
The API never blocks on long work. HTTP routes enqueue jobs and return a job id; only apps/worker talks to the outside world. Concurrency is per-queue and tuned to the upstream rate limits — serp and rank at 8, crawl and audit at 2 (Playwright sessions are memory-heavy), and geo at 4 because the LLM providers’ per-account quotas saturate faster than CPU does.
Multi-tenant via Workspace → WorkspaceMember → Project. Every domain record (Keyword, Crawl, Audit, AiQuery, Backlink, ContentBrief, Report) hangs off a Project; row-level scoping lives in packages/db as a generic scoped(projectId) helper rather than as ad-hoc WHERE clauses, which keeps the boundary auditable in one place.
The generic Score model is the transparent-scoring ledger. Any new metric writes its factor breakdown into Score.breakdown as JSON, so a customer asking "why did my GEO score drop 12 points" gets a deterministic, structured answer rather than an opinion.
Raw HTML is content-addressed. Page.hash and SerpSnapshot.htmlHash are SHA-256 over the canonicalized response; the bytes themselves live in MinIO/S3, while Postgres only holds the references. A re-crawl that hits the same content costs nothing in object storage.
Postgres extensions are pinned and assumed: pgcrypto for hashes, pg_trgm for fuzzy keyword match, btree_gin for composite indexes on big tables, citext for case-insensitive domain comparisons. Migration tooling fails closed if an extension is missing, rather than silently fall back to a slower path.

IIGEO subsystem

Generative Engine Optimization is a first-class concern, not a bolt-on. The premise of gapr is that ranking inside an LLM-generated answer is now as load-bearing as ranking on a Google SERP; the schema reflects that:

AiQuery — a tracked prompt for a project. Carries the prompt text, the project context, and the cadence (one-shot, daily, weekly).
AiAnswer — one row per engine per fetch. The full raw response is retained (in object storage, referenced by hash) so a score can be re-derived months later if the rubric changes.
AiBrandMention — per-domain mention extracted from the answer. Each row carries kind = CITATION / PROSE / BOTH, a position rank, and a sentiment label, so a citation that praises and a citation that warns are not collapsed into the same number.

packages/geo and packages/llm route the prompt across OPENAI_CHATGPT, ANTHROPIC_CLAUDE, PERPLEXITY, and GOOGLE_GEMINI in parallel. The GEO Presence Score is citation share + prose share + position + engine coverage + sentiment; the weights are configurable per workspace, the inputs are surfaced verbatim in Score.breakdown, and the engine version ("claude-sonnet-4-6", etc.) is pinned per fetch so a score is reproducible against a specific model snapshot.

IIIWorkspace map

14 packages under apps/* and packages/*, pnpm 9 + Turborepo. The split is by responsibility, not by technology — everything that talks Postgres lives in packages/db, everything that talks an LLM provider lives in packages/llm, and the apps are thin compositions over those.

apps/web — Next.js 15 dashboard. Server components for the keyword and GEO grids, client components only for the live charts.
apps/api — Fastify HTTP API. Every write enqueues to BullMQ; reads hit Postgres + the object store directly.
apps/worker — BullMQ consumer of 8 queues. The only process that holds Playwright contexts and LLM provider clients.
packages/db — Prisma schema + generated client + the row-level scoped() helper.
packages/types — shared Zod schemas for HTTP payloads, BullMQ job payloads, and the central QUEUES registry. Adding a queue means editing one file, which is what stops the queue list from drifting.
packages/llm — provider router with circuit-breaker per provider; a 5xx storm at one upstream does not poison the others.
packages/crawler — Playwright crawler. Honors robots.txt, respects Crawl-Delay, exposes a polite-default per-host rate limit so an operator never accidentally hammers an upstream.
packages/serp — SERP fetcher with SerpAPI fallback, Google + Bing parsers, intent classifier (informational / navigational / commercial / transactional).
packages/geo — AI-engine fan-out, brand-mention detection, GEO Presence scoring math.
packages/analyzer + packages/entities + packages/scoring — on-page rules, entity extraction (people, products, organizations), and the writers that land scores in the Score ledger.

IVWhy AGPL self-host

The license choice is operational, not ideological. A self-hosted single-tenant index avoids the per-tenant boundary problems that hosted SaaS competitors keep running into — cross-tenant data leakage from a shared crawler, throttle-budget collisions between customers, and the audit-trail tax of storing every customer’s queries in one database. AGPL is the contract that keeps fork-and-host viable: an operator can run gapr on their own infra and modify it freely, but a hosted derivative must publish its modifications. That trade keeps the project oriented toward operators rather than toward a hosted offering it doesn’t plan to build.

Fig. I.

as of 2026-04-26

Fig. II.

monorepo: apps/ + packages/ · as of 2026-04-26

Fig. III.

apps/ + packages/ · as of 2026-04-26

VSurface

Self-hostable means an operator gets everything needed to bring a single-tenant index up in one repo: the Playwright crawler, the on-page extractor, the GEO scorer (which runs LLM-style queries across OPENAI_CHATGPT, ANTHROPIC_CLAUDE, PERPLEXITY, GOOGLE_GEMINI and scores presence in their generated answers), the index, the API, and the operator console. Single-tenant is a deliberate constraint, not a limitation. Per-tenant boundaries are a known source of cross-tenant data leakage and noisy-neighbor throttle collisions in hosted SEO/GEO platforms; the AGPL self-host alternative is to make the operator own their boundary, which is structurally simpler and auditably yours.

VIRoadmap

The shape ahead: more LLM targets pinned to specific snapshot dates (so a GEO score is reproducible against an exact model version, not a moving target like ‘the latest Claude’), expansion of the crawl schedulers to honor robots.txt + per-host Crawl-Delay caps without operator hand-holding, and a small import path for existing Ahrefs / SEMrush keyword exports so operators can backfill rather than re-crawl from scratch. None of this is shippable yet — the scoring rubric is the first thing that needs to be settled, because the rubric determines what every other piece in the pipeline is being asked to optimize for. Ship a wrong rubric and every score in the system carries the same wrong shape.