№ 09 · tool · Python 3.12 · 2025—

tokeney,
a tab-corner readout for an interactive bill.

A floating PySide6 HUD that reads Claude Code transcripts and session files off disk and reports tokens, cost, and context-window usage in real time. No network. No Anthropic API calls.

claude-code hud

tokeney is a Qt HUD that lives in the corner of the screen and answers, at a glance, "what does this conversation actually cost?" Every number it shows comes from JSONL transcripts and session JSON in ~/.claude/ --- nothing leaves the machine. The pricing table is a versioned JSON resource, refreshed by hand after Anthropic price changes; a weekly subagent watches Claude Code's on-disk schemas for drift and reports before tokeney undercounts.

ITech scope

Configuration and state live in %APPDATA%\tokenbar\ (Windows) or the platform-equivalent under QStandardPaths. Tests monkeypatch APPDATA to a tmp_path so the suite never touches real user state — including the tests that exercise the "first run" branch, which would otherwise create a stray config file in the developer’s home directory.
Hotkeys are Ctrl+Alt+G (toggle visibility), Ctrl+Alt+H (cycle metric scope: session / day / 7d / 30d / alltime), and Ctrl+Alt+D (toggle always-on-top). They are bound through the keyboard library, which runs on its own OS thread, with a Qt-signal bridge in HotkeyManager for thread safety — the keyboard thread never calls a Qt API directly, because Qt requires GUI calls to be on the main thread and a cross-thread call from a hotkey would crash the process intermittently.
File watches use Qt’s QFileSystemWatcher backed by ReadDirectoryChangesW on Windows, with a 10-second sweep timer as belt-and-suspenders. Windows occasionally drops change events under load (network drives, antivirus scans, atomic-rename publishes), and a slow polling timer turns "missed event → HUD freezes for an hour" into "missed event → HUD reconciles in ten seconds."
Parsers tolerate missing optional fields and convert any exception into log.debug — never an uncaught raise. One malformed transcript line should never blank the HUD; if Anthropic adds a new type value, tokeney’s ignore-unknown-types path keeps the HUD live while the operator updates.
Packaging is PyInstaller via tokenbar.spec, producing an unsigned dist\tokenbar.exe. The unsigned status is deliberate — this is a personal HUD, not a redistribution target, and code-signing certificates introduce a lifecycle the project does not need.

IIArchitecture

__main__.py  →  app.py::run()
                 │―― TranscriptIndex     # file watchers, tail-reads JSONL
                 │―― UsageAggregator     # in-memory rollup (session/day/7d/30d/alltime)
                 │―― WidgetWindow        # the floating HUD
                 │―― Tray                # system tray icon + menu
                 └― HotkeyManager       # global hotkeys

TranscriptIndex is the part that has to be right; the HUD is mostly composition over its output. The split is enforced by the test ratio — 442 LOC of tests against 1 443 LOC of source (~23%), almost all of it concentrated on the parser, which is exactly where a silent miscount is hardest to notice. The HUD itself is mostly visible behavior; if the layout is wrong, an operator notices in seconds.

IIIInput contract — brittle on purpose

tokeney parses two on-disk formats that Anthropic has not committed to keeping stable. The brittleness is acknowledged and managed, not avoided:

Transcript JSONL — ~/.claude/projects/<slug>/*.jsonl, one JSON object per line, append-only within a session. Only type == "assistant" rows are parsed for usage. Required fields: timestamp (ISO 8601 in UTC), message.model (the model id string), and the nested message.usage.* token counters — input_tokens, output_tokens, cache_read_input_tokens, and the cache-creation breakdown cache_creation.ephemeral_5m_input_tokens + ephemeral_1h_input_tokens. The 5m / 1h split matters because the two cache tiers price differently.
Session JSON — ~/.claude/sessions/*.json, one file per live session. Required fields: pid and sessionId. Optional but consumed: cwd (drives the project label in the tray) and startedAt (used to scope "session" totals correctly when a session is rejoined).

Neither format carries a schema version. If Anthropic renames or restructures a field, tokeney silently drops the record and the HUD undercounts — the failure is conservative (you see less) rather than alarming (the HUD blanks), but it is still a failure. A weekly drift report is run by the tokeney-schema-check subagent against the Claude Code changelog and the live JSONL on this machine; the report flags new fields, renamed fields, and removed fields, and its output is checked before tagging a release.

IVPricing table

src/tokenbar/resources/default_prices.json is hardcoded and not auto-refreshed. The file holds, per model, the dollar-per-million-tokens rate for input, output, cache_creation_5m, cache_creation_1h, and cache_read, plus the context_window in tokens. Stale pricing means a wrong displayed cost — not a wrong token count, which is computed independently — and the same drift subagent checks for Anthropic pricing changes alongside schema drift, so a price update lands as a single PR with a clear diff.

Fig. I.

no network calls; mtime polling for cross-platform parity · as of 2026-04-26

Fig. II.

as of 2026-04-26

VSurface

tokeney is a floating PySide6 window that reads transcript JSONL files out of ~/.claude/projects/<slug>/*.jsonl and reports four numbers in real time: tokens used in the live turn, accumulated cost, cache hit ratio, and remaining context budget. There is no API call — everything is derived from on-disk state. The read path is a tail-once-then-watch loop using QFileSystemWatcher backed by ReadDirectoryChangesW on Windows, plus a 10-second sweep timer as belt-and-suspenders against the dropped-event scenarios that show up under antivirus scans, network drives, and atomic-rename publishes. fsnotify-style coalesced notifications were rejected for cross-platform parity reasons.

VINumbers

Zero API calls is the load-bearing number on this page. The HUD has to keep up with a Claude Code session in real time without making network calls itself, both for the obvious cost reason and so it works against any project regardless of whether the operator has API connectivity at the moment. Four metrics is the right scope for a HUD; a fifth would lose the at-a-glance read and turn the widget into a dashboard. The test ratio (442 / 1 885 ≈ 23%) sits where it does because the parser is the part that has to be right — a silent miscount on cache-creation tokens would compound across a full work day before anyone noticed — while the HUD itself is mostly composition over the parser’s output and gets caught by visual inspection in seconds when wrong.