SWAZEENET
VOL. I · NO. 01 · EST. MCMLXXXVII
BROADSIDE 12 active · 2026·05·14
№ 08 · tool · Python 3.12 · 2025—

flux,
one GPU, one queue, one frame at a time.

A local FLUX.1 image-generation FastAPI worker on 127.0.0.1:8421. Single-GPU job queue, SQLite gallery, and a small surface that MasterAgent proxies through.

image-gen fastapi

flux owns the GPU. Generation jobs are serialized through one worker thread because 16 GB of VRAM has headroom for one FLUX pass at a time, and parallelism here would be a regression. The MCP surface that other projects on this machine call (flux_generate, flux_edit, flux_fill, flux_variation, flux_structural, flux_search_gallery) routes through MasterAgent on :8420 and into this loopback service. Every result lands in a SQLite gallery with prompt, model, params, and thumbnail.

I · Tech scope

II · Layout

worker/
├── main.py         # FastAPI entry, lifespan, /healthz, route registration
├── jobs.py         # single-thread job queue + cancellation
├── pipeline.py     # diffusers pipeline loaders, keyed by MODEL_REGISTRY[kind]
├── compose.py      # prompt assembly + audit (used by /compose/* routes)
├── gallery.py      # SQLite persistence
├── progress.py     # per-step progress callbacks
├── schemas.py      # Pydantic request models for every route
├── config.py       # paths, ports, MODEL_REGISTRY
└── deploy/         # register-service.ps1
prompts/<pack>/     # composer packs
data/               # runtime: gallery.sqlite, images/, uploads/, hf-cache/

The split is functional, not nominal. main.py is the FastAPI surface and the lifespan owner; jobs.py is the only file that holds queue state; pipeline.py is the only file that imports diffusers loaders. That layering is what lets the worker swap a model without touching the queue or the API surface, and it is what lets the test harness import compose.py without dragging the whole CUDA stack into a unit test.
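The registry-keyed layering can be sketched as follows. This is a hypothetical shape for config.py's MODEL_REGISTRY: the kind names echo Fig. III, but the repo ids, class names, and lookup helper are illustrative assumptions, not the worker's actual entries.

```python
# Hypothetical sketch of MODEL_REGISTRY: one entry per pipeline kind, so
# pipeline.py can resolve a loader without the queue or API knowing models.
MODEL_REGISTRY: dict[str, dict[str, str]] = {
    "text2img": {"repo": "black-forest-labs/FLUX.1-dev",       "pipeline": "FluxPipeline"},
    "fill":     {"repo": "black-forest-labs/FLUX.1-Fill-dev",  "pipeline": "FluxFillPipeline"},
    "control":  {"repo": "black-forest-labs/FLUX.1-Canny-dev", "pipeline": "FluxControlPipeline"},
}

def pipeline_class(kind: str) -> str:
    """Resolve the diffusers pipeline class name for a registered kind."""
    try:
        return MODEL_REGISTRY[kind]["pipeline"]
    except KeyError:
        raise ValueError(f"unknown pipeline kind: {kind!r}") from None
```

Because only pipeline.py consumes this table, adding or swapping a model is a one-file registry edit plus a schema, with the queue untouched.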

III · Job lifecycle

From the caller’s side, a generation is a request and a job id. From the worker’s side, the lifecycle is fixed: compose assembles and audits the prompt; schemas validates the request payload through Pydantic; queue appends the job onto the queue consumed by the single worker thread (queue depth is observable, but the consumer is one); pipeline loads or reuses the diffusers pipeline keyed by MODEL_REGISTRY[kind]; callback runs the per-step progress and cancellation hook; gallery writes the result row to gallery.sqlite with prompt, model, seed, params, and a thumbnail; MCP return hands the job id and result path back to the caller.
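The submit-and-serialize half of that lifecycle can be sketched in a few lines. This is a minimal sketch in the spirit of jobs.py, not its actual code; the class and field names are illustrative.

```python
# Minimal single-consumer job queue: callers get a job id immediately,
# but exactly one worker thread drains the queue, so GPU work is serialized.
import queue
import threading
import uuid

class JobQueue:
    def __init__(self) -> None:
        self._q: queue.Queue = queue.Queue()
        self.results: dict[str, object] = {}
        # One consumer thread, by design: the VRAM budget fits one FLUX pass.
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, fn, *args) -> str:
        """Enqueue a job and return its id without waiting for the result."""
        job_id = uuid.uuid4().hex
        self._q.put((job_id, fn, args))
        return job_id

    def _worker(self) -> None:
        while True:
            job_id, fn, args = self._q.get()
            try:
                self.results[job_id] = fn(*args)
            except Exception as exc:        # a failed job records its error
                self.results[job_id] = exc
            finally:
                self._q.task_done()
```

Queue depth stays observable through `self._q.qsize()`, which matches the worker's stance: visibility into backlog, never parallelism.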

Sixteen GB of VRAM has headroom for one FLUX pass at a time. A second concurrent generation is strictly worse: VRAM thrash, lower throughput, and a higher chance of an OOM mid-step. One worker thread is the consequence of the hardware budget, not a workaround for a concurrency bug, and it is also why a new pipeline kind has to wire the cancellation callback — that callback is the only thing protecting an operator from a five-minute uncancellable generation when they meant to abort.
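The cancellation wiring looks roughly like this. The sketch assumes a diffusers-style per-step callback; the `Cancelled` exception, the event plumbing, and the function names are illustrative assumptions, not the worker's API.

```python
# Per-step hook: report progress, then bail out if the job was cancelled.
# Without this wiring, a "cancelled" job silently runs every step to the end.
import threading

class Cancelled(RuntimeError):
    pass

def make_step_callback(cancel_event: threading.Event, report):
    """Build a per-step callback that reports progress and aborts on cancel."""
    def on_step_end(pipe, step: int, timestep, kwargs: dict) -> dict:
        report(step)                      # e.g. persist progress for the job row
        if cancel_event.is_set():
            raise Cancelled(f"aborted at step {step}")
        return kwargs                     # callback contract: hand kwargs back
    return on_step_end
```

The worker catches `Cancelled`, marks the job, and moves to the next queue entry; a pipeline kind that skips this wiring still generates, which is exactly why the breakage is silent.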

IV · Why local

The image work in every other project on this site — including meshgen’s featured images and Pinterest pins — runs through this worker. The headline difference is zero API tokens for the pixel data: at the volumes meshgen needs, even a small per-image fee compounds. But the cost story is only part of it. The structural reason is that the prompt pipeline, the seeds, the audit rules, and the gallery all stay on the same disk as the projects that consume them — so a regenerated image six months from now lands the same composition because the prompt + seed + model + LoRA tuple is committed in the consumer’s repo, not in a hosted vendor’s database.

The HuggingFace cache lives at $DATA_ROOT/hf-cache and is treated as the only durable model store. Pinning HF_HOME there means a model upgrade is a deliberate huggingface-cli download followed by a registry edit, not an opaque background refresh; reproducibility starts at the model snapshot.
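The pin itself is a few lines at process start. A sketch, assuming a `DATA_ROOT` environment variable; the worker's actual config names may differ.

```python
# Pin the HuggingFace cache under the data root. This must run before any
# `import diffusers` / `import huggingface_hub`, or the default
# ~/.cache/huggingface location wins for the process.
import os
from pathlib import Path

DATA_ROOT = Path(os.environ.get("DATA_ROOT", "data"))
os.environ.setdefault("HF_HOME", str(DATA_ROOT / "hf-cache"))
```

With `HF_HOME` pinned, nothing downloads implicitly at request time; an upgrade is an explicit `huggingface-cli download` plus a registry edit.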

Fig. I.
01 compose → 02 schemas → 03 queue → 04 pipeline → 05 callback → 06 gallery → 07 MCP return
as of 2026-04-26
Fig. II.
main.py 350 (23%) · compose.py 283 (18%) · pipeline.py 254 (16%) · gallery.py 238 · jobs.py 213 · progress.py 78 · schemas.py 76 · config.py
real LOC across 8 modules · as of 2026-04-26
Fig. III.
text2img 100 (48%) · kontext 45 (21%) · fill 30 (14%) · control 20 (10%) · redux 15 (7%)
as of 2026-04-26

V · Surface

The surface presented to other projects on this machine is the MCP layer: flux_generate, flux_edit, flux_fill, flux_variation, flux_structural, plus flux_search_gallery for browsing prior runs and the job-lifecycle pair flux_get_job / flux_cancel_job. Calls route through MasterAgent on :8420 into the loopback FastAPI on :8421; the FastAPI bind is 127.0.0.1, never an external interface, so the GPU is unreachable from the network even by accident. From the caller’s side it is a request and a job id; from the worker’s side it is a serialized queue of one.

VI · Numbers

Five pipeline kinds is a hard count, not a snapshot. Adding a sixth means a new MODEL_REGISTRY entry, a new Pydantic schema, and wiring the per-step cancellation callback — in that order, because skipping the callback wiring leaves cancellation silently broken (a cancelled job runs to completion and bills the GPU for it). Sixteen GB of VRAM has headroom for one FLUX pass at a time; a second concurrent generation regresses both throughput and VRAM stability and risks an OOM mid-step. One worker thread is the consequence of the hardware budget, not a concurrency workaround. The prompt composer caps at 2 000 chars after assembly and runs a negation-aware forbidden-pattern audit before queuing — “no people” in the variable block actually has to be respected, not silently overridden by a “people” token in the fixed block.
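The negation-aware half of that audit can be sketched simply. This is an illustrative heuristic, assuming a fixed negation word list and a short look-back window; the worker's actual rules and the composer pack format are not shown here.

```python
# Negation-aware forbidden-pattern audit: "no people" passes, bare "people"
# fails. The 2000-char cap matches the text; the 12-char window is illustrative.
import re

MAX_PROMPT_CHARS = 2000
NEGATIONS = ("no ", "without ", "never ")

def audit_prompt(prompt: str, forbidden: list[str]) -> list[str]:
    """Return the forbidden terms that appear un-negated in the prompt."""
    if len(prompt) > MAX_PROMPT_CHARS:
        return ["prompt exceeds 2000 chars"]
    violations = []
    low = prompt.lower()
    for term in forbidden:
        for m in re.finditer(re.escape(term.lower()), low):
            # Look a few characters back: a preceding negation is not a hit.
            window = low[max(0, m.start() - 12):m.start()]
            if not any(neg in window for neg in NEGATIONS):
                violations.append(term)
                break
    return violations
```

A real audit would also have to respect the fixed-block versus variable-block split the text describes, so a negation in the variable block wins over a bare token in the fixed block.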
