
Crush Prompt Execution Walkthrough


TL;DR: Follow a single prompt from keystroke to LLM response through Crush's entire pipeline: Bubble Tea event loop, coordinator queuing, Fantasy streaming, permission gates, loop detection, and automatic summarization. This 10-step trace reveals exactly where latency hides, how tools execute mid-stream, and why the TUI never blocks.

Crush: Complete Prompt Execution Walkthrough

Status: Complete

A detailed trace of what happens from the moment a user types a prompt in Crush to when the response is finalized and the system returns to idle.


Step 1: The TUI Captures Your Input

When you type a prompt into Crush, you're interacting with the Bubble Tea v2 TUI framework. The root component is the UI struct in internal/ui/model/ui.go, which follows the Elm architecture (Model-Update-View). Your keystrokes land in the textarea.Model component — a text input widget. The UI tracks a focus state (uiFocusState) that determines whether keystrokes go to the editor or elsewhere. When you're in the Chat state (uiChat) and the editor has focus (uiFocusEditor), your typing goes into that textarea. When you hit Enter (or the submit key), the UI packages your text — along with any attachments you've added — and sends it as a Bubble Tea command/message into the update loop, which routes it to the application layer.
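
The Elm cycle can be sketched in plain Go, with no Bubble Tea imports; keyMsg and submitMsg below are invented stand-ins for the framework's real message types, not its API:

```go
package main

import "fmt"

// A stdlib-only sketch of the Elm (Model-Update-View) cycle. keyMsg and
// submitMsg are invented stand-ins for Bubble Tea's real message types.

type msg interface{}

type keyMsg struct{ r rune }
type submitMsg struct{ text string }

type model struct {
	input     string // what the textarea-like editor holds
	submitted string // last submitted prompt
}

// update maps (model, message) to a new model, like Update() in Bubble Tea.
func update(m model, message msg) model {
	switch v := message.(type) {
	case keyMsg:
		m.input += string(v.r) // keystroke lands in the editor
	case submitMsg:
		m.submitted = v.text // would be dispatched to the application layer
		m.input = ""
	}
	return m
}

// view renders the whole screen as one string, like View().
func view(m model) string {
	return "> " + m.input
}

func main() {
	m := model{}
	for _, r := range "hello" {
		m = update(m, keyMsg{r})
	}
	fmt.Println(view(m)) // "> hello"
	m = update(m, submitMsg{m.input})
	fmt.Println(m.submitted) // "hello"
}
```

The important property is that update is a pure state transition: the same message stream always produces the same model, which is what makes the declarative render step below possible.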

TUI Rendering Model

Bubble Tea does not use ncurses or any curses-like library. There's no termcap/terminfo database lookup, no virtual screen buffer with cell-by-cell addressing. Instead, it's entirely ANSI escape sequence-based. Every time the UI needs to update, the View() method on the root model is called, and it returns a single string — the entire screen represented as styled text. Lipgloss v2 (charm.land/lipgloss/v2) handles styling by embedding ANSI escape codes directly into those strings (colors, bold, borders, padding — all inline). Bubble Tea then diffs the new string against the previous frame and writes only the changed lines to the terminal.

This is fundamentally different from curses, where you imperatively say "move cursor to row 5, col 10, write character 'X' with attribute BOLD." In Bubble Tea, you declaratively say "here's what the whole screen should look like now" and the framework figures out the minimal write.
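
The line-level diff can be illustrated with a stdlib-only sketch; Bubble Tea's actual renderer also deals with resizing, cursor movement, and scroll regions, all of which this ignores:

```go
package main

import (
	"fmt"
	"strings"
)

// A sketch of line-level frame diffing: compare the previous frame to the
// new one, line by line, and report which lines need rewriting.

// changedLines returns the indices of lines that differ between frames.
func changedLines(prev, next string) []int {
	p := strings.Split(prev, "\n")
	n := strings.Split(next, "\n")
	var changed []int
	for i := range n {
		if i >= len(p) || p[i] != n[i] {
			changed = append(changed, i)
		}
	}
	return changed
}

func main() {
	prev := "title\nspinner: |\nfooter"
	next := "title\nspinner: /\nfooter"
	// Only line 1 differs, so only that row would be rewritten (an ANSI
	// cursor-move to the row, then the new styled text).
	fmt.Println(changedLines(prev, next)) // [1]
}
```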

| Aspect | Curses (ncurses) | Bubble Tea |
|---|---|---|
| Paradigm | Imperative (move cursor, write char) | Declarative (return full screen string) |
| Screen buffer | Virtual 2D cell grid in memory | No grid — just strings with ANSI codes |
| Diffing | Character-level cell comparison | Line-level string diffing |
| Terminal compat | termcap/terminfo databases | Assumes modern ANSI support |
| Styling | Attribute flags per cell | Lipgloss inline ANSI in strings |
| Render trigger | Explicit refresh() calls | Automatic on model change via Msg |
| Framerate | Manual (you call refresh) | Event-driven (Msg → Update → View) |
| Layout | Manual row/col math | Lipgloss flexbox-like composition |

Framerate: Event-Driven, Not Steady

Bubble Tea does not maintain a steady framerate like a game loop. It's purely event-driven. A re-render only happens when a Msg flows through the Update() function and produces a changed model. If the user is idle and nothing is happening, zero renders occur. For things that need periodic updates — like spinner animations in internal/ui/anim/ or streaming LLM responses — Bubble Tea uses tick commands (tea.Tick) that schedule a Msg to arrive after a duration (e.g., every 100ms). Animations create their own pseudo-framerate by self-scheduling, but it's still message-driven under the hood.

"Modern ANSI Support"

Lipgloss and Bubble Tea assume the terminal supports:

  1. Basic ANSI (1978+, VT100/ECMA-48) — cursor movement, clear screen, 8/16 colors, bold/underline. Nearly universal.
  2. 256-color palette (xterm, ~2000s) — ESC[38;5;Nm for indexed colors. Widely supported for 20+ years.
  3. 24-bit True Color / RGB (~2012+) — ESC[38;2;R;G;Bm for exact RGB values. Supported by iTerm2, Kitty, Alacritty, WezTerm, Windows Terminal, GNOME Terminal. Not supported by raw Linux framebuffer consoles or some SSH relay situations.
  4. Unicode/UTF-8 — Box-drawing characters, wide characters, emoji. Lipgloss uses these heavily for borders and layout.
  5. Newer extensions (optional) — Hyperlinks, bracketed paste, synchronized output. Used when available.

Lipgloss does some detection (checking the COLORTERM env var) to degrade color output gracefully, but it doesn't attempt full terminfo-level capability detection.
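
A minimal sketch of that kind of degradation, keyed off the common COLORTERM/TERM conventions; the profile names and exact checks are illustrative, not Lipgloss's actual API:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Color fidelity tiers, from the feature list above.
type profile int

const (
	ansi16    profile = iota // basic ANSI colors (VT100-era)
	ansi256                  // indexed 256-color palette
	trueColor                // 24-bit RGB
)

// detectProfile picks the best tier the environment advertises.
func detectProfile() profile {
	ct := strings.ToLower(os.Getenv("COLORTERM"))
	switch {
	case ct == "truecolor" || ct == "24bit":
		return trueColor
	case strings.Contains(os.Getenv("TERM"), "256color"):
		return ansi256
	default:
		return ansi16
	}
}

// red returns an escape sequence for red at the best supported fidelity.
func red(p profile) string {
	switch p {
	case trueColor:
		return "\x1b[38;2;255;0;0m" // exact RGB
	case ansi256:
		return "\x1b[38;5;196m" // nearest indexed color
	default:
		return "\x1b[31m" // basic ANSI red
	}
}

func main() {
	fmt.Printf("%q\n", red(detectProfile()))
}
```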


Step 2: The Coordinator Receives Your Prompt

Once the TUI fires off your prompt, it arrives at the Coordinator — the top-level orchestrator defined in internal/agent/coordinator.go. The Coordinator's Run() method is the single entry point. But your prompt doesn't execute immediately. The Coordinator maintains a per-session FIFO queue, meaning each session can only have one active request at a time. If you fire off a prompt while a previous one is still streaming, yours gets queued and will execute automatically when the current one finishes. You can check how many are waiting via QueuedPrompts() and clear the queue with ClearQueue(). This queuing is what lets the UI stay responsive — the TUI doesn't block, it just drops the prompt into the queue and goes back to listening for input.
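
A rough sketch of per-session FIFO queuing; the method names mirror the ones mentioned above, but the implementation is invented for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// coordinator allows one active request per session; extra prompts queue.
type coordinator struct {
	mu     sync.Mutex
	busy   map[string]bool
	queued map[string][]string
}

func newCoordinator() *coordinator {
	return &coordinator{busy: map[string]bool{}, queued: map[string][]string{}}
}

// Run starts the prompt immediately if the session is idle; otherwise it
// queues the prompt and returns at once, so the caller (the TUI) never blocks.
func (c *coordinator) Run(session, prompt string) (started bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.busy[session] {
		c.queued[session] = append(c.queued[session], prompt)
		return false
	}
	c.busy[session] = true
	return true
}

func (c *coordinator) QueuedPrompts(session string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.queued[session])
}

// finish marks the current request done and pops the next prompt, if any.
func (c *coordinator) finish(session string) (next string, ok bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if q := c.queued[session]; len(q) > 0 {
		next, c.queued[session] = q[0], q[1:]
		return next, true
	}
	c.busy[session] = false
	return "", false
}

func main() {
	c := newCoordinator()
	c.Run("s1", "first")               // starts immediately
	c.Run("s1", "second")              // session busy: queued
	fmt.Println(c.QueuedPrompts("s1")) // 1
	next, _ := c.finish("s1")
	fmt.Println(next) // "second" drains next
}
```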


Step 3: Model and Provider Setup

Before your prompt touches an LLM, the Coordinator's Run() method does housekeeping. First, it calls UpdateModels() to refresh available model metadata from Catwalk (charm.land/catwalk), Charm's community-maintained model registry — a live catalog of what models exist, their context window sizes, pricing, and capabilities.

Then it assembles the provider configuration through a 3-layer merge:

  1. Catwalk defaults — baseline model metadata
  2. Provider-level config — your API keys, base URLs
  3. Model-level config — per-model overrides like temperature or max tokens
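
The precedence rule reduces to a key-by-key merge where later layers win; the keys shown are illustrative, not Crush's real config schema:

```go
package main

import "fmt"

// merge overlays config layers in order: the last layer to set a key wins.
func merge(layers ...map[string]string) map[string]string {
	out := map[string]string{}
	for _, layer := range layers {
		for k, v := range layer {
			out[k] = v
		}
	}
	return out
}

func main() {
	catwalk := map[string]string{"context_window": "200000", "temperature": "1.0"}
	provider := map[string]string{"base_url": "https://api.example.com"}
	model := map[string]string{"temperature": "0.2"} // per-model override

	cfg := merge(catwalk, provider, model)
	fmt.Println(cfg["temperature"])    // "0.2": the model layer wins
	fmt.Println(cfg["context_window"]) // "200000": inherited from Catwalk
}
```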

The Coordinator then calls buildProvider(), which is a factory dispatching to one of ~10 provider-specific constructors:

buildOpenaiProvider        buildAzureProvider
buildAnthropicProvider     buildBedrockProvider
buildOpenrouterProvider    buildGoogleProvider
buildVercelProvider        buildGoogleVertexProvider
buildOpenaiCompatProvider  buildHyperProvider

This returns a fantasy.Provider — the abstraction from Charm's Fantasy library (charm.land/fantasy) that provides a unified interface regardless of backend. If your provider uses OAuth (GitHub Copilot, Hyper) and its token has expired, the Coordinator handles a 401 refresh cycle here before proceeding.


Step 4: Building the Agent and Tools

With a provider ready, the Coordinator calls buildAgent() to construct a SessionAgent. This agent gets two models:

  • Large model: Primary generation and tool use
  • Small model: Summarization, title generation, lightweight tasks

Both are resolved through Fantasy's provider.LanguageModel(ctx, modelID) call, returning a LanguageModel interface that can Generate(), Stream(), GenerateObject(), or StreamObject().

Next, buildTools() assembles the tool set. Every tool is constructed with its dependencies injected — the permission service, the LSP manager, the history service, the file tracker, etc. The full list gets filtered in two passes:

  1. AllowedTools slice — controls which built-in tools this agent type can use
  2. AllowedMCP map — maps MCP server names to their permitted tool names

The filtered tools are sorted alphabetically for deterministic ordering. Coder agents get full tool access; Task agents get only read-only tools.
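
The allow-list filter plus deterministic sort might look roughly like this; the tool names and agent split are illustrative:

```go
package main

import (
	"fmt"
	"sort"
)

// filterTools keeps only the allowed tools, then sorts for determinism.
func filterTools(all, allowed []string) []string {
	set := map[string]bool{}
	for _, t := range allowed {
		set[t] = true
	}
	var out []string
	for _, t := range all {
		if set[t] {
			out = append(out, t)
		}
	}
	sort.Strings(out) // alphabetical: same tool order on every run
	return out
}

func main() {
	all := []string{"write", "view", "bash", "grep"}
	taskAgent := filterTools(all, []string{"view", "grep"}) // read-only subset
	fmt.Println(taskAgent)                                  // [grep view]
}
```

The deterministic ordering matters because tool lists are serialized into the request to the LLM; a stable order keeps prompts reproducible (and cache-friendly) across runs.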

Finally, the agent gets its system prompt — loaded from coder.md.tpl in internal/agent/templates/ and rendered as a Go template with dynamic information about the current session, project, and available tools. All of this is bundled into a fantasy.Agent — the agentic loop runner from Fantasy — configured with the system prompt, tools, and model.


Step 5: The Agent Loop Executes

The SessionAgent first persists your prompt as a user message in the SQLite database (internal/db/), then broadcasts a pubsub.Event[message.Message] so the TUI knows to display it in the chat.

Then it calls fantasy.Agent.Stream() — this is where your prompt actually hits the LLM. Fantasy opens a streaming connection to the provider and begins receiving deltas — chunks of the response as they're generated.

The SessionAgent registers callbacks on this stream, and as deltas arrive, they're handled in real-time:

  • Text deltas get appended to the assistant message in the database and broadcast via pub/sub so the TUI can render them incrementally (this is why you see text appearing word-by-word)
  • Tool call deltas are structured JSON fragments indicating the model wants to invoke a tool

When a complete tool call is received, the stream pauses: the tool runs synchronously within the streaming loop, its result is persisted to the database and then fed back into the stream, so the LLM can see the result and decide what to do next.

This cycle — stream text → encounter tool call → execute tool → feed result back — repeats until the model emits a final response with no more tool calls, at which point the stream completes.
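
The shape of that cycle can be sketched with the LLM and tool faked as plain functions; the real loop consumes streamed deltas through Fantasy, which this deliberately glosses over:

```go
package main

import "fmt"

// reply is a fake model turn: some text plus an optional tool request.
type reply struct {
	text     string
	toolCall string // empty when the model is done
}

// agentLoop runs the stream → tool → feed-back cycle until the model
// produces a turn with no tool call, then returns the full history.
func agentLoop(llm func(history []string) reply, runTool func(call string) string) []string {
	var history []string
	for {
		r := llm(history)
		history = append(history, r.text)
		if r.toolCall == "" {
			return history // final response: the stream completes
		}
		result := runTool(r.toolCall)     // execute the tool mid-stream
		history = append(history, result) // feed the result back to the model
	}
}

func main() {
	step := 0
	llm := func(history []string) reply {
		step++
		if step == 1 {
			return reply{text: "let me look", toolCall: "view main.go"}
		}
		return reply{text: "done"} // no tool call: loop ends
	}
	out := agentLoop(llm, func(call string) string { return "file contents" })
	fmt.Println(len(out)) // 3: text, tool result, final text
}
```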


Step 6: Tool Execution and Permissions

When the model emits a tool call mid-stream (e.g., a view tool call with {"file_path": "/some/file.go"}), the Fantasy framework deserializes the JSON parameters into the tool's typed params struct using the JSON schema auto-generated from struct tags at registration time.

The tool's Run() function fires, but before doing any actual work, it calls permissions.Request() on the permission service. This is a blocking call. The permission service checks a fast path:

  1. Is --yolo flag set? (auto-approve everything)
  2. Is the session pre-approved via AutoApproveSession()?
  3. Does the tool/action match a persistent grant already cached from earlier in this session?

If none hit, it publishes a pubsub.Event[permission.PermissionRequest] — which the TUI subscribes to. The TUI renders a dialog overlay (via internal/ui/dialog/) asking you to approve or deny the action.

Your response flows back through a channel. The tool's Run() function has been blocked this entire time, waiting on that channel. Once you approve, execution resumes, the tool does its work, and returns a fantasy.ToolResponse back into the stream.

If you chose "Grant Persistently," the permission service caches that approval so identical future requests in this session skip the dialog entirely.

Permission Flow Summary

Tool.Run() → permissions.Request()
  → Fast path check (yolo / auto-approve / cache)
  → Miss → pubsub.Publish(PermissionRequest)
  → TUI renders dialog overlay
  → User approves/denies
  → Channel delivers response
  → Tool.Run() resumes or aborts
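
The blocking call can be sketched with a reply channel carried inside each request; this is a stdlib-only illustration of the pattern, not Crush's actual permission service:

```go
package main

import "fmt"

// permissionService reduces the fast path (yolo / auto-approve / cache)
// to a single grant map; each request carries its own reply channel.
type permissionService struct {
	granted  map[string]bool // persistent grants, keyed by tool+action
	requests chan chan bool  // "published" requests the TUI side receives
}

// Request returns immediately on a cached grant; otherwise it publishes
// the request and blocks until the user's decision arrives on the channel.
func (p *permissionService) Request(key string) bool {
	if p.granted[key] { // fast path: cached grant skips the dialog
		return true
	}
	reply := make(chan bool)
	p.requests <- reply // the TUI renders a dialog for this request
	return <-reply      // Tool.Run() is blocked right here until a decision
}

func main() {
	p := &permissionService{
		granted:  map[string]bool{"view:/etc/hosts": true},
		requests: make(chan chan bool),
	}

	// Stand-in for the TUI goroutine: approve whatever dialog appears.
	go func() {
		for reply := range p.requests {
			reply <- true // user clicked "approve"
		}
	}()

	fmt.Println(p.Request("view:/etc/hosts")) // true via the fast path
	fmt.Println(p.Request("bash:make test"))  // true after the "dialog"
}
```

Carrying the reply channel inside the request is what makes the round trip safe for concurrent requests: each pending dialog has its own channel, so answers can never be delivered to the wrong caller.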

Step 7: Context Window Management and Summarization

While the agent loop cycles through generate-tool-generate, the SessionAgent tracks token usage. Every response from the LLM includes token counts, which are accumulated and compared against the model's known context window size (from Catwalk metadata).

When usage approaches the limit, automatic summarization kicks in:

  • Threshold: Reserve 20,000 tokens (for windows >200k) or 20% (for smaller windows)
  • Process: Halt generation → switch to small model → produce compressed summary of entire conversation → persist summary as special message → reset token counters → resume generation with summary as new history
  • Tradeoff: Pre-summary messages remain in SQLite for the record, but are discarded from active context on session reload

This is how Crush handles arbitrarily long sessions without hitting context limits — it trades perfect recall for continuity.
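
The reserve rule above reduces to a small function; this is a sketch of the thresholds as described, not the actual implementation:

```go
package main

import "fmt"

// reserveTokens returns how many tokens to keep free before summarizing:
// a flat 20,000 for windows over 200k, otherwise 20% of the window.
func reserveTokens(contextWindow int) int {
	if contextWindow > 200_000 {
		return 20_000
	}
	return contextWindow / 5 // 20%
}

// shouldSummarize reports whether usage has crossed into the reserve.
func shouldSummarize(used, contextWindow int) bool {
	return used >= contextWindow-reserveTokens(contextWindow)
}

func main() {
	fmt.Println(reserveTokens(1_000_000))          // 20000 (flat reserve)
	fmt.Println(reserveTokens(128_000))            // 25600 (20% of 128k)
	fmt.Println(shouldSummarize(110_000, 128_000)) // true: past 102,400
}
```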


Step 8: Loop Detection

A safety mechanism runs alongside the agent loop to prevent repetitive cycles (internal/agent/loop_detection.go).

As each tool call completes, the loop detector examines the last 10 tool call steps. For each step, it creates a SHA-256 fingerprint by hashing the combination of:

  • Tool name
  • Input parameters
  • Output result

If any single fingerprint appears more than 5 times within that 10-step window, the detector triggers. It doesn't kill the agent — instead, it injects a signal into the conversation context telling the model it's repeating itself and should try a different approach.

This prevents LLMs from burning tokens and context window on identical failed retries.
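
The fingerprint-and-window mechanic can be sketched as follows; the struct and field names are invented, but the hash inputs and the 5-of-10 threshold follow the description above:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// fingerprint hashes tool name + input + output into one identity for
// the step. NUL separators prevent ambiguous concatenations.
func fingerprint(tool, input, output string) [32]byte {
	return sha256.Sum256([]byte(tool + "\x00" + input + "\x00" + output))
}

// loopDetector keeps a sliding window of the last 10 step fingerprints.
type loopDetector struct {
	window [][32]byte
}

// observe records a completed tool call and reports whether any single
// fingerprint now appears more than 5 times within the 10-step window.
func (d *loopDetector) observe(fp [32]byte) bool {
	d.window = append(d.window, fp)
	if len(d.window) > 10 {
		d.window = d.window[1:] // drop the oldest step
	}
	counts := map[[32]byte]int{}
	for _, f := range d.window {
		counts[f]++
	}
	for _, n := range counts {
		if n > 5 {
			return true // signal the model to change approach
		}
	}
	return false
}

func main() {
	d := &loopDetector{}
	fp := fingerprint("view", `{"file_path":"/x.go"}`, "no such file")
	triggered := false
	for i := 0; i < 6; i++ { // the 6th identical call exceeds the threshold
		triggered = d.observe(fp)
	}
	fmt.Println(triggered) // true
}
```

Hashing the output as well as the input is what distinguishes a genuine loop from legitimate repetition: re-reading a file that changed between calls produces a different fingerprint and never trips the detector.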


Step 9: Stream Completion and Message Finalization

When the model emits a response with no further tool calls, the stream completes. The SessionAgent finalizes the assistant message in SQLite:

  • Full text content
  • All tool calls and their results
  • Token usage statistics
  • Model that generated it

A final pubsub.Event[message.Message] is broadcast, which the TUI picks up to render the completed message (replacing the streaming partial with the final version).

Additional bookkeeping:

  • File tracker (internal/filetracker/) records which files were touched during this turn (reads, edits, writes), associated with the session
  • History service (internal/history/) snapshots file edits so they can be undone
  • Usage statistics (tokens in, tokens out, cost) are aggregated and written to the stats tables in SQLite, viewable via crush stats

Step 10: Queue Drain and Return to Idle

With the current request resolved, the SessionAgent checks the per-session FIFO queue for any prompts that stacked up while it was busy. If there's a queued prompt waiting, the entire cycle (Steps 5–9) kicks off again immediately and automatically.

This continues until the queue is empty. Once drained:

  1. IsSessionBusy() flips to false
  2. Coordinator signals the TUI that the session is idle
  3. TUI shifts focus back to the textarea input
  4. Submit action re-enabled on the editor
  5. Status bar updates to reflect idle state
  6. Pub/sub goes quiet — no events, no renders, zero CPU on rendering

The application sits waiting for the next keystroke to generate a KeyMsg and start the whole cycle over again.


Complete Lifecycle Summary

Keystroke → TUI textarea
  → Bubble Tea Msg → Update → View
  → Prompt dispatched to Coordinator.Run()
    → FIFO queue (one active per session)
    → UpdateModels() from Catwalk
    → 3-layer config merge
    → buildProvider() → fantasy.Provider
    → buildAgent() → SessionAgent (large + small models)
    → buildTools() → filtered, sorted tool set
    → System prompt from coder.md.tpl
    → Persist user message to SQLite
    → Broadcast pubsub event → TUI renders user message
    → fantasy.Agent.Stream()
      → LLM streaming connection
      → Text deltas → persist + broadcast → TUI renders incrementally
      → Tool call deltas → deserialize → permission check → execute → feed result back
      → Loop detection (SHA-256 fingerprints, 5-of-10 threshold)
      → Context management (summarize if approaching window limit)
      → Repeat until no more tool calls
    → Finalize message in SQLite
    → File tracker + history snapshots
    → Usage stats persisted
    → Broadcast completion event → TUI renders final message
  → Check queue → drain remaining prompts
  → IsSessionBusy() = false
  → TUI returns to idle