Crush SQLite Schema Reference
TL;DR: Crush stores everything in SQLite with WAL mode for concurrent reads during streaming LLM responses. The schema is elegantly simple: sessions (tokens, cost, todos), messages (JSON parts array for text + tool calls), files (full snapshots per version for trivial undo), and read_files (dedup tracking). Every table, trigger, query, and design decision mapped to the prompt execution critical path.
Crush SQLite Schema Reference
Status: Complete
A complete reference of Crush's SQLite database schema, including every table, column, index, trigger, and query — mapped to the prompt execution critical path described in CRUSH_PROMPT_WALKTHROUGH.md.
Database Configuration
Location: crush.db in the configured data directory
Driver: Two pure-Go SQLite drivers available — ncruces/go-sqlite3 (primary) and modernc.org/sqlite (alternative). No CGO required.
Code Generation: sqlc generates type-safe Go from SQL queries. Config in sqlc.yaml:
version: "2"
sql:
- engine: "sqlite"
schema: "internal/db/migrations"
queries: "internal/db/sql"
gen:
go:
package: "db"
out: "internal/db"
emit_json_tags: true
emit_prepared_queries: true
emit_interface: true
emit_exact_table_names: false
emit_empty_slices: true
Connection Pragmas:
PRAGMA foreign_keys = ON; -- Enforce foreign key constraints
PRAGMA journal_mode = WAL; -- Write-Ahead Logging for concurrent reads during writes
PRAGMA page_size = 4096; -- 4KB pages
PRAGMA cache_size = -8000; -- 8MB page cache
PRAGMA synchronous = NORMAL; -- Balanced durability/performance (WAL makes this safe)
PRAGMA secure_delete = ON; -- Overwrite deleted data (security)
PRAGMA busy_timeout = 30000; -- 30-second lock wait before SQLITE_BUSY
Migration Tool: Goose (embedded migrations via internal/db/embed.go). Migrations run automatically on startup.
Entity Relationship Diagram
sessions (root entity)
│
├── 1:N ──→ messages
│ Stores all user prompts, assistant responses, tool calls, tool results.
│ Cascading delete.
│
├── 1:N ──→ files
│ Version-controlled snapshots of every file edit.
│ Cascading delete.
│
├── 1:N ──→ read_files
│ Tracks which files the agent has read (dedup optimization).
│ Cascading delete.
│
└── 1:0..1 ──→ sessions (self-reference via parent_session_id)
Session branching/forking.
Table 1: sessions
Role in Critical Path: Created in Step 2 (Coordinator receives prompt). Token counts and cost updated throughout Steps 5-9 (agent loop, tool execution, finalization). Summary message ID set during Step 7 (context window summarization). Todos updated by the todos tool during Step 6 (tool execution).
Purpose: Root entity for all conversation data. Tracks session metadata, cumulative token usage, cost, and optional todo lists.
CREATE Statement
CREATE TABLE IF NOT EXISTS sessions (
id TEXT PRIMARY KEY,
parent_session_id TEXT,
title TEXT NOT NULL,
message_count INTEGER NOT NULL DEFAULT 0 CHECK (message_count >= 0),
prompt_tokens INTEGER NOT NULL DEFAULT 0 CHECK (prompt_tokens >= 0),
completion_tokens INTEGER NOT NULL DEFAULT 0 CHECK (completion_tokens >= 0),
cost REAL NOT NULL DEFAULT 0.0 CHECK (cost >= 0.0),
updated_at INTEGER NOT NULL,
created_at INTEGER NOT NULL,
summary_message_id TEXT,
todos TEXT
);
CREATE INDEX IF NOT EXISTS idx_sessions_created_at ON sessions (created_at);
Column Breakdown
| Column | Type | Constraints | Description |
|---|---|---|---|
id | TEXT | PRIMARY KEY | UUID. Generated when the user starts a new session or when a sub-agent is spawned (deterministic ID from parent message + tool call ID). |
parent_session_id | TEXT | NULLABLE | Self-referencing FK. Non-null for session branches/forks and sub-agent sessions. ListSessions filters to parent_session_id IS NULL to show only top-level sessions. |
title | TEXT | NOT NULL | Display name. Initially a placeholder, then auto-generated by the small model after the first exchange. Updated via UpdateSession. |
message_count | INTEGER | NOT NULL, DEFAULT 0, CHECK >= 0 | Running count of messages. Auto-maintained by triggers on message insert/delete — never manually updated. Used for display in the TUI sidebar. |
prompt_tokens | INTEGER | NOT NULL, DEFAULT 0, CHECK >= 0 | Cumulative input tokens across all LLM calls in this session. Incremented via UpdateSessionTitleAndUsage after each assistant response in Step 9. |
completion_tokens | INTEGER | NOT NULL, DEFAULT 0, CHECK >= 0 | Cumulative output tokens. Same update path as prompt_tokens. |
cost | REAL | NOT NULL, DEFAULT 0.0, CHECK >= 0.0 | Cumulative API cost in USD. Calculated from token counts × model pricing (from Catwalk metadata). Sub-agent costs roll up to the parent session. |
updated_at | INTEGER | NOT NULL | Unix timestamp (seconds). Auto-updated by trigger on any row change. Used for sorting sessions by recency in ListSessions. |
created_at | INTEGER | NOT NULL | Unix timestamp (seconds). Set once at creation. Indexed for efficient date-range queries in statistics. |
summary_message_id | TEXT | NULLABLE | FK to messages.id. Points to the summary message created during Step 7 (automatic summarization). When non-null, pre-summary messages are excluded from the active context on session reload. |
todos | TEXT | NULLABLE | JSON-serialized todo list. Written by the todos tool during agent execution (Step 6). Stored at the session level so todos persist across messages. |
Triggers
-- Auto-update updated_at on any change
CREATE TRIGGER IF NOT EXISTS update_sessions_updated_at
AFTER UPDATE ON sessions
BEGIN
UPDATE sessions SET updated_at = strftime('%s', 'now')
WHERE id = new.id;
END;
Queries
| Query Name | Operation | Purpose |
|---|---|---|
CreateSession | INSERT RETURNING | Create new session with initial metadata |
GetSessionByID | SELECT | Load session by ID (on prompt arrival, session reload) |
ListSessions | SELECT WHERE parent IS NULL ORDER BY updated_at DESC | TUI sidebar session list (excludes sub-agent sessions) |
UpdateSession | UPDATE RETURNING | Full update: title, tokens, cost, summary_message_id, todos |
UpdateSessionTitleAndUsage | UPDATE (additive) | Incremental token/cost update after each response (+= delta) |
DeleteSession | DELETE | Cascades to messages, files, read_files |
Table 2: messages
Role in Critical Path: User message created in Step 5 (agent loop start). Assistant message created and incrementally updated throughout Step 5 (streaming deltas). Tool call and tool result parts appended during Step 6 (tool execution). finished_at set in Step 9 (stream completion). Summary messages flagged during Step 7 (summarization).
Purpose: Stores every message in a conversation — user inputs, assistant responses (including streaming partials), tool calls, and tool results. The parts column is the key: it's a JSON array that can contain heterogeneous content types.
CREATE Statement
CREATE TABLE IF NOT EXISTS messages (
id TEXT PRIMARY KEY,
session_id TEXT NOT NULL,
role TEXT NOT NULL,
parts TEXT NOT NULL DEFAULT '[]',
model TEXT,
created_at INTEGER NOT NULL,
updated_at INTEGER NOT NULL,
finished_at INTEGER,
provider TEXT,
is_summary_message INTEGER DEFAULT 0 NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_messages_session_id ON messages (session_id);
CREATE INDEX IF NOT EXISTS idx_messages_created_at ON messages (created_at);
Column Breakdown
| Column | Type | Constraints | Description |
|---|---|---|---|
id | TEXT | PRIMARY KEY | UUID for the message. |
session_id | TEXT | NOT NULL, FK → sessions(id) ON DELETE CASCADE | Parent session. Indexed for efficient session message listing. |
role | TEXT | NOT NULL | Message role: 'user', 'assistant', or 'tool'. Determines how the message is rendered in the TUI and how it's included in the LLM context. |
parts | TEXT | NOT NULL, DEFAULT '[]' | JSON array of message content blocks. This is the most important column — see Parts Schema below. Updated incrementally during streaming (Step 5) as deltas arrive. |
model | TEXT | NULLABLE | LLM model identifier (e.g., 'claude-3-5-sonnet', 'gpt-4o'). Set for assistant messages. Null for user messages. Used in statistics queries. |
created_at | INTEGER | NOT NULL | Unix timestamp. When the message record was first inserted. |
updated_at | INTEGER | NOT NULL | Unix timestamp. Auto-updated by trigger. Changes on every streaming delta write. |
finished_at | INTEGER | NULLABLE | Unix timestamp. Set when the assistant finishes responding (Step 9). Null for user messages and in-progress assistant messages. Used to calculate response time in statistics (finished_at - created_at). |
provider | TEXT | NULLABLE | Provider name (e.g., 'anthropic', 'openai', 'openrouter'). Set for assistant messages. Enables per-provider usage analytics. |
is_summary_message | INTEGER | NOT NULL, DEFAULT 0 | Boolean flag (0/1). Set to 1 for messages created during Step 7 (automatic summarization). When loading a session, if sessions.summary_message_id is set, only messages after the summary are loaded into the active context. |
Parts Schema (JSON)
The parts column stores a JSON array of typed content blocks:
[
{"type": "text", "text": "Here's the file you asked about..."},
{"type": "tool_call", "data": {"id": "tc_123", "name": "view", "input": {"file_path": "/src/main.go"}}},
{"type": "tool_result", "data": {"tool_call_id": "tc_123", "content": "package main\n...", "is_error": false}}
]
| Part Type | Role | Description |
|---|---|---|
text | user, assistant | Plain text content. For user: the prompt. For assistant: generated response text. |
tool_call | assistant | The model's request to invoke a tool. Contains tool name and JSON input parameters. |
tool_result | tool | The output from executing a tool. Contains the tool call ID (for pairing) and content/error. |
This structure means a single assistant message can interleave text and multiple tool calls, accurately representing the streaming loop from Step 5 where the model alternates between generating text and invoking tools.
Triggers
-- Auto-update updated_at on any change
CREATE TRIGGER IF NOT EXISTS update_messages_updated_at
AFTER UPDATE ON messages
BEGIN
UPDATE messages SET updated_at = strftime('%s', 'now')
WHERE id = new.id;
END;
-- Auto-increment session message_count on insert
CREATE TRIGGER IF NOT EXISTS update_session_message_count_on_insert
AFTER INSERT ON messages
BEGIN
UPDATE sessions SET message_count = message_count + 1
WHERE id = new.session_id;
END;
-- Auto-decrement session message_count on delete
CREATE TRIGGER IF NOT EXISTS update_session_message_count_on_delete
AFTER DELETE ON messages
BEGIN
UPDATE sessions SET message_count = message_count - 1
WHERE id = old.session_id;
END;
Queries
| Query Name | Operation | Purpose |
|---|---|---|
CreateMessage | INSERT RETURNING | Persist new user/assistant/tool message |
GetMessage | SELECT | Load single message by ID |
ListMessagesBySession | SELECT ORDER BY created_at ASC | Load full conversation history for a session (chronological) |
UpdateMessage | UPDATE | Update parts (streaming deltas) and finished_at (completion) |
DeleteMessage | DELETE | Remove single message (triggers session count decrement) |
DeleteSessionMessages | DELETE WHERE session_id | Bulk delete all messages in a session |
ListUserMessagesBySession | SELECT WHERE role='user' ORDER BY created_at DESC | Recent user prompts for a session |
ListAllUserMessages | SELECT WHERE role='user' ORDER BY created_at DESC | All user prompts across all sessions (for search/analytics) |
Table 3: files
Role in Critical Path: Written during Step 6 (tool execution) when the edit, multi_edit, or write tools modify files. Each edit creates a new row with an incremented version number. Used in Step 9 (finalization) by the history service for undo snapshots.
Purpose: Version-controlled storage of complete file contents at each edit point. Enables full undo/redo by storing the entire file content per version, not diffs. This is intentionally simple — no delta compression, no patch format, just full snapshots.
CREATE Statement
CREATE TABLE IF NOT EXISTS files (
id TEXT PRIMARY KEY,
session_id TEXT NOT NULL,
path TEXT NOT NULL,
content TEXT NOT NULL,
version INTEGER NOT NULL DEFAULT 0,
created_at INTEGER NOT NULL,
updated_at INTEGER NOT NULL,
FOREIGN KEY (session_id) REFERENCES sessions (id) ON DELETE CASCADE,
UNIQUE(path, session_id, version)
);
CREATE INDEX IF NOT EXISTS idx_files_session_id ON files (session_id);
CREATE INDEX IF NOT EXISTS idx_files_path ON files (path);
CREATE INDEX IF NOT EXISTS idx_files_created_at ON files (created_at);
Column Breakdown
| Column | Type | Constraints | Description |
|---|---|---|---|
id | TEXT | PRIMARY KEY | UUID for this file version record. |
session_id | TEXT | NOT NULL, FK → sessions(id) ON DELETE CASCADE | Session where the edit occurred. Scopes file history to sessions. |
path | TEXT | NOT NULL | Full file path (absolute or project-relative). Indexed for path-based lookups. |
content | TEXT | NOT NULL | Complete file content at this version. Not a diff — the entire file. This makes undo trivial (just restore the previous version's content) at the cost of storage. |
version | INTEGER | NOT NULL, DEFAULT 0 | Sequential version number for this file within this session. Version 0 = original content before first edit (snapshot taken before the edit tool modifies the file). Version 1 = after first edit, etc. |
created_at | INTEGER | NOT NULL | Unix timestamp of version creation. |
updated_at | INTEGER | NOT NULL | Unix timestamp, auto-updated by trigger. |
Unique Constraint
UNIQUE(path, session_id, version)
Ensures no duplicate versions for the same file in the same session. This is the composite key that the version history queries rely on.
Triggers
CREATE TRIGGER IF NOT EXISTS update_files_updated_at
AFTER UPDATE ON files
BEGIN
UPDATE files SET updated_at = strftime('%s', 'now')
WHERE id = new.id;
END;
Queries
| Query Name | Operation | Purpose |
|---|---|---|
CreateFile | INSERT RETURNING | Store a new file version snapshot |
GetFile | SELECT | Load single file version by ID |
GetFileByPathAndSession | SELECT ORDER BY version DESC LIMIT 1 | Get the latest version of a file in a session (most common query — used before edits to get current content) |
ListFilesBySession | SELECT ORDER BY version ASC | All file versions in a session (chronological — for history view) |
ListFilesByPath | SELECT ORDER BY version DESC | All versions of a specific file across sessions (reverse chronological) |
ListLatestSessionFiles | SELECT with MAX(version) JOIN | Get the most recent version of every file in a session (for session file summary) |
DeleteFile | DELETE | Remove single file version |
DeleteSessionFiles | DELETE WHERE session_id | Bulk delete all file versions in a session |
How Undo Works
- Before editing a file, the
edittool queriesGetFileByPathAndSessionto get the latest version - If no version exists yet, it reads the file from disk and stores it as version 0 (the "original" snapshot)
- After applying the edit, it stores the new content as version N+1
- To undo: query
ListFilesByPath, get the previous version'scontent, write it back to disk
Table 4: read_files
Role in Critical Path: Written during Step 6 (tool execution) when the view or grep tools read files. Queried before file reads to avoid redundant reads. Part of the file tracker service referenced in Step 9 (finalization).
Purpose: Tracks which files the agent has read during a session, with timestamps. Serves as an optimization to avoid re-reading files the agent has already seen, and as metadata for the agent's context about what it has and hasn't looked at.
CREATE Statement
CREATE TABLE IF NOT EXISTS read_files (
session_id TEXT NOT NULL CHECK (session_id != ''),
path TEXT NOT NULL CHECK (path != ''),
read_at INTEGER NOT NULL,
FOREIGN KEY (session_id) REFERENCES sessions (id) ON DELETE CASCADE,
PRIMARY KEY (path, session_id)
);
CREATE INDEX IF NOT EXISTS idx_read_files_session_id ON read_files (session_id);
CREATE INDEX IF NOT EXISTS idx_read_files_path ON read_files (path);
Column Breakdown
| Column | Type | Constraints | Description |
|---|---|---|---|
session_id | TEXT | NOT NULL, CHECK != '', FK → sessions(id) ON DELETE CASCADE | Session where the read occurred. Part of composite primary key. |
path | TEXT | NOT NULL, CHECK != '' | File path that was read. Part of composite primary key. |
read_at | INTEGER | NOT NULL | Unix timestamp of the most recent read. Updated via upsert when the same file is read again. |
Primary Key
PRIMARY KEY (path, session_id)
Composite primary key ensures one record per file per session. The RecordFileRead query uses ON CONFLICT ... DO UPDATE to upsert — if the file was already read, just update the timestamp.
Queries
| Query Name | Operation | Purpose |
|---|---|---|
RecordFileRead | INSERT ON CONFLICT UPDATE | Record or update a file read (upsert pattern) |
GetFileRead | SELECT | Check if a specific file has been read in this session |
ListSessionReadFiles | SELECT ORDER BY read_at DESC | List all files read in a session (most recent first) |
Statistics Queries (Virtual — No Dedicated Table)
Role in Critical Path: Not on the critical path. These are read-only analytics queries run by the crush stats subcommand (internal/cmd/stats/), querying the existing sessions and messages tables.
Purpose: Comprehensive usage analytics derived from session and message data. No dedicated stats table — all metrics are computed on-the-fly from the core tables.
Query Breakdown
| Query Name | Source Tables | What It Computes |
|---|---|---|
GetTotalStats | sessions | Lifetime totals: session count, total tokens (prompt + completion), total cost, total messages, averages per session |
GetUsageByDay | sessions | Daily breakdown: token counts, cost, session count, grouped by date |
GetUsageByModel | messages | Per-model usage: message counts grouped by model + provider (assistant messages only) |
GetUsageByHour | sessions | Hourly activity distribution: session counts by hour of day (0-23) |
GetUsageByDayOfWeek | sessions | Weekly patterns: session counts and tokens by day of week (0=Sunday) |
GetRecentActivity | sessions | Last 30 days: daily session counts, total tokens, cost |
GetAverageResponseTime | messages | Mean assistant response time: AVG(finished_at - created_at) for completed assistant messages |
GetToolUsage | messages (JSON) | Tool call frequency: extracts tool names from parts JSON via json_extract + json_each, counts invocations per tool |
GetHourDayHeatmap | sessions | 2D activity heatmap: session counts by (day_of_week, hour) pairs |
Tool Usage Query (Notable)
This query is particularly interesting because it reaches into the JSON parts column:
SELECT
json_extract(value, '$.data.name') as tool_name,
COUNT(*) as call_count
FROM messages, json_each(parts)
WHERE json_extract(value, '$.type') = 'tool_call'
AND json_extract(value, '$.data.name') IS NOT NULL
GROUP BY tool_name
ORDER BY call_count DESC;
It uses SQLite's json_each() table-valued function to unnest the parts JSON array, then filters for tool_call type entries and extracts the tool name. This gives a complete breakdown of which tools the agent uses most — across all sessions and all time.
Critical Path Data Flow
Mapping database operations to the prompt execution walkthrough:
Step 2: Coordinator receives prompt
└── GetSessionByID → load session state
Step 5: Agent loop executes
├── CreateMessage (role='user') → persist user prompt
├── CreateMessage (role='assistant') → create assistant message record
└── UpdateMessage (loop) → append streaming deltas to parts JSON
Step 6: Tool execution
├── [view tool] → RecordFileRead (upsert)
├── [edit tool] → GetFileByPathAndSession → get current content
│ → CreateFile (version 0) → snapshot original
│ → CreateFile (version N+1) → store edited content
├── [write tool] → CreateFile → store new file content
├── [todos tool] → UpdateSession → write todos JSON
└── UpdateMessage → append tool_call + tool_result parts
Step 7: Summarization
├── CreateMessage (is_summary_message=1) → persist summary
└── UpdateSession (summary_message_id) → link summary to session
Step 9: Finalization
├── UpdateMessage (finished_at) → mark completion timestamp
└── UpdateSessionTitleAndUsage → increment token counts and cost
Step 10: Queue drain
└── GetSessionByID → reload session for next queued prompt
Migration History
| Migration | Date | Description |
|---|---|---|
20250424200609_initial.sql | 2025-04-24 | Initial schema: sessions, messages, files tables with all triggers and indexes |
20250515105448_add_summary_message_id.sql | 2025-05-15 | Added summary_message_id column to sessions for context window summarization |
20250624000000_add_created_at_indexes.sql | 2025-06-24 | Added created_at indexes on all tables for sort performance |
20250627000000_add_provider_to_messages.sql | 2025-06-27 | Added provider column to messages for multi-provider analytics |
20250810000000_add_is_summary_message.sql | 2025-08-10 | Added is_summary_message flag to messages for summary identification |
20250812000000_add_todos_to_sessions.sql | 2025-08-12 | Added todos JSON column to sessions for task tracking |
20260127000000_add_read_files_table.sql | 2026-01-27 | New read_files table for file read deduplication |
Generated Go Models
The sqlc-generated structs in internal/db/models.go:
type Session struct {
ID string `json:"id"`
ParentSessionID sql.NullString `json:"parent_session_id"`
Title string `json:"title"`
MessageCount int64 `json:"message_count"`
PromptTokens int64 `json:"prompt_tokens"`
CompletionTokens int64 `json:"completion_tokens"`
Cost float64 `json:"cost"`
UpdatedAt int64 `json:"updated_at"`
CreatedAt int64 `json:"created_at"`
SummaryMessageID sql.NullString `json:"summary_message_id"`
Todos sql.NullString `json:"todos"`
}
type Message struct {
ID string `json:"id"`
SessionID string `json:"session_id"`
Role string `json:"role"`
Parts string `json:"parts"`
Model sql.NullString `json:"model"`
CreatedAt int64 `json:"created_at"`
UpdatedAt int64 `json:"updated_at"`
FinishedAt sql.NullInt64 `json:"finished_at"`
Provider sql.NullString `json:"provider"`
IsSummaryMessage int64 `json:"is_summary_message"`
}
type File struct {
ID string `json:"id"`
SessionID string `json:"session_id"`
Path string `json:"path"`
Content string `json:"content"`
Version int64 `json:"version"`
CreatedAt int64 `json:"created_at"`
UpdatedAt int64 `json:"updated_at"`
}
type ReadFile struct {
SessionID string `json:"session_id"`
Path string `json:"path"`
ReadAt int64 `json:"read_at"`
}
Notable: sql.NullString and sql.NullInt64 for nullable columns. Parts is a raw JSON string — deserialized in the message service layer (internal/message/) into typed Go structs for the content blocks.
Design Decisions
| Decision | Rationale |
|---|---|
| Full file snapshots, not diffs | Simplicity. Undo is trivial (restore previous version content). Storage cost is acceptable for code files. No diff algorithm complexity. |
| JSON parts in messages | Schema flexibility. Tool calls and text can interleave freely without a rigid relational model. SQLite's json_extract/json_each still allows structured queries. |
| Triggers for counts/timestamps | Data integrity. message_count and updated_at are always consistent without requiring application-level coordination. |
| WAL journal mode | Concurrency. Allows the TUI to read (display messages) while the agent loop writes (streaming deltas). Critical for responsive UI during generation. |
| Cascading deletes | Cleanup simplicity. Deleting a session removes all associated data in one operation. |
| Composite PK on read_files | Natural key. A file is either read or not in a given session — no need for a surrogate key. Upsert pattern keeps the table compact. |
| No separate stats table | Stats are derived on-the-fly from sessions/messages. Avoids data duplication and staleness. Acceptable because stats are queried rarely (user command, not during generation). |
| Unix timestamps as integers | SQLite doesn't have a native datetime type. Integer timestamps are compact, sortable, and work well with strftime() for formatted output. |
Sources
- charmbracelet/crush —
internal/db/directory - CRUSH_ARCHITECTURE.md — Full architecture reference
- CRUSH_PROMPT_WALKTHROUGH.md — Prompt execution flow