← Back to blog

Crush SQLite Schema Reference

·Crush
crushsqlitedatabase

TL;DR: Crush stores everything in SQLite with WAL mode for concurrent reads during streaming LLM responses. The schema is elegantly simple: sessions (tokens, cost, todos), messages (JSON parts array for text + tool calls), files (full snapshots per version for trivial undo), and read_files (dedup tracking). Every table, trigger, query, and design decision mapped to the prompt execution critical path.

Crush SQLite Schema Reference

Status: Complete

A complete reference of Crush's SQLite database schema, including every table, column, index, trigger, and query — mapped to the prompt execution critical path described in CRUSH_PROMPT_WALKTHROUGH.md.


Database Configuration

Location: crush.db in the configured data directory

Driver: Two pure-Go SQLite drivers available — ncruces/go-sqlite3 (primary) and modernc.org/sqlite (alternative). No CGO required.

Code Generation: sqlc generates type-safe Go from SQL queries. Config in sqlc.yaml:

version: "2"
sql:
  - engine: "sqlite"
    schema: "internal/db/migrations"
    queries: "internal/db/sql"
    gen:
      go:
        package: "db"
        out: "internal/db"
        emit_json_tags: true
        emit_prepared_queries: true
        emit_interface: true
        emit_exact_table_names: false
        emit_empty_slices: true

Connection Pragmas:

PRAGMA foreign_keys = ON;          -- Enforce foreign key constraints
PRAGMA journal_mode = WAL;         -- Write-Ahead Logging for concurrent reads during writes
PRAGMA page_size = 4096;           -- 4KB pages
PRAGMA cache_size = -8000;         -- 8MB page cache
PRAGMA synchronous = NORMAL;       -- Balanced durability/performance (WAL makes this safe)
PRAGMA secure_delete = ON;         -- Overwrite deleted data (security)
PRAGMA busy_timeout = 30000;       -- 30-second lock wait before SQLITE_BUSY

Migration Tool: Goose (embedded migrations via internal/db/embed.go). Migrations run automatically on startup.


Entity Relationship Diagram

sessions (root entity)
│
├── 1:N ──→ messages
│            Stores all user prompts, assistant responses, tool calls, tool results.
│            Cascading delete.
│
├── 1:N ──→ files
│            Version-controlled snapshots of every file edit.
│            Cascading delete.
│
├── 1:N ──→ read_files
│            Tracks which files the agent has read (dedup optimization).
│            Cascading delete.
│
└── 1:0..1 ──→ sessions (self-reference via parent_session_id)
               Session branching/forking.

Table 1: sessions

Role in Critical Path: Created in Step 2 (Coordinator receives prompt). Token counts and cost updated throughout Steps 5-9 (agent loop, tool execution, finalization). Summary message ID set during Step 7 (context window summarization). Todos updated by the todos tool during Step 6 (tool execution).

Purpose: Root entity for all conversation data. Tracks session metadata, cumulative token usage, cost, and optional todo lists.

CREATE Statement

CREATE TABLE IF NOT EXISTS sessions (
    id TEXT PRIMARY KEY,
    parent_session_id TEXT,
    title TEXT NOT NULL,
    message_count INTEGER NOT NULL DEFAULT 0 CHECK (message_count >= 0),
    prompt_tokens INTEGER NOT NULL DEFAULT 0 CHECK (prompt_tokens >= 0),
    completion_tokens INTEGER NOT NULL DEFAULT 0 CHECK (completion_tokens >= 0),
    cost REAL NOT NULL DEFAULT 0.0 CHECK (cost >= 0.0),
    updated_at INTEGER NOT NULL,
    created_at INTEGER NOT NULL,
    summary_message_id TEXT,
    todos TEXT
);

CREATE INDEX IF NOT EXISTS idx_sessions_created_at ON sessions (created_at);

Column Breakdown

ColumnTypeConstraintsDescription
idTEXTPRIMARY KEYUUID. Generated when the user starts a new session or when a sub-agent is spawned (deterministic ID from parent message + tool call ID).
parent_session_idTEXTNULLABLESelf-referencing FK. Non-null for session branches/forks and sub-agent sessions. ListSessions filters to parent_session_id IS NULL to show only top-level sessions.
titleTEXTNOT NULLDisplay name. Initially a placeholder, then auto-generated by the small model after the first exchange. Updated via UpdateSession.
message_countINTEGERNOT NULL, DEFAULT 0, CHECK >= 0Running count of messages. Auto-maintained by triggers on message insert/delete — never manually updated. Used for display in the TUI sidebar.
prompt_tokensINTEGERNOT NULL, DEFAULT 0, CHECK >= 0Cumulative input tokens across all LLM calls in this session. Incremented via UpdateSessionTitleAndUsage after each assistant response in Step 9.
completion_tokensINTEGERNOT NULL, DEFAULT 0, CHECK >= 0Cumulative output tokens. Same update path as prompt_tokens.
costREALNOT NULL, DEFAULT 0.0, CHECK >= 0.0Cumulative API cost in USD. Calculated from token counts × model pricing (from Catwalk metadata). Sub-agent costs roll up to the parent session.
updated_atINTEGERNOT NULLUnix timestamp (seconds). Auto-updated by trigger on any row change. Used for sorting sessions by recency in ListSessions.
created_atINTEGERNOT NULLUnix timestamp (seconds). Set once at creation. Indexed for efficient date-range queries in statistics.
summary_message_idTEXTNULLABLEFK to messages.id. Points to the summary message created during Step 7 (automatic summarization). When non-null, pre-summary messages are excluded from the active context on session reload.
todosTEXTNULLABLEJSON-serialized todo list. Written by the todos tool during agent execution (Step 6). Stored at the session level so todos persist across messages.

Triggers

-- Auto-update updated_at on any change
CREATE TRIGGER IF NOT EXISTS update_sessions_updated_at
AFTER UPDATE ON sessions
BEGIN
    UPDATE sessions SET updated_at = strftime('%s', 'now')
    WHERE id = new.id;
END;

Queries

Query NameOperationPurpose
CreateSessionINSERT RETURNINGCreate new session with initial metadata
GetSessionByIDSELECTLoad session by ID (on prompt arrival, session reload)
ListSessionsSELECT WHERE parent IS NULL ORDER BY updated_at DESCTUI sidebar session list (excludes sub-agent sessions)
UpdateSessionUPDATE RETURNINGFull update: title, tokens, cost, summary_message_id, todos
UpdateSessionTitleAndUsageUPDATE (additive)Incremental token/cost update after each response (+= delta)
DeleteSessionDELETECascades to messages, files, read_files

Table 2: messages

Role in Critical Path: User message created in Step 5 (agent loop start). Assistant message created and incrementally updated throughout Step 5 (streaming deltas). Tool call and tool result parts appended during Step 6 (tool execution). finished_at set in Step 9 (stream completion). Summary messages flagged during Step 7 (summarization).

Purpose: Stores every message in a conversation — user inputs, assistant responses (including streaming partials), tool calls, and tool results. The parts column is the key: it's a JSON array that can contain heterogeneous content types.

CREATE Statement

CREATE TABLE IF NOT EXISTS messages (
    id TEXT PRIMARY KEY,
    session_id TEXT NOT NULL,
    role TEXT NOT NULL,
    parts TEXT NOT NULL DEFAULT '[]',
    model TEXT,
    created_at INTEGER NOT NULL,
    updated_at INTEGER NOT NULL,
    finished_at INTEGER,
    provider TEXT,
    is_summary_message INTEGER DEFAULT 0 NOT NULL
);

CREATE INDEX IF NOT EXISTS idx_messages_session_id ON messages (session_id);
CREATE INDEX IF NOT EXISTS idx_messages_created_at ON messages (created_at);

Column Breakdown

ColumnTypeConstraintsDescription
idTEXTPRIMARY KEYUUID for the message.
session_idTEXTNOT NULL, FK → sessions(id) ON DELETE CASCADEParent session. Indexed for efficient session message listing.
roleTEXTNOT NULLMessage role: 'user', 'assistant', or 'tool'. Determines how the message is rendered in the TUI and how it's included in the LLM context.
partsTEXTNOT NULL, DEFAULT '[]'JSON array of message content blocks. This is the most important column — see Parts Schema below. Updated incrementally during streaming (Step 5) as deltas arrive.
modelTEXTNULLABLELLM model identifier (e.g., 'claude-3-5-sonnet', 'gpt-4o'). Set for assistant messages. Null for user messages. Used in statistics queries.
created_atINTEGERNOT NULLUnix timestamp. When the message record was first inserted.
updated_atINTEGERNOT NULLUnix timestamp. Auto-updated by trigger. Changes on every streaming delta write.
finished_atINTEGERNULLABLEUnix timestamp. Set when the assistant finishes responding (Step 9). Null for user messages and in-progress assistant messages. Used to calculate response time in statistics (finished_at - created_at).
providerTEXTNULLABLEProvider name (e.g., 'anthropic', 'openai', 'openrouter'). Set for assistant messages. Enables per-provider usage analytics.
is_summary_messageINTEGERNOT NULL, DEFAULT 0Boolean flag (0/1). Set to 1 for messages created during Step 7 (automatic summarization). When loading a session, if sessions.summary_message_id is set, only messages after the summary are loaded into the active context.

Parts Schema (JSON)

The parts column stores a JSON array of typed content blocks:

[
    {"type": "text", "text": "Here's the file you asked about..."},
    {"type": "tool_call", "data": {"id": "tc_123", "name": "view", "input": {"file_path": "/src/main.go"}}},
    {"type": "tool_result", "data": {"tool_call_id": "tc_123", "content": "package main\n...", "is_error": false}}
]
Part TypeRoleDescription
textuser, assistantPlain text content. For user: the prompt. For assistant: generated response text.
tool_callassistantThe model's request to invoke a tool. Contains tool name and JSON input parameters.
tool_resulttoolThe output from executing a tool. Contains the tool call ID (for pairing) and content/error.

This structure means a single assistant message can interleave text and multiple tool calls, accurately representing the streaming loop from Step 5 where the model alternates between generating text and invoking tools.

Triggers

-- Auto-update updated_at on any change
CREATE TRIGGER IF NOT EXISTS update_messages_updated_at
AFTER UPDATE ON messages
BEGIN
    UPDATE messages SET updated_at = strftime('%s', 'now')
    WHERE id = new.id;
END;

-- Auto-increment session message_count on insert
CREATE TRIGGER IF NOT EXISTS update_session_message_count_on_insert
AFTER INSERT ON messages
BEGIN
    UPDATE sessions SET message_count = message_count + 1
    WHERE id = new.session_id;
END;

-- Auto-decrement session message_count on delete
CREATE TRIGGER IF NOT EXISTS update_session_message_count_on_delete
AFTER DELETE ON messages
BEGIN
    UPDATE sessions SET message_count = message_count - 1
    WHERE id = old.session_id;
END;

Queries

Query NameOperationPurpose
CreateMessageINSERT RETURNINGPersist new user/assistant/tool message
GetMessageSELECTLoad single message by ID
ListMessagesBySessionSELECT ORDER BY created_at ASCLoad full conversation history for a session (chronological)
UpdateMessageUPDATEUpdate parts (streaming deltas) and finished_at (completion)
DeleteMessageDELETERemove single message (triggers session count decrement)
DeleteSessionMessagesDELETE WHERE session_idBulk delete all messages in a session
ListUserMessagesBySessionSELECT WHERE role='user' ORDER BY created_at DESCRecent user prompts for a session
ListAllUserMessagesSELECT WHERE role='user' ORDER BY created_at DESCAll user prompts across all sessions (for search/analytics)

Table 3: files

Role in Critical Path: Written during Step 6 (tool execution) when the edit, multi_edit, or write tools modify files. Each edit creates a new row with an incremented version number. Used in Step 9 (finalization) by the history service for undo snapshots.

Purpose: Version-controlled storage of complete file contents at each edit point. Enables full undo/redo by storing the entire file content per version, not diffs. This is intentionally simple — no delta compression, no patch format, just full snapshots.

CREATE Statement

CREATE TABLE IF NOT EXISTS files (
    id TEXT PRIMARY KEY,
    session_id TEXT NOT NULL,
    path TEXT NOT NULL,
    content TEXT NOT NULL,
    version INTEGER NOT NULL DEFAULT 0,
    created_at INTEGER NOT NULL,
    updated_at INTEGER NOT NULL,
    FOREIGN KEY (session_id) REFERENCES sessions (id) ON DELETE CASCADE,
    UNIQUE(path, session_id, version)
);

CREATE INDEX IF NOT EXISTS idx_files_session_id ON files (session_id);
CREATE INDEX IF NOT EXISTS idx_files_path ON files (path);
CREATE INDEX IF NOT EXISTS idx_files_created_at ON files (created_at);

Column Breakdown

ColumnTypeConstraintsDescription
idTEXTPRIMARY KEYUUID for this file version record.
session_idTEXTNOT NULL, FK → sessions(id) ON DELETE CASCADESession where the edit occurred. Scopes file history to sessions.
pathTEXTNOT NULLFull file path (absolute or project-relative). Indexed for path-based lookups.
contentTEXTNOT NULLComplete file content at this version. Not a diff — the entire file. This makes undo trivial (just restore the previous version's content) at the cost of storage.
versionINTEGERNOT NULL, DEFAULT 0Sequential version number for this file within this session. Version 0 = original content before first edit (snapshot taken before the edit tool modifies the file). Version 1 = after first edit, etc.
created_atINTEGERNOT NULLUnix timestamp of version creation.
updated_atINTEGERNOT NULLUnix timestamp, auto-updated by trigger.

Unique Constraint

UNIQUE(path, session_id, version)

Ensures no duplicate versions for the same file in the same session. This is the composite key that the version history queries rely on.

Triggers

CREATE TRIGGER IF NOT EXISTS update_files_updated_at
AFTER UPDATE ON files
BEGIN
    UPDATE files SET updated_at = strftime('%s', 'now')
    WHERE id = new.id;
END;

Queries

Query NameOperationPurpose
CreateFileINSERT RETURNINGStore a new file version snapshot
GetFileSELECTLoad single file version by ID
GetFileByPathAndSessionSELECT ORDER BY version DESC LIMIT 1Get the latest version of a file in a session (most common query — used before edits to get current content)
ListFilesBySessionSELECT ORDER BY version ASCAll file versions in a session (chronological — for history view)
ListFilesByPathSELECT ORDER BY version DESCAll versions of a specific file across sessions (reverse chronological)
ListLatestSessionFilesSELECT with MAX(version) JOINGet the most recent version of every file in a session (for session file summary)
DeleteFileDELETERemove single file version
DeleteSessionFilesDELETE WHERE session_idBulk delete all file versions in a session

How Undo Works

  1. Before editing a file, the edit tool queries GetFileByPathAndSession to get the latest version
  2. If no version exists yet, it reads the file from disk and stores it as version 0 (the "original" snapshot)
  3. After applying the edit, it stores the new content as version N+1
  4. To undo: query ListFilesByPath, get the previous version's content, write it back to disk

Table 4: read_files

Role in Critical Path: Written during Step 6 (tool execution) when the view or grep tools read files. Queried before file reads to avoid redundant reads. Part of the file tracker service referenced in Step 9 (finalization).

Purpose: Tracks which files the agent has read during a session, with timestamps. Serves as an optimization to avoid re-reading files the agent has already seen, and as metadata for the agent's context about what it has and hasn't looked at.

CREATE Statement

CREATE TABLE IF NOT EXISTS read_files (
    session_id TEXT NOT NULL CHECK (session_id != ''),
    path TEXT NOT NULL CHECK (path != ''),
    read_at INTEGER NOT NULL,
    FOREIGN KEY (session_id) REFERENCES sessions (id) ON DELETE CASCADE,
    PRIMARY KEY (path, session_id)
);

CREATE INDEX IF NOT EXISTS idx_read_files_session_id ON read_files (session_id);
CREATE INDEX IF NOT EXISTS idx_read_files_path ON read_files (path);

Column Breakdown

ColumnTypeConstraintsDescription
session_idTEXTNOT NULL, CHECK != '', FK → sessions(id) ON DELETE CASCADESession where the read occurred. Part of composite primary key.
pathTEXTNOT NULL, CHECK != ''File path that was read. Part of composite primary key.
read_atINTEGERNOT NULLUnix timestamp of the most recent read. Updated via upsert when the same file is read again.

Primary Key

PRIMARY KEY (path, session_id)

Composite primary key ensures one record per file per session. The RecordFileRead query uses ON CONFLICT ... DO UPDATE to upsert — if the file was already read, just update the timestamp.

Queries

Query NameOperationPurpose
RecordFileReadINSERT ON CONFLICT UPDATERecord or update a file read (upsert pattern)
GetFileReadSELECTCheck if a specific file has been read in this session
ListSessionReadFilesSELECT ORDER BY read_at DESCList all files read in a session (most recent first)

Statistics Queries (Virtual — No Dedicated Table)

Role in Critical Path: Not on the critical path. These are read-only analytics queries run by the crush stats subcommand (internal/cmd/stats/), querying the existing sessions and messages tables.

Purpose: Comprehensive usage analytics derived from session and message data. No dedicated stats table — all metrics are computed on-the-fly from the core tables.

Query Breakdown

Query NameSource TablesWhat It Computes
GetTotalStatssessionsLifetime totals: session count, total tokens (prompt + completion), total cost, total messages, averages per session
GetUsageByDaysessionsDaily breakdown: token counts, cost, session count, grouped by date
GetUsageByModelmessagesPer-model usage: message counts grouped by model + provider (assistant messages only)
GetUsageByHoursessionsHourly activity distribution: session counts by hour of day (0-23)
GetUsageByDayOfWeeksessionsWeekly patterns: session counts and tokens by day of week (0=Sunday)
GetRecentActivitysessionsLast 30 days: daily session counts, total tokens, cost
GetAverageResponseTimemessagesMean assistant response time: AVG(finished_at - created_at) for completed assistant messages
GetToolUsagemessages (JSON)Tool call frequency: extracts tool names from parts JSON via json_extract + json_each, counts invocations per tool
GetHourDayHeatmapsessions2D activity heatmap: session counts by (day_of_week, hour) pairs

Tool Usage Query (Notable)

This query is particularly interesting because it reaches into the JSON parts column:

SELECT
    json_extract(value, '$.data.name') as tool_name,
    COUNT(*) as call_count
FROM messages, json_each(parts)
WHERE json_extract(value, '$.type') = 'tool_call'
  AND json_extract(value, '$.data.name') IS NOT NULL
GROUP BY tool_name
ORDER BY call_count DESC;

It uses SQLite's json_each() table-valued function to unnest the parts JSON array, then filters for tool_call type entries and extracts the tool name. This gives a complete breakdown of which tools the agent uses most — across all sessions and all time.


Critical Path Data Flow

Mapping database operations to the prompt execution walkthrough:

Step 2: Coordinator receives prompt
  └── GetSessionByID → load session state

Step 5: Agent loop executes
  ├── CreateMessage (role='user') → persist user prompt
  ├── CreateMessage (role='assistant') → create assistant message record
  └── UpdateMessage (loop) → append streaming deltas to parts JSON

Step 6: Tool execution
  ├── [view tool] → RecordFileRead (upsert)
  ├── [edit tool] → GetFileByPathAndSession → get current content
  │                → CreateFile (version 0) → snapshot original
  │                → CreateFile (version N+1) → store edited content
  ├── [write tool] → CreateFile → store new file content
  ├── [todos tool] → UpdateSession → write todos JSON
  └── UpdateMessage → append tool_call + tool_result parts

Step 7: Summarization
  ├── CreateMessage (is_summary_message=1) → persist summary
  └── UpdateSession (summary_message_id) → link summary to session

Step 9: Finalization
  ├── UpdateMessage (finished_at) → mark completion timestamp
  └── UpdateSessionTitleAndUsage → increment token counts and cost

Step 10: Queue drain
  └── GetSessionByID → reload session for next queued prompt

Migration History

MigrationDateDescription
20250424200609_initial.sql2025-04-24Initial schema: sessions, messages, files tables with all triggers and indexes
20250515105448_add_summary_message_id.sql2025-05-15Added summary_message_id column to sessions for context window summarization
20250624000000_add_created_at_indexes.sql2025-06-24Added created_at indexes on all tables for sort performance
20250627000000_add_provider_to_messages.sql2025-06-27Added provider column to messages for multi-provider analytics
20250810000000_add_is_summary_message.sql2025-08-10Added is_summary_message flag to messages for summary identification
20250812000000_add_todos_to_sessions.sql2025-08-12Added todos JSON column to sessions for task tracking
20260127000000_add_read_files_table.sql2026-01-27New read_files table for file read deduplication

Generated Go Models

The sqlc-generated structs in internal/db/models.go:

type Session struct {
    ID               string         `json:"id"`
    ParentSessionID  sql.NullString `json:"parent_session_id"`
    Title            string         `json:"title"`
    MessageCount     int64          `json:"message_count"`
    PromptTokens     int64          `json:"prompt_tokens"`
    CompletionTokens int64          `json:"completion_tokens"`
    Cost             float64        `json:"cost"`
    UpdatedAt        int64          `json:"updated_at"`
    CreatedAt        int64          `json:"created_at"`
    SummaryMessageID sql.NullString `json:"summary_message_id"`
    Todos            sql.NullString `json:"todos"`
}

type Message struct {
    ID               string         `json:"id"`
    SessionID        string         `json:"session_id"`
    Role             string         `json:"role"`
    Parts            string         `json:"parts"`
    Model            sql.NullString `json:"model"`
    CreatedAt        int64          `json:"created_at"`
    UpdatedAt        int64          `json:"updated_at"`
    FinishedAt       sql.NullInt64  `json:"finished_at"`
    Provider         sql.NullString `json:"provider"`
    IsSummaryMessage int64          `json:"is_summary_message"`
}

type File struct {
    ID        string `json:"id"`
    SessionID string `json:"session_id"`
    Path      string `json:"path"`
    Content   string `json:"content"`
    Version   int64  `json:"version"`
    CreatedAt int64  `json:"created_at"`
    UpdatedAt int64  `json:"updated_at"`
}

type ReadFile struct {
    SessionID string `json:"session_id"`
    Path      string `json:"path"`
    ReadAt    int64  `json:"read_at"`
}

Notable: sql.NullString and sql.NullInt64 for nullable columns. Parts is a raw JSON string — deserialized in the message service layer (internal/message/) into typed Go structs for the content blocks.


Design Decisions

DecisionRationale
Full file snapshots, not diffsSimplicity. Undo is trivial (restore previous version content). Storage cost is acceptable for code files. No diff algorithm complexity.
JSON parts in messagesSchema flexibility. Tool calls and text can interleave freely without a rigid relational model. SQLite's json_extract/json_each still allows structured queries.
Triggers for counts/timestampsData integrity. message_count and updated_at are always consistent without requiring application-level coordination.
WAL journal modeConcurrency. Allows the TUI to read (display messages) while the agent loop writes (streaming deltas). Critical for responsive UI during generation.
Cascading deletesCleanup simplicity. Deleting a session removes all associated data in one operation.
Composite PK on read_filesNatural key. A file is either read or not in a given session — no need for a surrogate key. Upsert pattern keeps the table compact.
No separate stats tableStats are derived on-the-fly from sessions/messages. Avoids data duplication and staleness. Acceptable because stats are queried rarely (user command, not during generation).
Unix timestamps as integersSQLite doesn't have a native datetime type. Integer timestamps are compact, sortable, and work well with strftime() for formatted output.

Sources