Architecture & Schematic
A high-fidelity breakdown of how the AI Incident Assistant orchestrates UI, model generation, and tool execution.
Current anti-pattern
The previous request path combined an oversized static system context with duplicated conversation history, which inflated token counts and diluted instruction focus.
Note: Before optimization: duplicated history and oversized system context.
Optimized request pipeline
A thin system prompt, LangChain-trimmed history, and persisted session memory folded into the system string, followed by a branch into model-only or tool-enabled execution.
Note: After optimization: bounded history plus durable memory injection before the decision gate.
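The trimming step can be sketched without LangChain as follows. This is a minimal illustration, not the route's actual implementation: `trimHistory`, the rough 4-characters-per-token estimate, and the budget value are all assumptions.

```typescript
// Illustrative message shape; the real route uses LangChain message classes.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Rough token estimate (~4 characters per token); an assumption, not the
// gateway's real tokenizer.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Keep the most recent turns that fit the budget, mirroring a
// "keep the last messages" trim. The system prompt is handled separately,
// so it never competes with history for budget.
function trimHistory(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > maxTokens) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}
```

Walking backward from the newest message guarantees the most recent turns survive, which is why durable memory (below) must carry anything older that still matters.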
Platform context
Next.js sits in front of Supabase (auth + Postgres with RLS), Vercel AI Gateway and MCP for execution, with Turnstile and per-user quotas at the trust boundary. Chat routes also read and write the session JSON envelope for memory.
Note: Supabase persistence and auth alongside gateway and tool execution.
Conversation context retention
Large language models only see what you send on each request. When a transcript grows long, older turns may be trimmed to stay within a token budget, which can make the assistant feel like it “forgot” an early incident dump. This product addresses that in three ways, without relying on pgvector or ad-hoc retrieval for the core path:
- Structured session memory in Postgres (summary + key facts) lives beside the message list in the same JSON envelope. The API refreshes it from each user message and injects it into the system prompt every turn so anchors survive trimming.
- History hygiene: the server skips appending the current user message twice when the client already included it, and LangChain trimming enforces a configurable budget so costs stay predictable.
- CAN-style grounding: when you ask for a CAN report, the route checks that required fact keys were captured first; otherwise it asks for the missing fields instead of hallucinating after context loss.
Vector search (pgvector) is optional for future knowledge bases; the playbook here is deterministic persistence plus explicit validation, not semantic recall of raw logs.
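As a rough sketch of the envelope and the per-turn injection described above (the field names `messages`, `memory`, `summary`, and `facts` are assumptions for illustration, not the real schema):

```typescript
// Illustrative shape of the session JSON envelope stored in Postgres;
// field names here are assumptions based on the description above.
interface SessionEnvelope {
  messages: { role: string; content: string }[];
  memory: {
    summary: string; // rolling incident summary
    facts: Record<string, string>; // key facts, e.g. service, severity
  };
}

// Fold durable memory into the system string every turn, so anchors
// survive even when older transcript turns are trimmed away.
function buildSystemPrompt(base: string, memory: SessionEnvelope["memory"]): string {
  const facts = Object.entries(memory.facts)
    .map(([k, v]) => `- ${k}: ${v}`)
    .join("\n");
  return [
    base,
    memory.summary && `Session summary:\n${memory.summary}`,
    facts && `Known facts:\n${facts}`,
  ]
    .filter(Boolean)
    .join("\n\n");
}
```

Because the memory rides in the system string rather than the message list, it is immune to history trimming and costs a bounded number of tokens per turn.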
Context retention pipeline
From the browser through envelope load/merge, optional CAN guardrails, token trimming, and generation, with memory written back to the same row the user already owns via RLS.
Note: Yellow node: early return when a CAN report is requested but structured facts are incomplete.
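The early-return guardrail can be sketched as a completeness check over the captured facts. The exact required key set is an assumption for illustration:

```typescript
// Required fact keys for a CAN (Conditions / Actions / Needs) report;
// this particular key set is an assumption, not the app's real list.
const REQUIRED_CAN_KEYS = ["conditions", "actions", "needs"] as const;

// Return the keys still missing so the route can ask the user for them
// instead of generating a report from incomplete context.
function missingCanFacts(facts: Record<string, string>): string[] {
  return REQUIRED_CAN_KEYS.filter((k) => !facts[k]?.trim());
}
```

If the returned list is non-empty, the route short-circuits with a prompt for the missing fields rather than calling the model.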
Memory + generation sequence
A narrow sequence emphasizing Supabase envelope I/O, the CAN short-circuit path, persistence of memory before the model call, and the client’s debounced transcript save.
Structural layers
Frontend Interface
Orchestration Engine
Model Gateway
Tool Ecosystem (MCP)
Observability
Backend / Data (Supabase)
Conversation memory & CAN grounding
Runtime execution
End-to-end runtime flow
Request execution from user submit through API orchestration, decision gates, optional MCP calls, and response delivery.
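The decision gate in this flow can be sketched as a small predicate over classified intent. The intent labels below are illustrative assumptions, not the app's real classifier output:

```typescript
// Hypothetical intent labels standing in for the real intent classifier.
type Intent = "smalltalk" | "status_query" | "can_report";

// Gate: only requests that need live external data take the tool-enabled
// (MCP) path; chat and report drafting stay on the model-only path.
function needsTools(intent: Intent): boolean {
  return intent === "status_query";
}
```

Keeping the gate a pure function of intent makes the branch easy to test and keeps tool schemas out of requests that will never call a tool.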
MCP tool selection flow
Shows how tool schemas are exposed to the model and how tool calls are selected only when needed.
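A sketch of how a tool catalog might be exposed and looked up; the tool name, description, and parameters here are invented for illustration and do not reflect the real MCP catalog:

```typescript
// Illustrative MCP-style tool schema as exposed to the model.
interface ToolSchema {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string }>;
    required: string[];
  };
}

// Hypothetical catalog entry, not a real tool in this project.
const tools: ToolSchema[] = [
  {
    name: "get_service_status",
    description: "Fetch current health for a named service.",
    inputSchema: {
      type: "object",
      properties: { service: { type: "string" } },
      required: ["service"],
    },
  },
];

// The model emits a tool call by name; the route resolves it against the
// catalog before executing, so unknown tool names are rejected.
function findTool(name: string): ToolSchema | undefined {
  return tools.find((t) => t.name === name);
}
```

Resolving the call server-side against a fixed catalog means the model can only trigger execution paths the operator has explicitly registered.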
Operator configuration save flow
Runtime config editing path from browser to authenticated API validation and persistence.
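The validation step in this path might look like the following hand-rolled check. The config fields (`model`, `temperature`, `maxTokens`) and their bounds are illustrative assumptions:

```typescript
// Hypothetical operator config shape; field names and limits are assumed.
interface OperatorConfig {
  model: string;
  temperature: number;
  maxTokens: number;
}

type ValidationResult =
  | { ok: true; config: OperatorConfig }
  | { ok: false; error: string };

// Validate an untrusted payload before persisting it.
function validateConfig(input: unknown): ValidationResult {
  const c = input as Partial<OperatorConfig>;
  if (typeof c?.model !== "string" || !c.model) {
    return { ok: false, error: "model is required" };
  }
  if (typeof c.temperature !== "number" || c.temperature < 0 || c.temperature > 2) {
    return { ok: false, error: "temperature must be between 0 and 2" };
  }
  if (!Number.isInteger(c.maxTokens) || (c.maxTokens as number) <= 0) {
    return { ok: false, error: "maxTokens must be a positive integer" };
  }
  return { ok: true, config: { model: c.model, temperature: c.temperature, maxTokens: c.maxTokens as number } };
}
```

Validating before the write keeps malformed payloads out of the persisted config row, so every later request can trust what it loads.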
Server helpers
These modules are TypeScript helpers used by the playground UI and the API routes. Together they assemble prompts, classify intent, load config, and persist chat sessions through Supabase.
Helper module relationships
How configuration flows into prompt assembly, how the chat route enforces quotas against Supabase, and how the browser client lists or saves sessions under RLS.
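As an illustration of the quota check, here is a minimal fixed-window sketch. The limit, window length, and state shape are assumptions standing in for the Supabase-backed counters:

```typescript
// Hypothetical per-user quota state as it might be persisted per row.
interface QuotaState {
  count: number;       // requests used in the current window
  windowStart: number; // epoch ms when the window opened
}

const WINDOW_MS = 24 * 60 * 60 * 1000; // assumed daily window
const LIMIT = 50;                      // assumed per-user request limit

// Pure check: given the stored state and the current time, decide whether
// to allow the request and what state to write back.
function checkQuota(
  state: QuotaState | undefined,
  now: number
): { allowed: boolean; next: QuotaState } {
  if (!state || now - state.windowStart >= WINDOW_MS) {
    // No state yet, or the window expired: start a fresh window.
    return { allowed: true, next: { count: 1, windowStart: now } };
  }
  if (state.count >= LIMIT) {
    return { allowed: false, next: state };
  }
  return { allowed: true, next: { ...state, count: state.count + 1 } };
}
```

Keeping the decision pure makes it trivial to unit-test; the route only adds the Supabase read before and write after.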