Architecture & Schematic
A high-fidelity breakdown of how the AI Incident Assistant orchestrates UI, model generation, and tool execution.
Current anti-pattern
The previous request path combined an oversized static system context with duplicated conversation history, which inflated token counts and diluted instruction focus.
Note: Before optimization: duplicated history and oversized system context.
Optimized request pipeline
A thin system prompt, LangChain-trimmed history, and persisted session memory folded into the system string, followed by a branch into model-only or tool-enabled execution.
Note: After optimization: bounded history plus durable memory injection before the decision gate.
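The trimming step can be sketched without LangChain as follows. This is a minimal illustration, not the route's actual implementation: `trimHistory`, the rough 4-characters-per-token estimate, and the budget value are all assumptions.

```typescript
// Illustrative message shape; the real route uses LangChain message classes.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Rough token estimate (~4 characters per token); an assumption, not the
// gateway's real tokenizer.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Keep the most recent turns that fit the budget, mirroring a
// "keep the last messages" trim. The system prompt is handled separately,
// so it never competes with history for budget.
function trimHistory(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > maxTokens) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}
```

Walking backward from the newest message guarantees the most recent turns survive, which is why durable memory (below) must carry anything older that still matters.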
Platform context
Next.js sits in front of Supabase (auth + Postgres with RLS), Vercel AI Gateway and MCP for execution, with Turnstile and per-user quotas at the trust boundary. Chat routes also read and write the session JSON envelope for memory.
Note: Supabase persistence and auth alongside gateway and tool execution.
Conversation context retention
Large language models only see what you send on each request. When a transcript grows long, older turns may be trimmed to stay within a token budget, which can make the assistant feel like it “forgot” an early incident dump. This product addresses that in three ways, without relying on pgvector or ad-hoc retrieval for the core path:
- Structured session memory in Postgres (summary + key facts) lives beside the message list in the same JSON envelope. The API refreshes it from each user message and injects it into the system prompt every turn so anchors survive trimming.
- History hygiene: the server skips appending the current user message twice when the client already included it, and LangChain trimming enforces a configurable budget so costs stay predictable.
- CAN-style grounding: when you ask for a CAN report, the route checks that required fact keys were captured first; otherwise it asks for the missing fields instead of hallucinating after context loss.
Vector search (pgvector) is optional for future knowledge bases; the playbook here is deterministic persistence plus explicit validation, not semantic recall of raw logs.
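As a rough sketch of the envelope and the per-turn injection described above (the field names `messages`, `memory`, `summary`, and `facts` are assumptions for illustration, not the real schema):

```typescript
// Illustrative shape of the session JSON envelope stored in Postgres;
// field names here are assumptions based on the description above.
interface SessionEnvelope {
  messages: { role: string; content: string }[];
  memory: {
    summary: string; // rolling incident summary
    facts: Record<string, string>; // key facts, e.g. service, severity
  };
}

// Fold durable memory into the system string every turn, so anchors
// survive even when older transcript turns are trimmed away.
function buildSystemPrompt(base: string, memory: SessionEnvelope["memory"]): string {
  const facts = Object.entries(memory.facts)
    .map(([k, v]) => `- ${k}: ${v}`)
    .join("\n");
  return [
    base,
    memory.summary && `Session summary:\n${memory.summary}`,
    facts && `Known facts:\n${facts}`,
  ]
    .filter(Boolean)
    .join("\n\n");
}
```

Because the memory rides in the system string rather than the message list, it is immune to history trimming and costs a bounded number of tokens per turn.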
Context retention pipeline
From the browser through envelope load/merge, optional CAN guardrails, token trimming, and generation, with memory written back to the same row the user already owns via RLS.
Note: Yellow node: early return when a CAN report is requested but structured facts are incomplete.
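The early-return guardrail can be sketched as a completeness check over the captured facts. The exact required key set is an assumption for illustration:

```typescript
// Required fact keys for a CAN (Conditions / Actions / Needs) report;
// this particular key set is an assumption, not the app's real list.
const REQUIRED_CAN_KEYS = ["conditions", "actions", "needs"] as const;

// Return the keys still missing so the route can ask the user for them
// instead of generating a report from incomplete context.
function missingCanFacts(facts: Record<string, string>): string[] {
  return REQUIRED_CAN_KEYS.filter((k) => !facts[k]?.trim());
}
```

If the returned list is non-empty, the route short-circuits with a prompt for the missing fields rather than calling the model.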
Memory + generation sequence
A narrow sequence emphasizing Supabase envelope I/O, the CAN short-circuit path, persistence of memory before the model call, and the client’s debounced transcript save.
Structural layers
Frontend Interface
Orchestration Engine
Model Gateway
Tool Ecosystem (MCP)
Observability
Backend / Data (Supabase)
Conversation memory & CAN grounding
Runtime execution
End-to-end runtime flow
Request execution from user submit through API orchestration, decision gates, optional MCP calls, and response delivery.
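The decision gate in this flow can be sketched as a small predicate over classified intent. The intent labels below are illustrative assumptions, not the app's real classifier output:

```typescript
// Hypothetical intent labels standing in for the real intent classifier.
type Intent = "smalltalk" | "status_query" | "can_report";

// Gate: only requests that need live external data take the tool-enabled
// (MCP) path; chat and report drafting stay on the model-only path.
function needsTools(intent: Intent): boolean {
  return intent === "status_query";
}
```

Keeping the gate a pure function of intent makes the branch easy to test and keeps tool schemas out of requests that will never call a tool.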
MCP tool selection flow
Shows how tool schemas are exposed to the model and how tool calls are selected only when needed.
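A sketch of how a tool catalog might be exposed and looked up; the tool name, description, and parameters here are invented for illustration and do not reflect the real MCP catalog:

```typescript
// Illustrative MCP-style tool schema as exposed to the model.
interface ToolSchema {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string }>;
    required: string[];
  };
}

// Hypothetical catalog entry, not a real tool in this project.
const tools: ToolSchema[] = [
  {
    name: "get_service_status",
    description: "Fetch current health for a named service.",
    inputSchema: {
      type: "object",
      properties: { service: { type: "string" } },
      required: ["service"],
    },
  },
];

// The model emits a tool call by name; the route resolves it against the
// catalog before executing, so unknown tool names are rejected.
function findTool(name: string): ToolSchema | undefined {
  return tools.find((t) => t.name === name);
}
```

Resolving the call server-side against a fixed catalog means the model can only trigger execution paths the operator has explicitly registered.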
Operator configuration save flow
Runtime config editing path from browser to authenticated API validation and persistence.
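The validation step in this path might look like the following hand-rolled check. The config fields (`model`, `temperature`, `maxTokens`) and their bounds are illustrative assumptions:

```typescript
// Hypothetical operator config shape; field names and limits are assumed.
interface OperatorConfig {
  model: string;
  temperature: number;
  maxTokens: number;
}

type ValidationResult =
  | { ok: true; config: OperatorConfig }
  | { ok: false; error: string };

// Validate an untrusted payload before persisting it.
function validateConfig(input: unknown): ValidationResult {
  const c = input as Partial<OperatorConfig>;
  if (typeof c?.model !== "string" || !c.model) {
    return { ok: false, error: "model is required" };
  }
  if (typeof c.temperature !== "number" || c.temperature < 0 || c.temperature > 2) {
    return { ok: false, error: "temperature must be between 0 and 2" };
  }
  if (!Number.isInteger(c.maxTokens) || (c.maxTokens as number) <= 0) {
    return { ok: false, error: "maxTokens must be a positive integer" };
  }
  return { ok: true, config: { model: c.model, temperature: c.temperature, maxTokens: c.maxTokens as number } };
}
```

Validating before the write keeps malformed payloads out of the persisted config row, so every later request can trust what it loads.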
Server helpers
These modules are TypeScript helpers used by the playground UI and the API routes. Together they assemble prompts, classify intent, load config, and persist chat sessions through Supabase.
Helper module relationships
How configuration flows into prompt assembly, how the chat route enforces quotas against Supabase, and how the browser client lists or saves sessions under RLS.
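As an illustration of the quota check, here is a minimal fixed-window sketch. The limit, window length, and state shape are assumptions standing in for the Supabase-backed counters:

```typescript
// Hypothetical per-user quota state as it might be persisted per row.
interface QuotaState {
  count: number;       // requests used in the current window
  windowStart: number; // epoch ms when the window opened
}

const WINDOW_MS = 24 * 60 * 60 * 1000; // assumed daily window
const LIMIT = 50;                      // assumed per-user request limit

// Pure check: given the stored state and the current time, decide whether
// to allow the request and what state to write back.
function checkQuota(
  state: QuotaState | undefined,
  now: number
): { allowed: boolean; next: QuotaState } {
  if (!state || now - state.windowStart >= WINDOW_MS) {
    // No state yet, or the window expired: start a fresh window.
    return { allowed: true, next: { count: 1, windowStart: now } };
  }
  if (state.count >= LIMIT) {
    return { allowed: false, next: state };
  }
  return { allowed: true, next: { ...state, count: state.count + 1 } };
}
```

Keeping the decision pure makes it trivial to unit-test; the route only adds the Supabase read before and write after.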