Short prompts were producing title-generation meta responses such as "I
am a title generator" and prompt-echo titles. This rewrites the
automatic and manual title prompts to be shorter, less self-referential,
and more focused on returning only the title text.
The change also removes the broader post-generation guard layer, updates
manual regeneration to send real conversation text instead of a meta
instruction, and keeps regression coverage focused on the slimmer prompt
contract.
This patch shows the loading spinner on the AI Bridge Sessions table
during pagination. These API calls can take a noticeable amount of time
on dogfood, and without the spinner the table shows stale data while
the fetch is in flight.
Claude with the assist 🤖
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Kayla はな <mckayla@hey.com>
Closes #22136
This pull request adds a `<ClientFilter />` component to our `Request
Logs` page for AI Bridge, letting the user select a client to filter
by. Technically the backend can filter against multiple clients at
once, but the frontend doesn't currently have a nice way of supporting
this (future improvement).
<img width="1447" height="831" alt="image"
src="https://github.com/user-attachments/assets/0be234e2-25f2-4a89-b971-d74817395da1"
/>
---------
Co-authored-by: Jeremy Ruppel <jeremy.ruppel@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Split out from #23000.
## Problem
`openAppInNewWindow()` opens workspace apps in a slim popup without
`noopener`, leaving `window.opener` intact. Since workspace apps can
proxy arbitrary user-hosted content under the dashboard origin, this
exposes the Coder dashboard to tabnabbing and same-origin DOM access.
We cannot simply pass `"noopener"` to `window.open()` because the WHATWG
spec mandates that `window.open()` returns `null` when `"noopener"` is
present — indistinguishable from a blocked popup — which would break the
popup-blocked error toast.
## Solution
Use a three-step approach in `openAppInNewWindow()`:
1. Open `about:blank` first (without `noopener`) so we can detect popup
blockers via the `null` return
2. If the popup succeeded, sever the opener reference with `popup.opener
= null`
3. Navigate the popup to the target URL via `popup.location.href`
This preserves popup-block detection while eliminating the security
exposure.
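A minimal sketch of the flow (hedged: the popup feature string and the
error handling are illustrative, not the exact implementation):

```ts
export function openAppInNewWindow(url: string): void {
  // 1. Open about:blank WITHOUT "noopener" so a null return still
  //    reliably means "popup blocked" rather than "noopener succeeded".
  const popup = window.open("about:blank", "_blank", "width=900,height=600");
  if (popup === null) {
    // Popup blocker detected: surface the existing error toast here.
    throw new Error("Popup was blocked by the browser");
  }
  // 2. Sever the reverse reference before any app content can load.
  popup.opener = null;
  // 3. Navigate the already-open window to the workspace app.
  popup.location.href = url;
}
```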
## Tests
A vitest case in `AppLink.test.tsx` asserts that `open_in="slim-window"`
anchors do not carry `target` or `rel` attributes (since slim-window
opening is handled programmatically via `onClick` / `window.open()`, not
anchor attributes).
---------
Co-authored-by: Kayla はな <kayla@tree.camp>
Adds skill discovery and tools to chatd so the agent can discover and
load `.agents/skills/` from workspaces, following the same pattern as
AGENTS.md instruction loading and MCP tool discovery.
## What changed
### `chattool/skill.go` — discovery, loading, and tools
- **DiscoverSkills** — walks `.agents/skills/` via `conn.LS()` +
`conn.ReadFile()`, parses SKILL.md frontmatter (name + description),
validates kebab-case names match directory names, silently skips
broken/missing entries.
- **FormatSkillIndex** — renders a compact `<available-skills>` XML
block for system prompt injection (~60 tokens for 3 skills). Progressive
disclosure: only names + descriptions in context, full body loaded on
demand.
- **LoadSkillBody** / **LoadSkillFile** — on-demand loading with path
traversal protection and size caps (64KB for SKILL.md, 512KB for
supporting files).
- **read_skill** / **read_skill_file** tools — `fantasy.AgentTool`
implementations following the same pattern as ReadFile and
WorkspaceMCPTool. Receive pre-discovered `[]SkillMeta` via closure to
avoid re-scanning on every call.
### `chatd.go` — integration into runChat
- Skills discovered in the `g2` errgroup parallel with instructions and
MCP tools.
- `skillsCache` (sync.Map) per chat+agent, same invalidation pattern as
MCP tools cache.
- Skill index injected via `InsertSystem` after workspace instructions.
- Re-injected in `ReloadMessages` callback so it survives compaction.
- `read_skill` + `read_skill_file` tools registered when skills are
present (for both root and subagent chats).
- Cache cleaned up in `cleanupStreamIfIdle` alongside MCP tools cache.
## Format compatibility
Uses the same `.agents/skills/<name>/SKILL.md` format as
[coder/mux](https://github.com/coder/mux) and
[openai/codex](https://github.com/openai/codex).
Removes the `font-semibold` class applied to unread chat titles in the
`/agents` sidebar. The unread indicator dot already provides sufficient
visual distinction for unread chats.
## Summary
Adds read/unread tracking for chats so users can see which agent
conversations have new assistant messages they haven't viewed.
## Backend Changes
- Adds `last_read_message_id` column to the `chats` table (migration
000439).
- Computes `has_unread` as a virtual column in `GetChatsByOwnerID` using
an `EXISTS` subquery checking for assistant messages beyond the read
cursor.
- Exposes `has_unread` on the `codersdk.Chat` struct and auto-generated
TypeScript types.
- Updates `last_read_message_id` on stream connect/disconnect in
`streamChat`, avoiding per-message API calls during active streaming.
- Uses `context.WithoutCancel` for the deferred disconnect write so the
DB update succeeds even after the client disconnects.
## Frontend Changes
- Bold title (`font-semibold`) for unread chats in the sidebar.
- Small blue dot indicator next to the relative timestamp.
- Suppresses unread indicator for the currently active chat via
`isActive` from NavLink.
## Design Decisions
- Only `assistant` messages count as unread — the user's own messages
don't trigger the indicator.
- No foreign key on `last_read_message_id` since messages can be deleted
(via rollback/truncation) and the column is just a high-water mark.
- Zero API calls during streaming: exactly 2 DB writes per stream
session (connect + disconnect).
- Unread state refreshes on chat list load and window focus. The
`watchChats` WebSocket optimistically marks non-active chats as unread
on `status_change` events, but does not carry a server-computed
`has_unread` field. Navigating to a chat optimistically clears its
unread indicator in the cache.
- Trim leading/trailing whitespace from metadata `value` and `error`
fields before storage
- Trimming happens before length validation so whitespace-padded values
are handled correctly
- Add `TrimWhitespace` test covering spaces, tabs, newlines, and
preserved inner whitespace
- No backfill needed (unlogged table, stores only latest value)
> 🤖 Created by a Coder Agent, reviewed by me.
Replace overly-broad `AsSystemRestricted` with purpose-built actors:
- **OAuth2 provider paths** → `AsSystemOAuth2` (13 call sites across
`tokens.go`, `registration.go`, `apikey.go`)
- **Provisioner daemon health read** → `AsSystemReadProvisionerDaemons`
(1 site in `healthcheck/provisioner.go`)
- **Provisionerd file cache paths** → `AsProvisionerd` (2 sites in
`provisionerdserver.go`, matching existing usage nearby)
<details>
<summary>Implementation notes</summary>
Each replacement actor is a strict subset of `AsSystemRestricted`. Every
DB method
at each call site is already covered by the narrower actor's
permissions:
- `subjectSystemOAuth2`: OAuth2App/Secret/CodeToken (all), ApiKey (Read,
Delete), User (Read), Organization (Read)
- `subjectSystemReadProvisionerDaemons`: ProvisionerDaemon (Read)
- `subjectProvisionerd`: File (Create, Read) plus provisionerd-scoped
resources
No new permissions added. `nolint:gocritic` comments updated to reflect
the new actors.
</details>
> 🤖 Created by a Coder Agent, reviewed by me.
`onChatReady` now waits for the store to have all fetched messages (not
just query success), so the DOM has content when the parent frame acts
on the signal. Removed the `scroll-to-bottom` round-trip from
`onChatReady`; the embed page no longer needs the parent to echo a
scroll command back.
Also moved `autoScrollRef` check before the `widthChanged` guard in
`ScrollAnchoredContainer`'s ResizeObserver. The width guard is only
relevant for scroll compensation (user scrolled up); it should never
prevent pinning to bottom when auto-scroll is active. Also tightened the
store sync guard to `storeMessageCount < fetchedMessageCount`.
Addresses chat page rendering performance. Profiling with React Profiler
showed `AgentChat` actual render times of 20–31ms (exceeding the
16ms/60fps budget), with `StickyUserMessage` as the #1 component
bottleneck at 35.7% of self time.
## Changes
**Hoist `createComponents` to module scope** (`response.tsx`):
Previously every `<Response>` instance called `createComponents()` per
render, creating a fresh components map that forced Streamdown to
discard its cached render tree. Now both light/dark variants are
precomputed once at module scope.
**Wrap `StickyUserMessage` in `memo()`** (`ConversationTimeline.tsx`):
Profile-confirmed #1 bottleneck. Each instance carries
IntersectionObserver + ResizeObserver + scroll handlers; skipping
re-render avoids all that setup.
**Wrap `ConversationTimeline` in `memo()`** (`ConversationTimeline.tsx`):
Prevents cascade re-renders from the parent when props haven't changed.
**Remove duplicate `buildSubagentTitles`** (`ConversationTimeline.tsx` →
`AgentDetailContent.tsx`): Was computed in both `AgentDetailTimeline`
and `ConversationTimeline`. Now computed once and passed as a prop.
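For illustration, the `memo()` wrapping looks like this (simplified
body and props; the real component also wires the observers that a
skipped re-render avoids):

```tsx
import { memo } from "react";

type StickyUserMessageProps = {
  id: string;
  text: string; // simplified: the real props carry the full message
};

const StickyUserMessage = memo(function StickyUserMessage({
  id,
  text,
}: StickyUserMessageProps) {
  return <div data-message-id={id}>{text}</div>;
});
```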
<details>
<summary>Profiling data & analysis</summary>
### Profiler Metrics
| Metric | Value |
|--------|-------|
| INP (Interaction to Next Paint) | 82ms |
| Processing duration (event handlers) | 52ms |
| AgentChat actual render | 20–31ms (budget: 16ms) |
| AgentChat base render (no memo) | ~100ms |
### Top Bottleneck Components (self-time %)
| Component | Self Time | % |
|-----------|-----------|---|
| StickyUserMessage | 11.0ms | 35.7% |
| ForwardRef (radix-ui) | 7.4ms | 24.0% |
| Presence (radix-ui) | 2.0ms | 6.5% |
| AgentChatInput | 1.4ms | 4.5% |
### Decision log
- Chose module-scope precomputation over `useMemo` for
`createComponents` because there are only two possible theme variants
and they're static.
- Did not add virtualization — sticky user messages + scroll anchoring
make it complex. The memoization fixes should be measured first.
- Did not wrap `BlockList` in `memo()` — the React Compiler (enabled
for `pages/AgentsPage/`) already auto-memoizes JSX elements inside it.
- Phase 2 (verify React Compiler effectiveness on
`parseMessagesWithMergedTools`) and Phase 3 (radix-ui Tooltip
lazy-mounting) deferred to follow-up PRs.
</details>
Add a per-MCP-server `model_intent` toggle that wraps tool schemas
with a `model_intent` field, requiring the LLM to provide a
human-readable description of each tool call's purpose. The intent
string is shown as a status label in the UI instead of opaque tool
names, and is transparently stripped before the call reaches the
remote MCP server.
Built-in tools have rich specialized renderers (terminal blocks, file
diffs, etc.) and don't need this. MCP tools hit `GenericToolRenderer`
which only shows raw tool names and JSON — that's where model_intent
adds value.
The model learns what to provide via the JSON Schema `description` on
the `model_intent` property itself — no system prompt changes needed.
<details>
<summary>Implementation details</summary>
### Architecture
Inspired by the `withModelIntent()` pattern from `coder/blink`, adapted
for
Go + React. The wrapping is entirely in the `mcpclient` layer — tool
implementations never see `model_intent`.
**Schema wrapping** (`mcpToolWrapper.Info()`): When enabled, wraps the
original tool parameters under a `properties` key and adds a
`model_intent` string field with a rich description that teaches the
model inline.
**Input unwrapping** (`mcpToolWrapper.Run()`): Strips `model_intent` and
unwraps `properties` before forwarding to the remote MCP server. Handles
three input shapes models may produce:
1. `{ model_intent, properties: {...} }` — correct format
2. `{ model_intent, key: val, ... }` — flat, no wrapper
3. Malformed — falls through gracefully
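The unwrapping itself is Go code in `mcpclient`; a TypeScript sketch of
the same shape handling, with hypothetical names:

```ts
function unwrapModelIntent(input: Record<string, unknown>): {
  intent: string | undefined;
  args: Record<string, unknown>;
} {
  const { model_intent, properties, ...rest } = input;
  const intent = typeof model_intent === "string" ? model_intent : undefined;
  // Shape 1: arguments correctly nested under `properties`.
  if (properties !== null && typeof properties === "object") {
    return { intent, args: properties as Record<string, unknown> };
  }
  // Shapes 2 and 3: flat or malformed input is forwarded as-is.
  return { intent, args: rest };
}
```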
**Frontend extraction**: `streamState.ts` extracts `model_intent` from
incrementally parsed streaming JSON. `messageParsing.ts` extracts it
from persisted tool call args.
**UI rendering**: `GenericToolRenderer` shows the capitalized intent
string as the primary label when available, falling back to the raw
tool name.
### Changes
- Database: `model_intent` boolean column on `mcp_server_configs`
- SDK: `ModelIntent` field on config/create/update types
- API: pass-through in create/update handlers + converter
- mcpclient: schema wrapping in `Info()`, input unwrapping in `Run()`
- Frontend: extraction from streaming + persisted args
- UI: intent label in `GenericToolRenderer`, toggle in admin panel
- Tests: 6 new tests (schema wrapping, unwrapping, passthrough,
fallback)
### Decision log
- **Option lives on MCPServerConfig, not model config**: Built-in tools
already have rich renderers; only MCP tools benefit from model_intent.
- **No system prompt changes**: The JSON Schema `description` on the
`model_intent` property teaches the model inline.
- **Pointer bool on update request**: Follows existing pattern (`*bool`)
so PATCH requests don't reset the value when omitted.
</details>
Fixes expired MCP OAuth2 tokens causing 401 errors and stale
`auth_connected` status in the UI.
When users authenticate MCP servers (e.g. GitHub) via OAuth2, the access
token and refresh token are stored in the database. However, when the
access token expired, nothing refreshed it anywhere:
- **chatd**: sent the expired token as-is, getting a 401 and skipping
the MCP server
- **list/get endpoints**: reported `auth_connected: true` just because a
token record existed, regardless of expiry
## Changes
### Shared utility: `mcpclient.RefreshOAuth2Token`
Pure function that uses `golang.org/x/oauth2` `TokenSource` to check if
a token is expired (or within 10s of expiry) and refresh it. No DB
dependency — callers handle persistence.
### chatd (`coderd/x/chatd/chatd.go`)
Before calling `mcpclient.ConnectAll`, refreshes expired tokens.
Persists new credentials to the database. Falls back to the old token if
refresh fails.
### List/get MCP server endpoints (`coderd/mcp.go`)
Both `listMCPServerConfigs` and `getMCPServerConfig` now attempt refresh
when checking `auth_connected`. If the token is expired:
- **Has refresh token**: attempt refresh, persist result, report
`auth_connected` based on success
- **No refresh token**: report `auth_connected: false` if expired
This means the UI accurately reflects whether the user's token is
actually usable, rather than just whether a record exists.
<details>
<summary>Design notes</summary>
- `RefreshOAuth2Token` lives in `mcpclient` to avoid circular imports
(`coderd` → `chatd` → `mcpclient` is fine; `chatd` → `coderd` would be
circular).
- DB persistence is handled by each caller with their own authz context
(`AsSystemRestricted` in both cases).
- The `buildAuthHeaders` warning in mcpclient about expired tokens is
kept as defense-in-depth logging.
</details>
Adds the Tunneler state machine and logic for handling build updates.
This is a partial implementation and tests. Further PRs will fill out
the other event types.
Relates to GRU-18
## Summary
Fixes two issues with pinned chat drag-to-reorder in the agents sidebar:
1. **Missing optimistic reorder**: After dragging a pinned chat to a new
position, the sidebar flashed back to the old order while waiting for
the server response, then snapped to the new order. Now the react-query
cache is updated optimistically in `onMutate` so the reorder is visually
instant.
2. **Post-drag navigation on upward drag**: Dragging a pinned chat
**upward** caused unintended navigation to that chat on drop. The
browser synthesizes a click from the final `pointerup`, and the previous
suppression listener was attached too low in the DOM tree to reliably
intercept it after items reorder.
## Changes
### Optimistic cache reorder (`site/src/api/queries/chats.ts`)
`reorderPinnedChat.onMutate` now extracts all pinned chats from the
cache, reorders them to match the new position, and writes the updated
`pin_order` values back via `updateInfiniteChatsCache`. The server
response still reconciles on `onSettled`.
### Document-level click suppression (`AgentsSidebar.tsx`)
- Moved click suppression from the pinned container div to
`document.addEventListener('click', handler, true)`. This fires before
any element-level handlers regardless of DOM reordering during drag.
Scoped via `pinnedContainerRef.contains(target)` so it only affects
pinned rows.
- Replaced `recentDragRef` (boolean cleared by `setTimeout(100ms)`) with
`lastDragEndedAtRef` (`performance.now()` timestamp checked against a
300ms window). This eliminates the race where the timeout fires before
the synthetic click arrives.
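A sketch of the timestamp-window suppression (plain objects stand in
for React refs; the 300ms window is the value from this PR):

```ts
const lastDragEndedAtRef = { current: 0 };
const pinnedContainerRef = { current: null as HTMLElement | null };

// Called from the drag library's drag-end callback.
function onDragEnd(): void {
  lastDragEndedAtRef.current = performance.now();
}

// Capture-phase listener runs before element-level handlers, no
// matter how the DOM was reordered mid-drag.
document.addEventListener(
  "click",
  (event) => {
    const recent = performance.now() - lastDragEndedAtRef.current < 300;
    const inPinned = pinnedContainerRef.current?.contains(
      event.target as Node,
    );
    if (recent && inPinned) {
      event.preventDefault();
      event.stopPropagation();
    }
  },
  true, // capture
);
```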
---
PR generated with Coder Agents
Add a keyboard shortcut (ArrowUp on empty input) to start editing the
most recent user message, mirroring Mux's behavior. The shortcut reuses
the existing history-edit flow triggered by the pencil button.
Extract a shared `getEditableUserMessagePayload` helper so the pencil
button and the new shortcut both derive the edit payload identically.
Derive the last editable user message during render in
`AgentDetailInput` from the existing store selectors, keeping the
implementation Effect-free and React Compiler friendly.
## Problem
Chats with a persisted `agent_id` binding hang indefinitely when the
workspace is stopped. The stale agent row still exists in the DB, so
`ensureWorkspaceAgent` succeeds, but the dial blocks forever in
`AwaitReachable`. The MCP discovery goroutine used an unbounded context,
so `g2.Wait()` never returned and the LLM never started.
## Fix
Three targeted changes restore the pre-binding behavior where stopped
workspaces degrade gracefully instead of blocking:
1. **`dialWithLazyValidation`**: "no agents in latest build" is now a
terminal fast-fail — the hanging dial is canceled and
`errChatHasNoWorkspaceAgent` returned immediately, instead of falling
through to `waitForOriginalDial`.
2. **Pre-LLM workspace setup**: MCP discovery and instruction
persistence gate on `workspaceAgentIDForConn` before attempting any
dial. MCP discovery is bounded by a 5s timeout and checks the in-memory
tool cache first (using the cheap cached agent from
`ensureWorkspaceAgent`), so the common subsequent-turn path has zero DB
queries.
3. **`persistInstructionFiles`**: tracks whether the workspace
connection succeeded and skips sentinel persistence on failure, so the
next turn retries if the workspace is restarted.
## Scenarios
**Running workspace, subsequent turn (hot path):** MCP cache hit via
in-memory cached agent. Zero DB queries, zero dials. Unchanged from
#23274.
**Stopped workspace, persisted binding (the bug):** MCP cache hit (stale
descriptors, fine — they fail at invocation). Pre-LLM setup completes
instantly. Tool invocation enters `dialWithLazyValidation`, dial fails
or hangs, validation discovers no agents, returns
`errChatHasNoWorkspaceAgent`. Model sees the error and can call
`start_workspace`.
**New chat, running workspace:** `ensureWorkspaceAgent` resolves via
latest-build, persists binding. MCP discovery dials and caches tools.
**New chat, stopped workspace:** `ensureWorkspaceAgent` finds no agents,
returns `errChatHasNoWorkspaceAgent`. Pre-LLM setup skips. LLM starts
with built-in tools only.
**Rebuilt workspace (agent switched):** MCP cache hit with stale agent
(harmless for one turn). Tool invocation dials stale agent, fails fast,
`dialWithLazyValidation` switches to new agent, persists updated
binding.
**Workspace restarted after stop:** No sentinel was persisted during the
stopped turn, so instruction persistence retries. Agent binding switches
to the new agent via `workspaceAgentIDForConn`.
**Transient DB error during validation:** Not
`errChatHasNoWorkspaceAgent`, so `dialWithLazyValidation` falls through
to `waitForOriginalDial` (cannot prove stale). No false positive.
**Tool invocation on stopped workspace:** `getWorkspaceConn` calls
`ensureWorkspaceAgent` (returns stale row), then
`dialWithLazyValidation` validation discovers no agents, returns
`errChatHasNoWorkspaceAgent`, cached state cleared, error returned to
model.
## Problem
The agent chat delayed-startup banner ("Response startup is taking
longer than expected") could appear even though the model was already
streaming.
The root cause is in `Subscribe()`: `message_part` events were delivered
via the fast local in-process stream, while `status` events were
delivered via PostgreSQL pubsub. Both feed into the same `select`
statement, and Go's `select` picks whichever channel is ready first —
there is no ordering guarantee between channels. So a `message_part`
could outrun the `status=running` that logically precedes it.
The frontend saw content arrive while it still thought the chat was
pending, triggering the banner.
## Fix
Also forward `status` events from the local channel, alongside
`message_part`.
Both event types already travel through the same FIFO subscriber
channel: `publishStatus()` is called before the first `message_part`, so
channel ordering guarantees the frontend sees `status=running` before
any content.
Pubsub still delivers a duplicate `status` event later; the frontend
deduplicates it (`setChatStatus` is idempotent — it early-returns when
the status hasn't changed).
## Summary
Add site-wide banners for AI Governance seat usage thresholds:
1. **90% capacity warning (admin-only):** When actual AI Governance
seats are ≥90% and <100% of the license limit, admins see:
> "You have used 90% of your AI governance add-on seats."
2. **Over-limit banner (admin-only):** When actual seats exceed the
license limit, admins see a prominent warning:
> "Your organization is using {actual} / {limit} AI Governance user
seats ({X}% over the limit). Contact sales@coder.com"
- Uses a floored whole-number percentage (Go integer division /
`Math.floor`)
- Includes a clickable `mailto:sales@coder.com` link
## Summary
Adds a "Generate new title" action that lets users manually regenerate a
chat's title using richer conversation context than the automatic
first-message title path.
## Changes
### Backend
- **New endpoint:** `POST
/api/experimental/chats/{chatID}/title/regenerate` returns the updated
Chat with a regenerated title
- **Manual title algorithm:** Extracts useful user/assistant text turns
→ selects first user turn + last 3 turns → builds context with gap
markers → renders prompt with anti-recency guidance → calls lightweight
model → normalizes output
- **Helpers:** `extractManualTitleTurns`,
`selectManualTitleTurnIndexes`, `buildManualTitleContext`,
`renderManualTitlePrompt`, `generateManualTitle` — all private, with the
public `Server.RegenerateChatTitle` method
- **SDK:** `ExperimentalClient.RegenerateChatTitle(ctx, chatID) (Chat,
error)`
- Persists title via existing `UpdateChatByID` and broadcasts
`ChatEventKindTitleChange`
### Frontend
- API client method + React Query mutation with cache invalidation
- "Generate new title" menu item (with wand icon) in both TopBar and
Sidebar dropdown menus
- Loading/disabled state while regeneration is in-flight
- Error toast on failure
- Stories updated for both menus
### Tests
- `quickgen_test.go`: Table-driven tests for all 4 helper functions
(turn extraction, index selection, context building, prompt rendering)
- `exp_chats_test.go`: Handler tests (ChatNotFound,
NotFoundForDifferentUser, NoDaemon)
## Design notes
- The existing auto-title path (`maybeGenerateChatTitle`, `titleInput`)
is completely unchanged
- Manual regeneration uses richer context (first user turn + last 3
turns + gap markers) vs the auto path's single first message
- Endpoint is experimental and marked with `@x-apidocgen {"skip": true}`
Converts the Agents page transcript from reverse-scroll
(`flex-col-reverse`) to normal top-to-bottom document flow with
explicit auto-scroll pinning.
The previous `flex-col-reverse` approach (from #23451) required
browser-specific workarounds for Chrome vs Firefox `scrollTop` sign
conventions, and compensation logic that could fight user scroll
intent during streaming. This replaces it with standard scroll math
(`scrollHeight - scrollTop - clientHeight`) while keeping the proven
`ScrollAnchoredContainer` observer machinery.
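The standard bottom-distance check, for reference (the threshold value
is illustrative):

```ts
function isNearBottom(el: HTMLElement, thresholdPx = 24): boolean {
  return el.scrollHeight - el.scrollTop - el.clientHeight <= thresholdPx;
}
```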
Changes:
- `AgentDetailView.tsx`: layout flip to `flex-col`, standard bottom
detection, explicit prepend restoration via `pendingPrependRef`
snapshot, sentinel moved inside content wrapper, initial mount bottom
pin, `scrollToBottomRef` imperative API for send/edit, user-interrupt
guard with `isNearBottom` check
- `AgentDetailView.stories.tsx`: all scroll stories updated for
normal-flow semantics, new `ScrollAnchorPreservedOnOlderHistoryLoad`
story with deterministic `IntersectionObserver` mock
- `AgentsSkeletons.tsx` + `AgentDetailLoadingView`: removed
`flex-col-reverse` so loading state matches the real transcript layout
- `AgentDetail.tsx`: replaced `scrollTop = 0` on send/edit with
imperative `scrollToBottomRef` call
Supersedes #23576 (which was reverted in #23638). Same behavioral goals
but keeps scroll logic local to `ScrollAnchoredContainer` rather than
extracting a separate hook.
Adds preview cards for the `spawn_computer_use_agent` tool. Spawning
that agent now renders a rich preview card saying "Spawning computer
use sub-agent".
The "waiting for" tool now displays an inline desktop preview which can
be clicked to reveal the desktop in the sidebar.
https://github.com/user-attachments/assets/e486ca0e-a569-4142-bb12-db3b707967b8
https://github.com/user-attachments/assets/bd5d12a1-61b3-4b7d-83b6-317bdfb60b3c
## Summary
Adds pinned chats to the agents page sidebar with server-side
persistence and drag-to-reorder. Users can pin/unpin chats via the
context menu, and pinned chats appear in a dedicated "Pinned" section
above the time-grouped list.
## Database
Migration `000453_chat_pin_order`: adds `pin_order integer DEFAULT 0 NOT
NULL` column on `chats` (0 = unpinned, 1+ = pinned in display order).
Three SQL queries handle pin operations server-side using CTEs with
`ROW_NUMBER()`:
- `PinChatByID`: normalizes existing orders and appends to end
- `UnpinChatByID`: sets target to 0 and compacts remaining pins
- `UpdateChatPinOrder`: shifts neighbors, clamps to `[1, pinned_count]`
All queries exclude archived chats. `ArchiveChatByID` clears `pin_order`
on archive. The handler rejects pinning archived chats with 400.
## Backend
Pin/unpin/reorder go through the existing `PATCH
/api/experimental/chats/{chat}` via the `pin_order` field on
`UpdateChatRequest`. The handler routes based on current pin state:
`pin_order == 0` unpins, `> 0` on an already-pinned chat reorders, `> 0`
on an unpinned chat appends to end.
## Frontend
- `pinChat` / `unpinChat` / `reorderPinnedChat` optimistic mutations
using shared `isChatListQuery` predicate
- Sidebar renders Pinned section above time groups, excludes pinned
chats from time groups
- Pin/Unpin context menu items (hidden for child/delegated chats)
- `@dnd-kit/core` + `@dnd-kit/sortable` for drag-to-reorder with
`MouseSensor`, `TouchSensor`, and `KeyboardSensor`
- Local pin-order override prevents flash on drop; click blocker
prevents NavLink navigation after drag
---
*PR generated with Coder Agents*
The hook lived at src/hooks/, outside the React Compiler scope.
It returned a new object literal every render, causing three
handler guards and two downstream JSX guards in AgentChatInput
(233 cache slots) to always miss.
Move the hook to src/pages/AgentsPage/hooks/ where the compiler
processes it. The compiler auto-memoizes the return object, so
manual useMemo is unnecessary.
Also replace ctorRef.current render-time access with a useState
lazy initializer. The ref access caused a CompileError that
would have prevented compilation. Browser API availability is
constant, so useState captures it once.
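A hedged sketch of the lazy-initializer pattern; the concrete browser
API checked here is an assumption, not the hook's actual check:

```ts
import { useState } from "react";

// useState's lazy initializer runs once on mount, so the browser API
// check is captured a single time without any render-time ref reads
// (which the React Compiler flags as a CompileError).
function useHasClipboardApi(): boolean {
  const [supported] = useState(
    () => typeof navigator !== "undefined" && "clipboard" in navigator,
  );
  return supported;
}
```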
Coder's chat (chatd) can now discover and use MCP servers configured in
a workspace's `.mcp.json` file. This brings project-specific tooling
(GitHub, databases, docs servers, etc.) into the chat without any manual
configuration.
## How it works
The workspace agent reads `.mcp.json` from the workspace directory (same
format Claude Code uses), connects to the declared MCP servers —
spawning child processes for stdio servers and connecting over the
network for HTTP/SSE — and caches their tool lists. Two new agent HTTP
endpoints expose this:
- `GET /api/v0/mcp/tools` returns the cached tool list (supports
`?refresh=true`)
- `POST /api/v0/mcp/call-tool` proxies calls to the correct server
On each chat turn, chatd calls `ListMCPTools` through the existing
`AgentConn` tailnet connection, wraps each tool as a
`fantasy.AgentTool`, and adds them to the LLM's tool set alongside
built-in and admin-configured MCP tools. Tool names are prefixed with
the server name (`github__create_issue`) to avoid collisions.
Failed server connections are logged and skipped — they never block the
agent or break the chat. Child stdio processes are terminated on agent
shutdown.
The React Compiler failed to memoize the messages derivation
chain because a useDashboard() hook call sat between the
messages computation and its consumer (getLatestContextUsage).
An IIFE around the context usage logic also fragmented the
dependency chain.
Replacing the IIFE with a ternary and reordering the non-hook
computation before the hook call lets the compiler group
messages + getLatestContextUsage into a single cache guard
keyed on messagesByID and orderedMessageIDs.
Two test fixtures (devcontainer-resources, multiple-agents-multiple-envs)
were generated before terraform-provider-coder v2.15.0 added the
merge_strategy attribute to coder_env. Running generate.sh with the
current provider adds merge_strategy: "replace" (the default) to all
coder_env resources, causing unstable diffs on every regeneration.
The React Compiler guarded buildStreamTools on the whole
streamState ref, which changes on every text chunk. Refactoring
the function to accept toolCalls and toolResults directly lets
the compiler guard on those sub-fields, which are stable during
text-only streaming.
Before: $[0] !== streamState (misses every text chunk)
After: $[0] !== toolCalls || $[1] !== toolResults (passes when
only blocks change)
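Sketched in TypeScript with assumed types, the refactor narrows the
function's inputs so the compiler can guard on the stable sub-fields:

```ts
type ToolCall = { id: string; name: string };
type ToolResult = { id: string; output: unknown };

// Before: buildStreamTools(streamState) — the guard keyed on the whole
// object, which gets a new reference on every text chunk.
// After: the guard keys on toolCalls/toolResults, which keep their
// identity during text-only streaming.
function buildStreamTools(
  toolCalls: readonly ToolCall[],
  toolResults: readonly ToolResult[],
) {
  return toolCalls.map((call) => ({
    call,
    result: toolResults.find((r) => r.id === call.id) ?? null,
  }));
}
```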
Verified: 181 functions compile, 0 diagnostics. Reference
stability tests confirm toolCalls/toolResults retain identity
across text-part updates and change when tool data updates.
*Problem:* `publishChatPubsubEvent` was constructing a partial
`codersdk.Chat` that omitted `LastModelConfigID` and other fields. Go's
zero-value UUID caused the sidebar to show "Default model" for chats
received via SSE.
*Solution:*
- Extracted `convertChat`/`convertChats` from `exp_chats.go` into
`db2sdk.Chat`/`db2sdk.Chats`, alongside existing `ChatMessage`,
`ChatQueuedMessage`, and `ChatDiffStatus` converters.
`publishChatPubsubEvent` now calls `db2sdk.Chat(chat, nil)` instead of
maintaining its own copy of the conversion logic
- Added backend integration test
`TestWatchChats/CreatedEventIncludesAllChatFields`
- Added frontend regression tests for nil-UUID and valid model config ID
cases
> 🤖 Created by Coder Agents, reviewed by this human.
During streaming, StreamingOutput's compiler cache guard misses
every chunk because streamState.blocks and streamTools are new
references. This causes renderBlockList to recreate all child JSX
elements, and React calls every child function even for blocks
that finished streaming.
Wrapping SmoothedResponse and ReasoningDisclosure in React.memo
lets React skip the function call entirely when props are stable.
For N completed response blocks and M completed thinking blocks,
this reduces per-chunk function calls from N+M+1 to 1. The
compiler still compiles both inner functions cleanly (6 and 12
cache slots respectively, zero diagnostics).
If the desktop viewer component was hidden, for example after collapsing
the sidebar, the next time it was shown the viewer would be blank. This
PR fixes that.
Adds an `enabled` toggle to the chat model admin create/edit form so
admins can disable a model without soft-deleting it. Disabled models
stay visible in admin settings but stop appearing in user-facing model
selectors.
The backend already supported this (`chat_model_configs.enabled`
column, filtered queries, and SDK fields). This change wires it into
the admin UI and adds coverage on both sides.
**Backend:** three new subtests in `coderd/exp_chats_test.go` verifying
the visibility contract (admin sees disabled models, non-admin doesn't,
update-to-disabled preserves the record).
**Frontend:** `enabled` field added to form logic and seeded from the
existing model (defaults to `true` for new models). A Switch+Tooltip
control renders in the form header, matching the MCP Server panel
pattern.
Two interaction stories cover the create-disabled and toggle-existing
flows.
Add theme synchronization, navigation blocking, scroll-to-bottom
handling, and chat-ready signaling to the agent embed page. The parent
frame can now set light/dark theme via postMessage or query param, and
ThemeProvider skips its own class manipulation when the embed marker is
present. Navigation attempts that leave the embed route are intercepted
and forwarded to the parent frame. The scroll container ref is lifted
to the layout so the parent can request scroll-to-bottom.
## Problem
When the user sends a message while the agent is actively streaming a
response, `handleSend` called `store.clearStreamState()`
**unconditionally before** the POST request. If the server queues the
message (`response.queued = true` because the agent is busy), the
in-progress stream output is immediately wiped from the UI. The full
text only reappears once the agent finishes and the durable message
arrives via WebSocket — causing a visible cutoff mid-stream.
## Fix
Move `clearStreamState()` from before the POST to **after** the
response, gated behind `!response.queued`:
- **Queued sends** (`response.queued === true`): `clearStreamState()` is
never called. The stream continues uninterrupted. The WebSocket `status`
handler already clears stream state when the chat transitions to
`"pending"` / `"waiting"` after the queued message is dequeued.
- **Non-queued sends** (`response.queued === false`):
`clearStreamState()` + `upsertDurableMessage()` fire immediately after
the POST, same net behavior as before.
- **Edit and promote paths**: Unchanged — those are intentional
interruptions where eager clearing is correct.
### Additional behavior changes (both improvements)
1. **Failed sends no longer wipe stream state.** Previously
`clearStreamState()` ran before the `try` block, so a network error
still wiped the agent's in-progress output. Now the `catch` re-throws
before reaching `clearStreamState()`, preserving the stream on failure.
2. **`clearStreamState()` fires for all non-queued responses**, not just
those with a `message` body. The original guard was `!response.queued &&
response.message`; now `clearStreamState()` is under `!response.queued`
while `upsertDurableMessage` retains the `response.message` check. The
server always sets `message` for non-queued responses, so this is a
no-op in practice but is semantically correct.
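A sketch of the reordered flow, with hypothetical store and API names
matching the description:

```ts
type SendResponse = {
  queued: boolean;
  message?: { id: string; text: string };
};

declare function createChatMessage(text: string): Promise<SendResponse>;
declare const store: {
  clearStreamState(): void;
  upsertDurableMessage(msg: { id: string; text: string }): void;
};

async function handleSend(text: string): Promise<void> {
  // A network error throws here, before clearStreamState(), so failed
  // sends no longer wipe the agent's in-progress output.
  const response = await createChatMessage(text);
  if (!response.queued) {
    store.clearStreamState();
    if (response.message) {
      store.upsertDurableMessage(response.message);
    }
  }
  // Queued sends (agent busy) leave the live stream untouched; the
  // WebSocket status handler clears it on the later state transition.
}
```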
## Testing
**AgentDetail.stories.tsx**: New `StreamingSurvivesQueuedSend` story
exercises the full flow — mocks `createChatMessage` to return `{ queued:
true }`, delivers streaming text via WebSocket, sends a message through
the UI, and asserts the streaming text remains visible.
Array.from(graphemeSegmenter.segment(text)) materializes the
entire text into an array before iterating, even though the loop
breaks early at the visible prefix length. During streaming at
60fps, this makes each frame O(full text) instead of O(prefix).
Benchmark on 5000-char text with 200-char prefix: 22.6x faster
(1.44ms to 0.06ms per call, saving 8.3% of the frame budget).
The fallback codepoint path had the same issue with Array.from.
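The fix, sketched (`visiblePrefix` is an illustrative name): iterate
the segmenter's lazy iterable directly and break at the prefix
boundary.

```ts
const graphemeSegmenter = new Intl.Segmenter(undefined, {
  granularity: "grapheme",
});

// O(prefix) per call: Intl.Segmenter.segment() returns a lazy
// iterable, so breaking early never touches the rest of the text.
function visiblePrefix(text: string, maxGraphemes: number): string {
  let count = 0;
  let end = 0;
  for (const part of graphemeSegmenter.segment(text)) {
    if (count >= maxGraphemes) {
      break;
    }
    end = part.index + part.segment.length;
    count++;
  }
  return text.slice(0, end);
}
```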
_Generated by mux but reviewed by a human_
Several stories computed dates relative to `dayjs()` / `new Date()` at
render time, causing snapshot text to shift daily. I ran into this on my
PRs.
This adds an optional `now` prop to `DateRangePicker`,
`TemplateInsightsControls`, and `CreateTokenForm` so stories can inject
a deterministic clock without global mocking. License stories replace
the misleadingly-named `FIXED_NOW = dayjs().startOf("day")` with
absolute timestamps. All fixed timestamps use noon UTC to avoid timezone
boundary issues.
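The `now` prop pattern, sketched (component and props simplified):

```tsx
type DateRangePickerProps = {
  // Injectable clock for deterministic stories; defaults to the real
  // clock so production callers are unchanged.
  now?: Date;
};

function DateRangePicker({ now = new Date() }: DateRangePickerProps) {
  return <time dateTime={now.toISOString()}>{now.toUTCString()}</time>;
}

// In a story, a fixed noon-UTC timestamp avoids timezone-boundary
// flakiness in snapshots:
// <DateRangePicker now={new Date("2025-06-15T12:00:00Z")} />
```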
Affected stories:
- `AgentSettingsPageView`: Usage Date Filter, Usage Date Filter Refetch
Overlay
- `LicenseCard`: Expired/future AI Governance variants, Not Yet Valid
- `LicensesSettingsPage`: Shows Addon Ui For Future License Before Nbf
- `TemplateInsightsControls`: Day
- `CreateTokenPage`: Default
> **PR Stack**
> 1. **#23351** ← `#23282` *(you are here)*
> 2. #23282 ← `#23275`
> 3. #23275 ← `#23349`
> 4. #23349 ← `main`
---
## Summary
`chatretry.Retry()` used pure exponential backoff (1 s, 2 s, 4 s, …) and
never consulted provider `Retry-After` headers. Fantasy's
`ProviderError` carries `ResponseHeaders` including `Retry-After`, but
`chaterror.Classify()` only parsed error text and silently dropped the
structured transport metadata.
This makes `Retry-After` a first-class signal in the classification →
retry pipeline.
<img width="853" height="346" alt="image"
src="https://github.com/user-attachments/assets/65f012b6-8173-43d2-957e-ab9faddea525"
/>
## Changes
### `coderd/chatd/chaterror/classify.go`
- Added `RetryAfter time.Duration` field to `ClassifiedError` — a
normalized minimum retry delay derived from provider response metadata.
- `Classify()` now calls `extractProviderErrorDetails()` before falling
back to text heuristics. Structured `ProviderError.StatusCode` takes
priority over regex extraction.
- `normalizeClassification()` preserves and clamps `RetryAfter`.
### `coderd/chatd/chaterror/provider_error.go` (new)
Provider-specific extraction, isolated from the text-based
classification logic:
- `extractProviderErrorDetails()` unwraps `*fantasy.ProviderError` from
the error chain via `errors.As`.
- `retryAfterFromHeaders()` parses headers in priority order:
1. `retry-after-ms` (OpenAI-specific, millisecond precision)
2. `retry-after` (standard HTTP — integer seconds or HTTP-date)
- Case-insensitive header key lookup.
### `coderd/chatd/chatretry/chatretry.go`
- `effectiveDelay(attempt, classified)` computes `max(Delay(attempt),
classified.RetryAfter)` — the provider hint acts as a floor without
weakening the local exponential backoff.
- `Retry()` now uses `effectiveDelay` and passes the effective delay to
both `onRetry(...)` and the sleep timer, so downstream payloads, logs,
and the frontend countdown stay aligned automatically.
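The actual change is in Go; the delay computation reduces to the
following, sketched in TypeScript (attempt numbering is an assumption):

```ts
// Exponential base delay: 1s, 2s, 4s, ... (attempt is 0-based here).
function baseDelayMs(attempt: number): number {
  return 1000 * 2 ** attempt;
}

// The provider's Retry-After hint acts as a floor: never retry earlier
// than the provider asks, never weaken the local backoff curve.
function effectiveDelayMs(attempt: number, retryAfterMs: number): number {
  return Math.max(baseDelayMs(attempt), retryAfterMs);
}
```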
### Tests
- `classify_test.go`: Structured provider status + `Retry-After`
extraction, `retry-after-ms` priority, HTTP-date parsing, invalid header
fallback, `WithProvider` preservation.
- `chatretry_test.go`: Retry-after-as-floor semantics — longer hint
wins, shorter hint keeps base delay.
## Design notes
- **No SDK/API/frontend changes needed.** `codersdk.ChatStreamRetry`
already carries `DelayMs` and `RetryingAt`, and the frontend already
consumes them. The fix is purely in the server-side delay computation.
- **Existing retryability rules unchanged.** This fixes *when* we sleep,
not *whether* an error is retryable.
- **Provider hint is a floor:** `max(baseDelay, RetryAfter)` ensures we
never retry earlier than the provider asks, and never weaken our own
backoff curve.
## Changes
- **Commit 1**: Remove 17 unnecessary `//nolint` directives:
- `//nolint:varnamelen` — linter not active
- `//nolint:unused` on exported `SlimUnsupported`
- `//nolint:govet` in `coderd/httpmw/csrf` — no longer fires
- `//nolint:revive` on functions refactored since the nolint was added
- `//nolint:paralleltest` citing Go 1.22 loop variable capture
(obsolete)
- Bare `//nolint` narrowed to specific `//nolint:gocritic` with
justification
- **Commit 2**: Fix root causes behind 5 dangerous nolint suppressions:
- Add `MinVersion: tls.VersionTLS12` to TLS client config (removes
`gosec` G402)
- Delete trivial unexported wrappers `apiKey()`/`normalizeProvider()` in
chatprovider (removes `revive` confusing-naming)
- Add doc comments to `StartWithAssert` and `Router` (removes `revive`
exported)
- Rename unused parameters to `_` in integration test helpers
> 🤖 This PR was created using Coder Agents and reviewed by me.
- Suppress informational `log.Printf` messages from the metrics scanner
when stdout is not a TTY (i.e. piped via `atomic_write` in `make gen` or
CI)
- Genuine warnings (`warnf`) still print unconditionally so real
problems remain visible
- `log.Fatalf` for fatal errors is unchanged
> 🤖 Created by Coder Agents and reviewed by a human
Admins can now control whether the built-in Coder Agents default system
prompt is prepended to their custom instructions, rather than having the
custom prompt silently replace the default.
**Changes:**
- New `include_default_system_prompt` boolean toggle (defaults to `true`
for existing deployments) stored as a site config key — no migration
needed.
- GET `/api/experimental/chats/config/system-prompt` returns the toggle
state, the custom prompt, and a preview of the built-in default.
- PUT persists both the toggle and custom prompt atomically in a single
transaction.
- `resolvedChatSystemPrompt()` composes `[default?, custom?]` joined by
`\n\n`, falling back to the built-in default on DB errors (see the
sketch after this list).
- Settings UI adds a Switch toggle with conditional helper text and a
"Preview" button that shows the built-in default prompt via the existing
`TextPreviewDialog`.
- Comprehensive test coverage: 15 subtests covering toggle behavior,
prompt composition matrix, auth boundaries, and integration with chat
creation.
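A TypeScript sketch of that composition rule, for illustration (the
implementation is Go; names and the DB-error path are simplified):

```ts
function resolveSystemPrompt(
  includeDefault: boolean,
  customPrompt: string,
  builtInDefault: string,
): string {
  // [default?, custom?] joined by a blank line; if nothing is
  // configured, fall back to the built-in default.
  const parts: string[] = [];
  if (includeDefault) {
    parts.push(builtInDefault);
  }
  if (customPrompt !== "") {
    parts.push(customPrompt);
  }
  return parts.length > 0 ? parts.join("\n\n") : builtInDefault;
}
```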
- Adds `GET /api/experimental/chats/by-workspace` endpoint that returns
workspace_id → latest chat_id mapping
- Modifies FE to fetch this alongside the workspace list, gated on
`agents` experiment and render an "Agent" badge similar to the existing
"Task" badge in `WorkspacesTable`
- Badge links to the "latest chat" linked to the given workspace.
Notes:
- Intentionally uses `fetchWithPostFilter` for RBAC to decouple from
workspaces API — will migrate to `workspaces_expanded` view later.
- If users have multiple chats linked to the same workspace, the badge
will link to the most recently updated one.
> 🤖 This PR was created with the help of Coder Agents, and has been
reviewed by my human. 🧑‍💻
## Summary
Adds an entitlement-gated **AI add-on** column to both the **Users**
table and the **Organization Members** table. When
`ai_governance_user_limit` is entitled, each row shows whether the user
is consuming an AI seat.
## Background
The AI governance add-on tracks which users are consuming AI seats.
Admins need visibility into per-user seat consumption directly from the
user management tables. This change surfaces that information through
both the site-wide Users table and the per-organization Members table,
gated behind the `ai_governance_user_limit` entitlement so the column
only appears when the feature is licensed.
## Implementation
### Backend
- **New SQL query** `GetUserAISeatStates`
(`coderd/database/queries/aiseatstate.sql`) — returns user IDs consuming
an AI seat, derived from:
- Users with entries in `aibridge_interceptions` (AI Bridge usage)
- Users who own workspaces with `has_ai_task = true` builds (AI Tasks
usage)
- **SDK types** — added `has_ai_seat: boolean` to `codersdk.User` and
`codersdk.OrganizationMemberWithUserData`
- **Handler wiring** — both the Users list endpoint (`coderd/users.go`)
and all Members endpoints (`coderd/members.go`) query AI seat state per
page of user IDs and populate the response field
- **dbauthz** — per-user `ActionRead` checks on `ResourceUserObject`
### Frontend
- **Shared `AISeatCell` component**
(`site/src/modules/users/AISeatCell.tsx`) — green `CircleCheck` for
consuming, gray `X` for non-consuming
- **`TableColumnHelpTooltip`** — extended with `ai_addon` variant with
tooltip: *"Users with access to AI features like AI Bridge, Boundary, or
Tasks who are actively consuming a seat."*
- **Column visibility** gated behind
`useFeatureVisibility().ai_governance_user_limit`
## Validation
- Backend: dbauthz full method suite (`TestMethodTestSuite`) passes
including new `GetUserAISeatStates` test
- Backend: `TestGetUsers`, `TestUsersFilter`, CLI golden file tests pass
- Frontend: 7/7 tests pass across `UsersPage.test.tsx` and
`OrganizationMembersPage.test.tsx` (column visibility gating both
directions)
- `go build ./coderd/...` compiles clean
- `pnpm --dir site run lint:types` passes
- `make gen` clean
## Risks
- **Pagination performance**: The AI seat query is scoped to the current
page's user IDs (not a full table scan), keeping it efficient for
paginated views.
- **Semantic scope**: The workspace-side AI seat derivation uses "any
build with `has_ai_task = true`" rather than "latest build only". If the
product intent is latest-build-only, this can be tightened in a
follow-up.
---
_Generated with `mux` • Model: `anthropic:claude-opus-4-6` • Thinking:
`xhigh` • Cost: `$27.25`_
*Disclaimer: implemented by a Coder Agent using Claude Opus 4.6,
reviewed by me.*
Replace the transitional soft warning message:
> AI Bridge is now Generally Available in v2.30. In a future Coder
version, your deployment will require the AI Governance Add-On to
continue using this feature. Please reach out to your account team or
sales@coder.com to learn more.
with the definitive requirement message:
> The AI Governance Add-On is required to use AI Bridge. Please reach
out to your account team or sales@coder.com to learn more.
Updated in:
- `enterprise/coderd/license/license.go`
- `enterprise/coderd/license/license_test.go` (2 occurrences)
## Summary
Adds a process-wide cache for three hot database queries in `chatd` that
were hitting Postgres on **every chat turn** despite returning
rarely-changing configuration data:
| Query | Before (50k turns) | After | Reduction |
|---|---|---|---|
| `GetEnabledChatProviders` | ~98.6k calls | ~500-1000 | ~99% |
| `GetChatModelConfigByID` | ~49.2k calls | ~500-1000 | ~98% |
| `GetUserChatCustomPrompt` | ~46.7k calls | ~1000-2000 | ~97% |
These were identified via `coder exp scaletest chat` (5000 concurrent
chats × 10 turns) as the dominant source of Postgres load during chat
processing.
## Design
Follows the established **webpush subscription cache pattern**
(`coderd/webpush/webpush.go`):
- `sync.RWMutex` + `tailscale.com/util/singleflight` (generic) +
generation-based stale prevention + TTL
- 10s TTL for provider/model config, 5s TTL for user prompts
- Negative caching for `sql.ErrNoRows` on user prompts (the common case
— most users don't set custom prompts)
- Deep-clones `ChatModelConfig.Options` (`json.RawMessage` = `[]byte`)
on both store and read paths
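The caching shape, sketched in TypeScript (the real implementation is
Go, with `singleflight` and generation counters for staleness):

```ts
type Entry<T> = { value: T | null; expiresAt: number };

class TTLCache<T> {
  private entries = new Map<string, Entry<T>>();

  constructor(private readonly ttlMs: number) {}

  // `value === null` with `hit === true` is a cached negative result
  // (e.g. sql.ErrNoRows for users with no custom prompt).
  get(key: string): { hit: boolean; value: T | null } {
    const entry = this.entries.get(key);
    if (entry === undefined || entry.expiresAt <= Date.now()) {
      return { hit: false, value: null };
    }
    return { hit: true, value: entry.value };
  }

  set(key: string, value: T | null): void {
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }

  invalidateAll(): void {
    this.entries.clear(); // driven by the pubsub channel below
  }
}
```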
### Invalidation
Single pubsub channel (`chat:config_change`) with kind discriminator for
cross-replica cache invalidation. Seven publish points in
`coderd/chats.go` cover all admin mutation endpoints
(create/update/delete for providers and model configs, put for user
prompts).
_This PR was generated with mux and was reviewed by a human_
*Disclaimer: implemented by a Coder Agent using Claude Opus 4.6*
Adds an info banner on the `/aibridge/request-logs` page encouraging
users to visit `/aibridge/sessions` for an improved audit experience.
This allows us to validate whether customers still find the raw request
logs view useful before removing it in a future release.
Fixes #23563
When a provider request fails and retries, the "Retrying request" banner
lingered in the UI after the retry succeeded. This happened because
`retryState` was only cleared on explicit `status` events (`running`,
`pending`, `waiting`), not when the stream resumed with `message_part`
or `message` events. Since the backend does not publish a
dedicated "retry resolved" event, the banner stayed visible for the
entire duration of the successful response.
Add `store.clearRetryState()` calls to the `message_part`, `message`,
and `status` event handlers so the banner disappears as soon as content
flows again.
Closes https://github.com/coder/coder/issues/23624
Follow-up to #23282. The retry and terminal error callouts had a few UX
oddities:
- Auto-retrying states reused backend error text that said "Please try
again" even while the UI was already retrying on behalf of the user.
- Terminal error states also said "Please try again" with no action the
user could take.
- `startup_timeout` had no specific title or retry copy — it fell
through to the generic "Retrying request" heading.
- The kind pill showed raw enum values like `startup_timeout` and
`rate_limit`.
- Terminal error metadata showed a "Retryable" / "Not retryable" label
that does not help users.
- A separate "Provider anthropic" metadata row duplicated information
already present in the message body.
- The `usage-limit` error kind used a hyphen while every backend kind
uses underscores.
Changes:
**Backend (`chaterror/message.go`)**
- Split message generation into `terminalMessage()` and
`retryMessage()`, replacing the old `userFacingMessage()`.
- Terminal messages include HTTP status codes and actionable guidance
(e.g. "Check the API key, permissions, and billing settings.").
- Retry messages are clean factual statements without status codes or
remediation, suitable for the retry countdown UI (e.g. "Anthropic is
temporarily overloaded.").
- Removed "Please try again" / "Please try again later" from all paths.
- `StreamRetryPayload` calls `retryMessage()` instead of forwarding
`classified.Message`.
**Frontend**
- Removed the parallel frontend message-generation system:
`getRetryMessage()`, `getProviderDisplayName()`,
`getRetryProviderSubject()`, and the `PROVIDER_DISPLAY_NAMES` map are
all deleted from `chatStatusHelpers.ts`.
- `liveStatusModel.ts` passes `retryState.error` through directly — the
backend owns the copy.
- Added specific title and retry copy for `startup_timeout`, and
extended the title mapping to cover `auth` and `config`.
- Kind pills now show humanized labels ("Startup timeout", "Rate limit",
etc.) instead of raw enum strings.
- Removed the redundant "Provider anthropic" metadata row.
- Removed the terminal "Retryable" / "Not retryable" badge.
- Normalized `"usage-limit"` → `"usage_limit"` and added it to
`ChatProviderFailureKind` so all error kinds follow the same underscore
convention and live in one enum.
Refs #23282.
## Problem
The dogfood startup script uses `gh auth status` to decide whether to
re-authenticate the GitHub CLI. That command exits non-zero when **any**
stored credential is invalid—even if Coder external auth already injects
a working `GITHUB_TOKEN` into the environment and `gh` commands work
fine.
On workspaces with a persistent home volume, `~/.config/gh/hosts.yml`
retains OAuth tokens written by previous `gh auth login --with-token`
calls. These tokens are issued by Coder's external auth integration and
can be rotated or revoked between workspace starts, but the copy in
`hosts.yml` persists on the volume. When the stored token goes stale,
`gh auth status` reports two accounts:
```
✓ Logged in to github.com account user (GITHUB_TOKEN) ← works fine
✗ Failed to log in to github.com account user (hosts.yml) ← stale token
```
It exits 1 because of the stale entry, even though `gh` API calls
succeed via `GITHUB_TOKEN`. This makes the auth state **indeterminate**
from `gh auth status` alone—you can't tell whether `gh` actually works
or not.
When the script enters the login branch:
1. `gh auth login --with-token` **refuses** to accept piped input when
`GITHUB_TOKEN` is already set in the environment, and exits 1.
2. `set -e` kills the script before it reaches `sudo service docker
start`.
The result: Docker never starts, devcontainer health checks fail, and
the workspace reports a startup error—all because of a stale GitHub CLI
credential that has no bearing on workspace functionality.
## Fix
- Switch the auth guard from `gh auth status` to `gh api user --jq
.login`, which tests whether GitHub API access actually works regardless
of which credential provides it.
- Wrap the fallback `gh auth login` so a failure logs the indeterminate
state but does not abort the script.
## Summary
This change removes the steady-state "resolve the latest workspace
agent" query from chat execution.
Instead of asking the database for the latest build's agent on every
turn, a chat now persists the workspace/build/agent binding it actually
uses and reuses that binding across subsequent turns. The common path
becomes "load the bound agent by ID and dial it", with fallback paths to
repair the binding when it is missing, stale, or intentionally changed.
## What changes
- add `workspace_id`, `build_id`, and `agent_id` binding fields to
`chats`
- expose those fields through the chat API / SDK so the execution
context is explicit
- load the persisted binding first in chatd, instead of always resolving
the latest build's agent
- persist a refreshed binding when chatd has to re-resolve the workspace
agent
- keep child / subagent chats on the same bound workspace context by
inheriting the parent binding
- leave `build_id` / `agent_id` unset for flows like `create_workspace`,
then bind them lazily on the next agent-backed turn
## Runtime behavior
The binding is treated as an optimistic cache of the agent a chat should
use:
- if the bound agent still exists and dials successfully, we use it
without a latest-build lookup
- if the bound agent is missing or no longer reachable, chatd
re-resolves against the latest build and persists the new binding
- if a workspace mutation changes the chat's target workspace, the
binding is updated as part of that mutation
To avoid reintroducing a hot-path query, dialing uses lazy validation:
- start dialing the cached agent immediately
- only validate against the latest build if the dial is still pending
after a short delay
- if validation finds a different agent, cancel the stale dial, switch
to the current agent, and persist the repaired binding
## Result
The hot path stops issuing
`GetWorkspaceAgentsInLatestBuildByWorkspaceID` for every user message,
which is the source of the DB pressure this PR is addressing. At the
same time, chats still converge to the correct workspace agent when the
binding becomes stale due to rebuilds or explicit workspace changes.
Previously, long bash commands in the execute tool were truncated with
an ellipsis and could not be viewed in full. The only way to see the
full command was to copy it via the copy button.
Adds overflow detection and an inline expand/collapse chevron next to
the copy button. Clicking the command text or the chevron toggles
between truncated and wrapped views. Short commands that fit on one line
are visually unchanged.
https://github.com/user-attachments/assets/88ec6cd4-5212-4608-9a90-9ce217d5dce7
EDIT: couldn't be bothered re-recording the video but the chevron is
hidden until hovered now, like the copy button.
_PR generated by Mux but reviewed by a human_
## Problem
The e2e test `template update with new name redirects on successful
submit` is flaky.
After saving template settings, the app navigates to
`/templates/<name>`, which immediately redirects to
`/templates/<name>/docs` via the router's index route (`<Navigate
to="docs" replace />`). The assertion used `expect.poll()` with
`toHavePathNameEndingWith(`/${name}`)`, which matches only the
**transient intermediate URL** — it only exists while `TemplateLayout`'s
async data fetch is pending. Once the fetch resolves and the `<Outlet
/>` renders, the index route fires the `/docs` redirect and the URL no
longer matches.
## Why it's flaky (not deterministic)
The flakiness depends on whether the template query cache is warm:
- **Cache miss → PASSES**: The mutation's `onSuccess` handler
invalidates the query cache. If `TemplateLayout` needs to re-fetch, it
shows a `<Loader />`, which delays rendering the `<Outlet />` that
contains the `<Navigate to="docs">`. This gives `expect.poll()` time to
see the transient `/new-name` URL → **pass**.
- **Cache hit → FAILS**: If the template data is still in the query
client, `TemplateLayout` renders immediately and the `<Navigate
to="docs" replace />` fires nearly instantly. By the time the first poll
runs, the URL is already `/new-name/docs` → **fail**.
## Fix
Assert the **final stable URL** (`/${name}/docs`) instead of the
transient one.
This is safe because `expect.poll()` is retry-based: it keeps sampling
until a match is found (or timeout). Seeing the transient `/new-name`
URL just causes harmless retries — once the redirect completes and the
URL settles on `/new-name/docs`, the poll matches and the test passes.
| Poll | URL | Ends with `/new-name/docs`? | Action |
|---|---|---|---|
| 1st | `/templates/new-name` | No | Retry |
| 2nd | `/templates/new-name` | No | Retry |
| 3rd | `/templates/new-name/docs` | Yes | **Pass** ✅ |
Closes https://github.com/coder/internal/issues/1403
Problem: previously, the deployment-wide chat template allowlist was never actually wired in from `chatd.go`.
- Extracts `parseChatTemplateAllowlist` into shared `coderd/util/xjson.ParseUUIDList`
- Adds `Server.chatTemplateAllowlist()` method that reads the allowlist from DB
- Passes `AllowedTemplateIDs` callback to `ListTemplates`, `ReadTemplate`, and `CreateWorkspace` tool constructors
> 🤖 Created by Coder Agents and reviewed by a human.
The `/agents` model picker collapsed distinct configured model variants
into fewer entries because options were built from the deduplicated
catalog (`ChatModelsResponse`). Two configs with the same provider/model
but different display names or settings appeared as a single option.
Switch option building from `getModelOptionsFromCatalog()` to a new
`getModelOptionsFromConfigs()` that emits one `ModelSelectorOption` per
enabled `ChatModelConfig` row. The option ID is the config UUID
directly, eliminating the catalog-ID ↔ config-ID mapping layer
(`buildModelConfigIDByModelID`, `buildModelIDByConfigID`).
Provider availability is still gated by the catalog response, and status
messaging ("no models configured" vs "models unavailable") is unchanged.
The sidebar now resolves model labels by config ID first, and the
/agents Storybook fixtures were updated so the stories seed matching
config IDs and model-config query data after the picker contract change.
The `/agents` transcript used `flex-col-reverse` for bottom-anchored
chat layout, where `scrollTop = 0` means bottom and the sign of
`scrollTop` when scrolled up varies by browser engine. A
`ResizeObserver` detected content height changes and applied manual
`compensateScroll(delta)` to preserve position, which fought manual
upward scrolling during streaming — repeatedly adjusting the user's
scroll position when they were trying to read earlier content.
This replaces that model with normal DOM order (`flex-col`, standard
`overflow-y: auto`) and a dedicated `useAgentTranscriptAutoScroll` hook
that only auto-scrolls when follow-mode is enabled. When the user
scrolls up, follow-mode disables and incoming content does not move the
viewport.
Changes:
- **New**: `useAgentTranscriptAutoScroll.ts` — local hook with
follow-mode state, RAF-throttled button visibility, dual
`ResizeObserver` (content + container), and `jumpToBottom()`
- **Modified**: `AgentDetailView.tsx` — removed
`ScrollAnchoredContainer` (~350 lines of reverse-layout compensation),
replaced with normal-order container wired to the new hook, added
pagination prepend scroll restoration
- **Modified**: `AgentDetailView.stories.tsx` — updated scroll stories
for normal-order bottom-distance assertions
## Problem
MCP OAuth2 auto-discovery stripped the path component from the MCP
server URL
before looking up Protected Resource Metadata. Per RFC 9728 §3.1, the
well-known
URL should be path-aware:
```
{origin}/.well-known/oauth-protected-resource{path}
```
For `https://api.githubcopilot.com/mcp/`, the correct metadata URL is
`https://api.githubcopilot.com/.well-known/oauth-protected-resource/mcp/`,
not
`https://api.githubcopilot.com/.well-known/oauth-protected-resource`
(which
returns 404).
The same issue applied to RFC 8414 Authorization Server Metadata for
issuers
with path components (e.g. `https://github.com/login/oauth` →
`/.well-known/oauth-authorization-server/login/oauth`).
## Fix
Replace the `mcp-go` `OAuthHandler`-based discovery with a
self-contained
implementation that correctly follows path-aware well-known URI
construction for
both RFC 9728 and RFC 8414, falling back to root-level URLs when the
path-aware
form returns an error. Also implements RFC 7591 registration directly,
removing
the `mcp-go/client/transport` dependency from the discovery path.
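A minimal sketch of the path-aware construction, assuming a
hypothetical helper name; the root-level form is kept as the fallback
candidate:
```go
// candidateMetadataURLs returns the RFC 9728 §3.1 path-aware well-known
// URL first, then the root-level fallback when the server URL has a path.
func candidateMetadataURLs(serverURL string) ([]string, error) {
	u, err := url.Parse(serverURL)
	if err != nil {
		return nil, err
	}
	origin := u.Scheme + "://" + u.Host
	urls := []string{origin + "/.well-known/oauth-protected-resource" + u.Path}
	if u.Path != "" && u.Path != "/" {
		urls = append(urls, origin+"/.well-known/oauth-protected-resource")
	}
	return urls, nil
}
```
For `https://api.githubcopilot.com/mcp/` this yields the path-aware URL
first and the root-level URL as the fallback; the same shape applies to
the RFC 8414 `/.well-known/oauth-authorization-server` path.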
Note: this fix resolves the discovery half of the problem for servers
like
GitHub Copilot. Full OAuth2 support for GitHub's MCP server also
requires
dynamic client registration (RFC 7591), which GitHub's authorization
server
does not currently support — that will be addressed separately.
## Summary
Tool call failures in `/agents` previously displayed alarming red
styling (red icons, red text, red alert icons) that made it look like
the user did something wrong. This PR replaces the scary error
presentation with a calm, unified style and adds a dedicated timeout
display for subagent tools.
## Changes
### Unified failure style (all tools)
- Replace red `CircleAlertIcon` + `text-content-destructive` with a
muted `TriangleAlertIcon` in `text-content-secondary` across **all 11
tool renderers**.
- Remove red icon/label recoloring on error from `ToolIcon` and all
specialized tool components.
- Error details remain accessible via tooltip on hover.
### Subagent timeout display
- `ClockIcon` with "Timed out waiting for [Title]" instead of a generic
error display.
- `CircleXIcon` for non-timeout subagent errors with proper error verbs
("Failed to spawn", "Failed waiting for", etc.) instead of the
misleading running verb ("Waiting for").
- Timeout detection from result string/error field containing "timed
out".
### Title resolution for historical messages
- `ConversationTimeline` now computes `subagentTitles` via
`useMemo(buildSubagentTitles(...))` and passes it to historical
`ChatMessageItem` rendering, so `wait_agent` can resolve the actual
agent title from a prior `spawn_agent` result even outside streaming
mode.
### Stories
8 new stories: `GenericToolFailed`, `GenericToolFailedNoResult`,
`SubagentWaitTimedOut`, `SubagentWaitTimedOutWithTitle`,
`SubagentWaitTimedOutTitleFromMap`, `SubagentSpawnError`,
`SubagentWaitError`, `MCPToolFailedUnifiedStyle`.
## Files changed (15)
- `tool/Tool.tsx` — GenericToolRenderer + SubagentRenderer
- `tool/SubagentTool.tsx` — timeout/error verbs, icon changes
- `tool/ToolIcon.tsx` — remove destructive recoloring
- `tool/*.tsx` (10 specialized tools) — unified warning icon
- `ConversationTimeline.tsx` — pass subagentTitles to historical
rendering
- `tool.stories.tsx` — 8 new stories, updated existing assertions
- Move `agent/agentdesktop/` to `agent/x/agentdesktop/` to signal
experimental/unstable status
- Update import paths in `agent/agent.go` and `api_test.go`
> 🤖 This mechanical refactor was performed by an agent. I made sure it
didn't change anything it wasn't supposed to.
- Clarify the system prompt to prefer tools before asking the user for
clarification.
- Limit clarification to cases where ambiguity or user preferences
materially affect the outcome.
- Remove the contradictory instruction to always start by asking
clarifying questions.
> 🤖 This PR has been reviewed by the author.
### Changes
**coder/coder:**
- `coderd/aibridge/aibridge.go` — Added `HeaderCoderBYOKToken` constant,
`IsBYOK()` helper, and updated `ExtractAuthToken` to check the BYOK
header first.
- `enterprise/aibridged/http.go` — BYOK-aware header stripping: in BYOK
mode only the BYOK header is stripped (user's LLM credentials
preserved); in centralized mode all auth headers are stripped.
<hr/>
**NOTE**: `X-Coder-Token` was removed! `ExtractAuthToken` now retrieves
the token from either `X-Coder-AI-Governance-BYOK-Token` or
`Authorization`/`X-Api-Key`.
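A minimal sketch of that precedence, using the header names from this
PR but an illustrative body (not the real `ExtractAuthToken`):
```go
// extractAuthToken prefers the BYOK header, then falls back to the
// standard auth headers.
func extractAuthToken(h http.Header) (token string, byok bool) {
	if t := h.Get("X-Coder-AI-Governance-BYOK-Token"); t != "" {
		return t, true
	}
	if auth := h.Get("Authorization"); auth != "" {
		return strings.TrimPrefix(auth, "Bearer "), false
	}
	return h.Get("X-Api-Key"), false
}
```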
---------
Co-authored-by: Susana Ferreira <susana@coder.com>
Co-authored-by: Danny Kopping <danny@coder.com>
## Summary
Adds a general-purpose `map[string]string` label system to chats, stored
as jsonb with a GIN index for efficient containment queries.
This is a standalone foundational feature that will be used by the
upcoming Automations feature for session identity (matching webhook
events to existing chats), replacing the need for bespoke session-key
tables.
## Changes
### Database
- **Migration 000451**: Adds `labels jsonb NOT NULL DEFAULT '{}'` column
to `chats` table with a GIN index (`idx_chats_labels`)
- **`InsertChat`**: Accepts labels on creation via `COALESCE(@labels,
'{}')`
- **`UpdateChatByID`**: Supports partial update —
`COALESCE(sqlc.narg('labels'), labels)` preserves existing labels when
NULL is passed
- **`GetChats`**: New `has_labels` filter using PostgreSQL `@>`
containment operator
- **`GetAuthorizedChats`**: Synced with generated `GetChats` (new column
scan + query param)
### API
- **Create chat** (`POST /chats`): Accepts optional `labels` field,
validated before creation
- **Update chat** (`PATCH /chats/{chat}`): Supports `labels` field for
atomic label replacement
- **List chats** (`GET /chats`): Supports `?label=key:value` query
parameters (multiple are AND-ed)
### SDK
- `Chat`, `CreateChatRequest`, `UpdateChatRequest`, `ListChatsOptions`
all gain `Labels` fields
- `UpdateChatRequest.Labels` is a pointer (`*map[string]string`):
`nil` means "don't change", while an empty map means "clear all"
### Validation (`coderd/httpapi/labels.go`)
- Max 50 labels per chat
- Key: 1–64 chars, must match `[a-zA-Z0-9][a-zA-Z0-9._/-]*` (supports
namespaced keys like `github.repo`, `automation/pr-number`)
- Value: 1–256 chars
- 13 test cases covering all edge cases (rules sketched below)
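A minimal sketch of those rules; the limits and key pattern come from
the list above, the function itself is illustrative:
```go
var labelKeyRe = regexp.MustCompile(`^[a-zA-Z0-9][a-zA-Z0-9._/-]*$`)

// validateLabels enforces the documented limits: at most 50 labels,
// keys of 1-64 chars matching the pattern, values of 1-256 chars.
func validateLabels(labels map[string]string) error {
	if len(labels) > 50 {
		return fmt.Errorf("too many labels: %d > 50", len(labels))
	}
	for k, v := range labels {
		if len(k) < 1 || len(k) > 64 || !labelKeyRe.MatchString(k) {
			return fmt.Errorf("invalid label key %q", k)
		}
		if len(v) < 1 || len(v) > 256 {
			return fmt.Errorf("invalid label value for key %q", k)
		}
	}
	return nil
}
```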
### Chat runtime
- `chatd.CreateOptions` gains `Labels` field, threaded through to
`InsertChat`
- Existing `UpdateChatByID` callers (e.g., quickgen title updates) are
unaffected — NULL labels preserve existing values via COALESCE
We had a bug where computer-use base64-encoded screenshots were no
longer interpreted as screenshots once saved to the DB, loaded back
into memory, and sent to Anthropic. Instead, they were interpreted as
regular text. Once a computer-use agent took enough screenshots and
stopped, sending it another message produced an out-of-context error:
<img width="808" height="367" alt="Screenshot 2026-03-23 at 12 02 54"
src="https://github.com/user-attachments/assets/f0bf6be2-4863-47ca-a7a9-9e6d9dfceeed"
/>
This PR fixes that.
## Summary
Introduces a new `context-file` ChatMessagePart type for persisting
workspace instruction files (AGENTS.md) as durable, frontend-visible
message parts. This is the foundation for showing loaded context files
in the chat input's context indicator tooltip.
### Problem
Previously, instruction files were resolved transiently on every turn
via `resolveInstructions()` → `InsertSystem()` and injected into the
in-memory prompt without persistence. The frontend had no knowledge that
instruction files were loaded into context, and there was no way to
surface this information to users.
### Solution
Instruction files are now read **once** when a workspace is first
attached to a chat (matching how [openai/codex handles
it](https://developers.openai.com/codex/guides/agents-md)) and persisted
as `user`-role, `both`-visibility message parts with a new
`context-file` type. This ensures:
- **Durability**: survives page refresh (data is in the DB, returned by
`getChatMessages`)
- **Cache-friendly**: `user`-role avoids the system-message hoisting
that providers do, keeping the instruction content in a stable position
for prompt caching
- **Frontend-visible**: the frontend receives paths and truncation
status for future context indicator rendering
- **Extensible**: the same pattern works for Skills (future)
### Key changes
| Layer | Change |
|---|---|
| **SDK** (`codersdk/chats.go`) | Add `ChatMessagePartTypeContextFile` with `context_file_path`, `context_file_content` (internal, stripped from API), `context_file_truncated` fields |
| **Prompt expansion** (`chatprompt`) | Expand `context-file` parts to `<workspace-context>` text blocks in `partsToMessageParts()` |
| **Chat engine** (`chatd.go`) | Add `persistInstructionFiles()`, called on first turn with a workspace. Remove per-turn `resolveInstructions()` + `InsertSystem()` from `processChat()` and `ReloadMessages` |
| **Frontend** | Ignore `context-file` parts in `messageParsing.ts` and `streamState.ts` (no rendering yet — follow-up will add tooltip display) |
### How it works
1. On each turn, `processChat` checks if any loaded message contains
`context-file` parts
2. If not (first turn with a workspace), reads AGENTS.md files via the
workspace agent connection and persists them
3. For this first turn, also injects the instruction text into the
prompt (since messages were loaded before persistence)
4. On all subsequent turns, `ConvertMessagesWithFiles()` encounters the
persisted `context-file` parts and expands them into text automatically
— no extra resolution needed
The description read "Control access of templates for users and groups
to templates" with "templates" appearing twice and garbled grammar.
Simplified to "Control user and group access to templates."
---------
Co-authored-by: Jake Howell <jacob@coder.com>
HealthLayout sets refetchInterval: 30_000 on its health query.
In storybook tests, the seeded cache data prevents the initial
fetch, but interval polling still fires after 30s, hitting the
Vite proxy with no backend. This caused test-storybook to hang
indefinitely in environments without a running coderd.
Set refetchInterval: false in the storybook QueryClient defaults
alongside the existing staleTime: Infinity and retry: false.
The `ComputerUseProviderTool` function needed a small adjustment
because I changed `NewComputerUseTool`'s signature slightly in
upstream fantasy.
- Stores a deployment-wide agents template allowlist in `site_configs`
(`agents_template_allowlist`)
- Adds `GET/PUT /api/experimental/chats/config/template-allowlist`
endpoints
- Filters `list_templates`, `read_template`, and `create_workspace` chat
tools by allowlist, if defined (empty=all allowed)
- Add "Templates" admin settings tab in Agents UI ([what it looks
like](https://624de63c6aacee003aa84340-sitjilsyrr.chromatic.com/?path=/story/pages-agentspage-agentsettingspageview--template-allowlist))
> 🤖 This PR was created with the help of Coder Agents, and has been
reviewed by my human. 🧑💻
> **PR Stack**
>
> 1. #23351 ← `#23282`
> 2. **#23282** ← `#23275` *(you are here)*
> 3. #23275 ← `#23349`
> 4. #23349 ← `main`
---
## Summary
Replaces raw error strings and infinite "Thinking..." spinners in the
agents chat UI with a structured live-status model that drives startup,
retry, and failure UI from one source of truth.
This branch also folds in the frontend follow-up fixes that fell out of
that refactor: malformed `retrying_at` timestamps no longer render
`Retrying in NaNs`, stale persisted generic errors no longer outlive a
recovered chat status, and partial streamed output stays visible when a
response fails after blocks have already rendered.
Consumes the structured error metadata added in #23275.
Retry-After header handling remains in #23351.
<img width="853" height="493" alt="image"
src="https://github.com/user-attachments/assets/5a4a1690-5e22-4ece-965c-a000fd669244"
/>
<img width="812" height="517" alt="image"
src="https://github.com/user-attachments/assets/e78d28ce-1566-48ca-a991-62c6e1838079"
/>
<img width="847" height="523" alt="image"
src="https://github.com/user-attachments/assets/e5fd7b60-4a3c-4573-ba4c-4e5f6dbfbdc3"
/>
## Problem
The previous AgentDetail chat UI derived startup, retry, and failure
behavior from several loosely connected bits of state spread across
`ChatContext`, `AgentDetailContent`, `ConversationTimeline`, and ad hoc
props. That made the UI inconsistent: some failures were just raw
strings, retry state could only partially describe what was happening,
startup could sit on an infinite spinner, and rendering decisions
depended on local booleans instead of one authoritative model.
Those splits also made edge cases brittle. Invalid retry timestamps
could produce broken countdown text, persisted generic errors could
linger after recovery, and streamed partial output could disappear if
the turn later failed.
## Fix
Introduce a structured live-status pipeline for AgentDetail.
`ChatContext` now normalizes stream errors and retry metadata into
richer state, `liveStatusModel` centralizes precedence and phase
derivation, and `ChatStatusCallout` renders startup, retry, and terminal
failure states with shared copy, provider attribution, status links,
attempt metadata, and guarded countdown handling.
`AgentDetailContent` and `ConversationTimeline` now consume that single
model instead of juggling separate error and stream booleans, while
usage-limit messaging stays on its explicit path. The result is a
timeline that shows consistent state transitions, preserves accumulated
assistant output across failures, suppresses stale generic errors once
live state recovers, and has focused model, store, and story coverage
around those behaviors.
## Problem
Text attachments (`InlineTextAttachmentButton`) and image thumbnails
(`ImageThumbnail`) rendered at different heights when displayed side by
side in user messages. Text cards had no explicit height
(content-driven), while images used `h-16` (64px).
## Changes
**`ConversationTimeline.tsx`**
- Added `h-16` to `InlineTextAttachmentButton` to match `ImageThumbnail`
- Added `isPlaceholder` prop: when the content hasn't been fetched yet
(file_id path), renders "Pasted text" in sans-serif `text-sm` with
`items-center` alignment instead of monospace `text-xs`
- Once real content loads, it still renders in `font-mono text-xs` with
`formatTextAttachmentPreview()`
**`ConversationTimeline.stories.tsx`**
- Added `UserMessageWithMixedAttachments` story showing a text
attachment and image side by side as a visual regression guard
When a chat is created, createChat.onSuccess invalidates the sidebar
list query, triggering a background refetch. The refetch can hit the
server before async title generation finishes, returning the fallback
(truncated) title. If the title_change WebSocket event arrives and
writes the generated title into the cache, the in-flight refetch
response then overwrites it with the stale fallback title.
Cancel any in-flight sidebar-list and per-chat refetches before every
WebSocket-driven cache write. This mirrors the existing pattern in
archiveChat/unarchiveChat, which cancel queries before optimistic
updates for the same reason.
## Problem
When chatd pushes a branch and then creates a PR (e.g. `git push`
followed by `gh pr create`), the gitsync background worker often picks
up the stale `chat_diff_statuses` row between the two operations. At
that point no PR exists yet, so the worker skips the row. However, the
acquisition SQL locks the row for **5 minutes** (crash-recovery
interval), creating a dead zone where the PR diff is invisible in the UI
until the user manually navigates to the chat.
### Root cause
1. `git push` triggers `GIT_ASKPASS` → coderd external-auth handler →
`MarkStale()` sets `stale_at = now - 1s`
2. Background worker acquires the row within ~10s, atomically bumps
`stale_at = NOW() + 5 min` (crash-recovery lock)
3. Worker calls `ResolveBranchPullRequest` → no PR exists yet → returns
`nil` → worker skips with `continue`
4. `gh pr create` completes moments later, but uses its own auth (not
`GIT_ASKPASS`), so no second `MarkStale` fires
5. Row is locked for 5 minutes before the worker can retry
Loading the chat works immediately because `GET /chats/{chat}` calls
`resolveChatDiffStatus` synchronously, which discovers the PR inline.
## Fix
When `ResolveBranchPullRequest` returns nil (no PR yet) **and** the row
was recently marked stale (within 2 minutes), apply a short 15-second
backoff via `BackoffChatDiffStatus` instead of letting the 5-minute
acquisition lock stand. Outside the retry window, the worker skips the
row as before — no indefinite fast-polling for branches that never
receive a PR.
To make the "recently marked stale" check work, `updated_at` is no
longer overwritten by the acquisition and backoff SQL queries. This
preserves it as a reliable "last externally changed" timestamp (set by
`MarkStale` or a successful refresh). The decision is sketched below.
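A minimal sketch of that decision, using the `NoPRBackoff` /
`NoPRRetryWindow` constants listed under Changes; the function itself
is illustrative:
```go
// noPRRetry decides how to release a row when no PR was found.
func noPRRetry(updatedAt, now time.Time) (backoff time.Duration, retrySoon bool) {
	if now.Sub(updatedAt) < NoPRRetryWindow { // fresh push (< 2 min)
		return NoPRBackoff, true // short 15s backoff
	}
	return 0, false // old row: skip; the 5-minute acquisition lock stands
}
```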
### Behavior summary
| Scenario | `updated_at` age | Backoff | Effective retry |
|---|---|---|---|
| Fresh push, no PR yet | < 2 min | 15s (`NoPRBackoff`) | ~15s |
| Old row, no PR | ≥ 2 min | None (skip) | ~5 min (acquisition lock) |
| Error (any age) | Any | 120s (`DiffStatusTTL`) | ~120s |
| Success (any age) | Any | 120s (`DiffStatusTTL`) | ~120s |
## Changes
- **`coderd/database/queries/chats.sql`** — Remove `updated_at = NOW()`
from `AcquireStaleChatDiffStatuses` and `BackoffChatDiffStatus`
- **`coderd/database/queries.sql.go`** — Regenerated
- **`coderd/x/gitsync/worker.go`** — Add `NoPRBackoff` (15s) and
`NoPRRetryWindow` (2 min) constants; apply short backoff only within the
retry window
- **`coderd/x/gitsync/worker_test.go`** — Add
`TestWorker_NoPR_RecentMarkStale_BacksOffShort` and
`TestWorker_NoPR_OldRow_Skips`
- Add `SanitizePromptText` stripping ~24 invisible Unicode codepoints
and collapsing excessive newlines (approach sketched after this list)
- Apply at write and read paths for defense-in-depth
- Frontend: warn in both prompt textareas when invisible characters
detected
- Explicit codepoint list (not blanket `unicode.Cf`) to avoid breaking
flag emoji
- 34 Go tests + idempotency meta-test, 11 TS unit tests, 4 Storybook
stories
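A minimal sketch of the approach; the codepoint set here is abbreviated
and illustrative, not the actual ~24-entry list:
```go
// Explicit removals rather than blanket unicode.Cf, so legitimate
// format codepoints (e.g. those used by emoji sequences) can be spared.
var invisibleRunes = map[rune]bool{
	'\u200B': true, // zero-width space
	'\u2060': true, // word joiner
	'\uFEFF': true, // zero-width no-break space / BOM
}

var newlineRuns = regexp.MustCompile(`\n{3,}`)

func sanitizePromptText(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	for _, r := range s {
		if !invisibleRunes[r] {
			b.WriteRune(r)
		}
	}
	// Collapse runs of three or more newlines down to two.
	return newlineRuns.ReplaceAllString(b.String(), "\n\n")
}
```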
> This PR was created with the help of Coder Agents, and was reviewed by my human.
## Changes
Replaces the "Loading model catalog..." / "Loading models..." text flash
on `/agents` with a clean skeleton loading state, and removes the
admin-nag status messages entirely.
### Removed
- `getModelCatalogStatusMessage()` function and
`modelCatalogStatusMessage` prop chain — "Loading model catalog..." /
"No chat models are configured. Ask an admin to configure one." text
below the input
- `inputStatusText` prop chain — "No models configured. Ask an admin." /
"Models are configured but unavailable. Ask an admin." inline text
- `modelCatalogError` prop from `AgentCreateForm`
### Changed
- `AgentChatInput`: when `isModelCatalogLoading` is true, renders a
`Skeleton` in place of the `ModelSelector`
- `getModelSelectorPlaceholder()`: "No Models Configured" / "No Models
Available" (title case)
### Added
- `LoadingModelCatalog` story — skeleton where model selector sits
- `NoModelsConfigured` story — selector shows "No Models Configured"
Net -69 lines.
- Add `X-Coder-Owner-Id`, `X-Coder-Chat-Id`, `X-Coder-Subchat-Id`,
`X-Coder-Workspace-Id` headers to all outgoing LLM API requests from
chatd
- Extend `ModelFromConfig` with `extraHeaders` param, forwarded via
Fantasy `WithHeaders` on all 8 providers
- Add `CoderHeaders(database.Chat)` helper to build the header map from
chat state
- Update all 4 `ModelFromConfig` call sites (resolveChatModel,
computer-use override, title gen, push summary)
- Thread `database.Chat` into `generatePushSummary` (was `chatTitle
string`)
- Tests: `TestCoderHeaders` (4 subtests),
`TestModelFromConfig_ExtraHeaders` (OpenAI + Anthropic),
`TestModelFromConfig_NilExtraHeaders`
- Refactor existing `TestModelFromConfig_UserAgent` to use channel-based
signaling
> 🤖 This PR was generated by Coder Agents and self-reviewed by a human.
create_workspace could create a replacement workspace after a single
failed 5s agent dial, even when the existing workspace agent had
recently checked in. That made temporary reachability blips look like
dead workspaces and let chatd replace a running workspace too
aggressively.
Use the workspace agent's DB-backed status with the deployment's
AgentInactiveDisconnectTimeout before allowing replacement. Recently
connected and still-connecting agents now reuse the existing workspace,
while disconnected or timed-out agents still allow a new workspace. This
also threads the inactivity timeout through chatd and adds focused
coverage for the reuse and replacement branches.
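A minimal sketch of the reuse-versus-replace decision, assuming the
`codersdk` agent status values; the real code derives the status from
the agent's DB row and the deployment's
`AgentInactiveDisconnectTimeout`:
```go
// shouldReuseWorkspace is illustrative: a recently connected or
// still-connecting agent means a dial blip, not a dead workspace.
func shouldReuseWorkspace(status codersdk.WorkspaceAgentStatus) bool {
	switch status {
	case codersdk.WorkspaceAgentConnected, codersdk.WorkspaceAgentConnecting:
		return true
	default: // disconnected or timed out
		return false // allow create_workspace to build a replacement
	}
}
```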
## Problem
The sticky user message visual state (`--clip-h`, fade gradient, push-up
positioning) is driven by an `update()` function that only ran on
`scroll` events. The chat scroll container uses `flex-col-reverse`,
where `scrollTop = 0` means "at bottom." When streaming content grows
the transcript while the user is auto-scrolled to the bottom,
`scrollTop` stays at `0` — no `scroll` event fires — so `update()` never
runs and the sticky messages become visually stale until the user
manually scrolls.
## Fix
Add a `ResizeObserver` on the scroller's content wrapper inside the
existing `useLayoutEffect` that sets up the scroll/resize listeners.
When the content wrapper resizes (streaming growth), it fires the
observer which calls `update()` through the same RAF-throttle pattern
used by the scroll handler.
Single observer per sticky message instance. Zero cost when nothing is
resizing. Cleanup handled in the same effect teardown.
Wraps the `GenericToolRenderer` (used for MCP and unrecognized tools) in
`ToolCollapsible` so the result content is hidden behind a
click-to-expand chevron, matching the pattern used by `read_file`,
`write_file`, and other built-in tool renderers.
### Changes
- Move `ToolIcon` + `ToolLabel` into the `ToolCollapsible` `header` prop
- Compute `hasContent` from `writeFileDiff` / `fileContent` /
`resultOutput` — when there's no content, the header renders as a plain
div with no chevron
- Remove `ml-6` from `ScrollArea` classNames (the `ToolCollapsible`
button handles its own layout)
- `defaultExpanded` is `false` by default in `ToolCollapsible`, so
results start collapsed
### Before
MCP tool results were always fully visible inline.
### After
MCP tool results are collapsed by default with a chevron toggle,
consistent with `read_file`, `edit_files`, `list_templates`, etc.
## Problem
When an MCP tool returns an `EmbeddedResource` content item (e.g. GitHub
MCP server returning file contents via `get_file_contents`), the
`convertCallResult` function falls through to the `default` case,
producing:
```
[unsupported content type: mcp.EmbeddedResource]
```
This loses the actual resource content and shows an unhelpful message in
the chat UI.
## Root Cause
The type switch in `convertCallResult` handles `TextContent`,
`ImageContent`, and `AudioContent`, but not the other two `mcp.Content`
implementations from `mcp-go`:
- `mcp.EmbeddedResource` — wraps a `ResourceContents` (either
`TextResourceContents` or `BlobResourceContents`)
- `mcp.ResourceLink` — contains a URI, name, and description
## Fix
Add two new cases to the type switch:
1. **`mcp.EmbeddedResource`**: nested type switch on `.Resource`:
- `TextResourceContents` → append `.Text` to `textParts`
- `BlobResourceContents` → base64-decode `.Blob` as binary (type
`"image"` or `"media"` based on MIME)
- Unknown → fallback `[unsupported embedded resource type: ...]`
2. **`mcp.ResourceLink`**: render as `[resource: Name (URI)]` text
## Testing
Added 3 new test cases (all passing, full suite 23/23 PASS):
- `TestConnectAll_EmbeddedResourceText` — text resource extraction
- `TestConnectAll_EmbeddedResourceBlob` — binary blob decoding
- `TestConnectAll_ResourceLink` — resource link rendering
## Summary
Previously the user's MCP server toggles were ephemeral — every page
reload or navigation to a new chat reset them to the admin-configured
defaults (`force_on` + `default_on`). This was frustrating for users who
routinely disabled a default-on server or enabled a default-off one.
This PR persists the MCP server picker selection to `localStorage` under
the key `agents.selected-mcp-server-ids`.
## Changes
### `MCPServerPicker.tsx`
- **`mcpSelectionStorageKey`** — exported constant for the localStorage
key.
- **`getSavedMCPSelection(servers)`** — reads from localStorage, filters
out stale/disabled IDs, always includes `force_on` servers.
- **`saveMCPSelection(ids)`** — writes the current selection to
localStorage.
### `AgentCreateForm.tsx`
- Initialises `userMCPServerIds` from `getSavedMCPSelection` instead of
`null`.
- Calls `saveMCPSelection` on every toggle.
### `AgentDetail.tsx`
- Adds localStorage as a fallback tier in `effectiveMCPServerIds`: user
override → chat record → **saved selection** → defaults.
- Calls `saveMCPSelection` on every toggle.
### `MCPServerPicker.test.ts` (new)
- 13 unit tests covering save, restore, stale-ID filtering, force_on
merging, invalid JSON handling, and disabled server filtering.
## Fallback priority
| Priority | Source | When |
|----------|--------|------|
| 1 | In-memory state | User toggled during this session |
| 2 | Chat record | Existing conversation with `mcp_server_ids` |
| 3 | localStorage | User has a saved selection from a prior session |
| 4 | Server defaults | `force_on` + `default_on` servers |
Child chats created via `spawn_agent` and `spawn_computer_use_agent`
were not inheriting the parent's `MCPServerIDs`, meaning subagents lost
access to the parent's MCP server tools.
## Changes
- Pass `parent.MCPServerIDs` in the `CreateOptions` for both
`createChildSubagentChat()` and the `spawn_computer_use_agent` tool
handler in `coderd/x/chatd/subagent.go`.
## Tests
Added 3 tests in `subagent_internal_test.go`:
- `TestCreateChildSubagentChat_InheritsMCPServerIDs` — verifies child
chat gets parent's MCP server IDs (multiple servers)
- `TestSpawnComputerUseAgent_InheritsMCPServerIDs` — verifies computer
use subagent gets parent's MCP server IDs via the tool
- `TestCreateChildSubagentChat_NoMCPServersStaysEmpty` — verifies no
regression when parent has no MCP servers
*Disclaimer: implemented by a Coder Agent and reviewed by me.*
Renames the env vars used by chatd integration tests from the canonical
`SOMEPROVIDER_API_KEY` (e.g. `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`) to
`SOMEPROVIDER_TEST_API_KEY` (e.g. `ANTHROPIC_TEST_API_KEY`,
`OPENAI_TEST_API_KEY`) so that test-specific keys don't collide with
production/canonical provider credentials.
Relates to https://github.com/coder/internal/issues/1425
See also:
https://codercom.slack.com/archives/C0AGTPWLA3U/p1774433646799499
## Summary
Adds `xhigh` to the OpenAI reasoning effort normalizer so GPT-5.4 class
models can use `reasoning_effort: xhigh` without it being silently
dropped.
## Problem
The SDK schema (`codersdk/chats.go`) already advertises `xhigh` as a
valid `reasoning_effort` value, but the runtime normalizer in
`chatprovider.go` only accepts `minimal|low|medium|high` for the OpenAI
provider. When a user sets `xhigh`, `ReasoningEffortFromChat()` returns
`nil` and the value never reaches the OpenAI API.
## Changes
- **Fantasy dependency**: Updated `kylecarbs/fantasy` (cj/go1.25) which
now includes the `ReasoningEffortXHigh` constant
([kylecarbs/fantasy#9](https://github.com/kylecarbs/fantasy/pull/9)).
- **`chatprovider.go`**: Adds `fantasyopenai.ReasoningEffortXHigh` to
the OpenAI case in `ReasoningEffortFromChat()`.
- **`chatprovider_test.go`**: Adds `OpenAIXHighEffort` test case.
## Upstream
-
[charmbracelet/fantasy#186](https://github.com/charmbracelet/fantasy/pull/186)
Consolidates coderdtest invocations, reducing 23 instances to 7 across 7 tests:
- `TestGetUser` (3 → 1) — read-only user lookups
- `TestUserTerminalFont` (3 → 1) — each subtest creates its own user
via CreateAnotherUser
- `TestUserTaskNotificationAlertDismissed` (3 → 1) — each subtest
creates its own user
- `TestUserLogin` (3 → 1) — each subtest creates/deletes its own user
- `TestExpMcpConfigureClaudeCode` (5 → 1) — writes to isolated temp dirs
- `TestOAuth2RegistrationTokenSecurity` (3 → 1) — independent
registrations
- `TestOAuth2SpecificErrorScenarios` (3 → 1) — independent error
scenarios
> 🤖 This PR was created with the help of Coder Agents, and has been
reviewed by my human. 🧑💻
## Description
When duplicating a template that has prebuilds configured, a warning
alert is now shown above the create template form. The warning displays
the total number of prebuilds that will be automatically created.

### Changes
**Single file modified:**
`site/src/pages/CreateTemplatePage/DuplicateTemplateView.tsx`
- Fetches presets for the template's active version using the existing
`templateVersionPresets` React Query helper
- Computes total prebuild count by summing `DesiredPrebuildInstances`
across all presets
- Renders a warning `<Alert>` above the form when prebuilds are
configured
### Design decisions
| Decision | Rationale |
|---|---|
| Warning in `DuplicateTemplateView`, not `CreateTemplateForm` | Only the duplicate flow needs this. Keeps data fetching local. No new props. |
| Feature-flag gated (`workspace_prebuilds`) | Matches existing pattern in `TemplateLayout.tsx`. |
| Non-blocking query | Presets fetch failure shouldn't prevent duplication. Warning is informational. |
| Count with pluralization | Users know exactly how many prebuilds will spin up. |
<img width="1136" height="375" alt="image"
src="https://github.com/user-attachments/assets/1ca42608-a204-48f5-b27d-6d476ab32fa7"
/>
Closes #18987
GPG emits an "untrusted key" warning when signing with a key that hasn't
been assigned a trust level, which can cause verification steps to fail
or produce noisy output.
Example:
```sh
gpg: Signature made Tue Mar 24 20:56:59 2026 UTC
gpg: using RSA key 21C96B1CB950718874F64DBD6A5A671B5E40A3B9
gpg: Good signature from "Coder Release Signing Key <security@coder.com>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 21C9 6B1C B950 7188 74F6 4DBD 6A5A 671B 5E40 A3B9
```
After importing the release key, derive its fingerprint from the keyring
and mark it as ultimately trusted via `--import-ownertrust`.
The fingerprint is extracted dynamically rather than hard-coded, so this
works for any key supplied via `CODER_GPG_RELEASE_KEY_BASE64`.
## Problem
Template administrators cannot delete templates that have running
prebuilds.
The `deleteTemplate` handler fetches all non-deleted workspaces and
blocks
deletion if any exist, making no distinction between human-owned
workspaces
and prebuild workspaces (owned by the system `PrebuildsSystemUserID`).
This forces admins into a manual multi-step workflow: set
`desired_instances`
to 0 on every preset, wait for the reconciler to drain prebuilds, then
retry
deletion. Prebuilds are an internal system concern that admins should
not need
to manage manually.
## Fix
Replace the blanket `len(workspaces) > 0` guard in `deleteTemplate` with
a
loop that only blocks deletion when a non-prebuild (human-owned)
workspace
exists. Prebuild workspaces — owned by `database.PrebuildsSystemUserID`
— are
now ignored during the check.
Once the template is soft-deleted (`deleted=true`), the existing
prebuilds
reconciler detects `isActive()=false` and cleans up remaining prebuilds
asynchronously. No changes to the reconciler are needed.
The error message and HTTP status for human workspaces remain unchanged.
## Testing
Added two new subtests to `TestDeleteTemplate`:
- **`OnlyPrebuilds`**: deletion succeeds when only prebuild workspaces
exist.
- **`PrebuildsAndHumanWorkspaces`**: deletion is blocked when both
prebuild
and human workspaces exist.
Existing reconciler test ("soft-deleted templates MAY have prebuilds")
already
covers post-deletion prebuild cleanup.
> **PR Stack**
> 1. #23351 ← `#23282`
> 2. #23282 ← `#23275`
> 3. **#23275** ← `#23349` *(you are here)*
> 4. #23349 ← `main`
---
## Summary
Extracts a structured error classification subsystem for agent chat
(`chatd`) so that retry and error payloads carry machine-readable
metadata — error kind, provider name, HTTP status code, and retryability
— instead of raw error strings.
This is the **backend half** of the error-handling work. The frontend
counterpart is in #23282.
## Changes
### New package: `coderd/chatd/chaterror/`
Canonical error classification — extracts error kind, provider, status
code, and user-facing message from raw provider errors. One source of
truth that drives both retry policy and stream payloads.
- **`kind.go`**: Error kind enum (`rate_limit`, `timeout`, `auth`,
`config`, `overloaded`, `unknown`).
- **`signals.go`**: Signal extraction — parses provider name, HTTP
status code, and retryability from error strings and wrapped types.
- **`classify.go`**: Classification logic — maps extracted signals to an
error kind.
- **`message.go`**: User-facing message templates keyed by kind +
signals.
- **`payload.go`**: Projectors that build `ChatStreamError` and
`ChatStreamRetry` payloads from a classified error.
### Modified
- **`codersdk/chats.go`**: Added `Kind`, `Provider`, `Retryable`,
`StatusCode` fields to `ChatStreamError` and `ChatStreamRetry`.
- **`coderd/chatd/chatretry/`**: Thinned to retry-policy only;
classification logic moved to `chaterror`.
- **`coderd/chatd/chatloop/`**: Added per-attempt first-chunk timeout
(60 s) via a `guardedStream` wrapper, producing retryable
`startup_timeout` errors instead of hanging forever (see the sketch
after this list).
- **`coderd/chatd/chatd.go`**: Publishes normalized retry/error payloads
via `chaterror` projectors.
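A minimal sketch of the first-chunk guard, with hypothetical types
(`Chunk` and `startupTimeoutError` are illustrative):
```go
// firstChunk waits up to 60s for the first streamed chunk and fails
// with a retryable error instead of hanging forever.
func firstChunk(ctx context.Context, chunks <-chan Chunk) (Chunk, error) {
	timer := time.NewTimer(60 * time.Second)
	defer timer.Stop()
	select {
	case c, ok := <-chunks:
		if !ok {
			return Chunk{}, errors.New("stream closed before first chunk")
		}
		return c, nil
	case <-timer.C:
		return Chunk{}, startupTimeoutError{} // classified as retryable
	case <-ctx.Done():
		return Chunk{}, ctx.Err()
	}
}
```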
Adds the AI Bridge sessions list page.
## Summary
The `build` CI job on `main` is failing with:
```
ERROR: Computed invalid windows version format: 2.32.0-rc.0.1
```
This started when the `v2.32.0-rc.0` tag was created, making `git
describe` produce versions like `2.32.0-rc.0-devel+4f571f8ff`.
## Root cause
`scripts/build_go.sh` converts the version to a Windows-compatible
`X.Y.Z.{0,1}` format by stripping pre-release segments. It uses
`${var%-*}` (shortest suffix match), which only removes the last
`-segment`. For RC versions this leaves `-rc.0` intact:
```
2.32.0-rc.0-devel → strip %-* → 2.32.0-rc.0 → + .1 → 2.32.0-rc.0.1 ✗
```
## Fix
Switch to `${var%%-*}` (longest suffix match) so all pre-release
segments are stripped from the first hyphen onward:
```
2.32.0-rc.0-devel → strip %%-* → 2.32.0 → + .1 → 2.32.0.1 ✓
```
Verified all version patterns produce valid output:
| Input | Output |
|---|---|
| `2.32.0` | `2.32.0.0` |
| `2.32.0-devel` | `2.32.0.1` |
| `2.32.0-rc.0-devel` | `2.32.0.1` |
| `2.32.0-rc.0` | `2.32.0.1` |
Fixes
https://github.com/coder/coder/actions/runs/23511163474/job/68434008241
Nine subtests covering the poll loop, pubsub notification path,
timeout, context cancellation, descendant auth check, and both
error-status branches in handleSubagentDone.
Wire p.clock through awaitSubagentCompletion's timer and ticker
so future tests can use quartz mock clock. Tests use channel-based
coordination and context.WithTimeout instead of time.Sleep.
Coverage: awaitSubagentCompletion 0%->70.3%, handleSubagentDone
0%->100%, checkSubagentCompletion 0%->77.8%,
latestSubagentAssistantMessage 0%->78.9%.
Super unclear why CI hates me and only [fails
lint](https://github.com/coder/coder/actions/runs/23504799702/job/68409588632?pr=23385)
for me (I feel personally attacked), but `dpdm` detected a circular
dependency between the WorkspaceSettingPage and its Sidebar in my other
branch. They both wanted the same context/hook combo, so the easy fix
was to move the hook/context into a third module and resolve the
circular dep.
## Summary
Large pasted text that the UI collapses into an attachment chip was
completely invisible to the LLM. Providers only accept specific MIME
types (images, PDFs) in file content blocks — a `text/plain` `FilePart`
is silently dropped, so the model received nothing for pasted content.
## Fix
Detect paste-originated text files by their
`pasted-text-{timestamp}.txt` filename pattern and convert them to
`fantasy.TextPart` with a bounded 128 KiB inline body and truncation
notice. Binary uploads and real uploaded text files keep their existing
`FilePart` semantics.
The detection uses the existing frontend naming convention
(`pasted-text-YYYY-MM-DD-HH-MM-SS.txt`) combined with a text-like MIME
check for defense-in-depth. A TODO marks this for migration to explicit
origin metadata.
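A minimal sketch of the detection; the regexp mirrors the stated naming
convention, and the function itself is illustrative:
```go
var pastedTextName = regexp.MustCompile(
	`^pasted-text-\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2}\.txt$`)

// isSyntheticPaste combines the filename convention with a text-like
// MIME check for defense-in-depth.
func isSyntheticPaste(filename, mimeType string) bool {
	return pastedTextName.MatchString(filename) &&
		strings.HasPrefix(mimeType, "text/")
}
```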
<details>
<summary>Review notes: intentionally skipped findings</summary>
A 10-reviewer deep review was run on this change. The following findings
were raised and intentionally dropped after cross-check. Documenting
them here so future reviewers do not re-flag the same concerns:
**"Unresolved file IDs cause silent data loss" (Edge Case Analyst P1)**
— When a file ID is not in the resolver map, `name` stays empty and
paste detection fails. This is pre-existing behavior for ALL file types
(not introduced by this change). The resolver calls `GetChatFilesByIDs`
which returns whatever rows exist; missing IDs simply fall through to an
empty `FilePart`. The Contract Auditor independently traced this path
and confirmed the fallback is safe. If the file was deleted between
message construction and conversion, the model already saw nothing
before this patch — this change does not make it worse.
**"String builder pre-allocation overhead" (Performance Analyst P1)** —
Misidentified scope. `formatSyntheticPasteText` is only called when
`isSyntheticPaste` returns true (actual synthetic pastes), not for every
file part. The `Grow()` call is correct and efficient.
**"Constant naming violates Uber style" (Style Reviewer P1)** —
Over-severity. `syntheticPasteInlineBudget` is standard Go camelCase for
unexported constants, consistent with the Uber guide and surrounding
code.
**"`IsSyntheticPasteForTest` naming is misleading" (Style Reviewer P2)**
— This is the standard Go `export_test.go` pattern. The `ForTest` suffix
is conventional.
</details>
Observations bypassed the severity test entirely. A reviewer filing
a convention violation as Obs meant it skipped both the upgrade
check and the unnecessary-novelty gate. The combination let issues
pass through as dropped observations when they warranted P3+.
Two changes:
- Severity test now applies to findings AND observations.
- Unnecessary novelty check now covers reviewer-flagged Obs.
When developers switch branches, the database may have migrations
from the other branch that don't exist in the current binary.
This causes coder server to fail at startup, leaving developers
stuck.
The develop script now detects this before starting the server:
1. Connects to postgres (starts temp embedded instance for
built-in postgres, or uses CODER_PG_CONNECTION_URL).
2. Compares DB version against the source's latest migration.
3. If DB is ahead, searches git history for the missing down
SQL files and applies them in a transaction.
4. If git recovery fails (ambiguous versions across branches,
missing files), falls back to resetting the public schema.
Also adds --reset-db and --skip-db-recovery flags.
When edit_files receives multiple files, each file was processed
independently: read, compute edits, write. If file B failed, file A
was already written to disk. The caller got an error but had no way
to know which files were modified.
Split editFile into prepareFileEdit (read + compute, no side
effects) and a write phase. The handler runs all preparations
first and writes only if every file's edits succeed.
A write-phase failure (e.g. disk full) can still leave earlier
files committed. True cross-file atomicity would require
filesystem transactions. The prepare phase catches the common
failure modes: bad paths, search misses, permission errors.
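A minimal sketch of that two-phase shape, with hypothetical types and a
hypothetical `writeFileEdit` counterpart to `prepareFileEdit`:
```go
func applyEdits(files []FileEdit) error {
	prepared := make([]PreparedEdit, 0, len(files))
	for _, f := range files {
		p, err := prepareFileEdit(f) // read + compute, no side effects
		if err != nil {
			return fmt.Errorf("prepare %s: %w", f.Path, err) // nothing written yet
		}
		prepared = append(prepared, p)
	}
	for _, p := range prepared {
		if err := writeFileEdit(p); err != nil { // hypothetical write phase
			return fmt.Errorf("write %s: %w", p.Path, err)
		}
	}
	return nil
}
```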
## Summary
Fixes several bugs in the markdown URL transform that replaces
`localhost` URLs with workspace port-forward URLs in the AI agent chat.
## Bugs Fixed
### 1. URLs without an explicit port produce `NaN` in the subdomain
When an LLM outputs a URL like `http://localhost/path` (no port),
`parsed.port` is the empty string `""`. `parseInt("", 10)` returns
`NaN`, producing a broken URL like:
```
http://NaN--agent--workspace--user.proxy.example.com/path
```
Now defaults to port 80 for HTTP and 443 for HTTPS via the new
`resolveLocalhostPort()` helper.
### 2. Protocol always hardcoded to `"http"`
The `urlTransform` in `AgentDetail.tsx` always passed `"http"` as the
protocol argument, silently discarding the original URL's scheme. This
meant `https://localhost:8443/...` would not get the `s` suffix in the
subdomain. Now extracts the protocol from the parsed URL, matching the
existing behavior of `openMaybePortForwardedURL`.
### 3. `urlTransform` not memoized
The closure was re-created on every render. Wrapped in `useCallback`
with the four primitive dependencies (`proxyHost`, `agentName`,
`wsName`, `wsOwner`).
### 4. Duplicated `localHosts` definition
The localhost detection set was defined separately in both
`AgentDetail.tsx` and `portForward.ts`. Consolidated into a single
shared export from `portForward.ts`.
## Changes
- **`site/src/utils/portForward.ts`**: Export shared `localHosts` set
and new `resolveLocalhostPort()` helper. Update
`openMaybePortForwardedURL` to use both.
- **`site/src/pages/AgentsPage/AgentDetail.tsx`**: Import shared
`localHosts` and `resolveLocalhostPort`. Fix protocol extraction.
Memoize `urlTransform`.
- **`site/src/utils/portForward.jest.ts`**: Add tests for
`resolveLocalhostPort` and `localHosts`. Renamed from `.test.ts` to
`.jest.ts` to match project convention.
Wraps the chat timeline in React's <Profiler> to emit
performance.measure() entries and throttled console.warn for
slow renders. Inert in standard builds, only produces output
with a profiling build.
Refs #23354
The React Compiler (babel-plugin-react-compiler@1.0.0) handles
memoization automatically for all components in the AgentsPage
compiled path. Three memo() wrappers were redundant:
- ChatMessageItem in ConversationTimeline.tsx
- LazyFileDiff in DiffViewer.tsx
- ChatTreeNode in AgentsSidebar.tsx
Also migrate three Context.Provider usages to the React 19
shorthand (<Context value={...}>) and simplify the EmbedContext
export to use the context directly instead of re-exporting
.Provider as an alias.
Linear's MCP server (`mcp.linear.app`) returns `token_type="bearer"`
(lowercase) in its OAuth2 token response but rejects requests that use
the lowercase form in the `Authorization` header. RFC 6750 says the
scheme is case-insensitive, but Linear enforces capital-B `Bearer`.
Confirmed by running the actual Linear MCP OAuth flow end-to-end:
- `Authorization: Bearer <token>` → **42 tools, works**
- `Authorization: bearer <token>` → **401 invalid_token**
This is a one-line fix: normalize any case variant of `bearer` to
`Bearer` before building the `Authorization` header, matching the
behavior of the mcp-go library's own OAuth handler.
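A minimal sketch of the normalization (an illustrative helper, not the
actual patch):
```go
// authorizationHeader always sends the canonical scheme, regardless of
// the case the token response used.
func authorizationHeader(tok *oauth2.Token) string {
	typ := tok.TokenType
	if typ == "" || strings.EqualFold(typ, "bearer") {
		typ = "Bearer" // RFC 6750 says case-insensitive; Linear disagrees
	}
	return typ + " " + tok.AccessToken
}
```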
Depot runners are running out of disk space and blocking builds.
Temporarily switch the build and release jobs from depot runners to
GitHub-hosted runners:
- `ci.yaml` build job: `depot-ubuntu-22.04-8` → `ubuntu-latest`
- `release.yaml` check-perms + release jobs: `depot-ubuntu-22.04-8` →
`ubuntu-latest`
**This is intended to be reverted once depot resolves their disk space
issues.**
> 🤖 This PR was created with the help of Coder Agents, and will be
reviewed by my human. 🧑💻
<img width="684" height="540" alt="image"
src="https://github.com/user-attachments/assets/ccd09873-4640-4a54-b3ca-f740dd50b38d"
/>
## Changes
- Move filter dropdown from top nav bar to inline with the first time
group header (e.g. "Today")
- Remove analytics icon from desktop sidebar nav bar
- Change "View details" to "View usage" in the usage indicator dropdown
- Fix green progress bar visibility in dark mode (`bg-surface-green` →
`bg-content-success`)
- Fix missing space before date in "Resets" text
---
PR generated with Coder Agents
- update some config settings to support "absolute"-style imports by
using a `#/` prefix
- migrate some of the imports in the `WorkspacesPage` to use the new
import style as a proof of concept
because of the change in import sorting behavior this results in, this
diff is already kind of hard to look at, even just from a small
migration for a single page. I think breaking this up into bite-size
pieces isn't gonna be worth the work; it leaves more time for merge
conflicts to accrue, and more times people would likely have to resolve
them.
so I think as far as process for this, I'd like to...
- merge this PR as is, where the config changes are relatively easy to
spot in the haystack, with just enough imports updated to prove that the
config changes are correct
- merge another mega PR after this one which just bites the bullet and
migrates everything else in one fell swoop. it'll probably result in a
ton of merge conflicts for open PRs, but at least it'll only do so once
and then it can be over with.
The @ imports at the bottom of this file are auto-loaded by Claude Code
but silently ignored by other agent runtimes (Coder Agents, Zed, etc.).
Add an explicit fallback so those agents know what to read and when.
After a long requestAnimationFrame pause (e.g. backgrounded tab), the
time delta can be very large, causing the character budget to spike and
bypass smooth rendering entirely. Clamp to 100ms.
Extracted from #23236.
## Problem
MCP servers like Linear (`mcp.linear.app`) require PKCE (RFC 7636) for
their OAuth2 flow. Without it, the token exchange may succeed but the
resulting access token is immediately rejected with a 401
`invalid_token` error when the chat daemon tries to connect to the MCP
server.
This means users can authenticate successfully in the UI (the OAuth
popup completes, `auth_connected` shows `true`), but the model never
receives the MCP tools — they silently fail to load.
### Root cause
The `mcpServerOAuth2Connect` handler was calling
`oauth2Config.AuthCodeURL(state)` without any PKCE parameters
(`code_challenge`, `code_challenge_method`). The callback was calling
`oauth2Config.Exchange(ctx, code)` without a `code_verifier`. The
decoded state from Linear's MCP OAuth endpoint confirms it expected
PKCE with `codeChallengeMethod: "plain"`.
### Investigation
- The chat (`c2c04fc5-5622-4b71-a5a9-80508e86f78e`) had the Linear MCP
server ID in `mcp_server_ids`
- `auth_connected: true` (token row exists in DB)
- No "expired" or "empty token" warnings in logs
- Server log showed: `skipping MCP server due to connection failure ...
error="initialize: transport error: request failed with status 401:
{"error":"invalid_token","error_description":"Missing or invalid access
token"}"`
- Decoding Linear's OAuth state revealed PKCE was expected
## Changes
- Generate a PKCE `code_verifier` during the OAuth2 connect step using
`oauth2.GenerateVerifier()` and store it in a cookie scoped to the
callback path (round-trip sketched below)
- Include `code_challenge` (S256) in the authorization redirect URL via
`oauth2.S256ChallengeOption()`
- Pass the `code_verifier` during the token exchange in the callback via
`oauth2.VerifierOption()`
- Fix a nil-pointer guard on `api.HTTPClient` in the callback
- Add tests verifying PKCE parameters are sent correctly and backwards
compatibility when no verifier cookie is present
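A minimal sketch of the PKCE round-trip using the `golang.org/x/oauth2`
helpers named above (cookie plumbing and error handling omitted):
```go
// Connect step: generate the verifier and include the S256 challenge
// in the authorization redirect URL.
func connectURL(cfg *oauth2.Config, state string) (authURL, verifier string) {
	verifier = oauth2.GenerateVerifier()
	return cfg.AuthCodeURL(state, oauth2.S256ChallengeOption(verifier)), verifier
}

// Callback step: present the stored verifier during the token exchange.
func callbackExchange(ctx context.Context, cfg *oauth2.Config, code, verifier string) (*oauth2.Token, error) {
	return cfg.Exchange(ctx, code, oauth2.VerifierOption(verifier))
}
```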
feat: add deep-review skill for multi-reviewer code review
Add a skill to .agents/skills/deep-review/ that orchestrates parallel
code reviews from domain-specific reviewers (test auditor, security
reviewer, concurrency reviewer, etc.), cross-checks their findings for
contradictions and convergence, then posts a single structured GitHub
review with inline comments.
Each reviewer reads only its own methodology file (roles/{name}.md) to
preserve independent perspectives. The orchestrator cross-checks across
all findings before posting, tracing combined consequences and
calibrating severity in both directions.
Key capabilities: re-review gate for tracking prior findings across
rounds, consequence-based severity (P0-P4), quoting discipline
separating reviewer evidence from orchestrator judgment, and author
independence (same rigor regardless of who wrote the PR).
- Detect `-testify.m` sub-test filtering in `SetupSuite` and skip the `Accounting` check.
> 🤖 This PR was created with the help of Coder Agents, and was reviewed by my human. 🧑💻
## Summary
Stabilizes the /agents chat viewport so users can read older messages
without being yanked to the bottom when new content arrives.
## Architecture
Replaces the implicit scroll-follow behavior with
**ResizeObserver-driven
scroll anchoring**:
- **`autoScrollRef`** is the single source of truth. User scrolling away
from bottom turns it off; scrolling back near bottom or clicking the
button turns it back on.
- A **content ResizeObserver** on an inner wrapper detects transcript
growth. When auto-scroll is on, it re-pins to bottom via double-RAF
(waiting for React commit + layout to settle). When off, it
compensates `scrollTop` by the height delta to preserve the reading
position. Sign-aware for both Chrome-style negative and Firefox-style
positive `flex-col-reverse` scrollTop conventions.
- Compensation is **skipped during pagination** (older messages prepend
into the overflow direction; the browser preserves scrollTop) and
**during reflow** from width changes.
- A **container ResizeObserver** re-pins to bottom after viewport
resizes
(composer growth, panel changes) when auto-scroll is on.
- **`isRestoringScrollRef`** guards against feedback loops from
programmatic scroll writes. The smooth-scroll guard stays active
until the scroll handler detects arrival at bottom.
## Files changed
- **AgentDetailView.tsx**: Rewrote `ScrollAnchoredContainer` with the
new approach.
- **AgentDetailView.stories.tsx**: Refactored `ScrollToBottomButton`
story
scroll helpers into shared utilities.
## Behavior
- **At bottom + new content**: stays pinned, button hidden.
- **Scrolled up + new content**: reading position preserved, no jump.
- **Viewport resize while pinned**: re-pins to bottom.
- Scroll-to-bottom button and smooth scroll still work.
Adds a `propose_plan` tool that presents a workspace markdown file as a
dedicated plan card in the agent UI.
The workflow is: the agent uses `write_file`/`edit_files` to build a
plan file (e.g. `/home/coder/PLAN.md`), then calls `propose_plan(path)`
to present it. The backend reads the file via `ReadFile` and the
frontend renders it as an expanded markdown preview card.
**Backend** (`coderd/x/chatd/chattool/proposeplan.go`): new tool
registered as root-chat-only. Validates `.md` suffix, requires an
absolute path, reads raw file content from the workspace agent. Includes
1 MiB size cap.
**Frontend** (`site/src/components/ai-elements/tool/`): dedicated
`ProposePlanTool` component with `ToolCollapsible` + `ScrollArea` +
`Response` markdown renderer, expanded by default. Custom icon
(`ClipboardListIcon`) and filename-based label.
**System prompt** (`coderd/x/chatd/prompt.go`): added `<planning>`
section guiding the agent to research → write plan file → iterate → call
`propose_plan`.
OpenAI Responses follow-up turns were replaying full assistant/tool
history even when `store=true`, which breaks after reasoning +
provider-executed `web_search` output.
This change persists the OpenAI response ID on assistant messages, then
in `coderd/x/chatd` switches `store=true` follow-ups to
`previous_response_id` chaining with a system + new-user-only prompt.
`store=false` and missing-ID cases still fall back to manual replay.
It also updates the fake OpenAI server and integration coverage for the
chaining contract, and carries the rebased path move to `coderd/x/chatd`
plus the migration renumber needed after rebasing onto `main`.
## Summary
- use React Query's `dataUpdatedAt` as the remote diff cache
invalidation token instead of a component-local counter
- keep the `@pierre/diffs` cache key stable across remounts without a
custom hashing implementation
- preserve targeted coverage for the cache-key helper used by the
/agents remote diff viewer
## Testing
- `cd site && pnpm exec biome check
src/pages/AgentsPage/components/DiffViewer/RemoteDiffPanel.tsx
src/pages/AgentsPage/components/DiffViewer/diffCacheKey.ts
src/pages/AgentsPage/components/DiffViewer/diffCacheKey.test.ts`
- `cd site && pnpm exec vitest run
src/pages/AgentsPage/components/DiffViewer/diffCacheKey.test.ts
--project=unit`
- `cd site && pnpm exec tsc -p .`
https://github.com/user-attachments/assets/a482ef45-402a-4d86-af59-b1526b2ce3e2
## Summary
Redesigns the **Default Autostop** section on the `/agents` settings
page to clarify that it is a fallback for chat-linked workspaces whose
templates do not define their own autostop policy. Template-level
settings always take priority — this is a backstop, not an override.
## Changes
### UX
- Renamed to **Workspace Autostop Fallback** with clearer description
- Replaced always-visible duration field (confusing `0` in an hours box)
with a **toggle-to-enable** pattern matching the Virtual Desktop section
- Toggle ON auto-saves with a 1-hour default; toggle OFF auto-saves with
0
- Save button is always visible when the toggle is on but disabled until
the user changes the duration value
- Per-section disabled flags — toggling autostop no longer freezes the
Virtual Desktop switch or prompt textareas during the save round-trip
### Reliability
- `onError` rollback on toggle auto-saves so the UI snaps back to server
truth on failure
- Stateful mocks in Storybook stories to prevent race conditions from
instant mock resolution
### Accessibility
- Added `aria-label="Autostop duration"` to the DurationField input
- Updated `DurationField` component to merge external `inputProps` with
internal ones (preserves `step: 1`)
### Stories
- Updated all existing autostop stories for the new toggle-based flow
- Added `DefaultAutostopToggleOff` — tests disabling from an enabled
state
- Added `DefaultAutostopSaveDisabled` — verifies Save button is visible
but disabled when no duration change
---
PR generated with Coder Agents
## Problem
`TestConnectAll_MultipleServers` flakes with:
```
net/http: HTTP/1.x transport connection broken: http: CloseIdleConnections called
```
Each MCP client connection implicitly uses `http.DefaultTransport`. When
`httptest.Server.Close()` runs during parallel test cleanup, it calls
`CloseIdleConnections` on `http.DefaultTransport`, breaking in-flight
connections from other goroutines or parallel tests sharing that
transport.
## Fix
Clone the default transport for each MCP connection via
`http.DefaultTransport.(*http.Transport).Clone()`, passed through
`WithHTTPBasicClient` (StreamableHTTP) and `WithHTTPClient` (SSE). This
scopes idle connection cleanup to a single MCP server so it cannot
disrupt unrelated connections.
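A sketch of the per-connection clone; the actual option plumbing lives in `WithHTTPBasicClient`/`WithHTTPClient`, and the helper names here are illustrative:

```go
package mcpclient

import "net/http"

// newTransport gives each MCP client its own copy of the default
// transport, so CloseIdleConnections from one httptest.Server
// teardown cannot reach connections owned by other tests.
func newTransport() *http.Transport {
	return http.DefaultTransport.(*http.Transport).Clone()
}

// newHTTPClient wraps the cloned transport in a client that is then
// passed through the MCP connect options.
func newHTTPClient() *http.Client {
	return &http.Client{Transport: newTransport()}
}
```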
Fixes coder/internal#1420
## Problem
During OAuth2 auto-discovery for MCP servers, the callback URL
registered with the remote authorization server via Dynamic Client
Registration (RFC 7591) contained the literal string `{id}` instead of
the actual config UUID:
```
https://coder.example.com/api/experimental/mcp/servers/{id}/oauth2/callback
```
This happened because the discovery and registration occurred **before**
the database insert that generates the ID. When the user later initiated
the OAuth2 connect flow, the redirect URL used the real UUID, causing
the authorization server to reject it with:
> The provided redirect URIs are not approved for use by this
authorization server
## Fix
Restructure the auto-discovery flow in `createMCPServerConfig` to:
1. **Insert** the MCP server config first (with empty OAuth2 fields) to
get the database-generated UUID
2. **Build** the callback URL with the actual UUID
3. **Perform** OAuth2 discovery and dynamic client registration with the
correct URL
4. **Update** the record with the discovered OAuth2 credentials
5. **Clean up** the record if discovery fails
## Testing
Added regression test
`TestMCPServerConfigsOAuth2AutoDiscovery/RedirectURIContainsRealConfigID`
that:
- Stands up mock auth + MCP servers
- Captures the `redirect_uris` sent during dynamic client registration
- Asserts the URI contains the real config UUID, not `{id}`
- Verifies the full callback path structure
All existing MCP server config tests continue to pass.
Fall back to the configured model name in PR Insights when a model config
has a blank display name.
This updates both the by-model breakdown and recent PR rows, and adds a
regression test for blank display names.
replace_all in fuzzy mode (passes 2 and 3 of fuzzyReplace) only
replaced the first match. seekLines returned the first match,
spliceLines replaced one range, and there was no loop.
Extract fuzzy pass logic into fuzzyReplaceLines which:
- Returns a 3-tuple (result, matched, error) for clean caller flow
- When replaceAll is true, collects all non-overlapping matches
then applies replacements from last to first to preserve indices
- When replaceAll is false with multiple matches, returns an error
Add test cases for replace_all with fuzzy trailing whitespace and
fuzzy indent matching.
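A sketch of the last-to-first splice that keeps earlier match offsets valid (types simplified; the real `fuzzyReplaceLines` also carries the matched/error results):

```go
package chattool

// replaceMatches splices repl over each match, iterating from the
// last match to the first so earlier start offsets stay valid.
// starts must be sorted ascending and non-overlapping.
func replaceMatches(lines []string, starts []int, matchLen int, repl []string) []string {
	for i := len(starts) - 1; i >= 0; i-- {
		s := starts[i]
		out := make([]string, 0, len(lines)-matchLen+len(repl))
		out = append(out, lines[:s]...)
		out = append(out, repl...)
		out = append(out, lines[s+matchLen:]...)
		lines = out
	}
	return lines
}
```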
Both write_file and edit_files use atomic writes (write to temp
file, then rename). Since rename operates on directory entries, it
replaces symlinks with regular files instead of writing through
the link to the target.
Add resolveSymlink() that uses afero.Lstater/LinkReader to resolve
symlink chains (up to 10 levels) before the atomic write. Both
writeFile and editFile resolve the path before any filesystem
operations, matching the behavior of 'echo content > symlink'.
Gracefully no-ops on filesystems that don't support symlinks (e.g.
MemMapFs used in existing tests).
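Roughly how such a resolver looks with afero's optional `Lstater`/`LinkReader` interfaces; a sketch, not the exact helper:

```go
package chattool

import (
	"os"
	"path/filepath"

	"github.com/spf13/afero"
)

// resolveSymlink follows up to 10 levels of links before the atomic
// write, so the final rename replaces the link target rather than
// the link itself. Error handling is simplified.
func resolveSymlink(fs afero.Fs, path string) string {
	lstater, ok1 := fs.(afero.Lstater)
	reader, ok2 := fs.(afero.LinkReader)
	if !ok1 || !ok2 {
		return path // no symlink support (e.g. MemMapFs); no-op
	}
	for i := 0; i < 10; i++ {
		info, lstatCalled, err := lstater.LstatIfPossible(path)
		if err != nil || !lstatCalled || info.Mode()&os.ModeSymlink == 0 {
			return path
		}
		target, err := reader.ReadlinkIfPossible(path)
		if err != nil {
			return path
		}
		if !filepath.IsAbs(target) {
			target = filepath.Join(filepath.Dir(path), target)
		}
		path = target
	}
	return path
}
```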
## Summary
Adds a user-facing MCP server configuration panel to the chat input
toolbar. Users can toggle which MCP servers provide tools for their chat
sessions, and authenticate with OAuth2 servers via popup windows.
## Changes
### New Components
- **`MCPServerPicker`** (`MCPServerPicker.tsx`): Popover-based picker
that appears in the chat input toolbar next to the model selector. Shows
all enabled MCP servers with toggles.
- **`MCPServerPicker.stories.tsx`**: 13 Storybook stories covering all
states.
### Availability Policies
Respects the admin-configured availability for each server:
- **`force_on`**: Always active, toggle disabled, lock icon shown. User
cannot disable.
- **`default_on`**: Pre-selected by default, user can opt out via
toggle.
- **`default_off`**: Not selected by default, user must opt in via
toggle.
### OAuth2 Authentication
For servers with `auth_type: "oauth2"`:
- Shows auth status (connected/not connected)
- "Connect to authenticate" link opens a popup window to
`/api/experimental/mcp/servers/{id}/oauth2/connect`
- Listens for `postMessage` with `{type: "mcp-oauth2-complete"}` from
the callback page
- Same UX pattern as external auth on the Create Workspace screen
### Integration Points
- `AgentChatInput`: MCP picker appears in the toolbar after the model
selector
- `AgentDetail`: Manages MCP selection state, initializes from
`chat.mcp_server_ids` or defaults
- `AgentDetailView` / `AgentDetailContent`: Props plumbed through to
input
- `AgentCreatePage` / `AgentCreateForm`: MCP selection for new chats
- `mcp_server_ids` now sent with `CreateChatMessageRequest` and
`CreateChatRequest`
### Helper
- `getDefaultMCPSelection()`: Computes default selection from
availability policies (`force_on` + `default_on`)
## Storybook Stories
| Story | Description |
|-------|-------------|
| NoServers | No servers - picker hidden |
| AllDisabled | All disabled servers - picker hidden |
| SingleForceOn | Force-on server with locked toggle |
| SingleDefaultOnNoAuth | Default-on with no auth required |
| SingleDefaultOff | Optional server not selected |
| OAuthNeedsAuth | OAuth2 server needing authentication |
| OAuthConnected | OAuth2 server already connected |
| MixedServers | Multiple servers with mixed availability/auth |
| AllConnected | All OAuth2 servers authenticated |
| Disabled | Picker in disabled state |
| WithDisabledServer | Disabled servers filtered out |
| AllOptedOut | All toggled off except force_on |
| OptionalOAuthNeedsAuth | Optional OAuth2 needing auth |
All map iterations in ConvertState now use sorted helpers instead of
ranging over Go maps directly. Previously only coder_env and
coder_script were sorted (via sortedResourcesByType). This extends
the pattern to coder_agent, coder_devcontainer, coder_agent_instance,
coder_app, coder_metadata, coder_external_auth, and the main
resource output list.
Also fixes generate.sh writing version.txt to the wrong directory
(resources/ instead of testdata/), which caused the Makefile version
check to silently desync and trigger unnecessary regeneration.
Adds TestConvertStateDeterministic that calls ConvertState 10 times
per fixture and asserts byte-identical JSON output without any
post-hoc sorting.
## Context
`./scripts/develop.sh` was failing to build in my dogfood workspace
with:
```
src/testHelpers/handlers.ts(346,35): error TS2345: Argument of type 'NonSharedBuffer'
is not assignable to parameter of type 'ArrayBuffer'.
Type 'Buffer<ArrayBuffer>' is missing the following properties from type 'ArrayBuffer':
maxByteLength, resizable, resize, detached, and 2 more.
```
## Alternatives considered
**`fileBuffer.buffer`** — `.buffer` gives you the underlying
`ArrayBuffer`, but Node pools small buffers into a shared 8 KB slab. A
`Buffer.from("hello")` has `byteOffset: 1472` and `.buffer.byteLength:
8192` — passing `.buffer` to a `Response` sends all 8,192 bytes instead
of 5. It happens to work for `readFileSync` (dedicated allocation,
offset 0), but breaks silently if someone refactors how the buffer is
constructed.
**`fileBuffer.buffer.slice(byteOffset, byteOffset + byteLength)`** — the
safe version of the above. Always correct, but unnecessarily complex.
**`new HttpResponse(fileBuffer)`** (chosen) — `HttpResponse` extends
`Response`, whose constructor accepts `BodyInit` which includes
`Uint8Array`. When you pass a typed array view, `Response` reads only
the bytes within that view (respecting `byteOffset`/`byteLength`), so
it's safe regardless of pooling. `Buffer` is a `Uint8Array` subclass, so
this just works:
```
pooled = Buffer.from("hello") → byteOffset: 1472, .buffer: 8192 bytes
new Response(pooled.buffer) → body: 8192 bytes ✗
new Response(pooled) → body: 5 bytes ✓
```
_Disclaimer: produced by Claude Opus 4.6, reviewed by me._
**This is a breaking change.** Users who do not have the `owner` or sitewide `auditor` role will no longer be able to view interceptions.
Regular users should not need to view this information; in fact, a malicious insider could use it to learn what information we do and don't track, then exfiltrate data or perform actions unobserved.
---
Changed authorization for AI Bridge interception-related operations from system-level permissions to resource-specific permissions. The following functions now authorize against `rbac.ResourceAibridgeInterception` instead of `rbac.ResourceSystem`:
- `ListAIBridgeTokenUsagesByInterceptionIDs`
- `ListAIBridgeToolUsagesByInterceptionIDs`
- `ListAIBridgeUserPromptsByInterceptionIDs`
Updated RBAC roles to grant AI Bridge interception permissions:
- **User/Member roles**: Can create and update AI Bridge interceptions but cannot read them back
- **Service accounts**: Same create/update permissions without read access
- **Owners/Auditors**: Retain full read access to all interceptions
Removed system-level authorization bypass in `populatedAndConvertAIBridgeInterceptions` function, allowing proper resource-level authorization checks.
Updated tests to reflect the new permission model where members cannot view AI Bridge interceptions, even their own, while owners and auditors maintain full visibility.
_Disclaimer: initially produced by Claude Opus 4.6, heavily modified and reviewed by me._
Closes https://github.com/coder/internal/issues/1360
Adds a new `/api/v2/aibridge/sessions` API which returns "sessions".
Sessions, as defined in the [RFC](https://www.notion.so/coderhq/AI-Bridge-Sessions-Threads-2ccd579be59280f28021d3baf7472fbe?source=copy_link), are a set of interceptions logically grouped by a session key issued by the client.
The API design for this endpoint was done in [this doc](https://github.com/coder/internal/issues/1360).
If the client has not provided a session key, we fall back to the thread root ID, and if that's not present we use the interception's own ID (i.e. a session of a single interception - which is effectively what we show currently in our `/api/v2/aibridge/interceptions` API).
The SQL query looks gnarly but it's relatively simple, and seems to perform well (~200ms) even when I import dogfood's `aibridge_*` tables into my workspace. If we need to improve performance on this later we can investigate materialized views, perhaps, but for now I don't think it's warranted.
---
_The PR looks large but it's got a lot of generated code; the actual changes aren't huge._
## Summary
- replace dynamic dayjs() date generation in LicenseCard stories with
fixed deterministic timestamps
- preserve story behavior while preventing day-over-day visual drift in
Chromatic
- use shared constants for expired and future date scenarios
## Issue context
On `dev.coder.com`, users could successfully log in, briefly see the web
UI, and then get redirected back to `/login`.
We traced the most reliable repro to viewing Tracy's workspaces on the
`/workspaces` page. That page eagerly issues authenticated per-row
requests such as:
- `POST /api/v2/authcheck`
- `GET /api/v2/workspacebuilds/:workspacebuild/parameters`
One confirmed failing request was for Tracy's workspace
`nav-scroll-fix-1f6b`:
- route: `GET
/api/v2/workspacebuilds/f2104ae6-7d53-457c-a8df-de831bee76db/parameters`
- build owner/workspace: `tracy/nav-scroll-fix-1f6b`
The failing response body was:
- message: `An internal error occurred. Please try again or contact the
system administrator.`
- detail: `Internal error fetching API key by id. fetch object: pq:
password authentication failed for user "coder"`
That showed the request was not actually unauthorized. The server hit an
internal database/authentication problem while resolving the session API
key. The underlying issue was that DB password rotation had been
enabled; it has since been disabled.
However, the logout cascade happened because:
1. `APIKeyFromRequest()` returned `ok=false` for both genuine auth
failures and internal backend failures.
2. `ValidateAPIKey()` wrapped every `!ok` result as `401 Unauthorized`.
3. `RequireAuth.tsx` signs the user out on any `401` response.
So a transient backend/database failure was being misreported as an auth
failure, which made the client forcibly log the user out.
A useful extra clue was that the installed PWA did not repro. The PWA
starts on `/agents`, which avoids the `/workspaces` request fan-out.
That helped narrow the problem to the eager authenticated requests on
the workspace list rather than to cookies or the login flow itself.
## What changed
This PR now fixes the bug without changing the exported
`APIKeyFromRequest()` surface:
- `ValidateAPIKey()` now uses a new internal helper that returns a typed
`ValidateAPIKeyError`
- the exported `APIKeyFromRequest()` helper remains compatible for
existing callers like `userauth.go`
- internal API-key lookup failures are classified as `500 Internal
Server Error` plus `Hard: true`
- internal `UserRBACSubject()` failures now return `500 Internal Server
Error` instead of `401 Unauthorized`
- a focused regression test verifies that an internal `GetAPIKeyByID`
failure surfaces as `500`
This removes the brittle message-based classification and makes the
internal-auth-failure path robust for all API-key lookup failures
handled by auth middleware.
## What
Adds per-user per-model auto-compaction threshold overrides. Users can
now customize the percentage of context window usage that triggers chat
compaction, independently for each enabled model.
## Why
The compaction threshold was previously only configurable at the
deployment level (`chat_model_configs.compression_threshold`). Different
users have different preferences — some want aggressive compaction to
keep costs low, others prefer higher thresholds to retain more context.
This gives users control without requiring admin intervention.
## Architecture
**Storage:** Reuses the existing `user_configs` table (no migration
needed). Overrides are stored as key/value pairs with keys shaped
`chat_compaction_threshold:<modelConfigID>` and integer percent values.
**API:** Three new experimental endpoints under
`/api/experimental/chats/config/`:
- `GET /user-compaction-thresholds` — list all overrides for the current
user
- `PUT /user-compaction-thresholds/{modelConfig}` — upsert an override
(validates model exists and is enabled, validates 0–100 range)
- `DELETE /user-compaction-thresholds/{modelConfig}` — clear an override
(idempotent)
**Runtime resolution:** In `coderd/chatd/chatd.go`, a new
`resolveUserCompactionThreshold()` helper runs at the start of each chat
turn (inside `runChat()`), after the model config is resolved but before
`CompactionOptions` is built. If a valid override exists, it replaces
`modelConfig.CompressionThreshold`. The threshold source
(`user_override` vs `model_default`) is logged with each compaction
event.
**Precedence:** `effectiveThreshold = userOverride ??
modelConfig.CompressionThreshold`
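A minimal Go sketch of that precedence, assuming the override is loaded as a nullable percentage; the real `resolveUserCompactionThreshold()` also fetches and validates the `user_configs` row:

```go
package chatd

// effectiveThreshold applies the user override when present and in
// range, otherwise the model config's default. Names illustrative.
func effectiveThreshold(userOverride *int32, modelDefault int32) int32 {
	if userOverride != nil && *userOverride >= 0 && *userOverride <= 100 {
		return *userOverride
	}
	return modelDefault
}
```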
**UI:** New "Context Compaction" subsection in the Agents → Settings →
Behavior tab, placed after Personal Instructions. Shows one row per
enabled model with the system default, a number input for the override,
and Save/Reset controls.
## Testing
- 9 API subtests covering CRUD, validation (boundary values 0/100,
out-of-range rejection), upsert behavior, idempotent delete, user
isolation, and non-existent model config
- 4 dbauthz tests (16 scenarios) verifying `ActionReadPersonal` /
`ActionUpdatePersonal` on all query methods
- 4 Storybook stories with play functions (Default, WithOverrides,
Loading, Error)
<details>
<summary>Implementation plan</summary>
### Phase 1 — Tests
- Backend API tests in `coderd/chats_test.go` (9 subtests)
- Database auth wrapper tests in
`coderd/database/dbauthz/dbauthz_test.go` (4 methods)
- Frontend stories in `UserCompactionThresholdSettings.stories.tsx` (4
stories)
### Phase 2 — Backend preference surface
- 4 SQL queries in `coderd/database/queries/users.sql` (list, get,
upsert, delete)
- `make gen` to propagate into generated artifacts
- Auth/metrics wrappers in dbauthz and dbmetrics
- SDK types and client methods in `codersdk/chats.go`
- HTTP handlers and routes in `coderd/chats.go` and `coderd/coderd.go`
- Key prefix constant shared between handlers and runtime
### Phase 3 — Runtime override
- `resolveUserCompactionThreshold()` helper in `coderd/chatd/chatd.go`
- Override injection in `runChat()` before building `CompactionOptions`
- `threshold_source` field added to compaction log
### Phase 4 — Settings UI
- API client methods and React Query hooks in `site/src/api/`
- `UserCompactionThresholdSettings` component extracted from
`SettingsPageContent`
- Per-model mutation tracking (only the active row disables during save)
- 100% warning, "System default" label, helpful empty state copy
### Phase 5 — Refactor and review fixes
- Consolidated key prefix constant in `codersdk`
- Explicit PUT range validation (not just struct tags)
- GET handler gracefully skips malformed rows instead of 500
- Boundary value, upsert, and non-existent model config tests
- UX improvements: per-model mutation state, aria-live on errors
</details>
## Problem
When adding an external MCP server with `auth_type=oauth2`, admins
currently must manually provide:
- `oauth2_client_id`
- `oauth2_client_secret`
- `oauth2_auth_url`
- `oauth2_token_url`
This requires the admin to manually register an OAuth2 client with the
external MCP server's authorization server first — a friction-heavy
process that contradicts the MCP spec's vision of plug-and-play
discovery.
## Solution
When an admin creates an MCP server config with `auth_type=oauth2` and
omits the OAuth2 fields, Coder now automatically discovers and registers
credentials following the MCP authorization spec:
1. **Protected Resource Metadata (RFC 9728)** — Fetches
`/.well-known/oauth-protected-resource` from the MCP server to discover
its authorization server. Falls back to probing the server URL for a
`WWW-Authenticate` header with a `resource_metadata` parameter.
2. **Authorization Server Metadata (RFC 8414)** — Fetches
`/.well-known/oauth-authorization-server` from the discovered auth
server to find all endpoints.
3. **Dynamic Client Registration (RFC 7591)** — Registers Coder as an
OAuth2 client at the auth server's registration endpoint, obtaining a
`client_id` and `client_secret` automatically.
The discovered/generated credentials are stored in the MCP server
config, and the existing per-user OAuth2 connect flow works unchanged.
### Backward compatibility
- **Manual config still works**: If all three fields
(`oauth2_client_id`, `oauth2_auth_url`, `oauth2_token_url`) are
provided, the existing behavior is unchanged.
- **Partial config is rejected**: Providing some but not all fields
returns a clear error explaining the two options.
- **Discovery failure is clear**: If auto-discovery fails, the error
message explains what went wrong and suggests manual configuration.
## Changes
- **New package `coderd/mcpauth`** — Self-contained discovery and DCR
logic with no `codersdk` dependency
- **Modified `coderd/mcp.go`** — `createMCPServerConfig` handler now
attempts auto-discovery when OAuth2 fields are omitted
- **Tests** — Unit tests for discovery (happy path, WWW-Authenticate
fallback, no registration endpoint, registration failure) and
`parseResourceMetadataParam` helper
The slices package provides type-safe generic replacements for the
old typed sort convenience functions. The codebase already uses
slices.Sort in 43 call sites; this finishes the migration for the
remaining 29.
- sort.Strings(x) -> slices.Sort(x)
- sort.Float64s(x) -> slices.Sort(x)
- sort.StringsAreSorted(x) -> slices.IsSorted(x)
Consolidates invocations of `coderdtest.New` to a single shared instance per
parent for the following tests:
- `TestOAuth2ClientMetadataValidation`
- `TestOAuth2ClientNameValidation`
- `TestOAuth2ClientScopeValidation`
- `TestOAuth2ClientMetadataEdgeCases`
> 🤖 This PR was created with the help of Coder Agents, and was
reviewed by my human. 🧑💻
The test-storybook target uses @vitest/browser-playwright with
Chromium but never installs the browser binaries. pnpm install
only fetches the npm package; the actual browser must be
downloaded separately via playwright install. This mirrors what
test-e2e already does.
Consolidates 6 tests that spun up separate coderdtest instances per sub-test into a single shared instance per parent.
> 🤖 This PR was created with the help of Coder Agents, and has been
reviewed by my human. 🧑💻
TestInterruptAutoPromotionIgnoresLaterUsageLimitIncrease still relied on
wall-clock polling after the acquire loop moved to a mock clock, so it
could assert before chatd finished its asynchronous cleanup and
auto-promotion work.
Wait on explicit request-start signals and on the server's in-flight
chat work before asserting the intermediate and final database state.
This keeps the test synchronized with the actual processor lifecycle
instead of scheduler timing.
Closes https://github.com/coder/internal/issues/1406
Removes unused tools from dogfood Dockerfile:
- Go tools `moq`, `go-swagger`, `goreleaser`, `goveralls`, `kind`,
`helm-docs`, `gcr-cleaner-cli`
- curl-installed `cloud_sql_proxy`, `dive`, `docker-credential-gcr`, `grype`,
`kube-linter`, `stripe` CLI, `terragrunt`, `yq` v3, GoLand 2021.2, ANTLR v4 jar
- apt packages `cmake`, `google-cloud-sdk-datastore-emulator`, `graphviz`, `packer`
> 🤖 This PR was created with the help of Coder Agents, and was reviewed by my human. 🧑💻
The tool descriptions pushed agents toward backgrounding anything over
5 seconds, including builds, tests, and installs where you actually
want to wait for the result. This led to unnecessary process_output
round-trips and missed the foreground timeout-to-reattach workflow
entirely.
Reframe background mode as the exception (persistent processes with
no natural exit) and foreground with an appropriate timeout as the
default. Replace "background process" with "tracked process" in
process_output, process_list, and process_signal since they work on
all tracked processes regardless of how they were started.
- Moves `coderd/chatd/`, `coderd/gitsync/`, `enterprise/coderd/chatd/`
under `x/` parent directories to signal instability
- Adds `Experimental:` glue code comments in `coderd/coderd.go`
> 🤖 This PR was created with the help of Coder Agents, and was
reviewed by my human. 🧑💻
- Changes all 41 chat method receivers in `codersdk/chats.go` from
`*Client` to `*ExperimentalClient` to ensure that callers are aware that
these reference potentially unstable `/api/experimental` endpoints.
> 🤖 This PR was created with the help of Coder Agents, and has been
reviewed by my human. 🧑💻
## Problem
When the Coder chat UI is embedded in a VS Code webview, the session
token is set via the Coder-Session-Token header for HTTP requests.
However, browsers cannot attach custom headers to WebSocket connections,
and VS Code Electron webview environment does not support cookies set
via Set-Cookie from iframe origins. This causes all chat WebSocket
connections to fail with authorization errors.
## Solution
Pass the session token as a coder_session_token query parameter on all
chat-related WebSocket connections. The backend already accepts this
parameter (see APITokenFromRequest in coderd/httpmw/apikey.go).
The token is only included when API.getSessionToken() returns a value,
which only happens in the embed bootstrap flow. Normal browser sessions
use cookies and are unaffected.
> Built with [Coder Agents](https://coder.com/agents)
The prompt told the model to "describe the primary intent" and gave
only generic examples, so it stripped PR numbers, repo names, and
other distinguishing details. Added explicit GOOD/BAD examples to
steer away from generic titles like "Review pull request changes".
Also removed "no special characters" which prevented # and / in
identifiers.
- Guard both `onSettled` callbacks in
`archiveAndDeleteMutation.mutate()` with `shouldNavigateAfterArchive()`,
which checks whether the user is still viewing the archived chat (or a
sub-agent of it) before calling `navigate("/agents")`
- Extract `shouldNavigateAfterArchive` into `agentWorkspaceUtils.ts`
with 6 unit test cases covering: direct match, different chat, no active
chat, sub-agent of archived parent, sub-agent of different parent, and
cache-cleared fallback
- Look up the active chat's `root_chat_id` from the per-chat query cache
(stable across WebSocket eviction of sub-agents) to handle the sub-agent
case
> 🤖 This PR was created with the help of Coder Agents, and has been
reviewed by my human. 🧑💻
Bumps ubuntu from `3ba65aa` to `ce4a593`.
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps rust from `7d37016` to `f7bf1c2`.
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This PR changes the agents desktop resolution from 1366x768 to 1920x1080.
Anthropic requires that the resolution of desktop screenshots fit in
1,150,000 total pixels, so we downscale screenshots to 1280x720
before sending them to the LLM provider.
Resolution scaling was already implemented, but our code didn't exercise
it. The resolution bump showed that there were some bugs in the scaling
logic - this PR fixes these bugs too.
The "Task app is not ready to accept input" error occurs when the
agent responds successfully but its status is not "stable" (e.g.
"running"). This is a state conflict, not a gateway error. 502 was
semantically wrong because the gateway communication succeeded.
409 Conflict is correct because the request conflicts with the
agent's current state. This is consistent with how
authAndDoWithTaskAppClient already returns 409 for pending,
initializing, and paused agent states.
## Problem
The `UsageUserDrillIn` play function in
`AgentSettingsPageView.stories.tsx`
flakes in Chromatic (noticed in #23282). After clicking a user row to
drill
into the detail view, sync assertions fire before React finishes the
state
transition — element not found.
<img width="1110" height="649" alt="image"
src="https://github.com/user-attachments/assets/8b5c36c2-09c4-4dd6-a280-ab6379c1464e"
/>
### Root cause
The play function clicks "Alice Liddell" and then waits with
`findByText("Alice Liddell")` before asserting on detail-view content.
But
"Alice Liddell" appears in **both** the list row and the detail header,
so
`findByText` resolves immediately against the stale list-row text that
is
still in the DOM. The same is true for `"@alice"` — `UserRow` renders
`@${user.username}` as a subtitle in the list, and `AvatarData` renders
it
again in the detail view.
### Fix
Gate on `"User ID: ..."` instead — text that **only** renders in the
detail
panel. Once it is in the DOM, the detail view is fully mounted and all
sync
assertions are safe.
Applied to both `UsageUserDrillIn` and `UsageUserDrillInAndBack`, which
had
the same issue.
## Problem
`/agents/analytics` showed an infinite loading spinner. The browser
devtools revealed repeated requests to the chat cost summary endpoint
with `start_date` and `end_date` shifting by a few milliseconds on each
request.
`AgentAnalyticsPage` called `createDateRange(now)` on every render. When
`now` is not passed (production), `createDateRange` falls through to
`dayjs()`, which produces a new millisecond-precision timestamp each
time. Those timestamps became part of the React Query key via
`chatCostSummary()`, so every render created a new query identity, fired
a new fetch, state-updated, re-rendered, and the cycle repeated. The
page never left the loading branch because no query result was ever
observed for the `current` key before it changed.
The same pattern existed in `InsightsContent`, where
`timeRangeToDates()` called `dayjs()` on every render and fed the result
into `prInsights()`.
Storybook didn't catch this because stories pass a fixed `now` prop,
keeping the date range stable.
## Fix
Anchor the date window once using `useState`'s lazy initializer, then
derive `start_date`/`end_date` from the stable anchor during render — no
`useEffect`, no memoization for correctness, just stable input → stable
query key.
- **`AgentAnalyticsPage`**: `const [anchor] = useState<Dayjs>(() =>
dayjs())`, then `createDateRange(now ?? anchor)`. The `now` prop still
takes priority so Storybook snapshots remain deterministic.
- **`InsightsContent`**: Collapses `timeRange` and its anchor into a
single `TimeRangeSelection` state object. A fresh anchor is captured
only when the user changes the selected range (event handler), not on
render. Clicking the already-selected range is a no-op.
Continuation of https://github.com/coder/coder/pull/23067
Add filtering to the paginated org member endpoint (pretty much the same
as what I did in the previous PR with group members, except there I also
had to add pagination since it was missing).
## What
Replace the hardcoded 30-day date window on the Agents Settings Usage
page (`/agents/settings/usage`) with an interactive date-range picker.
## Why
The usage page previously showed a static 30-day lookback with no way
for admins to adjust the time window. The backend API already supports
`start_date`/`end_date` parameters — only the frontend was missing the
controls.
## How
- Reuse the existing `DateRange` picker component from Template Insights
- Store selected dates in URL search params (`startDate`/`endDate`) for
persistence across navigation
- Default to last 30 days when no params are present
- Memoize date range for stable React Query keys
- Both the user list and per-user drill-in views respect the selected
range
- Normalize exclusive end-date boundaries for display
- Preset clicks (Last 7 days, etc.) apply immediately with a single
click
- Semi-transparent loading overlay during data refetch
## Changes
- `site/src/pages/AgentsPage/SettingsPageContent.tsx` — Replace
hardcoded range with interactive picker, URL param state, memoized
params, refetch overlay
- `site/src/pages/AgentsPage/SettingsPageContent.stories.tsx` — Add
stories for date filter interaction, preset single-click, and refetch
overlay
- `site/src/pages/TemplatePage/TemplateInsightsPage/DateRange.tsx` —
Detect preset clicks and apply immediately (single-click) instead of
requiring two clicks
## Validation
- TypeScript ✅
- Biome lint ✅
- Storybook tests 13/13 ✅
- Visual verification via Storybook ✅
The attribution footer in the PR style guide assumed all AI-generated
PRs come from Claude Code using Claude Sonnet 4.5. PRs can be generated
through different tools and models (e.g. Coder Agents), so a hardcoded
template is misleading.
Co-authored-by: Michael Suchacz <ibetitsmike@users.noreply.github.com>
## Summary
Adds inline editing of existing per-user and per-group chat usage limit
overrides from the Limits tab. Admins can now click Edit on any override
row to modify the spend limit in-place, using the same form used for
adding overrides.
## Changes
**Backend** (`coderd/chats_test.go`)
- Added `UpdateUserOverride` and `UpdateGroupOverride` test cases
covering the upsert-in-place behavior.
**Frontend** (3 component files + 2 story files)
- `LimitsTab.tsx`: Edit state management, mutual-exclusion between
user/group edit modes, and handlers that prefill the form from the
existing override.
- `GroupLimitsSection.tsx`: Edit button per row, read-only group
identity in edit mode, Save/Cancel actions, disable states during
pending operations.
- `UserOverridesSection.tsx`: Same pattern as groups — Edit button,
read-only user identity, Save/Cancel, proper disable states.
- New Storybook stories for both sections (Default, EmptyState,
AddForm, EditForm).
## UX behavior
- Clicking Edit opens the inline form with the current spend limit
prefilled and the entity shown as read-only.
- Save uses the existing PUT upsert endpoint (no new API surface).
- Cancel returns to normal list view with form state cleared.
- Edit modes are mutually exclusive — editing a user override closes
any open group form and vice versa.
- All buttons and inputs disable during pending mutations.
- Add and delete continue to work after editing.
Partially addresses #21813 (the "add user" button still needs changes
before this is complete)
Since there are a lot of user tests already, I moved them into
`coderdtest` to be shared.
We currently call GetWorkspaceAgentByID millions of times at scale
unnecessarily. This PR embeds immutable fields into the relevant
services instead of fetching them every time.
resolves https://github.com/coder/scaletest/issues/84
Confirmed with a 10k scaletest that this changeset takes the query count
from 10M+ down to 39k
Add Unwrap() to StatusWriter so http.ResponseController.SetWriteDeadline
can reach the underlying net.Conn through the middleware wrapper. Without
this, the agent's 20s WriteTimeout killed blocking process output
connections.
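Since Go 1.20, `http.ResponseController` walks middleware wrappers by calling `Unwrap() http.ResponseWriter`, so the addition is small; a sketch assuming the wrapped writer lives in a `ResponseWriter` field:

```go
// Unwrap lets http.ResponseController reach through the middleware
// wrapper to the writer that actually supports SetWriteDeadline.
func (w *StatusWriter) Unwrap() http.ResponseWriter {
	return w.ResponseWriter
}
```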
Also add 30s headroom to the write deadline in handleProcessOutput so
the response can be written after a full-duration blocking wait.
On the tool layer, waitForProcess and the process_output tool now try a
non-blocking snapshot on any error, not just context timeout. Transport
errors (like the WriteTimeout EOF) previously returned with no process
ID and no recovery path. Now if the process finished, the result is
returned transparently. If still running, the error includes the process
ID and tells the agent to use process_output.
The 5ms ServiceBannerRefreshInterval caused excessive DRPC
connection churn (200 calls/s) under the race detector, creating
heavy mutex contention on FakeAgentAPI and significant CPU overhead.
This made the test timing-sensitive in ways that manifested as
session.Wait() hangs, killing the test binary via timeout.
Three changes:
- Increase refresh interval from 5ms to testutil.IntervalFast (25ms),
reducing DRPC connection churn and mutex contention by 5x.
- Replace bare <-ready receives with testutil.TryReceive so the test
fails with context expiry instead of hanging indefinitely.
- Add a timeout to session.Wait() in testSessionOutput to prevent any
SSH session hang from killing the entire test binary.
Fixes coder/internal#1417
Replace afero.TempFile (which uses os.CreateTemp with mode 0600)
with a custom createTempFile that uses OpenFile with mode 0666.
This lets the kernel apply the process umask, matching the default
behavior of os.Create. New files now get ~0644 (with standard
umask) instead of 0600.
Extract atomicWrite(ctx, path, mode, haveMode, reader) to share
the entire temp-file lifecycle between writeFile and editFile.
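A sketch of that shared lifecycle, passing the `afero.Fs` explicitly and eliding `ctx`; the real helper also uses a unique temp name rather than a fixed suffix:

```go
package chattool

import (
	"io"
	"os"

	"github.com/spf13/afero"
)

// atomicWrite creates the temp file with 0666 so the kernel applies
// the process umask, copies the content, renames over the target,
// and only chmods when the caller supplied an explicit mode.
// Cleanup of the temp file on failure is elided.
func atomicWrite(fs afero.Fs, path string, mode os.FileMode, haveMode bool, r io.Reader) error {
	tmp := path + ".tmp" // real code uses a unique name in the same dir
	f, err := fs.OpenFile(tmp, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o666)
	if err != nil {
		return err
	}
	if _, err := io.Copy(f, r); err != nil {
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	if err := fs.Rename(tmp, path); err != nil {
		return err
	}
	if haveMode {
		return fs.Chmod(path, mode) // preserve the original mode
	}
	return nil
}
```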
## Problem
Flaky e2e test `create workspace and overwrite default parameters` — the
boolean parameter verification reads `"true"` when it should be
`"false"`.
`verifyParameters` in `site/e2e/helpers.ts` used a one-shot
`isChecked()` for boolean parameters (line 214), while the
`string`/`number` path used Playwright's auto-retrying `toHaveValue()`
with a 15-second timeout. When the settings/parameters page hydrates
with React Query data, the Switch can briefly render the default value
(`true`) before settling on the override (`false`). The one-shot check
captures the stale state.
## Fix
Replace the one-shot `isChecked()` + `expect().toEqual()` with
Playwright's auto-retrying `toBeChecked()` / `not.toBeChecked()`
assertions using a 15-second timeout, matching the pattern already used
for string/number parameters.
Fixes coder/internal#1414
Authored by coder agent 🤖
---------
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
- Add `agents_workspace_ttl` site config (default: whatever the template
says a.k.a. `0s`)
- Expose via GET/PUT `/api/experimental/chats/config/workspace-ttl`
- Chat tool reads setting and passes `TTLMillis` on workspace creation
- Existing autostop infrastructure handles the rest (zero changes to
LifecycleExecutor, CalculateAutostop, or activity bumping)
- ⚠️ Template-level `UserAutostopEnabled=false` overrides this global
default. Not touching this.
- Frontend: "Workspace Lifetime" control in /agents/settings Behavior
tab (admin-only)
> This PR was created with the help of Coder Agents, and has been
reviewed by several humans and robots. 🤖🤝🧑💻
## Summary
Adds a new `coderd/chatd/mcpclient` package that connects to
admin-configured MCP servers and wraps their tools as
`fantasy.AgentTool` values that the chat loop can invoke.
## What changed
### New: `coderd/chatd/mcpclient/mcpclient.go`
The core package with a single entry point:
```go
func ConnectAll(
	ctx context.Context,
	logger slog.Logger,
	configs []database.MCPServerConfig,
	tokens []database.MCPServerUserToken,
) (tools []fantasy.AgentTool, cleanup func(), err error)
```
This:
1. Connects to each enabled MCP server using `mark3labs/mcp-go`
(streamable HTTP or SSE transport)
2. Discovers tools via the MCP `tools/list` method
3. Wraps each tool as a `fantasy.AgentTool` with namespaced name
(`serverslug__toolname`)
4. Applies tool allow/deny list filtering from the server config
5. Handles auth: OAuth2 bearer tokens, API keys, and custom headers
6. Skips broken servers with a warning (10s connect timeout per server)
7. Returns a cleanup function to close all MCP connections
### Modified: `coderd/chatd/chatd.go`
In `runChat()`, after loading the model/messages but before assembling
the tool list:
- Reads `chat.MCPServerIDs` from the chat record
- Loads the MCP server configs from the database
- Resolves the user's auth tokens
- Calls `mcpclient.ConnectAll()` to connect and discover tools
- Appends the MCP tools to the chat's tool set
- Defers cleanup to close connections when the chat turn ends
The chat loop (`chatloop.Run`) already handles tools generically —
MCP-backed tools are invoked identically to built-in workspace tools. No
changes needed in `chatloop/`.
### New: `coderd/chatd/mcpclient/mcpclient_test.go`
10 tests covering:
- Tool discovery and namespacing
- Tool call forwarding and result conversion
- Allow/deny list filtering
- Connection failure handling (graceful skip)
- Multi-server support with correct prefixes
- OAuth2 auth header injection
- Disabled server skipping
- Invalid input handling
- Tool info parameter propagation
## Design decisions
- **Tool namespacing**: `slug__toolname` with double underscore
separator. Avoids collisions with tools containing single underscores.
Stripped when forwarding to `tools/call` (see the sketch after this list).
- **Connection lifecycle**: Fresh connections per chat turn, closed via
`defer`. Matches the `turnWorkspaceContext` pattern.
- **Failure isolation**: Each server connects independently. A broken
server doesn't fail the chat — its tools are simply unavailable.
- **No chatloop changes**: The existing `[]fantasy.AgentTool` interface
is already fully generic.
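A sketch of that naming convention; the helper names are illustrative, not the package's actual API:

```go
package mcpclient

import "strings"

// mcpToolSeparator is the double underscore between server slug and
// tool name; single underscores inside tool names stay unambiguous.
const mcpToolSeparator = "__"

// namespacedToolName builds the name exposed to the chat loop.
func namespacedToolName(serverSlug, toolName string) string {
	return serverSlug + mcpToolSeparator + toolName
}

// originalToolName strips the prefix before forwarding to tools/call.
func originalToolName(serverSlug, namespaced string) string {
	return strings.TrimPrefix(namespaced, serverSlug+mcpToolSeparator)
}
```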
## What's NOT in this PR (follow-ups)
- Frontend MCP server picker UI (selecting servers for a chat)
- System prompt additions describing available MCP tools
- Token refresh on expiry mid-chat
- The deprecated `aibridged` MCP proxy cleanup
The git hooks now classify staged files and select either the full
or lightweight make target. This was missing from the contributing
guide after #23358 landed.
Also add actionlint config to suppress a pre-existing SC2016 false
positive in the triage workflow. Shellcheck disable directives
don't work inside heredocs when actionlint drives shellcheck.
Downgrades the "reporting script completed" log in `agentscripts` from
ERROR to WARN.
During agent reconnects, the `scriptCompleted` RPC can race with the
connection teardown, producing a "connection closed" error. Since
`slogtest` treats ERROR logs as test failures, this causes
`TestAgent_ReconnectNoLifecycleReemit` to flake on macOS.
A failed timing report is non-fatal — the script itself has already
finished, and the agent will continue operating normally. WARN is the
appropriate severity, consistent with the call site in
`agent.go:createDevcontainer`.
Also switches from `fmt.Sprintf` to structured `slog.Error` fields for
consistency with the rest of the codebase.
Fixes coder/internal#1410
> **PR Stack**
> 1. #23351 ← `#23282`
> 2. #23282 ← `#23275`
> 3. #23275 ← `#23349`
> 4. **#23349** ← `main` *(you are here)*
---
Retry events were published only to the local in-process stream via
`publishEvent()`. When pubsub is active, `Subscribe()`'s merge loop only
forwarded durable events (messages, status, errors) from pubsub
notifications,
so retry events were silently dropped for cross-replica subscribers.
This adds a `publishRetry()` helper that publishes both locally and via
pubsub,
and extends the `Subscribe()` notification handler to forward retry
events.
**Changes:**
- `coderd/pubsub/chatstreamnotify.go`: add `Retry` field to notify
message
- `coderd/chatd/chatd.go`: add `publishRetry()`, update `OnRetry`
callback,
extend `Subscribe()` to forward `notify.Retry`
- `coderd/chatd/chatd_internal_test.go`: focused pubsub delivery test
- `enterprise/coderd/chatd/chatd_test.go`: cross-replica end-to-end test
Audited exported helpers in `coderd/util/*`, `testutil`, `cryptorand`,
and friends, then replaced duplicated implementations with canonical
versions.
- **fix: `maps.SortedKeys` generic signature** — value type was
hardcoded to `any`, making it impossible to actually call. Added second
type parameter `V any` (see the sketch after this list). Added
table-driven tests with `cmp.Diff`.
- **refactor: replace ad-hoc ptr helpers with `ptr.Ref`** — removed
`int64Ptr`, `stringPtr`, `boolPtr`, `i64ptr`, `strPtr`, `PtrInt32`
across 6 files.
- **refactor: replace local `sortedKeys`/`sortKeys` with
`maps.SortedKeys`** — now that the signature is fixed, scripts can use
it.
- **refactor: replace hand-rolled `capitalize` with
`strings.Capitalize`** — the typegen version was also not UTF-8 safe.
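For reference, a sketch of the fixed signature; the repo may constrain `K` via `golang.org/x/exp/constraints` rather than `cmp`, and the body shown is a plausible reconstruction:

```go
package maps

import (
	"cmp"
	"slices"
)

// SortedKeys with the corrected signature: the second type parameter
// V any lets callers pass any map value type, where the old version
// only accepted map[K]any.
func SortedKeys[K cmp.Ordered, V any](m map[K]V) []K {
	keys := make([]K, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	slices.Sort(keys)
	return keys
}
```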
> 🤖 This PR was created with the help of Coder Agents, and was reviewed
by my human. 🧑💻
Pre-commit classifies staged files and runs make pre-commit-light
when no Go, TypeScript, or Makefile changes are present. This
skips gen, lint/go, lint/ts, fmt/go, fmt/ts, and the binary
build. A markdown-only commit takes seconds instead of minutes.
Pre-push uses the same heuristic: if only light files changed
(docs, shell, terraform, etc.), tests are skipped entirely.
Falls back to the full make targets when Go/TS/Makefile changes
are detected, CODER_HOOK_RUN_ALL=1 is set, or the diff range
can't be determined.
Also adds test-storybook to make pre-push (vitest with the
storybook project in Playwright browser mode).
- Remove `TRIVY_VERSION` ARG and trivy CLI install block from
`dogfood/coder/Dockerfile`
- The `trivy` job in `.github/workflows/security.yaml` is kept — it uses
`aquasecurity/trivy-action` pinned to a known-good commit
> 🤖 This PR was created with the help of Coder Agents, and was reviewed
by my human. 🧑💻
Adds a repo-local agent skill at `.agents/skills/pull-requests/SKILL.md`
that guides agents through the PR lifecycle for this repository:
creating, updating, and following up on pull requests.
Covers lifecycle rules (reuse existing PRs, default to draft), local
validation commands (`make pre-commit`, `make lint`, etc.), PR
title/description conventions, CI check follow-up, and explicit
guardrails against common mistakes.
## Problem
Deleting a chat provider that has models referenced by existing chats
returns a raw PostgreSQL foreign key violation error to the user:
```
pq: update or delete on table "chat_model_configs" violates foreign key
constraint "chat_messages_model_config_id_fkey" on table "chat_messages"
```
This happens because `DELETE FROM chat_providers` cascades to
hard-delete
`chat_model_configs` rows, but `chat_messages` and `chats` still
reference
them with the default `RESTRICT` behavior.
## Fix
Check for `IsForeignKeyViolation` on the two relevant constraints and
return a 400 Bad Request with `"Provider models are still referenced by
existing chats."`, matching the existing FK error handling pattern used
elsewhere in the same file.
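Sketched below; `database.IsForeignKeyViolation` is the existing helper, while the handler shape and the generated constraint constant names are assumptions:

```go
// handleProviderDeleteError translates the two FK violations into a
// 400 with a friendly message, returning true when it wrote a
// response. Constant names here are illustrative assumptions.
func handleProviderDeleteError(ctx context.Context, rw http.ResponseWriter, err error) bool {
	if database.IsForeignKeyViolation(err,
		database.ForeignKeyChatMessagesModelConfigID, // assumed name
		database.ForeignKeyChatsModelConfigID,        // assumed name
	) {
		httpapi.Write(ctx, rw, http.StatusBadRequest, codersdk.Response{
			Message: "Provider models are still referenced by existing chats.",
		})
		return true
	}
	return false
}
```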
The Base URL field in the provider config form always showed
`https://api.example.com/v1` as its placeholder, regardless of the
selected provider. This was confusing — I added `/v1` to my Anthropic
base URL because the placeholder suggested it, but the Anthropic SDK
already prefixes its request paths with `v1/`, so this doubled it up
and broke requests.
The placeholder is now provider-aware:
- **Anthropic, Bedrock, Google** → `https://api.example.com`
- **OpenAI-family providers** (openai, openai-compat, openrouter,
vercel, azure) → `https://api.example.com/v1`
Replace the 200ms polling loop in chatd's execute and
process_output tools with server-side blocking via sync.Cond
on HeadTailBuffer.
The agent's GET /{id}/output endpoint accepts ?wait=true to
block until the process exits or a 5-minute server cap expires.
The process_output tool blocks by default for 10s (overridable
via wait_timeout), and falls back to a non-blocking snapshot on
timeout. The execute tool's foreground path makes a single
blocking call instead of polling.
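A sketch of the condition-variable wait under stated assumptions (field and method names are illustrative; the real HeadTailBuffer also carries the output ring):

```go
package agent

import (
	"sync"
	"time"
)

// waitBuffer sketches the blocking primitive: readers wait on a
// sync.Cond until the process exits or their deadline passes.
type waitBuffer struct {
	mu     sync.Mutex
	cond   *sync.Cond
	exited bool
}

func newWaitBuffer() *waitBuffer {
	b := &waitBuffer{}
	b.cond = sync.NewCond(&b.mu)
	return b
}

// markExited is called once when the process finishes.
func (b *waitBuffer) markExited() {
	b.mu.Lock()
	b.exited = true
	b.mu.Unlock()
	b.cond.Broadcast()
}

// waitExited blocks until the process exits or timeout elapses and
// reports whether it exited. sync.Cond has no native deadline, so a
// timer broadcasts to wake waiters; it takes the lock first so the
// wake cannot slip in before a waiter has registered.
func (b *waitBuffer) waitExited(timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	timer := time.AfterFunc(timeout, func() {
		b.mu.Lock()
		defer b.mu.Unlock()
		b.cond.Broadcast()
	})
	defer timer.Stop()

	b.mu.Lock()
	defer b.mu.Unlock()
	for !b.exited && time.Now().Before(deadline) {
		b.cond.Wait()
	}
	return b.exited
}
```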
Related #23316
Adds a warning comment to dbauthz.AsSystemRestricted to hopefully nudge agents away from it.
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Rate-limit "chat stream buffer full" and "dropping chat stream event"
WARN logs to at most once per 10s per chat.
- Intermediate drops not logged; WARN includes `dropped_count`.
- Per-chat tracking on `chatStreamState` using timestamp comparison
against `quartz.Clock` — no global tickers, no new `Server` fields.
- Subscriber and buffer drop counters reset at all lifecycle boundaries.
> 🤖 This PR was created with the help of Coder Agents, and was reviewed
by my human. 🧑💻
## Description
Blocks `CONNECT` tunnels to private and reserved IP ranges in
aibridgeproxyd, preventing the proxy from being used to reach internal
networks.
The Coder access URL is always exempt (hostname+port match) so the proxy
can reach its own deployment. It is possible to exempt additional ranges
via `CODER_AIBRIDGE_PROXY_ALLOWED_PRIVATE_CIDRS`.
DNS rebinding is handled differently per path:
* Direct (no upstream proxy): validate the resolved IP right before the
TCP dial, no window between check and connect.
* Upstream proxy: Resolves and checks before forwarding to the upstream
dialer. A small rebinding window exists since the upstream proxy
re-resolves independently.
## Changes
* Add blocked IP denylist covering private, reserved, and
special-purpose ranges
* Add `AllowedPrivateCIDRs` option with CLI flag and env var
* Wire IP checks into `proxy.ConnectDial` for both upstream and direct
paths
* Add tests for blocked/allowed cases across direct dial, upstream
proxy, CIDR exemptions, and CoderAccessURL exemption
Notes: documentation will be handled in a follow-up PR.
Closes: https://github.com/coder/security/issues/124
- Replace real healthcheck with mock `HealthcheckFunc` that returns a
canned report instantly
- Remove healthcheck cache-seeding goroutine/channel workaround
- Remove `HealthcheckTimeout: testutil.WaitSuperLong` (no longer needed)
- Reduce `setupCtx` from `WaitSuperLong` (60s) to `WaitLong` (25s)
The DERP healthcheck performs real network operations (portmapper
gateway probing, STUN) that hang for 60s+ on macOS CI runners. Since
`TestSupportBundle` validates bundle generation, not healthcheck
correctness, a canned report eliminates this entire class of flake.
Fixes coder/internal#272
> 🤖 This PR was created with the help of Coder Agents, and was reviewed
by my human. 🧑💻
handleProcessOutput and handleSignalProcess did not check the
chat ID from the request. Any caller that knew a process ID
could read output or signal processes belonging to other chats.
handleListProcesses already filtered by chat ID. Apply the
same check to the output and signal handlers. Non-chat callers
(no Coder-Chat-Id header) are allowed through for backwards
compatibility.
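A minimal sketch of the shared guard; the header name comes from the description above, the rest is illustrative:

```go
package chattool

import "net/http"

// chatOwnsProcess is the check now applied by the output and signal
// handlers as well as the list handler. An empty header means a
// non-chat caller, which is allowed through for backwards
// compatibility.
func chatOwnsProcess(r *http.Request, processChatID string) bool {
	requestChatID := r.Header.Get("Coder-Chat-Id")
	if requestChatID == "" {
		return true
	}
	return requestChatID == processChatID
}
```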
Both writeFile and editFile now use the same atomic write strategy:
temp file in the same directory, write, rename. This ensures a
failed write leaves the original file intact instead of truncated.
editFile already used temp-and-rename but lost the original file's
permissions because afero.TempFile creates with mode 0600. Both
functions now Chmod after rename to preserve the original mode.
writeFile also swallowed io.Copy errors (logged but returned HTTP
200). Fixed to return the error so the client knows the write
failed.
## Summary
The search field on `/agents/settings/usage` previously only matched
against usernames. This updates the SQL query to also match against the
user's display name via `ILIKE`, and updates the frontend placeholder
and variable names to reflect the broader search scope.
## Changes
- **SQL** (`coderd/database/queries/chats.sql`,
`coderd/database/queries.sql.go`): Added `OR u.name ILIKE '%' ||
@username::text || '%'` to the `GetChatCostPerUser` query's WHERE
clause.
- **Frontend** (`site/src/pages/AgentsPage/SettingsPageContent.tsx`):
Renamed `usernameFilter`/`debouncedUsername` to
`searchFilter`/`debouncedSearch`, updated placeholder to "Search by name
or username".
---
PR generated with Coder Agents
The ampersand detection treated bash's pipe-stderr operator (|&)
as a trailing & for backgrounding, stripping it and producing a
broken pipe command. Also adds tests for execute.go and chatloop
context limit helpers, covering previously untested edge cases.
- Wire `workspacestats.ActivityBumpWorkspace` into `trackWorkspaceUsage`
so the workspace build deadline is extended each time the chat heartbeat
fires
- Prevents mid-conversation autostop for chat workspaces
- Updates `TestHeartbeatBumpsWorkspaceUsage` verifying the deadline bump
> This PR was created with the help of Coder Agents, and was reviewed by two humans and their pet robots 🧑💻🤝🤖
This does not yet apply at the dynamic parameters websocket. The
wsbuilder is the source of truth for previous values, so this is the
most accurate place for the check, and it will still fail in the
synchronous API call to build a workspace.
This mirrors how we handle immutable params.
Closes https://github.com/coder/coder/issues/19064
This adds the UI but does not add it to the Settings sidebar. Until it's
actually functional and usable (which will come in future PRs) it will
remain hidden.
Next step is wiring this up to chats and actually testing the full flow
end-to-end, but we aren't there yet.
- Replace fixed `100vh` spacer at bottom of diff list with a dynamically
sized one
- Adds "X files changed" message to the bottom of the diff
> This PR was created with the help of Coder Agents, and was reviewed by several humans and robots. 🧑💻🤝🤖
There are 333 stories with play functions but no local way to run them.
CI uses Chromatic, which means broken play functions aren't caught until
after push. For agents, the feedback loop is even worse since they can't
open a browser.
This adds the `@storybook/addon-vitest` integration so play functions
can run locally via vitest + Playwright:
```sh
pnpm test:storybook
pnpm test:storybook src/path/to/component.stories.tsx
```
The vitest config is restructured into two projects (`unit` and
`storybook`).
The processChat defer at line 2464 catches panics on its main
goroutine and transitions the chat to error status. This was
previously untested.
The test wraps the database Store to panic during PersistStep's
InTx call, which runs synchronously on the processChat goroutine.
A tool-level panic wouldn't work because executeTools has its own
recover that converts panics into tool error results.
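The injection shape, heavily simplified (the real test wraps the full database.Store interface):

```go
package chatdtest

// store is a stand-in for the real database.Store interface.
type store interface {
	InTx(fn func() error) error
}

// panicStore embeds the wrapped store so every other method passes
// through, and overrides the one call that should blow up
// synchronously on the processChat goroutine.
type panicStore struct {
	store
}

func (panicStore) InTx(func() error) error {
	panic("injected: PersistStep transaction")
}
```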
scheduleStreamReset() fired for every durable message event with
changed=true, including user messages (e.g. promoted queued messages).
When a batch contained trailing message_parts followed by a user
message event, the batch loop flushed the parts (building stream
state), then scheduleStreamReset cleared it immediately.
Restrict the reset to assistant messages, which are the only role
that ends a streaming turn.
Dynamic parameters support ephemeral parameters. Updated the test to
use dynamic parameters.
Ephemeral params **require** a default value.
Closes https://github.com/coder/coder/issues/19065
## Problem
Anthropic rejects requests containing empty text content blocks with:
```
messages: text content blocks must be non-empty
```
Empty text parts (`""` or whitespace-only like `" "`) get persisted in
the database when a stream sends `TextStart`/`TextEnd` with no
`TextDelta` in between. On the next turn, these parts are loaded from
the DB and sent to Anthropic, which rejects them.
## Fix
Filter empty/whitespace-only text and reasoning parts at the two LLM
dispatch boundaries, without modifying persistence (the raw record is
preserved):
- **`partsToMessageParts()`** in `chatprompt.go` — filters when
converting persisted DB messages to fantasy message parts for LLM calls.
This is the last gateway before the Anthropic provider creates
`TextBlockParam` objects.
- **`toResponseMessages()`** in `chatloop.go` — filters when building
in-flight conversation messages between steps within a single turn.
Note: `flushActiveState()` (the interruption path) already had this
guard — the normal `TextEnd` streaming path did not, but since we're not
changing persistence, the fix is applied at the dispatch layer.
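A minimal sketch of the dispatch-boundary filter (illustrative types, not the actual `chatprompt.go` code):

```go
package chatprompt

import "strings"

type messagePart struct {
	Type string // "text", "reasoning", "tool_call", ...
	Text string
}

// filterEmptyTextParts drops text and reasoning parts that are empty
// or whitespace-only before they reach the provider. The persisted
// record is untouched; only the outbound request is filtered.
func filterEmptyTextParts(parts []messagePart) []messagePart {
	var out []messagePart
	for _, p := range parts {
		if (p.Type == "text" || p.Type == "reasoning") &&
			strings.TrimSpace(p.Text) == "" {
			continue
		}
		out = append(out, p)
	}
	return out
}
```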
## Problem
When `Store: true` is set for OpenAI Responses API calls (the new
default), multi-turn conversations with reasoning models fail on the
second message:
```
stream response: bad request: Item 'rs_xxx' of type 'reasoning' was provided
without its required following item.
```
The fantasy library was reconstructing full `OfReasoning` input items
(with encrypted content and summary) when replaying assistant messages.
The API cannot pair these reconstructed reasoning items with the output
items that originally followed them because the output items are sent as
plain `OfMessage` without server-side IDs.
## Fix
Updates the fantasy dependency (`kylecarbs/fantasy@cj/go1.25`) to skip
reasoning parts during conversation replay in `toResponsesPrompt`. With
`Store` enabled, the API already has the reasoning persisted server-side
— it doesn't need to be replayed in the input.
Fantasy PR: https://github.com/charmbracelet/fantasy/pull/181
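The shape of the change in `toResponsesPrompt`, paraphrased with illustrative types (not fantasy's actual API):

```go
package replay

type part struct {
	Type string // "text", "reasoning", "tool_call", ...
}

// buildInput sketches the replay loop: with Store enabled, the API
// already holds the reasoning items server-side, so replaying them
// only breaks the pairing with their following output items.
func buildInput(parts []part, store bool) []part {
	var input []part
	for _, p := range parts {
		if store && p.Type == "reasoning" {
			continue // persisted server-side; skip on replay
		}
		input = append(input, p)
	}
	return input
}
```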
## Testing
Adds `TestOpenAIReasoningRoundTrip` integration test that:
1. Sends a query to `o4-mini` (reasoning model with `Store: true`)
2. Verifies reasoning content is persisted
3. Sends a follow-up message — this was the failing step
4. Verifies the follow-up completes successfully
Requires `OPENAI_API_KEY` env var to run.
Omit rehype-raw from the Streamdown rehype plugin list so
HTML-like syntax in LLM output is escaped as text instead of
being parsed by the HTML5 engine and stripped by rehype-sanitize.
When the LLM writes JSX fragments like <Component prop={val} />
outside code fences, remark-parse tags them as html nodes.
rehype-raw then feeds them to parse5, and rehype-sanitize strips
the unknown elements, silently destroying content. Without
rehype-raw, Streamdown auto-injects a remark plugin that converts
html nodes to text, preserving them as visible escaped text.
Markdown formatting (bold, italic, links, code blocks, tables)
is unaffected since those go through remark/rehype directly.
## Summary
- **Removed** `docs/install/cloud/ec2.md` — the standalone EC2 install
guide.
- **Renamed** `docs/install/cloud/aws-mktplc-ce.md` →
`docs/install/cloud/aws-marketplace.md` for a clearer, more discoverable
filename.
- **Updated** `docs/manifest.json`: replaced the "AWS EC2" entry with
"AWS Marketplace" pointing to the renamed file.
- **Updated** `docs/install/cloud/index.md`: fixed the internal link to
the renamed file.
* Adds a "molly-guard" to require users to type the workspace name
before the 'Archive & delete workspace' action fires. This prevents
accidental deletion of 'pet' workspaces.
* This is only shown for workspaces created *before* the chat was
created. The logic here is that any workspace that existed previous to
the chat *cannot* have been created by the chat.
Eliminates the timing flake in
`TestInterruptAutoPromotionIgnoresLaterUsageLimitIncrease` by making the
chatd worker loop clock-controllable.
## Changes
**`coderd/chatd/chatd.go`**
- Replace `time.NewTicker` calls in `Server.start()` with
`p.clock.NewTicker` using named quartz tags `("chatd", "acquire")` and
`("chatd", "stale-recovery")` (see the sketch below).
**`coderd/chatd/chatd_test.go`**
- Inject `quartz.NewMock(t)` into the test via `newActiveTestServer`
config override.
- Trap the acquire ticker so the test controls exactly when pending
chats are reacquired.
- Rewrite the test flow as explicit clock-advance steps instead of
wall-clock polling.
**`AGENTS.md`**
- Document the PR title scope rule (scope must be a real path containing
all changed files).
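For reference, a generic sketch of the injectable-clock seam this creates (the quartz API differs in its details):

```go
package worker

import "time"

// Clock and Ticker form the seam that makes the loop testable:
// production passes the real clock, tests pass a mock they can
// advance on demand. This mirrors the shape of the quartz pattern,
// not its exact API.
type Clock interface {
	NewTicker(d time.Duration) Ticker
}

type Ticker interface {
	C() <-chan time.Time
	Stop()
}

func loop(clk Clock, done <-chan struct{}, acquire func()) {
	t := clk.NewTicker(time.Second)
	defer t.Stop()
	for {
		select {
		case <-done:
			return
		case <-t.C():
			acquire() // the test decides exactly when this fires
		}
	}
}
```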
## Validation
- `go test ./coderd/chatd -run
TestInterruptAutoPromotionIgnoresLaterUsageLimitIncrease -count=100` ✅
- `go test ./coderd/chatd` ✅
- `make lint` ✅
## Problem
The VS Code extension embeds the Coder agent chat UI in an iframe,
passing the session token via `postMessage`. HTTP requests use the
`Coder-Session-Token` header, but browser WebSocket connections **cannot
carry custom headers** — they rely on cookies. This causes all WebSocket
requests (e.g. streaming chat messages) to fail with authorization
errors in the embedded iframe.
## Solution
Add `POST /api/v2/users/me/session/token-to-cookie` — a lightweight
endpoint that converts the current (already-validated) session token
into a `Set-Cookie` response. The frontend embed bootstrap flow calls
this immediately after `API.setSessionToken(token)`, before any
WebSocket connections are opened.
### Backend (`coderd/userauth.go`, `coderd/coderd.go`)
- New handler `postSessionTokenCookie` behind `apiKeyMiddleware`.
- Reads the validated token via `httpmw.APITokenFromRequest(r)`.
- Sets an `HttpOnly` cookie with the API key's expiry, applying
site-wide cookie config (Secure, SameSite, host prefix) via
`HTTPCookies.Apply`.
- Returns `204 No Content`.
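A minimal sketch of the handler shape (the cookie name and fixed expiry are illustrative; the real code derives both from the validated API key):

```go
package userauth

import (
	"net/http"
	"time"
)

// postSessionTokenCookie sketches the endpoint's shape. The real
// handler reads the already-validated token via
// httpmw.APITokenFromRequest and applies site-wide cookie config.
func postSessionTokenCookie(w http.ResponseWriter, r *http.Request) {
	token := r.Header.Get("Coder-Session-Token")
	http.SetCookie(w, &http.Cookie{
		Name:     "coder_session_token",
		Value:    token,
		Expires:  time.Now().Add(24 * time.Hour), // real code uses the API key's expiry
		Path:     "/",
		HttpOnly: true, // not readable from JavaScript
	})
	w.WriteHeader(http.StatusNoContent)
}
```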
### Frontend (`site/src/pages/AgentsPage/EmbedContext.tsx`)
- `bootstrapChatEmbedSessionFn` now calls the new endpoint after setting
the header token and before fetching user/permissions.
- The cookie is in place before any WebSocket connections are opened.
## Security
- **No privilege escalation**: The token is already valid — this just
moves it from a header credential to a cookie credential.
- **POST only**: Avoids CSRF-via-navigation.
- **Same origin**: The iframe loads from the Coder server, so the cookie
applies to the correct domain.
- **HttpOnly**: The cookie is not accessible to JavaScript.
> Built with [Coder Agents](https://coder.com/agents) 🤖
LazyFileDiff memo comparator: ignore renderAnnotation reference
changes when the file has no lineAnnotations, preventing comment-box
interactions from re-rendering every file diff.
useGitWatcher stale socket guard: check socketRef.current against
the local socket variable in all event handlers. Stale close events
from superseded connections no longer clobber the active socket or
schedule spurious reconnects.
useGitWatcher field comparison guard: add a compile-time Record
type that errors when WorkspaceAgentRepoChanges gains a field not
covered by the bailout comparison.
Tests: stale close race, reference stability on duplicate messages,
per-field change detection (branch, remote_origin, unified_diff),
and no-op removal of unknown repos.
Follow-up to #23243
Three changes that eliminate unnecessary re-renders cascading into
the FileDiff Shadow DOM components during chat/git-watcher updates:
useGitWatcher: compare repo fields before updating state, return
prev Map when nothing changed instead of always allocating a new one.
RemoteDiffPanel: remove dataUpdatedAt from parsedFiles memo deps,
replace it with a content-derived version counter. The memo now only
recomputes when the actual diff string changes.
DiffViewer: pre-compute per-file options and line annotations into
memoized Maps, wrap LazyFileDiff in React.memo so it skips renders
when props are reference-equal.
## Summary
Adds the database schema, API endpoints, SDK types, and encryption
wrappers for admin-managed MCP (Model Context Protocol) server
configurations that chatd can consume. This is the backend foundation
for allowing external MCP tools (Sentry, Linear, GitHub, etc.) to be
used during AI chat sessions.
## Database
Two new tables:
- **`mcp_server_configs`**: Admin-managed server definitions with URL,
transport (Streamable HTTP / SSE), auth config (none / OAuth2 / API key
/ custom headers), tool allow/deny lists, and an availability policy
(`force_on` / `default_on` / `default_off`). Includes CHECK constraints
on transport, auth_type, and availability values.
- **`mcp_server_user_tokens`**: Per-user OAuth2 tokens for servers
requiring individual authentication. Cascades on user/config deletion.
New column on `chats` table:
- **`mcp_server_ids UUID[]`**: Per-chat MCP server selection, following
the same pattern as `model_config_id` — passed at chat creation,
changeable per-message with nil-means-no-change semantics.
## API Endpoints
All routes are under `/api/experimental/mcp/servers/` and gated behind
the `agents` experiment.
**Admin endpoints** (`ResourceDeploymentConfig` auth):
- `POST /` — Create MCP server config
- `PATCH /{id}` — Update MCP server config (full-replace)
- `DELETE /{id}` — Delete MCP server config
**Authenticated endpoints** (all users, enabled servers only for
non-admins):
- `GET /` — List configs (admins see all, members see enabled-only with
admin fields redacted)
- `GET /{id}` — Get config by ID (with `auth_connected` populated
per-user)
**OAuth2 per-user auth flow:**
- `GET /{id}/oauth2/connect` — Initiate OAuth2 flow (state cookie CSRF
protection)
- `GET /{id}/oauth2/callback` — Handle OAuth2 callback, store tokens
- `DELETE /{id}/oauth2/disconnect` — Remove stored OAuth2 tokens
## Security
- **Secrets never returned**: `OAuth2ClientSecret`, `APIKeyValue`, and
`CustomHeaders` are never in API responses — only boolean indicators
(`has_oauth2_secret`, `has_api_key`, `has_custom_headers`).
- **Field redaction for non-admins**: `convertMCPServerConfigRedacted`
strips `OAuth2ClientID`, auth URLs, scopes, and `APIKeyHeader` from
non-admin responses.
- **dbcrypt encryption at rest**: All 5 secret fields use `dbcrypt_keys`
encryption with full encrypt-on-write / decrypt-on-read wrappers (11
dbcrypt method overrides + 2 helpers), following the same pattern as
`chat_providers.api_key`. A sketch of the wrapper pattern follows this
list.
- **OAuth2 CSRF protection**: State parameter stored in `HttpOnly`
cookie with `HTTPCookies.Apply()` for correct `Secure`/`SameSite` behind
TLS-terminating proxies.
- **dbauthz authorization**: All 18 querier methods have authorization
wrappers. Read operations use `ActionRead`, write operations use
`ActionUpdate` on `ResourceDeploymentConfig`.
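A minimal sketch of the encrypt-on-write / decrypt-on-read wrapper pattern referenced above (illustrative types, one method pair shown):

```go
package dbcrypt

type cipher interface {
	Encrypt(plaintext string) (string, error)
	Decrypt(ciphertext string) (string, error)
}

type serverConfig struct {
	URL         string
	APIKeyValue string // secret: encrypted at rest
}

type store interface {
	InsertMCPServerConfig(cfg serverConfig) error
	GetMCPServerConfig(id string) (serverConfig, error)
}

// cryptStore wraps a plain store: secrets are encrypted on every
// write and decrypted on every read, so callers never see ciphertext
// and the database never sees plaintext.
type cryptStore struct {
	inner store
	c     cipher
}

func (s cryptStore) InsertMCPServerConfig(cfg serverConfig) error {
	enc, err := s.c.Encrypt(cfg.APIKeyValue)
	if err != nil {
		return err
	}
	cfg.APIKeyValue = enc
	return s.inner.InsertMCPServerConfig(cfg)
}

func (s cryptStore) GetMCPServerConfig(id string) (serverConfig, error) {
	cfg, err := s.inner.GetMCPServerConfig(id)
	if err != nil {
		return cfg, err
	}
	cfg.APIKeyValue, err = s.c.Decrypt(cfg.APIKeyValue)
	return cfg, err
}
```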
## Governance Model
| Control | Implementation |
|---------|---------------|
| **Global kill switch** | `enabled` defaults to `false` |
| **Availability policy** | `force_on` (always injected), `default_on`
(pre-selected), `default_off` (opt-in) |
| **Per-chat selection** | `mcp_server_ids` on `CreateChatRequest` /
`CreateChatMessageRequest` |
| **Auth gate** | OAuth2 servers require per-user auth before tools are
injected |
| **Tool-level allow/deny** | Arrays on `mcp_server_configs` for
granular tool filtering |
| **Secrets encrypted at rest** | Uses `dbcrypt_keys` (same pattern as
`chat_providers.api_key`) |
## Tests
8 test functions covering:
- Full CRUD lifecycle (create, list, update, delete)
- Non-admin visibility filtering (enabled-only, field redaction)
- `auth_connected` population for OAuth2 vs non-OAuth2 servers
- Availability policy validation (valid values + invalid rejection)
- Unique slug enforcement (409 Conflict)
- OAuth2 disconnect idempotency
- Chat creation with `mcp_server_ids` persistence
## Known Limitations (Deferred)
These are documented and intentional for an experimental feature:
- **Audit logging** not yet wired — will add when feature stabilizes
- **Cross-field validation** (e.g., OAuth2 fields required when
`auth_type=oauth2`) — admin-only endpoint, will add when stabilizing
- **`force_on` auto-injection** — query exists but not yet wired into
chatd tool injection (follow-up)
- **Additional test coverage** — 403 auth tests, GET-by-ID tests,
callback CSRF tests planned for follow-up
## What's NOT in this PR
- Frontend UI (admin panel + chat picker)
- Actual MCP client connections (`chatd/chatmcp/` manager)
- Tool injection into `chatloop/`
Always deploy from main for now. We want to keep testing commits to main
as soon as they're merged, so we're going to disable the release
freezing behavior. We will test cut releases on a separate deployment,
upgraded manually.
Two issues caused the promoted message to never appear:
1. handlePromoteQueuedMessage discarded the ChatMessage returned by
the promote API, relying on the WebSocket to deliver it.
2. Even when the WebSocket did deliver it (via upsertDurableMessage),
the queue_update event in the same batch called
updateChatQueuedMessages, which mutated the React Query cache. This
gave chatMessagesList a new reference, triggering the message sync
effect. The effect found the promoted message in the store but not in
the REST-fetched data, classified it as a stale entry (the path
designed for edit truncation), and called replaceMessages, wiping it.
Fix (1): capture the ChatMessage from the promote response and upsert
it into the store, matching handleSend for non-queued messages.
Fix (2): track the fetched message array elements across effect runs
using element-level reference comparison. Only run the
hasStaleEntries/replaceMessages path when the message objects actually
changed (e.g. a refetch producing new objects from the server), not
when only an unrelated field like queued_messages caused the query
data reference to update. Element references work because
useMemo(flatMap) preserves object identity when only non-message
fields change in the page data.
Updates the `charm.land/fantasy` replace to the rebased `cj/go1.25`
branch on `kylecarbs/fantasy`, which now includes:
- **chore: downgrade to Go 1.25**
- **feat: anthropic computer use**
- **chore: use kylecarbs/openai-go fork for coder/coder compat**
Switches the `openai-go/v3` replace from `SasSwart/openai-go` →
`kylecarbs/openai-go`, which is the same SasSwart perf fork plus a fix
for `WithJSONSet` being clobbered by deferred body serialization.
Without the fix, `NewStreaming` silently drops `stream: true` from
requests. See https://github.com/kylecarbs/openai-go/pull/2 for details.
Updates the editing banner message in the agents chat input from:
> Editing message — all subsequent messages will be deleted
to:
> Editing will delete all subsequent messages and restart the
conversation here.
---
PR generated with Coder Agents
- Add `RequireExperimentWithDevBypass` middleware to
`/.well-known/oauth-authorization-server` and
`/.well-known/oauth-protected-resource` routes, matching the existing
`/oauth2` routes.
- Clients can now detect OAuth2 support via unauthenticated discovery
(404 = not available).
Fixes #21608
## Summary
- guard Agent pages against malformed model provider/model values before
trimming
- reuse a shared model-ref normalizer across Agent detail, sidebar,
list, and create flows
- add regression coverage for malformed catalog and config entries
## Validation
- `cd site && pnpm exec vitest run
src/pages/AgentsPage/modelOptions.test.ts
src/pages/AgentsPage/AgentDetail.test.ts`
- `cd site && pnpm lint:types`
## Problem
Scaletest follow-up storms showed that the chat stream path was doing a
same-replica DB reread for every durable message it had already
delivered locally.
In a 600-chat / 10-turn run, `/stream`-attributed
`GetChatMessagesByChatID` calls reached about 14.2k across 5,400
follow-up turns — roughly **2.63 rereads per turn**. The primary coderd
replicas saturated their DB pools at 60/60 open connections during the
storm window.
The root cause: when pubsub was active, `Subscribe()` suppressed local
durable `message` events and relied entirely on pubsub notify →
`GetChatMessagesByChatID` for catch-up. Same-replica subscribers paid
the full DB round-trip even though the persisting process was on the
same replica.
## Solution
Add a bounded per-chat **durable message cache** to `chatStreamState` so
that same-replica subscribers can catch up from memory instead of the
database.
### How it works
1. `publishMessage()` caches the SDK event in `chatStreamState` before
local fanout and pubsub notify.
2. `publishEditedMessage()` replaces the cache with only the edited
message, then publishes `FullRefresh`.
3. `Subscribe()` handles ordinary `AfterMessageID` notifies by first
consulting the per-chat durable cache and only falling back to
`GetChatMessagesByChatID` on cache miss.
4. `FullRefresh` always forces a DB reread (cache is bypassed).
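In sketch form, assuming sequential message IDs (illustrative types; the real implementation differs):

```go
package chatstream

import "sync"

const maxDurableMessageCacheSize = 256

// message stands in for the real SDK event; IDs are assumed
// sequential for this sketch.
type message struct {
	ID   int64
	Data []byte
}

type chatStreamState struct {
	mu    sync.Mutex
	cache []message // ordered by ID, bounded
}

// publish caches the event before local fanout and pubsub notify,
// evicting the oldest entries past the bound.
func (s *chatStreamState) publish(m message) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.cache = append(s.cache, m)
	if len(s.cache) > maxDurableMessageCacheSize {
		s.cache = s.cache[len(s.cache)-maxDurableMessageCacheSize:]
	}
}

// catchUp serves an AfterMessageID notify from memory when possible.
// On a miss (empty cache or a gap), the caller falls back to
// GetChatMessagesByChatID, so no message is silently lost.
func (s *chatStreamState) catchUp(afterID int64) ([]message, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.cache) == 0 || s.cache[0].ID > afterID+1 {
		return nil, false
	}
	var out []message
	for _, m := range s.cache {
		if m.ID > afterID {
			out = append(out, m)
		}
	}
	return out, true
}
```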
### Safety properties
- If the cache misses (e.g. message expired or remote replica), the DB
catch-up still runs — no silent message loss.
- `FullRefresh` (edits) always rereads from the database.
- Remote replicas still use the pubsub + DB path unchanged.
- The cache is bounded (`maxDurableMessageCacheSize = 256`) and scoped
per chat — no unbounded memory growth.
## Impact
This change removes the entire same-replica portion of the stream
rereads. Based on the 600-chat follow-up run, the upper bound on saved
work is the same-replica share of about 14.2k `GetChatMessagesByChatID`
rereads, with the observed total stream reread rate at about 2.63
rereads per follow-up turn.
React's [Why You Might Not Need An
Effect](https://react.dev/learn/you-might-not-need-an-effect) article
describes several antipatterns and footguns you might encounter when
working with useEffect, so I fed it to an agent and let it determine
some low-hanging fruit to fix:
Replace useEffect+setState patterns with direct computations where the
values are purely derived from props, state, or query data:
- useWebpushNotifications: derive `enabled` inline from query data
instead of setting it via useEffect
- ProxyContext: replace `proxy` state + updateProxy callback + useEffect
with a single useMemo over its three inputs
- useSyncFormParameters: replace ref-sync useEffect with direct
assignment during render
- AgentsSidebar (LoadMoreSentinel): replace two ref-sync useEffects with
direct assignments during render
Co-authored by Coder Agent 🤖
- Adds a new API endpoint `GET /api/v2/users/oidc-claims` that returns
only the **merged claims** (not the separate id_token/userinfo
breakdown). Scoped exclusively to the authenticated user's own identity
— no user parameter, so users cannot view each other's claims.
- Adds a new CLI command, `coder users oidc-claims`, that hits the
above endpoint.
- The existing owner-only debug endpoint is preserved unchanged for
admins who need the full claim breakdown.
> 🤖 This PR was created with the help of Coder Agents, and will be
reviewed by my human. 🧑💻
Added new AWS install documentation and screenshots supporting
deployment of Coder Community Edition from AWS Marketplace, the
primary/recommended method on AWS for POCs and experimenting with Coder.
description: "Multi-reviewer code review. Spawns domain-specific reviewers in parallel, cross-checks findings, posts a single structured GitHub review."
---
# Deep Review
Multi-reviewer code review. Spawns domain-specific reviewers in parallel, cross-checks their findings for contradictions and convergence, then posts a single structured GitHub review with inline comments.
## When to use this skill
- When you want independent perspectives cross-checked against each other, not just a single-pass review.
Use `.claude/skills/code-review/` for focused single-domain changes or quick single-pass reviews.
**Prerequisite:** This skill requires the ability to spawn parallel subagents. If your agent runtime cannot spawn subagents, use code-review instead.
**Severity scales:** Deep-review uses P0–P4 (consequence-based). Code-review uses 🔴🟡🔵. Both are valid; they serve different review depths. Approximate mapping: P0–P1 ≈ 🔴, P2 ≈ 🟡, P3–P4 ≈ 🔵.
## When NOT to use this skill
- Docs-only or config-only PRs (no code to structurally review). Use `.claude/skills/doc-check/` instead.
- Single-file changes under ~50 lines.
- The PR author asked for a quick review.
## 0. Proportionality check
Estimate scope before committing to a deep review. If the PR has fewer than 3 files and fewer than 100 lines changed, suggest code-review instead. If the PR is docs-only, suggest doc-check. Proceed only if the change warrants multi-reviewer analysis.
## 1. Scope the change
**Author independence.** Review with the same rigor regardless of who authored the PR. Don't soften findings because the author is the person who invoked this review, a maintainer, or a senior contributor. Don't harden findings because the author is a new contributor. The review's value comes from honest, consistent assessment.
Create the review output directory before anything else:
```sh
export REVIEW_DIR="/tmp/deep-review/$(date +%s)"
mkdir -p "$REVIEW_DIR"
```
**Re-review detection.** Check if you or a previous agent session already reviewed this PR:
If a prior agent review exists, you must produce a prior-findings classification table before proceeding. This is not optional — the table is an input to step 3 (reviewer prompts). Without it, reviewers will re-discover resolved findings.
1. Read every author response since the last review (inline replies, PR comments, commit messages).
2. Diff the branch to see what changed since the last review.
3. Engage with any author questions before re-raising findings.
4. Write `$REVIEW_DIR/prior-findings.md` with this format:
- **Resolved**: author pushed a code fix. Verify the fix addresses the finding's specific concern — not just that code changed in the relevant area. Check that the fix doesn't introduce new issues.
- **Acknowledged**: author agreed but deferred.
- **Contested**: author disagreed or raised a constraint. Write their argument in the table.
- **No response**: author didn't address it.
Only **Contested** and **No response** findings carry forward to the new review. Resolved and Acknowledged findings must not be re-raised.
**Scope the diff.** Get the file list from the diff, PR, or user. Skim for intent and note which layers are touched (frontend, backend, database, auth, concurrency, tests, docs).
For each changed file, briefly check the surrounding context:
- Config files (package.json, tsconfig, vite.config, etc.): scan the existing entries for naming conventions and structural patterns.
- New files: check if an existing file could have been extended instead.
- Comments in the diff: do they explain why, or just restate what the code does?
## 2. Pick reviewers
Match reviewer roles to layers touched. The Test Auditor, Edge Case Analyst, and Contract Auditor always run. Conditional reviewers activate when their domain is touched.
- **Modernization Reviewer**: one instance per language present in the diff. Filter by extension:
- Go: `*.go` — reference `.claude/docs/GO.md` before reviewing.
- TypeScript: `*.ts`, `*.tsx`
- React: `*.tsx`, `*.jsx`
`.tsx` files match both TypeScript and React filters. Spawn both instances when the diff contains `.tsx` changes — TS covers language-level patterns; React covers component and hooks patterns. Before spawning, verify each instance's filter produces a non-empty diff. Skip instances whose filtered diff is empty.
Each reviewer writes findings to `$REVIEW_DIR/{role-name}.md` where `{role-name}` is the kebab-cased role name (e.g. `test-auditor`, `go-architect`). For Modernization Reviewer instances, qualify with the language: `modernization-reviewer-go.md`, `modernization-reviewer-ts.md`, `modernization-reviewer-react.md`. The orchestrator does not read reviewer findings from the subagent return text — it reads the files in step 4.
## 3. Spawn reviewers
Spawn all Tier 1 and Tier 2 reviewers in parallel. Give each reviewer a reference (PR number, branch name), not the diff content. The reviewer fetches the diff itself. Reviewers are read-only — no worktrees needed.
**Tier 1 prompt:**
```text
Read `AGENTS.md` in this repository before starting.
You are the {Role Name} reviewer. Read your methodology in
{methodology file}. Write your findings to {REVIEW_DIR}/{role-name}.md.
```
For the Modernization Reviewer (Go), add after the methodology line:
> Read `.claude/docs/GO.md` as your Go language reference before reviewing.
For re-reviews, append to both Tier 1 and Tier 2 prompts:
> Prior findings and author responses are in {REVIEW_DIR}/prior-findings.md. Read it before reviewing. Do not re-raise Resolved or Acknowledged findings.
## 4. Cross-check findings
### 4a. Read findings from files
Read each reviewer's output file from `$REVIEW_DIR/` one at a time. One file per read — do not batch multiple reviewer files in parallel. Batching causes reviewer voices to blend in the context window, leading to misattribution (grabbing phrasing from one reviewer and attributing it to another).
For each file:
1. Read the file.
2. List each finding with its severity, location, and one-line summary.
3. Note the reviewer's exact evidence line for each finding.
If a file says "No findings," record that and move on. If a file is missing (reviewer crashed or timed out), note the gap and proceed — do not stall or silently drop the reviewer's perspective.
After reading all files, you have a finding inventory. Proceed to cross-check.
### 4b. Cross-check
Handle Tier 1 and Tier 2 findings separately before merging.
**Tier 2 nit findings:** Apply a lighter filter. Drop nits that are purely subjective, that duplicate what a linter already enforces, or that the author clearly made intentionally. Keep nits that have a practical benefit (clearer name, better error message, obsolete stdlib usage). Surviving nits stay as Nit.
**Tier 1 structural findings:** Before producing the final review, look across all findings for:
- **Contradictions.** Two reviewers recommending opposite approaches. Flag both and note the conflict.
- **Interactions.** One finding that solves or worsens another (e.g. a refactor suggestion that addresses a separate cleanup concern). Link them.
- **Convergence.** Two or more reviewers flagging the same function or component from different angles. Don't just merge at max(severity) and don't treat convergence as headcount ("more reviewers = higher confidence in the same thing"). After listing the convergent findings, trace the consequence chain _across_ them. One reviewer flags a resource leak, another flags an unbounded hang, a third flags infinite retries on reconnect — the combination means a single failure leaves a permanent resource drain with no recovery. That combined consequence may deserve its own finding at higher severity than any individual one.
- **Async findings.** When a finding mentions setState after unmount, unused cancellation signals, or missing error handling near an await: (1) find the setState or callback, (2) trace what renders or fires as a result, (3) ask "if this fires after the user navigated away, what do they see?" If the answer is "nothing" (a ref update, a console.log), it's P3. If the answer is "a dialog opens" or "state corrupts," upgrade. The severity depends on what's at the END of the async chain, not the start.
- **Mechanism vs. consequence.** Reviewers describe findings using mechanism vocabulary ("unused parameter", "duplicated code", "test passes by coincidence"), not consequence vocabulary ("dialog opens in wrong view", "attacker can bypass check", "removing this code has no test to catch it"). The Contract Auditor and Structural Analyst tend to frame findings by consequence already — use their framing directly. For mechanism-framed findings from other reviewers, restate the consequence before accepting the severity. Consequences include UX bugs, security gaps, data corruption, and silent regressions — not just things users see on screen.
- **Weak evidence.** Findings that assert a problem without demonstrating it. Downgrade or drop.
- **Unnecessary novelty.** New files, new naming patterns, new abstractions where the existing codebase already has a convention. If no reviewer flagged it but you see it, add it. If a reviewer flagged it as an observation, evaluate whether it should be a finding.
- **Scope creep.** Suggestions that go beyond reviewing what changed into redesigning what exists. Downgrade to P4.
- **Structural alternatives.** One reviewer proposes a design that eliminates a documented tradeoff, while others have zero findings because the current approach "works." Don't discount this as an outlier or scope creep. A structural alternative that removes the need for a tradeoff can be the highest-value output of the review. Preserve it at its original severity — the author decides whether to adopt it, but they need enough signal to evaluate it.
- **Pre-existing behavior.** "Pre-existing" doesn't erase severity. Check whether the PR introduced new code (comments, branches, error messages) that describes or depends on the pre-existing behavior incorrectly. The new code is in scope even when the underlying behavior isn't.
For each finding **and observation**, apply the severity test in **both directions**. Observations are not exempt — a reviewer may underrate a convention violation or a missing guarantee as Obs when the consequence warrants P3+:
- Downgrade: "Is this actually less severe than stated?"
- Upgrade: "Could this be worse than stated?"
When the severity spread among reviewers exceeds one level, note it explicitly. Only credit reviewers at or above the posted severity. A finding that survived 2+ independent reviewers needs an explicit counter-argument to drop. "Low risk" is not a counter when the reviewers already addressed it in their evidence.
Before forwarding a nit, form an independent opinion on whether it improves the code. Before rejecting a nit, verify you can prove it wrong, not just argue it's debatable.
Drop findings that don't survive this check. Adjust severity where the cross-check changes the picture.
After filtering both tiers, check for overlap: a nit that points at the same line as a Tier 1 finding can be folded into that comment rather than posted separately.
### 4c. Quoting discipline
When a finding survives cross-check, the reviewer's technical evidence is the source of record. Do not paraphrase it.
**Convergent findings — sharpest first.** When multiple reviewers flag the same issue:
1. Rank the converging findings by evidence quality.
2. Start from the sharpest individual finding as the base text.
3. Layer in only what other reviewers contributed that the base didn't cover (a concrete detail, a preemptive counter, a stronger framing).
4. Attribute to the 2–3 reviewers with the strongest evidence, not all N who noticed the same thing.
**Single-reviewer findings.** Go back to the reviewer's file and copy the evidence verbatim. The orchestrator owns framing, severity assessment, and practical judgment — those are your words. The technical claim and code-level evidence are the reviewer's words.
A posted finding has two voices:
- **Reviewer voice** (quoted): the specific technical observation and code evidence exactly as the reviewer wrote it.
- **Orchestrator voice** (original): severity framing, practical judgment ("worth fixing now because..."), scenario building, and conversational tone.
If you need to adjust a finding's scope (e.g. the reviewer said "file.go:42" but the real issue is broader), say so explicitly rather than silently rewriting the evidence.
**Attribution must show severity spread.** When reviewers disagree on severity, the attribution should reflect that — not flatten everyone to the posted severity. Show each reviewer's individual severity: `*(Security Reviewer P1, Concurrency Reviewer P1, Test Auditor P2)*` not `*(Security Reviewer, Concurrency Reviewer, Test Auditor)*`.
**Integrity check.** Before posting, verify that quoted evidence in findings actually corresponds to content in the diff. This guards against garbled cross-references from the file-reading step.
## 5. Post the review
When reviewing a GitHub PR, post findings as a proper GitHub review with inline comments, not a single comment dump.
**Review body.** Open with a short, friendly summary: what the change does well, what the overall impression is, and how many findings follow. Call out good work when you see it. A review that only lists problems teaches authors to dread your comments.
```text
Clean approach to X. The Y handling is particularly well done.
A couple things to look at: 1 P2, 1 P3, 3 nits across 5 inline
comments.
```
For re-reviews (round 2+), open with what was addressed:
```text
Thanks for fixing the wire-format break and the naming issue.
Fresh review found one new issue: 1 P2 across 1 inline comment.
```
Keep the review body to 2–4 sentences. Don't use markdown headers in the body — they render oversized in GitHub's review UI.
**Inline comments.** Every finding is an inline comment, pinned to the most relevant file and line. For findings that span multiple files, pin to the primary file (GitHub supports file-level comments when `position` is omitted or set to 1).
Inline comment format:
```text
**P{n}** One-sentence finding *(Reviewer Role)*
> Reviewer's evidence quoted verbatim from their file
Orchestrator's practical judgment: is this worth fixing now, or
is the current tradeoff acceptable? Scenario building, severity
reasoning, fix suggestions — these are your words.
```
For convergent findings (multiple reviewers, same issue):
```text
> *Contract Auditor adds:* Additional detail from their file
Orchestrator's practical judgment.
```
For observations: `**Obs** One-sentence observation *(Role)* ...` For nits: `**Nit** One-sentence finding *(Role)* ...`
P3 findings and observations can be one-liners. Group multiple nits on the same file into one comment when they're co-located.
**Review event.** Always use `COMMENT`. Never use `REQUEST_CHANGES` — this isn't the norm in this repository. Never use `APPROVE` — approval is a human responsibility.
For P0 or P1 findings, add a note in the review body: "This review contains findings that may need attention before merge."
**Posting via GitHub API.**
The `gh api` endpoint for posting reviews routes through GraphQL by default. Field names differ from the REST API docs:
- Use `position` (diff-relative line number), not `line` + `side`. `side` is not a valid field in the GraphQL schema.
- `subject_type: "file"` is not recognized. Pin file-level comments to `position: 1` instead.
- Use `-X POST` with `--input` to force REST API routing.
To compute positions: save the PR diff to a file, then count lines from the first `@@` hunk header of each file's diff section. For new files, position = line number + 1 (the hunk header is position 1, first content line is position 2).
```sh
gh pr diff {number} > /tmp/pr.diff
```
Submit:
```sh
gh api -X POST \
repos/{owner}/{repo}/pulls/{number}/reviews \
--input review.json
```
Where `review.json`:
```json
{
  "event": "COMMENT",
  "body": "Summary of what's good and what to look at.\n1 P2, 1 P3 across 2 inline comments.",
  "comments": [
    {
      "path": "path/to/file.go",
      "position": 12,
      "body": "**P2** One-sentence finding *(Reviewer Role)*"
    }
  ]
}
```
**Tone guidance.** Frame design concerns as questions: "Could we use X instead?" — be direct only for correctness issues. Hedge design, not bugs. Build concrete scenarios to make concerns tangible. When uncertain, say so. See `.claude/docs/PR_STYLE_GUIDE.md` for PR conventions.
## Follow-up
After posting the review, monitor the PR for author responses. If the author pushes fixes or responds to findings, consider running a re-review (this skill, starting from step 1 with the re-review detection path). Allow time for the author to address multiple findings before re-reviewing — don't trigger on each individual response.
If the filtered diff is empty, say so in one line and stop.
You are a nit reviewer. Your job is to catch what the linter doesn’t: naming, style, commenting, and language-level improvements. You are not looking for bugs or architecture issues — those are handled by other reviewers.
Write all findings to the output file specified in your prompt. Create the directory if it doesn’t exist. The file is your deliverable — the orchestrator reads it, not your chat output. Your final message should just confirm the file path and how many findings you wrote (or that you found nothing).
Use this structure in the file:
---
**Nit**`file.go:42` — One-sentence finding.
Why it matters: brief explanation. If there’s an obvious fix, mention it.
---
Rules:
- Use **Nit** for all findings. Don’t use P0-P4 severity; that scale is for structural reviewers.
- Findings MUST reference specific lines or names. Vague style observations aren’t findings.
- Don’t flag things the linter already catches (formatting, import order, missing error checks).
- Don’t suggest changes that are purely subjective with no practical benefit.
- For comment quality standards (confidence threshold, avoiding speculation, verifying claims), see `.claude/skills/code-review/SKILL.md` Comment Standards section.
- If you find nothing, write a single line to the output file: "No findings."
- Find specific interleavings that break. A select statement where case ordering starves one branch. An unbuffered channel that deadlocks under backpressure. A context cancellation that races with a send on a closed channel.
- Check shutdown sequences. Component A depends on component B, but B was already torn down. "Fire and forget" goroutines that are actually "fire and leak." Join points that never arrive because nobody is waiting.
- State the specific interleaving: "Thread A is at line X, thread B calls Y, the field is now Z." Don't say "this might have a race."
- Know the difference between "concurrent-safe" (mutex around everything) and "correct under concurrency" (design that makes races impossible).
**Scope boundaries:** You review concurrency. You don't review architecture, package boundaries, or test quality. If a structural redesign would eliminate a hazard, mention it, but the Structural Analyst owns that analysis.
You review code by asking: **"What does this code promise, and does it keep that promise?"**
Every piece of code makes promises. An API endpoint promises a response shape. A status code promises semantics. A state transition promises reachability. An error message promises a diagnosis. A flag name promises a scope. A comment promises intent. Your job is to find where the implementation breaks the promise.
Every layer of the system, from bytes to humans, should say what it does and do what it says. False signals compound into bugs. A misleading name is a future misuse. A missing error path is a future outage. A flag that affects more than its name says is a future support ticket.
**Method — four modes, use all on every diff.** Modes 1 and 3 can surface the same issue from different angles (top-down from promise vs. bottom-up from signal). If they converge, report once and note both angles.
**1. Contract tracing.** Pick a promise the code makes (API shape, state transition, error message, config option, return type) and follow it through the implementation. Read every branch. Find where the promise breaks. Ask: does the implementation do what the name/comment/doc says? Does the error response match what the caller will see? Does the status code match the response body semantics? Does the flag/config affect exactly what its name and help text claim? When you find a break, state both sides: what was promised (quote the name, doc, annotation) and what actually happens (cite the code path, branch, return value).
**2. Lifecycle completeness.** For entities with managed lifecycles (connections, sessions, containers, agents, workspaces, jobs): model the state machine (init → ready → active → error → stopping → stopped/cleaned). Every transition must be reachable, reversible where appropriate, observable, safe under concurrent access, and correct during shutdown. Enumerate transitions. Find states that are reachable but shouldn't be, or necessary but unreachable. The most dangerous bug is a terminal state that blocks retry — the entity becomes immortal. Ask: what happens if this operation fails halfway? What state is the entity left in after an error? Can the user retry, or is the entity stuck? What happens if shutdown races with an in-progress operation? Does every path leave state consistent?
**3. Semantic honesty.** Every word in the codebase is a signal to the next reader. Audit signals for fidelity. Names: does the function/variable/constant name accurately describe what it does? A constant named after one concept that stores a different one is a lie. Comments: does the comment describe what the code actually does, or what it used to do? Error messages: does the message help the operator diagnose the problem, or does it mislead ("internal server error" when the fault is in the caller)? Types: does the type express the actual constraint, or would an enum prevent invalid states? Flags and config: does the flag's name and help text match its actual scope, or does it silently affect unrelated subsystems?
**4. Adversarial imagination.** Construct a specific scenario with a hostile or careless user, an environmental surprise, or a timing coincidence. Trace the system state step by step. Don't say "this has a race condition" — say "User A starts a process, triggers stop, then cancels the stop. The entity enters cancelled state. The previous stop never completed. The process runs in perpetuity." Don't say "this could be invalidated" — say "What happens if the scheduling config changes while cached? Each invalidation skips recomputation." Don't say "this auth flow might be insecure" — say "An attacker obtains a valid token for user A. They submit it alongside user B's identifier. Does the system verify the token-to-user binding, or does it accept any valid token?" Build the scenario. Name the actor. Describe the sequence. State the resulting system state. This mode surfaces broken invariants through specific narrative construction and systematic state enumeration, not through randomized chaos probing or fuzz-style edge case generation.
**Finding structure.** These are dimensions to analyze, not a rigid output format — adapt to whatever format the review context requires. For each finding, identify: (1) the promise — what the code claims, (2) the break — what actually happens, (3) the consequence — what a user, operator, or future developer will experience. Not every finding blocks. Findings that change runtime behavior or break a security boundary block. Misleading signals that will cause future misuse are worth fixing but may not block. Latent risks with no current trigger are worth noting.
**Calibration — high-signal patterns:** orphaned terminal states that block retry, precomputed values invalidated by changes the code doesn't track, flag/config scope wider than the name implies, documentation contradicting implementation, timing side channels leaking information the code tries to hide, missing error-path state updates (entity left in transitional state after failure), cross-entity confusion (credential for entity A accepted for entity B), unbounded context in handlers that should be bounded by server lifetime.
**Scope boundaries:** You trace promises and find where they break. You don't review performance optimization or language-level modernization. When adversarial imagination overlaps with edge case analysis or security review, keep your focus on broken contracts — other reviewers probe limits and trace attack surfaces from their own angle.
When you find nothing: say so. A clean review is a valid outcome. Don't manufacture findings to justify your existence.
**Lens:** PostgreSQL, data modeling, Go↔SQL boundary.
**Method:**
- Check migration safety. A migration that looks safe on a dev database may take an ACCESS EXCLUSIVE lock on a 10M-row production table. Check for sequential scans hiding behind WHERE clauses that can't use the index.
- Check schema design for future cost. Will the next feature need a column that doesn't fit? A query that can't perform?
- Own the Go↔SQL boundary. Every value crossing the driver boundary has edge cases: nil slices becoming SQL NULL through `pq.Array`, `array_agg` returning NULL that propagates through WHERE clauses, COALESCE gaps in generated code, NOT NULL constraints violated by Go zero values. Check both sides.
**Scope boundaries:** You review database interactions. You don't review application logic, frontend code, or test quality.
- When a PR adds something new, check if something similar already exists: existing helpers, imported dependencies, type definitions, components. Search the codebase.
- Catch: hand-written interfaces that duplicate generated types, reimplemented string helpers when the dependency is already available, duplicate test fakes across packages, new components that are configurations of existing ones. A new page that could be a prop on an existing page. A new wrapper that could be a call to an existing function.
- Don't argue. Show where it already lives.
**Scope boundaries:** You check for duplication. You don't review correctness, performance, or security.
- Find hidden connections. Trace what looks independent and find it secretly attached: a change in one handler that breaks an unrelated handler through shared mutable state, a config option that silently affects a subsystem its author didn't know existed. Pull one thread and watch what moves.
- Find surface deception. Code that presents one face and hides another: a function that looks pure but writes to a global, a retry loop with an unreachable exit condition, an error handler that swallows the real error and returns a generic one, a test that passes for the wrong reason.
- Probe limits. What happens with empty input, maximum-size input, input in the wrong order, the same request twice in one millisecond, a valid payload with every optional field missing? What happens when the clock skews, the disk fills, the DNS lookup hangs?
- Rate potential, not just current severity. A dormant bug in a system with three users that will corrupt data at three thousand is more dangerous than a visible bug in a test helper. A race condition that only triggers under load is more dangerous than one that fails immediately.
**Scope boundaries:** You probe limits and find hidden connections. You don't review test quality, naming conventions, or documentation.
- Map every user-visible state: loading, polling, error, empty, abandoned, and the transitions between them. Find the gaps. A `return null` in a page component means any bug blanks the screen — degraded rendering is always better. Form state that vanishes on navigation is a lost route.
- Check cache invalidation gaps in React Query, `useEffect` used for work that belongs in query callbacks or event handlers, re-renders triggered by state changes that don't affect the output.
- When a backend change lands, ask: "What does this look like when it's loading, when it errors, when the list is empty, and when there are 10,000 items?"
**Scope boundaries:** You review frontend code. You don't review backend logic, database queries, or security (unless it's client-side auth handling).
**Lens:** Package boundaries, API lifecycle, middleware.
**Method:**
- Check dependency direction. Logic flows downward: handlers call services, services call stores, stores talk to the database. When something reaches upward or sideways, flag it.
- Question whether every abstraction earns its indirection. An interface with one implementation is unnecessary. A handler doing business logic belongs in a service layer. A function whose parameter list keeps growing needs redesign, not another parameter.
- Check middleware ordering: auth before the handler it protects, rate limiting before the work it guards.
- Track API lifecycle. A shipped endpoint is a published contract. Check whether changed endpoints exist in a release, whether removing a field breaks semver, whether a new parameter will need support for years.
**Scope boundaries:** You review Go architecture. You don't review concurrency primitives, test quality, or frontend code.
- Read the version file first (go.mod, package.json, or equivalent). Don't suggest features the declared version doesn't support.
- Flag hand-rolled utilities the standard library now covers. Flag deprecated APIs still in active use. Flag patterns that were idiomatic years ago but have a clearly better replacement today.
- Name which version introduced the alternative.
- Only flag when the delta is worth the diff. If the old pattern works and the new one is only marginally better, pass.
**Scope boundaries:** You review language-level patterns. You don't review architecture, correctness, or security.
**Lens:** Hot paths, resource exhaustion, invisible degradation.
**Method:**
- Trace the hot path through the call stack. Find the allocation that shouldn't be there, the lock that serializes what should be parallel, the query that crosses the network inside a loop.
- Find multiplication at scale. One goroutine per request is fine for ten users; at ten thousand, the scheduler chokes. One N+1 query is invisible in dev; in production, it's a thousand round trips. One copy in a loop is nothing; a million copies per second is an OOM.
- Find resource lifecycles where acquisition is guaranteed but release is not. Memory leaks that grow slowly. Goroutine counts that climb and never decrease. Caches with no eviction. Temp files cleaned only on the happy path.
- Calculate, don't guess. A cold path that runs once per deploy is not worth optimizing. A hot path that runs once per request is. Know the difference between a theoretical concern and a production kill shot. If you can't estimate the load, say so.
**Scope boundaries:** You review performance. You don't review correctness, naming, or test quality.
- Ask "do users actually need this?" Not "is this elegant" or "is this extensible." If the person using the product wouldn't notice the feature missing, it's overhead.
- Question complexity. Three layers of abstraction for something that could be a function. A notification system that spams a thousand users when ten are active. A config surface nobody asked for.
- Check proportionality. Is the solution sized to the problem? A 3-line bug shouldn't produce a 200-line refactor.
**Scope boundaries:** You review product sense. You don't review implementation correctness, concurrency, or security.
- Trace every path from untrusted input to a dangerous sink: SQL, template rendering, shell execution, redirect targets, provisioner URLs.
- Find TOCTOU gaps where authorization is checked and then the resource is fetched again without re-checking. Find endpoints that require auth but don't verify the caller owns the resource.
- Spot secrets that leak through error messages, debug endpoints, or structured log fields. Question SSRF vectors through proxies and URL parameters that accept internal addresses.
- Insist on least privilege. Broad token scopes are attack surface. A permission granted "just in case" is a weakness. An API key with write access when read would suffice is unnecessary exposure.
- "The UI doesn't expose this" is not a security boundary.
**Scope boundaries:** You review security. You don't review performance, naming, or code style.
You review code by asking: **"What does this code assume that it doesn't express?"**
Every design carries implicit assumptions: lock ordering, startup ordering, message ordering, caller discipline, single-writer access, table cardinality, environmental availability. Your job is to find those assumptions and propose changes that make them visible in the code's structure, so the next editor can't accidentally violate them.
Eliminate the class of bug, not the instance. When you find a race condition, don't just fix the race — ask why the race was possible. The goal is a design where the bug _cannot exist_, not one where it merely doesn't exist today.
**Method — four modes, use all on every diff.**
**1. Structural redesign.** Find where correctness depends on something the code doesn't enforce. Propose alternatives where correctness falls out from the structure. Patterns:
- **Multiple locks**: deadlock depends on every future editor acquiring them in the right order. Propose one lock + condition variable.
- **Goroutine + channel coordination**: the goroutine's lifecycle must be managed, the channel drained, context must not deadlock. Propose timer/callback on the struct.
- **Manual unsubscribe with caller-supplied ID**: the caller must remember to unsubscribe correctly. Propose subscription interface with close method.
- **Hardcoded access control**: exceptions make the API brittle. Propose the policy system (RBAC, middleware).
- **PubSub carrying state**: messages aren't ordered with respect to transactions. Propose PubSub as notification only + database read for truth.
- **Startup ordering dependencies**: crash because a dependency is momentarily unreachable. Propose self-healing with retry/backoff.
- **Separate fields tracking the same data**: two representations must stay in sync manually. Propose deriving one from the other.
- **Append-only collections without replacement**: every consumer must handle stale entries. Propose replace semantics or explicit versioning.
Be concrete: name the type, the interface, the field, the method. Quote the specific implicit assumption being eliminated.
**2. Concurrency design review.** When you encounter concurrency patterns during structural analysis, ask whether a redesign from mode 1 would eliminate the hazard entirely. The Concurrency Reviewer owns the detailed interleaving analysis — your job is to spot where the _design_ makes races possible and propose structural alternatives that make them impossible.
**3. Test layer audit.** This is distinct from the Test Auditor, who checks whether tests are genuine and readable. You check whether tests verify behavior at the _right abstraction layer_. Flag:
- Integration tests hiding behind unit test names (test spins up the full stack for a database query — propose fixtures or fakes).
- Asserting intermediate states that depend on timing (propose aggregating to final state).
- Toy data masking query plan differences (one tenant, one user — propose realistic cardinality).
- Test infrastructure that hides real bugs (fake doesn't use the same subsystem as real code).
- Missing timeout wrappers (system bug hangs the entire test suite).
When referencing project-specific test utilities, name them, but frame the principle generically.
**4. Dead weight audit.** Unnecessary code is an implicit claim that it matters. Every dead line misleads the next reader. Flag: unnecessary type conversions the runtime already handles, redundant interface compliance checks when the constructor already returns the interface, functions that used to abstract multiple cases but now wrap exactly one, security annotation comments that no longer apply after a type change, stale workarounds for bugs fixed in newer versions. If it does nothing, delete it. If it does something but the name doesn't say what, rename it.
**Finding structure.** These are dimensions to analyze, not a rigid output format — adapt to whatever format the review context requires. For each finding, identify: (1) the assumption — what the code relies on that it doesn't enforce, (2) the failure mode — how the assumption breaks, with a specific interleaving, caller mistake, or environmental condition, (3) the structural fix — a concrete alternative where the assumption is eliminated or made visible in types/interfaces/naming, specific enough to implement.
Ship pragmatically. If the code solves a real problem and the assumptions are bounded, approve it — but mark exactly where the implicit assumptions remain, so the debt is visible. "A few nits inline, but I don't need to review again" is a valid outcome. So is "this needs structural rework before it's safe to merge."
**Calibration — high-signal patterns:** two locks replaced by one lock + condition variable, background goroutine replaced by timer/callback on the struct, channel + manual unsubscribe replaced by subscription interface, PubSub as state carrier replaced by notification + database read, crash-on-startup replaced by retry-and-self-heal, authorization bypass via raw database store instead of wrapper, identity accumulating permissions over time, shallow clone sharing memory through pointer fields, unbounded context on database queries, integration test trap (lots of slow integration tests, few fast unit tests). Self-corrections that land mid-review are also high-signal: when you realize a finding is wrong, correct it visibly rather than silently removing it. Visible correction beats silent edit.
**Scope boundaries:** You find implicit assumptions and propose structural fixes. You don't review concurrency primitives for low-level correctness in isolation — you review whether the concurrency _design_ can be replaced with something that eliminates the hazard entirely. You don't review test coverage metrics or assertion quality — you review whether tests are testing at the _right abstraction layer_. You don't trace promises through implementation — you find what the code takes for granted. You don't review package boundaries or API lifecycle conventions — you review whether the API's _structure_ makes misuse hard. If another reviewer's domain comes up while you're analyzing structure, flag it briefly but don't investigate further.
When you find nothing: say so. A clean review is a valid outcome.
- Read every name fresh. If you can't use it correctly without reading the implementation, the name is wrong (a short sketch follows this list).
- Read every comment fresh. If it restates the line above it, it's noise. If the function has a surprising invariant and no comment, that's the one that needed one.
- Track patterns. If one misleading name appears, follow the scent through the whole diff. If `handle` means "transform" here, what does it mean in the next file? One inconsistency is a nit. A pattern of inconsistencies is a finding.
- Be direct. "This name is wrong" not "this name could perhaps be improved."
- Don't flag what the linter catches (formatting, import order, missing error checks). Focus on what no tool can see.
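A minimal sketch of the kind of naming finding this lens produces (identifiers are invented for the example):

```go
package users

import "strings"

type User struct{ Email string }

// Misleading: "handle" suggests dispatch or side effects, but this
// function only transforms its input. You cannot use it correctly
// without reading the body.
func handleUser(u User) User {
	u.Email = strings.ToLower(u.Email)
	return u
}

// Better: the name states the transformation.
func normalizeUser(u User) User {
	u.Email = strings.ToLower(u.Email)
	return u
}
```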
**Scope boundaries:** You review naming and style. You don't review architecture, correctness, or security.
**Lens:** Test authenticity, missing cases, readability.
**Method:**
- Distinguish real tests from fake ones. A real test proves behavior. A fake test executes code and proves nothing. Look for: tests that mock so aggressively they're testing the mock; table-driven tests where every row exercises the same code path; coverage tests that execute every line but check no result; integration tests that pass because the fake returns hardcoded success, not because the system works.
- Ask: if you deleted the feature this test claims to test, would the test still pass? If yes, the test is fake (see the sketch below).
- Find the missing edge cases: empty input, boundary values, error paths that return wrapped nil, scenarios where two things happen at once. Ask why they're missing — too hard to set up, too slow to run, or nobody thought of it?
- Check test readability. A test nobody can read is a test nobody will maintain. Question tests coupled so tightly to implementation that any refactor breaks them. Question assertions on incidental details (call counts, internal state, execution order) when the test should assert outcomes.
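The deletion test in practice, as a hedged Go sketch (the feature and both tests are invented):

```go
package discount

import "testing"

// The feature under test.
func applyDiscount(total float64, member bool) float64 {
	if member {
		return total * 0.9
	}
	return total
}

// Fake: asserts a stub's hardcoded value. Delete applyDiscount entirely
// and this still passes, so it proves nothing about the system.
func TestDiscount_Fake(t *testing.T) {
	stub := func(total float64, member bool) float64 { return 90 }
	if got := stub(100, true); got != 90 {
		t.Fatal("stub broken")
	}
}

// Real: exercises the feature and asserts its observable behavior.
func TestDiscount_Real(t *testing.T) {
	if got := applyDiscount(100, true); got != 90 {
		t.Fatalf("applyDiscount = %v, want 90", got)
	}
}
```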
**Scope boundaries:** You review tests. You don't review architecture, concurrency design, or security. If you spot something outside your lens, flag it briefly and move on.
Get the diff for the review target specified in your prompt, then review it.
Write all findings to the output file specified in your prompt. Create the directory if it doesn’t exist. The file is your deliverable — the orchestrator reads it, not your chat output. Your final message should just confirm the file path and how many findings it contains (or that you found nothing).
- **PR:** `gh pr diff {number}`
- **Branch:** `git diff origin/main...{branch}`
- **Commit range:** `git diff {base}..{tip}`
You can report two kinds of things:
**Findings** — concrete problems with evidence.
**Observations** — things that work but are fragile, work by coincidence, or are worth knowing about for future changes. These aren't bugs; they're context. Mark them with `Obs`.
Use this structure in the file for each finding:
---
**P{n}**`file.go:42` — One-sentence finding.
Evidence: what you see in the code, and what goes wrong.
---
For observations:
---
**Obs**`file.go:42` — One-sentence observation.
Why it matters: brief explanation.
---
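A filled-in finding, for shape only (the path, line, and scenario are invented):

---

**P1**`httpapi/upload.go:88` — Uploaded files are served with their stored content type, with no allowlist.

Evidence: `serveFile` echoes the upload's `Content-Type` header verbatim, so `text/html` uploads render on the dashboard origin. Worth fixing before merge; forcing `application/octet-stream` for unknown types is the obvious fix.

---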
Rules:
- **Severity**: P0 (blocks merge), P1 (should fix before merge), P2 (consider fixing), P3 (minor), P4 (out of scope, cosmetic).
- Severity comes from **consequences**, not mechanism. “setState on unmounted component” is a mechanism. “Dialog opens in wrong view” is a consequence. “Attacker can upload active content” is a consequence. “Removing this check has no test to catch it” is a consequence. Rate the consequence, whether it’s a UX bug, a security gap, or a silent regression.
- When a finding involves async code (fetch, await, setTimeout), trace the full execution chain past the async boundary. What renders, what callbacks fire, what state changes? Rate based on what happens at the END of the chain, not the start.
- Findings MUST have evidence. An assertion without evidence is an opinion.
- Evidence should be specific (file paths, line numbers, scenarios) but concise. Write it like you’re explaining to a colleague, not building a legal case.
- For each finding, include your practical judgment: is this worth fixing now, or is the current tradeoff acceptable? If there’s an obvious fix, mention it briefly.
- Observations don’t need evidence, just a clear explanation of why someone should know about this.
- Check the surrounding code for existing conventions. Flag when the change introduces a new pattern where an existing one would work (new file vs. extending existing, new naming scheme vs. established prefix, etc.).
- Note what the change does well. Good patterns are worth calling out so they get repeated.
- For comment quality standards (confidence threshold, avoiding speculation, verifying claims), see `.claude/skills/code-review/SKILL.md` Comment Standards section.
- If you find nothing, write a single line to the output file: “No findings.”
description: "Guide for creating, updating, and following up on pull requests in the Coder repository. Use when asked to open a PR, update a PR, rewrite a PR description, or follow up on CI/check failures."
---
# Pull Request Skill
## When to Use This Skill
Use this skill when asked to:
- Create a pull request for the current branch.
- Update an existing PR branch or description.
- Rewrite a PR body.
- Follow up on CI or check failures for an existing PR.
## References
Use the canonical docs for shared conventions and validation guidance:
- PR title and description conventions:
`.claude/docs/PR_STYLE_GUIDE.md`
- Local validation commands and git hooks: `AGENTS.md` (Essential Commands and
Git Hooks sections)
## Lifecycle Rules
1. **Check for an existing PR** before creating a new one:
```bash
gh pr list --head "$(git branch --show-current)" --author @me --json number --jq '.[0].number // empty'
```
If that returns a number, update that PR. If it returns empty output,
create a new one. (A combined sketch of rules 1–3 follows this list.)
2. **Check you are not on main.** If the current branch is `main` or `master`,
create a feature branch before doing PR work.
3. **Default to draft.** Use `gh pr create --draft` unless the user explicitly
asks for ready-for-review.
4. **Keep description aligned with the full diff.** Re-read the diff against
the base branch before writing or updating the title and body. Describe the
entire PR diff, not just the last commit.
5. **Never auto-merge.** Do not merge or mark ready for review unless the user
explicitly asks.
6. **Never push to main or master.**
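A combined sketch of rules 1–3, assuming `gh` is authenticated and the PR
body has already been written to a file (the title and body path are
placeholders):

```bash
#!/usr/bin/env bash
set -euo pipefail

branch="$(git branch --show-current)"
if [[ "$branch" == "main" || "$branch" == "master" ]]; then
  echo "on $branch; create a feature branch before doing PR work" >&2
  exit 1
fi

pr_number="$(gh pr list --head "$branch" --author @me --json number --jq '.[0].number // empty')"
if [[ -n "$pr_number" ]]; then
  # Existing PR: push and refresh the description against the full diff.
  git push origin "$branch"
  gh pr edit "$pr_number" --body-file /tmp/pr-body.md
else
  # No PR yet: default to draft.
  git push -u origin "$branch"
  gh pr create --draft --title "placeholder title" --body-file /tmp/pr-body.md
fi
```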
## CI / Checks Follow-up
**Always watch CI checks after pushing.** Do not push and walk away.
After pushing:
- Monitor CI with `gh pr checks <PR_NUMBER> --watch`.
- Use `gh pr view <PR_NUMBER> --json statusCheckRollup` for programmatic check
status.
If checks fail (sketched after these steps):
1. Find the failed run ID from the `gh pr checks` output.
2. Read the logs with `gh run view <run-id> --log-failed`.
3. Fix the problem locally.
4. Run `make pre-commit`.
5. Push the fix.
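The same follow-up as a sketch (the PR number and run ID are placeholders):

```bash
# Watch checks until they settle.
gh pr checks 12345 --watch

# On failure, pull only the failing job's logs; the run ID comes from
# the `gh pr checks` output.
gh run view 9876543210 --log-failed

# Fix locally, validate, then push.
make pre-commit
git push
```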
## What Not to Do
- Do not reference or call helper scripts that do not exist in this
repository.
- Do not auto-merge or mark ready for review without explicit user request.
- Do not push to `origin/main` or `origin/master`.
description: Iteratively refine development plans using TDD methodology. Ensures plans are clear, actionable, and include red-green-refactor cycles with proper test coverage.
---
# Refine Development Plan
## Overview
Good plans eliminate ambiguity through precise requirements, break work into well-defined phases, and always include a refactoring step to capture implementation insights.
# Build the same triage prompt used by the Tasks API workflow.
TASK_PROMPT=$(cat <<'EOF'
Fix ${ISSUE_URL}
1. Use the gh CLI to read the issue description and comments.
2. Think carefully and try to understand the root cause. If the issue is unclear or not well defined, ask me to clarify and provide more information.
3. Write a proposed implementation plan to PLAN.md for me to review before starting implementation. Your plan should use TDD and only make the minimal changes necessary to fix the root cause.
4. When I approve your plan, start working on it. If you encounter issues with the plan, ask me for clarification and update the plan as required.
5. When you have finished implementation according to the plan, commit and push your changes, and create a PR using the gh CLI for me to review.
EOF
)
# Perform variable substitution on the prompt — scoped to $ISSUE_URL only.
# Using envsubst without arguments would expand every env var in scope
# (including CODER_SESSION_TOKEN), so we name the variable explicitly.
TASK_PROMPT=$(printf '%s' "$TASK_PROMPT" | ISSUE_URL="$ISSUE_URL" envsubst '${ISSUE_URL}')