coder-server

Author	SHA1	Message	Date
Zach	508114d484	feat: user secret database encryption (#24218 ) Add dbcrypt support for user secret values. When database encryption is enabled, secret values are transparently encrypted on write and decrypted on read through the existing dbcrypt store wrapper. - Wrap `CreateUserSecret`, `GetUserSecretByUserIDAndName`, `ListUserSecretsWithValues`, and `UpdateUserSecretByUserIDAndName` in enterprise/dbcrypt/dbcrypt.go. - Add rotate and decrypt support for user secrets in enterprise/dbcrypt/cliutil.go (`server dbcrypt rotate` and `server dbcrypt decrypt`). - Add internal tests covering encrypt-on-create, decrypt-on-read, re-encrypt-on-update, and plaintext passthrough when no cipher is configured.	2026-04-10 09:34:11 -06:00
J. Scott Miller	7bde763b66	feat: add workspace build transition to provisioner job list (#24131 ) Closes #16332 Previously `coder provisioner jobs list` showed no indication of what a workspace build job was doing (i.e., start, stop, or delete). This adds `workspace_build_transition` to the provisioner job metadata, exposed in both the REST API and CLI. Template and workspace name columns were also added, both available via `-c`. ``` $ coder provisioner jobs list -c id,type,status,"workspace build transition" ID TYPE STATUS WORKSPACE BUILD TRANSITION 95f35545-a59f-4900-813d-80b8c8fd7a33 template_version_import succeeded 0a903bbe-cef5-4e72-9e62-f7e7b4dfbb7a workspace_build succeeded start ```	2026-04-10 09:50:11 -05:00
Cian Johnston	7b0421d8c6	fix: revert auto-assign agents-access role enabled (#24170 ) This reverts commit `d4a9c63e91` (#23968). --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-04-08 20:56:17 +01:00
Yevhenii Shcherbina	7f496c2f18	feat: byok-observability for aibridge (#23808 ) ## Summary Adds `credential_kind` and `credential_hint` columns to `aibridge_interceptions` to record how each LLM request was authenticated and provide a masked credential identifier for audit purposes. This enables admins to distinguish between centralized API keys, personal API keys, and subscription-based credentials in the interceptions audit log. ## Changes - New migration adding `credential_kind` and `credential_hint` to `aibridge_interceptions` - Updated `InsertAIBridgeInterception` query and proto definition to carry the new fields - Wired proto fields through `translator.go` and `aibridgedserver.go` to the database Depends on https://github.com/coder/aibridge/pull/239	2026-04-08 13:24:28 -04:00
Jon Ayers	08bd9e672a	fix: resolve Test_batcherFlush/RetriesOnTransientFailure flake (#24112 ) fixes https://github.com/coder/internal/issues/1452	2026-04-07 13:46:26 -05:00
Kayla はな	c5f1a2fccf	feat: make service accounts a Premium feature (#24020 )	2026-04-07 12:25:32 -06:00
Kyle Carberry	f3f0a2c553	fix(enterprise/coderd/x/chatd): harden TestSubscribeRelayEstablishedMidStream against CI flakes (#24108 ) Fixes coder/internal#1455 Three changes to eliminate the timing-sensitive flake in `TestSubscribeRelayEstablishedMidStream`: 1. Reduce `PendingChatAcquireInterval` from `time.Hour` to `time.Second`. The primary trigger is still `signalWake()` from `SendMessage`, but a short fallback poll ensures the worker picks up the pending chat even under heavy CI goroutine scheduling contention. 2. Increase context timeout from `WaitLong` (25s) to `WaitSuperLong` (60s). The worker pipeline (model resolution, message loading, LLM call) involves multiple DB round-trips that can be slow when PostgreSQL is shared with many parallel test packages. 3. Add a status-polling loop while waiting for the streaming request. If the worker errors out during chat processing, the test now fails immediately with the error status and message instead of silently timing out. > Generated by Coder Agents	2026-04-07 13:41:33 -04:00
George K	86ca61d6ca	perf: cap count queries and emit native UUID comparisons for audit/connection logs (#23835 ) Audit and connection log pages were timing out due to expensive COUNT(*) queries over large tables. This commit adds opt-in count capping: requests can return a `count_cap` field signaling that the count was truncated at a threshold, avoiding full table scans that caused page timeouts. Text-cast UUID comparisons in regosql-generated authorization queries also contributed to the slowdown by preventing index usage for connection and audit log queries. These now emit native UUID operators. Frontend changes handle the capped state in usePaginatedQuery and PaginationWidget, optionally displaying a capped count in the pagination UI (e.g. "Showing 2,076 to 2,100 of 2,000+ logs") Related to: https://linear.app/codercom/issue/PLAT-31/connectionaudit-log-performance-issue	2026-04-07 07:24:53 -07:00
Kyle Carberry	e18094825a	fix: retain message_part buffer for cross-replica relay (#24031 )	2026-04-04 17:24:41 -04:00
Jon Ayers	a1d51f0dab	feat: batch connection logs to avoid DB lock contention (#23727 ) - Running 30k connections was generating a ton of lock contention in the DB	2026-04-03 15:47:26 -05:00
Jon Ayers	333503f74e	feat: improve coordinator peer mapping performance (#23696 ) - Skipping DB querying entirely for peers that aren't actually connected to our coordinator - Opportunistically batching the queries for peers	2026-04-03 14:22:58 -05:00
Paweł Banaszewski	8369fa88fd	feat: add columns for cached tokens from aibridge (#23832 ) Two new columns added to aibridge_token_usages: - cache_read_input_tokens (BIGINT, default 0) - cache_write_input_tokens (BIGINT, default 0) Migration backfills existing rows by extracting values from the metadata JSONB column (cache_read_input, input_cached, prompt_cached for reads (max value selected since only 1 should be set), cache_creation_input for writes). All references to data from metadata were updated to reference the new columns. No other changes than changing where data is extracted from. Requires aibridge library version bump to include: https://github.com/coder/aibridge/pull/229 Fixes: https://github.com/coder/aibridge/issues/150	2026-04-03 16:27:31 +02:00
Michael Suchacz	7d0a0c6495	feat: provider key policies and user provider settings (#23751 )	2026-04-02 19:46:42 +02:00
Cian Johnston	d4a9c63e91	feat: auto-assign agents-access role to new users when experiment enabled (#23968 ) When the `agents` experiment is enabled, new users are automatically granted the `agents-access` role at creation time so they can use Coder Agents without manual admin intervention. - Auto-assigns in `CreateUser()` — covers admin API, OAuth, and OIDC creation paths - Skips auto-assign for OIDC users when enterprise site role sync is enabled (sync overwrites roles on every login; those admins should use `--oidc-user-role-default` instead) - CLI `create-admin-user` bypasses `CreateUser()` but creates `owner` users who already have all permissions > 🤖 Written by a Coder Agent. Will be reviewed by a human.	2026-04-02 14:46:47 +01:00
Susana Ferreira	fe13fd065c	chore: downgrade log level for unauthenticated HEAD requests (#23923 ) Some clients (e.g. Claude) send a HEAD request without credentials as a connectivity check before making actual API calls. This was logging at `Warn` level, creating noise. Downgrade to Info for unauthenticated HEAD requests and add the HTTP method to the logger for better observability. Related to internal slack thread: https://codercom.slack.com/archives/C0AEHQGLW22/p1775045200997309	2026-04-02 11:30:22 +01:00
Susana Ferreira	fb788530b3	feat: add provider_name column to aibridge interceptions (#23960 ) ## Description Adds `provider_name` to aibridge interceptions to store the provider instance name alongside the provider type. This allows distinguishing between multiple instances of the same provider type (e.g. `copilot` vs `copilot-business`). ## Changes * Add `provider_name` column to `aibridge_interceptions` table with backfill from `provider`. * Add `provider_name` field to the proto `RecordInterceptionRequest` message. * Add `ProviderName` to the `codersdk.AIBridgeInterception` API response. _Disclaimer: initially produced by Claude Opus 4.6, modified and reviewed by @ssncferreira ._	2026-04-02 10:58:13 +01:00
Ethan	7757cd8e08	refactor(coderd/x/chatd): insert chats directly as pending on creation (#23888 ) Previously, `CreateChat` inserted the `chats` row with the DB default status (`waiting`), then updated it to `pending` in the same transaction via `setChatPendingWithStore`. This wasted two extra queries per chat creation (`GetChatByID` + `UpdateChatStatus`) and rewrote the same row immediately after inserting it. Now `CreateChat` passes the status directly to `InsertChat`, so the row is written once in its final create-time state. The `setChatPendingWithStore` helper is removed entirely. `InsertChat` now requires an explicit `status` parameter at all callsites instead of relying on a DB column default. ## Motivation On an experimental branch we're trialing firing all chatd notifications from plpgsql triggers. The old two-step insert made that awkward: in an `AFTER INSERT` trigger, `NEW` only contained the insert-time row (`waiting`), not the final committed state (`pending`). To emit the correct event payload the trigger had to be deferred and re-read the row from `chats` at commit time. With this change, `NEW` already contains the correct row to publish — no deferred trigger, no extra `SELECT`, simpler and cheaper trigger logic. That said, this seems like a worthwhile change regardless of the trigger experiment: writing the final row state once removes unnecessary DB work on every chat creation and makes the create path easier to reason about.	2026-04-02 14:13:51 +11:00
Cian Johnston	d6df78c9b9	chore: remove racy ChatStatusPending assertions after CreateChat (#23882 ) Removes 6 fragile `require.Equal(t, codersdk.ChatStatusPending, chat.Status)` assertions from chat relay and creation tests. Root cause: In HA tests with two replicas sharing the same DB, the worker can acquire a just-created chat (flipping `pending → running` via `AcquireChats`) before the HTTP response reaches the test. All affected tests already synchronize via `require.Eventually` waiting for `running` status, making the initial assertion both redundant and racy. - Remove 5 assertions in `enterprise/coderd/exp_chats_test.go` (all `TestChatStreamRelay` subtests) - Remove 1 assertion in `coderd/exp_chats_test.go` (`TestPostChats`) - An existing comment in `TestPostChats/Success` already documents this exact race Fixes flake: https://github.com/coder/coder/actions/runs/23807597632/job/69385425724 > 🤖 Written by a Coder Agent. Will be reviewed by a human.	2026-04-01 10:00:50 +01:00
Yevhenii Shcherbina	84b94a8376	feat: add chatgpt support for aibridge proxy (#23826 ) Add ChatGPT support for AIBridgeProxy	2026-03-31 12:54:38 -04:00
Yevhenii Shcherbina	9440adf435	feat: add chatgpt support for aibridge (#23822 ) Registers a new aibridge provider for ChatGPT by reusing the existing OpenAI provider with a different `Name` and `BaseURL` (https://chatgpt.com/backend-api/codex). The ChatGPT backend API is OpenAI-compatible, so no new provider type is needed. ChatGPT authenticates exclusively via per-user OAuth JWTs (BYOK mode) — no centralized API key is configured. The OpenAI provider already handles this: when no key is set, it falls through to the bearer token from the request's Authorization header. Depends on #23811	2026-03-31 12:08:45 -04:00
Susana Ferreira	b0036af57b	feat: register multiple Copilot providers for business and enterprise upstreams (#23811 ) ## Description Adds support for multiple Copilot provider instances to route requests to different Copilot upstreams (individual, business, enterprise). Each instance has its own name and base URL, enabling per-upstream metrics, logs, circuit breakers, API dump, and routing. ## Changes * Add Copilot business and enterprise provider names and host constants * Register three Copilot provider instances in aibridged (default, business, enterprise) * Update `defaultAIBridgeProvider` in `aibridgeproxy` to route new Copilot hosts to their corresponding providers ## Related * Depends on: https://github.com/coder/aibridge/pull/240 * Closes: https://github.com/coder/aibridge/issues/152 Note: documentation changes will be added in a follow-up PR. _Disclaimer: initially produced by Claude Opus 4.6, heavily modified and reviewed by @ssncferreira ._	2026-03-31 16:00:37 +01:00
Danny Kopping	9fa103929a	perf: make `ListAIBridgeSessions` 10x faster (#23774 ) _Disclaimer: produced using Claude Opus 4.6, reviewed by me, and validated against Dogfood dataset._ The `ListAIBridgeSessions` query materialized and aggregated all matching interceptions before paginating, then ran expensive token/prompt lookups across the full dataset. For a page of 25 sessions against ~200k interceptions (our dogfood dataset), this meant: - Three CTEs scanning all rows (filtered_interceptions, session_tokens, session_root) - ARRAY_AGG(fi.id) collecting every interception ID per session - Lateral prompt lookup via ANY(array_of_all_ids) running for every session, not just the page - ~90MB of disk sorts and JIT compilation kicking in The improvement is to restructure to paginate first and enrich after: a single CTE groups interceptions into sessions with only cheap aggregates (MIN, MAX, COUNT), applies cursor pagination and LIMIT, then lateral joins fetch metadata, tokens, and prompts for just the ~25-row page. Measured against 220k interceptions / 160k sessions: \| Metric \| Before \| After \| \|--------------------\|--------\|-------\| \| Execution time \| 1800ms \| 185ms \| \| Shared buffer hits \| 737k \| 2.6k \| \| Disk sort spill \| 86MB \| 16MB \| \| Lateral loops \| 160k \| 25 \| https://grafana.dev.coder.com/goto/fbODPGtvR?orgId=1 the results are identical, just _much_ faster. --- Also includes some additional tests which I added prior to refactoring the query to ensure no regressions on edge-cases. --------- Signed-off-by: Danny Kopping <danny@coder.com>	2026-03-31 14:42:23 +02:00
Cian Johnston	3ce82bb885	feat: add chat-access site-wide role to gate chat creation (#23724 ) - Add `chat-access` built-in role granting chat CRUD at User scope - Exclude `ResourceChat` from member, org member, and org service account `allPermsExcept` calls - Allow system, owner, and user-admin to assign the new role - Migration auto-assigns role to users who have ever created a chat - Update RBAC test matrix: `memberMe` denied, `chatAccessUser` allowed Breaking change: Members without `chat-access` lose chat creation ability. Migration covers existing chat creators. Members who have never created a chat do not get this role automatically applied. > 🤖 This PR was created by a Coder Agent and reviewed by me.	2026-03-31 10:07:21 +01:00
Susana Ferreira	0fb3e5cba5	feat: extract, log, and strip aibridgeproxy request ID header in aibridged (#23731 ) ## Problem `aibridgeproxyd` sends `X-AI-Bridge-Request-Id` on every MITM request to `aibridged` for cross-service log correlation, but aibridged never reads it. The header is silently forwarded to upstream LLM providers. ## Changes * Renamed the header to `X-Coder-AI-Governance-Request-Id` to match the existing `X-Coder-AI-Governance-` convention. `aibridged` now extracts the header, logs it and strips it before forwarding upstream. * Added `TestServeHTTP_StripInternalHeaders` to verify no `X-Coder-*` headers leak to upstream	2026-03-30 15:21:30 +01:00
Jakub Domeracki	28484536b6	fix(enterprise/aibridgeproxyd): return 403 for blocked private IP CONNECT attempts (#23360 ) Previously, when a CONNECT tunnel was blocked because the destination resolved to a private/reserved IP range, the proxy returned 502 Bad Gateway — implying an upstream failure rather than a deliberate policy block. Introduce `blockedIPError` as a sentinel type returned by both `checkBlockedIP` and `checkBlockedIPAndDial`. `ConnectionErrHandler` now inspects the error with `errors.As` and returns 403 Forbidden for policy blocks, keeping 502 for genuine dial failures. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-30 12:25:33 +02:00
Ethan	13dfc9a9bb	test: harden chatd relay test setup (#23759 ) These chatd relay tests were seeding chats through `subscriber.CreateChat(...)`, which wakes the subscriber and can race local acquisition against the intended remote-worker setup. Seed waiting and remote-running chats directly in the database instead, and point the default OpenAI provider at a local safety-net server so accidental processing fails locally instead of reaching the live API. Closes https://github.com/coder/internal/issues/1430	2026-03-30 17:52:01 +11:00
Jake Howell	71a492a374	feat: implement `<ClientFilter />` to AI Bridge request logs (#22694 ) Closes #22136 This pull request adds a `<ClientFilter />` to the `Request Logs` page for AI Bridge, letting the user select a client to filter by. The backend can already filter against multiple clients at once, but the frontend doesn't yet have a good way to support this (future improvement). --------- Co-authored-by: Jeremy Ruppel <jeremy.ruppel@gmail.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 17:18:28 -04:00
Jaayden Halko	86c3983fc0	feat: add AI Governance seat capacity banners (#23411 ) ## Summary Add site-wide banners for AI Governance seat usage thresholds: 1. 90% capacity warning (admin-only): When actual AI Governance seats are ≥90% and <100% of the license limit, admins see: > "You have used 90% of your AI governance add-on seats." 2. Over-limit banner (admin-only): When actual seats exceed the license limit, admins see a prominent warning: > "Your organization is using {actual} / {limit} AI Governance user seats ({X}% over the limit). Contact sales@coder.com" - Uses floor whole percentage (Go int division / `Math.floor`) - Includes a clickable `mailto:sales@coder.com` link	2026-03-27 05:51:51 +00:00
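Note on #23411: the thresholds and the floored percentage can be sketched as below. This is an illustration only; `bannerKind` and `percentOver` are hypothetical names, and computing "X% over" as `(actual - limit) * 100 / limit` is an assumption about how the banner figure is derived. Go integer division truncates, which equals floor for the non-negative values used here.

```go
package main

import "fmt"

// percentOver returns the floored whole percentage by which actual exceeds
// limit (assumed formula), or 0 when not over.
func percentOver(actual, limit int64) int64 {
	if limit <= 0 || actual <= limit {
		return 0
	}
	return (actual - limit) * 100 / limit
}

// bannerKind returns "over" when seats exceed the limit, "warning" when usage
// is at least 90% but below 100%, and "" otherwise.
func bannerKind(actual, limit int64) string {
	switch {
	case limit <= 0:
		return ""
	case actual > limit:
		return "over"
	case actual*100/limit >= 90 && actual < limit:
		return "warning"
	default:
		return ""
	}
}

func main() {
	fmt.Println(bannerKind(90, 100), bannerKind(100, 100), bannerKind(115, 100))
	fmt.Println(percentOver(115, 100), percentOver(7, 6))
}
```

Under these assumptions, usage at exactly 100% shows neither banner, matching the "≥90% and <100%" and "exceed the license limit" wording.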
Danny Kopping	801e57d430	feat: session detail API (#23203 )	2026-03-26 18:09:53 +02:00
Ethan	4d74603045	fix(coderd/x/chatd): respect provider Retry-After headers in chat retry loop (#23351 ) > PR Stack > 1. #23351 ← `#23282` (you are here) > 2. #23282 ← `#23275` > 3. #23275 ← `#23349` > 4. #23349 ← `main` --- ## Summary `chatretry.Retry()` used pure exponential backoff (1 s, 2 s, 4 s, …) and never consulted provider `Retry-After` headers. Fantasy's `ProviderError` carries `ResponseHeaders` including `Retry-After`, but `chaterror.Classify()` only parsed error text and silently dropped the structured transport metadata. This makes `Retry-After` a first-class signal in the classification → retry pipeline. <img width="853" height="346" alt="image" src="https://github.com/user-attachments/assets/65f012b6-8173-43d2-957e-ab9faddea525" /> ## Changes ### `coderd/chatd/chaterror/classify.go` - Added `RetryAfter time.Duration` field to `ClassifiedError` — a normalized minimum retry delay derived from provider response metadata. - `Classify()` now calls `extractProviderErrorDetails()` before falling back to text heuristics. Structured `ProviderError.StatusCode` takes priority over regex extraction. - `normalizeClassification()` preserves and clamps `RetryAfter`. ### `coderd/chatd/chaterror/provider_error.go` (new) Provider-specific extraction, isolated from the text-based classification logic: - `extractProviderErrorDetails()` unwraps `fantasy.ProviderError` from the error chain via `errors.As`. - `retryAfterFromHeaders()` parses headers in priority order: 1. `retry-after-ms` (OpenAI-specific, millisecond precision) 2. `retry-after` (standard HTTP — integer seconds or HTTP-date) - Case-insensitive header key lookup. ### `coderd/chatd/chatretry/chatretry.go` - `effectiveDelay(attempt, classified)` computes `max(Delay(attempt), classified.RetryAfter)` — the provider hint acts as a floor without weakening the local exponential backoff. 
- `Retry()` now uses `effectiveDelay` and passes the effective delay to both `onRetry(...)` and the sleep timer, so downstream payloads, logs, and the frontend countdown stay aligned automatically. ### Tests - `classify_test.go`: Structured provider status + `Retry-After` extraction, `retry-after-ms` priority, HTTP-date parsing, invalid header fallback, `WithProvider` preservation. - `chatretry_test.go`: Retry-after-as-floor semantics: longer hint wins, shorter hint keeps base delay. ## Design notes - No SDK/API/frontend changes needed. `codersdk.ChatStreamRetry` already carries `DelayMs` and `RetryingAt`, and the frontend already consumes them. The fix is purely in the server-side delay computation. - Existing retryability rules unchanged. This fixes when we sleep, not whether an error is retryable. - Provider hint is a floor: `max(baseDelay, RetryAfter)` ensures we never retry earlier than the provider asks, and never weaken our own backoff curve.	2026-03-27 01:20:46 +11:00
Cian Johnston	847a88c6ca	chore: clean up stale and dangerous //nolint comments (#23643 ) ## Changes - Commit 1: Remove 17 unnecessary `//nolint` directives: - `//nolint:varnamelen` — linter not active - `//nolint:unused` on exported `SlimUnsupported` - `//nolint:govet` in `coderd/httpmw/csrf` — no longer fires - `//nolint:revive` on functions refactored since the nolint was added - `//nolint:paralleltest` citing Go 1.22 loop variable capture (obsolete) - Bare `//nolint` narrowed to specific `//nolint:gocritic` with justification - Commit 2: Fix root causes behind 5 dangerous nolint suppressions: - Add `MinVersion: tls.VersionTLS12` to TLS client config (removes `gosec` G402) - Delete trivial unexported wrappers `apiKey()`/`normalizeProvider()` in chatprovider (removes `revive` confusing-naming) - Add doc comments to `StartWithAssert` and `Router` (removes `revive` exported) - Rename unused parameters to `_` in integration test helpers > 🤖 This PR was created using Coder Agents and reviewed by me.	2026-03-26 14:13:53 +00:00
Danny Kopping	8eade29e68	chore: update AI Bridge warning to require AI Governance Add-On (#23662 ) Disclaimer: implemented by a Coder Agent using Claude Opus 4.6, reviewed by me. Replace the transitional soft warning message: > AI Bridge is now Generally Available in v2.30. In a future Coder version, your deployment will require the AI Governance Add-On to continue using this feature. Please reach out to your account team or sales@coder.com to learn more. with the definitive requirement message: > The AI Governance Add-On is required to use AI Bridge. Please reach out to your account team or sales@coder.com to learn more. Updated in: - `enterprise/coderd/license/license.go` - `enterprise/coderd/license/license_test.go` (2 occurrences)	2026-03-26 11:10:53 +02:00
Yevhenii Shcherbina	a86b8ab6f8	feat: aibridge BYOK (#23013 ) ### Changes coder/coder: - `coderd/aibridge/aibridge.go` — Added `HeaderCoderBYOKToken` constant, `IsBYOK()` helper, and updated `ExtractAuthToken` to check the BYOK header first. - `enterprise/aibridged/http.go` — BYOK-aware header stripping: in BYOK mode only the BYOK header is stripped (user's LLM credentials preserved); in centralized mode all auth headers are stripped. <hr/> NOTE: `X-Coder-Token` was removed! As of now `ExtractAuthToken` retrieves token either from `X-Coder-AI-Governance-BYOK-Token` or from `Authorization`/`X-Api-Key`. --------- Co-authored-by: Susana Ferreira <susana@coder.com> Co-authored-by: Danny Kopping <danny@coder.com>	2026-03-25 14:17:56 -04:00
Jake Howell	0cea4de69e	fix: `AI governance` into `AI Governance` (#23553 )	2026-03-25 20:06:48 +11:00
Ethan	70f031d793	feat(coderd/chatd): structured chat error classification and retry hardening (#23275 ) > PR Stack > 1. #23351 ← `#23282` > 2. #23282 ← `#23275` > 3. #23275 ← `#23349` (you are here) > 4. #23349 ← `main` --- ## Summary Extracts a structured error classification subsystem for agent chat (`chatd`) so that retry and error payloads carry machine-readable metadata — error kind, provider name, HTTP status code, and retryability — instead of raw error strings. This is the backend half of the error-handling work. The frontend counterpart is in #23282. ## Changes ### New package: `coderd/chatd/chaterror/` Canonical error classification — extracts error kind, provider, status code, and user-facing message from raw provider errors. One source of truth that drives both retry policy and stream payloads. - `kind.go`: Error kind enum (`rate_limit`, `timeout`, `auth`, `config`, `overloaded`, `unknown`). - `signals.go`: Signal extraction — parses provider name, HTTP status code, and retryability from error strings and wrapped types. - `classify.go`: Classification logic — maps extracted signals to an error kind. - `message.go`: User-facing message templates keyed by kind + signals. - `payload.go`: Projectors that build `ChatStreamError` and `ChatStreamRetry` payloads from a classified error. ### Modified - `codersdk/chats.go`: Added `Kind`, `Provider`, `Retryable`, `StatusCode` fields to `ChatStreamError` and `ChatStreamRetry`. - `coderd/chatd/chatretry/`: Thinned to retry-policy only; classification logic moved to `chaterror`. - `coderd/chatd/chatloop/`: Added per-attempt first-chunk timeout (60 s) via `guardedStream` wrapper — produces retryable `startup_timeout` errors instead of hanging forever. - `coderd/chatd/chatd.go`: Publishes normalized retry/error payloads via `chaterror` projectors.	2026-03-25 13:47:54 +11:00
Mathias Fredriksson	38f723288f	fix: correct malformed struct tags in organizationroles and scim_test (#23497 ) Fix leading space in table tag and escaped-quote tag syntax. Extracted from #23201.	2026-03-25 13:11:08 +11:00
Asher	81188b9ac9	feat: add filtering by service account (#23468 ) You can now filter by/out service accounts using `service_account:true/false` or using the filter dropdown.	2026-03-24 10:13:25 -08:00
Danny Kopping	dba9f68b11	chore!: remove members' ability to read their own interceptions; rationalize RBAC requirements (#23320 ) _Disclaimer: produced by Claude Opus 4.6, reviewed by me._ This is a breaking change. Users who do not have the `owner` or sitewide `auditor` role will no longer be able to view interceptions. Regular users should not need to view this information; in fact, it could be used by a malicious insider to see what information we track and don't track to exfiltrate data or perform actions unobserved. --- Changed authorization for AI Bridge interception-related operations from system-level permissions to resource-specific permissions. The following functions now authorize against `rbac.ResourceAibridgeInterception` instead of `rbac.ResourceSystem`: - `ListAIBridgeTokenUsagesByInterceptionIDs` - `ListAIBridgeToolUsagesByInterceptionIDs` - `ListAIBridgeUserPromptsByInterceptionIDs` Updated RBAC roles to grant AI Bridge interception permissions: - User/Member roles: Can create and update AI Bridge interceptions but cannot read them back - Service accounts: Same create/update permissions without read access - Owners/Auditors: Retain full read access to all interceptions Removed system-level authorization bypass in `populatedAndConvertAIBridgeInterceptions` function, allowing proper resource-level authorization checks. Updated tests to reflect the new permission model where members cannot view AI Bridge interceptions, even their own, while owners and auditors maintain full visibility.	2026-03-24 12:03:20 +02:00
Danny Kopping	43a1af3cd6	feat: session list API (#23202 ) _Disclaimer: initially produced by Claude Opus 4.6, heavily modified and reviewed by me._ Closes https://github.com/coder/internal/issues/1360 Adds a new `/api/v2/aibridge/sessions` API which returns "sessions". Sessions, as defined in the [RFC](https://www.notion.so/coderhq/AI-Bridge-Sessions-Threads-2ccd579be59280f28021d3baf7472fbe?source=copy_link), are a set of interceptions logically grouped by a session key issued by the client. The API design for this endpoint was done in [this doc](https://github.com/coder/internal/issues/1360). If the client has not provided a session ID, we will revert to the thread root ID, and if that's not present we use the interception's own ID (i.e. a session of a single interception - which is effectively what we show currently in our `/api/v2/aibridge/interceptions` API). The SQL query looks gnarly but it's relatively simple, and seems to perform well (~200ms) even when I import dogfood's `aibridge_*` tables into my workspace. If we need to improve performance on this later we can investigate materialized views, perhaps, but for now I don't think it's warranted. --- _The PR looks large but it's got a lot of generated code; the actual changes aren't huge._	2026-03-24 08:58:47 +02:00
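Note on #23202: the session-key fallback described above (client session ID, else thread root ID, else the interception's own ID) is done in SQL in the PR; the same rule expressed in Go, purely for illustration, could look like this. The `interception` struct and its field names are hypothetical, and empty strings stand in for missing values.

```go
package main

import "fmt"

// interception is a hypothetical, trimmed-down view of a row; empty strings
// stand in for absent values.
type interception struct {
	ID           string
	SessionID    string // issued by the client, may be absent
	ThreadRootID string // may be absent
}

// sessionKey picks the first available grouping key in fallback order.
func sessionKey(i interception) string {
	if i.SessionID != "" {
		return i.SessionID
	}
	if i.ThreadRootID != "" {
		return i.ThreadRootID
	}
	return i.ID
}

func main() {
	rows := []interception{
		{ID: "i1", SessionID: "s1", ThreadRootID: "t1"},
		{ID: "i2", ThreadRootID: "t1"},
		{ID: "i3"},
	}
	groups := map[string][]string{}
	for _, r := range rows {
		k := sessionKey(r)
		groups[k] = append(groups[k], r.ID)
	}
	fmt.Println(len(groups), groups["s1"], groups["t1"], groups["i3"])
}
```

An interception with neither key forms a session of one, which mirrors what the existing interceptions API already shows.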
Cian Johnston	80a172f932	chore: move chatd and related packages to /x/ subpackage (#23445 ) - Moves `coderd/chatd/`, `coderd/gitsync/`, `enterprise/coderd/chatd/` under `x/` parent directories to signal instability - Adds `Experimental:` glue code comments in `coderd/coderd.go` > 🤖 This PR was created with the help of Coder Agents, and was reviewed by my human. 🧑‍💻	2026-03-23 17:34:43 +00:00
Cian Johnston	ef14654078	chore: move chat methods to ExperimentalClient (#23441 ) - Changes all 41 chat method receivers in `codersdk/chats.go` from `Client` to `ExperimentalClient` to ensure that callers are aware that these reference potentially unstable `/api/experimental` endpoints. > 🤖 This PR was created with the help of Coder Agents, and has been reviewed by my human. 🧑‍💻	2026-03-23 14:32:11 +00:00
Asher	24ab216dd1	feat: add new group members endpoint with filtering and pagination (#23067 ) Partially addresses #21813 (still need to make changes to the "add user" button to be complete) Since there are a lot of user tests already, I moved them into `coderdtest` to be shared.	2026-03-20 12:43:03 -08:00
Jaayden Halko	6f244cddde	feat: display the addon license UI (#22948 ) <img width="1052" height="234" alt="Screenshot 2026-03-18 at 21 58 57" src="https://github.com/user-attachments/assets/136ccb1f-e47a-44fd-804d-859301161435" /> --------- Co-authored-by: Steven Masley <stevenmasley@gmail.com>	2026-03-20 16:34:17 +00:00
Ethan	a1e912a763	fix(chatd): deliver retry control events via pubsub (#23349 ) > PR Stack > 1. #23351 ← `#23282` > 2. #23282 ← `#23275` > 3. #23275 ← `#23349` > 4. #23349 ← `main` (you are here) --- Retry events were published only to the local in-process stream via `publishEvent()`. When pubsub is active, `Subscribe()`'s merge loop only forwarded durable events (messages, status, errors) from pubsub notifications, so retry events were silently dropped for cross-replica subscribers. This adds a `publishRetry()` helper that publishes both locally and via pubsub, and extends the `Subscribe()` notification handler to forward retry events. Changes: - `coderd/pubsub/chatstreamnotify.go`: add `Retry` field to notify message - `coderd/chatd/chatd.go`: add `publishRetry()`, update `OnRetry` callback, extend `Subscribe()` to forward `notify.Retry` - `coderd/chatd/chatd_internal_test.go`: focused pubsub delivery test - `enterprise/coderd/chatd/chatd_test.go`: cross-replica end-to-end test	2026-03-20 15:19:41 +00:00
Cian Johnston	f1d333f0e6	refactor: deduplicate utility helpers across the codebase (#23338 ) Audited exported helpers in `coderd/util/`, `testutil`, `cryptorand`, and friends, then replaced duplicated implementations with canonical versions. - fix: `maps.SortedKeys` generic signature. The value type was hardcoded to `any`, making it impossible to actually call. Added second type parameter `V any`. Added table-driven tests with `cmp.Diff`. - refactor: replace ad-hoc ptr helpers with `ptr.Ref`; removed `int64Ptr`, `stringPtr`, `boolPtr`, `i64ptr`, `strPtr`, `PtrInt32` across 6 files. - refactor: replace local `sortedKeys`/`sortKeys` with `maps.SortedKeys`; now that the signature is fixed, scripts can use it. - refactor: replace hand-rolled `capitalize` with `strings.Capitalize`; the typegen version was also not UTF-8 safe. > 🤖 This PR was created with the help of Coder Agents, and was reviewed by my human. 🧑‍💻	2026-03-20 15:12:41 +00:00
Susana Ferreira	139594a4f4	feat: block CONNECT tunnels to private/reserved IP ranges (#23109 ) ## Description Blocks `CONNECT` tunnels to private and reserved IP ranges in aibridgeproxyd, preventing the proxy from being used to reach internal networks. The Coder access URL is always exempt (hostname+port match) so the proxy can reach its own deployment. It is possible to exempt additional ranges via `CODER_AIBRIDGE_PROXY_ALLOWED_PRIVATE_CIDRS`. DNS rebinding is handled differently per path: * Direct (no upstream proxy): validate the resolved IP right before the TCP dial, no window between check and connect. * Upstream proxy: Resolves and checks before forwarding to the upstream dialer. A small rebinding window exists since the upstream proxy re-resolves independently. ## Changes * Add blocked IP denylist covering private, reserved, and special-purpose ranges * Add `AllowedPrivateCIDRs` option with CLI flag and env var * Wire IP checks into `proxy.ConnectDial` for both upstream and direct paths * Add tests for blocked/allowed cases across direct dial, upstream proxy, CIDR exemptions, and CoderAccessURL exemption Notes: documentation will be handled in a follow-up PR. Closes: https://github.com/coder/security/issues/124	2026-03-20 09:49:26 +00:00
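
The denylist-with-exemptions check described in the entry above might look roughly like the following. This is a sketch under stated assumptions: `isBlocked` and `blockedPrefixes` are hypothetical names, the prefix list is an illustrative subset rather than the commit's actual denylist, and the access-URL exemption and the dial-time placement of the check are not modeled.

```go
package main

import (
	"fmt"
	"net/netip"
)

// blockedPrefixes is an illustrative subset of private and reserved
// ranges; the real denylist in the commit may differ.
var blockedPrefixes = []netip.Prefix{
	netip.MustParsePrefix("10.0.0.0/8"),
	netip.MustParsePrefix("172.16.0.0/12"),
	netip.MustParsePrefix("192.168.0.0/16"),
	netip.MustParsePrefix("127.0.0.0/8"),
	netip.MustParsePrefix("169.254.0.0/16"),
	netip.MustParsePrefix("::1/128"),
	netip.MustParsePrefix("fc00::/7"),
	netip.MustParsePrefix("fe80::/10"),
}

// isBlocked reports whether ip falls in a blocked range and is not
// covered by one of the allowed CIDR exemptions (hypothetical helper).
func isBlocked(ip string, allowedCIDRs []string) (bool, error) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return false, fmt.Errorf("parse address %q: %w", ip, err)
	}
	// Normalize IPv4-mapped IPv6 addresses and drop any zone so that
	// Prefix.Contains can match them.
	addr = addr.Unmap().WithZone("")

	for _, cidr := range allowedCIDRs {
		p, err := netip.ParsePrefix(cidr)
		if err != nil {
			return false, fmt.Errorf("parse CIDR %q: %w", cidr, err)
		}
		if p.Contains(addr) {
			return false, nil // explicitly exempted
		}
	}
	for _, p := range blockedPrefixes {
		if p.Contains(addr) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	allowed := []string{"10.20.0.0/16"}
	for _, ip := range []string{"10.1.2.3", "10.20.5.5", "93.184.216.34", "::ffff:192.168.1.1"} {
		blocked, err := isBlocked(ip, allowed)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-20s blocked=%v\n", ip, blocked)
	}
}
```

In this sketch `10.1.2.3` and the IPv4-mapped `::ffff:192.168.1.1` are blocked, while the exempted `10.20.5.5` and the public `93.184.216.34` pass. As the entry notes, the check only closes the DNS rebinding window when it runs on the resolved IP immediately before the dial.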
Kyle Carberry	d8ff67fb68	feat: add MCP server configuration backend for chats (#23227 ) ## Summary Adds the database schema, API endpoints, SDK types, and encryption wrappers for admin-managed MCP (Model Context Protocol) server configurations that chatd can consume. This is the backend foundation for allowing external MCP tools (Sentry, Linear, GitHub, etc.) to be used during AI chat sessions. ## Database Two new tables: - `mcp_server_configs`: Admin-managed server definitions with URL, transport (Streamable HTTP / SSE), auth config (none / OAuth2 / API key / custom headers), tool allow/deny lists, and an availability policy (`force_on` / `default_on` / `default_off`). Includes CHECK constraints on transport, auth_type, and availability values. - `mcp_server_user_tokens`: Per-user OAuth2 tokens for servers requiring individual authentication. Cascades on user/config deletion. New column on `chats` table: - `mcp_server_ids UUID[]`: Per-chat MCP server selection, following the same pattern as `model_config_id` — passed at chat creation, changeable per-message with nil-means-no-change semantics. ## API Endpoints All routes are under `/api/experimental/mcp/servers/` and gated behind the `agents` experiment. 
Admin endpoints (`ResourceDeploymentConfig` auth): - `POST /` — Create MCP server config - `PATCH /{id}` — Update MCP server config (full-replace) - `DELETE /{id}` — Delete MCP server config Authenticated endpoints (all users, enabled servers only for non-admins): - `GET /` — List configs (admins see all, members see enabled-only with admin fields redacted) - `GET /{id}` — Get config by ID (with `auth_connected` populated per-user) OAuth2 per-user auth flow: - `GET /{id}/oauth2/connect` — Initiate OAuth2 flow (state cookie CSRF protection) - `GET /{id}/oauth2/callback` — Handle OAuth2 callback, store tokens - `DELETE /{id}/oauth2/disconnect` — Remove stored OAuth2 tokens ## Security - Secrets never returned: `OAuth2ClientSecret`, `APIKeyValue`, and `CustomHeaders` are never in API responses — only boolean indicators (`has_oauth2_secret`, `has_api_key`, `has_custom_headers`). - Field redaction for non-admins: `convertMCPServerConfigRedacted` strips `OAuth2ClientID`, auth URLs, scopes, and `APIKeyHeader` from non-admin responses. - dbcrypt encryption at rest: All 5 secret fields use `dbcrypt_keys` encryption with full encrypt-on-write / decrypt-on-read wrappers (11 dbcrypt method overrides + 2 helpers), following the same pattern as `chat_providers.api_key`. - OAuth2 CSRF protection: State parameter stored in `HttpOnly` cookie with `HTTPCookies.Apply()` for correct `Secure`/`SameSite` behind TLS-terminating proxies. - dbauthz authorization: All 18 querier methods have authorization wrappers. Read operations use `ActionRead`, write operations use `ActionUpdate` on `ResourceDeploymentConfig`. 
## Governance Model \| Control \| Implementation \| \|---------\|---------------\| \| Global kill switch \| `enabled` defaults to `false` \| \| Availability policy \| `force_on` (always injected), `default_on` (pre-selected), `default_off` (opt-in) \| \| Per-chat selection \| `mcp_server_ids` on `CreateChatRequest` / `CreateChatMessageRequest` \| \| Auth gate \| OAuth2 servers require per-user auth before tools are injected \| \| Tool-level allow/deny \| Arrays on `mcp_server_configs` for granular tool filtering \| \| Secrets encrypted at rest \| Uses `dbcrypt_keys` (same pattern as `chat_providers.api_key`) \| ## Tests 8 test functions covering: - Full CRUD lifecycle (create, list, update, delete) - Non-admin visibility filtering (enabled-only, field redaction) - `auth_connected` population for OAuth2 vs non-OAuth2 servers - Availability policy validation (valid values + invalid rejection) - Unique slug enforcement (409 Conflict) - OAuth2 disconnect idempotency - Chat creation with `mcp_server_ids` persistence ## Known Limitations (Deferred) These are documented and intentional for an experimental feature: - Audit logging not yet wired — will add when feature stabilizes - Cross-field validation (e.g., OAuth2 fields required when `auth_type=oauth2`) — admin-only endpoint, will add when stabilizing - `force_on` auto-injection — query exists but not yet wired into chatd tool injection (follow-up) - Additional test coverage — 403 auth tests, GET-by-ID tests, callback CSRF tests planned for follow-up ## What's NOT in this PR - Frontend UI (admin panel + chat picker) - Actual MCP client connections (`chatd/chatmcp/` manager) - Tool injection into `chatloop/`	2026-03-19 14:07:36 +00:00
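
The "secrets never returned" rule in the entry above (responses expose only `has_oauth2_secret`, `has_api_key`, and `has_custom_headers`) can be sketched as a conversion between a stored row and an API response. The struct and function names below are hypothetical stand-ins, not the SDK's real types; only the JSON field names come from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// storedConfig is a hypothetical stand-in for the decrypted database row.
type storedConfig struct {
	Slug               string
	OAuth2ClientSecret string
	APIKeyValue        string
	CustomHeaders      map[string]string
}

// configResponse is a hypothetical API shape: it has no fields that can
// carry a secret, only booleans saying whether one is set.
type configResponse struct {
	Slug             string `json:"slug"`
	HasOAuth2Secret  bool   `json:"has_oauth2_secret"`
	HasAPIKey        bool   `json:"has_api_key"`
	HasCustomHeaders bool   `json:"has_custom_headers"`
}

// toResponse copies non-secret fields and reduces each secret to a flag.
func toResponse(c storedConfig) configResponse {
	return configResponse{
		Slug:             c.Slug,
		HasOAuth2Secret:  c.OAuth2ClientSecret != "",
		HasAPIKey:        c.APIKeyValue != "",
		HasCustomHeaders: len(c.CustomHeaders) > 0,
	}
}

func main() {
	out, err := json.Marshal(toResponse(storedConfig{Slug: "sentry", APIKeyValue: "s3cr3t"}))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keeping the secret fields off the response type entirely means a later change cannot leak them by forgetting to blank a field.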
Cian Johnston	65b7658568	chore: extract testutil.FakeSink for slog test assertions (#23208 ) Follow-up to [review comment on #23025](https://github.com/coder/coder/pull/23025#discussion_r2930309487) from @mafredri. Extracts the repeated `logSink` / `fakeSink` test pattern into a shared `testutil.FakeSink` and migrates all existing call sites. > 🤖 This PR was created with the help of Coder Agents, and will be reviewed by my human. 🧑‍💻 --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-03-18 17:02:38 +00:00
Steven Masley	84de391f26	chore: add tallyman events for ai seat tracking (#22689 ) AI seat tracking inserted as heartbeat into usage table.	2026-03-18 09:30:22 -05:00
George K	91ec0f1484	feat: add service_accounts workspace sharing mode (#23093 ) Introduce a three-way workspace sharing setting (none, everyone, service_accounts) replacing the boolean workspace_sharing_disabled. In service_accounts mode, only service account-owned workspaces can be shared while regular members' share permissions are removed. Adds a new organization-service-account system role with per-org permissions reconciled alongside the existing organization-member system role. Related to: https://linear.app/codercom/issue/PLAT-28/feat-service-accounts-sharing-mode-and-rbac-role --------- Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com> Co-authored-by: Kayla はな <mckayla@hey.com>	2026-03-17 12:16:43 -07:00
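
The three-way sharing setting in the entry above can be sketched as a small enum plus a permission check. The type, constant, and function names are hypothetical; only the three mode values and the rule that `service_accounts` mode limits sharing to service account-owned workspaces come from the commit message. Failing closed on unknown values is an assumption.

```go
package main

import "fmt"

// SharingMode is a hypothetical type for the three-way setting.
type SharingMode string

const (
	SharingNone            SharingMode = "none"
	SharingEveryone        SharingMode = "everyone"
	SharingServiceAccounts SharingMode = "service_accounts"
)

// canShare reports whether a workspace may be shared under the given mode.
func canShare(mode SharingMode, ownerIsServiceAccount bool) bool {
	switch mode {
	case SharingEveryone:
		return true
	case SharingServiceAccounts:
		return ownerIsServiceAccount
	default:
		// SharingNone, and any unrecognized value, deny sharing.
		return false
	}
}

func main() {
	for _, mode := range []SharingMode{SharingNone, SharingEveryone, SharingServiceAccounts} {
		fmt.Printf("%-17s member=%v service_account=%v\n",
			mode, canShare(mode, false), canShare(mode, true))
	}
}
```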

1272 Commits