Compare commits

...

9 Commits

Author SHA1 Message Date
Mathias Fredriksson 34dc7fca99 fix(reaper): register signal handler before goroutine
Move signal.Notify call from inside the catchSignals goroutine
to ForkReap's synchronous path. This prevents a race where
SIGINT arrives before the goroutine has registered its handler,
causing the signal to never be forwarded to the child process
and TestReapInterrupt to hang indefinitely.
2026-03-10 05:14:42 +00:00
Mathias Fredriksson 17892264c6 fix(e2e): use retrying assertions for bool in verifyParameters
The verifyParameters helper used a one-shot isChecked() call to
verify boolean parameter state. Dynamic parameter hydration can
reset the checkbox after page load, causing a race where the
snapshot reads the stale default value instead of the expected
one. Replace with Playwright's auto-retrying toBeChecked() /
not.toBeChecked() assertions, matching the pattern already used
for string and number parameters.
2026-03-10 03:55:27 +00:00
Mathias Fredriksson c237c867b3 fix(e2e): reload login page on failed dynamic import
The login helper navigated to /login and immediately tried to
fill the Email field. During parallel e2e execution,
ERR_NETWORK_CHANGED can cause dynamic imports to fail, rendering
the error boundary instead of the login form. This caused
getByLabel('Email') to time out.

Apply the same reloadPageIfDynamicImportFailed pattern already
used in createWorkspace.
2026-03-10 02:33:19 +00:00
Mathias Fredriksson 0d23bc0d21 fix(autobuild): fix flaky TestExecutorAutostopAIAgentActivity
The second tick used a time offset from a 'now' variable captured
early in the test. The activity bump uses the database NOW() which
advances with wall clock time. Under slow CI, the elapsed time
between the two caused the tick to land before the bumped deadline,
so the executor skipped the stop transition.

Use time.Now() when computing the second tick so it is always
relative to the actual bump time.
2026-03-10 02:03:42 +00:00
Mathias Fredriksson d1665e49f8 fix(e2e): reload page on failed dynamic import
When parallel e2e tests trigger ERR_NETWORK_CHANGED, React
Router's dynamic import for the page chunk can fail, rendering
the error boundary instead of the expected page content. This
caused createWorkspace to time out waiting for the name input.

After navigating, race the expected content against the error
boundary. If the error boundary wins, reload the page to retry
the chunk load.
2026-03-10 00:03:33 +00:00
Mathias Fredriksson 97e12ca4c0 fix(Makefile): use bash subshell in lint/go recipe
The lint/go recipe used $(shell) inside a recipe to extract the
golangci-lint version. When MAKE_TIMED=1, $(shell) runs through
timed-shell.sh with $@ (the target name) as the first argument.
Since the target name doesn't start with '-', the timing code
path runs and its banner output contaminates the captured value,
causing intermittent 'lint/go: No such file or directory' errors.

Replace with bash command substitution ($$(...)), which is the
correct approach under .ONESHELL and avoids the SHELL/SHELLFLAGS
interaction entirely. Also replace deprecated egrep with grep -oE.
2026-03-10 00:03:29 +00:00
Mathias Fredriksson a1c75df06c fix(agent/reaper): prevent SIGINT from killing test process
The TestReapInterrupt test sends SIGINT to its own process. Under
the race detector, the catchSignals goroutine inside ForkReap may
not have registered signal.Notify yet, so Go's default SIGINT
handler can terminate the test binary.

Register signal.Notify for os.Interrupt in the test before sending
SIGINT. This disables the default termination behavior while still
allowing the catchSignals handler to receive the signal
independently.
2026-03-09 23:06:01 +00:00
Mathias Fredriksson 2d19519e6a fix(agent/reaper): prevent reaper from stealing child exit status
ForkReap starts a reaper goroutine that calls Wait4(-1, WNOHANG),
which reaps any child. This raced with ForkReap's own Wait4(pid)
for the direct child, causing ECHILD errors.

Use the reapLock that go-reap already supports: hold a read lock
during our Wait4(pid) so the reaper's write lock blocks until we
have collected the child's status.
2026-03-09 22:09:45 +00:00
Mathias Fredriksson 47d9931398 fix(e2e): handle bool parameter hydration race in fillParameters
The bool case in fillParameters blindly toggled the switch with a
single click. If dynamic parameters hydrated after the click and
reset the switch state, the final value was wrong, causing the
'overwrite default parameters' test to flake.

Mirror the retry pattern already used for string/number parameters:
read the current aria-checked state, click only if it differs from
the desired value, then verify and retry up to 3 times.
2026-03-09 22:09:45 +00:00
5 changed files with 115 additions and 22 deletions
+1 -1
View File
@@ -636,7 +636,7 @@ lint/ts: site/node_modules/.installed
lint/go:
./scripts/check_enterprise_imports.sh
./scripts/check_codersdk_imports.sh
linter_ver=$(shell egrep -o 'GOLANGCI_LINT_VERSION=\S+' dogfood/coder/Dockerfile | cut -d '=' -f 2)
linter_ver=$$(grep -oE 'GOLANGCI_LINT_VERSION=\S+' dogfood/coder/Dockerfile | cut -d '=' -f 2)
go run github.com/golangci/golangci-lint/cmd/golangci-lint@v$$linter_ver run
go tool github.com/coder/paralleltestctx/cmd/paralleltestctx -custom-funcs="testutil.Context" ./...
.PHONY: lint/go
+11
View File
@@ -141,6 +141,17 @@ func TestReapInterrupt(t *testing.T) {
}()
require.Equal(t, <-usrSig, syscall.SIGUSR1)
// Prevent SIGINT from terminating the test process. Under the
// race detector, the catchSignals goroutine in ForkReap may not
// have called signal.Notify yet, so the default Go handler
// could kill us. Registering our own Notify disables the
// default behavior. Both this channel and the one inside
// catchSignals receive independent copies of the signal.
intC := make(chan os.Signal, 1)
signal.Notify(intC, os.Interrupt)
defer signal.Stop(intC)
err := syscall.Kill(os.Getpid(), syscall.SIGINT)
require.NoError(t, err)
require.Equal(t, <-usrSig, syscall.SIGUSR2)
+21 -16
View File
@@ -6,6 +6,7 @@ import (
"context"
"os"
"os/signal"
"sync"
"syscall"
"github.com/hashicorp/go-reap"
@@ -19,20 +20,7 @@ func IsInitProcess() bool {
return os.Getpid() == 1
}
func catchSignals(logger slog.Logger, pid int, sigs []os.Signal) {
if len(sigs) == 0 {
return
}
sc := make(chan os.Signal, 1)
signal.Notify(sc, sigs...)
defer signal.Stop(sc)
logger.Info(context.Background(), "reaper catching signals",
slog.F("signals", sigs),
slog.F("child_pid", pid),
)
func catchSignals(logger slog.Logger, pid int, sc <-chan os.Signal) {
for {
s := <-sc
sig, ok := s.(syscall.Signal)
@@ -64,10 +52,17 @@ func ForkReap(opt ...Option) (int, error) {
o(opts)
}
go reap.ReapChildren(opts.PIDs, nil, opts.Done, nil)
// Use the reapLock to prevent the reaper's Wait4(-1) from
// stealing the direct child's exit status. The reaper takes
// a write lock; we hold a read lock during our own Wait4.
var reapLock sync.RWMutex
reapLock.RLock()
go reap.ReapChildren(opts.PIDs, nil, opts.Done, &reapLock)
pwd, err := os.Getwd()
if err != nil {
reapLock.RUnlock()
return 1, xerrors.Errorf("get wd: %w", err)
}
@@ -87,16 +82,26 @@ func ForkReap(opt ...Option) (int, error) {
//#nosec G204
pid, err := syscall.ForkExec(opts.ExecArgs[0], opts.ExecArgs, pattrs)
if err != nil {
reapLock.RUnlock()
return 1, xerrors.Errorf("fork exec: %w", err)
}
go catchSignals(opts.Logger, pid, opts.CatchSignals)
// Register the signal handler before spawning the goroutine
// so it is active by the time the child process starts. This
// avoids a race where a signal arrives before the goroutine
// has called signal.Notify.
if len(opts.CatchSignals) > 0 {
sc := make(chan os.Signal, 1)
signal.Notify(sc, opts.CatchSignals...)
go catchSignals(opts.Logger, pid, sc)
}
var wstatus syscall.WaitStatus
_, err = syscall.Wait4(pid, &wstatus, 0, nil)
for xerrors.Is(err, syscall.EINTR) {
_, err = syscall.Wait4(pid, &wstatus, 0, nil)
}
reapLock.RUnlock()
// Convert wait status to exit code using standard Unix conventions:
// - Normal exit: use the exit code
+4 -1
View File
@@ -599,8 +599,11 @@ func TestExecutorAutostopAIAgentActivity(t *testing.T) {
require.NoError(t, err)
// When: the autobuild executor ticks after the bumped deadline.
// Use time.Now() to account for elapsed time since the test's
// "now" variable, because the activity bump uses the database
// NOW() which advances with wall clock time.
go func() {
tickTime := now.Add(time.Hour).Add(time.Minute)
tickTime := time.Now().Add(time.Hour).Add(time.Minute)
coderdtest.UpdateProvisionerLastSeenAt(t, db, p.ID, tickTime)
tickCh <- tickTime
close(tickCh)
+78 -4
View File
@@ -74,7 +74,13 @@ export async function login(page: Page, options: LoginOptions = users.owner) {
// biome-ignore lint/suspicious/noExplicitAny: reset the current user
(ctx as any)[Symbol.for("currentUser")] = undefined;
await ctx.clearCookies();
await page.goto("/login");
await page.goto("/login", { waitUntil: "domcontentloaded" });
// Dynamic imports can fail with ERR_NETWORK_CHANGED during
// parallel test execution. Reload the page if the error
// boundary appears instead of the login form.
await reloadPageIfDynamicImportFailed(page, "form");
await page.getByLabel("Email").fill(options.email);
await page.getByLabel("Password").fill(options.password);
await page.getByRole("button", { name: "Sign In" }).click();
@@ -126,6 +132,11 @@ export const createWorkspace = async (
});
await expectUrl(page).toHavePathName(`/templates/${templatePath}/workspace`);
// Dynamic imports can fail with ERR_NETWORK_CHANGED during
// parallel test execution. Reload the page if the error
// boundary appears instead of the form.
await reloadPageIfDynamicImportFailed(page, "form");
const name = randomName();
await page.getByLabel("name").fill(name);
@@ -211,8 +222,18 @@ export const verifyParameters = async (
case "bool":
{
const parameterField = parameterLabel.locator("input");
const value = await parameterField.isChecked();
expect(value.toString()).toEqual(buildParameter.value);
// Dynamic parameters can hydrate after initial render
// and reset checkbox state. Use auto-retrying assertions
// instead of a one-shot isChecked() snapshot.
if (buildParameter.value === "true") {
await expect(parameterField).toBeChecked({
timeout: 15_000,
});
} else {
await expect(parameterField).not.toBeChecked({
timeout: 15_000,
});
}
}
break;
case "string":
@@ -815,6 +836,37 @@ export const randomName = (annotation?: string) => {
return annotation ? `${annotation}-${base}` : base;
};
/**
* Reload the page if a dynamic import failed (e.g. due to
* ERR_NETWORK_CHANGED). When React Router catches a failed chunk
* load it renders the GlobalErrorBoundary with the heading
* "Something went wrong". Detecting that and reloading is enough
* to recover.
*/
async function reloadPageIfDynamicImportFailed(
page: Page,
expectedSelector: string,
) {
const errorHeading = page.getByRole("heading", {
name: "Something went wrong",
});
// Race the expected content against the error boundary. Suppress
// the timeout rejection from whichever side loses.
const result = await Promise.race([
page
.waitForSelector(expectedSelector, { timeout: 10_000 })
.then(() => "ok" as const)
.catch(() => "timeout" as const),
errorHeading
.waitFor({ state: "visible", timeout: 10_000 })
.then(() => "error" as const)
.catch(() => "timeout" as const),
]);
if (result === "error") {
await page.reload({ waitUntil: "domcontentloaded" });
}
}
/**
* Awaiter is a helper that allows you to wait for a callback to be called. It
* is useful for waiting for events to occur.
@@ -1051,8 +1103,30 @@ const fillParameters = async (
switch (richParameter.type) {
case "bool":
{
const wantChecked = buildParameter.value === "true";
const parameterField = parameterLabel.locator("button");
await parameterField.click();
// Dynamic parameters can hydrate after initial render
// and reset the switch state. Check the current value
// and retry the click if needed.
for (let attempt = 0; attempt < 3; attempt++) {
const isChecked =
(await parameterField.getAttribute("aria-checked")) === "true";
if (isChecked !== wantChecked) {
await parameterField.click();
}
try {
await expect(parameterField).toHaveAttribute(
"aria-checked",
String(wantChecked),
{ timeout: 1000 },
);
break;
} catch (error) {
if (attempt === 2) {
throw error;
}
}
}
}
break;
case "string":