跳转至

launch

Bring a worker up — adapter launch, bounded prompt delivery, concurrency cap.

raw source

Launching a worker must be whole and bounded: the launch prompt arrives complete (never truncated by the transport), the spec-discipline contract reaches the agent through materialized auto-discovered files (so a dispatched agent loads its own CLAUDE.md + memory normally — the same launch a user takes, never a hidden or moved-away CLAUDE.md), and no launch ever crashes the box — past the concurrency cap a launch waits its turn instead of running. And launching has a single owner: the running backend process, never whichever shell happened to type spex new — because the launch env (and the cap) live in the backend, not in the caller.

expanded spec

newSession mints the governed SpexCode session <uuid>, adds the node/<slug> worktree (off the base branch), then writes the session's governed:true record session.json (+ best-effort the prompt artifact, and at launch the launch.sh script) into the GLOBAL per-session store (runtime) — NOT the worktree, which stays pristine — materializes the spec-discipline contract into the worktree's own CLAUDE.md/AGENTS.md (harness-delivery), and queues the worktree for launch on a private tmux -L socket (spex new "<prompt>" [--node X]). The selected harness-adapter owns the actual agent command. Claude launches with --session-id <uuid> — the SAME id the record is keyed by, the tmux window name, the rendezvous socket, and the commit attribution, so the conversation --resumes after death, the board maps it to its worktree, and a spec node links to it. Codex launches a visible TUI attached to the project's shared codex app-server --listen unix://<runtimeRoot>/codex-app-server.sock; its Codex thread id is captured later into harness_session_id because Codex does not let the launcher pin a new thread id. Workers run through the reclaude wrapper (SPEXCODE_CLAUDE_CMD), which runs claude as a child rather than exec'ing it, so the pane's foreground command is the wrapper/shell — not a liveness signal (state reads the socket instead). The spawned command alone carries CLAUDE_BG_BACKEND=daemon and a CLAUDE_BG_RENDEZVOUS_SOCK path derived from the session id as an env prefix (never global, never a plugin), so dispatch addresses only our sockets. Codex's app-server launch is project-idempotent: simultaneous spexcode serve processes in the same project share the runtime socket and take a per-project launch lock before starting the server, so they do not fan out one app-server per session or cross into another project's socket.

Materialized delivery, not injection: the spec-discipline contract is NOT pushed on the command line. Before the agent starts, the worktree is materialized (harness-delivery), rendering the surface: system bodies (name order — the core node + rules like voice-before-ask alongside it) into the <spexcode> managed block of the worktree's CLAUDE.md/AGENTS.md, plus the dispatch shims. The agent then launches plainly and auto-discovers them — the SAME path a user-self-launched agent takes — so editing any always-on contract is a spec edit, not a code change. There is no --append-system-prompt and no --settings. CLAUDE.md is no longer hidden (the old rename-to-CLAUDE.spexhidden.md isolation is gone): hiding it also suppressed the agent's own MEMORY load, so with the contract delivered by discovery the agent loads its CLAUDE.md + memory normally. Only the launch line itself (rendezvous env + harness command + the human prompt + spec pointer) is written to the launch script file in the global store, so a long prompt never hits the ~2KB tmux send-keys limit. Every path that file and its hooks reference resolves from the CLI package's own on-disk location, never a hardcoded <repoRoot>/spec-cli, so relocating it can't break launch.

The backend is the single launch owner. spex new / spex session new POST to the running backend, so the launch always runs where the launch env and cap live. The caller can be another agent whose env was stripped of SPEXCODE_CLAUDE_CMD, so an in-process launch there spawns bare-claude workers that 401 at boot. The CLI falls back to in-process only when no backend answers (warning that it then carries the caller's env, no cap).

The launch is project-bound; the route is not — so the launch guards its project. A launch builds the worktree under the backend's OWN mainRoot, but the route to a backend is a bare URL (SPEXCODE_API_URL, else the local-port default) carrying no project identity. So a stale inherited SPEXCODE_API_URL pointing at another repo's backend would silently land the session in the WRONG repo — the exact decoupling the remote-client read/control verbs exploit on purpose (point anywhere to monitor any machine) becomes a correctness hole the moment the verb mutates. The fix lives at the client launch seam: before POSTing, spex new compares the caller's cwd repo root to the backend's served root (GET /api/layout .main) and refuses, loud, on a provable same-host mismatch — cwd is in <A> but the backend serves <B>, with the repair (cd <A> && spex serve, or point the env at it). It fires only on a positive mismatch: no local repo, an unreachable backend, or a served root that isn't a resolvable local path (a genuinely remote backend) all fall through to allow, so legit cross-machine dispatch and the viewer-points-anywhere model stay intact. This is the same FAIL-LOUD-never-silent-fallback rule remote-client states, applied to launch: a mutating verb must never silently act on the wrong project.

Resilient bring-up. A launcher can lose a transient startup race — the reclaude daemon flapping on a version skew — and exit within seconds, before the rendezvous socket ever appears: a fast exit that is the race, not the work. The launch script therefore retries a bounded number of times on a fast exit and stops the instant the agent has run past the boot window, so a normal long-lived session is never relaunched on its eventual exit. The boot grace and the ready-wait both span this retry window, so liveness reads starting and the concurrency slot stays held until the worker is genuinely online — or truly dead. This only closes the startup race; it adds no fallback and never masks a dead agent.

Concurrency cap (bounded working set)

At most N agents run autonomously progressing at once — N configured per project in spexcode.json (sessions.maxActive, default 6), not hardcoded, read live so an edit applies on the next drain (the SPEXCODE_MAX_ACTIVE env is a fallback). A slot is compute pressure: a session holds one only while live AND working or parked (self-resuming). Everything waiting on the human frees its slotidle, asking, and the proposals (review/done/close-pending) — like offline/closed, since they burn no compute and must never block a launch. A launch beyond the cap lands as a durable queued worktree (fully prepared, claude not started, its prompt parked as the launch artifact in the global store). A drainer starts queued sessions oldest-first the instant a slot frees — on every slot-freeing server action and on a periodic tick (catching frees the server never sees: a hook subprocess, a crash). A restart re-drains survivors. reopen relaunches a dead session and waits for its rendezvous socket before returning, so a follow-on dispatch hits a live socket. closeSession is the only removal — human-only, deleting the worktree, sweeping the rendezvous socket (in the tmpdir), and removing the session's global record dir.