extract

Reverse-engineer a spec tree for the target source area(s) below, or re-run to reconcile an area that was already extracted. Extraction is not one-shot: a re-run is the same pass that closes the gaps code growth opens (an unclaimed governed file, a frontend node missing its loss signal, a thin scenario). Aim for a tree where every governed file is claimed and spex lint is clean, with bodies at contract altitude and written in the codebase's own primary language (a predominantly Chinese repo → Chinese specs).

Find the spine the code already declares first — a barrel of exports, a README or design doc, the dependency direction between modules — and adopt it as the top-level shape, refined where the code reveals finer responsibilities. Only when the code declares no architecture is the top-level cut a judgment call worth raising with the human.

Then grow nodes under that spine:

Decompose by responsibility, not by file. A node is one job the code does. A fat file split across several jobs becomes several nodes that each claim it; one job spanning several files becomes one node claiming them all. Every governed file is claimed by at least one node; nest into subtrees where warranted.
Group wide layers; don't mirror the file tree — at every level, the root included. One-node-per-folder is a smell. If a node would have more than ~7 direct children you're under-grouping: add intermediate sub-domain nodes that cluster siblings serving one concern (model-config + selection + auth → a model domain), and recurse until every level reads as a handful of siblings, not a flat wall. Fold cross-cutting substrate (design system, i18n, platform glue) under one foundation node that claims the cluster's barrel/wiring files (so it isn't pure-prose); split a fat folder holding several distinct jobs. Group by responsibility, never to hit a number.
Stay at contract altitude. State each node's intent, invariants, and outward behavior — what it guarantees and why — not how the code does it.
Never fabricate intent. Code shows what it does, rarely why. Read any README/design docs for real intent; where you can still only see behavior, state the behavior and mark the intent as inferred rather than inventing a rationale.
Reserve pure-prose nodes (no code:) for a genuine cross-cutting contract no single file owns. Use sparingly.
Mind the scope boundary. A file that looks like a thin wrapper may be the foot of a feature defined outside the target area — flag it instead of mis-homing it, and prefer extracting the whole repo so cross-cutting features stay visible. If nothing reaches a file, say it's likely dead rather than dignifying it with a confident spec.
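
The two structural rules above (every governed file claimed by at least one node, and no level wider than about seven siblings) can be checked mechanically. The sketch below is only an illustration: the `claims` and `children` dictionaries and the file paths are a hypothetical in-memory model, not spex's real data format, and spex lint remains the actual check. The width check only finds candidates for regrouping; the grouping itself is still decided by responsibility.

```python
from typing import Dict, List, Set

# Hypothetical model of a spec tree (an assumption for illustration):
#   claims:   node id -> files the node claims
#   children: node id -> ids of its direct children
Claims = Dict[str, List[str]]
Children = Dict[str, List[str]]


def unclaimed_files(governed: Set[str], claims: Claims) -> Set[str]:
    """Return governed files that no node claims."""
    claimed = {path for files in claims.values() for path in files}
    return governed - claimed


def overwide_nodes(children: Children, limit: int = 7) -> List[str]:
    """Return ids of nodes with more than `limit` direct children, sorted."""
    return sorted(node for node, kids in children.items() if len(kids) > limit)


governed = {"src/auth.ts", "src/model.ts", "src/util.ts"}
claims = {
    # One fat file split across two jobs: both nodes claim it.
    "model/selection": ["src/model.ts"],
    "model/config": ["src/model.ts"],
    "model/auth": ["src/auth.ts"],
}
children = {
    "root": [f"n{i}" for i in range(9)],  # nine siblings: a flat wall
    "model": ["selection", "config", "auth"],
}

print(unclaimed_files(governed, claims))  # {'src/util.ts'}
print(overwide_nodes(children))           # ['root']
```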

Give every frontend node a loss signal. A node that governs UI or visual code (.tsx/.jsx/.vue/.svelte/.css, or the dashboard) is a blind spot until it carries a yatsu.md, so write one as you extract it. The scenario must be a real user path: a goal and the steps to reach it through the running app (never a bare render-check), covering a failure/empty/edge state. It carries a description of those steps, the expected zero-loss result, and ≥1 tags classifying it (a surface like frontend-e2e/backend-api/cli, a device like desktop/mobile) drawn from the configured library (lint.scenarioTags); a tag outside the library fails scan, so use an existing one or extend the library. Frontend scenarios are measured by looking (YATU): a screenshot filed with spex yatsu eval <node> --image <png> --pass. Backend nodes don't need one yet; run spex yatsu scan to list the frontend nodes still uncovered. On a re-run this is also where you sharpen: give an uncovered node its first scenario, or tighten a thin one (a bare render-check, or a stale expected). spex guide yatsu has the schema.
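
The tag rule can be pre-checked before running scan. This is a minimal sketch, assuming the library is available as a plain list of strings; `check_scenario_tags` is a hypothetical helper, not part of spex, and the library contents below are just the example tags named in this guide.

```python
from typing import Iterable, List


def check_scenario_tags(tags: Iterable[str], library: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the tags are acceptable."""
    tags = list(tags)
    allowed = set(library)
    problems = []
    if not tags:
        problems.append("scenario needs at least one tag")
    for tag in tags:
        if tag not in allowed:
            problems.append(f"tag {tag!r} is not in the configured library")
    return problems


# Assumed library contents, taken from the example tags in this guide.
library = ["frontend-e2e", "backend-api", "cli", "desktop", "mobile"]

print(check_scenario_tags(["frontend-e2e", "desktop"], library))  # []
print(check_scenario_tags(["tablet"], library))
```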

Extract incrementally — don't plan the whole tree before writing. For a large area (hundreds of files), enumerating the whole partition up front or scripting a generator to emit it at once burns context before a node lands and loses everything to one interruption. Fix the top-level cut and commit it, then take ONE subtree at a time (write the leaf, list its files, lint, COMMIT) before the next — never more than one subtree uncommitted, so progress survives context limits.

Confirm spexcode.json's governedRoots points at the real source dirs first — lint reads silently empty otherwise. Commit one node per commit (spec: <id> — extract from <area>) and run spex lint after each: 0 errors, 0 coverage/altitude warnings.
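
Because a wrong governedRoots fails silently, it can be worth checking the paths before the first lint. The sketch below assumes governedRoots is a list of directory paths relative to spexcode.json; the real schema may differ, and `missing_governed_roots` is a hypothetical helper rather than a spex command. The demo builds a throwaway config in a temporary directory.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def missing_governed_roots(config_path: Path) -> List[str]:
    """Return governedRoots entries that are not existing directories.

    Assumption: governedRoots is a list of paths relative to the config file.
    """
    config = json.loads(config_path.read_text(encoding="utf-8"))
    roots = config.get("governedRoots", [])
    base = config_path.parent
    return [root for root in roots if not (base / root).is_dir()]


# Demo against a temporary project: "src" exists, "app" does not.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "src").mkdir()
    cfg = project / "spexcode.json"
    cfg.write_text(json.dumps({"governedRoots": ["src", "app"]}), encoding="utf-8")
    demo_missing = missing_governed_roots(cfg)

print(demo_missing)  # ['app']
```

An empty result only means the directories exist; whether they are the real source dirs is still a judgment to confirm by hand.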