Specpunk
public build / march 2026

repo-native review boundary

This change should touch 3 files. It touched 11.

Review breaks when intent breaks.

uncontrolled change / contained review

one task / two states / one review boundary

expected scope: 3 files
actual touched: 11 files
out of scope: 8 files
attached evidence: 3 proofs

without control / 11 files changed

Raw change stream

review confidence: low / boundary drift: visible

  • src/auth/login.ts (in scope)
  • src/auth/session.ts (in scope)
  • src/auth/policy.ts (in scope)
  • +8 files outside allowed scope (collapsed)
  • src/billing/subscription.ts (out of scope)
  • src/notifications/email.ts (out of scope)
  • src/ui/header.tsx (out of scope)
  • src/telemetry/events.ts (out of scope)
  • src/api/account.ts (out of scope)

risk markers

  • cross-boundary edits are mixed with valid edits
  • intent is not explicit enough to judge side effects
  • tests exist, but proof of behavioral boundary is missing

with specpunk / 3 files contained

Review artifact

contained enough to reason about

intent

  • add a fixed session timeout rule
  • preserve refresh-token behavior
  • do not alter billing state or outbound email flow

scope

  • src/auth/login.ts
  • src/auth/session.ts
  • src/auth/policy.ts

evidence

  • 3 auth tests updated
  • behavior summary attached
  • review posture: inspect + approve

latest notebook delta
  scope enforcement stayed first because it produces the cleanest early signal.

benchmark note
  review accuracy matters more than confidence or time alone.

open question
  how much incremental value comes from Claude-aware extraction over portable core?

next build
  artifact drawer + contradiction checks + evidence bundle.

public lab notebook

A product surface that shows its thinking.

Not a blog. Not founder theater. Just the narrow layer where decisions, failures, and unresolved questions stay visible enough to keep the rest honest.

thesis delta / this week

portable core / held

Why scope enforcement still comes before extraction.

Extraction helps adoption. Scope produces the cleaner first constraint. If the product cannot contain change, the rest of the story becomes decorative.

benchmark learning

unresolved / adjudication pass

Review confidence rose faster than review accuracy.

More artifacts can make reviewers feel safer before they are actually more correct. The benchmark has to split those two signals apart.
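
One way to keep the two signals apart is to record them as separate fields per review and never fold them into one score. The sketch below is an assumption, not the benchmark itself: `ReviewTrial`, `summarize`, and the sample values are hypothetical and exist only to show the arithmetic.

```typescript
// Hypothetical record for one benchmark review; not a Specpunk format.
interface ReviewTrial {
  confidence: number;                  // reviewer's self-reported confidence, 0..1
  decision: "approve" | "reject";      // what the reviewer decided
  adjudicated: "approve" | "reject";   // what the adjudication pass decided
}

// Report confidence and accuracy side by side, plus the gap between them.
function summarize(trials: ReviewTrial[]) {
  if (trials.length === 0) throw new Error("no trials to summarize");
  const n = trials.length;
  const meanConfidence = trials.reduce((sum, t) => sum + t.confidence, 0) / n;
  const accuracy = trials.filter((t) => t.decision === t.adjudicated).length / n;
  // A positive gap means reviewers felt safer than they were correct.
  return { meanConfidence, accuracy, gap: meanConfidence - accuracy };
}

// Made-up illustrative values, not benchmark results.
const sampleTrials: ReviewTrial[] = [
  { confidence: 0.5, decision: "approve", adjudicated: "approve" },
  { confidence: 1, decision: "approve", adjudicated: "reject" },
  { confidence: 0.75, decision: "reject", adjudicated: "reject" },
  { confidence: 0.75, decision: "approve", adjudicated: "reject" },
];

const summary = summarize(sampleTrials);
console.log(summary); // { meanConfidence: 0.75, accuracy: 0.5, gap: 0.25 }
```

Keeping `gap` as its own number makes the failure mode visible: it grows when added artifacts raise confidence without raising accuracy.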

open design question

active / prototype next

How literal should the control room feel?

Too literal and it becomes internal tooling. Too abstract and it slides back into page design. The right answer is probably one real artifact deeper than feels comfortable.

control room

Open the artifact pack.

Closer to opening a module than reading copy. The artifacts are the object. The page only arranges them.

artifact

intent.md

The smallest durable statement of what this change is supposed to preserve.

task: add session timeout
scope: auth only
must_preserve:
  - refresh token flow
  - existing session renewal behavior
must_not_touch:
  - billing
  - notifications

artifact

scope.yml

A hard statement of where the agent is allowed to act.

allowed:
  - src/auth/login.ts
  - src/auth/session.ts
  - src/auth/policy.ts
blocked:
  - src/billing/**
  - src/notifications/**
  - src/ui/**
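
A minimal sketch of how a scope file like this could be checked against a list of touched files. Everything here is an assumption for illustration: the `Scope` type, `globToRegExp`, `classify`, and `checkScope` are hypothetical names, and the glob handling covers only `*` and `**`. The paths are the ones named in the raw change stream above.

```typescript
// Hypothetical in-memory shape of scope.yml (field names taken from the artifact).
interface Scope {
  allowed: string[];
  blocked: string[];
}

type Verdict = "in scope" | "blocked" | "out of scope";

// Simplified glob: "**" matches across directories, "*" stays within one segment.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped.replace(/\*\*|\*/g, (m) => (m === "**" ? ".*" : "[^/]*"));
  return new RegExp(`^${pattern}$`);
}

// Blocked wins over allowed; anything not explicitly allowed is out of scope.
function classify(path: string, scope: Scope): Verdict {
  if (scope.blocked.some((g) => globToRegExp(g).test(path))) return "blocked";
  if (scope.allowed.some((g) => globToRegExp(g).test(path))) return "in scope";
  return "out of scope";
}

// Group every touched file under its verdict.
function checkScope(touched: string[], scope: Scope): Record<Verdict, string[]> {
  const report: Record<Verdict, string[]> = { "in scope": [], blocked: [], "out of scope": [] };
  for (const path of touched) report[classify(path, scope)].push(path);
  return report;
}

const scope: Scope = {
  allowed: ["src/auth/login.ts", "src/auth/session.ts", "src/auth/policy.ts"],
  blocked: ["src/billing/**", "src/notifications/**", "src/ui/**"],
};

const touched = [
  "src/auth/login.ts",
  "src/auth/session.ts",
  "src/auth/policy.ts",
  "src/billing/subscription.ts",
  "src/notifications/email.ts",
  "src/ui/header.tsx",
  "src/telemetry/events.ts",
  "src/api/account.ts",
];

const report = checkScope(touched, scope);
console.log(report);
```

In this sketch `src/telemetry/events.ts` and `src/api/account.ts` match no blocked pattern, yet they still land in "out of scope" because they are not on the allow list. Treating the allow list as the boundary, with the block list as an explicit louder signal, is a design choice of the sketch.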

artifact

glossary.md

Short, explicit terminology to prevent quiet drift in meaning.

session
  authenticated server-side state

token
  client credential, not equivalent to session

timeout
  forced re-auth after inactivity threshold

artifact

invariants.md

Behavior the change may not contradict even if the diff passes tests.

- billing state must remain unchanged
- refresh token path must survive timeout changes
- timeout errors must stay auth-local

artifact

evidence.md

The smallest reviewable bundle of proof attached to the change.

tests_run:
  - auth/session_timeout_spec
  - auth/refresh_flow_spec

behavior_delta:
  - timeout now enforced after inactivity
  - refresh flow unchanged

confidence:
  module-contained

artifact

review.md

A review posture, not just a list of files touched.

decision: inspect and approve
reason:
  - scope respected
  - glossary unchanged
  - evidence attached
remaining_risk:
  - timeout threshold may need product tuning
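
The posture above can be read as the output of a few explicit checks rather than a judgment made from the diff alone. A rough sketch under stated assumptions: `ReviewInputs`, `derivePosture`, and the `"hold"` decision are hypothetical and not part of the artifact pack; only "inspect and approve" and the three reasons come from review.md.

```typescript
// Hypothetical inputs gathered from the other artifacts before review.
interface ReviewInputs {
  outOfScopeFiles: string[];   // touched files that scope.yml does not allow
  glossaryChanged: boolean;    // did glossary.md terms change in this diff?
  evidenceAttached: boolean;   // is an evidence.md bundle present?
}

interface ReviewPosture {
  decision: "inspect and approve" | "hold"; // "hold" is an assumed fallback label
  reason: string[];
}

// Approve only when every check passes; otherwise list what failed.
function derivePosture(inputs: ReviewInputs): ReviewPosture {
  const failures: string[] = [];
  if (inputs.outOfScopeFiles.length > 0) {
    failures.push(`scope violated: ${inputs.outOfScopeFiles.length} file(s) outside scope`);
  }
  if (inputs.glossaryChanged) failures.push("glossary changed");
  if (!inputs.evidenceAttached) failures.push("evidence missing");

  if (failures.length > 0) return { decision: "hold", reason: failures };
  return {
    decision: "inspect and approve",
    reason: ["scope respected", "glossary unchanged", "evidence attached"],
  };
}

// The contained change from this page: nothing outside scope, evidence attached.
const contained = derivePosture({
  outOfScopeFiles: [],
  glossaryChanged: false,
  evidenceAttached: true,
});

// An uncontrolled change: two files outside scope and no evidence.
const uncontrolled = derivePosture({
  outOfScopeFiles: ["src/billing/subscription.ts", "src/ui/header.tsx"],
  glossaryChanged: false,
  evidenceAttached: false,
});

console.log(contained.decision);    // "inspect and approve"
console.log(uncontrolled.decision); // "hold"
```

The `remaining_risk` entry is left out on purpose: it is a human note about product tuning, not something these checks can derive.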

prompt surface

Use this on the last AI PR you did not trust.

A real prompt, not a lead form. It should stay useful even before there is any proper contact surface behind it.

prompt

open artifact pack

why this prompt exists

  • it starts from one concrete review failure instead of a vague product story
  • it keeps the page useful before sales copy, forms, or a polished intake flow exist
  • it points back to the actual artifact shape: scope, intent, evidence, review posture