houserules/docs/adr/T4-scripted-modifiers-design.md

# T4 — Scripted Modifiers (Forward Design)

> **Status**: Forward design only. T4 is deferred. T3 ships data-only custom
> modifiers; this document records the design we'd reach for if and when T4
> becomes a priority.

T3 lets users compose custom modifiers from a fixed catalog of 15 primitives.
The natural next layer is letting users author primitives whose behaviour is
expressed in code (or a code-like DSL) rather than a shipped TypeScript
function. That's T4.

This document captures (a) why T4 is deferred, (b) the candidate sandbox
runtimes we evaluated, (c) the descriptor shape that preserves backwards
compatibility with T3, (d) the permission model we'd want, (e) the validation
strategy, (f) how T3 primitives migrate forward, and (g) open questions.

---

## Why T4 is deferred

The unifying concern is **security surface area**.

A T3 custom modifier can ONLY combine functions the engine already ships. A
malicious or buggy descriptor can no-op (unknown kinds skip silently) or try
to exhaust resource limits (which the validator's caps catch), but it can't
mint new behaviour the engine didn't already implement.

A T4 scripted modifier can express NEW behaviour, written by an untrusted
author and run inside the same process as the rest of the game. That demands
a sandbox: memory bounds, CPU bounds, deterministic execution (for
multiplayer replication), no host-API access (no `fetch`, no direct
`Date.now()` calls that would diverge between clients, no DOM), no infinite
loops, no side channels into private session state.

Building that sandbox correctly is a multi-week project and a permanent
maintenance burden. T3 delivers user-authored modifiers without it; T4
buys "Turing-complete user modifiers" at the cost of becoming a
sandbox-vendor team.

---

## Sandbox candidates evaluated

| Runtime | Verdict |
|---|---|
| **QuickJS (via wasm)** | Strong candidate. Mature, near-complete modern ECMA-262 implementation, small and designed for embedding. The `quickjs-emscripten` package compiles it to wasm and makes integration mechanical. CPU caps (via an interrupt handler) and memory caps are runtime-enforced. Determinism: would need to whitelist a subset of stdlib (no `Date.now`, no `Math.random` without a seeded shim) but well-trodden ground. **Likely T4 pick.** |
| **Duktape (via wasm)** | Smaller binary than QuickJS but lower spec compliance (ES5.1 plus some ES2015+ features). Less attractive for users writing modern JS. |
| **WebAssembly (browser's built-in engine; Wasmer / Wasmtime server-side)** | Heaviest sandbox. Strongest isolation. Forces users to compile from a higher-level language (AssemblyScript, Rust). High authoring friction; better suited to a power-user tier than a casual editor. |
| **Custom mini-interpreter** | Same complexity as a vetted runtime, with less battle-testing. Rejected as not-invented-here. |
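| **Web Workers** | Not a sandbox per se: a separate realm on another thread, but it still exposes host APIs such as `fetch` and `importScripts`, and data crosses the boundary by structured cloning. CPU can only be bounded coarsely, by calling `terminate()` from the main thread after a timeout. Weaker isolation than QuickJS. |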
| **vm2 / isolated-vm** | Server-only (Node); doesn't help our browser-runtime case. vm2 has also been discontinued by its maintainer over unfixable sandbox escapes. |

**Recommendation if/when T4 ships**: QuickJS via `quickjs-emscripten`, with a
whitelisted host-API surface and per-call CPU + memory caps.

---

## Descriptor shape extension

T3 reserved the discriminator field `type: "data"` on
`CustomModifierDescriptor` precisely so T4 could land alongside without
breaking changes:

```ts
type CustomModifierDescriptor =
  | DataModifierDescriptor      // T3 — what we ship today
  | ScriptedModifierDescriptor; // T4 — future

interface DataModifierDescriptor {
  readonly type: 'data';
  readonly id: CustomModifierId;
  readonly name: string;
  readonly description: string;
  readonly version: 1;
  readonly primitives: readonly EffectPrimitiveNode[];
  // ... other T3 fields
}

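/** Permission names from the permission model below. */
type Permission =
  | 'read-self' | 'read-board' | 'write-self' | 'write-board'
  | 'read-history' | 'emit-effect' | 'random';

interface ScriptedModifierDescriptor {
  readonly type: 'scripted';
  readonly id: CustomModifierId;
  readonly name: string;
  readonly description: string;
  readonly version: 1;
  /** Source code in the chosen DSL (likely a JS subset). Named `script`, not
   *  `source`, so it does not collide with the shared trunk's `source` field. */
  readonly script: string;
  /** Compiled-and-validated bytecode hash; null = "needs compile". */
  readonly hash: string | null;
  /** Permissions the author requested; subject to user grant on import. */
  readonly permissions: readonly Permission[];
  // ... shared trunk: targetAttrs, uiForm, source, author, createdAt
}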
```

The trunk fields stay identical so registry lookup, library persistence,
serialization, and the Modifier Profile picker UI all work uniformly across
both descriptor families. Only the apply-time dispatcher (and the editor's
right inspector panel) branches on `type`.
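A minimal sketch of that dispatcher as an exhaustive `switch` on the discriminant. The descriptor types are trimmed to the fields the dispatch needs, `applyModifier` and `ApplyHandlers` are hypothetical names, and refusing to run a scripted descriptor whose `hash` is still `null` is an assumption read off the `hash` field's "needs compile" comment.

```typescript
// Trimmed stand-ins for the two descriptor families above.
interface DataDescriptor {
  readonly type: 'data';
  readonly id: string;
  readonly name: string;
  readonly primitives: readonly { readonly kind: string }[];
}

interface ScriptedDescriptor {
  readonly type: 'scripted';
  readonly id: string;
  readonly name: string;
  readonly hash: string | null; // null = "needs compile"
}

type Descriptor = DataDescriptor | ScriptedDescriptor;

// One handler per descriptor family; only this layer branches on `type`.
interface ApplyHandlers<R> {
  data(d: DataDescriptor): R;
  scripted(d: ScriptedDescriptor): R;
}

function assertNever(x: never): never {
  throw new Error(`unhandled descriptor: ${JSON.stringify(x)}`);
}

function applyModifier<R>(d: Descriptor, handlers: ApplyHandlers<R>): R {
  switch (d.type) {
    case 'data':
      return handlers.data(d);
    case 'scripted':
      // Assumption: never run a script that hasn't passed validation.
      if (d.hash === null) {
        throw new Error(`scripted modifier ${d.id} needs compile`);
      }
      return handlers.scripted(d);
    default:
      return assertNever(d);
  }
}
```

Routing through `assertNever` means a third descriptor family would fail to compile at this one site instead of falling through silently, while trunk-only code (registry, persistence, picker) never needs to look at `type`.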

---

## Permission model sketch

Every scripted modifier declares the permissions it needs. The user grants
permissions explicitly on first use (mirroring browser permission prompts).

| Permission | Capabilities granted | Default |
|---|---|---|
| `read-self` | Read facts on the piece this descriptor is attached to | granted |
| `read-board` | Read facts on every other piece | prompt |
| `write-self` | Insert/retract facts on this piece | granted |
| `write-board` | Insert/retract facts on other pieces | prompt |
| `read-history` | Read the engine's move log | prompt |
| `emit-effect` | Push visual effects via `engine.emitEffect` | granted |
| `random` | Use the engine's seeded RNG | prompt |

Permissions are validated server-side on `custom-modifier.register` — a
descriptor with `write-board` arriving from an untrusted source is rejected
unless the room's host has explicitly opted into "allow scripted modifiers".
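The table and the server-side rule can be sketched as two small functions. The permission names and defaults come from the table; `planGrants`, `checkRegister`, the `fromTrustedSource` flag, and the idea that earlier approvals are remembered (implied by "first use") are hypothetical. `checkRegister` encodes only the one rule stated above and is not a complete server policy.

```typescript
// Permission names and defaults, copied from the table above.
type Permission =
  | 'read-self' | 'read-board' | 'write-self' | 'write-board'
  | 'read-history' | 'emit-effect' | 'random';

const PERMISSION_DEFAULTS: Readonly<Record<Permission, 'granted' | 'prompt'>> = {
  'read-self': 'granted',
  'read-board': 'prompt',
  'write-self': 'granted',
  'write-board': 'prompt',
  'read-history': 'prompt',
  'emit-effect': 'granted',
  random: 'prompt',
};

interface GrantPlan {
  readonly autoGranted: readonly Permission[];
  readonly needsPrompt: readonly Permission[];
}

// Client side: split a descriptor's requested permissions into those granted
// by default (or approved earlier) and those the user must be prompted for.
function planGrants(
  requested: readonly Permission[],
  approvedEarlier: ReadonlySet<Permission> = new Set<Permission>(),
): GrantPlan {
  const unique = requested.filter((p, i) => requested.indexOf(p) === i);
  const autoGranted = unique.filter(
    (p) => PERMISSION_DEFAULTS[p] === 'granted' || approvedEarlier.has(p),
  );
  const needsPrompt = unique.filter((p) => !autoGranted.includes(p));
  return { autoGranted, needsPrompt };
}

interface RegisterRequest {
  readonly permissions: readonly Permission[];
  readonly fromTrustedSource: boolean; // hypothetical provenance flag
}

type RegisterVerdict = { ok: true } | { ok: false; reason: string };

// Server side, on `custom-modifier.register`: reject `write-board` from an
// untrusted source unless the room host opted into scripted modifiers.
function checkRegister(
  req: RegisterRequest,
  roomAllowsScriptedModifiers: boolean,
): RegisterVerdict {
  if (
    req.permissions.includes('write-board') &&
    !req.fromTrustedSource &&
    !roomAllowsScriptedModifiers
  ) {
    return {
      ok: false,
      reason: 'write-board from an untrusted source needs the room opt-in',
    };
  }
  return { ok: true };
}
```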

---

## Validation strategy

T3 validates structure (Zod) AND semantics (kind in registry, params satisfy
schemas). T4 needs a third layer: **static analysis of the script before
first execution**, to catch obvious abuse before we even spin up the sandbox.

Likely checks:
- **Source size cap** — reject scripts above N kB.
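- **AST node count cap** — reject scripts above M AST nodes (targets scripts
  that parse big but evaluate fast: costly to parse and analyse, yet cheap
  enough at runtime to slip under the CPU budget).
- **Forbidden globals** — reject any reference to `eval`, `Function`,
  `WebAssembly`, `fetch`, `XMLHttpRequest`, `import` (static or dynamic),
  `require`, `top`, `parent`, `globalThis`, etc. These are examples only: the
  check itself should be a whitelist of permitted globals, since a blacklist
  is never complete.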
- **Loop bounds** — flag any loop without a statically-determinable bound;
  optionally inject a per-iteration CPU-budget check.
- **Recursion bounds** — flag any function calling itself without a
  statically-determinable termination.
- **Permission match** — every host-API call must be reachable only when the
  declared permission is granted.

The static analyser produces a verdict (`{ ok: true, hash } | { ok: false,
errors: [] }`) and is run on Save in the editor and again on
`custom-modifier.register` server-side.
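A minimal sketch of the analyser's verdict shape, assuming a Node-side validator. The 16 kB cap stands in for the unspecified N, and the regex scan is a crude blacklist stand-in for the AST-based whitelist check: it false-positives on names inside strings, comments, or property accesses such as `node.parent`, and it misses computed access such as `obj['ev' + 'al']`. The AST node cap and the loop, recursion, and permission checks need a real parser and are omitted. The hash is SHA-256 over the source text via Node's `node:crypto`, a placeholder for the compiled-bytecode hash; a browser build would use `crypto.subtle.digest` instead.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical cap standing in for the document's unspecified N kB.
const MAX_SOURCE_BYTES = 16 * 1024;

// Examples from the list above; a real check would whitelist instead.
const FORBIDDEN_IDENTIFIERS = [
  'eval', 'Function', 'WebAssembly', 'fetch', 'XMLHttpRequest',
  'import', 'require', 'top', 'parent', 'globalThis',
];

type Verdict =
  | { ok: true; hash: string }
  | { ok: false; errors: string[] };

function analyse(source: string): Verdict {
  const errors: string[] = [];

  const bytes = new TextEncoder().encode(source).length;
  if (bytes > MAX_SOURCE_BYTES) {
    errors.push(`source is ${bytes} bytes; cap is ${MAX_SOURCE_BYTES}`);
  }

  // Word-boundary scan: also matches names in strings and comments, and
  // cannot see computed property access.
  for (const name of FORBIDDEN_IDENTIFIERS) {
    if (new RegExp(`\\b${name}\\b`).test(source)) {
      errors.push(`forbidden identifier: ${name}`);
    }
  }

  if (errors.length > 0) return { ok: false, errors };
  // Same source => same hash on editor Save and on server-side register.
  return { ok: true, hash: createHash('sha256').update(source).digest('hex') };
}
```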

The runtime sandbox enforces what static analysis can't:
- **CPU budget per apply call** — preempt and abort after N ms.
- **Memory budget** — hard cap on heap size.
- **Determinism** — seeded RNG, frozen `Date.now`, no `setTimeout` / `setInterval`
  reaching outside the call.
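The determinism and CPU-budget bullets can be sketched as the host surface built for one apply call. mulberry32 is an illustrative PRNG choice, and `makeScriptHost` / `makeDeadlineCheck` are hypothetical names. The deadline check has the shape of an interrupt callback (in `quickjs-emscripten`, the function passed to `runtime.setInterruptHandler`, which aborts evaluation when it returns `true`); the memory cap would map to `runtime.setMemoryLimit` and is not shown.

```typescript
// mulberry32: a small seeded 32-bit PRNG (an illustrative choice).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // in [0, 1)
  };
}

interface ScriptHost {
  readonly random: () => number; // stands in for Math.random
  readonly now: () => number;    // stands in for Date.now; frozen per call
}

// Same seed + same frozen timestamp => identical observable values on every
// client that runs the same apply call.
function makeScriptHost(seed: number, frozenNow: number): ScriptHost {
  return Object.freeze({ random: mulberry32(seed), now: () => frozenNow });
}

// CPU budget as an interrupt predicate: the runtime polls it during
// evaluation and aborts once it returns true. The clock is injectable.
function makeDeadlineCheck(
  budgetMs: number,
  clock: () => number = Date.now,
): () => boolean {
  const deadline = clock() + budgetMs;
  return () => clock() >= deadline;
}
```

One tension worth recording against open question 2: a wall-clock deadline can fire on a slow client and not on a fast one. Under option (b), counting interrupt-handler invocations instead (a pattern the `quickjs-emscripten` README shows) is likely a more reproducible budget than N ms.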

---

## How T3 primitives migrate forward

Every T3 primitive is structurally a function `(ctx, params) => void`. A T4
scripted modifier can wrap an equivalent function written in the script
runtime, so a power user can:

1. Start with a T3 descriptor (data composition).
2. "Eject" to T4 — the editor generates the equivalent script wrapping each
   primitive's behaviour, switches the descriptor's `type` to `"scripted"`,
   and gives the user a starting point for further customisation.
3. Continue in T4, tweaking the generated source.

The reverse is NOT generally possible (a hand-written T4 script can express
behaviours that no combination of T3 primitives matches), but the eject path
gives users a smooth on-ramp.
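The eject step can be sketched as plain code generation: one call per primitive, in descriptor order, inside an `apply` function the user can then edit. The calling convention, where the host passes the shipped primitive implementations in as `primitives` and the script indexes them by `kind`, is an assumption, as are `generateEjectSource` and the trimmed node shape. Switching `type` to `"scripted"` and resetting `hash` to `null` would happen around this function.

```typescript
// Trimmed, hypothetical shape of a T3 primitive node.
interface PrimitiveNode {
  readonly kind: string;
  readonly params: Readonly<Record<string, unknown>>;
}

// Generate editable script source that reproduces a T3 descriptor's
// behaviour by calling each shipped primitive in order.
function generateEjectSource(primitives: readonly PrimitiveNode[]): string {
  const calls = primitives.map(
    (p) =>
      // JSON.stringify emits the kind and params as literals rather than
      // splicing raw text into the generated code.
      `  primitives[${JSON.stringify(p.kind)}](ctx, ${JSON.stringify(p.params)});`,
  );
  return ['function apply(ctx, primitives) {', ...calls, '}'].join('\n');
}
```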

---

## Open questions

These are the design decisions we haven't made yet:

1. **DSL surface** — full ECMAScript subset or a smaller language? A
   restricted DSL is easier to validate but harder for users to learn.
2. **Multiplayer determinism** — every client must execute scripted modifiers
   identically. A fundamental choice between (a) "server is authoritative,
   clients receive deltas" (works today for built-ins) and (b) "clients run
   the script themselves, must converge bit-for-bit" (requires deterministic
   sandbox + identical inputs).
3. **Editor experience** — full text editor (Monaco / CodeMirror)? Visual
   block editor (Scratch / Blockly style)? Both?
4. **Sharing trust model** — a profile that references a scripted modifier
   travels with the descriptor's source. Recipients see the source plus the
   declared permissions before importing. Do we also show a static-analysis
   summary ("This script reads the board, writes 1 attribute, declares no
   network access")?
5. **Rate limiting** — how many scripted modifiers can register per room per
   minute? Per session?
6. **Script versioning** — when an author updates their scripted modifier,
   does the new version replace the old in every saved profile that
   referenced it (auto-update) or only on explicit user action (manual
   update)?
7. **Failure mode** — when a scripted modifier throws or times out at apply
   time, do we abort the move (strict) or skip the modifier and continue
   (lenient)? T3's data primitives are infallible; T4's scripts aren't.
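To make question 7 concrete, both policies can sit behind one flag. This sketch assumes a throw or a sandbox timeout surfaces as an exception from `apply`, and it leaves out rollback: under the strict policy, effects from modifiers that already ran would have to be staged and discarded, which is not shown. `runModifiers` and `FailurePolicy` are hypothetical names.

```typescript
type FailurePolicy = 'strict' | 'lenient';

interface MoveResult {
  readonly aborted: boolean;
  readonly skipped: readonly string[]; // ids of failed modifiers (lenient)
  readonly failure?: string;           // first failure (strict)
}

interface RunnableModifier {
  readonly id: string;
  readonly apply: () => void; // throws on script error or timeout
}

function runModifiers(
  modifiers: readonly RunnableModifier[],
  policy: FailurePolicy,
): MoveResult {
  const skipped: string[] = [];
  for (const m of modifiers) {
    try {
      m.apply();
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      if (policy === 'strict') {
        // Abort the whole move at the first failing modifier.
        return { aborted: true, skipped, failure: `${m.id}: ${message}` };
      }
      // Lenient: record the failure and keep applying the rest.
      skipped.push(m.id);
    }
  }
  return { aborted: false, skipped };
}
```

The lenient path keeps games moving but lets a broken script quietly stop doing anything, much as T3 silently skips unknown kinds; strict surfaces the bug at the cost of blocking play.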

Each question deserves a small ADR of its own. We don't need to answer them
to ship T3, but we should answer 1-3 before any T4 implementation work
begins.