houserules/docs/adr/T4-scripted-modifiers-design.md
Joey Yakimowich-Payne 52752ecf33
docs(adr): T4 scripted modifiers forward-design
T3 Wave 5 (T32). New forward-design document at
docs/adr/T4-scripted-modifiers-design.md capturing where T4 would land
if/when it becomes a priority:

- Why T4 is deferred (security surface area)
- Sandbox candidates evaluated (QuickJS recommended; Duktape, vanilla
  WASM, custom interpreter, Web Workers, vm2 / isolated-vm rejected
  with rationale)
- Descriptor shape extension preserving T3 backwards compatibility
  via the type: 'data' | 'scripted' discriminator already reserved
  on CustomModifierDescriptor
- Permission model sketch (read/write self/board, history, effects,
  random — granted/prompt defaults per permission)
- Validation strategy (static analysis + runtime sandbox enforcement,
  whitelist over blacklist for forbidden globals, source/AST size
  caps, loop-bound checks)
- T3 → T4 ejection path (a T3 descriptor can generate equivalent
  scripted source as a starting point)
- 7 open questions blocking T4 kickoff (DSL surface, multiplayer
  determinism, editor experience, sharing trust, rate limiting,
  versioning, failure mode)
2026-04-19 21:09:52 -06:00


T4 — Scripted Modifiers (Forward Design)

Status: Forward design only. T4 is deferred. T3 ships data-only custom modifiers; this document records the design we'd reach for if and when T4 becomes a priority.

T3 lets users compose custom modifiers from a fixed catalog of 15 primitives. The natural next layer is letting users author primitives whose behaviour is expressed in code (or a code-like DSL) rather than a shipped TypeScript function. That's T4.

This document captures (a) why T4 is deferred, (b) the candidate sandbox runtimes we evaluated, (c) the descriptor shape that preserves backwards compatibility with T3, (d) the permission model we'd want, (e) the validation strategy, (f) how T3 primitives migrate forward, and (g) open questions.


Why T4 is deferred

The unifying concern is security surface area.

A T3 custom modifier can ONLY combine functions the engine already ships. A malicious or buggy descriptor can no-op (unknown kinds skip silently) or exhaust limits (caught by validator caps), but it can't mint new behaviour the engine didn't already implement.

A T4 scripted modifier can express NEW behaviour written by an untrusted author, running inside the same process as the rest of the game. That demands a sandbox with memory bounds, CPU bounds, deterministic execution (for multiplayer replication), no host-API access (no fetch, no DOM, no wall-clock reads such as Date.now() inside deterministic code), no infinite loops, and no side channels into private session state.

Building that sandbox correctly is a multi-week project and a permanent maintenance burden. T3 delivers user-authored modifiers without it; T4 buys "Turing-complete user modifiers" at the cost of becoming a sandbox-vendor team.


Sandbox candidates evaluated

| Runtime | Verdict |
|---|---|
| QuickJS (via wasm) | Strong candidate. Mature, modern ECMA-262 implementation designed for embedding. The quickjs-emscripten npm package compiles it to wasm and makes integration mechanical. CPU caps (via QuickJS's interrupt handler) and memory caps are runtime-enforced. Determinism: would need to whitelist a subset of the stdlib (no Date.now, no Math.random without a seeded shim), but this is well-trodden ground. Likely T4 pick. |
| Duktape (via wasm) | Smaller binary than QuickJS but lower spec compliance (ES5.1 with only partial ES2015+ support). Less attractive for users writing modern JS. |
| Vanilla WebAssembly (the browser's built-in engine; Wasmer / Wasmtime outside the browser) | Heaviest sandbox. Strongest isolation. Forces users to compile from a higher-level language (AssemblyScript, Rust). High authoring friction; better suited to a power-user tier than a casual editor. |
| Custom mini-interpreter | Comparable complexity to adopting a vetted runtime, with far less battle-testing. Rejected as not-invented-here. |
| Web Workers | Not a sandbox per se: a worker gets its own global scope and thread, but still has fetch, WebSocket, IndexedDB, and importScripts. CPU can only be bounded by terminating the whole worker, and there is no heap cap. Weaker isolation than QuickJS. |
| vm2 / isolated-vm | Server-only (Node), so neither helps our browser-runtime case. vm2 is also discontinued after repeated sandbox-escape vulnerabilities. |

Recommendation if/when T4 ships: QuickJS via quickjs-emscripten, with a whitelisted host-API surface and per-call CPU + memory caps.
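A whitelisted host-API surface can be built default-deny: start from an empty object, copy in only the host functions whose permission was granted, and freeze it, so anything not explicitly added simply does not exist for the script. The sketch below assumes hypothetical binding names (readSelf, writeBoard, ...) mapped one-to-one onto the permissions in the permission model; marshalling these functions into a QuickJS context is omitted.

```typescript
// Permission names come from the permission model; binding names are hypothetical.
type Permission =
  | 'read-self' | 'read-board' | 'write-self' | 'write-board'
  | 'read-history' | 'emit-effect' | 'random';

type HostFnName =
  | 'readSelf' | 'readBoard' | 'writeSelf' | 'writeBoard'
  | 'readHistory' | 'emitEffect' | 'random';

// Any host function; each binding validates its own arguments at runtime.
type HostFn = (...args: never[]) => unknown;

// Each permission unlocks exactly one binding.
const BINDING_FOR: Readonly<Record<Permission, HostFnName>> = {
  'read-self': 'readSelf',
  'read-board': 'readBoard',
  'write-self': 'writeSelf',
  'write-board': 'writeBoard',
  'read-history': 'readHistory',
  'emit-effect': 'emitEffect',
  random: 'random',
};

// Allowlist construction: the surface starts empty, receives only the bindings
// whose permission was granted, and is frozen so nothing can be added later.
function buildHostSurface(
  allBindings: Readonly<Record<HostFnName, HostFn>>,
  granted: ReadonlySet<Permission>,
): Readonly<Partial<Record<HostFnName, HostFn>>> {
  const surface: Partial<Record<HostFnName, HostFn>> = {};
  for (const perm of granted) {
    const name = BINDING_FOR[perm];
    surface[name] = allBindings[name];
  }
  return Object.freeze(surface);
}

// Usage: a script granted only read-self and emit-effect sees two bindings.
const noop: HostFn = () => undefined;
const allBindings: Record<HostFnName, HostFn> = {
  readSelf: noop, readBoard: noop, writeSelf: noop, writeBoard: noop,
  readHistory: noop, emitEffect: noop, random: noop,
};
const surface = buildHostSurface(allBindings, new Set<Permission>(['read-self', 'emit-effect']));
// Object.keys(surface) is ['readSelf', 'emitEffect']
```

Freezing matters as much as the filtering: without it, a script holding a reference to the surface could attach its own properties and confuse later host code that enumerates it.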


Descriptor shape extension

T3 reserved the discriminator field type: "data" on CustomModifierDescriptor precisely so T4 could land alongside without breaking changes:

```typescript
// Placeholder types so this excerpt stands alone; the real definitions live in T3.
type CustomModifierId = string;
type EffectPrimitiveNode = { readonly kind: string; readonly params: unknown };
type Permission =
  | 'read-self' | 'read-board' | 'write-self' | 'write-board'
  | 'read-history' | 'emit-effect' | 'random';

type CustomModifierDescriptor =
  | DataModifierDescriptor      // T3: what we ship today
  | ScriptedModifierDescriptor; // T4: future

interface DataModifierDescriptor {
  readonly type: 'data';
  readonly id: CustomModifierId;
  readonly name: string;
  readonly description: string;
  readonly version: 1;
  readonly primitives: readonly EffectPrimitiveNode[];
  // ... other T3 fields
}

interface ScriptedModifierDescriptor {
  readonly type: 'scripted';
  readonly id: CustomModifierId;
  readonly name: string;
  readonly description: string;
  readonly version: 1;
  /** Source code in the chosen DSL (likely a JS subset). */
  readonly source: string;
  /** Compiled-and-validated bytecode hash; null = "needs compile". */
  readonly hash: string | null;
  /** Permissions the author requested; subject to user grant on import. */
  readonly permissions: readonly Permission[];
  // ... shared trunk: targetAttrs, uiForm, source, author, createdAt
  // (a trunk field named `source` would collide with the script `source`
  // above; one of the two needs a distinct name)
}
```

The trunk fields stay identical so registry lookup, library persistence, serialization, and the Modifier Profile picker UI all work uniformly across both descriptor families. Only the apply-time dispatcher (and the editor's right inspector panel) branches on type.
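Because only the dispatcher branches, the apply path can lean on TypeScript's discriminated-union narrowing, with a `never` check so that adding a third descriptor family becomes a compile error instead of a silent no-op. A minimal sketch using trimmed-down stand-in interfaces and hypothetical `applyData` / `applyScripted` handlers that return strings so the routing is observable; refusing to apply a descriptor whose `hash` is null follows the "needs compile" meaning of that field.

```typescript
// Minimal stand-ins for the two descriptor families; only the discriminator matters here.
interface DataModifierDescriptor {
  readonly type: 'data';
  readonly id: string;
  readonly primitives: readonly { readonly kind: string }[];
}
interface ScriptedModifierDescriptor {
  readonly type: 'scripted';
  readonly id: string;
  readonly source: string;
  readonly hash: string | null;
}
type CustomModifierDescriptor = DataModifierDescriptor | ScriptedModifierDescriptor;

// Hypothetical apply-time handlers; real ones would receive engine context.
function applyData(d: DataModifierDescriptor): string {
  return `data:${d.primitives.map((p) => p.kind).join(',')}`;
}
function applyScripted(d: ScriptedModifierDescriptor): string {
  if (d.hash === null) throw new Error(`${d.id} needs compile before apply`);
  return `scripted:${d.hash}`;
}

// The single branch point: narrow on `type`, with a compile-time exhaustiveness check.
function applyModifier(d: CustomModifierDescriptor): string {
  switch (d.type) {
    case 'data':
      return applyData(d);
    case 'scripted':
      return applyScripted(d);
    default: {
      const unreachable: never = d;
      throw new Error(`unknown descriptor type: ${JSON.stringify(unreachable)}`);
    }
  }
}
```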


Permission model sketch

Every scripted modifier declares the permissions it needs. The user grants permissions explicitly on first use (mirroring browser permission prompts).

| Permission | Capabilities granted | Default |
|---|---|---|
| read-self | Read facts on the piece this descriptor is attached to | granted |
| read-board | Read facts on every other piece | prompt |
| write-self | Insert/retract facts on this piece | granted |
| write-board | Insert/retract facts on other pieces | prompt |
| read-history | Read the engine's move log | prompt |
| emit-effect | Push visual effects via engine.emitEffect | granted |
| random | Use the engine's seeded RNG | prompt |
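The defaults column translates directly into a lookup that decides, at first use, which requested permissions are granted silently and which need a prompt. A minimal sketch, assuming a hypothetical `remembered` set that records permissions the user already approved for this descriptor (which is what keeps the prompt to "first use" only).

```typescript
type Permission =
  | 'read-self' | 'read-board' | 'write-self' | 'write-board'
  | 'read-history' | 'emit-effect' | 'random';

// Defaults copied from the permission table.
const DEFAULTS: Readonly<Record<Permission, 'granted' | 'prompt'>> = {
  'read-self': 'granted',
  'read-board': 'prompt',
  'write-self': 'granted',
  'write-board': 'prompt',
  'read-history': 'prompt',
  'emit-effect': 'granted',
  random: 'prompt',
};

interface GrantPlan {
  readonly autoGranted: readonly Permission[];
  readonly needsPrompt: readonly Permission[];
}

// Splits requested permissions into default-granted and must-prompt groups.
// Duplicates collapse; `remembered` (hypothetical) holds earlier user approvals.
function planGrants(
  requested: readonly Permission[],
  remembered: ReadonlySet<Permission> = new Set<Permission>(),
): GrantPlan {
  const autoGranted: Permission[] = [];
  const needsPrompt: Permission[] = [];
  for (const perm of new Set(requested)) {
    if (DEFAULTS[perm] === 'granted' || remembered.has(perm)) autoGranted.push(perm);
    else needsPrompt.push(perm);
  }
  return { autoGranted, needsPrompt };
}
```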

Permissions are validated server-side on custom-modifier.register — a descriptor with write-board arriving from an untrusted source is rejected unless the room's host has explicitly opted into "allow scripted modifiers".
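The register-time rule can be read as a small pure function over the request and the room settings. This is a sketch under stated assumptions: `fromTrustedSource` is a hypothetical flag (the ADR does not yet define what makes a source untrusted), and only write-board is host-gated because that is the permission the rule names; whether other prompt-by-default permissions belong in the gated set is left open.

```typescript
type Permission =
  | 'read-self' | 'read-board' | 'write-self' | 'write-board'
  | 'read-history' | 'emit-effect' | 'random';

interface RegisterRequest {
  readonly permissions: readonly Permission[];
  readonly fromTrustedSource: boolean; // hypothetical provenance flag
}

interface RoomSettings {
  readonly allowScriptedModifiers: boolean; // the host's explicit opt-in
}

type RegisterResult =
  | { readonly ok: true }
  | { readonly ok: false; readonly reason: string };

// Permissions an untrusted descriptor may only carry once the host opts in.
const HOST_GATED: ReadonlySet<Permission> = new Set<Permission>(['write-board']);

// Server-side check run on custom-modifier.register.
function checkRegister(req: RegisterRequest, room: RoomSettings): RegisterResult {
  if (req.fromTrustedSource || room.allowScriptedModifiers) return { ok: true };
  const gated = req.permissions.filter((p) => HOST_GATED.has(p));
  if (gated.length === 0) return { ok: true };
  return {
    ok: false,
    reason: `untrusted descriptor requests ${gated.join(', ')}; host has not allowed scripted modifiers`,
  };
}
```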


Validation strategy

T3 validates structure (Zod) AND semantics (kind in registry, params satisfy schemas). T4 needs a third layer: static analysis of the script before first execution, to catch obvious abuse before we even spin up the sandbox.

Likely checks:

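  • Source size cap: reject scripts above N kB.
  • AST node count cap: reject scripts above M AST nodes (catches scripts that are expensive to parse and compile even when they would evaluate quickly).
  • Allowed globals: accept only identifiers on an explicit allowlist and reject everything else, so references to eval, Function, WebAssembly, fetch, XMLHttpRequest, require, top, parent, globalThis, etc. fail without having to be enumerated (whitelist, not blacklist). import declarations and dynamic import() are syntax rather than globals, so they are rejected at parse time.
  • Loop bounds: flag any loop without a statically-determinable bound; optionally inject a per-iteration CPU-budget check.
  • Recursion bounds: flag any function that calls itself without a statically-determinable termination. Like the loop check, this is necessarily conservative (termination is undecidable in general), so the runtime budgets remain the backstop.
  • Permission match: every host-API call the script makes must require only permissions the descriptor declares; a call that needs an undeclared permission fails validation.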

The static analyser produces a verdict ({ ok: true, hash } | { ok: false, errors: [] }) and is run on Save in the editor and again on custom-modifier.register server-side.
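The size, node-count, and allowlist checks can be folded into one function that emits the verdict shape above. A minimal sketch, assuming a separate parser pass (for example Acorn, which produces an ESTree AST) has already counted AST nodes and collected the script's free identifiers; the caps, the allowlist contents, the `host` global, and the use of 32-bit FNV-1a in place of a real content hash such as SHA-256 are all placeholders.

```typescript
// Facts a parser pass is assumed to report; producing them is out of scope here.
interface ScriptFacts {
  readonly source: string;
  readonly astNodeCount: number;
  readonly freeIdentifiers: readonly string[]; // globals the script references
}

type Verdict =
  | { readonly ok: true; readonly hash: string }
  | { readonly ok: false; readonly errors: readonly string[] };

// Placeholder caps standing in for N kB and M nodes.
const MAX_SOURCE_BYTES = 16 * 1024;
const MAX_AST_NODES = 5000;

// Allowlist: anything not listed is rejected, so eval, Function, fetch,
// Math (and thus Math.random), globalThis, etc. fail without being enumerated.
// `host` is the hypothetical injected host-API object.
const ALLOWED_GLOBALS: ReadonlySet<string> = new Set(['Array', 'JSON', 'Number', 'String', 'host']);

// 32-bit FNV-1a over UTF-8 bytes; a stand-in, not a cryptographic hash.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (const byte of new TextEncoder().encode(text)) {
    h ^= byte;
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, '0');
}

function analyse(facts: ScriptFacts): Verdict {
  const errors: string[] = [];
  const bytes = new TextEncoder().encode(facts.source).length;
  if (bytes > MAX_SOURCE_BYTES) errors.push(`source is ${bytes} bytes (cap ${MAX_SOURCE_BYTES})`);
  if (facts.astNodeCount > MAX_AST_NODES) {
    errors.push(`AST has ${facts.astNodeCount} nodes (cap ${MAX_AST_NODES})`);
  }
  for (const id of new Set(facts.freeIdentifiers)) {
    if (!ALLOWED_GLOBALS.has(id)) errors.push(`global '${id}' is not on the allowlist`);
  }
  return errors.length > 0 ? { ok: false, errors } : { ok: true, hash: fnv1a(facts.source) };
}
```

Running the same function in the editor on Save and again server-side means both sides agree on the hash for identical source.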

The runtime sandbox enforces what static analysis can't:

  • CPU budget per apply call — preempt and abort after N ms.
  • Memory budget — hard cap on heap size.
  • Determinism — seeded RNG, frozen Date.now, no setTimeout / setInterval reaching outside the call.
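The determinism bullet suggests a per-call host object that replaces Math.random with a seeded generator, freezes the clock, and exposes a tick that injected loop checks call. A minimal sketch, assuming the seed and frozen timestamp would be derived by the engine from game state (how is not decided here) and using mulberry32, a small well-known 32-bit PRNG, as a stand-in for the engine's RNG. It counts steps rather than milliseconds: an "abort after N ms" cutoff can fire on a slow client but not a fast one, so if clients run scripts themselves (open question 2, option b), a step budget is one deterministic alternative. Inside QuickJS the tick would sit behind its interrupt-handler mechanism rather than in script-visible code.

```typescript
// mulberry32: same seed gives the same sequence on every client.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

class BudgetExceeded extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'BudgetExceeded';
  }
}

interface CallHost {
  readonly random: () => number; // seeded replacement for Math.random
  readonly now: () => number;    // clock frozen for the whole apply call
  readonly tick: () => void;     // called per loop iteration; throws when spent
}

// Builds a fresh host for one apply call (hypothetical helper).
function createCallHost(seed: number, frozenNow: number, maxSteps: number): CallHost {
  const random = mulberry32(seed);
  let steps = 0;
  return Object.freeze({
    random,
    now: () => frozenNow,
    tick: () => {
      steps += 1;
      if (steps > maxSteps) throw new BudgetExceeded(`step budget of ${maxSteps} exhausted`);
    },
  });
}
```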

How T3 primitives migrate forward

Every T3 primitive is structurally a function (ctx, params) => void. A T4 scripted modifier can wrap an equivalent function written in the script runtime, so a power user can:

  1. Start with a T3 descriptor (data composition).
  2. "Eject" to T4 — the editor generates the equivalent script wrapping each primitive's behaviour, switches the descriptor's type to "scripted", and gives the user a starting point for further customisation.
  3. Continue in T4, tweaking the generated source.
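The eject step amounts to a code generator that emits one call per primitive, in order, with the original params inlined. A minimal sketch, assuming a hypothetical `primitives` host binding that exposes the shipped T3 implementations to the script and a hypothetical `apply(ctx)` entry-point convention (the DSL surface is still open question 1). The generated descriptor keeps its id and gets `hash: null`, since the new source has not been through the static analyser yet.

```typescript
// Trimmed stand-ins; real descriptors also carry the shared trunk fields.
interface EffectPrimitiveNode {
  readonly kind: string;
  readonly params: Readonly<Record<string, unknown>>;
}
interface DataModifierDescriptor {
  readonly type: 'data';
  readonly id: string;
  readonly name: string;
  readonly primitives: readonly EffectPrimitiveNode[];
}
interface ScriptedModifierDescriptor {
  readonly type: 'scripted';
  readonly id: string;
  readonly name: string;
  readonly source: string;
  readonly hash: string | null;
}

// Generates equivalent script source; JSON.stringify output is a valid JS literal.
function ejectToScript(d: DataModifierDescriptor): ScriptedModifierDescriptor {
  const body = d.primitives
    .map((p) => `  primitives[${JSON.stringify(p.kind)}](ctx, ${JSON.stringify(p.params)});`)
    .join('\n');
  const source =
    `// Ejected from data modifier ${JSON.stringify(d.name)}\n` +
    `function apply(ctx) {\n${body}\n}\n`;
  // Same descriptor identity, new type; null hash means "needs compile".
  return { type: 'scripted', id: d.id, name: d.name, source, hash: null };
}
```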

The reverse is NOT generally possible (a hand-written T4 script can express behaviours that no combination of T3 primitives matches), but the eject path gives users a smooth on-ramp.


Open questions

These are the design decisions we haven't made yet. They block T4 kickoff:

  1. DSL surface — full ECMAScript subset or a smaller language? A restricted DSL is easier to validate but harder for users to learn.
  2. Multiplayer determinism: every client must end up in the same game state after a scripted modifier runs. The fundamental choice is between (a) "server is authoritative, clients receive deltas" (how built-ins already work) and (b) "clients run the script themselves and must converge bit-for-bit" (requires a deterministic sandbox and identical inputs).
  3. Editor experience — full text editor (Monaco / CodeMirror)? Visual block editor (Scratch / Blockly style)? Both?
  4. Sharing trust model — a profile that references a scripted modifier travels with the descriptor's source. Recipients see the source plus the declared permissions before importing. Do we also show a static-analysis summary ("This script reads the board, writes 1 attribute, declares no network access")?
  5. Rate limiting — how many scripted modifiers can register per room per minute? Per session?
  6. Script versioning — when an author updates their scripted modifier, does the new version replace the old in every saved profile that referenced it (auto-update) or only on explicit user action (manual update)?
  7. Failure mode — when a scripted modifier throws or times out at apply time, do we abort the move (strict) or skip the modifier and continue (lenient)? T3's data primitives are infallible; T4's scripts aren't.
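Whichever way question 7 is answered, both policies can share one apply loop parameterised by the choice. A minimal sketch, assuming hypothetical `ApplyStep` objects and game state reduced to a number purely for illustration; a throw stands for both a script error and a budget timeout. Because each step returns a new state instead of mutating in place, a failing step leaves nothing half-applied, which is what makes the lenient "skip and continue" option safe.

```typescript
type FailurePolicy = 'strict' | 'lenient';

// Hypothetical modifier step; throwing covers script errors and timeouts alike.
interface ApplyStep {
  readonly id: string;
  readonly run: (state: number) => number;
}

type MoveOutcome =
  | { readonly status: 'applied'; readonly state: number; readonly skipped: readonly string[] }
  | { readonly status: 'aborted'; readonly failedId: string };

// strict: the first failure aborts the move and the caller keeps the pre-move state.
// lenient: failing modifiers are skipped and reported; the rest still apply.
function applyModifiers(initial: number, steps: readonly ApplyStep[], policy: FailurePolicy): MoveOutcome {
  let state = initial;
  const skipped: string[] = [];
  for (const step of steps) {
    try {
      state = step.run(state);
    } catch {
      if (policy === 'strict') return { status: 'aborted', failedId: step.id };
      skipped.push(step.id);
    }
  }
  return { status: 'applied', state, skipped };
}
```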

Each question is a small ADR worth of debate. We don't need to answer them to ship T3, but we should answer 1-3 before any T4 implementation work begins.