T3 Wave 5 (T32). New forward-design document at `docs/adr/T4-scripted-modifiers-design.md` capturing where T4 would land if/when it becomes a priority:

- Why T4 is deferred (security surface area)
- Sandbox candidates evaluated (QuickJS recommended; Duktape, vanilla WASM, custom interpreter, Web Workers, vm2 / isolated-vm rejected with rationale)
- Descriptor shape extension preserving T3 backwards compatibility via the `type: 'data' | 'scripted'` discriminator already reserved on `CustomModifierDescriptor`
- Permission model sketch (read/write self/board, history, effects, random; granted/prompt defaults per permission)
- Validation strategy (static analysis + runtime sandbox enforcement, whitelist over blacklist for forbidden globals, source/AST size caps, loop-bound checks)
- T3 → T4 ejection path (a T3 descriptor can generate equivalent scripted source as a starting point)
- 7 open questions blocking T4 kickoff (DSL surface, multiplayer determinism, editor experience, sharing trust, rate limiting, versioning, failure mode)

# T4 — Scripted Modifiers (Forward Design)

> **Status**: Forward design only. T4 is deferred. T3 ships data-only custom
> modifiers; this document records the design we'd reach for if and when T4
> becomes a priority.

T3 lets users compose custom modifiers from a fixed catalog of 15 primitives.
The natural next layer is letting users author primitives whose behaviour is
expressed in code (or a code-like DSL) rather than a shipped TypeScript
function. That's T4.

This document captures (a) why T4 is deferred, (b) the candidate sandbox
runtimes we evaluated, (c) the descriptor shape that preserves backwards
compatibility with T3, (d) the permission model we'd want, (e) the validation
strategy, (f) how T3 primitives migrate forward, and (g) open questions.

---
## Why T4 is deferred

The unifying concern is **security surface area**.

A T3 custom modifier can ONLY combine functions the engine already ships. A
malicious or buggy descriptor can no-op (unknown kinds skip silently) or
exhaust limits (caught by validator caps), but it can't mint new behaviour
the engine didn't already implement.

A T4 scripted modifier can express NEW behaviour written by an untrusted
author, run inside the same process the rest of the game uses. That demands
a sandbox: memory bounds, CPU bounds, deterministic execution (for
multiplayer replication), no host-API access (no `fetch`, no `Date.now()` in
the wrong place, no DOM), no infinite loops, no side channels into private
session state.

Building that sandbox correctly is a multi-week project and a permanent
maintenance burden. T3 delivers user-authored modifiers without it; T4
buys "Turing-complete user modifiers" at the cost of becoming a
sandbox-vendor team.

---
## Sandbox candidates evaluated

| Runtime | Verdict |
|---|---|
| **QuickJS (via wasm)** | Strong candidate. Small, embeddable engine with good ECMA-262 coverage. The `quickjs-emscripten` package ships a WebAssembly build with JavaScript bindings, which makes browser integration mechanical. Memory limits and CPU caps (via an interrupt handler) are enforced by the runtime. Determinism: would need to whitelist a subset of stdlib (no `Date.now`, no `Math.random` without a seeded shim), but this is well-trodden ground. **Likely T4 pick.** |
| **Duktape (via wasm)** | Smaller binary than QuickJS but lower spec compliance (ES5.1 with partial ES2015+ support). Less attractive for users writing modern JS. |
| **Vanilla WebAssembly (browser-native)** | Heaviest sandbox. Strongest isolation. Forces users to compile from a higher-level language (AssemblyScript, Rust). High authoring friction; better suited to a power-user tier than a casual editor. |
| **Custom mini-interpreter** | Same complexity as a vetted runtime, with less battle-testing. Rejected as not-invented-here. |
| **Web Workers** | Not a sandbox per se. A worker runs on its own thread with its own global scope, but that scope still exposes host APIs such as `fetch`. CPU can only be bounded coarsely, by terminating the whole worker from outside. Weaker isolation than QuickJS. |
| **vm2 / isolated-vm** | Server-only (Node), doesn't help our browser-runtime case. |

**Recommendation if/when T4 ships**: QuickJS via `quickjs-emscripten`, with a
whitelisted host-API surface and per-call CPU + memory caps.

---
## Descriptor shape extension

T3 reserved the discriminator field `type: "data"` on
`CustomModifierDescriptor` precisely so T4 could land alongside without
breaking changes:

```ts
type CustomModifierDescriptor =
  | DataModifierDescriptor // T3: what we ship today
  | ScriptedModifierDescriptor; // T4: future

interface DataModifierDescriptor {
  readonly type: 'data';
  readonly id: CustomModifierId;
  readonly name: string;
  readonly description: string;
  readonly version: 1;
  readonly primitives: readonly EffectPrimitiveNode[];
  // ... other T3 fields
}

interface ScriptedModifierDescriptor {
  readonly type: 'scripted';
  readonly id: CustomModifierId;
  readonly name: string;
  readonly description: string;
  readonly version: 1;
  /** Source code in the chosen DSL (likely a JS subset). */
  readonly source: string;
  /** Compiled-and-validated bytecode hash; null = "needs compile". */
  readonly hash: string | null;
  /** Permissions the author requested; subject to user grant on import. */
  readonly permissions: readonly Permission[];
  // ... shared trunk: targetAttrs, uiForm, source, author, createdAt
  // NOTE: the trunk's `source` would collide with the script `source` field
  // above; one of the two needs a different name before T4 lands.
}
```

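Because both families carry the `type` discriminator, an apply-time dispatcher can narrow on it with an ordinary `switch`. The sketch below is illustrative only: the descriptor interfaces are trimmed-down stand-ins for the ones above, and `ApplyPlan`, `planApply`, and the route names are hypothetical, not part of the engine.

```typescript
// Trimmed-down stand-ins for the descriptor shapes (assumption: only the
// fields the dispatcher needs are shown).
type Permission = 'read-self' | 'write-self';

interface DataModifierDescriptor {
  readonly type: 'data';
  readonly id: string;
  readonly primitives: readonly { readonly kind: string }[];
}

interface ScriptedModifierDescriptor {
  readonly type: 'scripted';
  readonly id: string;
  readonly source: string;
  readonly hash: string | null;
  readonly permissions: readonly Permission[];
}

type CustomModifierDescriptor =
  | DataModifierDescriptor
  | ScriptedModifierDescriptor;

// Hypothetical result type: where the dispatcher would send the descriptor.
type ApplyPlan =
  | { readonly route: 'primitives'; readonly kinds: readonly string[] }
  | { readonly route: 'sandbox'; readonly hash: string }
  | { readonly route: 'rejected'; readonly reason: string };

function planApply(d: CustomModifierDescriptor): ApplyPlan {
  switch (d.type) {
    case 'data':
      // T3 path: run the shipped primitives in order.
      return { route: 'primitives', kinds: d.primitives.map((p) => p.kind) };
    case 'scripted':
      // T4 path: a null hash means the script was never compiled/validated.
      if (d.hash === null) {
        return { route: 'rejected', reason: 'script needs compile' };
      }
      return { route: 'sandbox', hash: d.hash };
    default: {
      // Exhaustiveness check: adding a third family fails to compile here.
      const unreachable: never = d;
      return unreachable;
    }
  }
}
```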
The trunk fields stay identical so registry lookup, library persistence,
serialization, and the Modifier Profile picker UI all work uniformly across
both descriptor families. Only the apply-time dispatcher (and the editor's
right inspector panel) branches on `type`.

---
## Permission model sketch

Every scripted modifier declares the permissions it needs. The user grants
permissions explicitly on first use (mirroring browser permission prompts).

| Permission | Capabilities granted | Default |
|---|---|---|
| `read-self` | Read facts on the piece this descriptor is attached to | granted |
| `read-board` | Read facts on every other piece | prompt |
| `write-self` | Insert/retract facts on this piece | granted |
| `write-board` | Insert/retract facts on other pieces | prompt |
| `read-history` | Read the engine's move log | prompt |
| `emit-effect` | Push visual effects via `engine.emitEffect` | granted |
| `random` | Use the engine's seeded RNG | prompt |

Permissions are validated server-side on `custom-modifier.register`: a
descriptor with `write-board` arriving from an untrusted source is rejected
unless the room's host has explicitly opted into "allow scripted modifiers".

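As a rough illustration, the defaults in the table can be held in a lookup, with the register-time rule expressed as a small predicate. This is a sketch under assumptions: `permissionsNeedingPrompt`, `acceptOnRegister`, and the `RegisterContext` flags are hypothetical names, and the real check would live wherever `custom-modifier.register` is handled.

```typescript
type Permission =
  | 'read-self' | 'read-board'
  | 'write-self' | 'write-board'
  | 'read-history' | 'emit-effect' | 'random';

type PermissionDefault = 'granted' | 'prompt';

// Defaults copied from the permission table.
const PERMISSION_DEFAULTS: Readonly<Record<Permission, PermissionDefault>> = {
  'read-self': 'granted',
  'read-board': 'prompt',
  'write-self': 'granted',
  'write-board': 'prompt',
  'read-history': 'prompt',
  'emit-effect': 'granted',
  random: 'prompt',
};

// Permissions the user still has to approve before first use.
function permissionsNeedingPrompt(
  requested: readonly Permission[],
  alreadyGranted: readonly Permission[],
): Permission[] {
  return requested.filter(
    (p) =>
      PERMISSION_DEFAULTS[p] === 'prompt' &&
      !alreadyGranted.some((g) => g === p),
  );
}

// Hypothetical context flags for the server-side check.
interface RegisterContext {
  readonly trustedSource: boolean;
  readonly hostAllowsScripted: boolean;
}

// Register-time rule from the prose: `write-board` from an untrusted source
// is rejected unless the room's host opted into scripted modifiers.
function acceptOnRegister(
  requested: readonly Permission[],
  ctx: RegisterContext,
): boolean {
  if (ctx.trustedSource || ctx.hostAllowsScripted) return true;
  return !requested.some((p) => p === 'write-board');
}
```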

---

## Validation strategy

T3 validates structure (Zod) AND semantics (kind in registry, params satisfy
schemas). T4 needs a third layer: **static analysis of the script before
first execution**, to catch obvious abuse before we even spin up the sandbox.

Likely checks:

- **Source size cap**: reject scripts above N kB.
- **AST node count cap**: reject scripts above M AST nodes (catches sources
  that are small in bytes but dense in structure).
- **Forbidden globals**: reject any reference to `eval`, `Function`,
  `WebAssembly`, `fetch`, `XMLHttpRequest`, `import`, `require`, `top`,
  `parent`, `globalThis`, etc. Implement this as a whitelist of permitted
  globals rather than a blacklist, so anything not explicitly allowed is
  rejected.
- **Loop bounds**: flag any loop without a statically-determinable bound;
  optionally inject a per-iteration CPU-budget check.
- **Recursion bounds**: flag any function calling itself without a
  statically-determinable termination.
- **Permission match**: every host-API call must map to a declared
  permission and be reachable only when that permission is granted.

The static analyser produces a verdict (`{ ok: true, hash } | { ok: false,
errors: [] }`) and is run on Save in the editor and again on
`custom-modifier.register` server-side.

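To make the verdict shape concrete, here is a deliberately crude pre-parse screen: a size cap plus a token scan for the forbidden names listed above. It is not the analyser itself. A real implementation would walk an AST and apply a whitelist, whereas this scan is conservative and will also flag the names inside strings or property accesses. `MAX_SOURCE_CHARS`, `screenSource`, and the FNV-1a hash (a stand-in for a proper content hash) are all assumptions.

```typescript
type Verdict =
  | { readonly ok: true; readonly hash: string }
  | { readonly ok: false; readonly errors: readonly string[] };

// Assumed cap; the document leaves the real value ("N kB") open.
const MAX_SOURCE_CHARS = 8 * 1024;

const FORBIDDEN_NAMES: readonly string[] = [
  'eval', 'Function', 'WebAssembly', 'fetch', 'XMLHttpRequest',
  'import', 'require', 'top', 'parent', 'globalThis',
];

// 32-bit FNV-1a, used here only as a dependency-free stand-in hash.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ('0000000' + h.toString(16)).slice(-8);
}

function screenSource(source: string): Verdict {
  const errors: string[] = [];
  if (source.length > MAX_SOURCE_CHARS) {
    errors.push(`source exceeds ${MAX_SOURCE_CHARS} characters`);
  }
  // Token scan: every identifier-like run is checked against the list.
  const identifiers = source.match(/[A-Za-z_$][A-Za-z0-9_$]*/g) ?? [];
  const reported: string[] = [];
  for (const name of identifiers) {
    if (FORBIDDEN_NAMES.indexOf(name) !== -1 && reported.indexOf(name) === -1) {
      reported.push(name);
      errors.push(`forbidden reference: ${name}`);
    }
  }
  return errors.length > 0
    ? { ok: false, errors }
    : { ok: true, hash: fnv1a(source) };
}
```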
The runtime sandbox enforces what static analysis can't:

- **CPU budget per apply call**: preempt and abort after N ms.
- **Memory budget**: hard cap on heap size.
- **Determinism**: seeded RNG, frozen `Date.now`, no setTimeout / setInterval
  reaching outside the call.

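The determinism bullet can be illustrated with a host facade that hands the script a seeded generator and a frozen clock instead of `Math.random` and `Date.now`. This is a minimal sketch: `DeterministicHost` and `makeDeterministicHost` are hypothetical names, and the generator is a mulberry32-style PRNG chosen for brevity, not necessarily what the engine's seeded RNG uses.

```typescript
// What a script would see instead of Math.random / Date.now (assumed shape).
interface DeterministicHost {
  random(): number; // uniform in [0, 1)
  now(): number;    // frozen for the duration of one apply call
}

// mulberry32-style PRNG: small, fast, and fully determined by its seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same seed and same frozen timestamp give every client the same answers.
function makeDeterministicHost(seed: number, frozenNowMs: number): DeterministicHost {
  const next = mulberry32(seed);
  return {
    random: next,
    now: () => frozenNowMs,
  };
}
```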

---

## How T3 primitives migrate forward

Every T3 primitive is structurally a function `(ctx, params) => void`. A T4
scripted modifier can wrap an equivalent function written in the script
runtime, so a power user can:

1. Start with a T3 descriptor (data composition).
2. "Eject" to T4: the editor generates the equivalent script wrapping each
   primitive's behaviour, switches the descriptor's `type` to `"scripted"`,
   and gives the user a starting point for further customisation.
3. Continue in T4, tweaking the generated source.

The reverse is NOT generally possible (a hand-written T4 script can express
behaviours that no combination of T3 primitives matches), but the eject path
gives users a smooth on-ramp.

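One possible shape for the eject step is sketched below. It assumes each primitive node carries a `kind` and JSON-serialisable `params`, and that the script runtime exposes a `primitives` table keyed by kind; none of these names are confirmed by the T3 code. The generated descriptor starts with `hash: null` (needs compile) and an empty permission list, which a real eject would derive from the primitives used.

```typescript
// Assumed node shape: a kind plus JSON-serialisable params.
interface PrimitiveNodeSketch {
  readonly kind: string;
  readonly params: Readonly<Record<string, unknown>>;
}

interface DataDescriptorSketch {
  readonly type: 'data';
  readonly id: string;
  readonly name: string;
  readonly primitives: readonly PrimitiveNodeSketch[];
}

interface ScriptedDescriptorSketch {
  readonly type: 'scripted';
  readonly id: string;
  readonly name: string;
  readonly source: string;
  readonly hash: string | null;
  readonly permissions: readonly string[];
}

// Generates one call per primitive, in order, so the script starts out
// behaving like the data descriptor it was ejected from.
function eject(d: DataDescriptorSketch): ScriptedDescriptorSketch {
  const calls = d.primitives.map(
    (p) =>
      `  primitives[${JSON.stringify(p.kind)}](ctx, ${JSON.stringify(p.params)});`,
  );
  const source = ['function apply(ctx, primitives) {']
    .concat(calls, ['}'])
    .join('\n');
  return {
    type: 'scripted',
    id: d.id,
    name: d.name,
    source,
    hash: null,      // forces a compile + validation pass before first run
    permissions: [], // placeholder: a real eject would infer these
  };
}
```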

---

## Open questions

These are the design decisions we haven't made yet. They block T4 kickoff:

1. **DSL surface**: full ECMAScript subset or a smaller language? A
   restricted DSL is easier to validate but harder for users to learn.
2. **Multiplayer determinism**: every client must execute scripted modifiers
   identically. A fundamental choice between (a) "server is authoritative,
   clients receive deltas" (works today for built-ins) and (b) "clients run
   the script themselves, must converge bit-for-bit" (requires deterministic
   sandbox + identical inputs).
3. **Editor experience**: full text editor (Monaco / CodeMirror)? Visual
   block editor (Scratch / Blockly style)? Both?
4. **Sharing trust model**: a profile that references a scripted modifier
   travels with the descriptor's source. Recipients see the source plus the
   declared permissions before importing. Do we also show a static-analysis
   summary ("This script reads the board, writes 1 attribute, declares no
   network access")?
5. **Rate limiting**: how many scripted modifiers can register per room per
   minute? Per session?
6. **Script versioning**: when an author updates their scripted modifier,
   does the new version replace the old in every saved profile that
   referenced it (auto-update) or only on explicit user action (manual
   update)?
7. **Failure mode**: when a scripted modifier throws or times out at apply
   time, do we abort the move (strict) or skip the modifier and continue
   (lenient)? T3's data primitives are infallible; T4's scripts aren't.

Each question is a small ADR worth of debate. We don't need to answer them
to ship T3, but we should answer questions 1 to 3 before any T4
implementation work begins.