chore(root): scaffold monorepo — Phase 0 complete

2026-04-16 13:32:21 -06:00

`@paratype/rete` — Engine Semantics Specification

This document is the canonical, normative specification of @paratype/rete, a Doorenbos-style Rete II rules engine implemented in TypeScript. All implementation, tests, and downstream consumers MUST conform to the semantics defined here. Where the specification and the implementation disagree, the specification wins and the implementation's behaviour is a bug.

This is a Phase 0 spec lock. Subsequent phases (engine implementation, chess domain, multiplayer server) depend on the contracts established here.

Fact Model

Working memory is a strict EAV (Entity-Attribute-Value) store. Every fact is a triple of the form (id: EntityId, attr: AttrKey, value: AttrValue<S, K>), in which S is the session-level schema and K is the fact's attribute key. The three components are:

id is a branded numeric entity identifier (see ID Authority).
attr is a string key drawn from a schema-declared set of attribute names.
value is the schema-declared TypeScript type for that attribute.

Each (id, attr) pair stores exactly one value. Inserting a fact for an (id, attr) pair that already has a value is an update: the previous value is replaced, and the engine treats this as a retract-then-insert for the purposes of match invalidation and re-firing (see Match Refraction).

Facts are typed: each attr key maps to a specific TypeScript type declared in a session-level schema S. The schema is the single source of truth for attribute typing and is consumed by both the public API surface (for type inference at insert/query call sites) and by JSON rule validation (see JSON Rule Schema).

Internally, working memory is stored as Map<EntityId, Map<AttrKey, unknown>>. The outer map is keyed by entity id, the inner map is keyed by attribute name, and the value is the schema-typed payload (erased to unknown at the storage layer and recovered via the schema at API boundaries). The canonical fact shape exposed by the public API is { id: EntityId, attr: AttrKey, value: T }, where T is the schema-declared type for that attr.

This shape is the only fact shape the engine recognises. Composite shapes (objects, tuples) are not facts; they must be decomposed into multiple EAV triples sharing the same id before insertion.
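The storage layout and the update rule above can be sketched as follows. This is a minimal illustration, not the engine's implementation: `WorkingMemory`, `DemoSchema`, and the `insert`/`get` signatures are assumptions made for the example, and only `EntityId` and the nested-map layout come from this specification.

```typescript
type EntityId = number & { readonly __brand: 'EntityId' };

// Hypothetical schema for illustration: attribute name -> value type.
interface DemoSchema {
  kind: string;
  file: number;
  rank: number;
}

class WorkingMemory<S> {
  // Outer map keyed by entity id, inner map keyed by attribute name.
  // Values are erased to unknown at the storage layer.
  private readonly store = new Map<EntityId, Map<keyof S, unknown>>();

  // Inserts a fact. Returns the replaced value when this was an update,
  // or undefined when the (id, attr) pair was previously empty.
  insert<K extends keyof S>(id: EntityId, attr: K, value: S[K]): S[K] | undefined {
    let attrs = this.store.get(id);
    if (attrs === undefined) {
      attrs = new Map<keyof S, unknown>();
      this.store.set(id, attrs);
    }
    const previous = attrs.get(attr) as S[K] | undefined;
    attrs.set(attr, value);
    return previous;
  }

  // Recovers the schema type at the API boundary.
  get<K extends keyof S>(id: EntityId, attr: K): S[K] | undefined {
    return this.store.get(id)?.get(attr) as S[K] | undefined;
  }
}

const wm = new WorkingMemory<DemoSchema>();
const pawn = 1 as EntityId;

// A composite { kind, file, rank } is decomposed into three triples sharing one id.
wm.insert(pawn, 'kind', 'pawn');
wm.insert(pawn, 'file', 4);
wm.insert(pawn, 'rank', 2);

// Same (id, attr) pair: the old value is replaced, not accumulated.
const replaced = wm.insert(pawn, 'rank', 4);
const currentRank = wm.get(pawn, 'rank');
```

In a real session the update would also drive match invalidation; the sketch only shows the storage side.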

ID Authority

EntityId is a branded numeric type:

type EntityId = number & { readonly __brand: 'EntityId' }

The brand exists at the type level only; at runtime an EntityId is a plain number. The brand prevents accidental interchange between EntityId and ordinary number values at compile time.

In a standalone session (no network), IDs are minted by the Session itself via an auto-incrementing counter exposed as session.nextId(): EntityId. The counter starts at 1 and increments by 1 on each call. Each call returns a unique, never-reused id for the lifetime of the session.

In the multiplayer server context, ID minting is reserved exclusively to the server. Clients MUST NOT call session.nextId() directly. Instead, clients send intents that reference positions, squares, or other domain coordinates, and the server translates these intents into fact mutations using server-minted ids. The client-side session receives facts with server-assigned ids and references them opaquely thereafter.

This split guarantees network-deterministic replay: every replay of the same event log on every peer produces the same id assignments and therefore the same working-memory contents and the same rule firing order.

Derived facts produced by the engine itself (see Truth Maintenance) use a separate negative-id space and are minted by the engine, not by the user or the server.
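A minimal sketch of the two id spaces, assuming a hypothetical `IdMinter` class. The specification fixes only `session.nextId()` and the `-(counter)` form for derived ids; `nextDerivedId` and the assumption that the derived counter also starts at 1 are illustrative.

```typescript
type EntityId = number & { readonly __brand: 'EntityId' };

class IdMinter {
  private userCounter = 0;    // user-facing positive ids: 1, 2, 3, ...
  private derivedCounter = 0; // independent counter for engine-derived ids

  // Mirrors session.nextId(): starts at 1, increments by 1, never reuses an id.
  nextId(): EntityId {
    this.userCounter += 1;
    return this.userCounter as EntityId;
  }

  // Hypothetical engine-internal helper: mints -(counter).
  // Assumption: the derived counter starts at 1, like the user counter.
  nextDerivedId(): EntityId {
    this.derivedCounter += 1;
    return (-this.derivedCounter) as EntityId;
  }
}

const minter = new IdMinter();
const first = minter.nextId();
const second = minter.nextId();
const derived = minter.nextDerivedId();

// The brand exists only at compile time; at runtime an EntityId is a plain number.
const runtimeType = typeof first;

// The next line would fail to compile, because a bare number is not an EntityId:
// const bad: EntityId = 3;
```

Because the two counters never share a sign, a derived id cannot collide with a user-asserted one regardless of how many ids either side mints.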

Conflict Resolution

When multiple rule activations are pending simultaneously inside a single fireRules() call, they are ordered deterministically before any RHS executes. The ordering is a pure function over the activation set and is identical on every machine and every run.

The ordering keys, in priority order:

Salience descending. Each rule has an integer salience (default 0). Activations whose rule has higher salience fire first.
Specificity descending. Among activations of equal salience, those whose rule has more conditions (longer what / conditions array) fire first. Specificity is the cardinality of the rule's condition list at registration time.
Insertion order ascending. Among activations of equal salience and specificity, those whose rule was added to the session earlier (lower addedAt index assigned at registration time) fire first.

There is no randomness. There is no lexicographic ordering on rule names, fact ids, or any other identifier. The only tie-breaker beyond the three keys above is the order in which rules were added to the session, which is itself deterministic given a deterministic registration order.

The conflict resolution function is invoked once per fireRules() iteration: pending activations are collected, sorted by (−salience, −specificity, +addedAt), and fired in that order. New activations produced by an RHS are added to the next iteration's pending set; they do not interleave with the current iteration's already-sorted firing order.

Tests MUST verify this exact ordering by constructing scenarios with deliberate ties at each level.
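The three-key ordering can be expressed as a single comparator. The sketch below assumes a hypothetical `Activation` record carrying the three keys; the field names are illustrative, not part of the public API.

```typescript
// Hypothetical activation record; only the three sort keys matter here.
interface Activation {
  ruleName: string;
  salience: number;    // integer, default 0
  specificity: number; // number of conditions at registration time
  addedAt: number;     // registration index of the rule
}

// Sorts by (-salience, -specificity, +addedAt).
function compareActivations(a: Activation, b: Activation): number {
  return (
    b.salience - a.salience ||
    b.specificity - a.specificity ||
    a.addedAt - b.addedAt
  );
}

// Deliberate ties at each level, listed in scrambled order.
const pending: Activation[] = [
  { ruleName: 'late-simple', salience: 0, specificity: 1, addedAt: 3 },
  { ruleName: 'early-simple', salience: 0, specificity: 1, addedAt: 0 },
  { ruleName: 'specific', salience: 0, specificity: 3, addedAt: 2 },
  { ruleName: 'urgent', salience: 10, specificity: 1, addedAt: 1 },
];

const firingOrder = pending
  .slice()
  .sort(compareActivations)
  .map((activation) => activation.ruleName);
// firingOrder: ['urgent', 'specific', 'early-simple', 'late-simple']
```

Two activations of the same rule compare equal under these three keys. The sketch leaves their relative order to the sort's stability, which is an assumption of the example rather than something this section specifies.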

Match Refraction

Each unique match fires its rule's RHS exactly once. A "unique match" is identified by the tuple of fact identities — not values — bound to the rule's variables at the time of activation. Concretely, a match is keyed by the ordered tuple of EntityId values bound to the rule's pattern variables, in the order those variables appear in the rule's condition list.

A match re-fires only when one of the following occurs:

A fact whose value is bound to one of the rule's variables changes. Updating an (id, attr) pair is treated as a retract followed by an insert with the same (id, attr); this invalidates the previous match (removing it from the fired-set) and creates a new match (which is then eligible to fire).
A previously absent matching fact is inserted, producing a new match for which no entry exists in the fired-set.

This is CLIPS-style refraction. For each rule, the engine maintains a fired-set of type Set<MatchKey>, where MatchKey is a stable serialisation of the bound entity ids in the canonical variable order for that rule. Before invoking an RHS, the engine checks whether the candidate match's key is already in that rule's fired-set; if so, the activation is suppressed.

Retracting a fact removes every match that depended on that fact from the fired-set, so that if the same pattern later becomes true again the rule re-fires. This is what makes truth maintenance and re-derivation correct (see Truth Maintenance).

The fired-set is per-rule. Two different rules matching the same fact tuple have independent fired-set entries.
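A minimal sketch of per-rule refraction, under stated assumptions: `FiredSets`, `tryFire`, and `invalidate` are hypothetical names, and the comma-joined string is just one possible stable serialisation for `MatchKey`.

```typescript
type EntityId = number & { readonly __brand: 'EntityId' };
type MatchKey = string;

// Assumed serialisation: bound ids joined by commas, in the rule's variable order.
const toMatchKey = (ids: readonly EntityId[]): MatchKey => ids.join(',');

class FiredSets {
  // One independent fired-set per rule name.
  private readonly byRule = new Map<string, Set<MatchKey>>();

  // Returns true when the RHS should run, false when refraction suppresses it.
  tryFire(rule: string, ids: readonly EntityId[]): boolean {
    let fired = this.byRule.get(rule);
    if (fired === undefined) {
      fired = new Set<MatchKey>();
      this.byRule.set(rule, fired);
    }
    const key = toMatchKey(ids);
    if (fired.has(key)) return false;
    fired.add(key);
    return true;
  }

  // Called when a fact is retracted (or updated, which is retract-then-insert):
  // every match that involved the entity is forgotten so it may fire again.
  // Traversal order here does not influence firing order.
  invalidate(id: EntityId): void {
    this.byRule.forEach((fired) => {
      // Copy the keys first so deletion does not disturb the traversal.
      Array.from(fired).forEach((key) => {
        if (key.split(',').map(Number).indexOf(id) !== -1) fired.delete(key);
      });
    });
  }
}

const firedSets = new FiredSets();
const e1 = 1 as EntityId;
const e2 = 2 as EntityId;

const firstFire = firedSets.tryFire('rule-a', [e1, e2]);  // new match
const repeatFire = firedSets.tryFire('rule-a', [e1, e2]); // suppressed
const otherRule = firedSets.tryFire('rule-b', [e1, e2]);  // independent fired-set
firedSets.invalidate(e1);
const afterRetract = firedSets.tryFire('rule-a', [e1, e2]); // eligible again
```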

Iteration Order

All iteration over working-memory facts in hot paths — alpha dispatch, beta join, query result assembly — uses sorted arrays with documented, deterministic sort keys. Iteration over a raw Set<object>, Map<object, …>, or any unordered structure is forbidden in any code path that affects rule firing order or query result order. Insertion-order iteration of Set and Map is also forbidden in these paths because it makes firing order depend on insertion history rather than on fact identity, defeating replay determinism.

The canonical sort keys, in priority order:

id ascending (numeric comparison on the underlying number).
attr ascending (string comparison by UTF-16 code unit, as performed by JavaScript's built-in < operator on strings; not locale-aware).

Any collection of facts returned by queryAll() or by the session-level introspection method session.allFacts() is sorted by [id, attr] before being returned to the caller. Internal beta-memory tokens — partial matches consisting of multiple bound facts — are sorted by their constituent fact ids in the variable order defined by the rule's condition list, so that any deterministic enumeration of tokens produces the same sequence on every run.
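The canonical [id, attr] ordering can be sketched as a comparator applied before facts leave the engine. `Fact` and `compareFacts` are illustrative names for this example.

```typescript
type EntityId = number & { readonly __brand: 'EntityId' };

interface Fact {
  id: EntityId;
  attr: string;
  value: unknown;
}

// Primary key: id ascending (numeric). Secondary key: attr ascending,
// using JavaScript's built-in string comparison rather than localeCompare.
function compareFacts(a: Fact, b: Fact): number {
  if (a.id !== b.id) return a.id - b.id;
  if (a.attr < b.attr) return -1;
  if (a.attr > b.attr) return 1;
  return 0;
}

// Facts in an arbitrary insertion order.
const unsorted: Fact[] = [
  { id: 2 as EntityId, attr: 'rank', value: 7 },
  { id: 1 as EntityId, attr: 'rank', value: 2 },
  { id: 1 as EntityId, attr: 'file', value: 4 },
  { id: 1 as EntityId, attr: 'Kind', value: 'pawn' },
];

const sortedKeys = unsorted
  .slice()
  .sort(compareFacts)
  .map((fact) => `${fact.id}:${fact.attr}`);
// sortedKeys: ['1:Kind', '1:file', '1:rank', '2:rank']
```

The capitalised `Kind` sorts before `file` because uppercase letters have lower code units than lowercase ones, which is one reason the comparison rule needs to be pinned down exactly.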

Code review and the no-impure-rhs lint family (extended with iteration-order checks) MUST flag any use of Set or Map iteration in firing-order-sensitive paths.

Truth Maintenance

Derived facts — facts produced as the conclusion of a thenFinally-style production rather than asserted by the user — are logically dependent on the matches that produced them. The engine maintains this dependency explicitly: each derived fact stores the Set<MatchKey> of matches that currently support it.

When any supporting match is removed from the fired-set (because a source fact was retracted, or because a condition no longer holds after an update), the derived fact's support set shrinks. When the support set becomes empty, the engine automatically retracts the derived fact. This retraction propagates: if other rules matched against the now-retracted derived fact, their matches are also invalidated, and any further derived facts they supported are likewise re-evaluated.

Derived facts use negative EntityId values, minted by the engine as -(counter) from a counter independent of the user-facing positive-id counter. This separation guarantees that derived ids cannot collide with user-asserted ids and makes derived facts trivially identifiable in logs and diagnostics.

Derived facts are excluded from the serialised event log. Replay re-derives them from the user-asserted facts and the rule set, which preserves both correctness (no stale derivations) and log compactness.

A thenFinally production whose match becomes true again after a previous derived fact was retracted will re-derive the fact (with a new negative id), because match refraction's invalidation rule applies to the derived production's match in the same way as for any other rule.
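The support-set bookkeeping described in this section can be sketched as below. `DerivedFacts`, `derive`, `removeMatch`, and the `factKey` string that identifies "the same" derived fact are assumptions made for illustration; the specification does not define how a derived fact is keyed.

```typescript
type EntityId = number & { readonly __brand: 'EntityId' };
type MatchKey = string;

class DerivedFacts {
  private counter = 0; // independent of the user-facing id counter
  private readonly idByFact = new Map<string, EntityId>();
  private readonly support = new Map<EntityId, Set<MatchKey>>();

  // Records that `match` supports the derived fact named by `factKey`,
  // minting a new negative id if the fact does not currently exist.
  derive(factKey: string, match: MatchKey): EntityId {
    let id = this.idByFact.get(factKey);
    if (id === undefined) {
      this.counter += 1;
      id = (-this.counter) as EntityId;
      this.idByFact.set(factKey, id);
      this.support.set(id, new Set<MatchKey>());
    }
    this.support.get(id)!.add(match);
    return id;
  }

  // Removes `match` from every support set and returns the ids of derived
  // facts whose support became empty (these must be retracted).
  removeMatch(match: MatchKey): EntityId[] {
    const retracted: EntityId[] = [];
    this.idByFact.forEach((id, factKey) => {
      const supporters = this.support.get(id)!;
      supporters.delete(match);
      if (supporters.size === 0) {
        retracted.push(id);
        this.support.delete(id);
        this.idByFact.delete(factKey);
      }
    });
    return retracted;
  }
}

const derivedFacts = new DerivedFacts();
const idA = derivedFacts.derive('king-in-check', 'm1');       // new fact
const idB = derivedFacts.derive('king-in-check', 'm2');       // second supporter, same fact
const afterFirstRemoval = derivedFacts.removeMatch('m1');     // still supported by m2
const afterSecondRemoval = derivedFacts.removeMatch('m2');    // support set empty
const rederived = derivedFacts.derive('king-in-check', 'm3'); // fresh negative id
```

A full engine would feed the returned ids back into retraction so that dependent matches are invalidated in turn; the sketch stops at the first level.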

Cycle Detection

session.fireRules(opts?: { recursionLimit?: number }) accepts a configurable recursion limit. The default limit is 64.

The engine tracks "depth" as the number of times fireRules has recursively triggered itself. Recursion occurs in two situations:

In autoFire: true mode, when a rule's RHS calls session.insert() or session.retract(), the engine immediately re-enters fireRules to propagate the resulting activations.
In any mode, when a rule's RHS explicitly calls session.fireRules().

When depth exceeds recursionLimit, the engine throws RecursionLimitExceededError. The error contains:

message: a human-readable description naming the limit and the depth reached.
depth: the integer depth at which the limit was breached.
activationTrace: an array of the last N rule names that fired, in order, where N is at most 10. This trace is the most recent suffix of the firing history and is intended for diagnosing the cycle.

Setting recursionLimit: 0 disables the limit entirely. This is a deliberate escape hatch for advanced users who need unbounded fixpoint computations; it is dangerous because infinite loops will hang the host thread, and its use is a smell that warrants review.

In autoFire: false mode, the depth counter resets to zero at the start of each explicit fireRules() call, so successive top-level fireRules() invocations do not accumulate depth across calls. Within a single fireRules() call, depth accumulates across all recursive re-entries until the call returns.
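A minimal sketch of the depth accounting, assuming a hypothetical `DepthGuard` helper that a `fireRules` implementation might wrap its body in. The error class name and its three fields follow this section; the constructor signature and the exact message text are illustrative.

```typescript
class RecursionLimitExceededError extends Error {
  constructor(
    readonly depth: number,
    limit: number,
    readonly activationTrace: string[],
  ) {
    super(`recursionLimit ${limit} exceeded at depth ${depth}`);
    this.name = 'RecursionLimitExceededError';
  }
}

class DepthGuard {
  private nesting = 0; // fireRules frames currently on the stack
  private readonly recent: string[] = []; // at most the last 10 rule names

  constructor(private readonly recursionLimit: number = 64) {}

  recordFiring(ruleName: string): void {
    this.recent.push(ruleName);
    if (this.recent.length > 10) this.recent.shift();
  }

  // Wraps one fireRules invocation. Depth is 0 for the top-level call and
  // grows by 1 per recursive re-entry; a limit of 0 disables the check.
  enter<T>(body: () => T): T {
    const depth = this.nesting;
    if (this.recursionLimit !== 0 && depth > this.recursionLimit) {
      throw new RecursionLimitExceededError(depth, this.recursionLimit, this.recent.slice());
    }
    this.nesting += 1;
    try {
      return body();
    } finally {
      this.nesting -= 1; // back to zero once the top-level call returns
    }
  }
}

// A rule that re-enters fireRules forever, run with a small limit.
const guard = new DepthGuard(3);
function fireRules(): void {
  guard.enter(() => {
    guard.recordFiring('ping');
    fireRules();
  });
}

let caught: RecursionLimitExceededError | undefined;
try {
  fireRules();
} catch (e) {
  caught = e as RecursionLimitExceededError;
}
```

With a limit of 3, re-entries at depths 1 through 3 are allowed and the fourth re-entry throws, so the error reports depth 4 and a trace of four `ping` firings.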

RHS Purity Contract

A Rule Right-Hand Side (RHS) handler — the function registered via HandlerRegistry.register(name, fn) and referenced by name from the JSON rule schema — MUST be pure with respect to the engine's notion of determinism. Specifically, an RHS MUST NOT call any of the following:

Date.now(), new Date(), performance.now(), or any other source of wall-clock or monotonic time.
Math.random(), or any other source of non-deterministic randomness.
setTimeout, setInterval, clearTimeout, clearInterval, or any other timer API.
fetch, XMLHttpRequest, WebSocket, or any other network I/O.
console.log, console.warn, console.error, console.info, console.debug, or any other console API.
Any DOM API, including document, window, and event-listener registration.

The only session mutations an RHS is permitted to perform are: session.insert(), session.retract(), and session.nextId() (the last only in standalone, non-multiplayer contexts).

Enforcement is layered:

ESLint. A custom rule no-impure-rhs, defined in packages/rete/eslint-rules/, applies to files matching packages/rete/src/**/rhs/** and packages/chess/src/**/handlers/**. It bans the prohibited globals via no-restricted-globals and forbids importing time, random, timer, network, console, and DOM modules. Lint failures block CI.
Dev-mode runtime guard. When NODE_ENV !== 'production', the engine wraps each RHS invocation in a scope where Date, Math.random, setTimeout, setInterval, fetch, and console are replaced with stubs that throw on access. This catches violations the lint rule misses, including indirect calls through helper functions or third-party libraries pulled into RHS modules.

The combination — static lint plus dynamic guard — gives high confidence that no RHS observed in production was permitted to be impure during development or CI.
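The dev-mode runtime guard could look roughly like the sketch below. It is deliberately partial: it stubs only `Math.random` and `Date.now`, takes an explicit `devMode` flag instead of reading `NODE_ENV`, and `runGuarded` is a hypothetical name. The real guard described above covers more globals.

```typescript
// Runs an RHS with selected impure globals replaced by throwing stubs,
// then restores the originals even if the RHS throws.
function runGuarded<T>(rhs: () => T, devMode: boolean = true): T {
  if (!devMode) return rhs();

  const realRandom = Math.random;
  const realNow = Date.now;
  Math.random = () => {
    throw new Error('impure RHS: Math.random is not allowed');
  };
  Date.now = () => {
    throw new Error('impure RHS: Date.now is not allowed');
  };
  try {
    return rhs();
  } finally {
    Math.random = realRandom;
    Date.now = realNow;
  }
}

// A pure RHS runs normally.
const pureResult = runGuarded(() => 1 + 1);

// An impure helper called indirectly from an RHS is still caught.
const rollDie = (): number => Math.floor(Math.random() * 6) + 1;
let violation = '';
try {
  runGuarded(() => rollDie());
} catch (e) {
  violation = e instanceof Error ? e.message : String(e);
}

// Outside the guard the original function is back in place.
const restored = typeof Math.random() === 'number';
```

Swapping globals like this only works for synchronous RHS bodies; an RHS that deferred work past the `finally` block would escape the guard, which is one reason timers are also on the prohibited list.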

Time, randomness, and I/O belong outside the engine: time is provided by the host as a fact, for example the triple (world, tick, n); randomness is provided as pre-rolled facts produced by a seeded RNG outside the rule firing path; and network and console output are the responsibility of the calling layer.
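One way a host might prepare such facts is sketched below. Everything here is an assumption for illustration: the `HostFact` shape, the `roll0`, `roll1`, ... attribute names, and the choice of a simple linear congruential generator, which stands in for whatever seeded RNG the host actually uses.

```typescript
// Illustrative seeded generator (a linear congruential generator).
// Not cryptographically secure; chosen only because it is deterministic.
function makeSeededRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

interface HostFact {
  id: number;
  attr: string;
  value: number;
}

// Builds the facts the host inserts before firing rules for one tick:
// the current tick plus a fixed number of pre-rolled die values.
function hostFactsForTick(worldId: number, tick: number, seed: number, rolls: number): HostFact[] {
  const rng = makeSeededRng(seed);
  const facts: HostFact[] = [{ id: worldId, attr: 'tick', value: tick }];
  for (let i = 0; i < rolls; i += 1) {
    facts.push({ id: worldId, attr: `roll${i}`, value: Math.floor(rng() * 6) + 1 });
  }
  return facts;
}

// The same inputs always produce the same facts, so replay stays deterministic.
const firstRun = hostFactsForTick(1, 42, 2026, 3);
const secondRun = hostFactsForTick(1, 42, 2026, 3);
```

Rule handlers then read `tick` and the pre-rolled values from working memory like any other fact, and never touch the generator themselves.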

JSON Rule Schema

Rules are serialisable to and from JSON using a handler-registry pattern. There is no function-to-string conversion, no eval, no new Function, and no arbitrary JavaScript embedded in JSON. All executable behaviour — predicates and RHS handlers — is referenced by name and resolved against a registry at deserialisation time.

The v1 JSON rule schema:

{
  "name": "string (unique rule name within the session)",
  "salience": "integer (optional, default 0)",
  "conditions": [
    {
      "type": "alpha | negation | existential | ncc | aggregation",
      "id": "string | number | null (null = wildcard)",
      "attr": "string (attribute key)",
      "binding": "string (variable name to bind value to, or null)",
      "idBinding": "string (variable name to bind entity id to, or null)"
    }
  ],
  "filters": [
    {
      "predicate": "string (registered predicate name)",
      "args": ["JsonValue (static arguments)"]
    }
  ],
  "handler": "string (registered handler name)",
  "handlerArgs": ["JsonValue (static arguments passed to handler alongside match)"]
}

Field semantics:

name MUST be unique within the session; duplicate registration is an error.
salience is consumed by Conflict Resolution; omission is equivalent to 0.
conditions[].type selects the Rete II node type that handles the condition; alpha is the ordinary positive condition, the others correspond to the Phase 2 node types listed in Rete II Reference Target.
conditions[].id is either a literal entity id (number), a variable reference (string starting with ?), or null for wildcard.
conditions[].binding and conditions[].idBinding declare variables introduced by this condition; downstream conditions and filters refer to them by name.
filters[].predicate MUST be a name registered in the session's predicate registry; the predicate receives the bound variable values plus args and returns a boolean.
handler MUST be a name registered in the session's HandlerRegistry; the handler receives the bound match plus handlerArgs and may perform the permitted session mutations subject to RHS Purity Contract.

Deserialisation validates every registry reference. If handler names a function not present in the HandlerRegistry, the engine throws UnknownHandlerError. If any filters[].predicate names a function not present in the predicate registry, the engine throws UnknownPredicateError. Both errors include the offending name and the rule name in their message.

The schema is exported as RULE_SCHEMA_V1, a Zod schema, for runtime structural validation prior to registry resolution. Schema-level errors (missing fields, wrong types, unknown type values) are reported with Zod's standard issue paths.

This handler-registry design makes rules safely portable across processes, persistable to disk, and shippable over the network without ever transmitting executable code.
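Registry resolution at deserialisation time can be sketched as follows. The two error class names come from this section; `RuleJson`, `resolveRule`, the registry shapes (plain `Map`s), and the order in which references are checked are assumptions. Structural validation with RULE_SCHEMA_V1 is omitted here.

```typescript
type JsonValue = null | boolean | number | string | JsonValue[] | { [key: string]: JsonValue };

interface RuleJson {
  name: string;
  salience?: number;
  conditions: {
    type: string;
    id: string | number | null;
    attr: string;
    binding: string | null;
    idBinding: string | null;
  }[];
  filters: { predicate: string; args: JsonValue[] }[];
  handler: string;
  handlerArgs: JsonValue[];
}

class UnknownHandlerError extends Error {
  constructor(handler: string, rule: string) {
    super(`unknown handler "${handler}" in rule "${rule}"`);
    this.name = 'UnknownHandlerError';
  }
}

class UnknownPredicateError extends Error {
  constructor(predicate: string, rule: string) {
    super(`unknown predicate "${predicate}" in rule "${rule}"`);
    this.name = 'UnknownPredicateError';
  }
}

// Checks every by-name reference in the rule against the registries.
function resolveRule(
  rule: RuleJson,
  handlers: Map<string, unknown>,
  predicates: Map<string, unknown>,
): void {
  rule.filters.forEach((filter) => {
    if (!predicates.has(filter.predicate)) {
      throw new UnknownPredicateError(filter.predicate, rule.name);
    }
  });
  if (!handlers.has(rule.handler)) {
    throw new UnknownHandlerError(rule.handler, rule.name);
  }
}

// A rule as it might arrive over the wire: data only, no executable code.
const promoteRule: RuleJson = JSON.parse(`{
  "name": "promote-pawn",
  "salience": 5,
  "conditions": [
    { "type": "alpha", "id": null, "attr": "rank", "binding": "rank", "idBinding": "piece" }
  ],
  "filters": [{ "predicate": "equals", "args": [8] }],
  "handler": "promote",
  "handlerArgs": ["queen"]
}`);

const predicates = new Map<string, unknown>([['equals', (a: number, b: number) => a === b]]);
const emptyHandlers = new Map<string, unknown>();

let resolutionError = '';
try {
  resolveRule(promoteRule, emptyHandlers, predicates);
} catch (e) {
  resolutionError = e instanceof Error ? `${e.name}: ${e.message}` : String(e);
}
// resolutionError: 'UnknownHandlerError: unknown handler "promote" in rule "promote-pawn"'
```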

Rete II Reference Target

The canonical reference for the Rete II algorithm implemented by this engine is the Doorenbos PhD thesis:

Doorenbos, R. B. (1995). Production Matching for Large Learning Systems. PhD Thesis, Carnegie Mellon University. CMU-CS-95-113.

All node types, memory structures, and algorithmic decisions in this engine trace back to that thesis. Where this specification diverges (notably in Conflict Resolution, ID Authority, and the JSON serialisation surface), the divergence is documented above and is intentional.

The following node types from the Doorenbos thesis are in scope for v1 of this engine:

| Node Type | Phase | Description |
| --- | --- | --- |
| `AlphaNode` | Phase 1 | Indexes facts by `(id?, attr)` pattern; feeds an `AlphaMemory`. |
| `AlphaMemory` | Phase 1 | Stores facts matching one alpha pattern. |
| `BetaMemory` | Phase 1 | Stores partial matches (tokens) produced by left activations. |
| `JoinNode` | Phase 1 | Combines left tokens with right alpha facts; performs variable binding and equality checks. |
| `FilterNode` | Phase 1 | Applies registered predicates to tokens; the analog of CLIPS-style `cond` / test nodes. |
| `ProductionNode` | Phase 1 | Terminal node; triggers an RHS handler on each full match (subject to refraction). |
| `DerivedFactProduction` | Phase 1 | The `thenFinally` variant of a production; retracts its derived fact when its last supporting match is removed. |
| `NegationNode` | Phase 2 | Passes a token iff zero facts match the negated pattern (NOT). |
| `ExistentialNode` | Phase 2 | Passes a token iff at least one fact matches the pattern (EXISTS). |
| `NccNode` | Phase 2 | Passes a token iff no combination of N conditions matches (negated conjunctive condition). |
| `AggregationNode` | Phase 2 | Computes `count` / `sum` / `min` / `max` / `collect` over matching facts; binds the result to a variable. |

Out of scope for v1: Rete/UL (unlinking), right-unlinking optimisation, sequential mode, and any conflict-set priority queue beyond the three-key sort defined in Conflict Resolution. These may be revisited in later versions if profiling demonstrates a need; until then, the simpler implementation is preferred.
