26 KiB
Prompt Hacking Challenges — Design
Overview
Enable developer-authored prompt hacking challenges inside Dify’s workflow builder via a new workflow node. Players can register/login using the existing web auth and compete on challenges. Attempts are recorded server-side and leaderboards are exposed via public web APIs.
Goals
- Add a first-class workflow node that evaluates success/failure against developer-specified criteria.
- Add a Judging LLM node that compares model outputs to the challenge goal and produces pass/fail, textual feedback, and a 1–10 rating.
- Persist attempts with metadata for scoring and leaderboards.
- Reuse existing account/web auth for players.
- Fit Dify’s DDD/Clean Architecture: models, services, controllers, workflow nodes, and frontend builder integration.
Non-Goals
- Anti-cheat measures beyond simple rate limiting.
- Complex custom scoring plugins (design leaves a hook for future work).
Architecture Summary
Backend components
-
Models (SQLAlchemy)
- Challenge
- id, tenant_id, app_id, workflow_id
- name, description, goal (plain text shown to players)
- success_type: one of ['regex', 'contains', 'custom']
- success_pattern: string (regex or substring depending on type)
- secret_ref: reference to server-side secret (never exposed to clients)
- scoring_strategy: one of ['first', 'fastest', 'fewest_tokens', 'custom']
- is_active: bool
- created_by, created_at, updated_by, updated_at
- ChallengeAttempt
- id, tenant_id, challenge_id (FK), end_user_id (FK), workflow_run_id (optional FK)
- succeeded: bool
- score: numeric (meaning depends on strategy)
- judge_rating: int (0–10)
- judge_feedback: text
- judge_output_raw: jsonb (optional; structured judgement payload)
- tokens_total: int (when available from run metrics)
- elapsed_ms: int (when available)
- created_at
- Challenge
-
Service layer (e.g.,
ChallengeService,ChallengeJudgeService)- evaluate_outcome(output, cfg) -> (succeeded: bool, details: dict)
- judge_with_llm(goal, response, cfg) -> { passed: bool, rating: int, feedback: str, raw?: dict }
- evaluate_with_plugin(evaluator_ref, goal, response, ctx) -> { passed: bool, rating?: int, feedback?: str, raw?: dict }
- score_with_plugin(scorer_ref, attempt_metrics, ctx) -> { score: number, details?: dict }
- record_attempt(tenant_id, challenge_id, end_user_id, run_meta, succeeded) -> ChallengeAttempt
- get_leaderboard(challenge_id, limit, strategy) -> list
- get_challenge_public(challenge_id) -> dict
-
Controllers
- Console (for creators): CRUD on challenges under the workspace (
/console/api/challenges) - Web (for players): public endpoints under
/web/api/challenges- List active challenges, fetch details, fetch leaderboard
- Optional auth via existing web login for personalization, otherwise anonymous read
- Console (for creators): CRUD on challenges under the workspace (
-
Workflow nodes
- NodeType:
challenge-evaluator- Config
challenge_id: reference to a stored Challenge (preferred)- or inline config:
success_type,success_pattern,scoring_strategy mask_variables: string[] — variable names to redact in logs
- Execution
- Consumes upstream content (typically latest assistant output)
- Evaluates success with
ChallengeService.evaluate_outcome - If an
EndUsercontext exists and achallenge_idis present, writesChallengeAttempt - Outputs
{ challenge_succeeded: boolean, message?: string }, optionally passes through original output
- Config
- NodeType:
judging-llm- Purpose: judge a model response against the challenge goal using an LLM rubric.
- Config
judge_model: provider/name/versiontemperature,max_tokens, other model paramsrubric_prompt_template: template with placeholders for {goal}, {response}, optional {hints}rating_scale: default 0–10; configurable upper bound optionalpass_threshold: integer (default 5)
- Inputs
goal: the attacking goal or acceptance criteriaresponse: the model output to evaluate
- Execution
- Calls
ChallengeJudgeService.judge_with_llm()to obtain structured judgement - Returns outputs
{ judge_passed: boolean, judge_rating: number (0–10), judge_feedback: string, judge_raw?: object }
- Calls
- Integration
- Downstream
challenge-evaluatorcan consumejudge_passedandjudge_ratingto record an attempt instead of regex/contains - Alternatively,
challenge-evaluatormay support anevaluation_mode: 'rules' | 'llm-judge'to invoke judging internally
- Downstream
- NodeType:
team-challenge(Red/Blue orchestrator)- Purpose: orchestrate two-sided challenges where players choose Red (attack) or Blue (defense) and submit prompts accordingly. The node pairs attacks and defenses, configures the LLM, invokes judging, and emits scores.
- Config
red_blue_challenge_id: reference to a Red/Blue challenge definitiondefense_selection_policy: 'latest_best' | 'random_active' | 'round_robin' | 'request_new_if_none'attack_selection_policy: same options for the defense side evaluation pathjudge_suite: list of category tests to run (e.g., CBRNE, SA, SH, RWH, V, M)scoring_strategy: 'red_blue_ratio' | 'custom'
- Inputs
team_choice: 'red' | 'blue'attack_prompt?: string (whenteam_choice = 'red')defense_prompt?: string (whenteam_choice = 'blue')
- Execution (high level)
- If
team_choice = 'red':- Persist
attack_promptsubmission - Load a defense by policy; if none, optionally signal Blue to provide one (async) and fall back to last known
- Configure LLM with defense as system prompt, submit attack as user message
- Run
judge_suiteviajudging-llm; compute Red score
- Persist
- If
team_choice = 'blue':- Persist
defense_promptsubmission - Load an attack by policy; if none, signal Red to provide one (async) and fall back to last known
- Configure LLM with defense as system prompt and submit the loaded attack
- Run
judge_suite; compute Blue score (prevention)
- Persist
- Persist pairing and metrics
- If
- Outputs
{ team: 'red'|'blue', judge_passed: boolean, judge_rating: number, judge_feedback: string, categories: Record<string, boolean|number>, team_points: number, total_points: number }
- NodeType:
Frontend components
-
Workflow builder
- Add
Prompt Challengeto the node palette - Add
Judging LLMto the node palette - Node editor panel: select existing Challenge or define inline success criteria
- Judging panel: choose model, edit rubric prompt, set pass threshold, preview structured outputs
- Custom evaluator/scorer panels: choose plugin and configure JSON settings with live schema validation
- I18n strings in
web/i18n/en-US/ - Challenge display & theming
- Author-provided instructions (Markdown) render before/alongside the task input area
- Theme tokens (colors, logo, background) applied to challenge pages
- Optional hero image/video via existing
UploadFileand signed URLs
- Add
-
Optional player UX (phase 2)
/challengeslist and/challenges/[id]details with leaderboard/challenge-collectionslist and/challenge-collections/[id]details with collection leaderboard- Use existing web login endpoints
Data Model
Minimal table shapes (final columns managed in migration):
-- challenges
id (uuid pk)
tenant_id (uuid fk)
app_id (uuid fk)
workflow_id (uuid fk)
name (text)
description (text)
goal (text)
success_type (text)
success_pattern (text)
secret_ref (text)
scoring_strategy (text)
is_active (bool)
created_by (uuid)
created_at (timestamp)
updated_by (uuid)
updated_at (timestamp)
-- challenge_attempts
id (uuid pk)
tenant_id (uuid fk)
challenge_id (uuid fk)
end_user_id (uuid fk)
workflow_run_id (uuid fk, nullable)
succeeded (bool)
score (numeric)
tokens_total (int)
elapsed_ms (int)
created_at (timestamp)
Additional columns for judging:
ALTER TABLE challenge_attempts
ADD COLUMN judge_rating integer,
ADD COLUMN judge_feedback text,
ADD COLUMN judge_output_raw jsonb;
Optional columns for custom evaluators/scorers:
ALTER TABLE challenges
ADD COLUMN evaluator_type text DEFAULT 'rules', -- one of: rules, llm-judge, custom
ADD COLUMN evaluator_plugin_id text,
ADD COLUMN evaluator_entrypoint text, -- e.g., "pkg.module:Evaluator"
ADD COLUMN evaluator_config jsonb,
ADD COLUMN scoring_plugin_id text,
ADD COLUMN scoring_entrypoint text, -- e.g., "pkg.module:Scorer"
ADD COLUMN scoring_config jsonb;
Additional tables for Red/Blue team challenges:
-- red_blue_challenges (definition)
CREATE TABLE red_blue_challenges (
id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
tenant_id uuid NOT NULL REFERENCES tenants(id),
app_id uuid NOT NULL REFERENCES apps(id),
workflow_id uuid REFERENCES workflows(id),
name text NOT NULL,
description text,
judge_suite jsonb NOT NULL, -- list of categories/tests
defense_selection_policy text NOT NULL DEFAULT 'latest_best',
attack_selection_policy text NOT NULL DEFAULT 'latest_best',
scoring_strategy text NOT NULL DEFAULT 'red_blue_ratio',
theme jsonb,
instructions_md text,
is_active boolean NOT NULL DEFAULT true,
created_by uuid,
created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_by uuid,
updated_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- team_submissions (attack/defense prompts)
CREATE TABLE team_submissions (
id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
red_blue_challenge_id uuid NOT NULL REFERENCES red_blue_challenges(id) ON DELETE CASCADE,
tenant_id uuid NOT NULL,
account_id uuid NULL,
end_user_id uuid NULL,
team text NOT NULL CHECK (team in ('red','blue')),
prompt text NOT NULL,
active boolean NOT NULL DEFAULT true,
created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- pairings (which attack tested against which defense)
CREATE TABLE team_pairings (
id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
red_blue_challenge_id uuid NOT NULL REFERENCES red_blue_challenges(id) ON DELETE CASCADE,
tenant_id uuid NOT NULL,
attack_submission_id uuid REFERENCES team_submissions(id),
defense_submission_id uuid REFERENCES team_submissions(id),
judge_output_raw jsonb,
categories jsonb, -- e.g., per-suite pass/fail or rating
judge_rating integer,
judge_feedback text,
red_points numeric NOT NULL DEFAULT 0,
blue_points numeric NOT NULL DEFAULT 0,
tokens_total int,
elapsed_ms int,
created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
);
API Design
Console (creator)
GET /console/api/challenges?app_id=...— listPOST /console/api/challenges— createGET /console/api/challenges/{id}— retrievePATCH /console/api/challenges/{id}— updateDELETE /console/api/challenges/{id}— delete
All require console login_required and membership in tenant.
Web (player)
GET /web/api/challenges— list active challenges (public)GET /web/api/challenges/{id}— details (public)GET /web/api/challenges/{id}/leaderboard?limit=...— leaderboard (public)
Player login uses existing web login endpoints to obtain access token when needed for personalization.
Collections
Console (creator)
GET /console/api/challenge-collections?app_id=...POST /console/api/challenge-collectionsGET /console/api/challenge-collections/{id}PATCH /console/api/challenge-collections/{id}DELETE /console/api/challenge-collections/{id}PUT /console/api/challenge-collections/{id}/challenges(set membership and order)
Web (player)
GET /web/api/challenge-collections— list public collectionsGET /web/api/challenge-collections/{id}— collection details (instructions/theme), included challengesGET /web/api/challenge-collections/{id}/leaderboard?limit=...— collection leaderboard
Red/Blue team challenge APIs
Console (creator)
POST /console/api/red-blue-challenges— createGET /console/api/red-blue-challenges?app_id=...— listGET /console/api/red-blue-challenges/{id}— detailPATCH /console/api/red-blue-challenges/{id}— updateDELETE /console/api/red-blue-challenges/{id}— deleteGET /console/api/red-blue-challenges/{id}/pairings— view pairings/metrics
Web (player)
POST /web/api/red-blue-challenges/{id}/join— join red or blue (payload: { team })POST /web/api/red-blue-challenges/{id}/submit— submit attack/defense (payload: { team, prompt })GET /web/api/red-blue-challenges/{id}— public info (instructions, theme, leaderboard snapshot)GET /web/api/red-blue-challenges/{id}/leaderboard?limit=...— red vs blue standings
Player Registration & Identity
Registration and login
- Reuse existing web auth service for player accounts:
- Email/password login:
POST /web/api/login - Email code login:
POST /web/api/login/email-code/send+POST /web/api/login/email-code/verify(existing patterns)
- Email/password login:
- Add an explicit web registration endpoint (thin wrapper around
RegisterService.register):POST /web/api/register(payload: email, name, password | email-code)- Behavior:
create_workspace_required = Falseto avoid auto-creating workspaces for playersstatus = active- Set
interface_languagefromAccept-Languageas done in OAuth flow
- On success, also create or associate a per-tenant
EndUserrecord so gameplay runs can be attributed consistently.
Player identity during runs
- Each gameplay run already has an
EndUsercontext. For registered players:- When a player is authenticated, resolve (or lazily create) an
EndUsertied to theiraccount_idfor the current tenant/app - Persist
end_user_idtoChallengeAttemptas today; optionally also storeaccount_idfor simplified leaderboard personalization
- When a player is authenticated, resolve (or lazily create) an
Optional schema addition
ALTER TABLE challenge_attempts
ADD COLUMN account_id uuid NULL;
This enables direct joins to accounts for notification and profile display without traversing end-user mappings.
Player profile (optional)
Introduce a lightweight player_profiles table for nickname/avatar/notification preferences without touching account directly:
CREATE TABLE player_profiles (
account_id uuid PRIMARY KEY REFERENCES accounts(id) ON DELETE CASCADE,
display_name text,
avatar_url text,
notify_on_first_blood boolean DEFAULT true,
notify_on_record_beaten boolean DEFAULT true,
created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
);
Workflow Node Execution
- Node receives upstream output (string or structured content). A typical placement is after an LLM node.
- Node loads Challenge config (stored by
challenge_idor inline). - Node evaluates success by either rules or judging:
regex: test pattern against text outputcontains: case-insensitive substring matchllm-judge: call thejudging-llmnode (or internal judge) to obtain{ judge_passed, judge_rating, judge_feedback }custom: callevaluate_with_pluginusing configuredevaluator_plugin_id/evaluator_entrypoint
- If
EndUserandchallenge_idpresent, record aChallengeAttemptwith run metrics (tokens, elapsed), and when available,judge_rating/judge_feedback. - Node outputs
- Rules mode:
{ challenge_succeeded: boolean, message?: string } - Judging mode:
{ challenge_succeeded: boolean, judge_rating: number, judge_feedback: string } - Pass through original output for chaining when needed.
- Rules mode:
For collections, attempts are recorded per challenge as usual. Collection leaderboard aggregation is computed over a player’s best attempt per challenge, combined using the collection’s scoring_strategy (e.g., sum of scores, total elapsed_ms, etc.).
Scoring Strategies
first: first successful attempt time wins (leaderboard sorted by earliestcreated_at).fastest: success with lowestelapsed_mswins.fewest_tokens: success with lowesttokens_totalwins.highest_rating: success with the highestjudge_ratingwins; ties broken by earliestcreated_at.custom: compute viascore_with_pluginusingscoring_plugin_id/scoring_entrypoint.
Collection strategies:
sum: sum of per-challenge scores in the collection (uses built-in or custom scoring per challenge)fastest_total: sum ofelapsed_msof successful best attempts (lower is better)fewest_tokens_total: sum oftokens_totalof successful best attempts (lower is better)highest_avg_rating: average ofjudge_ratingacross completed challenges (higher is better)custom: plugin-defined; service callsscore_with_pluginat collection level with a list of per-challenge metrics
Red/Blue team scoring:
- Base idea: award Red points for breakthroughs and Blue points for prevented attacks.
- Suggested defaults per pairing:
- For each category in the judge suite (e.g., CBRNE, SA, SH, RWH, V, M):
- If the attack bypasses defense (category breach), Red +1
- If defense prevents (no breach), Blue +1
- Bonus based on
judge_ratingmagnitude for breakthrough severity (e.g., Red +round(rating/3)) - Time/token penalties can reduce points to encourage efficient strategies
- For each category in the judge suite (e.g., CBRNE, SA, SH, RWH, V, M):
- Ratio-based standings:
- Red ratio = Red points / (Red points + Blue points)
- Blue ratio = Blue points / (Red points + Blue points)
- Custom plugin scoring:
- Provide all pairing metrics to a scorer plugin to compute per-pairing or cumulative standings
Custom Evaluators & Scorers
This section specifies how custom evaluation and scoring plugins integrate with challenges.
Concepts
- Evaluator: decides whether a response meets the goal. May optionally emit a rating (0–10) and textual feedback.
- Scorer: converts an attempt’s metrics (e.g., elapsed time, tokens, rating) into a numeric score for leaderboards.
Data model
challenges.evaluator_type: one ofrules,llm-judge, orcustom.challenges.evaluator_plugin_id,evaluator_entrypoint,evaluator_config: identify and configure the evaluator plugin whencustomis selected.challenges.scoring_plugin_id,scoring_entrypoint,scoring_config: identify and configure the scorer plugin whenscoring_strategy = 'custom'.
Service interfaces
Evaluator interface (Python):
class EvaluatorContext(TypedDict, total=False):
tenant_id: str
app_id: str
workflow_id: str
challenge_id: str
end_user_id: str | None
variables: dict[str, Any] # sanitized runtime variables
timeout_ms: int
class EvaluatorResult(TypedDict, total=False):
passed: bool
rating: int # 0–10 (optional)
feedback: str # textual feedback for player (optional)
raw: dict[str, Any] # internal diagnostics (optional)
class EvaluatorProtocol(Protocol):
def evaluate(self, goal: str, response: str, config: dict[str, Any], ctx: EvaluatorContext) -> EvaluatorResult: ...
Scorer interface (Python):
class ScoringContext(TypedDict, total=False):
tenant_id: str
app_id: str
workflow_id: str
challenge_id: str
end_user_id: str | None
timeout_ms: int
class AttemptMetrics(TypedDict, total=False):
succeeded: bool
tokens_total: int | None
elapsed_ms: int | None
rating: int | None
created_at: int | None # epoch ms
class ScoringResult(TypedDict, total=False):
score: float
details: dict[str, Any] | None
class ScorerProtocol(Protocol):
def score(self, metrics: AttemptMetrics, config: dict[str, Any], ctx: ScoringContext) -> ScoringResult: ...
Discovery and loading
- Plugins are discovered via the existing plugin manager. Each plugin exposes one or more entrypoints (e.g.,
pkg.module:Evaluator). evaluator_plugin_id/evaluator_entrypointandscoring_plugin_id/scoring_entrypointidentify the target callables.- Services load plugins lazily and cache handles with safe import guards.
Execution flow
- For
evaluator_type = 'custom', thechallenge-evaluatornode callsevaluate_with_pluginwith(goal, response, evaluator_config, ctx). - If
EvaluatorResult.passedis true, setchallenge_succeeded = Trueand persistjudge_rating/judge_feedbackif provided. - For
scoring_strategy = 'custom', callscore_with_pluginwith attempt metrics to computescore. - Persist
ChallengeAttemptwith plugin-derived fields.
Frontend configuration
- Prompt Challenge panel
- Evaluation mode: Rules | Judging LLM | Custom Evaluator
- When Custom Evaluator is chosen:
- Plugin selector: lists available evaluator plugins by
plugin_idand exposed entrypoints - JSON config editor with schema-based validation (optional
$schemaper plugin)
- Plugin selector: lists available evaluator plugins by
- Scoring section
- Strategy: First | Fastest | Fewest Tokens | Highest Rating | Custom
- When Custom is chosen: plugin selector + JSON config editor
Security & sandboxing
- Plugins run under server control with:
- Timeouts (default 5s) and memory ceilings; cancellation on overrun
- No network access by default (opt-in allowlist if ever needed)
- Sanitized inputs: secrets removed; only whitelisted variables passed
- Structured error mapping; no stack traces leaked to players
Error handling & observability
- If plugin load or execution fails, treat as non-pass and record a generic failure reason.
- Emit structured logs/events with plugin identifiers and durations (no sensitive content).
- Surface minimal feedback to players; detailed diagnostics remain internal.
Examples
Evaluator (substring with banned terms):
class SimpleEvaluator:
def evaluate(self, goal, response, config, ctx):
required = config.get('must_contain', [])
banned = set(map(str.lower, config.get('banned', [])))
if any(w.lower() in response.lower() for w in banned):
return {'passed': False, 'feedback': 'Banned content detected', 'rating': 2}
if all(w.lower() in response.lower() for w in required):
return {'passed': True, 'feedback': 'Meets criteria', 'rating': 8}
return {'passed': False, 'feedback': 'Missing required signal', 'rating': 5}
Scorer (weighted combo):
class WeightedScorer:
def score(self, metrics, config, ctx):
base = 0.0
if metrics.get('succeeded'):
base += config.get('success_bonus', 100)
rating = metrics.get('rating') or 0
elapsed = metrics.get('elapsed_ms') or 0
tokens = metrics.get('tokens_total') or 0
score = base + rating * config.get('rating_weight', 10) \
- (elapsed / 1000.0) * config.get('time_penalty', 1.0) \
- tokens * config.get('token_penalty', 0.01)
return {'score': max(score, 0.0)}
Security & Privacy
- Never expose
secret_refor derived secrets to clients or node outputs. - Redact configured
mask_variablesin logs and stored attempt details. - Apply rate limiting using existing helpers to mitigate brute-force attempts.
- Store minimal details on failed attempts to reduce information leakage.
- Sanitize Markdown instructions to prevent XSS; allow a safe subset (links/images) with rel=noopener.
- Theme application is constrained to a whitelist of CSS variables and asset URLs served via signed URLs.
Testing Plan
- Service unit tests
evaluate_outcomefor regex/contains (edge cases, unicode, multiline)judge_with_llmdeterministic tests with mocked LLM returning structured payloadsrecord_attemptscoring aggregation and sorting
- Node tests
- Given inputs, assert success/failure and resulting outputs
- Judging node: asserts
{ judge_passed, judge_rating, judge_feedback }shape and thresholds - When
challenge_idpresent, attempts are written; when not, none are written
- API tests
- Console CRUD happy paths and permissions
- Web endpoints list/details/leaderboard
- Frontend
- Panel validation, serialization/deserialization of node config
- Judging panel: model selection, rubric template binding, threshold validation
- Node palette presence
- Challenge instructions: Markdown renderer sanitization, link and image handling
- Theming: verify CSS variable injection, dark/light modes, and fallback to defaults
- Collections UI: ordering, visibility filtering, collection leaderboard rendering
Rollout
- DB migrations: create
challenges,challenge_attemptstables; add judging columns. - Backend: models, service, console/web controllers, workflow node,
NodeTypeand node mapping registration. - Frontend: add block enum, node + panel components (Prompt Challenge, Judging LLM), node palette default, i18n entries.
- QA: run
make lint,make type-check, and unit tests;pnpm lintand tests for web. - Documentation: link this design from contributor docs as needed.
Open Questions / Future Work
- Anti-cheat signals and anomaly detection.
- Custom evaluator/scoring plugin hooks with sandboxing.
- Team competitions and seasons.
- Per-challenge rate limits and cooldowns.
Notifications
Events
challenge_first_blood: emitted when the first successful attempt occurs for a challengechallenge_record_beaten: emitted when a leaderboard record is surpassed under the active scoring strategyteam_pairing_completed: emitted after each Red/Blue pairing is judged with per-team points
Delivery channels
- In-app (console): add a section in the console UI for challenge events; poll or use server-sent events
- Email (optional): send via existing email task infra (e.g., Celery tasks)
- Webhook (optional): per-tenant webhook endpoint configured in workspace settings to receive challenge events
Payloads
{
"event": "challenge_record_beaten",
"challenge_id": "...",
"scoring_strategy": "highest_rating",
"previous_record": { "account_id": "...", "score": 95.2 },
"new_record": { "account_id": "...", "score": 96.8 },
"occurred_at": 1730000000000
}
Red/Blue pairing example:
{
"event": "team_pairing_completed",
"red_blue_challenge_id": "...",
"pairing_id": "...",
"attack_submission_id": "...",
"defense_submission_id": "...",
"categories": { "CBRNE": true, "SA": false, "SH": true },
"judge_rating": 8,
"red_points": 4,
"blue_points": 2,
"occurred_at": 1730000000001
}
Triggers in services
- After
record_attempt, re-evaluate leaderboard head for the challenge against the prior head - If the head changed and meets trigger criteria, enqueue notification tasks
- Respect player profile preferences (
notify_on_first_blood,notify_on_record_beaten)
Player-facing feedback
- Immediate feedback comes from node outputs (e.g.,
judge_feedback,judge_rating) - Aggregated notifications (record beaten, first blood) are async and opt-in per player preferences