dify/docs/design/prompt-challenges.md at main

jyapayne/dify

Fork 0

Joey Yakimowich-Payne 8fd3c4bb64

feat: add challenge and red-blue competitions across API and web

2025-10-02 20:11:57 -06:00

26 KiB

Raw Permalink Blame History

Prompt Hacking Challenges — Design

Overview

Enable developer-authored prompt hacking challenges inside Dify’s workflow builder via a new workflow node. Players can register/login using the existing web auth and compete on challenges. Attempts are recorded server-side and leaderboards are exposed via public web APIs.

Goals

Add a first-class workflow node that evaluates success/failure against developer-specified criteria.
Add a Judging LLM node that compares model outputs to the challenge goal and produces pass/fail, textual feedback, and a 1–10 rating.
Persist attempts with metadata for scoring and leaderboards.
Reuse existing account/web auth for players.
Fit Dify’s DDD/Clean Architecture: models, services, controllers, workflow nodes, and frontend builder integration.

Non-Goals

Anti-cheat measures beyond simple rate limiting.
Complex custom scoring plugins (design leaves a hook for future work).

Architecture Summary

Backend components

Models (SQLAlchemy)
- Challenge
  - id, tenant_id, app_id, workflow_id
  - name, description, goal (plain text shown to players)
  - success_type: one of ['regex', 'contains', 'custom']
  - success_pattern: string (regex or substring depending on type)
  - secret_ref: reference to server-side secret (never exposed to clients)
  - scoring_strategy: one of ['first', 'fastest', 'fewest_tokens', 'custom']
  - is_active: bool
  - created_by, created_at, updated_by, updated_at
- ChallengeAttempt
  - id, tenant_id, challenge_id (FK), end_user_id (FK), workflow_run_id (optional FK)
  - succeeded: bool
  - score: numeric (meaning depends on strategy)
  - judge_rating: int (0–10)
  - judge_feedback: text
  - judge_output_raw: jsonb (optional; structured judgement payload)
  - tokens_total: int (when available from run metrics)
  - elapsed_ms: int (when available)
  - created_at
Service layer (e.g., ChallengeService, ChallengeJudgeService)
- evaluate_outcome(output, cfg) -> (succeeded: bool, details: dict)
- judge_with_llm(goal, response, cfg) -> { passed: bool, rating: int, feedback: str, raw?: dict }
- evaluate_with_plugin(evaluator_ref, goal, response, ctx) -> { passed: bool, rating?: int, feedback?: str, raw?: dict }
- score_with_plugin(scorer_ref, attempt_metrics, ctx) -> { score: number, details?: dict }
- record_attempt(tenant_id, challenge_id, end_user_id, run_meta, succeeded) -> ChallengeAttempt
- get_leaderboard(challenge_id, limit, strategy) -> list
- get_challenge_public(challenge_id) -> dict
Controllers
- Console (for creators): CRUD on challenges under the workspace (/console/api/challenges)
- Web (for players): public endpoints under /web/api/challenges
  - List active challenges, fetch details, fetch leaderboard
  - Optional auth via existing web login for personalization, otherwise anonymous read
Workflow nodes
- NodeType: challenge-evaluator
  - Config
    - challenge_id: reference to a stored Challenge (preferred)
    - or inline config: success_type, success_pattern, scoring_strategy
    - mask_variables: string[] — variable names to redact in logs
  - Execution
    - Consumes upstream content (typically latest assistant output)
    - Evaluates success with ChallengeService.evaluate_outcome
    - If an EndUser context exists and a challenge_id is present, writes ChallengeAttempt
    - Outputs { challenge_succeeded: boolean, message?: string }, optionally passes through original output
- NodeType: judging-llm
  - Purpose: judge a model response against the challenge goal using an LLM rubric.
  - Config
    - judge_model: provider/name/version
    - temperature, max_tokens, other model params
    - rubric_prompt_template: template with placeholders for {goal}, {response}, optional {hints}
    - rating_scale: default 0–10; configurable upper bound optional
    - pass_threshold: integer (default 5)
  - Inputs
    - goal: the attacking goal or acceptance criteria
    - response: the model output to evaluate
  - Execution
    - Calls ChallengeJudgeService.judge_with_llm() to obtain structured judgement
    - Returns outputs { judge_passed: boolean, judge_rating: number (0–10), judge_feedback: string, judge_raw?: object }
  - Integration
    - Downstream challenge-evaluator can consume judge_passed and judge_rating to record an attempt instead of regex/contains
    - Alternatively, challenge-evaluator may support an evaluation_mode: 'rules' | 'llm-judge' to invoke judging internally
- NodeType: team-challenge (Red/Blue orchestrator)
  - Purpose: orchestrate two-sided challenges where players choose Red (attack) or Blue (defense) and submit prompts accordingly. The node pairs attacks and defenses, configures the LLM, invokes judging, and emits scores.
  - Config
    - red_blue_challenge_id: reference to a Red/Blue challenge definition
    - defense_selection_policy: 'latest_best' | 'random_active' | 'round_robin' | 'request_new_if_none'
    - attack_selection_policy: same options for the defense side evaluation path
    - judge_suite: list of category tests to run (e.g., CBRNE, SA, SH, RWH, V, M)
    - scoring_strategy: 'red_blue_ratio' | 'custom'
  - Inputs
    - team_choice: 'red' | 'blue'
    - attack_prompt?: string (when team_choice = 'red')
    - defense_prompt?: string (when team_choice = 'blue')
  - Execution (high level)
    - If team_choice = 'red':
      - Persist attack_prompt submission
      - Load a defense by policy; if none, optionally signal Blue to provide one (async) and fall back to last known
      - Configure LLM with defense as system prompt, submit attack as user message
      - Run judge_suite via judging-llm; compute Red score
    - If team_choice = 'blue':
      - Persist defense_prompt submission
      - Load an attack by policy; if none, signal Red to provide one (async) and fall back to last known
      - Configure LLM with defense as system prompt and submit the loaded attack
      - Run judge_suite; compute Blue score (prevention)
    - Persist pairing and metrics
  - Outputs
    - { team: 'red'|'blue', judge_passed: boolean, judge_rating: number, judge_feedback: string, categories: Record<string, boolean|number>, team_points: number, total_points: number }

Frontend components

Workflow builder
- Add Prompt Challenge to the node palette
- Add Judging LLM to the node palette
- Node editor panel: select existing Challenge or define inline success criteria
- Judging panel: choose model, edit rubric prompt, set pass threshold, preview structured outputs
- Custom evaluator/scorer panels: choose plugin and configure JSON settings with live schema validation
- I18n strings in web/i18n/en-US/
- Challenge display & theming
  - Author-provided instructions (Markdown) render before/alongside the task input area
  - Theme tokens (colors, logo, background) applied to challenge pages
  - Optional hero image/video via existing UploadFile and signed URLs
Optional player UX (phase 2)
- /challenges list and /challenges/[id] details with leaderboard
- /challenge-collections list and /challenge-collections/[id] details with collection leaderboard
- Use existing web login endpoints

Data Model

Minimal table shapes (final columns managed in migration):

-- challenges
id (uuid pk)
tenant_id (uuid fk)
app_id (uuid fk)
workflow_id (uuid fk)
name (text)
description (text)
goal (text)
success_type (text)
success_pattern (text)
secret_ref (text)
scoring_strategy (text)
is_active (bool)
created_by (uuid)
created_at (timestamp)
updated_by (uuid)
updated_at (timestamp)

-- challenge_attempts
id (uuid pk)
tenant_id (uuid fk)
challenge_id (uuid fk)
end_user_id (uuid fk)
workflow_run_id (uuid fk, nullable)
succeeded (bool)
score (numeric)
tokens_total (int)
elapsed_ms (int)
created_at (timestamp)

Additional columns for judging:

ALTER TABLE challenge_attempts
  ADD COLUMN judge_rating integer,
  ADD COLUMN judge_feedback text,
  ADD COLUMN judge_output_raw jsonb;

Optional columns for custom evaluators/scorers:

ALTER TABLE challenges
  ADD COLUMN evaluator_type text DEFAULT 'rules', -- one of: rules, llm-judge, custom
  ADD COLUMN evaluator_plugin_id text,
  ADD COLUMN evaluator_entrypoint text, -- e.g., "pkg.module:Evaluator"
  ADD COLUMN evaluator_config jsonb,
  ADD COLUMN scoring_plugin_id text,
  ADD COLUMN scoring_entrypoint text, -- e.g., "pkg.module:Scorer"
  ADD COLUMN scoring_config jsonb;

Additional tables for Red/Blue team challenges:

-- red_blue_challenges (definition)
CREATE TABLE red_blue_challenges (
  id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
  tenant_id uuid NOT NULL REFERENCES tenants(id),
  app_id uuid NOT NULL REFERENCES apps(id),
  workflow_id uuid REFERENCES workflows(id),
  name text NOT NULL,
  description text,
  judge_suite jsonb NOT NULL, -- list of categories/tests
  defense_selection_policy text NOT NULL DEFAULT 'latest_best',
  attack_selection_policy text NOT NULL DEFAULT 'latest_best',
  scoring_strategy text NOT NULL DEFAULT 'red_blue_ratio',
  theme jsonb,
  instructions_md text,
  is_active boolean NOT NULL DEFAULT true,
  created_by uuid,
  created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_by uuid,
  updated_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- team_submissions (attack/defense prompts)
CREATE TABLE team_submissions (
  id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
  red_blue_challenge_id uuid NOT NULL REFERENCES red_blue_challenges(id) ON DELETE CASCADE,
  tenant_id uuid NOT NULL,
  account_id uuid NULL,
  end_user_id uuid NULL,
  team text NOT NULL CHECK (team in ('red','blue')),
  prompt text NOT NULL,
  active boolean NOT NULL DEFAULT true,
  created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- pairings (which attack tested against which defense)
CREATE TABLE team_pairings (
  id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
  red_blue_challenge_id uuid NOT NULL REFERENCES red_blue_challenges(id) ON DELETE CASCADE,
  tenant_id uuid NOT NULL,
  attack_submission_id uuid REFERENCES team_submissions(id),
  defense_submission_id uuid REFERENCES team_submissions(id),
  judge_output_raw jsonb,
  categories jsonb, -- e.g., per-suite pass/fail or rating
  judge_rating integer,
  judge_feedback text,
  red_points numeric NOT NULL DEFAULT 0,
  blue_points numeric NOT NULL DEFAULT 0,
  tokens_total int,
  elapsed_ms int,
  created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
);

API Design

Console (creator)

GET /console/api/challenges?app_id=... — list
POST /console/api/challenges — create
GET /console/api/challenges/{id} — retrieve
PATCH /console/api/challenges/{id} — update
DELETE /console/api/challenges/{id} — delete

All require console login_required and membership in tenant.

Web (player)

GET /web/api/challenges — list active challenges (public)
GET /web/api/challenges/{id} — details (public)
GET /web/api/challenges/{id}/leaderboard?limit=... — leaderboard (public)

Player login uses existing web login endpoints to obtain access token when needed for personalization.

Collections

Console (creator)

GET /console/api/challenge-collections?app_id=...
POST /console/api/challenge-collections
GET /console/api/challenge-collections/{id}
PATCH /console/api/challenge-collections/{id}
DELETE /console/api/challenge-collections/{id}
PUT /console/api/challenge-collections/{id}/challenges (set membership and order)

Web (player)

GET /web/api/challenge-collections — list public collections
GET /web/api/challenge-collections/{id} — collection details (instructions/theme), included challenges
GET /web/api/challenge-collections/{id}/leaderboard?limit=... — collection leaderboard

Red/Blue team challenge APIs

Console (creator)

POST /console/api/red-blue-challenges — create
GET /console/api/red-blue-challenges?app_id=... — list
GET /console/api/red-blue-challenges/{id} — detail
PATCH /console/api/red-blue-challenges/{id} — update
DELETE /console/api/red-blue-challenges/{id} — delete
GET /console/api/red-blue-challenges/{id}/pairings — view pairings/metrics

Web (player)

POST /web/api/red-blue-challenges/{id}/join — join red or blue (payload: { team })
POST /web/api/red-blue-challenges/{id}/submit — submit attack/defense (payload: { team, prompt })
GET /web/api/red-blue-challenges/{id} — public info (instructions, theme, leaderboard snapshot)
GET /web/api/red-blue-challenges/{id}/leaderboard?limit=... — red vs blue standings

Player Registration & Identity

Reuse existing web auth service for player accounts:
- Email/password login: POST /web/api/login
- Email code login: POST /web/api/login/email-code/send + POST /web/api/login/email-code/verify (existing patterns)
Add an explicit web registration endpoint (thin wrapper around RegisterService.register):
- POST /web/api/register (payload: email, name, password | email-code)
- Behavior:
  - create_workspace_required = False to avoid auto-creating workspaces for players
  - status = active
  - Set interface_language from Accept-Language as done in OAuth flow
- On success, also create or associate a per-tenant EndUser record so gameplay runs can be attributed consistently.

Player identity during runs

Each gameplay run already has an EndUser context. For registered players:
- When a player is authenticated, resolve (or lazily create) an EndUser tied to their account_id for the current tenant/app
- Persist end_user_id to ChallengeAttempt as today; optionally also store account_id for simplified leaderboard personalization

Optional schema addition

ALTER TABLE challenge_attempts
  ADD COLUMN account_id uuid NULL;

This enables direct joins to accounts for notification and profile display without traversing end-user mappings.

Player profile (optional)

Introduce a lightweight player_profiles table for nickname/avatar/notification preferences without touching account directly:

CREATE TABLE player_profiles (
  account_id uuid PRIMARY KEY REFERENCES accounts(id) ON DELETE CASCADE,
  display_name text,
  avatar_url text,
  notify_on_first_blood boolean DEFAULT true,
  notify_on_record_beaten boolean DEFAULT true,
  created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
);

Workflow Node Execution

Node receives upstream output (string or structured content). A typical placement is after an LLM node.
Node loads Challenge config (stored by challenge_id or inline).
Node evaluates success by either rules or judging:
- regex: test pattern against text output
- contains: case-insensitive substring match
- llm-judge: call the judging-llm node (or internal judge) to obtain { judge_passed, judge_rating, judge_feedback }
- custom: call evaluate_with_plugin using configured evaluator_plugin_id/evaluator_entrypoint
If EndUser and challenge_id present, record a ChallengeAttempt with run metrics (tokens, elapsed), and when available, judge_rating/judge_feedback.
Node outputs
- Rules mode: { challenge_succeeded: boolean, message?: string }
- Judging mode: { challenge_succeeded: boolean, judge_rating: number, judge_feedback: string }
- Pass through original output for chaining when needed.

For collections, attempts are recorded per challenge as usual. Collection leaderboard aggregation is computed over a player’s best attempt per challenge, combined using the collection’s scoring_strategy (e.g., sum of scores, total elapsed_ms, etc.).

Scoring Strategies

first: first successful attempt time wins (leaderboard sorted by earliest created_at).
fastest: success with lowest elapsed_ms wins.
fewest_tokens: success with lowest tokens_total wins.
highest_rating: success with the highest judge_rating wins; ties broken by earliest created_at.
custom: compute via score_with_plugin using scoring_plugin_id/scoring_entrypoint.

Collection strategies:

sum: sum of per-challenge scores in the collection (uses built-in or custom scoring per challenge)
fastest_total: sum of elapsed_ms of successful best attempts (lower is better)
fewest_tokens_total: sum of tokens_total of successful best attempts (lower is better)
highest_avg_rating: average of judge_rating across completed challenges (higher is better)
custom: plugin-defined; service calls score_with_plugin at collection level with a list of per-challenge metrics

Red/Blue team scoring:

Base idea: award Red points for breakthroughs and Blue points for prevented attacks.
Suggested defaults per pairing:
- For each category in the judge suite (e.g., CBRNE, SA, SH, RWH, V, M):
  - If the attack bypasses defense (category breach), Red +1
  - If defense prevents (no breach), Blue +1
- Bonus based on judge_rating magnitude for breakthrough severity (e.g., Red +round(rating/3))
- Time/token penalties can reduce points to encourage efficient strategies
Ratio-based standings:
- Red ratio = Red points / (Red points + Blue points)
- Blue ratio = Blue points / (Red points + Blue points)
Custom plugin scoring:
- Provide all pairing metrics to a scorer plugin to compute per-pairing or cumulative standings

Custom Evaluators & Scorers

This section specifies how custom evaluation and scoring plugins integrate with challenges.

Concepts

Evaluator: decides whether a response meets the goal. May optionally emit a rating (0–10) and textual feedback.
Scorer: converts an attempt’s metrics (e.g., elapsed time, tokens, rating) into a numeric score for leaderboards.

Data model

challenges.evaluator_type: one of rules, llm-judge, or custom.
challenges.evaluator_plugin_id, evaluator_entrypoint, evaluator_config: identify and configure the evaluator plugin when custom is selected.
challenges.scoring_plugin_id, scoring_entrypoint, scoring_config: identify and configure the scorer plugin when scoring_strategy = 'custom'.

Service interfaces

Evaluator interface (Python):

class EvaluatorContext(TypedDict, total=False):
    tenant_id: str
    app_id: str
    workflow_id: str
    challenge_id: str
    end_user_id: str | None
    variables: dict[str, Any]  # sanitized runtime variables
    timeout_ms: int

class EvaluatorResult(TypedDict, total=False):
    passed: bool
    rating: int  # 0–10 (optional)
    feedback: str  # textual feedback for player (optional)
    raw: dict[str, Any]  # internal diagnostics (optional)

class EvaluatorProtocol(Protocol):
    def evaluate(self, goal: str, response: str, config: dict[str, Any], ctx: EvaluatorContext) -> EvaluatorResult: ...

Scorer interface (Python):

class ScoringContext(TypedDict, total=False):
    tenant_id: str
    app_id: str
    workflow_id: str
    challenge_id: str
    end_user_id: str | None
    timeout_ms: int

class AttemptMetrics(TypedDict, total=False):
    succeeded: bool
    tokens_total: int | None
    elapsed_ms: int | None
    rating: int | None
    created_at: int | None  # epoch ms

class ScoringResult(TypedDict, total=False):
    score: float
    details: dict[str, Any] | None

class ScorerProtocol(Protocol):
    def score(self, metrics: AttemptMetrics, config: dict[str, Any], ctx: ScoringContext) -> ScoringResult: ...

Discovery and loading

Plugins are discovered via the existing plugin manager. Each plugin exposes one or more entrypoints (e.g., pkg.module:Evaluator).
evaluator_plugin_id/evaluator_entrypoint and scoring_plugin_id/scoring_entrypoint identify the target callables.
Services load plugins lazily and cache handles with safe import guards.

Execution flow

For evaluator_type = 'custom', the challenge-evaluator node calls evaluate_with_plugin with (goal, response, evaluator_config, ctx).
If EvaluatorResult.passed is true, set challenge_succeeded = True and persist judge_rating/judge_feedback if provided.
For scoring_strategy = 'custom', call score_with_plugin with attempt metrics to compute score.
Persist ChallengeAttempt with plugin-derived fields.

Frontend configuration

Prompt Challenge panel
- Evaluation mode: Rules | Judging LLM | Custom Evaluator
- When Custom Evaluator is chosen:
  - Plugin selector: lists available evaluator plugins by plugin_id and exposed entrypoints
  - JSON config editor with schema-based validation (optional $schema per plugin)
Scoring section
- Strategy: First | Fastest | Fewest Tokens | Highest Rating | Custom
- When Custom is chosen: plugin selector + JSON config editor

Security & sandboxing

Plugins run under server control with:
- Timeouts (default 5s) and memory ceilings; cancellation on overrun
- No network access by default (opt-in allowlist if ever needed)
- Sanitized inputs: secrets removed; only whitelisted variables passed
- Structured error mapping; no stack traces leaked to players

Error handling & observability

If plugin load or execution fails, treat as non-pass and record a generic failure reason.
Emit structured logs/events with plugin identifiers and durations (no sensitive content).
Surface minimal feedback to players; detailed diagnostics remain internal.

Examples

Evaluator (substring with banned terms):

class SimpleEvaluator:
    def evaluate(self, goal, response, config, ctx):
        required = config.get('must_contain', [])
        banned = set(map(str.lower, config.get('banned', [])))
        if any(w.lower() in response.lower() for w in banned):
            return {'passed': False, 'feedback': 'Banned content detected', 'rating': 2}
        if all(w.lower() in response.lower() for w in required):
            return {'passed': True, 'feedback': 'Meets criteria', 'rating': 8}
        return {'passed': False, 'feedback': 'Missing required signal', 'rating': 5}

Scorer (weighted combo):

class WeightedScorer:
    def score(self, metrics, config, ctx):
        base = 0.0
        if metrics.get('succeeded'):
            base += config.get('success_bonus', 100)
        rating = metrics.get('rating') or 0
        elapsed = metrics.get('elapsed_ms') or 0
        tokens = metrics.get('tokens_total') or 0
        score = base + rating * config.get('rating_weight', 10) \
                - (elapsed / 1000.0) * config.get('time_penalty', 1.0) \
                - tokens * config.get('token_penalty', 0.01)
        return {'score': max(score, 0.0)}

Security & Privacy

Never expose secret_ref or derived secrets to clients or node outputs.
Redact configured mask_variables in logs and stored attempt details.
Apply rate limiting using existing helpers to mitigate brute-force attempts.
Store minimal details on failed attempts to reduce information leakage.
- Sanitize Markdown instructions to prevent XSS; allow a safe subset (links/images) with rel=noopener.
- Theme application is constrained to a whitelist of CSS variables and asset URLs served via signed URLs.

Testing Plan

Service unit tests
- evaluate_outcome for regex/contains (edge cases, unicode, multiline)
- judge_with_llm deterministic tests with mocked LLM returning structured payloads
- record_attempt scoring aggregation and sorting
Node tests
- Given inputs, assert success/failure and resulting outputs
- Judging node: asserts { judge_passed, judge_rating, judge_feedback } shape and thresholds
- When challenge_id present, attempts are written; when not, none are written
API tests
- Console CRUD happy paths and permissions
- Web endpoints list/details/leaderboard
Frontend
- Panel validation, serialization/deserialization of node config
- Judging panel: model selection, rubric template binding, threshold validation
- Node palette presence
- Challenge instructions: Markdown renderer sanitization, link and image handling
- Theming: verify CSS variable injection, dark/light modes, and fallback to defaults
- Collections UI: ordering, visibility filtering, collection leaderboard rendering

Rollout

DB migrations: create challenges, challenge_attempts tables; add judging columns.
Backend: models, service, console/web controllers, workflow node, NodeType and node mapping registration.
Frontend: add block enum, node + panel components (Prompt Challenge, Judging LLM), node palette default, i18n entries.
QA: run make lint, make type-check, and unit tests; pnpm lint and tests for web.
Documentation: link this design from contributor docs as needed.

Open Questions / Future Work

Anti-cheat signals and anomaly detection.
Custom evaluator/scoring plugin hooks with sandboxing.
Team competitions and seasons.
Per-challenge rate limits and cooldowns.

Notifications

Events

challenge_first_blood: emitted when the first successful attempt occurs for a challenge
challenge_record_beaten: emitted when a leaderboard record is surpassed under the active scoring strategy
team_pairing_completed: emitted after each Red/Blue pairing is judged with per-team points

Delivery channels

In-app (console): add a section in the console UI for challenge events; poll or use server-sent events
Email (optional): send via existing email task infra (e.g., Celery tasks)
Webhook (optional): per-tenant webhook endpoint configured in workspace settings to receive challenge events

Payloads

{
  "event": "challenge_record_beaten",
  "challenge_id": "...",
  "scoring_strategy": "highest_rating",
  "previous_record": { "account_id": "...", "score": 95.2 },
  "new_record": { "account_id": "...", "score": 96.8 },
  "occurred_at": 1730000000000
}

Red/Blue pairing example:

{
  "event": "team_pairing_completed",
  "red_blue_challenge_id": "...",
  "pairing_id": "...",
  "attack_submission_id": "...",
  "defense_submission_id": "...",
  "categories": { "CBRNE": true, "SA": false, "SH": true },
  "judge_rating": 8,
  "red_points": 4,
  "blue_points": 2,
  "occurred_at": 1730000000001
}

Triggers in services

After record_attempt, re-evaluate leaderboard head for the challenge against the prior head
If the head changed and meets trigger criteria, enqueue notification tasks
Respect player profile preferences (notify_on_first_blood, notify_on_record_beaten)

Player-facing feedback

Immediate feedback comes from node outputs (e.g., judge_feedback, judge_rating)
Aggregated notifications (record beaten, first blood) are async and opt-in per player preferences

26 KiB Raw Permalink Blame History Unescape Escape

Prompt Hacking Challenges — Design

Overview

Goals

Non-Goals

Architecture Summary

Backend components

Frontend components

Data Model

API Design

Console (creator)

Web (player)

Collections

Red/Blue team challenge APIs

Player Registration & Identity

Registration and login

Player identity during runs

Optional schema addition

Player profile (optional)

Workflow Node Execution

Scoring Strategies

Custom Evaluators & Scorers

Concepts

Data model

Service interfaces

Discovery and loading

Execution flow

Frontend configuration

Security & sandboxing

Error handling & observability

Examples

Security & Privacy

Testing Plan

Rollout

Open Questions / Future Work

Notifications

Events

Delivery channels

Payloads

Triggers in services

Player-facing feedback

26 KiB

Raw Permalink Blame History