Introduction

OVERCLOCKED is a live, arcade-style race in which AI agents compete to clear a real enterprise backlog. By default two model lanes race: Cerebras and a GPU-hosted challenger (an optional third Gemini lane can be toggled on). Every lane processes the same warehouse sortation tasks (label parsing, damage assessment, customs classification, hazmat screening). The scoreboard encodes throughput × accuracy, so speed is the game, as long as the answers hold up.
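To make "throughput × accuracy" concrete, here is a minimal sketch of one way such a score could be computed. The `LaneTally` shape, the `laneScore` name, and the parcels-per-minute unit are assumptions for illustration; the actual scoring rules are covered in Core Concepts and may differ.

```typescript
// Hypothetical tally for one lane; field names are illustrative, not the repo's.
interface LaneTally {
  completed: number; // parcels the lane finished
  correct: number;   // parcels the grader accepted
  elapsedMs: number; // wall-clock time spent racing
}

// Assumed reading of "throughput × accuracy":
// (parcels per minute) × (fraction graded correct).
function laneScore(t: LaneTally): number {
  if (t.completed === 0 || t.elapsedMs <= 0) return 0;
  const throughput = t.completed / (t.elapsedMs / 60_000);
  const accuracy = t.correct / t.completed;
  return throughput * accuracy;
}

// A fast but sloppy lane versus a slower, perfect lane over one minute.
const fast = laneScore({ completed: 120, correct: 90, elapsedMs: 60_000 });
const slow = laneScore({ completed: 60, correct: 60, elapsedMs: 60_000 });
console.log(fast, slow);
```

Under this reading the product reduces to correct parcels per minute, so a lane gains nothing by answering quickly and wrongly.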

This is the developer documentation. It explains how the app works end to end — the engine, the multi-agent pipeline, the task/grader system, the Cloudflare Worker backend, and the React stage — so you can extend it confidently.

Who this is for

A developer who has the repo open and wants to understand or change it. These docs assume familiarity with TypeScript, React, and the general idea of an LLM app.

What you'll find here

  • Core Concepts — how a race is scored, what a task is, the multi-agent coordination graph, and the fairness contract.
  • Architecture — the four layers (engine → orchestrator → agents → stage) and how data flows between them.
  • Extending — step-by-step guides for adding a task, authoring scenario data, and wiring a new model provider.
  • Operations — running locally, the test suite, and the security model.

The 30-second mental model

STAGE (React)               ← renders the race; subscribes to the store
   ▲ subscribes
ARENA STORE (Zustand)       ← holds all match state
   ▲ drives
ENGINE (framework-free)     ← tick loop: arrival pump → pipeline → grade → score
   │
   ├── TASK SYSTEM          ← 18 task types, each with a Zod schema + grader
   ├── ORCHESTRATOR         ← router → worker → checker → escalation per parcel
   └── AGENT CLIENTS        ← Cerebras | GPU | Gemini | Human | Mock
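The layering in the diagram can be sketched in a few lines of framework-free TypeScript. Everything below is hypothetical: `createStore` stands in for the Zustand arena store, `tick` for one engine step, and the synchronous `pipeline` callback for the agent graph (which in a live race would be asynchronous model calls). It illustrates the data flow only, not the repo's code.

```typescript
// All names are hypothetical; they mirror the diagram, not the repo.
type Parcel = { id: number; expected: string };
type MatchState = { tick: number; processed: number; correct: number };
type Listener = (s: MatchState) => void;

// Stand-in for the arena store: holds match state and notifies subscribers.
function createStore(initial: MatchState) {
  let state = initial;
  const listeners = new Set<Listener>();
  return {
    get: () => state,
    set: (next: MatchState) => {
      state = next;
      listeners.forEach((l) => l(state));
    },
    subscribe: (l: Listener) => {
      listeners.add(l);
      return () => {
        listeners.delete(l);
      };
    },
  };
}

// One engine tick: arrival pump -> pipeline -> grade -> score.
function tick(
  store: ReturnType<typeof createStore>,
  arrivals: Parcel[],
  pipeline: (p: Parcel) => string,
): void {
  const s = store.get();
  let { processed, correct } = s;
  for (const parcel of arrivals) {
    const answer = pipeline(parcel);                   // pipeline
    processed += 1;
    if (answer === parcel.expected) correct += 1;      // grade
  }
  store.set({ tick: s.tick + 1, processed, correct }); // score, then notify
}

// The "stage" is just a subscriber that records each state it is shown.
const store = createStore({ tick: 0, processed: 0, correct: 0 });
const frames: MatchState[] = [];
store.subscribe((s) => frames.push(s));
tick(store, [{ id: 1, expected: "A" }, { id: 2, expected: "B" }], () => "A");
```

The point of the split is that the engine never imports React: it only writes to the store, and anything that wants to render subscribes.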

In live mode, every parcel runs through a coordinated agent graph (router → worker → checker → escalation). Mock lanes deterministically collapse to a single worker pass, identically across lanes. Either way, all lanes get the identical scenario sequence and are graded identically, so the only variable is the silicon. That's the fairness contract the whole demo rests on.
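One common way to give every lane the identical scenario sequence is to derive it from a shared seed. The sketch below assumes that approach and uses the small public-domain mulberry32 generator; the function names, the seed value, and the idea of indexing scenarios by number are illustrative assumptions, not a description of how the repo sequences parcels.

```typescript
// Hypothetical sketch; the repo's actual sequencing code may differ.

// mulberry32: a tiny deterministic PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Build a sequence of scenario indices that depends only on the seed.
function scenarioSequence(
  seed: number,
  scenarioCount: number,
  length: number,
): number[] {
  const rand = mulberry32(seed);
  return Array.from({ length }, () => Math.floor(rand() * scenarioCount));
}

// Each lane builds its own sequence from the same match seed.
const lanes = ["cerebras", "gpu"] as const;
const perLane = lanes.map(() => scenarioSequence(42, 111, 20));
const identical = JSON.stringify(perLane[0]) === JSON.stringify(perLane[1]);
console.log(identical);
```

Because the sequence is a pure function of the seed, no lane can be handed an easier or harder backlog by accident, and a match can be replayed from its seed.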

Quick numbers

Task types       18 (vision / document / text / video)
Scenarios        111, in data/scenarios/*.json
Agent roles      5 (router, worker, checker, escalation, orchestrator)
Model lanes      2 by default (Cerebras vs a GPU challenger: OpenRouter or NVIDIA NIM) + optional Gemini + optional human
Run modes        6 (15s / 30s / 1m / 5m / endless / sudden-death)
Unit tests       179 across 19 files (Vitest, all offline)
E2E tests        10 across 2 files (Playwright, the lobby→race→banner journey)
Accessibility    WCAG 2.1 AA audited (contrast, focus-visible, ARIA, reduced-motion)

Where to start reading

If you want to understand why it's built this way, start with Core Concepts. If you want to change it, jump to Architecture and then the relevant Extending guide.