goblintown

Multi-agent orchestration on top of OpenAI. Six specialized agents scavenge context, race against each other, attack each other's outputs, and hand back a signed, content-addressed answer.


# background

In April 2026, OpenAI published "Where the goblins came from", explaining how a reward signal trained for a "Nerdy" personality leaked across all of GPT-5.5's outputs and produced a noticeable surge in creature metaphors. Codex shipped with a hardcoded ban list: goblins, gremlins, raccoons, trolls, ogres, pigeons.

This project takes that ban list as a roster.

# roster

| Creature | Job |
| --- | --- |
| Goblin | Worker. Cheap, high-temperature, dispatched in packs. |
| Gremlin | Adversarial. Tries to break a candidate output. |
| Raccoon | Scavenger. Returns only the facts a task actually needs. |
| Troll | Reviewer. Default-rejects. Returns a JSON verdict. |
| Ogre | Heavyweight. Deep reasoning, called when the pack fails. |
| Pigeon | Carrier. Compresses and routes artifacts between Warrens. |

A unit test pins the roster to the OpenAI ban list, so it can't drift quietly.

   ▄█▄        ▄█▄
   ███        ███
    ▀████████████▀
     █  ▀▄  ▄▀  █
     █   ●  ●   █
     █    ▾▾    █
     █▄▄▄▄▄▄▄▄▄▄█
      █▌ █  █ ▐█
      ▀▀ ▀  ▀ ▀▀
Goblin
   ▀▄ ▄▀ ▀▄ ▄▀
     ▀█▄▄█▄▄█▀
      █████████
      █ ◉   ◉ █
      █   ╳   █
      █ ╲╱╲╱╲ █
       ▀█████▀
         █ █
        ▀▀ ▀▀
Gremlin
    ▄█▄          ▄█▄
    ███          ███
     ▀████████████▀
     █▌ ●▔     ▔● ▐█
     █      ▾      █
     █▄▄▄▄▄▄▄▄▄▄▄▄█
     █▌█        █▐█
     ▀▀▀        ▀▀▀
Raccoon
       ▄ ▄    ▄ ▄
       █ █    █ █
     ▄████████████▄
     █  ●        ●  █
     █     ▾▾▾▾    █
     █  ──────────  █
     ████████████████
    █▌                ▐█
    █▌                ▐█
    ████          ████
Troll
        ▄▄▄▄▄▄▄▄▄▄
       ████████████
      ██  ▀▀    ▀▀  ██
      █     ●    ●    █
      █        ▽       █
      █▄  ▼▼▼▼▼▼▼▼  ▄█
       ████████████
      ██████████████
      ██          ██
      ██          ██
Ogre
       ▄██▄
      ██  ●█
      █▌    █▶▶▶
      ██████████
      █▀▀▀▀▀▀▀▀█
       ████████
          █ █
          █ █
         ▀▀ ▀▀
Pigeon

# the rite

The full pipeline. Every step writes a Loot drop to the Hoard with parent links to its inputs, so a Rite is fully reconstructible from the Hoard alone.

  ┌──────────┐   facts   ┌────────────┐  N parallel  ┌──────────┐
  │ Raccoon  │──────────▶│  Goblin    │═════════════▶│ Goblins  │
  │ (optional│           │  pack      │              │  output  │
  │  scan)   │           └────────────┘              └────┬─────┘
  └──────────┘                                            │
                                                          ▼
                                                  ┌─────────────┐
                                                  │   Gremlin   │
                                                  │ chaos pass  │
                                                  └──────┬──────┘
                                                         ▼
                                                  ┌─────────────┐
                                                  │    Troll    │
                                                  │   review    │
                                                  └──────┬──────┘
                                                         │
                                              any pass ──┴── all fail
                                                  │             │
                                                  ▼             ▼
                                            ┌────────┐    ┌──────────┐
                                            │ winner │    │   Ogre   │
                                            │  loot  │    │ fallback │
                                            └────────┘    └──────────┘
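
The control flow in the diagram can be sketched as below. This is an illustration only, not goblintown's real API: `runRite` and the `agents` object are hypothetical names, each agent is injected as a plain async function, and it is an assumption that the Troll receives the Gremlin's attack alongside the candidate.

```javascript
// Hypothetical sketch of the Rite control flow; not goblintown's real API.
// Each agent is injected as an async function so the flow runs without a model.
async function runRite(task, agents, packSize = 3) {
  // Optional Raccoon scan: gather only the facts the task needs.
  const facts = agents.raccoon ? await agents.raccoon(task) : [];

  // N Goblins work the same task in parallel.
  const candidates = await Promise.all(
    Array.from({ length: packSize }, (_, i) => agents.goblin(task, facts, i))
  );

  // The Gremlin attacks each candidate, then the Troll reviews it.
  // Assumption: the Troll is shown the attack as well as the output.
  const reviewed = await Promise.all(
    candidates.map(async (output) => {
      const attack = await agents.gremlin(output);
      const verdict = await agents.troll(output, attack);
      return { output, attack, verdict };
    })
  );

  // Any pass: the best-scoring passing candidate becomes the winner.
  const passing = reviewed.filter((r) => r.verdict.passed);
  if (passing.length > 0) {
    const winner = passing.reduce((a, b) =>
      b.verdict.score > a.verdict.score ? b : a
    );
    return { source: "pack", output: winner.output };
  }

  // All fail: escalate to the Ogre with every failed attempt as context.
  return { source: "ogre", output: await agents.ogre(task, reviewed) };
}
```

Injecting the agents keeps the branching logic testable with stubs, in the spirit of the "no OpenAI calls during testing" rule.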

# concepts

Loot

One agent invocation, content-addressed by sha256(model || prompt || output).
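
A minimal sketch of how such a content address might be computed with Node's built-in `crypto` module. The helper name `lootId` is hypothetical, and the length prefix on each field is an assumption added here to keep field boundaries unambiguous; the source only specifies `||`.

```javascript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a Loot id from the three hashed fields.
// Assumption: each field is length-prefixed before hashing so that
// ("ab", "c") and ("a", "bc") cannot produce the same digest.
function lootId(model, prompt, output) {
  const hash = createHash("sha256");
  for (const field of [model, prompt, output]) {
    const bytes = Buffer.from(field, "utf8");
    hash.update(`${bytes.length}:`);
    hash.update(bytes);
  }
  return hash.digest("hex"); // 64 hex characters
}
```

Identical invocations map to the same id, so re-running the same model on the same prompt with the same output does not create a second drop.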

Quest

Lightweight: Goblin pack + Troll arbitration.

Rite

Full pipeline: Raccoon → pack → Gremlin → Troll → Ogre fallback.

Hoard

File-backed store under .goblintown/hoard/. Every drop is auditable.

Warren

Per-project root, found by walking up from cwd.
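
Walking up from cwd could look roughly like this. It assumes the Warren is marked by a `.goblintown` directory (inferred from the Hoard path) and uses a hypothetical function name, `findWarren`; the `exists` parameter is injectable only so the walk can be exercised without touching the disk.

```javascript
import { existsSync } from "node:fs";
import { dirname, join, resolve } from "node:path";

// Hypothetical sketch: climb from `start` until a directory containing
// `.goblintown` is found, or give up at the filesystem root.
function findWarren(start = process.cwd(), exists = existsSync) {
  let dir = resolve(start);
  while (true) {
    if (exists(join(dir, ".goblintown"))) return dir;
    const parent = dirname(dir);
    if (parent === dir) return null; // reached the root, no Warren found
    dir = parent;
  }
}
```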

Shinies

Reward = troll score − cross-creature drift penalty + pass bonus, clamped 0..1.
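
As a rough sketch, the formula could be written as follows. The two weights are assumptions chosen for illustration; the source gives only the shape of the formula, not the constants.

```javascript
// Hypothetical sketch of the default reward.
const DRIFT_WEIGHT = 0.5; // assumed: penalty per unit of drift rate
const PASS_BONUS = 0.1;   // assumed: flat bonus when the Troll passes

function clamp01(x) {
  return Math.min(1, Math.max(0, x));
}

// verdict: { score, passed } from the Troll; driftRate: a number in [0, 1].
function shinies(verdict, driftRate) {
  const penalty = DRIFT_WEIGHT * driftRate;
  const bonus = verdict.passed ? PASS_BONUS : 0;
  return clamp01(verdict.score - penalty + bonus);
}
```

The clamp matters at both ends: a perfect passing score cannot exceed 1, and a weak, drifty failure bottoms out at 0 rather than going negative.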

Drift

Cross-creature word frequency. A Goblin output mentioning raccoons unprompted is the signal we measure.
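
One way to approximate this measurement is sketched below. The function name `measureDrift`, the singular/plural matching, and the "mentions per word" definition of the rate are all assumptions; only the `driftRate` field name appears in the source (in the reward plugin example).

```javascript
// Hypothetical sketch of drift measurement: count mentions of creatures
// other than the one that produced the text.
const ROSTER = ["goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"];

function measureDrift(text, author, prompt = "") {
  const words = text.toLowerCase().match(/[a-z]+/g) ?? [];
  const promptLower = prompt.toLowerCase();
  const counts = {};
  let hits = 0;
  for (const creature of ROSTER) {
    // Skip the author itself and any creature the prompt already named:
    // only unprompted mentions count as drift.
    if (creature === author || promptLower.includes(creature)) continue;
    const n = words.filter((w) => w === creature || w === `${creature}s`).length;
    if (n > 0) counts[creature] = n;
    hits += n;
  }
  return { counts, driftRate: words.length ? hits / words.length : 0 };
}
```

For example, a Goblin output of "The raccoons found a troll" contains two foreign creature mentions in five words.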

Federation

Pigeons compress and route artifacts to other Warrens — file or HTTP, signed, optional HMAC.
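
To illustrate the optional HMAC only, here is a sketch using Node's `crypto` module. The envelope shape `{ body, signature }` and the function names are assumptions, not goblintown's wire format.

```javascript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical sketch of HMAC-signing a Pigeon payload with a shared secret.
function signBody(body, secret) {
  const signature = createHmac("sha256", secret).update(body).digest("hex");
  return { body, signature };
}

function verifyBody({ body, signature }, secret) {
  const expected = createHmac("sha256", secret).update(body).digest();
  const given = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

The receiving Warren recomputes the HMAC over the body it got and compares in constant time, so a tampered body or a wrong secret both fail verification.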

# install

    # global install
    npm install -g goblintown

    # or, in a project
    npm install goblintown
    export OPENAI_API_KEY=sk-...
    goblintown init

# usage

    # one-shot: output streams as it arrives
    goblintown summon raccoon --task "Summarize package.json" --personality stoic

    # scavenge a corpus
    goblintown scavenge --task "What does the build system do?" \
      --scan "package.json" --scan "src/**/*.ts"

    # lightweight pack dispatch
    goblintown quest "Write a SQL join: users to last 5 orders" --pack 3

    # full ceremony with a budget cap
    goblintown rite "Refactor src/quest.ts to share the troll-review helper" \
      --pack 3 --scan "src/quest.ts" --budget 80000

    # observability
    goblintown drift
    goblintown audit <riteId>
    goblintown graph <riteId|lootId>
    goblintown serve --port 7777    # web UI + SSE rite form

# what's in the box

Pack dispatch + arbitration

N goblins race in parallel with perturbed temperatures. A troll judges. The candidate with the highest Shinies score wins.
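
A sketch of what perturbing temperatures and picking a winner might involve. The source does not say how temperatures are perturbed; this version spreads them evenly around a base value instead of sampling randomly, and the base, spread, and clamp range are assumptions.

```javascript
// Hypothetical sketch: one temperature per Goblin, spread around a base value.
function packTemperatures(n, base = 0.9, spread = 0.4) {
  return Array.from({ length: n }, (_, i) => {
    // Offsets run from -spread/2 to +spread/2; a pack of one gets no offset.
    const offset = n === 1 ? 0 : (i / (n - 1) - 0.5) * spread;
    return Math.min(2, Math.max(0, base + offset));
  });
}

// Pick the candidate with the highest Shinies score; ties go to the earliest.
// Expects a non-empty array of { shinies, ... } objects.
function pickWinner(candidates) {
  return candidates.reduce((best, c) => (c.shinies > best.shinies ? c : best));
}
```

With `--pack 3` and these defaults, the three Goblins would run at roughly 0.7, 0.9, and 1.1.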

Adversarial review

Every output gets attacked by a gremlin before review. Edge cases, off-by-ones, prompt injection.

Ogre fallback

If the whole pack fails, escalate to a heavyweight model that synthesizes the best parts.

Content-addressed Hoard

Every Loot drop is keyed by hash. Full causal graph of who-produced-what is recoverable.
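
Recovering the causal graph amounts to following parent links. The sketch below assumes each drop looks like `{ id, creature, parents: [...] }` and that the Hoard can be read into a `Map` keyed by id; both the shape and the function name `ancestry` are hypothetical.

```javascript
// Hypothetical sketch: walk parent links to collect every drop that fed
// into a given Loot drop, including the drop itself.
function ancestry(hoard, lootId) {
  const seen = new Set();
  const drops = [];
  const stack = [lootId];
  while (stack.length > 0) {
    const id = stack.pop();
    if (seen.has(id)) continue; // shared ancestors are visited once
    seen.add(id);
    const drop = hoard.get(id);
    if (!drop) continue; // missing parent: skip rather than crash
    drops.push(drop);
    stack.push(...drop.parents);
  }
  return drops;
}
```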

Drift telemetry

Goblintown measures cross-creature word frequency. High drift = your reward signal is leaking.

Budget caps

Per-rite token budgets and per-call max-output. No runaway pack costs.
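
A per-rite cap can be as simple as a running counter. The class and method names below are illustrative; only the idea of a hard token cap comes from the text above.

```javascript
// Hypothetical sketch of a per-rite token budget.
class TokenBudget {
  constructor(limit) {
    this.limit = limit;
    this.spent = 0;
  }

  // Tokens still available; a caller could use this to bound max-output.
  remaining() {
    return Math.max(0, this.limit - this.spent);
  }

  // Record actual usage after a call; throw once the cap is crossed.
  charge(tokens) {
    this.spent += tokens;
    if (this.spent > this.limit) {
      throw new Error(`budget exceeded: ${this.spent}/${this.limit} tokens`);
    }
  }
}
```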

Concurrency limits

An in-process semaphore caps in-flight calls. The SDK retries 429/5xx automatically.
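
For readers unfamiliar with the pattern, an in-process semaphore for async calls can be sketched in a few lines. This is a generic illustration, not goblintown's implementation; the class name and `run` method are hypothetical.

```javascript
// Hypothetical sketch of a semaphore capping in-flight async calls.
class Semaphore {
  constructor(max) {
    this.free = max;
    this.waiters = [];
  }

  async run(task) {
    if (this.free > 0) {
      this.free -= 1;
    } else {
      // Wait until a finishing task hands its slot over directly.
      await new Promise((resolve) => this.waiters.push(resolve));
    }
    try {
      return await task();
    } finally {
      const next = this.waiters.shift();
      if (next) next();    // pass the slot straight to the next waiter
      else this.free += 1; // or return it to the pool
    }
  }
}
```

Handing the slot directly to the next waiter, instead of incrementing the counter first, prevents a newly arriving call from jumping the queue.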

Federation

Pigeons deliver compressed artifacts between Warrens, over filesystem or HTTP. Signed bodies, optional HMAC.

Browser-driven rites

Web UI with an HTML form that POSTs and subscribes to SSE for live progress.

Pluggable rewards

Drop a .goblintown/reward.mjs to override the default scoring.

Reroll, export, compare

Re-run a rite with the same params; export the whole thing as markdown; diff two rites.

57 tests

Pure-function coverage on every protocol invariant. No OpenAI calls during testing.

# reward plugins

Drop a .goblintown/reward.mjs in your Warren to override the default scoring.

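    // .goblintown/reward.mjs
    // loot: the Loot drop being scored; verdict: the Troll's JSON verdict.
    export default function (loot, verdict) {
      return verdict.passed
        // Passed: start from 0.8 and add up to 0.2 for low drift.
        ? 0.8 + (1 - loot.drift.driftRate) * 0.2
        // Failed: keep at most half of the Troll's score.
        : verdict.score * 0.5;
    }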

Result is clamped to [0, 1].