How SpecBox Engine Works

Complete spec-driven development pipeline with multi-agent orchestration. 108 MCP tools, 13 skills, 12 agents.

3 commands. Verified software. Auditable evidence.

You describe what to build with /prd. The engine plans with /plan and implements with /implement. Quality gates verify every step. Self-healing fixes errors automatically. You review the PR.

Real example: from idea to PR in 3 steps

$ /prd "Restaurant booking system"
Generating PRD...
┌─ US-001: Book table online
│ ├─ UC-001: Select date and time (4 ACs)
│ └─ UC-002: Cancel booking (2 ACs)
└─ Quality Gate: PASS (12/12 ACs are specific and measurable)

$ /plan
Analyzing PRD (4 US, 12 UC, 38 AC)...
┌─ Phase 1: Database schema + RLS
├─ Phase 2: API (12 endpoints)
├─ Phase 3: UI (Stitch designs)
└─ Phase 4: E2E Playwright
Estimated: ~6h with SpecBox. Without: 3-5 days.

$ /implement
[Orchestrator] Creating branch feat/US-001
[AG-03] Migrations: create_bookings_table ✓
[AG-01] Implementing UC-001 from Stitch design...
[AG-04] Tests: 24/24 passing, coverage 87%
[AG-08] Quality Gate: GO ✓
[AG-09] Acceptance: 12/12 AC ACCEPTED
→ PR #47 created, ready for review

The Pipeline

$ /prd

Generates the Product Requirements Document with User Stories, Use Cases, and Acceptance Criteria. The Definition Quality Gate validates that each criterion is specific, measurable, and testable.

$ /plan

Analyzes the PRD and generates a technical plan with phases, UI components, and Stitch designs. VEG generates visual directives tailored to the audience.

$ /implement

Autopilot: creates the branch, executes the phases sequentially, converts designs to code, runs quality gates between phases, performs acceptance testing, and opens the PR automatically.
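
The "quality gates between phases" idea can be sketched as a loop that runs each phase in order and stops at the first gate that fails. This is only an illustrative sketch, not SpecBox's actual implementation: the Phase class, run_pipeline function, and the lambda gates are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Phase:
    """Hypothetical phase: a unit of work plus the gate that must pass after it."""
    name: str
    run: Callable[[], None]    # performs the work for this phase
    gate: Callable[[], bool]   # returns True when the quality gate passes


def run_pipeline(phases: List[Phase]) -> List[str]:
    """Run phases sequentially; stop at the first phase whose gate fails."""
    completed: List[str] = []
    for phase in phases:
        phase.run()
        if not phase.gate():
            break  # a failed gate blocks every later phase
        completed.append(phase.name)
    return completed


# Phase names borrowed from the /plan example; the gate results are made up.
phases = [
    Phase("Database schema + RLS", run=lambda: None, gate=lambda: True),
    Phase("API", run=lambda: None, gate=lambda: False),
    Phase("UI", run=lambda: None, gate=lambda: True),
]
print(run_pipeline(phases))  # ['Database schema + RLS']
```

In this sketch the UI phase never starts because the API gate failed, which is the property a gate between phases is meant to guarantee.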

Deep Dive — Everything inside

12 Specialized Agents

Each pipeline phase has agents with defined roles. The Orchestrator NEVER writes code — it only coordinates, delegates, and consolidates.

🎯 Orchestrator

Main coordinator. NEVER writes code. Plans, delegates, consolidates in Engram.

AG-01

Feature Generator

Generates complete feature structure per stack (BLoC, App Router, FastAPI).

🎨 AG-02

UI/UX Designer

Interfaces, responsiveness, VEG Motion. Works from Stitch designs.

🗄️ AG-03

DB Specialist

Supabase, Neon, Firebase. Migrations, RLS policies, schemas.

🧪 AG-04

QA Validation

Unit, integration, widget tests. Coverage 85%+, edge cases.

🔄 AG-05

n8n Specialist

Automation workflows, triggers, webhooks, error handling.

✏️ AG-06

Design Specialist

Google Stitch MCP, VEG enrichment. Generates and edits UI designs.

📊 AG-07

Apps Script

Google Apps Script (clasp + TypeScript). Web Apps, Add-ons, Triggers.

🔍 AG-08

Quality Auditor

Independent verification. Lint, coverage, architecture. Issues GO/NO-GO.

AG-09a

Acceptance Tester

Generates .feature + Gherkin step definitions. Captures visual evidence (screenshots, traces).

⚖️ AG-09b

Acceptance Validator

Independent AC validation. Issues ACCEPTED / CONDITIONAL / REJECTED.

🐛 AG-10

Developer Tester

Processes human feedback from manual testing. Creates GitHub issues, links to AC-XX.
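
As a rough illustration of how AG-08's GO/NO-GO verdict could be derived, the sketch below combines lint, test, and coverage results into a single decision. The AuditInput fields and the audit function are hypothetical, and reusing the "Coverage 85%+" figure as the auditor's threshold is an assumption; the real auditor also checks architecture, which is not modeled here.

```python
from dataclasses import dataclass

# Assumption: the auditor reuses the 85% coverage target mentioned for AG-04.
COVERAGE_THRESHOLD = 85.0


@dataclass
class AuditInput:
    """Hypothetical summary of one verification run."""
    lint_errors: int
    tests_failed: int
    coverage_percent: float


def audit(result: AuditInput) -> str:
    """Return 'GO' only when every check passes, otherwise 'NO-GO'."""
    passed = (
        result.lint_errors == 0
        and result.tests_failed == 0
        and result.coverage_percent >= COVERAGE_THRESHOLD
    )
    return "GO" if passed else "NO-GO"


print(audit(AuditInput(lint_errors=0, tests_failed=0, coverage_percent=87.0)))  # GO
print(audit(AuditInput(lint_errors=0, tests_failed=0, coverage_percent=80.0)))  # NO-GO
```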

13 Agent Skills

Auto-discoverable commands that activate when relevant. Each skill is a complete workflow.

/prd Generates PRD + Work Item
/plan Technical plan + Stitch + VEG
/implement End-to-end autopilot
/quality-gate Adaptive quality gates
/feedback Manual testing feedback
/explore Read-only exploration
/adapt-ui UI component mapping
/optimize-agents Agentic system audit
/check-designs Retroactive Stitch compliance
/acceptance-check Standalone AC validation
/quickstart Interactive tutorial (<5 min)
/remote Remote management (iPhone/WhatsApp)
/release Audit + version bump + push

108 Automation Tools

Unified MCP server. Backend-agnostic: works with Trello, Plane, or locally without external APIs.

Each tool is an atomic operation agents use to manage your project: create PRDs, run tests, move cards, verify quality, generate evidence.

13 modules: engine, plans, quality, skills, features, telemetry, hooks, onboarding, state, spec-driven, migration, stitch, heartbeat.

spec-driven (21 tools): US/UC/AC backend-agnostic
state (20 tools): Checkpoints, healing, sessions
stitch (13 tools): Full Stitch MCP proxy
onboarding (10 tools): Register, upgrade, matrix
telemetry (8 tools): Sessions, events, dashboard
features (7 tools): In-progress, designs, VEG
migration (5 tools): Trello ↔ Plane bidirectional
quality (4 tools): Baselines, logs, evidence
engine (3 tools): Version, status, rules
plans (3 tools): List, read, architecture
hooks (3 tools): List, config, source
skills (2 tools): Discovery + read

Quality Gates & Self-Healing

1. Retry: Automatic retry of the failed step.
2. Patch: Surgical fix of the detected error.
3. Rollback: Revert to the last stable checkpoint.
4. Human Intervention: Escalation to the developer with diagnosis.
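
One way to read the four levels is as an escalation ladder: try each automatic level in order and hand over to a human only when all of them fail. The sketch below assumes that reading; self_heal, AUTOMATIC_LEVELS, and the handler callables are hypothetical names for illustration, not the engine's API.

```python
from typing import Callable, Dict, List, Tuple

# The three automatic levels, in escalation order.
AUTOMATIC_LEVELS = ["Retry", "Patch", "Rollback"]


def self_heal(handlers: Dict[str, Callable[[], bool]]) -> Tuple[str, List[str]]:
    """Try each automatic level in order.

    Each handler returns True when it resolved the failure. Returns the level
    that ended the process and the list of levels attempted so far.
    """
    attempted: List[str] = []
    for level in AUTOMATIC_LEVELS:
        attempted.append(level)
        if handlers[level]():
            return level, attempted
    # Nothing automatic worked: escalate to the developer.
    attempted.append("Human Intervention")
    return "Human Intervention", attempted


# Made-up scenario: the retry fails, the patch succeeds.
handlers = {"Retry": lambda: False, "Patch": lambda: True, "Rollback": lambda: True}
resolved_by, attempted = self_heal(handlers)
print(resolved_by, attempted)  # Patch ['Retry', 'Patch']
```

Ordering the levels from cheapest to most disruptive means a rollback is only reached when neither a retry nor a targeted patch has worked.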

Pipeline Integrity

Hook-level enforcement that makes it impossible to write code without an active UC.

"The embed-build incident (March 2026): an agent implemented 9 Use Cases without the pipeline, leaving Trello empty with zero traceability. That was the day HARD BLOCKS were born."

🛡️ spec-guard.sh blocks Write/Edit to src/ without active UC
🚫 commit-spec-guard.sh blocks commits to main/master
🎨 design-gate.sh blocks UI without prior Stitch designs
💀 Anti-main guard: FATAL ERROR if implementing on main
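
The real guards are shell hooks whose contents are not shown here. As a sketch of the two simplest rules, the Python below expresses the same decisions as pure functions: writes under src/ need an active UC, and commits to main or master are refused. The function names and the way the active UC is passed in are assumptions made for this example.

```python
from pathlib import PurePosixPath
from typing import Optional

PROTECTED_BRANCHES = {"main", "master"}


def allow_write(path: str, active_uc: Optional[str]) -> bool:
    """Sketch of the spec-guard rule: writes under src/ require an active UC."""
    in_src = PurePosixPath(path).parts[:1] == ("src",)
    return (not in_src) or active_uc is not None


def allow_commit(branch: str) -> bool:
    """Sketch of the commit guard: refuse commits on protected branches."""
    return branch not in PROTECTED_BRANCHES


print(allow_write("src/app/main.py", None))      # False
print(allow_write("src/app/main.py", "UC-001"))  # True
print(allow_write("docs/readme.md", None))       # True
print(allow_commit("main"))                      # False
print(allow_commit("feat/US-001"))               # True
```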

Sala de Máquinas

Embedded dashboard (React 19 + Vite) showing the state of all your projects: session telemetry, self-healing events, quality baselines, spec-driven boards, acceptance tests, and E2E results. Each user deploys their own instance — no central server.

Multi-Backend

3 interchangeable backends with the same interface (25 methods). Bidirectional migration between them.

📋 Trello: Boards with US/UC/AC cards
✈️ Plane: Cloud or self-hosted (CE)
📁 FreeForm: Local JSON, no external API
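
Interchangeable backends behind one interface can be sketched with a typing.Protocol. The example below is illustrative only: the real interface has 25 methods that are not listed here, so SpecBackend with create_item and list_items, the two backend classes, and migrate are hypothetical stand-ins. A JSON file backend plays the role of FreeForm, and an in-memory backend stands in for a remote board.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Protocol


class SpecBackend(Protocol):
    """Hypothetical two-method slice of a shared backend interface."""
    def create_item(self, item_id: str, title: str) -> None: ...
    def list_items(self) -> List[Dict[str, str]]: ...


class InMemoryBackend:
    """Stand-in for a remote board; keeps items in a list."""
    def __init__(self) -> None:
        self._items: List[Dict[str, str]] = []

    def create_item(self, item_id: str, title: str) -> None:
        self._items.append({"id": item_id, "title": title})

    def list_items(self) -> List[Dict[str, str]]:
        return list(self._items)


class LocalJsonBackend:
    """Stores items in a local JSON file, with no external API."""
    def __init__(self, path: Path) -> None:
        self.path = path

    def list_items(self) -> List[Dict[str, str]]:
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text(encoding="utf-8"))

    def create_item(self, item_id: str, title: str) -> None:
        items = self.list_items()
        items.append({"id": item_id, "title": title})
        self.path.write_text(json.dumps(items, indent=2), encoding="utf-8")


def migrate(source: SpecBackend, target: SpecBackend) -> int:
    """Copy every item from source to target; works in either direction."""
    count = 0
    for item in source.list_items():
        target.create_item(item["id"], item["title"])
        count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    memory = InMemoryBackend()
    memory.create_item("UC-001", "Select date and time")
    memory.create_item("UC-002", "Cancel booking")
    local = LocalJsonBackend(Path(tmp) / "board.json")
    migrated = migrate(memory, local)
    copied = local.list_items()

print(migrated, [item["id"] for item in copied])  # 2 ['UC-001', 'UC-002']
```

Because migrate depends only on the shared interface, the same function covers both directions of a migration.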

Infrastructure Services

Integrated patterns for 5 services: each with configuration guides, best practices, and pipeline integration.

Supabase
🐘 Neon
💳 Stripe
🔥 Firebase
🔄 n8n

Supported Stacks

💙 Flutter
🐍 Python
⚛️ React
🔓 FreeForm