How SpecBox Engine Works

Complete spec-driven development pipeline with multi-agent orchestration. 108 MCP tools, 13 skills, 12 agents.

3 commands. Verified software. Auditable evidence.

You describe what to build with /prd. The engine plans with /plan and implements with /implement. Quality gates verify every step, and self-healing fixes errors automatically. You review the PR.

Real example: from idea to PR in 3 steps

$ /prd "Restaurant booking system"
Generating PRD...
┌─ US-001: Book table online
│ ├─ UC-001: Select date and time (4 ACs)
│ └─ UC-002: Cancel booking (2 ACs)
└─ Quality Gate: PASS (12/12 ACs are specific and measurable)

$ /plan
Analyzing PRD (4 US, 12 UC, 38 AC)...
┌─ Phase 1: Database schema + RLS
├─ Phase 2: API (12 endpoints)
├─ Phase 3: UI (Stitch designs)
└─ Phase 4: E2E Playwright
Estimated: ~6h with SpecBox. Without: 3-5 days.

$ /implement
[Orchestrator] Creating branch feat/US-001
[AG-03] Migrations: create_bookings_table ✓
[AG-01] Implementing UC-001 from Stitch design...
[AG-04] Tests: 24/24 passing, coverage 87%
[AG-08] Quality Gate: GO ✓
[AG-09] Acceptance: 12/12 AC ACCEPTED
→ PR #47 created, ready for review

The Pipeline

$ /prd

Generates the Product Requirements Document with User Stories, Use Cases, and Acceptance Criteria. Definition Quality Gate validates each criterion is specific, measurable, and testable.
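The User Story, Use Case, and Acceptance Criterion hierarchy can be pictured as nested records. The sketch below is illustrative only: the class names, the fields, and the vague-word heuristic are assumptions made for this example, not SpecBox's actual schema or the logic of its Definition Quality Gate.

```python
from dataclasses import dataclass, field

# Hypothetical list of vague terms; the real gate's criteria are not shown here.
VAGUE_WORDS = {"fast", "easy", "intuitive", "user-friendly", "robust"}


@dataclass
class AcceptanceCriterion:
    id: str
    text: str

    def is_specific(self) -> bool:
        """Naive check: non-empty and free of obviously vague wording."""
        words = {w.strip(".,").lower() for w in self.text.split()}
        return bool(words) and not (words & VAGUE_WORDS)


@dataclass
class UseCase:
    id: str
    title: str
    criteria: list[AcceptanceCriterion] = field(default_factory=list)


@dataclass
class UserStory:
    id: str
    title: str
    use_cases: list[UseCase] = field(default_factory=list)


def definition_gate(story: UserStory) -> tuple[int, int]:
    """Return (passing, total) acceptance criteria for one story."""
    acs = [ac for uc in story.use_cases for ac in uc.criteria]
    return sum(ac.is_specific() for ac in acs), len(acs)


story = UserStory("US-001", "Book table online", [
    UseCase("UC-001", "Select date and time", [
        AcceptanceCriterion("AC-01", "Slots are shown in 30-minute steps."),
        AcceptanceCriterion("AC-02", "Booking should be fast."),
    ]),
])
passing, total = definition_gate(story)
print(f"{passing}/{total} ACs pass")  # AC-02 fails on the word "fast"
```

A real gate would presumably rely on richer checks than a word list; the point here is only the shape of the data and the pass/total report.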

$ /plan

Analyzes the PRD, generates a technical plan with phases, UI components, and Stitch designs. VEG generates visual directives tailored to the audience.

$ /implement

Autopilot: creates the branch, executes the phases sequentially, converts designs to code, runs quality gates between phases, performs acceptance testing, and opens the PR automatically.
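The idea of sequential phases with a gate between each one can be sketched in a few lines. This is a simplified model under stated assumptions: `run_phases`, the phase names, and the gate callback are hypothetical and do not reflect the engine's internal API.

```python
from typing import Callable


def run_phases(phases: list[tuple[str, Callable[[], None]]],
               gate: Callable[[str], bool]) -> list[str]:
    """Run phases in order; stop at the first phase whose gate fails."""
    completed: list[str] = []
    for name, execute in phases:
        execute()               # do the phase's work
        if not gate(name):      # quality gate between phases
            break               # later phases never start
        completed.append(name)
    return completed


log: list[str] = []
phases = [
    ("schema", lambda: log.append("schema")),
    ("api", lambda: log.append("api")),
    ("ui", lambda: log.append("ui")),
    ("e2e", lambda: log.append("e2e")),
]

# Hypothetical gate that rejects the UI phase.
done = run_phases(phases, gate=lambda name: name != "ui")
print(done)  # ['schema', 'api']
print(log)   # ['schema', 'api', 'ui']
```

The UI phase ran but did not pass its gate, so the E2E phase was never started. In the engine, a failed gate would presumably hand over to self-healing rather than simply stopping.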

Deep Dive — Everything inside

12 Specialized Agents

Each pipeline phase has agents with defined roles. The Orchestrator NEVER writes code — it only coordinates, delegates, and consolidates.

🎯 Orchestrator

Main coordinator. NEVER writes code. Plans, delegates, consolidates in Engram.

⚡ AG-01

Feature Generator

Generates complete feature structure per stack (BLoC, App Router, FastAPI).

🎨 AG-02

UI/UX Designer

Interfaces, responsiveness, VEG Motion. Works from Stitch designs.

🗄️ AG-03

DB Specialist

Supabase, Neon, Firebase. Migrations, RLS policies, schemas.

🧪 AG-04

QA Validation

Unit, integration, widget tests. Coverage 85%+, edge cases.

🔄 AG-05

n8n Specialist

Automation workflows, triggers, webhooks, error handling.

✏️ AG-06

Design Specialist

Google Stitch MCP, VEG enrichment. Generates and edits UI designs.

📊 AG-07

Apps Script

Google Apps Script (clasp + TypeScript). Web Apps, Add-ons, Triggers.

🔍 AG-08

Quality Auditor

Independent verification. Lint, coverage, architecture. Issues GO/NO-GO.

✅ AG-09a

Acceptance Tester

Generates Gherkin .feature files and step definitions. Captures visual evidence (screenshots, traces).

⚖️ AG-09b

Acceptance Validator

Independent AC validation. Issues ACCEPTED / CONDITIONAL / REJECTED.

🐛 AG-10

Developer Tester

Processes human feedback from manual testing. Creates GitHub issues, links to AC-XX.
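A GO/NO-GO verdict of the kind AG-08 issues can be thought of as a conjunction of checks. The function below is a hedged sketch: the parameter names and the zero-tolerance rule for lint and architecture findings are assumptions, and only the 85% coverage floor comes from the agent descriptions above.

```python
def quality_verdict(lint_errors: int,
                    coverage: float,
                    architecture_violations: int,
                    min_coverage: float = 85.0) -> str:
    """Return "GO" only if every check passes, otherwise "NO-GO"."""
    passed = (
        lint_errors == 0                   # assumption: no lint errors tolerated
        and architecture_violations == 0   # assumption: no violations tolerated
        and coverage >= min_coverage       # coverage floor (85%+)
    )
    return "GO" if passed else "NO-GO"


print(quality_verdict(lint_errors=0, coverage=87.0, architecture_violations=0))
print(quality_verdict(lint_errors=0, coverage=80.0, architecture_violations=0))
```

With 87% coverage and no findings the verdict is GO; dropping coverage to 80% flips it to NO-GO.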

13 Agent Skills

Auto-discoverable commands that activate when relevant. Each skill is a complete workflow.

/prd Generates PRD + Work Item

/plan Technical plan + Stitch + VEG

/implement End-to-end autopilot

/quality-gate Adaptive quality gates

/feedback Manual testing feedback

/explore Read-only exploration

/adapt-ui UI component mapping

/optimize-agents Agentic system audit

/check-designs Retroactive Stitch compliance

/acceptance-check Standalone AC validation

/quickstart Interactive tutorial (<5 min)

/remote Remote management (iPhone/WhatsApp)

/release Audit + version bump + push

108 Automation Tools

Unified MCP server. Backend-agnostic: works with Trello, Plane, or locally without external APIs.

Each tool is an atomic operation agents use to manage your project: create PRDs, run tests, move cards, verify quality, generate evidence.

13 modules: engine, plans, quality, skills, features, telemetry, hooks, onboarding, state, spec-driven, migration, stitch, heartbeat.

spec-driven: US/UC/AC backend-agnostic

state: Checkpoints, healing, sessions

stitch: Full Stitch MCP proxy

onboarding

telemetry: Sessions, events, dashboard

features: In-progress, designs, VEG

migration: Trello ↔ Plane bidirectional

quality: Baselines, logs, evidence

engine: Version, status, rules

plans: List, read, architecture

hooks: List, config, source

skills: Discovery + read

Quality Gates & Self-Healing

Retry

Automatic retry of the failed step.

Patch

Surgical fix of the detected error.

Rollback

Revert to the last stable checkpoint.

Human Intervention

Escalation to the developer with diagnosis.
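Read as an escalation ladder, the four levels above try the cheapest recovery first and reach the developer only when everything else fails. The sketch below assumes that ordering and assumes the step is rerun after a patch or rollback; `heal`, `Level`, and the callbacks are hypothetical names, not the engine's API.

```python
from enum import Enum
from typing import Callable


class Level(Enum):
    RETRY = 1
    PATCH = 2
    ROLLBACK = 3
    HUMAN = 4


def heal(step: Callable[[], bool],
         patch: Callable[[], bool],
         rollback: Callable[[], None]) -> Level:
    """Return the level that resolved the failure (HUMAN means escalation)."""
    if step():                # Level 1: rerun the failed step unchanged
        return Level.RETRY
    if patch() and step():    # Level 2: apply a targeted fix, then rerun
        return Level.PATCH
    rollback()                # Level 3: restore the last stable checkpoint
    if step():
        return Level.ROLLBACK
    return Level.HUMAN        # Level 4: hand over to the developer


# Toy scenario: the step only succeeds once a patch has been applied.
state = {"patched": False}


def flaky_step() -> bool:
    return state["patched"]


def apply_patch() -> bool:
    state["patched"] = True
    return True


def restore_checkpoint() -> None:
    state["patched"] = False


print(heal(flaky_step, apply_patch, restore_checkpoint))  # Level.PATCH
```

If no level succeeds, the function returns `Level.HUMAN`, which corresponds to escalating to the developer with a diagnosis.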

Pipeline Integrity

Hook-level enforcement that makes it impossible to write code without an active UC.

"The embed-build incident (March 2026): an agent implemented 9 Use Cases without the pipeline, leaving Trello empty with zero traceability. That was the day HARD BLOCKS were born."

🛡️ spec-guard.sh blocks Write/Edit to src/ without active UC

🚫 commit-spec-guard.sh blocks commits to main/master

🎨 design-gate.sh blocks UI without prior Stitch designs

💀 Anti-main guard: FATAL ERROR if implementing on main
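The decision these guards make can be modeled as a pure function over a proposed tool call. The real hooks are shell scripts whose contents are not shown here; the Python below is only a sketch of the rule they describe, and the function name, arguments, and messages are assumptions.

```python
from pathlib import PurePosixPath
from typing import Optional


def guard(tool: str, file_path: str,
          active_uc: Optional[str], branch: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed tool call."""
    # Anti-main guard: nothing is implemented directly on main/master.
    if branch in {"main", "master"}:
        return False, "FATAL: implementing on main is not allowed"
    # Spec guard: writing under src/ requires an active Use Case.
    touches_src = "src" in PurePosixPath(file_path).parts
    if tool in {"Write", "Edit"} and touches_src and active_uc is None:
        return False, "blocked: no active UC"
    return True, "ok"


print(guard("Edit", "src/booking.py", None, "feat/US-001"))
print(guard("Edit", "src/booking.py", "UC-001", "feat/US-001"))
print(guard("Edit", "src/booking.py", "UC-001", "main"))
```

The first call is blocked for lacking an active UC, the second is allowed, and the third is rejected because the branch is main. Read-only tools pass through untouched in this model.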

Sala de Máquinas

Embedded dashboard (React 19 + Vite) showing the state of all your projects: session telemetry, self-healing events, quality baselines, spec-driven boards, acceptance tests, and E2E results. Each user deploys their own instance — no central server.

Multi-Backend

3 interchangeable backends with the same interface (25 methods). Bidirectional migration between them.

📋 Trello: Boards with US/UC/AC cards

✈️ Plane: Cloud or self-hosted (CE)

📁 FreeForm: Local JSON, no external API
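Interchangeable backends behind one interface, with migration between them, can be sketched with a structural type. The example is a minimal illustration under stated assumptions: `SpecBackend` and its two methods are hypothetical stand-ins for the real 25-method interface, and `LocalJsonBackend` only imitates the FreeForm idea of local JSON storage.

```python
import json
import tempfile
from pathlib import Path
from typing import Protocol


class SpecBackend(Protocol):
    """Hypothetical two-method slice of a backend interface."""

    def create_item(self, item_id: str, title: str) -> None: ...
    def list_items(self) -> dict[str, str]: ...


class LocalJsonBackend:
    """FreeForm-style backend: items live in a JSON file, no external API."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def list_items(self) -> dict[str, str]:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def create_item(self, item_id: str, title: str) -> None:
        items = self.list_items()
        items[item_id] = title
        self.path.write_text(json.dumps(items, indent=2), encoding="utf-8")


def migrate(source: SpecBackend, target: SpecBackend) -> int:
    """Copy every item from source to target; return how many were copied."""
    items = source.list_items()
    for item_id, title in items.items():
        target.create_item(item_id, title)
    return len(items)


with tempfile.TemporaryDirectory() as tmp:
    a = LocalJsonBackend(Path(tmp) / "a.json")
    b = LocalJsonBackend(Path(tmp) / "b.json")
    a.create_item("US-001", "Book table online")
    copied = migrate(a, b)
    migrated = b.list_items()

print(copied, migrated)
```

Because `migrate` depends only on the protocol, the same function would work in either direction between any two backends that implement it, which is the property bidirectional migration relies on.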

Infrastructure Services

Integrated patterns for 5 services: each with configuration guides, best practices, and pipeline integration.

⚡ Supabase

🐘 Neon

💳 Stripe

🔥 Firebase

🔄 n8n

Supported Stacks

💙 Flutter

🐍 Python

⚛️ React

🔓 FreeForm