Building an AI-Native Development Pipeline

What I Built

Two Projects. One Pipeline. Zero Human-Written Code.

🎙️

morty-voice

Rust Voice AI Assistant

Wake word detection, real-time Gemini conversation, tool execution for calendar, lights, music, project management. Runs 24/7 on a Mac mini. 86+ PRs, 160+ tickets, full observability.

🎭

Stickshift

Interactive Scrollytelling Platform

Physics-based stick figure animations with Rapier.js WASM, scroll-driven scene orchestration, real-time text reflow. Same pipeline, different project, same agents.

Me: "I want to build a voice assistant that actually works. Wake word, real conversation, tool execution."

Morty: "Spinning up 3 parallel research agents. 12+ sources in 20 minutes."

"Research complete. 20 Linear issues created with research-backed descriptions. Ready for Cyrus to start coding."

→ From idea to 20 actionable tickets in 20 minutes

The Problem

Why Current AI Dev Pipelines Break

"The models are incredible. The problem is the orchestration."

Parallel sessions that burned through a 5-hour rate limit in 2 hours

11,867

Linear API error responses from agents hammering the same endpoints

50+

Manual human interventions required in a single sprint session

6 hrs

Lost to rate limits and process bugs on Day 1 alone

The pipeline looked automated on paper. It wasn't. It was me babysitting seven terminal windows, manually triggering the next step, copying context between sessions, and restarting failed processes.
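One common mitigation for agents hammering the same endpoints is capped exponential backoff with jitter. The sketch below is illustrative only and is not the pipeline's actual code; `RateLimited`, `call_with_backoff`, and the delay values are hypothetical names and numbers.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error an API client raises on a rate-limit response."""


def call_with_backoff(fn, attempts=5, base=1.0, cap=60.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn, retrying on RateLimited with capped exponential backoff.

    The delay before retry k is a random fraction ("full jitter") of
    min(cap, base * 2**k), so parallel agents do not retry in lockstep.
    The last failure is re-raised once attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == attempts - 1:
                raise
            sleep(rng() * min(cap, base * 2 ** attempt))
```

Passing `sleep` and `rng` as parameters keeps the helper testable without real waiting.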

The Vision

What an Agentic PDLC Actually Looks Like

Human writes ticket

→

AI writes code

→

CI validates

→

Human approves

🧠

Morty

AI PM + DevOps

Monitors the pipeline, orchestrates cron jobs, writes status reports, and maintains living specification documents updated by retros.

⚡

Cyrus

Dispatcher / Developer

Receives Linear webhooks, spawns Claude Code sessions, routes work to the right context.

🎯

Claude Opus

Tech Lead

Plans approach, delegates to 3-4 parallel subagents, verifies all output before commit.

📋

Strata

Product Manager

Evidence × Alignment, RICE scoring, pre-mortem analysis. The gate that prevents bad tickets.

🔄

PM Cron

Pipeline Heartbeat

Every 10 minutes: merge, build, test, diagnose, file tickets, trigger agents. 144 runs per day.

🔍

Retro Agent

Self-Improvement Engine

Nightly analysis of entire pipeline. Classifies issues, proposes fixes, catches what humans miss.

$12.30

Cost per ticket

18 min

Ticket to PR

95%+

Code coverage

$160

Monthly cost

Day 1 — March 28, 2026

4 Hours to Autonomous

The real timeline from project kickoff to first autonomous PR.

9:00 AM

Project Kickoff

Evaluated a Gemini-based voice gateway architecture with an AI assistant

9:30 AM

Deep Research — 3 Parallel Agents

12+ sources synthesized in 20 minutes: OpenAI, Anthropic, Block, JetBrains, Linux Foundation

10:00 AM

Repo Scaffold

Rust workspace with 5 crates, CI pipeline, architecture docs. GitHub repo live.

10:30 AM

20 Linear Issues Created

Each with research-backed descriptions, acceptance criteria, and implementation notes

11:30 AM

Cyrus Deployed

Autonomous coding agent connected to Linear + GitHub. First webhook integration live.

12:05 PM

Morty Agent Built

@mentions in Linear → webhook → AI response. Full pipeline in 40 minutes.

12:15 PM

Infrastructure Permanent

Cloudflare tunnels, launchd services, auto-restart. Zero terminal windows needed.

1:15 PM

Report Auto-Deployed

44 issues, 6 services, 37 skills installed. Internal report generated and deployed to Cloudflare.

By my estimate, a traditional 4-person team would need 4-6 weeks for the same output. This took 4 hours and 1 person.

Week 1

The Pipeline Gets Teeth

Four major failures surfaced. Each one taught us what autonomous systems actually need.

PROBLEM

Scope Creep

30+ tickets filed without evaluation. Agents wasted ~12 hours on low-value work.

FIX

Created Strata PM Agent. Every feature goes through Evidence × Alignment, RICE, pre-mortem.

PROBLEM

Missing Tests

100% of early PRs lacked integration tests. 3 merged PRs failed in production.

FIX

Hard rule in CLAUDE.md: integration tests mandatory before every push.

PROBLEM

Board Chaos

50 tickets on Linear board. 28 stale research tasks. 9 duplicates. No clear priorities.

FIX

Cleanup: 50 → 13 relevant tickets. Added automated board hygiene to retro process.

PROBLEM

Manual Handoffs

2-4 hour delay from approval to coding. 100% required human intervention.

FIX

Automated brief → Spec Kit → tickets → Cyrus webhook. <5 minute routing.
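An automated webhook handoff is only safe if the receiver checks that each request really comes from the tracker. A minimal sketch, assuming the sender signs the raw request body with HMAC-SHA256 and sends the hex digest in a header (check your tracker's webhook docs for the exact header name and scheme). `verify_signature`, `handle_webhook`, and `dispatch` are hypothetical names, not Cyrus's real interface.

```python
import hashlib
import hmac
import json


def verify_signature(secret: bytes, raw_body: bytes, signature_hex: str) -> bool:
    """Return True if signature_hex is the HMAC-SHA256 of raw_body under secret."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so a mismatch leaks no position info
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def handle_webhook(secret, raw_body, signature_hex, dispatch):
    """Verify first, then hand the parsed event to the dispatcher.

    Returns an HTTP-style status code: 401 for a bad signature, 200 otherwise.
    """
    if not verify_signature(secret, raw_body, signature_hex):
        return 401
    dispatch(json.loads(raw_body))
    return 200
```

Verifying against the raw bytes, before any JSON parsing, matters because re-serialized JSON may not match the signed payload byte for byte.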

Me: "No no why doesn't Cyrus run them though bro"

"It should really know if these things work you know"

Morty: "You're absolutely right. Cyrus should be running the integration tests as part of every PR before merging."

→ Led to a permanent rule change in the agent's operating instructions

Week 2

The Pipeline Improves Itself

Something unexpected happened. The system started catching its own problems.

🔍

The Retro Agent

Nightly Pipeline Analysis

Every night, analyzes: Linear tickets, GitHub PRs, error logs, cron health, Strata decisions, and industry research. Produces structured report with classified issues and concrete proposals. By end of Week 2: 20+ spec changes applied in one bulk update.

Me: "I need to sleep. Every 5 minutes all night long, read the context and run the test suite."

Morty: [Sets up watchdog cron, overnight testing plan, integration tests]

Me (5:23 AM): "I'm not awake just wondering how it's going"

Morty: "Nothing's on fire. Go back to sleep. 🌙"

→ The pipeline ran all night autonomously while I slept

The Team

How Agent Instructions Evolved

🧠 Morty — PM + DevOps

Day 1: Basic orchestration assistant

Week 2: Full PM with living spec documents updated by daily retros

April 1 Rule: "NEVER write code. You are PM + DevOps. Cyrus is the developer. Non-negotiable."

Added after Morty wrote Rust code all day — every fix got overwritten by Cyrus's PRs.

📋 Strata — Product Manager

Born from: "You should really be using a process for these tickets"

Now: Every feature → Evidence × Alignment → RICE → Pre-mortem → Case-against → Product Brief

Updated to read project constitution + architecture before evaluating. 28KB specification document.
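For readers new to RICE: the standard formula is Reach × Impact × Confidence ÷ Effort. The sketch below shows that arithmetic only. How Strata actually weights its inputs isn't described here, so `Proposal`, the field scales, and `rank` are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    name: str
    reach: float       # people or events affected per period
    impact: float      # commonly scored from 0.25 (minimal) to 3 (massive)
    confidence: float  # 0.0 to 1.0
    effort: float      # person-weeks, must be positive


def rice_score(p: Proposal) -> float:
    """Standard RICE: (reach * impact * confidence) / effort."""
    if p.effort <= 0:
        raise ValueError("effort must be positive")
    return p.reach * p.impact * p.confidence / p.effort


def rank(proposals):
    """Highest RICE score first."""
    return sorted(proposals, key=rice_score, reverse=True)
```

A gate like this makes the trade-off explicit: a low-confidence, high-effort idea drops to the bottom before any agent time is spent on it.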

⚡ Cyrus — Developer

Pattern: Opus tech lead → 3-4 parallel subagents → verify → commit

Best day: 16 PRs merged, 0 reverts

Worst day: Idle 48 hours (webhook silently failed)

Both caught by retrospectives.
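The tech lead → subagents → verify → commit pattern is a fan-out followed by a gate. A rough sketch of that shape, assuming subagents can be modeled as plain callables. `run_tech_lead`, `run_subagent`, and `verify` are hypothetical stand-ins, not Claude Code APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tech_lead(subtasks, run_subagent, verify, max_workers=4):
    """Fan subtasks out in parallel, then verify every result before committing.

    Nothing is committed unless all results pass verification.
    """
    subtasks = list(subtasks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs
        results = list(pool.map(run_subagent, subtasks))
    rejected = [task for task, result in zip(subtasks, results)
                if not verify(task, result)]
    return {"commit": not rejected, "rejected": rejected, "results": results}
```

The key design choice is that verification happens in one place, after all parallel work has finished, so a single bad subagent result blocks the commit.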

🔄 PM Cron — The Heartbeat

Evolution:

Merge bot → Test monitor → Ticket filer → State manager → Duplicate preventer → Channel-aware

Each capability added after a real problem caught by retrospectives. 144 runs per day.
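As an illustration of the "duplicate preventer" step, here is one crude way a cron job could avoid filing the same ticket twice: normalize titles into a fingerprint and skip anything that collides with an open ticket. This is a guess at the general idea, not PM Cron's real logic. `fingerprint` and `file_ticket_once` are hypothetical names.

```python
import re


def fingerprint(title: str) -> str:
    """Lowercase, strip punctuation, and sort unique words.

    Deliberately aggressive: word order and punctuation are ignored,
    so reworded titles with the same words collide.
    """
    words = re.findall(r"[a-z0-9]+", title.lower())
    return " ".join(sorted(set(words)))


def file_ticket_once(title, open_titles, create):
    """Call create(title) only if no open ticket shares the fingerprint."""
    fp = fingerprint(title)
    if any(fingerprint(existing) == fp for existing in open_titles):
        return False
    create(title)
    return True
```

A real implementation would probably want fuzzier matching, but even an exact-fingerprint check stops a stateless job from refiling the same failure on every run.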

Me: "But you have to remember to tell Cyrus to use spec kit?"

Morty: "...you're right. The handoff is manual. Let me automate it so approved briefs route directly through the decomposition pipeline."

→ One question exposed a gap — and it was fixed the same day

The Secret Weapon

Daily Retrospectives: The System Watches Itself

These findings would have persisted indefinitely without automated daily analysis.

🔥 The 27,397-Crash Saga

Voice gateway crash-looped every 5 seconds for 2 days. 19,659 crashes Day 1, 7,738 Day 2. USB mic unplugged, no circuit breaker.

Retro finding: "Circuit breaker proposed April 1, re-proposed April 5, re-proposed April 6, still doesn't exist April 7. This is a process failure — proposals aren't being implemented."

→ Led to: spec accountability rule — all changes requiring tickets must be filed within 24 hours
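For context, a circuit breaker for a restart loop can be very small. The sketch below stops restarts after too many crashes in a short window and allows one again after a cooldown. The thresholds and the `CircuitBreaker` class are illustrative assumptions, not the fix that eventually shipped.

```python
class CircuitBreaker:
    """Stop restarting after too many failures inside a sliding time window."""

    def __init__(self, max_failures=5, window_s=60.0, cooldown_s=300.0):
        self.max_failures = max_failures
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self._failures = []      # timestamps of recent failures
        self._opened_at = None   # set while the breaker is open

    def record_failure(self, now: float) -> None:
        # Keep only failures that are still inside the window
        self._failures = [t for t in self._failures if now - t < self.window_s]
        self._failures.append(now)
        if len(self._failures) >= self.max_failures:
            self._opened_at = now

    def allow_restart(self, now: float) -> bool:
        if self._opened_at is None:
            return True
        if now - self._opened_at >= self.cooldown_s:
            # Cooldown elapsed: close the breaker and start counting afresh
            self._opened_at = None
            self._failures.clear()
            return True
        return False
```

With a crash every 5 seconds, a breaker like this opens after the fifth crash and then permits one retry per cooldown instead of thousands of restarts.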

🔥 The 48-Hour Silent Failure

Cyrus produced zero output for 48+ hours. Six tickets in Todo, zero PRs opened. Nobody noticed.

Retro finding: "No mechanism detects Cyrus idleness. Auth health checks verify capability, not productivity."

→ Led to: productivity monitoring — idle >4 hours with pending work triggers alert
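The idle rule above boils down to a single predicate: pending work plus no output for more than 4 hours. A minimal sketch, where `should_alert` and its parameters are hypothetical names and the source of the timestamps is left to the caller.

```python
from datetime import datetime, timedelta

IDLE_LIMIT = timedelta(hours=4)


def should_alert(last_output_at: datetime, pending_tickets: int,
                 now: datetime, idle_limit: timedelta = IDLE_LIMIT) -> bool:
    """True when work is pending and nothing has been produced for too long."""
    return pending_tickets > 0 and (now - last_output_at) > idle_limit
```

Checking output (PRs opened, commits pushed) rather than process health is the point: an agent can pass every auth check and still be doing nothing.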

🔥 The 8.4M Token Burn

PM Cron running stateless: up to 250K tokens per run across 48 daily runs, roughly 8.4M tokens a day wasted rediscovering the same failure.

Retro finding: "PM Cron has no memory. State persistence would eliminate 95% of this waste."

→ Led to: token usage down 95%, from 8.4M to 400K daily
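State persistence here can be as simple as remembering which failures have already been diagnosed. A sketch under that assumption: hash the failure text, cache the diagnosis, and only call the expensive model for unseen failures. The file format and the names `failure_key`, `diagnose_once`, `load_state`, and `save_state` are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def failure_key(log_excerpt: str) -> str:
    """Stable short identifier for a failure, derived from its log text."""
    return hashlib.sha256(log_excerpt.encode("utf-8")).hexdigest()[:16]


def diagnose_once(log_excerpt, state, diagnose):
    """Run the expensive diagnosis only for failures not already recorded."""
    key = failure_key(log_excerpt)
    if key not in state["seen"]:
        state["seen"][key] = diagnose(log_excerpt)
    return state["seen"][key]


def load_state(path):
    """Read state from disk, or start empty on the first run."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {"seen": {}}


def save_state(path, state):
    """Persist state so the next cron run starts where this one ended."""
    Path(path).write_text(json.dumps(state))
```

Each cron run would call `load_state`, then `diagnose_once` for any failure it finds, then `save_state`, so a failure that persists for days is diagnosed once rather than on every run.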

🔥 The Meta-Finding

The system caught that its own improvement process was broken.

Retro finding: "April 6 retro had 6 proposals, 0 reviewed. Same issues repeating. The retro process identifies problems but has no mechanism to ensure follow-through."

→ The system watching itself, catching that it can't enforce its own improvements. That's genuine self-awareness in automation.

🏆 The Best Day

16 PRs merged, 0 reverts. Audio capture, playback, perf observability, tool dispatch, filler phrases, mic ducking, noise gate, CoreAudio fallbacks, auto-reconnect — all in one day. PM cron self-merged a PR: detected readiness, merged, built, deployed, verified — all autonomously.

Pipeline Scorecard — Honest Assessment

Strata → Ticket: ✅ Working

Ticket → Agent Pickup: ❌ Broken (manual trigger required)

Agent → PR: ✅ Working (5/5 sessions, 0 failures)

PR → Merge: ⚠️ Flaky (manual merges)

Merge → Ticket Done: ❌ Broken (tickets stuck In Progress)

Human Interventions: 8+ (target: 0)

The honest ❌ marks are the point. The system surfaces its own problems every day so they get fixed.

Architecture

The Complete Pipeline

💡 Idea

→

📋 Strata

→

✅ Approve

→

📐 Spec Kit

→

📌 Linear

→

⚡ Cyrus

→

🎯 Opus + Subagents

→

🔀 PR

→

🛡️ CI Gates

→

🔄 PM Cron Merge

→

🚀 Deploy

→

🔍 Retro

→

📝 Spec Updates ↩

Quality Gates — Every PR

✅ Compile

✅ Clippy (warnings = errors)

✅ Format

✅ Tests (parallel)

✅ Coverage ≥95%

📋 Mutation Testing

📋 AI Code Review

The AI doesn't decide if code is good. The pipeline decides.
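To make "the pipeline decides" concrete, here is a sketch of a fail-fast gate runner for the first four gates. The cargo commands are standard ones, but the exact flags this project uses are an assumption. The coverage gate is left out because the coverage tool isn't named. `GATES`, `run_command`, and `run_gates` are hypothetical names.

```python
import subprocess

# Assumed commands; adjust flags to match the real CI workflow.
GATES = [
    ("compile", ["cargo", "build", "--workspace"]),
    ("clippy", ["cargo", "clippy", "--workspace", "--", "-D", "warnings"]),
    ("format", ["cargo", "fmt", "--all", "--", "--check"]),
    ("tests", ["cargo", "test", "--workspace"]),
]


def run_command(cmd):
    """Run one gate command and return its exit code."""
    return subprocess.run(cmd).returncode


def run_gates(gates=GATES, run=run_command):
    """Run gates in order and stop at the first failure.

    Returns (passed, name_of_first_failed_gate_or_None).
    """
    for name, cmd in gates:
        if run(cmd) != 0:
            return False, name
    return True, None
```

Because the verdict comes from exit codes alone, the coding agent has no say in whether its own PR is acceptable.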

Economics

AI Pipeline: $160/month · 24/7 · Day 1 productive

Managed Platform: $600+/month · mediocre output

Junior Engineer: $10K+/month · 3-6 month ramp

Getting Started

Day 0: 4 Hours of Setup

First autonomous PR by end of day.

Linear workspace + project + labels

Issue tracking for your pipeline

30 minutes · Free plan

GitHub repo with CI workflow template

Compile, lint, test, coverage — automated from Day 1

1 hour · Template provided

Install the coding agent dispatcher

Connect Linear → GitHub → Claude Code

15 minutes · Any always-on machine

Cloudflare tunnel for webhook routing

Secure, permanent webhook endpoint

30 minutes · Free Cloudflare tunnel

Claude Max subscription

Flat-rate access shared by all agents (usage limits still apply)

5 minutes · ~$100/month flat rate

Write CLAUDE.md with subagent pattern

Operating instructions for your coding agent

1 hour · Template provided

🎉

Assign first test ticket

Your first autonomous PR

Your pipeline is live

WEEK 1

Add Teeth

Self-hosted CI runner, 95% coverage gate, mutation testing, PR review agent, cron monitoring every 30 minutes.

MONTH 1

Self-Maintaining

Architecture knowledge base, AI PM daily monitoring, retrospectives, agents maintaining agent infrastructure.

The Real Unlock

The bottleneck was never AI capability.

The models have been good enough for a year. What was missing was the glue: the webhook that triggers the agent, the subagent pattern that parallelizes work, the CI gate that enforces quality, the knowledge base that preserves conventions, and the AI PM that keeps it all running.

That glue costs $160/month and runs on a Mac mini.

The question isn't whether this is possible.
It's whether you're going to be the company that does it.

Paul Koch · Head of Technology · Blueprint Equity