Paul Koch ยท Blueprint Equity ยท April 2026

Building an AI-Native Development Pipeline

How I set up a state-of-the-art Agentic PDLC in two weeks โ€” with an AI assistant doing the research, evaluation, and implementation alongside me.

0
PRs Merged
0
Tickets Managed
$160
Monthly Cost
18m
Ticket โ†’ PR
Scroll to explore

Two Projects. One Pipeline. Zero Human-Written Code.

๐ŸŽ™๏ธ

morty-voice

Rust Voice AI Assistant

Wake word detection, real-time Gemini conversation, tool execution for calendar, lights, music, project management. Runs 24/7 on a Mac mini. 86+ PRs, 160+ tickets, full observability.

๐ŸŽญ

Stickshift

Interactive Scrollytelling Platform

Physics-based stick figure animations with Rapier.js WASM, scroll-driven scene orchestration, real-time text reflow. Same pipeline, different project, same agents.

Me: "I want to build a voice assistant that actually works. Wake word, real conversation, tool execution."
Morty: "Spinning up 3 parallel research agents. 12+ sources in 20 minutes."
"Research complete. 20 Linear issues created with research-backed descriptions. Ready for Cyrus to start coding."
โ†’ From idea to 20 actionable tickets in 20 minutes

Why Current AI Dev Pipelines Break

"The models are incredible. The problem is the orchestration."

0
Parallel sessions that burned through a 5-hour rate limit in 2 hours
11,867
Linear API error responses from agents hammering the same endpoints
50+
Manual human interventions required in a single sprint session
6 hrs
Lost to rate limits and process bugs on Day 1 alone
The pipeline looked automated on paper. It wasn't. It was me babysitting seven terminal windows, manually triggering the next step, copying context between sessions, and restarting failed processes.

What an Agentic PDLC Actually Looks Like

Human writes ticket
โ†’
AI writes code
โ†’
CI validates
โ†’
Human approves
๐Ÿง 

Morty

AI PM + DevOps

Monitors pipeline, cron orchestration, status reports, maintains living specification documents updated by retros.

โšก

Cyrus

Dispatcher / Developer

Receives Linear webhooks, spawns Claude Code sessions, routes work to the right context.

๐ŸŽฏ

Claude Opus

Tech Lead

Plans approach, delegates to 3-4 parallel subagents, verifies all output before commit.

๐Ÿ“‹

Strata

Product Manager

Evidence ร— Alignment, RICE scoring, pre-mortem analysis. The gate that prevents bad tickets.

๐Ÿ”„

PM Cron

Pipeline Heartbeat

Every 10 minutes: merge, build, test, diagnose, file tickets, trigger agents. 144 runs per day.

๐Ÿ”

Retro Agent

Self-Improvement Engine

Nightly analysis of entire pipeline. Classifies issues, proposes fixes, catches what humans miss.

$12.30
Cost per ticket
18 min
Ticket to PR
95%+
Code coverage
$160
Monthly cost

4 Hours to Autonomous

The real timeline from project kickoff to first autonomous PR.

9:00 AM
Project Kickoff
Evaluated Gemini's voice gateway architecture with AI assistant
9:30 AM
Deep Research โ€” 3 Parallel Agents
12+ sources synthesized in 20 minutes: OpenAI, Anthropic, Block, JetBrains, Linux Foundation
10:00 AM
Repo Scaffold
Rust workspace with 5 crates, CI pipeline, architecture docs. GitHub repo live.
10:30 AM
20 Linear Issues Created
Each with research-backed descriptions, acceptance criteria, and implementation notes
11:30 AM
Cyrus Deployed
Autonomous coding agent connected to Linear + GitHub. First webhook integration live.
12:05 PM
Morty Agent Built
@mentions in Linear โ†’ webhook โ†’ AI response. Full pipeline in 40 minutes.
12:15 PM
Infrastructure Permanent
Cloudflare tunnels, launchd services, auto-restart. Zero terminal windows needed.
1:15 PM
Report Auto-Deployed
44 issues, 6 services, 37 skills installed. Internal report generated and deployed to Cloudflare.
A traditional 4-person team would take 4-6 weeks for the same output. We did it in 4 hours with 1 person.

The Pipeline Gets Teeth

Four major failures surfaced. Each one taught us what autonomous systems actually need.

PROBLEM

Scope Creep

30+ tickets filed without evaluation. Agents wasted ~12 hours on low-value work.

FIX

Created Strata PM Agent. Every feature goes through Evidence ร— Alignment, RICE, pre-mortem.

PROBLEM

Missing Tests

100% of early PRs lacked integration tests. 3 PRs merged that failed in production.

FIX

Hard rule in CLAUDE.md: integration tests mandatory before every push.

PROBLEM

Board Chaos

50 tickets on Linear board. 28 stale research tasks. 9 duplicates. No clear priorities.

FIX

Cleanup: 50 โ†’ 13 relevant tickets. Added automated board hygiene to retro process.

PROBLEM

Manual Handoffs

2-4 hour delay from approval to coding. 100% required human intervention.

FIX

Automated brief โ†’ Spec Kit โ†’ tickets โ†’ Cyrus webhook. <5 minute routing.

Me: "No no why doesn't Cyrus run them though bro"
"It should really know if these things work you know"
Morty: "You're absolutely right. Cyrus should be running the integration tests as part of every PR before merging."
โ†’ Led to a permanent rule change in the agent's operating instructions

The Pipeline Improves Itself

Something unexpected happened. The system started catching its own problems.

๐Ÿ”

The Retro Agent

Nightly Pipeline Analysis

Every night, analyzes: Linear tickets, GitHub PRs, error logs, cron health, Strata decisions, and industry research. Produces structured report with classified issues and concrete proposals. By end of Week 2: 20+ spec changes applied in one bulk update.

Me: "I need to sleep. Every 5 minutes all night long, read the context and run the test suite."
Morty: [Sets up watchdog cron, overnight testing plan, integration tests]
Me (5:23 AM): "I'm not awake just wondering how it's going"
Morty: "Nothing's on fire. Go back to sleep. ๐ŸŒ™"
โ†’ The pipeline ran all night autonomously while Paul slept

How Agent Instructions Evolved

๐Ÿง  Morty โ€” PM + DevOps

Day 1: Basic orchestration assistant

Week 2: Full PM with living spec documents updated by daily retros

April 1 Rule: "NEVER write code. You are PM + DevOps. Cyrus is the developer. Non-negotiable."

Added after Morty wrote Rust code all day โ€” every fix got overwritten by Cyrus's PRs.

๐Ÿ“‹ Strata โ€” Product Manager

Born from: "You should really be using a process for these tickets"

Now: Every feature โ†’ Evidence ร— Alignment โ†’ RICE โ†’ Pre-mortem โ†’ Case-against โ†’ Product Brief

Updated to read project constitution + architecture before evaluating. 28KB specification document.

โšก Cyrus โ€” Developer

Pattern: Opus tech lead โ†’ 3-4 parallel subagents โ†’ verify โ†’ commit

Best day: 16 PRs merged, 0 reverts

Worst day: Idle 48 hours (webhook silently failed)

Both caught by retrospectives.

๐Ÿ”„ PM Cron โ€” The Heartbeat

Evolution:

Merge bot โ†’ Test monitor โ†’ Ticket filer โ†’ State manager โ†’ Duplicate preventer โ†’ Channel-aware

Each capability added after a real problem caught by retrospectives. 144 runs per day.

Me: "But you have to remember to tell Cyrus to use spec kit?"
Morty: "...you're right. The handoff is manual. Let me automate it so approved briefs route directly through the decomposition pipeline."
โ†’ One question exposed a gap โ€” and it was fixed the same day

Daily Retrospectives: The System Watches Itself

These findings would have persisted indefinitely without automated daily analysis.

๐Ÿ”ฅ The 27,397-Crash Saga

Voice gateway crash-looped every 5 seconds for 2 days. 19,659 crashes Day 1, 7,738 Day 2. USB mic unplugged, no circuit breaker.
Retro finding: "Circuit breaker proposed April 1, re-proposed April 5, re-proposed April 6, still doesn't exist April 7. This is a process failure โ€” proposals aren't being implemented."
โ†’ Led to: spec accountability rule โ€” all changes requiring tickets must be filed within 24 hours

๐Ÿ”ฅ The 48-Hour Silent Failure

Cyrus produced zero output for 48+ hours. Six tickets in Todo, zero PRs opened. Nobody noticed.
Retro finding: "No mechanism detects Cyrus idleness. Auth health checks verify capability, not productivity."
โ†’ Led to: productivity monitoring โ€” idle >4 hours with pending work triggers alert

๐Ÿ”ฅ The 8.4M Token Burn

PM Cron running stateless: 250K tokens per run ร— 48 daily runs = 8.4M tokens wasted rediscovering the same failure.
Retro finding: "PM Cron has no memory. State persistence would eliminate 95% of this waste."
โ†’ Led to: token usage down 95%, from 8.4M to 400K daily

๐Ÿ”ฅ The Meta-Finding

The system caught that its own improvement process was broken.
Retro finding: "April 6 retro had 6 proposals, 0 reviewed. Same issues repeating. The retro process identifies problems but has no mechanism to ensure follow-through."
โ†’ The system watching itself, catching that it can't enforce its own improvements. That's genuine self-awareness in automation.

๐Ÿ† The Best Day

16 PRs merged, 0 reverts. Audio capture, playback, perf observability, tool dispatch, filler phrases, mic ducking, noise gate, CoreAudio fallbacks, auto-reconnect โ€” all in one day. PM cron self-merged a PR: detected readiness, merged, built, deployed, verified โ€” all autonomously.

Pipeline Scorecard โ€” Honest Assessment

Strata โ†’ Ticketโœ… Working
Ticket โ†’ Agent PickupโŒ Broken โ€” manual trigger required
Agent โ†’ PRโœ… Working โ€” 5/5 sessions, 0 failures
PR โ†’ Mergeโš ๏ธ Flaky โ€” manual merges
Merge โ†’ Ticket DoneโŒ Broken โ€” tickets stuck In Progress
Human Interventions8+ (target: 0)

The honest โŒ marks are the point. The system surfaces its own problems every day so they get fixed.

The Complete Pipeline

๐Ÿ’ก Idea
โ†’
๐Ÿ“‹ Strata
โ†’
โœ… Approve
โ†’
๐Ÿ“ Spec Kit
โ†’
๐Ÿ“Œ Linear
โ†’
โšก Cyrus
โ†’
๐ŸŽฏ Opus + Subagents
โ†’
๐Ÿ”€ PR
โ†’
๐Ÿ›ก๏ธ CI Gates
โ†’
๐Ÿ”„ PM Cron Merge
โ†’
๐Ÿš€ Deploy
โ†’
๐Ÿ” Retro
โ†’
๐Ÿ“ Spec Updates โ†ฉ

Quality Gates โ€” Every PR

โœ… Compile
โœ… Clippy (warnings = errors)
โœ… Format
โœ… Tests (parallel)
โœ… Coverage โ‰ฅ95%
๐Ÿ“‹ Mutation Testing
๐Ÿ“‹ AI Code Review

The AI doesn't decide if code is good. The pipeline decides.

Economics

AI Pipeline
$160
/month ยท 24/7 ยท Day 1 productive
Managed Platform
$600+
/month ยท mediocre output
Junior Engineer
$10K+
/month ยท 3-6 month ramp

Day 0: 4 Hours of Setup

First autonomous PR by end of day.

1

Linear workspace + project + labels

Issue tracking for your pipeline

30 minutes ยท Free plan
2

GitHub repo with CI workflow template

Compile, lint, test, coverage โ€” automated from Day 1

1 hour ยท Template provided
3

Install the coding agent dispatcher

Connect Linear โ†’ GitHub โ†’ Claude Code

15 minutes ยท Any always-on machine
4

Cloudflare tunnel for webhook routing

Secure, permanent webhook endpoint

30 minutes ยท Free Cloudflare tunnel
5

Claude Max subscription

Unlimited access for all agents

5 minutes ยท ~$100/month flat rate
6

Write CLAUDE.md with subagent pattern

Operating instructions for your coding agent

1 hour ยท Template provided
๐ŸŽ‰

Assign first test ticket

Your first autonomous PR

Your pipeline is live
WEEK 1

Add Teeth

Self-hosted CI runner, 95% coverage gate, mutation testing, PR review agent, cron monitoring every 30 minutes.

MONTH 1

Self-Maintaining

Architecture knowledge base, AI PM daily monitoring, retrospectives, agents maintaining agent infrastructure.

The bottleneck was never AI capability.

The models have been good enough for a year. What was missing was the glue: the webhook that triggers the agent, the subagent pattern that parallelizes work, the CI gate that enforces quality, the knowledge base that preserves conventions, and the AI PM that keeps it all running.
That glue costs $160/month and runs on a Mac mini.
The question isn't whether this is possible.
It's whether you're going to be the company that does it.

Paul Koch ยท Head of Technology ยท Blueprint Equity