The Plan: AI Implementation Architect Portfolio


In early 2026, Nate B. Jones published an analysis of hundreds of AI job postings and identified a specific pattern: the AI job market had split. Generalist “AI users” were competing for commodity roles. A much smaller group, people who could design, build, operate, and validate agentic systems, was getting $280K–$400K+ offers, and those roles took an average of 142 days to fill because there simply weren’t enough qualified candidates.

He distilled the findings into seven skills that employers were explicitly hiring for. My collaborator Dan Lee and I mapped those skills against what I’d already built and what was missing, and turned the gap analysis into a build plan.

The gap analysis

Skill | Status going in
Specification Precision | Strong (already natural)
Evaluation & Quality Judgment | Gap (biggest priority)
Multi-Agent Task Decomposition | Theory only (not built)
Failure Pattern Recognition | Partial (not explicit)
Trust & Security Design | Strong (agent-lab guardrails)
Context Architecture | Strong (OpenBrain is this)
Cost & Token Economics | Foundation exists, incomplete

The four projects

Project 1 — Eval Harness for OpenBrain (addresses: Evaluation & Quality Judgment)

A systematic eval framework measuring whether OpenBrain’s AI responses are actually correct — not just fluent. Two independent judges (Claude Sonnet + GPT-4o), hallucination detection, longitudinal tracking. Nate’s framing: “Resisting the temptation to read fluency as correctness.” Baseline: 0.950 average fidelity across 25 test cases, 98/100 query pass rate.
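
To make the dual-judge idea concrete, here is a minimal sketch of how per-case scores from two judges could be combined into an average fidelity and a pass rate. It is not the OpenBrain harness itself: the names `EvalCase`, `run_eval`, and the 0.8 threshold are assumptions for illustration, and the two stub judges stand in for the real LLM calls (Claude Sonnet and GPT-4o), which are not shown.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence

# A judge maps (question, answer, reference) to a fidelity score in [0, 1].
# In a real harness each judge would wrap an LLM call; here they are stubs.
Judge = Callable[[str, str, str], float]


@dataclass(frozen=True)
class EvalCase:
    question: str
    answer: str      # response under test
    reference: str   # ground truth the answer must agree with


@dataclass(frozen=True)
class EvalReport:
    average_fidelity: float
    pass_rate: float


def run_eval(cases: Sequence[EvalCase], judges: Sequence[Judge],
             threshold: float = 0.8) -> EvalReport:
    """Score every case with every judge and aggregate the results."""
    if not cases or not judges:
        raise ValueError("need at least one case and one judge")
    case_scores = []
    passed = 0
    for case in cases:
        scores = [j(case.question, case.answer, case.reference) for j in judges]
        case_scores.append(mean(scores))
        # A case passes only if EVERY judge clears the threshold, so a single
        # lenient judge cannot carry a fluent but wrong answer.
        if min(scores) >= threshold:
            passed += 1
    return EvalReport(mean(case_scores), passed / len(cases))


def exact_judge(question: str, answer: str, reference: str) -> float:
    """Stub judge: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if answer.strip().lower() == reference.strip().lower() else 0.0


def overlap_judge(question: str, answer: str, reference: str) -> float:
    """Stub judge: fraction of reference tokens that appear in the answer."""
    ref_tokens = set(reference.lower().split())
    ans_tokens = set(answer.lower().split())
    return len(ref_tokens & ans_tokens) / len(ref_tokens) if ref_tokens else 0.0
```

Requiring agreement from both judges is one possible design choice; averaging the two scores before thresholding would be more forgiving of a single strict judge.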

Project 2 — Multi-Agent Infrastructure Automation (addresses: Multi-Agent Task Decomposition)

A real working 3-agent system: Planner → Terraform sub-agent → Validator sub-agent. Explicit Pydantic handoff contracts, deterministic state machine, LLM only on failure. Deployed against live Proxmox infrastructure. ✅ Complete — exceeded scope.
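
The control flow described above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the deployed system: plain dataclasses with `__post_init__` checks are swapped in for the Pydantic contracts so the example runs on the standard library alone, the contract fields (`resource_name`, `memory_mb`, `applied`) are hypothetical, and `diagnose` marks the single place where an LLM call would go.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class State(Enum):
    PLAN = "plan"
    APPLY = "apply"
    VALIDATE = "validate"
    DONE = "done"
    FAILED = "failed"


@dataclass(frozen=True)
class PlanHandoff:
    """Planner -> Terraform agent contract (hypothetical fields)."""
    resource_name: str
    memory_mb: int

    def __post_init__(self) -> None:
        # Reject malformed handoffs at the boundary, as a schema model would.
        if not self.resource_name:
            raise ValueError("resource_name must be non-empty")
        if self.memory_mb <= 0:
            raise ValueError("memory_mb must be positive")


@dataclass(frozen=True)
class ApplyHandoff:
    """Terraform agent -> Validator contract (hypothetical fields)."""
    resource_name: str
    applied: bool


def run_pipeline(
    request: dict,
    planner: Callable[[dict], PlanHandoff],
    applier: Callable[[PlanHandoff], ApplyHandoff],
    validator: Callable[[ApplyHandoff], bool],
    diagnose: Callable[[State, Exception], None],
) -> list:
    """Walk PLAN -> APPLY -> VALIDATE deterministically; return the state trace."""
    state = State.PLAN
    trace = [state]
    plan = None
    result = None
    while state not in (State.DONE, State.FAILED):
        try:
            if state is State.PLAN:
                plan = planner(request)
                next_state = State.APPLY
            elif state is State.APPLY:
                result = applier(plan)
                next_state = State.VALIDATE
            else:  # State.VALIDATE
                if not validator(result):
                    raise RuntimeError("validation failed")
                next_state = State.DONE
        except Exception as exc:
            # The only branch where a model would be consulted.
            diagnose(state, exc)
            next_state = State.FAILED
        state = next_state
        trace.append(state)
    return trace
```

On the happy path the trace is always PLAN, APPLY, VALIDATE, DONE and `diagnose` is never called, which is what keeps the run deterministic and cheap.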

Project 3 — Failure Pattern Detection Dashboard (addresses: Failure Pattern Recognition)

Explicit instrumentation for the six AI failure modes Nate identified: context degradation, specification drift, sycophantic confirmation, tool selection errors, cascading failure, and silent failure. Prometheus counters + Grafana dashboard + failure injection test suite. Not started.
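
Since this project is not started, the following is only a sketch of what the instrumentation layer might look like. It uses an in-memory `collections.Counter` as a stand-in for real Prometheus counters so that it runs without dependencies, and renders the counts in the Prometheus text exposition format. The metric name `ai_failures_total`, the `mode` label, and the `FailureCounters` class are assumptions made for illustration.

```python
from collections import Counter
from enum import Enum


class FailureMode(Enum):
    """The six failure modes named in the plan."""
    CONTEXT_DEGRADATION = "context_degradation"
    SPECIFICATION_DRIFT = "specification_drift"
    SYCOPHANTIC_CONFIRMATION = "sycophantic_confirmation"
    TOOL_SELECTION_ERROR = "tool_selection_error"
    CASCADING_FAILURE = "cascading_failure"
    SILENT_FAILURE = "silent_failure"


class FailureCounters:
    """In-memory stand-in for one labelled Prometheus counter."""

    def __init__(self, metric_name: str = "ai_failures_total") -> None:
        self.metric_name = metric_name
        # Start every mode at zero so each series exists before its first failure.
        self._counts = Counter({mode: 0 for mode in FailureMode})

    def record(self, mode: FailureMode, amount: int = 1) -> None:
        """Increment the counter for one failure mode; counters never decrease."""
        if amount < 0:
            raise ValueError("counters can only increase")
        self._counts[mode] += amount

    def count(self, mode: FailureMode) -> int:
        return self._counts[mode]

    def render(self) -> str:
        """Render all series in Prometheus text exposition format."""
        lines = [f"# TYPE {self.metric_name} counter"]
        for mode in FailureMode:
            lines.append(
                f'{self.metric_name}{{mode="{mode.value}"}} {self._counts[mode]}'
            )
        return "\n".join(lines) + "\n"
```

A failure injection test could then trigger one mode deliberately, call `record`, and assert that only the matching series moved.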

Project 4 — Token Economics Calculator (addresses: Cost & Token Economics)

A Python CLI that projects AI task costs across models before committing to a run, calibrated against actual agent-lab telemetry. Not started.
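
As a rough sketch of the core calculation, assuming per-million-token pricing and a single calibration multiplier derived from telemetry: the model names and prices below are placeholders, not real vendor pricing, and `project_cost`, `compare_models`, and the `--calibration` flag are hypothetical names chosen for this example.

```python
import argparse
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelPrice:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


# Placeholder prices for illustration only; NOT real vendor pricing.
PRICES = {
    "model-a": ModelPrice(1.00, 2.00),
    "model-b": ModelPrice(4.00, 8.00),
}


def project_cost(input_tokens: int, output_tokens: int, price: ModelPrice,
                 calibration: float = 1.0) -> float:
    """Projected USD cost; calibration scales the estimate toward observed usage."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    raw = (input_tokens * price.input_per_mtok
           + output_tokens * price.output_per_mtok) / 1_000_000
    return raw * calibration


def compare_models(input_tokens: int, output_tokens: int,
                   calibration: float = 1.0) -> list:
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = {
        name: project_cost(input_tokens, output_tokens, price, calibration)
        for name, price in PRICES.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Project task cost per model.")
    parser.add_argument("--input-tokens", type=int, required=True)
    parser.add_argument("--output-tokens", type=int, required=True)
    parser.add_argument("--calibration", type=float, default=1.0,
                        help="observed/estimated token ratio from telemetry")
    args = parser.parse_args(argv)
    for name, cost in compare_models(args.input_tokens, args.output_tokens,
                                     args.calibration):
        print(f"{name}: ${cost:.4f}")
    return 0
```

Calling `main(["--input-tokens", "1000000", "--output-tokens", "500000"])` prints one line per model, cheapest first. A fuller version would likely calibrate input and output tokens separately, since agent runs tend to drift differently on each side.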

Where things stand (2026-04-01)

Week 2 (the multi-agent project, Project 2) is complete and exceeded its planned scope. The eval harness for OpenBrain is ~90% done. The agent eval harness built on 2026-04-01 is a direct descendant of both: a fixed scenario suite for model comparison that’s now part of the standard toolkit for any future model upgrade.

Projects 3 and 4 are next. Because the Prometheus infrastructure from Projects 1 and 2 is already in place, Project 3 is mostly instrumentation and visualization work on top of an existing foundation.

The portfolio framing when complete: “I built a production RAG system, instrumented it with systematic evals, extended the infrastructure agent to multi-agent orchestration, explicitly detect and classify failure modes in real time, and can model the economics of any of it before spending a token.”

That’s not a candidate who uses AI. That’s a candidate who builds AI systems — and can prove they work.