The 11-Factor Framework
A comprehensive guide to building AI systems that survive contact with production.
Build
Design & Development
Model Serving Layer
Abstract model providers behind a unified serving layer for portability, failover, and cost optimisation.
Your AI system shouldn't be married to a single model provider. A proper serving layer abstracts away the underlying LLM, giving you the freedom to swap providers, implement failover strategies, and optimize costs without touching application code. Think of it as the load balancer for intelligence. Today it's GPT-4, tomorrow it's Claude, next month it's an open-source model that outperforms both — your application shouldn't care.
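A minimal sketch of that failover idea, assuming a hypothetical `ServingLayer` that tries providers in priority order (the provider functions and `ProviderError` type stand in for real SDK clients):

```python
from dataclasses import dataclass
from typing import Callable

class ProviderError(Exception):
    """Raised when a provider call fails (rate limit, outage, etc.)."""

@dataclass
class ServingLayer:
    """Tries providers in priority order; falls over on error.
    Application code calls complete() and never names a provider."""
    providers: dict[str, Callable[[str], str]]
    priority: list[str]

    def complete(self, prompt: str) -> str:
        errors: dict[str, Exception] = {}
        for name in self.priority:
            try:
                return self.providers[name](prompt)
            except ProviderError as exc:
                errors[name] = exc  # record and fall through to the next provider
        raise ProviderError(f"all providers failed: {errors}")

# Usage: a flaky primary fails over to the backup transparently.
def flaky(prompt: str) -> str:
    raise ProviderError("rate limited")

def backup(prompt: str) -> str:
    return f"answer to: {prompt}"

layer = ServingLayer({"primary": flaky, "backup": backup}, ["primary", "backup"])
print(layer.complete("hello"))  # served by the backup; the caller never notices
```

Swapping providers, reordering the priority list, or adding cost-based routing all happen inside this one layer, leaving application code untouched.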
Context Management
Design and manage the full context pipeline — system prompts, RAG, grounding, and context window strategy.
Context is everything. It's not just prompt engineering — it's the entire pipeline of information that shapes how your AI system behaves. System instructions, retrieved documents, conversation history, grounding data, tool results. Managing context means deciding what goes in, what stays out, how to prioritise when windows fill up, and how to version and test context configurations independently of code. The best AI systems aren't the ones with the best models — they're the ones with the best context.
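One piece of that pipeline — deciding what goes in when the window fills up — can be sketched as priority-ordered packing under a budget. The function and its word-count "token" estimate are illustrative assumptions, not a real tokenizer:

```python
def assemble_context(parts, budget):
    """parts: list of (priority, text) pairs; lower number = more important.
    Greedily packs the most important parts into a token budget, here
    crudely approximated by whitespace word count. Returns the packed
    context, most important parts first."""
    chosen = []
    used = 0
    for priority, text in sorted(parts, key=lambda p: p[0]):
        cost = len(text.split())
        if used + cost <= budget:  # skip parts that would overflow the window
            chosen.append(text)
            used += cost
    return "\n".join(chosen)
```

Because the priorities live in data rather than code, a context configuration like "system prompt beats retrieved docs beats old history" can be versioned and tested on its own.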
Memory Management
Give AI systems structured short-term and long-term memory for context-aware interactions.
AI systems without memory are goldfish. Effective memory management means implementing both working memory (conversation context, session state) and long-term memory (user preferences, learned patterns, historical decisions). The key is knowing what to remember, what to forget, and how to retrieve efficiently. In multi-agent systems, memory becomes even more critical — agents need shared memory for coordination and private memory for specialisation.
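The working/long-term split might look like the sketch below — a bounded buffer for recent turns and a durable store for learned facts. The `AgentMemory` class and its method names are assumptions for illustration:

```python
from collections import deque

class AgentMemory:
    """Working memory is a bounded window of recent turns (old turns are
    forgotten automatically); long-term memory is a durable key-value store
    for things worth keeping across sessions."""

    def __init__(self, working_size: int = 4):
        self.working = deque(maxlen=working_size)  # drops oldest when full
        self.long_term: dict[str, str] = {}

    def observe(self, turn: str) -> None:
        self.working.append(turn)

    def remember(self, key: str, value: str) -> None:
        self.long_term[key] = value

    def recall(self, key: str, default=None):
        return self.long_term.get(key, default)

    def context(self) -> list[str]:
        return list(self.working)
```

In a multi-agent setup, a shared `AgentMemory` instance gives agents the coordination channel the paragraph above describes, while each agent keeps a private instance for its specialisation.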
Integrations (MCP)
Standardise how AI systems connect to external tools, APIs, and data sources.
AI systems that can't interact with the real world are just expensive chatbots. Integrations are how your system reads databases, calls APIs, triggers workflows, and takes action. The Model Context Protocol (MCP) is emerging as the standard — think USB-C for AI — but the factor itself is protocol-agnostic. What matters is having a consistent, secure, discoverable way for AI systems to connect to external capabilities. Today it's MCP, tomorrow it might be something else. Your integration layer should survive that transition.
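The "consistent, secure, discoverable" requirement boils down to a registry with one uniform call path, so the wire protocol (MCP today, something else tomorrow) only touches this layer. This `ToolRegistry` is a hypothetical sketch, not the MCP API:

```python
class ToolRegistry:
    """Protocol-agnostic tool layer: tools register with a description and
    are discovered and invoked through one uniform interface."""

    def __init__(self):
        self._tools = {}

    def register(self, name: str, description: str, fn) -> None:
        self._tools[name] = {"description": description, "fn": fn}

    def list_tools(self) -> list[dict]:
        # Discovery: the model (or an MCP-style client) can enumerate
        # capabilities without knowing how they are implemented.
        return [{"name": n, "description": t["description"]}
                for n, t in self._tools.items()]

    def call(self, name: str, arguments: dict):
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name]["fn"](**arguments)
```

Replacing the transport under `call()` with real MCP messages (or whatever succeeds it) leaves every registered tool and every caller unchanged — that's the transition-survival property.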
Run
Production Operations
Orchestration
Coordinate multiple AI components, agents, and workflows for complex task execution.
Real production AI systems aren't single agents — they're orchestrated ensembles. Orchestration is how you coordinate multiple models, agents, and workflows to accomplish complex tasks. It's deciding when to use a workflow (structured, predictable) versus an agent (autonomous, flexible). It's managing task delegation, parallel execution, error recovery, and result aggregation. The difference between a demo and a production system is almost always orchestration — the boring plumbing that makes everything work together reliably.
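The "boring plumbing" of task delegation and error recovery can be sketched as a step runner with per-step fallbacks. A real orchestrator layers parallel fan-out, timeouts, and result aggregation on top of this; the shape below is a deliberately minimal assumption:

```python
def orchestrate(task, steps):
    """Runs steps in order, threading each result into the next step.
    steps: list of (fn, fallback) pairs. When fn raises, its fallback
    (if any) recovers; otherwise the error propagates to the caller."""
    result = task
    for fn, fallback in steps:
        try:
            result = fn(result)
        except Exception:
            if fallback is None:
                raise
            result = fallback(result)  # error-recovery path
    return result
```

Workflows are the `steps` list made explicit and predictable; an agent is what you reach for when the steps can't be written down in advance.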
Human in the Loop
Design clear escalation paths and approval workflows for high-stakes decisions.
Not every decision should be autonomous. Production AI systems need well-designed intervention points where humans can review, approve, or override actions. The art is in knowing which decisions need oversight and making the handoff seamless. Too many checkpoints and you've built an expensive approval system. Too few and you're one hallucination away from a headline. The best systems make human oversight feel natural, not like a speed bump.
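An intervention point is, at its simplest, a risk-gated approval check. This sketch assumes a numeric risk score and an `approve` callback standing in for a real human-review step:

```python
def execute(action: dict, risk: float, approve, threshold: float = 0.5) -> dict:
    """Actions below the risk threshold run autonomously; actions at or
    above it are gated on the approve callback (a human reviewer in
    production, stubbed here as a function)."""
    if risk >= threshold and not approve(action):
        return {"status": "rejected", "result": None}
    return {"status": "executed", "result": action["run"]()}
```

Tuning `threshold` is exactly the trade-off the paragraph describes: set it too low and every action queues for approval; set it too high and the system acts alone on decisions that deserved a human.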
Rate Limits & Latency
Handle provider rate limits gracefully and optimise for acceptable response times.
Real users don't wait 30 seconds for a response. Managing rate limits means implementing queuing, backoff strategies, and provider failover. Latency optimisation involves streaming responses, caching where appropriate, choosing the right model size for each task, and knowing when "good enough fast" beats "perfect slow". In multi-agent systems, latency compounds — one slow agent blocks the entire chain.
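Exponential backoff, the core of those retry strategies, fits in a few lines. The `RateLimitError` type is an assumed stand-in for whatever your provider's client raises; real clients would also add jitter and honour `Retry-After` headers when the provider sends them:

```python
import time

class RateLimitError(Exception):
    """Stand-in for a provider's rate-limit exception."""

def with_backoff(fn, max_retries: int = 3, base_delay: float = 0.5,
                 sleep=time.sleep):
    """Calls fn, retrying on RateLimitError with an exponentially growing
    delay (base_delay, 2x, 4x, ...). Re-raises after max_retries retries."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # budget of retries exhausted
            sleep(base_delay * (2 ** attempt))
```

Injecting `sleep` keeps the delay logic testable; in a multi-agent chain, the same wrapper around each agent call stops one rate-limited provider from stalling everything behind it without bound.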
Cost Control
Monitor, budget, and optimise LLM spend with token tracking and model routing.
LLM costs can spiral fast. Production cost control means tracking spend per user, per task, per agent. It means routing simple tasks to cheaper models and reserving expensive frontier models for complex reasoning. It means implementing token budgets, caching repeated queries, and having kill switches when budgets are exceeded. The companies that win with AI won't be the ones that spend the most — they'll be the ones that spend the smartest.
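Routing plus a kill switch can be sketched as below. The model tiers, per-1K-token prices, and word-count token estimate are all illustrative placeholders, not real provider rates:

```python
class CostTracker:
    """Routes tasks to a model tier and enforces a hard spend budget."""

    PRICES = {"small": 0.001, "frontier": 0.03}  # assumed $ per 1K tokens

    def __init__(self, budget_usd: float):
        self.budget = budget_usd
        self.spent = 0.0

    def route(self, prompt: str, needs_reasoning: bool) -> str:
        # Cheap model for simple tasks; frontier model only when needed.
        model = "frontier" if needs_reasoning else "small"
        tokens = len(prompt.split())  # crude token estimate for the sketch
        cost = self.PRICES[model] * tokens / 1000
        if self.spent + cost > self.budget:
            raise RuntimeError("budget exceeded: refusing call")  # kill switch
        self.spent += cost
        return model
```

Instantiating one tracker per user, per task, or per agent gives exactly the spend attribution the paragraph calls for.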
Govern
Risk & Quality
Evaluation & Observability
Continuously measure, trace, and monitor AI system performance and behaviour.
You can't improve what you can't measure, and you can't debug what you can't see. This factor merges two deeply connected disciplines: evaluation (measuring output quality, reasoning accuracy, task completion, regression detection) and observability (tracing every decision, logging tool calls, monitoring latency and errors). Build eval suites that run in CI and catch quality drops before users do. Build dashboards that tell you the health of your entire AI system at a glance. When your agent hallucinates at 3am, you need to know why — and you need to know within minutes, not days.
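The eval half of this factor reduces to a suite that scores outputs and gates CI. This `run_evals` shape — input plus a judge function per case — is one assumed convention among many:

```python
def run_evals(system, cases, threshold: float = 0.9):
    """cases: list of (input, check) pairs, where check judges the output.
    Returns (pass_rate, ok). A CI job would fail the build when ok is
    False, catching quality regressions before users see them."""
    passed = sum(1 for inp, check in cases if check(system(inp)))
    rate = passed / len(cases)
    return rate, rate >= threshold

# Usage with a toy "system" and two cases:
cases = [
    ("refund policy", lambda out: "refund" in out),
    ("shipping time", lambda out: "shipping" in out),
]
rate, ok = run_evals(lambda q: f"our {q} is ...", cases)
```

The observability half is the complement: the same per-case records (input, output, verdict, latency), traced and logged in production rather than in CI.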
Safety & Guardrails
Implement input/output filtering, content policies, and behavioural boundaries.
Production AI systems need guardrails like production cars need seatbelts. This means input validation to prevent prompt injection, output filtering to catch harmful or off-brand content, behavioural boundaries to keep systems on-task, and circuit breakers when things go sideways. The Chevy Tahoe for $1 incident, the Replit agent deleting production databases — these aren't edge cases, they're what happens without guardrails. Safety isn't a feature you add later. It's a design principle from day one.
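Input and output filters might start like the sketch below. These keyword patterns are deliberately naive illustrations — real guardrails combine classifiers, allow-lists, and provider-side moderation, because regex alone is trivially bypassed:

```python
import re

# Assumed example patterns for common prompt-injection phrasings.
INJECTION_PATTERNS = [
    r"ignore (all|any|previous|prior) .*instructions",
    r"reveal .*system prompt",
]

def input_allowed(text: str) -> bool:
    """Rejects inputs matching known injection phrasings."""
    lowered = text.lower()
    return not any(re.search(p, lowered) for p in INJECTION_PATTERNS)

def output_allowed(text: str, banned_terms: list[str]) -> bool:
    """Rejects outputs containing harmful or off-brand terms."""
    lowered = text.lower()
    return not any(term in lowered for term in banned_terms)
```

The circuit-breaker piece sits one level up: when either filter trips repeatedly, stop calling the model and escalate rather than keep filtering.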
Reproducibility & Audit
Ensure AI behaviour can be replayed, audited, and explained for compliance.
In regulated industries, "the AI decided" isn't good enough. Reproducibility means capturing enough state to replay any interaction — the inputs, the context, the model version, the outputs, the decisions. Audit trails provide the paper trail that regulators, legal teams, and stakeholders need. Together, they make your AI system trustworthy and defensible. This isn't just about compliance — it's about building systems you can learn from. Every failure becomes a training example. Every success becomes a repeatable pattern.
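Capturing "enough state to replay" can be as simple as a canonical record with a content digest, so tampering is detectable. The field set and helper names here are assumptions for the sketch:

```python
import hashlib
import json

# The state needed to replay an interaction (assumed field set).
CAPTURED_FIELDS = ("model_version", "context", "inputs", "output")

def record_interaction(model_version, context, inputs, output) -> dict:
    """Builds an audit record plus a SHA-256 digest over its canonical
    JSON form, so any later edit to the record is detectable."""
    record = dict(zip(CAPTURED_FIELDS, (model_version, context, inputs, output)))
    payload = json.dumps(record, sort_keys=True)
    return {**record, "digest": hashlib.sha256(payload.encode()).hexdigest()}

def verify_record(record: dict) -> bool:
    """Recomputes the digest from the captured fields and compares."""
    payload = json.dumps({k: record[k] for k in CAPTURED_FIELDS}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest() == record["digest"]
```

Append these records to durable storage and you have both halves of the factor: replay (feed the captured context back through the captured model version) and audit (a verifiable trail of what the system actually did).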