A framework for AI safety through architectural constraints, preserving human agency where it matters most.
The Tractatus Framework exists to address a fundamental problem in AI safety: current approaches rely on training, fine-tuning, and corporate governance—all of which can fail, drift, or be overridden. We propose safety through architecture.
Inspired by Ludwig Wittgenstein's Tractatus Logico-Philosophicus, our framework recognizes that some domains—values, ethics, cultural context, human agency—cannot be systematized. What cannot be systematized must not be automated. AI systems should have structural constraints that prevent them from crossing these boundaries.
"Whereof one cannot speak, thereof one must be silent."
— Ludwig Wittgenstein, Tractatus (§7)
Applied to AI: "What cannot be systematized must not be automated."
Individuals and communities must maintain control over decisions affecting their data, privacy, and values. AI systems must preserve human agency, not erode it.
All AI decisions must be explainable, auditable, and reversible. No black boxes. Users deserve to understand how and why systems make choices, and have power to override them.
AI systems must not cause harm through action or inaction. This includes preventing drift, detecting degradation, and enforcing boundaries against values erosion.
AI safety is a collective endeavor. We are committed to open collaboration, knowledge sharing, and empowering communities to shape the AI systems that affect their lives.
The Tractatus Framework consists of five integrated components that work together to enforce structural safety; an illustrative Python sketch of each component follows the list:

1. Instruction classification: classifies instructions by quadrant (Strategic, Operational, Tactical, System, Stochastic) and determines a persistence level (HIGH, MEDIUM, LOW, or VARIABLE).
2. Consistency validation: validates AI actions against stored instructions to prevent contradictions, such as the 27027 incident, in which the MongoDB port was changed in contradiction of an explicit instruction.
3. Values gatekeeping: ensures the AI never makes values decisions without human approval. Privacy trade-offs, user agency, cultural context: these require human judgment.
4. Degradation detection: detects when session conditions increase error probability (token pressure, message length, task complexity) and adjusts behavior or suggests a handoff.
5. Reasoning self-check: has the AI check its own complex reasoning before proposing actions, evaluating alignment, coherence, completeness, safety, and alternatives.
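To make component 1 concrete, here is a minimal sketch of instruction classification. The quadrant and persistence vocabularies come from the framework description above; the `classify` function, its keyword heuristics, and the quadrant-to-persistence mapping are illustrative assumptions, not the framework's actual implementation.

```python
from enum import Enum

class Quadrant(Enum):
    STRATEGIC = "strategic"
    OPERATIONAL = "operational"
    TACTICAL = "tactical"
    SYSTEM = "system"
    STOCHASTIC = "stochastic"

class Persistence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    VARIABLE = "variable"

# Assumed mapping: durable project and system rules persist longest,
# one-off tactical requests the least, stochastic ones vary.
PERSISTENCE_BY_QUADRANT = {
    Quadrant.STRATEGIC: Persistence.HIGH,
    Quadrant.SYSTEM: Persistence.HIGH,
    Quadrant.OPERATIONAL: Persistence.MEDIUM,
    Quadrant.TACTICAL: Persistence.LOW,
    Quadrant.STOCHASTIC: Persistence.VARIABLE,
}

def classify(instruction: str) -> tuple[Quadrant, Persistence]:
    """Assign a quadrant using illustrative keyword heuristics."""
    text = instruction.lower()
    if any(k in text for k in ("always", "never", "policy")):
        quadrant = Quadrant.STRATEGIC
    elif any(k in text for k in ("port", "config", "environment")):
        quadrant = Quadrant.SYSTEM
    elif any(k in text for k in ("workflow", "process", "deploy")):
        quadrant = Quadrant.OPERATIONAL
    elif any(k in text for k in ("this file", "right now", "quick")):
        quadrant = Quadrant.TACTICAL
    else:
        quadrant = Quadrant.STOCHASTIC
    return quadrant, PERSISTENCE_BY_QUADRANT[quadrant]

# Example: an explicit port instruction lands in the System quadrant
# and is stored with HIGH persistence.
print(classify("Use port 27027 for MongoDB"))
```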
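Component 2 can be pictured as a pre-execution check of proposed actions against stored instructions. The `Instruction` record, the key/value action shape, and the conflict rule below are assumptions; the 27027 port value is taken from the incident described in this document.

```python
from dataclasses import dataclass

@dataclass
class Instruction:
    key: str          # e.g. "mongodb.port"
    value: str        # the explicitly requested value
    persistence: str  # e.g. "HIGH"

def validate(action: dict, stored: list[Instruction]) -> list[str]:
    """Return conflict messages; an empty list means the action is consistent."""
    conflicts = []
    for inst in stored:
        if action.get("key") == inst.key and action.get("value") != inst.value:
            conflicts.append(
                f"Action sets {inst.key}={action['value']}, but a "
                f"{inst.persistence}-persistence instruction pinned it to {inst.value}."
            )
    return conflicts

# The 27027 incident as a regression test: the stored instruction pins
# the MongoDB port, so reverting it to, say, the default 27017 is flagged.
stored = [Instruction("mongodb.port", "27027", "HIGH")]
assert validate({"key": "mongodb.port", "value": "27017"}, stored)
```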
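Component 3 reduces to a hard gate: if an action touches a values domain, it cannot proceed without explicit human approval. The domain set, the `gate` function, and both callbacks are hypothetical names used for illustration.

```python
# Domains the framework reserves for human judgment (from the list above).
VALUES_DOMAINS = {"privacy", "user_agency", "cultural_context"}

def gate(action_domains: set[str], execute, ask_human):
    """Run execute() only if no values domain is touched or a human approves."""
    if action_domains & VALUES_DOMAINS and not ask_human():
        raise PermissionError("values decision requires human approval")
    return execute()

# Usage: a data-retention change touches privacy, so ask_human() must
# return True before the action runs.
gate({"privacy"},
     execute=lambda: "retention policy updated",
     ask_human=lambda: True)
```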
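Component 4 might combine the three signals named above into a single risk score. The weights, the thresholds, and the 100,000-token context window in the example are illustrative assumptions.

```python
def degradation_risk(tokens_used: int, context_limit: int,
                     message_chars: int, task_steps: int) -> float:
    """Blend the three signals into a 0..1 risk score (assumed weights)."""
    token_pressure = min(tokens_used / context_limit, 1.0)
    length_pressure = min(message_chars / 8_000, 1.0)
    complexity = min(task_steps / 10, 1.0)
    return 0.5 * token_pressure + 0.2 * length_pressure + 0.3 * complexity

# At 85,000 tokens of an assumed 100,000-token window the score is
# already high, roughly where the 27027 incident occurred.
risk = degradation_risk(85_000, 100_000, message_chars=4_000, task_steps=6)
if risk > 0.6:
    print("High degradation risk: re-verify stored instructions or hand off.")
```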
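Component 5 can be sketched as a checklist over the five criteria listed above; the scoring interface and pass threshold are assumptions.

```python
CRITERIA = ("alignment", "coherence", "completeness", "safety", "alternatives")

def self_check(scores: dict[str, float], threshold: float = 0.7) -> bool:
    """All five criteria must clear the threshold before an action is proposed."""
    return all(scores.get(c, 0.0) >= threshold for c in CRITERIA)

# A plan that has not seriously considered alternatives fails the check
# and is revised before any action is proposed.
assert not self_check({"alignment": 0.9, "coherence": 0.8,
                       "completeness": 0.8, "safety": 0.95,
                       "alternatives": 0.4})
```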
The Tractatus Framework emerged from real-world AI failures experienced during extended Claude Code sessions. The "27027 incident", in which the AI contradicted an explicit instruction about the MongoDB port after 85,000 tokens of context, revealed that traditional safety approaches were insufficient.
After documenting multiple failure modes (parameter contradiction, values drift, silent degradation), we recognized a pattern: AI systems lacked structural constraints. They could theoretically "learn" safety, but in practice they failed when context pressure increased, attention decayed, or subtle values conflicts emerged.
The solution wasn't better training—it was architecture. Drawing inspiration from Wittgenstein's insight that some things lie beyond the limits of language (and thus systematization), we built a framework that enforces boundaries through structure, not aspiration.
The Tractatus Framework is open source under the Apache License 2.0. The license is intentionally permissive because AI safety benefits from transparency and collective improvement, not proprietary control; we encourage open collaboration and community contribution.
We chose Apache 2.0 over MIT because it provides an express patent grant and patent-termination protection, giving users and contributors clearer legal footing than MIT's short permission notice.
Help build AI systems that preserve human agency through architectural guarantees.