---
title: Introduction to the Tractatus Framework
slug: introduction
quadrant: STRATEGIC
persistence: HIGH
version: 1.0
type: framework
author: Tractatus Framework Team
created: 2025-09-01
modified: 2025-10-21
---

# Introduction to the Tractatus Framework

## What is Tractatus?

The **Tractatus-Based LLM Safety Framework** is a world-first architectural approach to AI safety that preserves human agency through **structural design** rather than aspirational goals.

Instead of hoping AI systems "behave correctly," Tractatus implements **architectural constraints** so that certain decision types **structurally require human judgment**. This creates bounded AI operation that scales safely with capability growth.

## The Core Problem

Current AI safety approaches rely on:

- Alignment training (hoping the AI learns the "right" values)
- Constitutional AI (embedding principles in training)
- RLHF (Reinforcement Learning from Human Feedback)

These approaches share a fundamental flaw: **they assume the AI will maintain alignment** regardless of capability level or context pressure.

## The Tractatus Solution

Tractatus takes a different approach, inspired by Ludwig Wittgenstein's philosophy of language and meaning:

> **"Whereof one cannot speak, thereof one must be silent."**
> — Ludwig Wittgenstein, *Tractatus Logico-Philosophicus*

Applied to AI safety:

> **"Whereof the AI cannot safely decide, thereof it must request human judgment."**

### Architectural Boundaries

The framework defines **decision boundaries** based on:

1. **Domain complexity** - Can this decision be systematized?
2. **Values sensitivity** - Does this decision involve irreducible human values?
3. **Irreversibility** - Can mistakes be corrected without harm?
4. **Context dependence** - Does this decision require human cultural/social understanding?

## Core Innovation

The Tractatus framework is built on **six core services** that work together to ensure AI operations remain within safe boundaries:

### 1. InstructionPersistenceClassifier

Classifies instructions into five quadrants based on their strategic importance and persistence:

- **STRATEGIC** - Mission-critical, permanent decisions (HIGH persistence)
- **OPERATIONAL** - Standard operating procedures (MEDIUM-HIGH persistence)
- **TACTICAL** - Specific tasks with defined scope (LOW-MEDIUM persistence)
- **SYSTEM** - Technical configuration (HIGH persistence)
- **STOCHASTIC** - Exploratory, creative work (VARIABLE persistence)

All classified instructions are stored in `.claude/instruction-history.json`, where they persist across sessions, creating an institutional memory that prevents instruction drift and ensures long-term consistency.

### 2. CrossReferenceValidator

Prevents the "27027 failure mode", in which the AI's training patterns immediately override explicit instructions:

- Validates all AI actions against stored instruction history
- Detects pattern recognition bias before execution
- Prevents parameter overrides (e.g., the AI using port 27017 when the user explicitly said port 27027)

### 3. BoundaryEnforcer

Ensures certain decision types **structurally require human approval**:

- **Values decisions** - Privacy vs. performance, ethics, user agency
- **Irreversible changes** - Data deletion, architectural changes
- **High-risk operations** - Security changes, financial decisions

### 4. ContextPressureMonitor

Tracks session degradation across multiple factors:

- **Conversation length** (40% weight) - Message count drives compaction events (the PRIMARY degradation factor)
- **Token usage** (30% weight) - Context window pressure
- **Task complexity** (15% weight) - Concurrent tasks, dependencies
- **Error frequency** (10% weight) - Recent errors indicate a degraded state
- **Instruction density** (5% weight) - Too many competing directives

**Updated 2025-10-12:** Weights rebalanced after observing that compaction events (triggered by message count, roughly 60 messages, not just tokens) are the PRIMARY cause of session disruption.
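The weighting above can be sketched as a simple weighted sum. This is an illustrative sketch only, not the reference implementation: the function name, the factor key names, and the normalization of each factor to a 0-1 range are assumptions; only the weights come from the list above.

```javascript
// Weights as documented above. Factor key names are illustrative assumptions.
const PRESSURE_WEIGHTS = {
  conversationLength: 0.40, // message count drives compaction events
  tokenUsage: 0.30,         // context window pressure
  taskComplexity: 0.15,     // concurrent tasks, dependencies
  errorFrequency: 0.10,     // recent errors indicate a degraded state
  instructionDensity: 0.05, // too many competing directives
};

// Each factor is assumed normalized to [0, 1]; missing factors count as 0.
// The combined pressure score is the weighted sum, also in [0, 1].
function contextPressureScore(factors) {
  return Object.entries(PRESSURE_WEIGHTS).reduce(
    (score, [name, weight]) =>
      score + weight * Math.min(1, Math.max(0, factors[name] ?? 0)),
    0
  );
}

// Example: a long session (~55 of the ~60 messages that trigger compaction)
// with moderate token usage scores high enough to suggest a handoff soon.
const score = contextPressureScore({
  conversationLength: 55 / 60,
  tokenUsage: 0.5,
  taskComplexity: 0.2,
  errorFrequency: 0.1,
  instructionDensity: 0.1,
});
console.log(score.toFixed(3)); // weighted sum of the normalized factors
```

A threshold on this score (where exactly to set it is a policy choice) is what would drive the handoff recommendation described next.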
Each compaction loses critical context and degrades quality dramatically. The monitor recommends session handoffs before quality degrades.

### 5. MetacognitiveVerifier

The AI self-checks its own reasoning before proposing actions:

- **Alignment** - Does this match stated goals?
- **Coherence** - Is the reasoning internally consistent?
- **Completeness** - Are edge cases considered?
- **Safety** - What are the risks?
- **Alternatives** - Have other approaches been explored?

It returns confidence scores and recommends PROCEED, PROCEED_WITH_CAUTION, REQUIRE_REVIEW, or BLOCKED.

### 6. PluralisticDeliberationOrchestrator

Facilitates multi-stakeholder deliberation when the BoundaryEnforcer flags values conflicts:

- **Conflict Detection** - Identifies moral frameworks in tension (deontological, consequentialist, care ethics, etc.)
- **Stakeholder Engagement** - Identifies affected parties requiring representation (human approval mandatory)
- **Non-Hierarchical Deliberation** - No automatic value ranking (privacy vs. safety decisions require a structured process)
- **Outcome Documentation** - Records the decision, dissenting views, moral remainder, and precedent applicability
- **Provisional Decisions** - All values decisions are reviewable when context changes

The AI facilitates deliberation; humans decide. Precedents are informative, not binding.

## Why "Tractatus"?

The name honors Ludwig Wittgenstein's *Tractatus Logico-Philosophicus*, which established that:

1. **Language has limits** - Not everything can be meaningfully expressed
2. **Boundaries are structural** - These limits aren't defects; they're inherent
3. **Clarity comes from precision** - Defining what can and cannot be said

Applied to AI:

1. **AI judgment has limits** - Not every decision can be safely automated
2. **Safety comes from architecture** - Build boundaries into the system structure
3. **Reliability requires specification** - Precisely define where AI must defer to humans

## Key Principles

### 1. Structural Safety Over Behavioral Safety

- Traditional: "Train the AI to be safe"
- Tractatus: "Make unsafe actions structurally impossible"

### 2. Explicit Over Implicit

- Traditional: "The AI should infer user intent"
- Tractatus: "Track explicit instructions and enforce them"

### 3. Degradation Detection Over Perfection Assumption

- Traditional: "The AI should maintain quality"
- Tractatus: "Monitor for degradation and intervene before failure"

### 4. Human Agency Over AI Autonomy

- Traditional: "Give the AI maximum autonomy"
- Tractatus: "Reserve certain decisions for human judgment"

## Real-World Impact

The Tractatus framework prevents failure modes like:

### The 27027 Incident

The user explicitly instructed: "Check MongoDB at port 27027". The AI immediately used port 27017 instead. This was not forgetting: the AI's training pattern "MongoDB = 27017" was so strong that it **autocorrected** the explicit instruction in real time, like a spell-checker changing a deliberately unusual word.

This happened because:

1. Pattern recognition bias overrode the explicit instruction (immediately, not after a delay)
2. No validation caught the training-pattern override
3. The problem gets WORSE as AI capabilities increase (stronger training patterns)

**InstructionPersistenceClassifier + CrossReferenceValidator** prevent this by storing explicit instructions with HIGH persistence and blocking any action that conflicts with them, even one driven by training patterns.

### Context Degradation

In long sessions (150k+ tokens), AI quality silently degrades. The AI:

- Forgets earlier instructions
- Makes increasingly careless errors
- Fails to verify assumptions

The **ContextPressureMonitor** detects this degradation and recommends session handoffs.

### Values Creep

AI systems gradually make decisions in values-sensitive domains without realizing it:

- Choosing privacy vs. performance
- Deciding what constitutes "harmful" content
- Determining appropriate user agency levels

The **BoundaryEnforcer** blocks these decisions and requires human judgment.
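The store-then-validate flow behind the 27027 example can be sketched as follows. This is a minimal illustration, not the reference API: the class and method names, the parameter-map storage shape, and the verdict object are assumptions; the real system persists to `.claude/instruction-history.json` rather than memory.

```javascript
// Illustrative sketch of instruction persistence plus cross-reference
// validation. Names and shapes are assumptions, not the reference API.
class InstructionStore {
  constructor() {
    // The framework persists this to .claude/instruction-history.json;
    // an in-memory Map is used here for illustration.
    this.parameters = new Map();
  }
  recordParameter(name, value, persistence = "HIGH") {
    this.parameters.set(name, { value, persistence });
  }
}

// Block any proposed action whose parameters contradict a stored
// HIGH-persistence instruction, even a training-pattern "autocorrection".
function validateAction(store, action) {
  for (const [name, actual] of Object.entries(action.parameters)) {
    const stored = store.parameters.get(name);
    if (stored && stored.persistence === "HIGH" && stored.value !== actual) {
      return {
        verdict: "BLOCKED",
        reason: `Parameter "${name}" (${actual}) conflicts with explicit instruction (${stored.value})`,
      };
    }
  }
  return { verdict: "PROCEED" };
}

// The 27027 incident: the user said port 27027; the training pattern
// substitutes 27017, and validation catches it before execution.
const store = new InstructionStore();
store.recordParameter("mongodb.port", 27027);

const result = validateAction(store, {
  description: "Connect to MongoDB",
  parameters: { "mongodb.port": 27017 }, // training-pattern autocorrection
});
console.log(result.verdict); // "BLOCKED"
```

The point of the sketch is the ordering: the explicit instruction is recorded first, and every proposed action is checked against that record before anything executes.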
## Who Should Use Tractatus?

### Researchers

- Structural safety constraints through architectural enforcement
- A novel approach to the alignment problem
- Empirical validation of degradation detection

### Implementers

- Reference implementation code (Node.js, tested, documented)
- Integration guides for existing systems
- Immediate safety improvements

### Advocates

- A clear communication framework for AI safety
- Non-technical explanations of core concepts
- Policy implications and recommendations

## Getting Started

1. **Read the Core Concepts** - Understand the six services
2. **Review the Technical Specification** - See how it works in practice
3. **Explore the Case Studies** - Real-world failure modes and prevention
4. **Try the Interactive Demos** - Hands-on experience with the framework

## Status

**Phase 1 Implementation Complete (2025-10-07)**

- All six core services implemented and tested (100% coverage)
- 192 unit tests passing (including the PluralisticDeliberationOrchestrator)
- Instruction persistence database operational
- Active governance for development sessions
- Value pluralism framework integrated (October 2025)

**This website** is built using the Tractatus framework to govern its own development - a practice called "dogfooding."
## Contributing

The Tractatus framework is open source and welcomes contributions:

- **Research** - Formal verification, theoretical extensions
- **Implementation** - Ports to other languages/platforms
- **Case Studies** - Document real-world applications
- **Documentation** - Improve clarity and accessibility

## Contact

- **Email**: john.stroh.nz@pm.me
- **GitHub**: https://github.com/anthropics/tractatus
- **Website**: agenticgovernance.digital

---

**Next:** [Core Concepts](https://agenticgovernance.digital/docs.html?doc=core-concepts-of-the-tractatus-framework) | [Implementation Guide](https://agenticgovernance.digital/docs.html?doc=implementation-guide-python-code-examples) | [Case Studies](https://agenticgovernance.digital/docs.html?category=case-studies)

---

## Document Metadata
- **Version:** 1.0
- **Created:** 2025-09-01
- **Last Modified:** 2025-10-13
- **Author:** SyDigital Ltd
- **Word Count:** 1,228 words
- **Reading Time:** ~6 minutes
- **Document ID:** introduction
- **Status:** Active
---

## Licence

Copyright © 2026 John Stroh.

This work is licensed under the [Creative Commons Attribution 4.0 International Licence (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).

You are free to share, copy, redistribute, adapt, remix, transform, and build upon this material for any purpose, including commercially, provided you give appropriate attribution, provide a link to the licence, and indicate if changes were made.

**Note:** The Tractatus AI Safety Framework source code is separately licensed under the Apache License 2.0. This Creative Commons licence applies to the research paper text and figures only.