- Fixed GitHub repository URLs (anthropics org, not AgenticGovernance) - Updated navigation links to use specific document URLs instead of generic /docs.html - Added instruction persistence system explanation to introduction - Regenerated PDFs with corrections
246 lines
9.6 KiB
Markdown
246 lines
9.6 KiB
Markdown
---
|
|
title: Introduction to the Tractatus Framework
|
|
slug: introduction
|
|
quadrant: STRATEGIC
|
|
persistence: HIGH
|
|
version: 1.0
|
|
type: framework
|
|
author: SyDigital Ltd
|
|
---
|
|
|
|
# Introduction to the Tractatus Framework
|
|
|
|
## What is Tractatus?
|
|
|
|
The **Tractatus-Based LLM Safety Framework** is a world-first architectural approach to AI safety that preserves human agency through **structural guarantees** rather than aspirational goals.
|
|
|
|
Instead of hoping AI systems "behave correctly," Tractatus implements **architectural constraints** that certain decision types **structurally require human judgment**. This creates bounded AI operation that scales safely with capability growth.
|
|
|
|
## The Core Problem
|
|
|
|
Current AI safety approaches rely on:
|
|
- Alignment training (hoping the AI learns the "right" values)
|
|
- Constitutional AI (embedding principles in training)
|
|
- RLHF (Reinforcement Learning from Human Feedback)
|
|
|
|
These approaches share a fundamental flaw: **they assume the AI will maintain alignment** regardless of capability level or context pressure.
|
|
|
|
## The Tractatus Solution
|
|
|
|
Tractatus takes a different approach inspired by Ludwig Wittgenstein's philosophy of language and meaning:
|
|
|
|
> **"Whereof one cannot speak, thereof one must be silent."**
|
|
> — Ludwig Wittgenstein, Tractatus Logico-Philosophicus
|
|
|
|
Applied to AI safety:
|
|
|
|
> **"Whereof the AI cannot safely decide, thereof it must request human judgment."**
|
|
|
|
### Architectural Boundaries
|
|
|
|
The framework defines **decision boundaries** based on:
|
|
|
|
1. **Domain complexity** - Can this decision be systematized?
|
|
2. **Values sensitivity** - Does this decision involve irreducible human values?
|
|
3. **Irreversibility** - Can mistakes be corrected without harm?
|
|
4. **Context dependence** - Does this decision require human cultural/social understanding?
|
|
|
|
## Core Innovation
|
|
|
|
The Tractatus framework is built on **six core services** that work together to ensure AI operations remain within safe boundaries:
|
|
|
|
### 1. InstructionPersistenceClassifier
|
|
|
|
Classifies instructions into five quadrants based on their strategic importance and persistence:
|
|
|
|
- **STRATEGIC** - Mission-critical, permanent decisions (HIGH persistence)
|
|
- **OPERATIONAL** - Standard operating procedures (MEDIUM-HIGH persistence)
|
|
- **TACTICAL** - Specific tasks with defined scope (LOW-MEDIUM persistence)
|
|
- **SYSTEM** - Technical configuration (HIGH persistence)
|
|
- **STOCHASTIC** - Exploratory, creative work (VARIABLE persistence)
|
|
|
|
All classified instructions are stored in `.claude/instruction-history.json` where they persist across sessions, creating an institutional memory that prevents instruction drift and ensures long-term consistency.
|
|
|
|
### 2. CrossReferenceValidator
|
|
|
|
Prevents the "27027 failure mode" where AI's training patterns immediately override explicit instructions:
|
|
|
|
- Validates all AI actions against stored instruction history
|
|
- Detects pattern recognition bias before execution
|
|
- Prevents parameter overrides (e.g., AI using port 27017 when user explicitly said port 27027)
|
|
|
|
### 3. BoundaryEnforcer
|
|
|
|
Ensures certain decision types **structurally require human approval**:
|
|
|
|
- **Values decisions** - Privacy vs. performance, ethics, user agency
|
|
- **Irreversible changes** - Data deletion, architectural changes
|
|
- **High-risk operations** - Security changes, financial decisions
|
|
|
|
### 4. ContextPressureMonitor
|
|
|
|
Tracks session degradation across multiple factors:
|
|
|
|
- **Token usage** (35% weight) - Context window pressure
|
|
- **Conversation length** (25% weight) - Attention decay
|
|
- **Task complexity** (15% weight) - Concurrent tasks, dependencies
|
|
- **Error frequency** (15% weight) - Recent errors indicate degraded state
|
|
- **Instruction density** (10% weight) - Too many competing directives
|
|
|
|
Recommends session handoffs before quality degrades.
|
|
|
|
### 5. MetacognitiveVerifier
|
|
|
|
AI self-checks its own reasoning before proposing actions:
|
|
|
|
- **Alignment** - Does this match stated goals?
|
|
- **Coherence** - Is the reasoning internally consistent?
|
|
- **Completeness** - Are edge cases considered?
|
|
- **Safety** - What are the risks?
|
|
- **Alternatives** - Have other approaches been explored?
|
|
|
|
Returns confidence scores and recommends PROCEED, PROCEED_WITH_CAUTION, REQUIRE_REVIEW, or BLOCKED.
|
|
|
|
### 6. PluralisticDeliberationOrchestrator
|
|
|
|
Facilitates multi-stakeholder deliberation when BoundaryEnforcer flags values conflicts:
|
|
|
|
- **Conflict Detection** - Identifies moral frameworks in tension (deontological, consequentialist, care ethics, etc.)
|
|
- **Stakeholder Engagement** - Identifies affected parties requiring representation (human approval mandatory)
|
|
- **Non-Hierarchical Deliberation** - No automatic value ranking (privacy vs. safety decisions require structured process)
|
|
- **Outcome Documentation** - Records decision, dissenting views, moral remainder, and precedent applicability
|
|
- **Provisional Decisions** - All values decisions are reviewable when context changes
|
|
|
|
AI facilitates deliberation, humans decide. Precedents are informative, not binding.
|
|
|
|
## Why "Tractatus"?
|
|
|
|
The name honors Ludwig Wittgenstein's *Tractatus Logico-Philosophicus*, which established that:
|
|
|
|
1. **Language has limits** - Not everything can be meaningfully expressed
|
|
2. **Boundaries are structural** - These limits aren't defects, they're inherent
|
|
3. **Clarity comes from precision** - Defining what can and cannot be said
|
|
|
|
Applied to AI:
|
|
|
|
1. **AI judgment has limits** - Not every decision can be safely automated
|
|
2. **Safety comes from architecture** - Build boundaries into the system structure
|
|
3. **Reliability requires specification** - Precisely define where AI must defer to humans
|
|
|
|
## Key Principles
|
|
|
|
### 1. Structural Safety Over Behavioral Safety
|
|
|
|
Traditional: "Train the AI to be safe"
|
|
Tractatus: "Make unsafe actions structurally impossible"
|
|
|
|
### 2. Explicit Over Implicit
|
|
|
|
Traditional: "The AI should infer user intent"
|
|
Tractatus: "Track explicit instructions and enforce them"
|
|
|
|
### 3. Degradation Detection Over Perfection Assumption
|
|
|
|
Traditional: "The AI should maintain quality"
|
|
Tractatus: "Monitor for degradation and intervene before failure"
|
|
|
|
### 4. Human Agency Over AI Autonomy
|
|
|
|
Traditional: "Give the AI maximum autonomy"
|
|
Tractatus: "Reserve certain decisions for human judgment"
|
|
|
|
## Real-World Impact
|
|
|
|
The Tractatus framework prevents failure modes like:
|
|
|
|
### The 27027 Incident
|
|
|
|
User explicitly instructed: "Check MongoDB at port 27027". AI immediately used port 27017 instead. Not forgetting—the AI's training pattern "MongoDB = 27017" was so strong it **autocorrected** the explicit instruction in real-time, like a spell-checker changing a deliberately unusual word. This happened because:
|
|
|
|
1. Pattern recognition bias overrode explicit instruction (immediate, not delayed)
|
|
2. No validation caught the training pattern override
|
|
3. Problem gets WORSE as AI capabilities increase (stronger training patterns)
|
|
|
|
**InstructionPersistenceClassifier + CrossReferenceValidator** prevent this by storing explicit instructions with HIGH persistence and blocking any action that conflicts—even from training patterns.
|
|
|
|
### Context Degradation
|
|
|
|
In long sessions (150k+ tokens), AI quality silently degrades:
|
|
|
|
- Forgets earlier instructions
|
|
- Makes increasingly careless errors
|
|
- Fails to verify assumptions
|
|
|
|
**ContextPressureMonitor** detects this degradation and recommends session handoffs.
|
|
|
|
### Values Creep
|
|
|
|
AI systems gradually make decisions in values-sensitive domains without realizing it:
|
|
|
|
- Choosing privacy vs. performance
|
|
- Deciding what constitutes "harmful" content
|
|
- Determining appropriate user agency levels
|
|
|
|
**BoundaryEnforcer** blocks these decisions and requires human judgment.
|
|
|
|
## Who Should Use Tractatus?
|
|
|
|
### Researchers
|
|
|
|
- Formal safety guarantees through architectural constraints
|
|
- Novel approach to alignment problem
|
|
- Empirical validation of degradation detection
|
|
|
|
### Implementers
|
|
|
|
- Production-ready code (Node.js, tested, documented)
|
|
- Integration guides for existing systems
|
|
- Immediate safety improvements
|
|
|
|
### Advocates
|
|
|
|
- Clear communication framework for AI safety
|
|
- Non-technical explanations of core concepts
|
|
- Policy implications and recommendations
|
|
|
|
## Getting Started
|
|
|
|
1. **Read the Core Concepts** - Understand the six services
|
|
2. **Review the Technical Specification** - See how it works in practice
|
|
3. **Explore the Case Studies** - Real-world failure modes and prevention
|
|
4. **Try the Interactive Demos** - Hands-on experience with the framework
|
|
|
|
## Status
|
|
|
|
**Phase 1 Implementation Complete (2025-10-07)**
|
|
|
|
- All six core services implemented and tested (100% coverage)
|
|
- 192 unit tests passing (including PluralisticDeliberationOrchestrator)
|
|
- Instruction persistence database operational
|
|
- Active governance for development sessions
|
|
- Value pluralism framework integrated (October 2025)
|
|
|
|
**This website** is built using the Tractatus framework to govern its own development - a practice called "dogfooding."
|
|
|
|
## Contributing
|
|
|
|
The Tractatus framework is open source and welcomes contributions:
|
|
|
|
- **Research** - Formal verification, theoretical extensions
|
|
- **Implementation** - Ports to other languages/platforms
|
|
- **Case Studies** - Document real-world applications
|
|
- **Documentation** - Improve clarity and accessibility
|
|
|
|
## License
|
|
|
|
Apache 2.0 - See [LICENSE](https://github.com/anthropics/tractatus/blob/main/LICENSE) for full terms
|
|
|
|
## Contact
|
|
|
|
- **Email**: john.stroh.nz@pm.me
|
|
- **GitHub**: https://github.com/anthropics/tractatus
|
|
- **Website**: agenticgovernance.digital
|
|
|
|
---
|
|
|
|
**Next:** [Core Concepts](https://agenticgovernance.digital/docs.html?doc=core-concepts-of-the-tractatus-framework) | [Implementation Guide](https://agenticgovernance.digital/docs.html?doc=implementation-guide-python-code-examples) | [Case Studies](https://agenticgovernance.digital/docs.html?category=case-studies)
|