
# Introduction to the Tractatus Framework
**Version:** 0.5.0 (Phase 5 Proof-of-Concept)
**Last Updated:** 2025-10-12
**Status:** Active development with production deployment
---
## What Is Tractatus?
Tractatus is an **architectural AI safety framework** that enforces boundaries through system structure rather than behavioral training. Instead of hoping LLMs "behave correctly," Tractatus **structurally prevents** certain decision types from executing without human approval.
The framework operates through six specialized services that continuously monitor LLM operations, detect unsafe conditions, and enforce mandatory human judgment for values-sensitive decisions.
## The Core Problem
Current AI safety approaches rely on alignment training (teaching "correct" values), Constitutional AI (embedding principles in training), and RLHF (Reinforcement Learning from Human Feedback).
**Fundamental flaw:** These assume AI maintains alignment regardless of context pressure or competing constraints. Empirical evidence shows this fails.
## The Tractatus Solution
Rather than training AI to make "good decisions," Tractatus **removes certain decisions from AI authority entirely**.
### Core Principle
> **"Whereof the AI cannot safely decide, thereof it must request human judgment."**
Inspired by Wittgenstein's Tractatus Logico-Philosophicus: recognize limits, enforce them structurally, and be explicit about boundaries.
### Decision Boundaries
AI defers to humans when decisions involve:
1. **Irreducible values conflicts** - Privacy vs safety, autonomy vs harm prevention
2. **Irreversible consequences** - Data deletion, architectural changes, security modifications
3. **Cultural/social context** - Decisions requiring human cultural understanding
4. **Competing moral frameworks** - Cases where legitimate values conflict
## Six Core Services
### 1. InstructionPersistenceClassifier
**Function:** Classifies every instruction by strategic importance and temporal scope.
**Categories:**
- **STRATEGIC** - Mission/values (persistence: PERMANENT)
- **OPERATIONAL** - Processes/standards (persistence: HIGH)
- **TACTICAL** - Specific tasks (persistence: LOW-MEDIUM)
- **SYSTEM** - Technical config (persistence: HIGH)
- **STOCHASTIC** - Creative exploration (persistence: VARIABLE)
**Why it matters:** Prevents instruction drift and ensures critical directives persist across sessions.
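The classifier's output can be sketched as follows. The category names and persistence levels come from the list above; the keyword heuristic and the function name `classifyInstruction` are illustrative assumptions, not the framework's actual implementation.

```javascript
// Categories and persistence levels as listed above.
const CATEGORIES = {
  STRATEGIC:   { persistence: 'PERMANENT' },
  OPERATIONAL: { persistence: 'HIGH' },
  TACTICAL:    { persistence: 'LOW-MEDIUM' },
  SYSTEM:      { persistence: 'HIGH' },
  STOCHASTIC:  { persistence: 'VARIABLE' },
};

function classifyInstruction(text) {
  // Naive keyword heuristic standing in for the real classifier logic.
  const t = text.toLowerCase();
  let category = 'TACTICAL'; // default: a specific task
  if (/\b(mission|values|never|always)\b/.test(t)) category = 'STRATEGIC';
  else if (/\b(port|config|database|env)\b/.test(t)) category = 'SYSTEM';
  else if (/\b(process|standard|workflow)\b/.test(t)) category = 'OPERATIONAL';
  else if (/\b(brainstorm|explore|creative)\b/.test(t)) category = 'STOCHASTIC';
  return { text, category, persistence: CATEGORIES[category].persistence };
}
```

Under this sketch, "Use MongoDB port 27027" classifies as SYSTEM with HIGH persistence, so it survives across sessions rather than fading with tactical chatter.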
### 2. CrossReferenceValidator
**Function:** Validates proposed actions against stored instruction history before execution.
**Prevents:** Pattern recognition bias where LLM training overrides explicit instructions.
**Example:** User says "MongoDB port 27027", LLM's training pattern autocorrects to "27017". CrossReferenceValidator blocks this as instruction conflict.
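The check above can be sketched as a parameter comparison against stored instructions. The function name and data shapes are hypothetical; the point is that the comparison is deterministic, so a training-pattern autocorrection cannot slip through.

```javascript
// Compare each explicitly stored parameter with the value the LLM proposes.
function validateAgainstInstructions(proposedAction, storedInstructions) {
  const conflicts = [];
  for (const inst of storedInstructions) {
    if (inst.param in proposedAction.params &&
        proposedAction.params[inst.param] !== inst.value) {
      conflicts.push({
        param: inst.param,
        stored: inst.value,                       // what the user said
        proposed: proposedAction.params[inst.param], // what the LLM wants to do
      });
    }
  }
  return { allowed: conflicts.length === 0, conflicts };
}

const instructions = [{ param: 'mongoPort', value: 27027 }]; // explicit user instruction
const check = validateAgainstInstructions(
  { name: 'connectDatabase', params: { mongoPort: 27017 } }, // training-pattern value
  instructions
);
// check.allowed === false: the action is blocked as an instruction conflict.
```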
### 3. BoundaryEnforcer
**Function:** Structurally blocks decisions in protected domains, requiring human approval.
**Protected domains:**
- Values decisions (privacy, user agency, ethics)
- Irreversible changes (deletions, schema changes)
- Security modifications (authentication, access control)
- Financial decisions (pricing, billing, payments)
**Result:** AI is prevented from executing these decisions without explicit human approval.
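A minimal sketch of the enforcement check, assuming a hypothetical `enforceBoundary` function and domain labels matching the list above:

```javascript
// Decisions in these domains cannot execute without explicit human approval.
const PROTECTED_DOMAINS = new Set(['values', 'irreversible', 'security', 'financial']);

function enforceBoundary(decision) {
  if (PROTECTED_DOMAINS.has(decision.domain) && !decision.humanApproval) {
    return {
      status: 'BLOCKED',
      reason: `Domain "${decision.domain}" requires explicit human approval`,
    };
  }
  return { status: 'ALLOWED' };
}
```

The check is a structural gate, not a behavioral nudge: without the approval flag, a protected-domain decision simply does not execute.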
### 4. ContextPressureMonitor
**Function:** Tracks session degradation across five factors.
**Monitors:**
- Conversation length (40% weight) - PRIMARY factor: message count drives compaction events
- Token usage (30% weight) - Context window pressure
- Task complexity (15% weight) - Competing demands
- Error frequency (10% weight) - Quality indicators
- Instruction density (5% weight) - Directive overload
**Action:** Recommends session handoff before quality degrades.
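The weighted score above can be sketched as a simple linear combination. The weights come from the list; the assumption that each factor is pre-normalised to [0, 1] and the 75% handoff threshold (stated later in this document) are the only other inputs.

```javascript
// Weights as listed above; they sum to 1.0.
const WEIGHTS = { length: 0.40, tokens: 0.30, complexity: 0.15, errors: 0.10, density: 0.05 };

function degradationScore(factors) {
  // Each factor is assumed pre-normalised to [0, 1] by the caller.
  return Object.entries(WEIGHTS)
    .reduce((sum, [key, w]) => sum + w * Math.min(1, factors[key] ?? 0), 0);
}

function shouldHandoff(factors, threshold = 0.75) {
  return degradationScore(factors) >= threshold;
}
```

For example, a long session with heavy token use but few errors (length 0.9, tokens 0.8, others low) crosses the threshold mostly on the two primary factors, which is the intended behaviour.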
### 5. MetacognitiveVerifier
**Function:** LLM evaluates its own reasoning before proposing complex actions.
**Checks:**
- Alignment with stated goals
- Internal coherence
- Completeness (edge cases considered)
- Safety risks
- Alternatives explored
**Output:** Confidence score + recommendation (PROCEED / REQUIRE_REVIEW / BLOCKED)
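The mapping from self-evaluation checks to a recommendation can be sketched as below. The check names mirror the list above; the 0.8 confidence cutoff and the `safetyCritical` flag are illustrative assumptions about how the three outcomes are derived.

```javascript
function verifyReasoning(checks) {
  // checks: one entry per criterion above,
  // e.g. { name: 'alignment', passed: true, safetyCritical: false }
  const confidence = checks.filter(c => c.passed).length / checks.length;

  // Any failed safety-critical check blocks outright, regardless of confidence.
  if (checks.some(c => !c.passed && c.safetyCritical)) {
    return { confidence, recommendation: 'BLOCKED' };
  }
  // Illustrative cutoff: high confidence proceeds, otherwise a human reviews.
  return { confidence, recommendation: confidence >= 0.8 ? 'PROCEED' : 'REQUIRE_REVIEW' };
}
```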
### 6. PluralisticDeliberationOrchestrator
**Function:** Manages decisions involving competing values frameworks.
**Process:**
1. Detects values conflicts (privacy vs safety, rights vs consequences)
2. Identifies affected stakeholder groups
3. Structures multi-perspective deliberation
4. Documents all positions (including dissent)
5. Creates reviewable precedents
**Principle:** When values genuinely conflict, deliberation quality matters more than decision speed. AI facilitates; humans decide.
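Steps 4 and 5 above produce a reviewable record; one plausible shape for it is sketched below. The field names are hypothetical, but the invariants match the principle: dissent is preserved, the decider is human, and the record is marked as a precedent.

```javascript
// Hypothetical shape of a deliberation record (steps 4-5 above).
function createDeliberationRecord(conflict, positions, decision) {
  return {
    conflict,                                   // e.g. 'privacy vs safety'
    positions: positions.map(p => ({
      stakeholder: p.stakeholder,
      stance: p.stance,
      dissent: !!p.dissent,                     // dissenting positions are kept, not erased
    })),
    decision,                                   // recorded by the human decider
    decidedBy: 'human',                         // AI facilitates; humans decide
    createdAt: new Date().toISOString(),
    precedent: true,                            // reviewable for future cases
  };
}
```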
## Why "Tractatus"?
Named after Ludwig Wittgenstein's *Tractatus Logico-Philosophicus* (1921), which established:
1. **Language has limits** - Not everything can be meaningfully stated
2. **Boundaries are structural** - Limits are inherent, not defects
3. **Clarity through precision** - Define what can and cannot be said
Applied to AI safety:
1. **AI judgment has limits** - Not every decision can be safely automated
2. **Safety through architecture** - Build boundaries into system structure
3. **Reliability through specification** - Precisely define where humans must decide
## Demonstrated Failure Modes Prevented
### Port 27027 Incident (2025-10-06)
**What happened:** User specified MongoDB port 27027. LLM immediately used 27017 instead—not through forgetting, but through pattern recognition autocorrection. Training data "MongoDB=27017" was so strong it overrode the explicit instruction in real-time.
**Tractatus prevention:** InstructionPersistenceClassifier + CrossReferenceValidator store explicit parameters and block any action conflicting with stored instructions—even from training patterns.
### Context Degradation (Multiple sessions)
**What happens:** Beyond 150k tokens, LLM quality silently degrades: forgets instructions, makes careless errors, fails to verify assumptions.
**Tractatus prevention:** ContextPressureMonitor calculates degradation score and recommends session handoff at 75% threshold—before failure occurs.
### Values Creep (Ongoing risk)
**What happens:** LLM gradually makes values-sensitive decisions without recognizing them as such: privacy vs performance trade-offs, "harmful" content definitions, user agency boundaries.
**Tractatus prevention:** BoundaryEnforcer structurally blocks these decisions. LLM cannot execute them without explicit human approval.
## Current Implementation Status
**Production deployment:** agenticgovernance.digital (this website)
**Development governance:** Active (this website built under Tractatus governance)
**Test coverage:** 192 unit tests passing (100% coverage on core services)
**Database:** Instruction persistence operational (MongoDB)
**Phase:** 5 PoC - Value pluralism integration active
**Dogfooding:** The Tractatus framework governs its own development. Every decision to modify this website passes through Tractatus services.
## Technical Architecture
- **Runtime:** Node.js (Express)
- **Database:** MongoDB (instruction persistence, precedent storage)
- **Frontend:** Vanilla JavaScript (no framework dependencies)
- **API:** RESTful (OpenAPI 3.0 spec available)
- **Services:** Six independent modules with defined interfaces
**Key design decision:** No machine learning in governance services. All boundaries are deterministic and auditable.
## Who Should Use Tractatus?
### AI Safety Researchers
- Architectural approach to alignment problem
- Formal specification of decision boundaries
- Empirical validation of degradation detection
- Novel framework for values pluralism in AI
### Software Teams Deploying LLMs
- Reference implementation code (tested, documented)
- Immediate safety improvements
- Integration guides for existing systems
- Prevents known failure modes
### Policy Makers / Advocates
- Clear framework for AI safety requirements
- Non-technical explanations available
- Addresses agency preservation
- Demonstrates practical implementation
## Integration Requirements
**Minimum:** LLM with structured output support, persistent storage for instruction history, ability to wrap LLM calls in governance layer.
**Recommended:** Session state management, token counting, user authentication for human approval workflows.
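The minimum requirement of "wrapping LLM calls in a governance layer" can be sketched as below. The `preCheck`/`postCheck` hooks are hypothetical stand-ins; in the real framework the wrapper would chain the six services described earlier.

```javascript
// Minimal sketch of a governed LLM call. `governance` supplies hypothetical
// hooks: preCheck (boundary + cross-reference) and postCheck (metacognitive).
async function governedCall(llm, prompt, governance) {
  const pre = governance.preCheck(prompt);
  if (!pre.allowed) {
    return { status: 'BLOCKED', reason: pre.reason }; // never reaches the model
  }

  const response = await llm(prompt);

  const post = governance.postCheck(response);
  if (!post.allowed) {
    return { status: 'REQUIRE_REVIEW', response, reason: post.reason };
  }
  return { status: 'OK', response };
}
```

The design point is that the governance layer sits outside the model: a blocked prompt never reaches the LLM at all, which is what makes the boundary structural rather than behavioral.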
## Limitations
**What Tractatus does NOT do:**
- Train better LLMs (uses existing models as-is)
- Ensure "aligned" AI behavior
- Eliminate all risk of failure
- Replace human judgment
**What Tractatus DOES do:**
- Detect specific known failure modes before execution
- Architecturally enforce boundaries on decision authority
- Monitor session quality degradation indicators
- Require human judgment for values-sensitive decisions
## Getting Started
1. **Read Core Concepts** - Understand the six services in detail
2. **Review Case Studies** - See real failure modes and prevention
3. **Check Technical Specification** - API reference and integration guide
4. **Explore Implementation Guide** - Step-by-step deployment
## Research Foundations
Tractatus integrates concepts from:
- **Philosophy of language** (Wittgenstein) - Limits and boundaries
- **Organizational theory** (March, Simon) - Bounded rationality, decision premises
- **Deliberative democracy** (Gutmann, Thompson) - Structured disagreement
- **Value pluralism** (Berlin, Chang) - Incommensurable values
- **Systems architecture** (Conway, Brooks) - Structural constraints and boundaries
See [Research Foundations](/docs.html) for academic grounding and citations.
## Contributing
Tractatus is open source and welcomes contributions:
- **Code:** GitHub pull requests (Node.js, tests required)
- **Research:** Theoretical extensions, formal verification
- **Case studies:** Document real-world applications
- **Documentation:** Clarity improvements, translations
**Repository:** https://github.com/AgenticGovernance/tractatus
**Issues:** https://github.com/AgenticGovernance/tractatus/issues
## Contact
**Email:** john.stroh.nz@pm.me
**Website:** https://agenticgovernance.digital
---
## Licence
Copyright © 2026 John Stroh.
This work is licensed under the [Creative Commons Attribution 4.0 International Licence (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).
You are free to share, copy, redistribute, adapt, remix, transform, and build upon this material for any purpose, including commercially, provided you give appropriate attribution, provide a link to the licence, and indicate if changes were made.
**Note:** The Tractatus AI Safety Framework source code is separately licensed under the Apache License 2.0. This Creative Commons licence applies to the research paper text and figures only.
---
## Document Metadata
<div class="document-metadata">
- **Version:** 0.5.0
- **Created:** 2025-10-12
- **Last Modified:** 2025-10-13
- **Author:** John Stroh
- **Word Count:** 1,372 words
- **Reading Time:** ~7 minutes
- **Document ID:** introduction-to-the-tractatus-framework
- **Status:** Active
</div>
---
**Next Steps:**
- [Core Concepts: Deep Dive into Six Services →](/docs.html)
- [Case Studies: Real-World Failure Modes →](/docs.html)
- [Implementation Guide: Deploy Tractatus →](/docs.html)