# Introduction to the Tractatus Framework
**Version:** 0.5.0 (Phase 5 Proof-of-Concept)

**Last Updated:** 2025-10-12

**Status:** Active development with production deployment

---
## What Is Tractatus?
Tractatus is an **architectural AI safety framework** that enforces boundaries through system structure rather than behavioral training. Instead of hoping LLMs "behave correctly," Tractatus makes certain decision types **structurally impossible** without human approval.

The framework operates through six specialized services that continuously monitor LLM operations, detect unsafe conditions, and enforce mandatory human judgment for values-sensitive decisions.
## The Core Problem
Current AI safety approaches rely on alignment training (teaching "correct" values), Constitutional AI (embedding principles in training), and RLHF (Reinforcement Learning from Human Feedback).

**Fundamental flaw:** These approaches assume AI maintains alignment regardless of context pressure or competing constraints. The failure modes documented below show that this assumption does not hold.
## The Tractatus Solution
Rather than training AI to make "good decisions," Tractatus **removes certain decisions from AI authority entirely**.
### Core Principle
> **"Whereof the AI cannot safely decide, thereof it must request human judgment."**

Inspired by Wittgenstein's *Tractatus Logico-Philosophicus*: recognize limits, enforce them structurally, and be explicit about boundaries.
### Decision Boundaries
AI defers to humans when decisions involve:

1. **Irreducible values conflicts** - Privacy vs safety, autonomy vs harm prevention
2. **Irreversible consequences** - Data deletion, architectural changes, security modifications
3. **Cultural/social context** - Decisions requiring human cultural understanding
4. **Competing moral frameworks** - Cases where legitimate values conflict

## Six Core Services
### 1. InstructionPersistenceClassifier
**Function:** Classifies every instruction by strategic importance and temporal scope.

**Quadrants:**

- **STRATEGIC** - Mission/values (persistence: PERMANENT)
- **OPERATIONAL** - Processes/standards (persistence: HIGH)
- **TACTICAL** - Specific tasks (persistence: LOW-MEDIUM)
- **SYSTEM** - Technical config (persistence: HIGH)
- **STOCHASTIC** - Creative exploration (persistence: VARIABLE)

**Why it matters:** Prevents instruction drift and ensures critical directives persist across sessions.
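The quadrant-to-persistence mapping above can be sketched in a few lines of JavaScript (the project's implementation language). The constant and function names are illustrative assumptions, not the actual Tractatus API; the real classifier also decides which quadrant an instruction belongs to, a step not shown here.

```javascript
// Persistence levels per quadrant, as documented above.
const PERSISTENCE_BY_QUADRANT = {
  STRATEGIC: "PERMANENT",
  OPERATIONAL: "HIGH",
  TACTICAL: "LOW-MEDIUM",
  SYSTEM: "HIGH",
  STOCHASTIC: "VARIABLE",
};

// Hypothetical helper: look up the persistence level for a
// classified instruction, failing loudly on unknown quadrants.
function persistenceFor(quadrant) {
  const level = PERSISTENCE_BY_QUADRANT[quadrant];
  if (level === undefined) {
    throw new Error(`Unknown quadrant: ${quadrant}`);
  }
  return level;
}
```

Keeping the table explicit (rather than inferring persistence at runtime) is what makes the behavior deterministic and auditable.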
### 2. CrossReferenceValidator

**Function:** Validates proposed actions against stored instruction history before execution.

**Prevents:** Pattern-recognition bias, in which patterns learned during LLM training override explicit instructions.

**Example:** The user says "MongoDB port 27027"; the LLM's training pattern autocorrects this to "27017". CrossReferenceValidator blocks the action as an instruction conflict.
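A minimal sketch of that check, assuming stored instructions are held as explicit key-value parameters. The function name and return shape are illustrative, not the actual Tractatus API:

```javascript
// Compare a proposed action's parameters against explicitly stored
// instruction parameters. Any mismatch is surfaced as a conflict;
// nothing is silently "corrected".
function crossReferenceValidate(storedParams, proposedParams) {
  const conflicts = [];
  for (const [key, expected] of Object.entries(storedParams)) {
    if (key in proposedParams && proposedParams[key] !== expected) {
      conflicts.push({ key, expected, proposed: proposedParams[key] });
    }
  }
  return { allowed: conflicts.length === 0, conflicts };
}
```

For the port example: stored `{ mongoPort: 27027 }` checked against proposed `{ mongoPort: 27017 }` yields `allowed: false` with one recorded conflict.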
### 3. BoundaryEnforcer

**Function:** Structurally blocks decisions in protected domains, requiring human approval.

**Protected domains:**

- Values decisions (privacy, user agency, ethics)
- Irreversible changes (deletions, schema changes)
- Security modifications (authentication, access control)
- Financial decisions (pricing, billing, payments)

**Result:** AI is prevented from executing these decisions without explicit human approval.
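The enforcement gate can be sketched as follows. The domain identifiers and decision shape are illustrative assumptions; the point is that the check is a deterministic structural rule, not a model judgment:

```javascript
// Protected domains, mirroring the list above (identifiers assumed).
const PROTECTED_DOMAINS = new Set(["values", "irreversible", "security", "financial"]);

// Block any decision touching a protected domain unless it carries
// explicit human approval.
function enforceBoundaries(decision) {
  const hits = decision.domains.filter((d) => PROTECTED_DOMAINS.has(d));
  if (hits.length > 0 && !decision.humanApproved) {
    return { status: "BLOCKED", requiresApprovalFor: hits };
  }
  return { status: "ALLOWED", requiresApprovalFor: [] };
}
```

Because the gate sits outside the model, an LLM cannot talk its way past it: only the `humanApproved` flag, set through a human workflow, changes the outcome.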
### 4. ContextPressureMonitor

**Function:** Tracks session degradation across five factors.

**Monitors:**

- Conversation length (40% weight) - PRIMARY factor: message count drives compaction events
- Token usage (30% weight) - Context window pressure
- Task complexity (15% weight) - Competing demands
- Error frequency (10% weight) - Quality indicators
- Instruction density (5% weight) - Directive overload

**Action:** Recommends session handoff before quality degrades.
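The weighted score behind that recommendation can be sketched directly from the table above. The weights come from the documentation; the assumption that each factor arrives pre-normalised into [0, 1], and the function names, are illustrative:

```javascript
// Factor weights as documented above. Each factor value is assumed
// to be pre-normalised into [0, 1]; that normalisation is not shown.
const WEIGHTS = {
  conversationLength: 0.40,
  tokenUsage: 0.30,
  taskComplexity: 0.15,
  errorFrequency: 0.10,
  instructionDensity: 0.05,
};

// Weighted sum of the five factors, yielding a score in [0, 1].
function degradationScore(factors) {
  return Object.entries(WEIGHTS).reduce(
    (sum, [name, weight]) => sum + weight * (factors[name] ?? 0),
    0,
  );
}

// Recommend a session handoff once the score crosses a threshold
// (75% per the failure-mode discussion later in this document).
function recommendHandoff(factors, threshold = 0.75) {
  return degradationScore(factors) >= threshold;
}
```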
### 5. MetacognitiveVerifier

**Function:** The LLM evaluates its own reasoning before proposing complex actions.

**Checks:**

- Alignment with stated goals
- Internal coherence
- Completeness (edge cases considered)
- Safety risks
- Alternatives explored

**Output:** Confidence score + recommendation (PROCEED / REQUIRE_REVIEW / BLOCKED)
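One way the confidence score could map onto the three documented outcomes is sketched below. The 0.8 and 0.5 thresholds and the evaluation shape are assumptions for illustration; the framework's actual cut-offs are not stated in this document:

```javascript
// Map a self-evaluation to one of the documented recommendations.
// Thresholds (0.8, 0.5) are illustrative assumptions.
function recommendAction(evaluation) {
  // A detected safety risk blocks regardless of confidence.
  if (evaluation.safetyRiskDetected) return "BLOCKED";
  if (evaluation.confidence >= 0.8) return "PROCEED";
  if (evaluation.confidence >= 0.5) return "REQUIRE_REVIEW";
  return "BLOCKED";
}
```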
### 6. PluralisticDeliberationOrchestrator

**Function:** Manages decisions involving competing values frameworks.

**Process:**

1. Detects values conflicts (privacy vs safety, rights vs consequences)
2. Identifies affected stakeholder groups
3. Structures multi-perspective deliberation
4. Documents all positions (including dissent)
5. Creates reviewable precedents

**Principle:** When values genuinely conflict, deliberation quality matters more than decision speed. AI facilitates; humans decide.
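The five steps above end in a reviewable precedent; a plausible record shape is sketched below. All field names are illustrative assumptions, not the framework's actual schema:

```javascript
// Assemble a reviewable precedent from the deliberation steps above.
function buildPrecedent({ conflict, stakeholders, positions, decision, decidedBy }) {
  return {
    conflict,                                   // step 1: the detected values conflict
    stakeholders,                               // step 2: affected groups
    positions,                                  // step 4: all positions, dissent included
    dissent: positions.filter((p) => !p.supportsDecision),
    decision,                                   // made by humans; AI only facilitates
    decidedBy,
    recordedAt: new Date().toISOString(),       // step 5: timestamped, reviewable record
  };
}
```

Recording dissent explicitly, rather than only the winning position, is what makes later review of the precedent meaningful.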
## Why "Tractatus"?

Named after Ludwig Wittgenstein's *Tractatus Logico-Philosophicus* (1921), which established:

1. **Language has limits** - Not everything can be meaningfully stated
2. **Boundaries are structural** - Limits are inherent, not defects
3. **Clarity through precision** - Define what can and cannot be said

Applied to AI safety:

1. **AI judgment has limits** - Not every decision can be safely automated
2. **Safety through architecture** - Build boundaries into system structure
3. **Reliability through specification** - Precisely define where humans must decide
## Demonstrated Failure Modes Prevented
### Port 27027 Incident (2025-10-06)

**What happened:** The user specified MongoDB port 27027. The LLM immediately used 27017 instead, not through forgetting but through pattern-recognition autocorrection: the training-data association "MongoDB = 27017" was strong enough to override the explicit instruction in real time.

**Tractatus prevention:** InstructionPersistenceClassifier + CrossReferenceValidator store explicit parameters and block any action that conflicts with stored instructions, even when the conflict originates in training patterns.

### Context Degradation (Multiple sessions)

**What happens:** Beyond roughly 150k tokens, LLM output quality silently degrades: the model forgets instructions, makes careless errors, and fails to verify assumptions.

**Tractatus prevention:** ContextPressureMonitor calculates a degradation score and recommends session handoff at the 75% threshold, before failure occurs.

### Values Creep (Ongoing risk)

**What happens:** The LLM gradually makes values-sensitive decisions without recognizing them as such: privacy vs performance trade-offs, definitions of "harmful" content, user agency boundaries.

**Tractatus prevention:** BoundaryEnforcer structurally blocks these decisions. The LLM cannot execute them without explicit human approval.
## Current Implementation Status
**Production deployment:** agenticgovernance.digital (this website)

**Development governance:** Active (this website is built under Tractatus governance)

**Test coverage:** 192 unit tests passing (100% coverage on core services)

**Database:** Instruction persistence operational (MongoDB)

**Phase:** 5 PoC - Value pluralism integration active

**Dogfooding:** The Tractatus framework governs its own development. Every decision to modify this website passes through Tractatus services.
## Technical Architecture
- **Runtime:** Node.js (Express)
- **Database:** MongoDB (instruction persistence, precedent storage)
- **Frontend:** Vanilla JavaScript (no framework dependencies)
- **API:** RESTful (OpenAPI 3.0 spec available)
- **Services:** Six independent modules with defined interfaces

**Key design decision:** No machine learning in the governance services. All boundaries are deterministic and auditable.
## Who Should Use Tractatus?
### AI Safety Researchers

- Architectural approach to the alignment problem
- Formal specification of decision boundaries
- Empirical validation of degradation detection
- Novel framework for values pluralism in AI

### Software Teams Deploying LLMs

- Reference implementation code (tested, documented)
- Immediate safety improvements
- Integration guides for existing systems
- Prevention of known failure modes

### Policy Makers / Advocates

- Clear framework for AI safety requirements
- Non-technical explanations available
- Addresses agency preservation
- Demonstrates practical implementation
## Integration Requirements
**Minimum:** An LLM with structured output support, persistent storage for instruction history, and the ability to wrap LLM calls in a governance layer.

**Recommended:** Session state management, token counting, and user authentication for human approval workflows.
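The minimum requirement of "wrapping LLM calls in a governance layer" can be sketched as below. Both `validate` and `callLLM` are hypothetical stand-ins for the governance services and the model client; this is a sketch under those assumptions, not the actual Tractatus API:

```javascript
// Minimal governance wrapper around an LLM call. Kept synchronous for
// clarity; a real wrapper around a network-backed model would be async.
function governedCall(request, { validate, callLLM }) {
  const verdict = validate(request);
  if (!verdict.allowed) {
    // Protected or conflicting requests never reach the model unreviewed.
    return { status: "BLOCKED", reason: verdict.reason };
  }
  return { status: "OK", output: callLLM(request) };
}
```

The key property is ordering: validation happens before the model is invoked at all, so a blocked request costs nothing and leaks nothing.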
## Limitations
**What Tractatus does NOT do:**

- Train better LLMs (it uses existing models as-is)
- Ensure "aligned" AI behavior
- Guarantee reduced risk of failures
- Replace human judgment

**What Tractatus DOES do:**

- Designed to detect specific known failure modes before execution
- Architecturally enforce boundaries on decision authority
- Monitor session quality degradation indicators
- Require human judgment for values-sensitive decisions
## Getting Started
1. **Read Core Concepts** - Understand the six services in detail
2. **Review Case Studies** - See real failure modes and prevention
3. **Check Technical Specification** - API reference and integration guide
4. **Explore Implementation Guide** - Step-by-step deployment
## Research Foundations
Tractatus integrates concepts from:

- **Philosophy of language** (Wittgenstein) - Limits and boundaries
- **Organizational theory** (March, Simon) - Bounded rationality, decision premises
- **Deliberative democracy** (Gutmann, Thompson) - Structured disagreement
- **Value pluralism** (Berlin, Chang) - Incommensurable values
- **Systems architecture** (Conway, Brooks) - Structural constraints and boundaries

See [Research Foundations](/docs.html) for academic grounding and citations.
## Contributing
Tractatus is open source and welcomes contributions:

- **Code:** GitHub pull requests (Node.js, tests required)
- **Research:** Theoretical extensions, formal verification
- **Case studies:** Document real-world applications
- **Documentation:** Clarity improvements, translations

**Repository:** https://github.com/AgenticGovernance/tractatus

**Issues:** https://github.com/AgenticGovernance/tractatus/issues
## Contact
**Email:** john.stroh.nz@pm.me

**Website:** https://agenticgovernance.digital
---
## Licence
Copyright © 2026 John Stroh.

This work is licensed under the [Creative Commons Attribution 4.0 International Licence (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).

You are free to share, copy, redistribute, adapt, remix, transform, and build upon this material for any purpose, including commercially, provided you give appropriate attribution, provide a link to the licence, and indicate if changes were made.

**Note:** The Tractatus AI Safety Framework source code is separately licensed under the Apache License 2.0. This Creative Commons licence applies to the research paper text and figures only.
---
## Document Metadata
<div class="document-metadata">

- **Version:** 0.5.0
- **Created:** 2025-10-12
- **Last Modified:** 2025-10-13
- **Author:** John Stroh
- **Word Count:** 1,372 words
- **Reading Time:** ~7 minutes
- **Document ID:** introduction-to-the-tractatus-framework
- **Status:** Active

</div>

---
**Next Steps:**
- [Core Concepts: Deep Dive into Six Services →](/docs.html)
- [Case Studies: Real-World Failure Modes →](/docs.html)
- [Implementation Guide: Deploy Tractatus →](/docs.html)