---
title: Introduction to the Tractatus Framework
slug: introduction
quadrant: STRATEGIC
persistence: HIGH
version: 1.0
type: framework
author: Tractatus Framework Team
created: 2025-09-01
modified: 2025-10-21
---

# Introduction to the Tractatus Framework

## What is Tractatus?

The **Tractatus-Based LLM Safety Framework** is an architectural approach to AI safety that preserves human agency through **structural design** rather than aspirational goals.

Instead of hoping AI systems "behave correctly," Tractatus implements **architectural constraints** so that certain decision types **structurally require human judgment**. The result is bounded AI operation, designed to stay safe as capabilities grow.

## The Core Problem

Current AI safety approaches rely on:
- Alignment training (hoping the AI learns the "right" values)
- Constitutional AI (embedding principles in training)
- RLHF (Reinforcement Learning from Human Feedback)

These approaches share a fundamental flaw: **they assume the AI will maintain alignment** regardless of capability level or context pressure.

## The Tractatus Solution

Tractatus takes a different approach inspired by Ludwig Wittgenstein's philosophy of language and meaning:

> **"Whereof one cannot speak, thereof one must be silent."**
> — Ludwig Wittgenstein, Tractatus Logico-Philosophicus

Applied to AI safety:

> **"Whereof the AI cannot safely decide, thereof it must request human judgment."**

### Architectural Boundaries

The framework defines **decision boundaries** based on:

1. **Domain complexity** - Can this decision be systematized?
2. **Values sensitivity** - Does this decision involve irreducible human values?
3. **Irreversibility** - Can mistakes be corrected without harm?
4. **Context dependence** - Does this decision require human cultural/social understanding?
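
The four criteria above can be read as a routing check. The following is a minimal sketch, assuming each criterion has already been answered as a boolean and that any single criterion is enough to defer to a human. The function name `requiresHumanJudgment` and the field names are hypothetical and are not taken from the framework's code.

```javascript
// Hypothetical sketch: the four boundary criteria expressed as boolean flags.
// Function and field names are illustrative assumptions, not framework API.
function requiresHumanJudgment(decision) {
  const reasons = [];
  if (!decision.canBeSystematized) reasons.push("domain complexity");
  if (decision.involvesHumanValues) reasons.push("values sensitivity");
  if (!decision.isReversible) reasons.push("irreversibility");
  if (decision.needsSocialContext) reasons.push("context dependence");
  // Assumption: one triggered criterion is sufficient to defer.
  return { deferToHuman: reasons.length > 0, reasons };
}

const result = requiresHumanJudgment({
  canBeSystematized: true,
  involvesHumanValues: false,
  isReversible: false, // e.g. deleting data
  needsSocialContext: false,
});
console.log(result.deferToHuman, result.reasons);
```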

## Core Innovation

The Tractatus framework is built on **six core services** that work together to ensure AI operations remain within safe boundaries:

### 1. InstructionPersistenceClassifier

Classifies instructions into five categories, which the framework calls quadrants, based on their strategic importance and persistence:

- **STRATEGIC** - Mission-critical, permanent decisions (HIGH persistence)
- **OPERATIONAL** - Standard operating procedures (MEDIUM-HIGH persistence)
- **TACTICAL** - Specific tasks with defined scope (LOW-MEDIUM persistence)
- **SYSTEM** - Technical configuration (HIGH persistence)
- **STOCHASTIC** - Exploratory, creative work (VARIABLE persistence)

All classified instructions are stored in `.claude/instruction-history.json` where they persist across sessions, creating an institutional memory that prevents instruction drift and ensures long-term consistency.
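
As an illustration, the quadrant-to-persistence mapping listed above can be written as a lookup table. This is a sketch only: the record shape and the helper `buildInstructionRecord` are assumptions, and the actual schema of `.claude/instruction-history.json` is not described here and may differ.

```javascript
// Quadrant -> persistence level, as listed in the section above.
const QUADRANT_PERSISTENCE = new Map([
  ["STRATEGIC", "HIGH"],
  ["OPERATIONAL", "MEDIUM-HIGH"],
  ["TACTICAL", "LOW-MEDIUM"],
  ["SYSTEM", "HIGH"],
  ["STOCHASTIC", "VARIABLE"],
]);

// Hypothetical record shape; the real instruction-history schema may differ.
function buildInstructionRecord(text, quadrant, now = new Date()) {
  const persistence = QUADRANT_PERSISTENCE.get(quadrant);
  if (persistence === undefined) {
    throw new Error(`Unknown quadrant: ${quadrant}`);
  }
  return { text, quadrant, persistence, recordedAt: now.toISOString() };
}

const record = buildInstructionRecord("Check MongoDB at port 27027", "SYSTEM");
console.log(JSON.stringify(record, null, 2));
```

Deciding which quadrant an instruction belongs to is the classifier's actual job and is deliberately left out of this sketch.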

### 2. CrossReferenceValidator

Prevents the "27027 failure mode", in which the AI's training patterns immediately override explicit instructions:

- Validates all AI actions against stored instruction history
- Detects pattern recognition bias before execution
- Prevents parameter overrides (e.g., AI using port 27017 when user explicitly said port 27027)
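
The port example can be sketched as a parameter comparison. This is a simplified illustration, assuming stored instructions carry the explicit parameter values they pin. The function `validateAction` and the `parameters` field are hypothetical names, not the validator's real interface.

```javascript
// Hypothetical sketch: compare a proposed action's parameters against
// values pinned by stored explicit instructions.
function validateAction(action, instructions) {
  const conflicts = [];
  for (const instruction of instructions) {
    for (const [name, expected] of Object.entries(instruction.parameters)) {
      const proposed = action.parameters[name];
      // Only parameters the action actually sets can conflict.
      if (proposed !== undefined && proposed !== expected) {
        conflicts.push({ parameter: name, expected, proposed, source: instruction.text });
      }
    }
  }
  return { allowed: conflicts.length === 0, conflicts };
}

const history = [
  { text: "Check MongoDB at port 27027", parameters: { port: 27027 } },
];
const verdict = validateAction(
  { type: "db.connect", parameters: { port: 27017 } }, // training-pattern default
  history
);
console.log(verdict.allowed); // false
console.log(verdict.conflicts[0].expected, verdict.conflicts[0].proposed); // 27027 27017
```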

### 3. BoundaryEnforcer

Ensures certain decision types **structurally require human approval**:

- **Values decisions** - Privacy vs. performance, ethics, user agency
- **Irreversible changes** - Data deletion, architectural changes
- **High-risk operations** - Security changes, financial decisions
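
One way to picture "structurally require" is a gate that cannot return an allowed status for flagged actions unless a human approval is supplied. The sketch below assumes actions arrive already tagged with categories. The tags, status strings, and `enforceBoundary` function are hypothetical.

```javascript
// Hypothetical category tags, mirroring the three decision types above.
const HUMAN_APPROVAL_CATEGORIES = new Set(["values", "irreversible", "high-risk"]);

// Returns a status object; flagged actions never pass without human approval.
function enforceBoundary(action, humanApproval = null) {
  const flagged = action.categories.filter((c) => HUMAN_APPROVAL_CATEGORIES.has(c));
  if (flagged.length === 0) {
    return { status: "ALLOWED", flagged };
  }
  if (humanApproval !== null && humanApproval.approved === true) {
    return { status: "APPROVED_BY_HUMAN", flagged, approvedBy: humanApproval.approver };
  }
  return { status: "BLOCKED_PENDING_HUMAN", flagged };
}

const dropTable = { name: "delete user data", categories: ["irreversible"] };
console.log(enforceBoundary(dropTable).status); // BLOCKED_PENDING_HUMAN
console.log(
  enforceBoundary(dropTable, { approved: true, approver: "alice" }).status
); // APPROVED_BY_HUMAN
```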

### 4. ContextPressureMonitor

Tracks session degradation across multiple factors:

- **Conversation length** (40% weight) - Message count drives compaction events (PRIMARY degradation factor)
- **Token usage** (30% weight) - Context window pressure
- **Task complexity** (15% weight) - Concurrent tasks, dependencies
- **Error frequency** (10% weight) - Recent errors indicate degraded state
- **Instruction density** (5% weight) - Too many competing directives

**Updated 2025-10-12:** Weights were rebalanced after observing that compaction events, which are triggered by message count (at roughly 60 messages) and not only by token usage, are the PRIMARY cause of session disruption. Each compaction discards critical context, and quality degrades sharply afterwards.

Recommends session handoffs before quality degrades.
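
The weights above combine naturally into a single weighted sum. The sketch below assumes each factor has already been normalized to the range 0 to 1. How the monitor normalizes its inputs is not specified here, so the example values (such as message count divided by 60) are illustrative assumptions, as is the `pressureScore` function name.

```javascript
// Weights as listed above; they sum to 1.0.
const WEIGHTS = {
  conversationLength: 0.40,
  tokenUsage: 0.30,
  taskComplexity: 0.15,
  errorFrequency: 0.10,
  instructionDensity: 0.05,
};

// Assumes every factor is pre-normalized to [0, 1]; missing factors count as 0.
function pressureScore(factors) {
  let score = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    const value = factors[name] ?? 0;
    if (!(value >= 0 && value <= 1)) {
      throw new RangeError(`${name} must be a number in [0, 1]`);
    }
    score += weight * value;
  }
  return score;
}

// Illustrative inputs: 45 messages against a ~60 message compaction point,
// half of the context window used.
const score = pressureScore({
  conversationLength: 45 / 60,
  tokenUsage: 0.5,
  taskComplexity: 0.2,
  errorFrequency: 0.1,
  instructionDensity: 0,
});
console.log(score.toFixed(2)); // 0.49
```

The score at which a handoff should be recommended is a deployment choice and is not fixed by this sketch.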

### 5. MetacognitiveVerifier

AI self-checks its own reasoning before proposing actions:

- **Alignment** - Does this match stated goals?
- **Coherence** - Is the reasoning internally consistent?
- **Completeness** - Are edge cases considered?
- **Safety** - What are the risks?
- **Alternatives** - Have other approaches been explored?

Returns confidence scores and recommends PROCEED, PROCEED_WITH_CAUTION, REQUIRE_REVIEW, or BLOCKED.
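
As a rough illustration, the five checks can be scored and mapped to one of the four recommendations. Both the aggregation (a plain average) and the thresholds below are assumptions chosen for the example, since the real cut-offs are not given here. The `recommend` function is hypothetical.

```javascript
const DIMENSIONS = ["alignment", "coherence", "completeness", "safety", "alternatives"];

// Hypothetical aggregation and thresholds; the framework's real values may differ.
function recommend(scores) {
  for (const dimension of DIMENSIONS) {
    const s = scores[dimension];
    if (!(s >= 0 && s <= 1)) {
      throw new RangeError(`${dimension} score must be a number in [0, 1]`);
    }
  }
  const confidence =
    DIMENSIONS.reduce((sum, d) => sum + scores[d], 0) / DIMENSIONS.length;

  let decision;
  if (confidence >= 0.8) decision = "PROCEED";
  else if (confidence >= 0.6) decision = "PROCEED_WITH_CAUTION";
  else if (confidence >= 0.4) decision = "REQUIRE_REVIEW";
  else decision = "BLOCKED";
  return { confidence, decision };
}

const outcome = recommend({
  alignment: 0.9,
  coherence: 0.8,
  completeness: 0.5,
  safety: 0.6,
  alternatives: 0.7,
});
console.log(outcome.decision); // PROCEED_WITH_CAUTION
```

A stricter variant might let a low safety score veto the action regardless of the average; that design choice is left open here.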

### 6. PluralisticDeliberationOrchestrator

Facilitates multi-stakeholder deliberation when BoundaryEnforcer flags values conflicts:

- **Conflict Detection** - Identifies moral frameworks in tension (deontological, consequentialist, care ethics, etc.)
- **Stakeholder Engagement** - Identifies affected parties requiring representation (human approval mandatory)
- **Non-Hierarchical Deliberation** - No automatic value ranking (privacy vs. safety decisions require structured process)
- **Outcome Documentation** - Records decision, dissenting views, moral remainder, and precedent applicability
- **Provisional Decisions** - All values decisions are reviewable when context changes

AI facilitates deliberation, humans decide. Precedents are informative, not binding.
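
The outcome documentation described above could be captured in a record like the following. This is a sketch under stated assumptions: the field names and the `documentOutcome` helper are hypothetical, and the only rules encoded are the ones listed in this section (a human decides, decisions stay provisional, precedents do not bind).

```javascript
// Hypothetical record shape for a deliberation outcome.
function documentOutcome({
  decision,
  decidedBy,
  dissentingViews = [],
  moralRemainder = "",
  precedentApplicability = "",
}) {
  // The AI facilitates; only a human may be recorded as the decider.
  if (!decidedBy || decidedBy.kind !== "human") {
    throw new Error("Values decisions must be made by a human");
  }
  return Object.freeze({
    decision,
    decidedBy: decidedBy.name,
    dissentingViews: [...dissentingViews],
    moralRemainder,
    precedentApplicability,
    provisional: true, // reviewable when context changes
    bindingPrecedent: false, // precedents are informative, not binding
  });
}

const outcomeRecord = documentOutcome({
  decision: "Log access events, retain for 30 days",
  decidedBy: { kind: "human", name: "review-panel" },
  dissentingViews: ["Retention should be 7 days"],
  moralRemainder: "Some privacy cost remains for users who never opted in",
  precedentApplicability: "Similar logging decisions on this service only",
});
console.log(outcomeRecord.provisional, outcomeRecord.bindingPrecedent); // true false
```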

## Why "Tractatus"?

The name honors Ludwig Wittgenstein's *Tractatus Logico-Philosophicus*, which established that:

1. **Language has limits** - Not everything can be meaningfully expressed
2. **Boundaries are structural** - These limits aren't defects, they're inherent
3. **Clarity comes from precision** - Defining what can and cannot be said

Applied to AI:

1. **AI judgment has limits** - Not every decision can be safely automated
2. **Safety comes from architecture** - Build boundaries into the system structure
3. **Reliability requires specification** - Precisely define where AI must defer to humans

## Key Principles

### 1. Structural Safety Over Behavioral Safety

- **Traditional:** "Train the AI to be safe"
- **Tractatus:** "Make unsafe actions structurally impossible"

### 2. Explicit Over Implicit

- **Traditional:** "The AI should infer user intent"
- **Tractatus:** "Track explicit instructions and enforce them"

### 3. Degradation Detection Over Perfection Assumption

- **Traditional:** "The AI should maintain quality"
- **Tractatus:** "Monitor for degradation and intervene before failure"

### 4. Human Agency Over AI Autonomy

- **Traditional:** "Give the AI maximum autonomy"
- **Tractatus:** "Reserve certain decisions for human judgment"

## Real-World Impact

The Tractatus framework prevents failure modes like:

### The 27027 Incident

A user explicitly instructed: "Check MongoDB at port 27027". The AI immediately used port 27017 instead. This was not a case of forgetting: the AI's training pattern "MongoDB = 27017" was so strong that it **autocorrected** the explicit instruction in real time, like a spell-checker changing a deliberately unusual word. This happened because:

1. Pattern recognition bias overrode explicit instruction (immediate, not delayed)
2. No validation caught the training pattern override
3. Problem gets WORSE as AI capabilities increase (stronger training patterns)

**InstructionPersistenceClassifier + CrossReferenceValidator** prevent this by storing explicit instructions with HIGH persistence and blocking any action that conflicts with them, even when the conflict originates from training patterns.

### Context Degradation

In long sessions (150k+ tokens), AI quality silently degrades:

- Forgets earlier instructions
- Makes increasingly careless errors
- Fails to verify assumptions

**ContextPressureMonitor** detects this degradation and recommends session handoffs.

### Values Creep

AI systems gradually make decisions in values-sensitive domains without realizing it:

- Choosing privacy vs. performance
- Deciding what constitutes "harmful" content
- Determining appropriate user agency levels

**BoundaryEnforcer** blocks these decisions and requires human judgment.

## Who Should Use Tractatus?

### Researchers

- Structural safety constraints through architectural enforcement
- Novel approach to the alignment problem
- Empirical validation of degradation detection

### Implementers

- Reference implementation code (Node.js, tested, documented)
- Integration guides for existing systems
- Immediate safety improvements

### Advocates

- Clear communication framework for AI safety
- Non-technical explanations of core concepts
- Policy implications and recommendations

## Getting Started

1. **Read the Core Concepts** - Understand the six services
2. **Review the Technical Specification** - See how it works in practice
3. **Explore the Case Studies** - Real-world failure modes and prevention
4. **Try the Interactive Demos** - Hands-on experience with the framework

## Status

**Phase 1 Implementation Complete (2025-10-07)**

- All six core services implemented and tested (100% coverage)
- 192 unit tests passing (including PluralisticDeliberationOrchestrator)
- Instruction persistence database operational
- Active governance for development sessions
- Value pluralism framework integrated (October 2025)

**This website** is built using the Tractatus framework to govern its own development - a practice called "dogfooding."

## Contributing

The Tractatus framework is open source and welcomes contributions:

- **Research** - Formal verification, theoretical extensions
- **Implementation** - Ports to other languages/platforms
- **Case Studies** - Document real-world applications
- **Documentation** - Improve clarity and accessibility

## Contact

- **Email**: john.stroh.nz@pm.me
- **GitHub**: https://github.com/anthropics/tractatus
- **Website**: agenticgovernance.digital

---

**Next:** [Core Concepts](https://agenticgovernance.digital/docs.html?doc=core-concepts-of-the-tractatus-framework) | [Implementation Guide](https://agenticgovernance.digital/docs.html?doc=implementation-guide-python-code-examples) | [Case Studies](https://agenticgovernance.digital/docs.html?category=case-studies)

---

## Document Metadata

<div class="document-metadata">

- **Version:** 1.0
- **Created:** 2025-09-01
- **Last Modified:** 2025-10-13
- **Author:** SyDigital Ltd
- **Word Count:** 1,228 words
- **Reading Time:** ~6 minutes
- **Document ID:** introduction
- **Status:** Active

</div>

---

## Licence

Copyright © 2026 John Stroh.

This work is licensed under the [Creative Commons Attribution 4.0 International Licence (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).

You are free to share, copy, redistribute, adapt, remix, transform, and build upon this material for any purpose, including commercially, provided you give appropriate attribution, provide a link to the licence, and indicate if changes were made.

**Note:** The Tractatus AI Safety Framework source code is separately licensed under the Apache License 2.0. This Creative Commons licence applies to the research paper text and figures only.