| title | slug | quadrant | persistence | version | type | author | created | modified |
|---|---|---|---|---|---|---|---|---|
| Introduction to the Tractatus Framework | introduction | STRATEGIC | HIGH | 1.0 | framework | Tractatus Framework Team | 2025-09-01 | 2025-10-21 |
# Introduction to the Tractatus Framework
## What is Tractatus?
The Tractatus-Based LLM Safety Framework is a world-first architectural approach to AI safety that preserves human agency through structural design rather than aspirational goals.
Instead of hoping AI systems "behave correctly," Tractatus implements architectural constraints under which certain decision types structurally require human judgment. This creates bounded AI operation that scales safely as capability grows.
## The Core Problem
Current AI safety approaches rely on:
- Alignment training (hoping the AI learns the "right" values)
- Constitutional AI (embedding principles in training)
- RLHF (Reinforcement Learning from Human Feedback)
These approaches share a fundamental flaw: they assume the AI will maintain alignment regardless of capability level or context pressure.
## The Tractatus Solution
Tractatus takes a different approach, inspired by Ludwig Wittgenstein's philosophy of language and meaning:
> "Whereof one cannot speak, thereof one must be silent." — Ludwig Wittgenstein, Tractatus Logico-Philosophicus

Applied to AI safety:
> "Whereof the AI cannot safely decide, thereof it must request human judgment."
### Architectural Boundaries
The framework defines decision boundaries based on:
- Domain complexity - Can this decision be systematized?
- Values sensitivity - Does this decision involve irreducible human values?
- Irreversibility - Can mistakes be corrected without harm?
- Context dependence - Does this decision require human cultural/social understanding?
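The four boundary factors above can be sketched as a simple gating check. This is a minimal, hypothetical sketch: the function name, field names, and any-factor-disqualifies rule are illustrative assumptions, not the reference implementation.

```javascript
// Hypothetical sketch: score a decision against the four boundary factors.
// Any single disqualifying factor routes the decision to a human.
function requiresHumanJudgment(decision) {
  return (
    !decision.systematizable ||   // domain complexity: cannot be systematized
    decision.valuesSensitive ||   // involves irreducible human values
    decision.irreversible ||      // mistakes cannot be corrected without harm
    decision.contextDependent     // needs human cultural/social understanding
  );
}
```

For example, an irreversible but otherwise routine change would still be routed to a human under this rule.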
## Core Innovation
The Tractatus framework is built on six core services that work together to ensure AI operations remain within safe boundaries:
### 1. InstructionPersistenceClassifier
Classifies instructions into five quadrants based on their strategic importance and persistence:
- STRATEGIC - Mission-critical, permanent decisions (HIGH persistence)
- OPERATIONAL - Standard operating procedures (MEDIUM-HIGH persistence)
- TACTICAL - Specific tasks with defined scope (LOW-MEDIUM persistence)
- SYSTEM - Technical configuration (HIGH persistence)
- STOCHASTIC - Exploratory, creative work (VARIABLE persistence)
All classified instructions are stored in .claude/instruction-history.json where they persist across sessions, creating an institutional memory that prevents instruction drift and ensures long-term consistency.
### 2. CrossReferenceValidator
Prevents the "27027 failure mode," in which an AI's training patterns immediately override explicit instructions:
- Validates all AI actions against stored instruction history
- Detects pattern recognition bias before execution
- Prevents parameter overrides (e.g., AI using port 27017 when user explicitly said port 27027)
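The parameter-override check can be sketched as follows. The `parameters` field on instructions and actions is an assumed shape, not the reference implementation's API; the sketch only shows the idea of validating a proposed action against stored explicit instructions before execution.

```javascript
// Hedged sketch: reject any proposed action whose parameters conflict with a
// stored explicit instruction. Data shapes are illustrative assumptions.
function validateAction(action, instructionHistory) {
  for (const inst of instructionHistory) {
    for (const [param, requiredValue] of Object.entries(inst.parameters ?? {})) {
      if (param in action.parameters && action.parameters[param] !== requiredValue) {
        return {
          valid: false,
          reason: `Parameter "${param}" (${action.parameters[param]}) conflicts with explicit instruction (${requiredValue})`,
        };
      }
    }
  }
  return { valid: true };
}

// The 27027 incident: training bias proposes 27017, but the stored
// explicit instruction says 27027, so validation blocks the action.
const history = [{ text: "Check MongoDB at port 27027", parameters: { port: 27027 } }];
const proposed = { tool: "mongo", parameters: { port: 27017 } };
```

The key design point is that the check runs against persisted history, so it catches overrides even when the conflicting instruction was given many messages earlier.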
### 3. BoundaryEnforcer
Ensures certain decision types structurally require human approval:
- Values decisions - Privacy vs. performance, ethics, user agency
- Irreversible changes - Data deletion, architectural changes
- High-risk operations - Security changes, financial decisions
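A minimal sketch of the enforcement gate, assuming a category-based policy table. The category names mirror the list above; the function and return shape are illustrative assumptions.

```javascript
// Minimal sketch: decisions in reserved categories are structurally routed
// to a human rather than executed. Details are illustrative.
const HUMAN_REQUIRED = new Set(["values", "irreversible", "high-risk"]);

function enforce(decision) {
  if (HUMAN_REQUIRED.has(decision.category)) {
    return { allowed: false, action: "REQUEST_HUMAN_APPROVAL", decision };
  }
  return { allowed: true, action: "PROCEED", decision };
}
```

Note that the gate is structural: there is no code path by which a decision in a reserved category executes without the approval step.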
### 4. ContextPressureMonitor
Tracks session degradation across multiple factors:
- Conversation length (40% weight) - Message count drives compaction events (PRIMARY degradation factor)
- Token usage (30% weight) - Context window pressure
- Task complexity (15% weight) - Concurrent tasks, dependencies
- Error frequency (10% weight) - Recent errors indicate degraded state
- Instruction density (5% weight) - Too many competing directives
*Updated 2025-10-12:* Weights were rebalanced after observing that compaction events (triggered at roughly 60 messages, not just by token count) are the PRIMARY cause of session disruption. Each compaction loses critical context and degrades quality dramatically.
Recommends session handoffs before quality degrades.
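The weighted score can be sketched directly from the weights listed above. The normalisation caps (60 messages, 150k tokens) follow figures mentioned elsewhere in this document; the remaining caps and the 0.7 handoff threshold are assumptions for illustration.

```javascript
// Sketch of the weighted pressure score; weights are from the list above,
// normalisation caps and the handoff threshold are illustrative assumptions.
const WEIGHTS = {
  conversationLength: 0.40,
  tokenUsage: 0.30,
  taskComplexity: 0.15,
  errorFrequency: 0.10,
  instructionDensity: 0.05,
};

function pressureScore(session) {
  const normalized = {
    conversationLength: Math.min(session.messageCount / 60, 1), // compaction near ~60 messages
    tokenUsage: Math.min(session.tokens / 150_000, 1),
    taskComplexity: Math.min(session.concurrentTasks / 5, 1),
    errorFrequency: Math.min(session.recentErrors / 3, 1),
    instructionDensity: Math.min(session.activeInstructions / 20, 1),
  };
  const score = Object.entries(WEIGHTS)
    .reduce((sum, [factor, weight]) => sum + weight * normalized[factor], 0);
  return { score, recommendHandoff: score >= 0.7 };
}
```

Because conversation length carries the largest weight, a session can hit the handoff threshold on message count alone, well before the token budget is exhausted.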
### 5. MetacognitiveVerifier
AI self-checks its own reasoning before proposing actions:
- Alignment - Does this match stated goals?
- Coherence - Is the reasoning internally consistent?
- Completeness - Are edge cases considered?
- Safety - What are the risks?
- Alternatives - Have other approaches been explored?
Returns confidence scores and recommends PROCEED, PROCEED_WITH_CAUTION, REQUIRE_REVIEW, or BLOCKED.
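One way the five checks could map to a confidence score and recommendation is sketched below. The outcome names come from the text above; the scoring scheme and thresholds are illustrative assumptions.

```javascript
// Sketch of the self-check pass. Each check scores in [0, 1]; thresholds
// mapping confidence to a recommendation are illustrative assumptions.
function verify(checks) {
  // checks: { alignment, coherence, completeness, safety, alternatives }
  const scores = Object.values(checks);
  const confidence = scores.reduce((a, b) => a + b, 0) / scores.length;
  let recommendation;
  if (checks.safety < 0.3) recommendation = "BLOCKED";           // safety is a hard floor
  else if (confidence >= 0.9) recommendation = "PROCEED";
  else if (confidence >= 0.7) recommendation = "PROCEED_WITH_CAUTION";
  else recommendation = "REQUIRE_REVIEW";
  return { confidence, recommendation };
}
```

In this sketch a low safety score blocks the action outright even when the averaged confidence would otherwise pass, reflecting that the checks are not interchangeable.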
### 6. PluralisticDeliberationOrchestrator
Facilitates multi-stakeholder deliberation when BoundaryEnforcer flags values conflicts:
- Conflict Detection - Identifies moral frameworks in tension (deontological, consequentialist, care ethics, etc.)
- Stakeholder Engagement - Identifies affected parties requiring representation (human approval mandatory)
- Non-Hierarchical Deliberation - No automatic value ranking (privacy vs. safety decisions require structured process)
- Outcome Documentation - Records decision, dissenting views, moral remainder, and precedent applicability
- Provisional Decisions - All values decisions are reviewable when context changes
AI facilitates deliberation, humans decide. Precedents are informative, not binding.
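An outcome record following the documentation items above might look like this sketch; the field names and structure are assumptions, but the invariants (human decider, non-binding precedent, provisional status) come from the text.

```javascript
// Illustrative outcome record for a completed deliberation.
// Structure is an assumption; the invariants follow the documentation above.
function recordOutcome({ decision, decidedBy, dissentingViews, moralRemainder, precedent }) {
  return {
    decision,
    decidedBy,                 // must be a human party; the AI only facilitates
    dissentingViews,           // preserved, not erased by the prevailing view
    moralRemainder,            // values costs the chosen option still incurs
    precedent: { ...precedent, binding: false }, // precedents inform, never bind
    provisional: true,         // all values decisions are reviewable
    recordedAt: new Date().toISOString(),
  };
}
```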
## Why "Tractatus"?
The name honors Ludwig Wittgenstein's Tractatus Logico-Philosophicus, which established that:
- Language has limits - Not everything can be meaningfully expressed
- Boundaries are structural - These limits aren't defects, they're inherent
- Clarity comes from precision - Defining what can and cannot be said
Applied to AI:
- AI judgment has limits - Not every decision can be safely automated
- Safety comes from architecture - Build boundaries into the system structure
- Reliability requires specification - Precisely define where AI must defer to humans
## Key Principles
### 1. Structural Safety Over Behavioral Safety
Traditional: "Train the AI to be safe"
Tractatus: "Make unsafe actions structurally impossible"
### 2. Explicit Over Implicit
Traditional: "The AI should infer user intent"
Tractatus: "Track explicit instructions and enforce them"
### 3. Degradation Detection Over Perfection Assumption
Traditional: "The AI should maintain quality"
Tractatus: "Monitor for degradation and intervene before failure"
### 4. Human Agency Over AI Autonomy
Traditional: "Give the AI maximum autonomy"
Tractatus: "Reserve certain decisions for human judgment"
## Real-World Impact
The Tractatus framework prevents failure modes like:
### The 27027 Incident
The user explicitly instructed: "Check MongoDB at port 27027". The AI immediately used port 27017 instead. This was not forgetting: the AI's training pattern "MongoDB = 27017" was so strong that it autocorrected the explicit instruction in real time, like a spell-checker changing a deliberately unusual word. This happened because:
- Pattern recognition bias overrode explicit instruction (immediate, not delayed)
- No validation caught the training pattern override
- Problem gets WORSE as AI capabilities increase (stronger training patterns)
InstructionPersistenceClassifier + CrossReferenceValidator prevent this by storing explicit instructions with HIGH persistence and blocking any action that conflicts—even from training patterns.
### Context Degradation
In long sessions (150k+ tokens), AI quality silently degrades:
- Forgets earlier instructions
- Makes increasingly careless errors
- Fails to verify assumptions
ContextPressureMonitor detects this degradation and recommends session handoffs.
### Values Creep
AI systems gradually make decisions in values-sensitive domains without realizing it:
- Choosing privacy vs. performance
- Deciding what constitutes "harmful" content
- Determining appropriate user agency levels
BoundaryEnforcer blocks these decisions and requires human judgment.
## Who Should Use Tractatus?
### Researchers
- Structural safety constraints through architectural enforcement
- Novel approach to alignment problem
- Empirical validation of degradation detection
### Implementers
- Reference implementation code (Node.js, tested, documented)
- Integration guides for existing systems
- Immediate safety improvements
### Advocates
- Clear communication framework for AI safety
- Non-technical explanations of core concepts
- Policy implications and recommendations
## Getting Started
- Read the Core Concepts - Understand the six services
- Review the Technical Specification - See how it works in practice
- Explore the Case Studies - Real-world failure modes and prevention
- Try the Interactive Demos - Hands-on experience with the framework
## Status
**Phase 1 Implementation Complete (2025-10-07)**
- All six core services implemented and tested (100% coverage)
- 192 unit tests passing (including PluralisticDeliberationOrchestrator)
- Instruction persistence database operational
- Active governance for development sessions
- Value pluralism framework integrated (October 2025)
This website is built using the Tractatus framework to govern its own development, a practice known as "dogfooding."
## Contributing
The Tractatus framework is open source and welcomes contributions:
- Research - Formal verification, theoretical extensions
- Implementation - Ports to other languages/platforms
- Case Studies - Document real-world applications
- Documentation - Improve clarity and accessibility
## Contact
- Email: john.stroh.nz@pm.me
- GitHub: https://github.com/anthropics/tractatus
- Website: agenticgovernance.digital
Next: Core Concepts | Implementation Guide | Case Studies
## Document Metadata
- Version: 1.0
- Created: 2025-09-01
- Last Modified: 2025-10-13
- Author: SyDigital Ltd
- Word Count: 1,228 words
- Reading Time: ~6 minutes
- Document ID: introduction
- Status: Active
## Licence
Copyright © 2026 John Stroh.
This work is licensed under the Creative Commons Attribution 4.0 International Licence (CC BY 4.0).
You are free to share, copy, redistribute, adapt, remix, transform, and build upon this material for any purpose, including commercially, provided you give appropriate attribution, provide a link to the licence, and indicate if changes were made.
Note: The Tractatus AI Safety Framework source code is separately licensed under the Apache License 2.0. This Creative Commons licence applies to the research paper text and figures only.