
Introduction to the Tractatus Framework

Version: 0.5.0 (Phase 5 Proof-of-Concept)
Last Updated: 2025-10-12
Status: Active development with production deployment


What Is Tractatus?

Tractatus is an architectural AI safety framework that enforces boundaries through system structure rather than behavioral training. Instead of hoping LLMs "behave correctly," Tractatus structurally prevents certain decision types from executing without human approval.

The framework operates through six specialized services that continuously monitor LLM operations, detect unsafe conditions, and enforce mandatory human judgment for values-sensitive decisions.

The Core Problem

Current AI safety approaches rely on alignment training (teaching "correct" values), Constitutional AI (embedding principles in training), and RLHF (Reinforcement Learning from Human Feedback).

Fundamental flaw: these approaches assume the AI maintains alignment regardless of context pressure or competing constraints. Empirical evidence shows this assumption fails.

The Tractatus Solution

Rather than training AI to make "good decisions," Tractatus removes certain decisions from AI authority entirely.

Core Principle

"Whereof the AI cannot safely decide, thereof it must request human judgment."

Inspired by Wittgenstein's Tractatus Logico-Philosophicus: recognize limits, enforce them structurally, and be explicit about boundaries.

Decision Boundaries

AI defers to humans when decisions involve:

  1. Irreducible values conflicts - Privacy vs safety, autonomy vs harm prevention
  2. Irreversible consequences - Data deletion, architectural changes, security modifications
  3. Cultural/social context - Decisions requiring human cultural understanding
  4. Competing moral frameworks - Cases where legitimate values conflict

Six Core Services

1. InstructionPersistenceClassifier

Function: Classifies every instruction by strategic importance and temporal scope.

Categories:

  • STRATEGIC - Mission/values (persistence: PERMANENT)
  • OPERATIONAL - Processes/standards (persistence: HIGH)
  • TACTICAL - Specific tasks (persistence: LOW-MEDIUM)
  • SYSTEM - Technical config (persistence: HIGH)
  • STOCHASTIC - Creative exploration (persistence: VARIABLE)

Why it matters: Prevents instruction drift and ensures critical directives persist across sessions.
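
The classification above can be sketched in code. This is a hypothetical illustration, not the framework's actual implementation: the function name and keyword heuristics are assumptions, but the five categories and their persistence levels come from the list above.

```javascript
// Illustrative sketch: classify an instruction into one of the five
// persistence categories using simple keyword heuristics (hypothetical).
const CATEGORIES = {
  STRATEGIC:   { persistence: "PERMANENT" },
  OPERATIONAL: { persistence: "HIGH" },
  TACTICAL:    { persistence: "LOW-MEDIUM" },
  SYSTEM:      { persistence: "HIGH" },
  STOCHASTIC:  { persistence: "VARIABLE" },
};

function classifyInstruction(text) {
  const t = text.toLowerCase();
  let category = "TACTICAL"; // default: a specific task
  if (/\b(mission|values|never|always)\b/.test(t)) category = "STRATEGIC";
  else if (/\b(process|standard|review|workflow)\b/.test(t)) category = "OPERATIONAL";
  else if (/\b(port|config|database|env|timeout)\b/.test(t)) category = "SYSTEM";
  else if (/\b(brainstorm|explore|creative)\b/.test(t)) category = "STOCHASTIC";
  return { category, persistence: CATEGORIES[category].persistence };
}

console.log(classifyInstruction("Use MongoDB port 27027"));
// { category: 'SYSTEM', persistence: 'HIGH' }
```

A real classifier would use the LLM's structured output rather than regexes; the point is that every instruction is tagged with a persistence level before it enters storage.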

2. CrossReferenceValidator

Function: Validates proposed actions against stored instruction history before execution.

Prevents: Pattern recognition bias where LLM training overrides explicit instructions.

Example: The user says "MongoDB port 27027," but the LLM's training pattern autocorrects it to "27017." CrossReferenceValidator blocks this as an instruction conflict.
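
The check can be sketched as a lookup against stored explicit parameters. The function and store names here are illustrative assumptions; the conflict-detection logic mirrors the port example above.

```javascript
// Hypothetical sketch: before executing an action, compare its parameters
// against previously stored explicit instructions.
const instructionStore = new Map(); // parameter name -> explicitly stated value

function recordInstruction(param, value) {
  instructionStore.set(param, value);
}

function validateAction(param, proposedValue) {
  const stored = instructionStore.get(param);
  if (stored !== undefined && stored !== proposedValue) {
    return {
      allowed: false,
      reason: `Instruction conflict: user specified ${param}=${stored}, ` +
              `proposed action uses ${proposedValue}`,
    };
  }
  return { allowed: true };
}

recordInstruction("mongodb_port", 27027);
const check = validateAction("mongodb_port", 27017); // training-pattern autocorrect
console.log(check.allowed); // false
```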

3. BoundaryEnforcer

Function: Structurally blocks decisions in protected domains, requiring human approval.

Protected domains:

  • Values decisions (privacy, user agency, ethics)
  • Irreversible changes (deletions, schema changes)
  • Security modifications (authentication, access control)
  • Financial decisions (pricing, billing, payments)

Result: AI is prevented from executing these decisions without explicit human approval.
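
A minimal sketch of the enforcement rule, assuming a simple domain tag and approval flag on each action (both names are illustrative, not the framework's actual interface):

```javascript
// Hypothetical sketch: block actions in protected domains unless an
// explicit human approval accompanies the request.
const PROTECTED_DOMAINS = new Set([
  "values", "irreversible", "security", "financial",
]);

function enforceBoundary(action) {
  if (PROTECTED_DOMAINS.has(action.domain) && !action.humanApproval) {
    return { status: "BLOCKED", requires: "explicit human approval" };
  }
  return { status: "ALLOWED" };
}

console.log(enforceBoundary({ domain: "security", description: "disable 2FA" }));
// { status: 'BLOCKED', requires: 'explicit human approval' }
```

The key design property is that the check is deterministic: the LLM cannot talk its way past a set-membership test.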

4. ContextPressureMonitor

Function: Tracks session degradation across five factors.

Monitors:

  • Conversation length (40% weight) - PRIMARY factor: message count drives compaction events
  • Token usage (30% weight) - Context window pressure
  • Task complexity (15% weight) - Competing demands
  • Error frequency (10% weight) - Quality indicators
  • Instruction density (5% weight) - Directive overload

Action: Recommends session handoff before quality degrades.
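
The five weights combine into a single pressure score. The sketch below uses the documented weights and the 75% handoff threshold mentioned later in this document; factor normalization to [0, 1] and the function names are assumptions.

```javascript
// Sketch of the weighted degradation score using the documented weights.
// Factor values are assumed normalized to [0, 1].
const WEIGHTS = {
  conversationLength: 0.40,
  tokenUsage: 0.30,
  taskComplexity: 0.15,
  errorFrequency: 0.10,
  instructionDensity: 0.05,
};

function pressureScore(factors) {
  return Object.entries(WEIGHTS)
    .reduce((sum, [key, weight]) => sum + weight * (factors[key] ?? 0), 0);
}

function recommendHandoff(factors, threshold = 0.75) {
  return pressureScore(factors) >= threshold;
}

const session = {
  conversationLength: 0.9, // long session: primary driver
  tokenUsage: 0.8,
  taskComplexity: 0.5,
  errorFrequency: 0.3,
  instructionDensity: 0.2,
};
console.log(pressureScore(session).toFixed(3)); // "0.715"
console.log(recommendHandoff(session)); // false (just under the 0.75 threshold)
```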

5. MetacognitiveVerifier

Function: LLM evaluates its own reasoning before proposing complex actions.

Checks:

  • Alignment with stated goals
  • Internal coherence
  • Completeness (edge cases considered)
  • Safety risks
  • Alternatives explored

Output: Confidence score + recommendation (PROCEED / REQUIRE_REVIEW / BLOCKED)
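
The mapping from confidence score to recommendation can be sketched as below. The threshold values are illustrative assumptions, not the framework's specification; the three recommendation labels come from the output description above.

```javascript
// Hypothetical sketch: map a self-evaluation confidence score to the
// documented recommendations. Thresholds are illustrative only.
function recommend(confidence, safetyRiskDetected) {
  if (safetyRiskDetected) return "BLOCKED";       // safety risks override confidence
  if (confidence >= 0.8) return "PROCEED";
  if (confidence >= 0.5) return "REQUIRE_REVIEW";
  return "BLOCKED";
}

console.log(recommend(0.9, false)); // "PROCEED"
console.log(recommend(0.6, false)); // "REQUIRE_REVIEW"
console.log(recommend(0.9, true));  // "BLOCKED"
```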

6. PluralisticDeliberationOrchestrator

Function: Manages decisions involving competing values frameworks.

Process:

  1. Detects values conflicts (privacy vs safety, rights vs consequences)
  2. Identifies affected stakeholder groups
  3. Structures multi-perspective deliberation
  4. Documents all positions (including dissent)
  5. Creates reviewable precedents

Principle: When values genuinely conflict, deliberation quality matters more than decision speed. AI facilitates; humans decide.
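
The five-step process implies a reviewable precedent record. The shape below is a hypothetical sketch of that record; all field names are illustrative assumptions.

```javascript
// Hypothetical sketch: the deliberation record implied by the five-step
// process. Dissenting positions are preserved, not discarded.
function createPrecedent({ conflict, stakeholders, positions, decision }) {
  return {
    conflict,                                              // e.g. "privacy vs safety"
    stakeholders,                                          // affected groups
    positions,                                             // all positions, including dissent
    dissent: positions.filter((p) => !p.supportsDecision), // documented explicitly
    decision,                                              // made by humans, not the AI
    reviewable: true,
    createdAt: new Date().toISOString(),
  };
}

const record = createPrecedent({
  conflict: "privacy vs safety",
  stakeholders: ["end users", "moderators"],
  positions: [
    { group: "end users", supportsDecision: false, summary: "favor privacy" },
    { group: "moderators", supportsDecision: true, summary: "favor safety checks" },
  ],
  decision: "require opt-in for safety scanning",
});
console.log(record.dissent.length); // 1
```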

Why "Tractatus"?

Named after Ludwig Wittgenstein's Tractatus Logico-Philosophicus (1921), which established:

  1. Language has limits - Not everything can be meaningfully stated
  2. Boundaries are structural - Limits are inherent, not defects
  3. Clarity through precision - Define what can and cannot be said

Applied to AI safety:

  1. AI judgment has limits - Not every decision can be safely automated
  2. Safety through architecture - Build boundaries into system structure
  3. Reliability through specification - Precisely define where humans must decide

Demonstrated Failure Modes Prevented

Port 27027 Incident (2025-10-06)

What happened: User specified MongoDB port 27027. LLM immediately used 27017 instead—not through forgetting, but through pattern recognition autocorrection. Training data "MongoDB=27017" was so strong it overrode the explicit instruction in real-time.

Tractatus prevention: InstructionPersistenceClassifier + CrossReferenceValidator store explicit parameters and block any action conflicting with stored instructions—even from training patterns.

Context Degradation (Multiple sessions)

What happens: Beyond 150k tokens, LLM output quality silently degrades: the model forgets instructions, makes careless errors, and fails to verify assumptions.

Tractatus prevention: ContextPressureMonitor calculates degradation score and recommends session handoff at 75% threshold—before failure occurs.

Values Creep (Ongoing risk)

What happens: LLM gradually makes values-sensitive decisions without recognizing them as such: privacy vs performance trade-offs, "harmful" content definitions, user agency boundaries.

Tractatus prevention: BoundaryEnforcer structurally blocks these decisions. LLM cannot execute them without explicit human approval.

Current Implementation Status

Production deployment: agenticgovernance.digital (this website)
Development governance: Active (this site is built under Tractatus governance)
Test coverage: 192 unit tests passing (100% coverage on core services)
Database: Instruction persistence operational (MongoDB)
Phase: 5 (Proof-of-Concept), value pluralism integration active

Dogfooding: The Tractatus framework governs its own development. Every decision to modify this website passes through Tractatus services.

Technical Architecture

  • Runtime: Node.js (Express)
  • Database: MongoDB (instruction persistence, precedent storage)
  • Frontend: Vanilla JavaScript (no framework dependencies)
  • API: RESTful (OpenAPI 3.0 spec available)
  • Services: Six independent modules with defined interfaces

Key design decision: No machine learning in governance services. All boundaries are deterministic and auditable.

Who Should Use Tractatus?

AI Safety Researchers

  • Architectural approach to alignment problem
  • Formal specification of decision boundaries
  • Empirical validation of degradation detection
  • Novel framework for values pluralism in AI

Software Teams Deploying LLMs

  • Production-ready code (tested, documented)
  • Immediate safety improvements
  • Integration guides for existing systems
  • Prevents known failure modes

Policy Makers / Advocates

  • Clear framework for AI safety requirements
  • Non-technical explanations available
  • Addresses agency preservation
  • Demonstrates practical implementation

Integration Requirements

Minimum: an LLM with structured output support, persistent storage for instruction history, and the ability to wrap LLM calls in a governance layer.

Recommended: Session state management, token counting, user authentication for human approval workflows.
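
The "wrap LLM calls in a governance layer" requirement can be sketched as below. The `checkBoundary` and `callModel` names are placeholder assumptions, not the framework's real API; the point is that the governance check runs before, and can veto, the model call.

```javascript
// Hypothetical sketch: a governance wrapper around an LLM call.
async function governedCall(request, { checkBoundary, callModel }) {
  const verdict = checkBoundary(request);
  if (verdict.status === "BLOCKED") {
    // The model is never invoked for blocked requests.
    return { executed: false, reason: verdict.requires ?? "blocked by governance" };
  }
  const output = await callModel(request);
  return { executed: true, output };
}

// Usage with stub implementations:
const stubs = {
  checkBoundary: (req) => req.domain === "security"
    ? { status: "BLOCKED", requires: "human approval" }
    : { status: "ALLOWED" },
  callModel: async (req) => `model response for: ${req.prompt}`,
};

governedCall({ prompt: "summarize notes", domain: "general" }, stubs)
  .then((result) => console.log(result.executed)); // true
```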

Limitations

What Tractatus does NOT do:

  • Train better LLMs (uses existing models as-is)
  • Guarantee "aligned" AI behavior
  • Eliminate all possible failures
  • Replace human judgment

What Tractatus DOES do:

  • Designed to detect specific known failure modes before execution
  • Architecturally enforce boundaries on decision authority
  • Monitor session quality degradation indicators
  • Require human judgment for values-sensitive decisions

Getting Started

  1. Read Core Concepts - Understand the six services in detail
  2. Review Case Studies - See real failure modes and prevention
  3. Check Technical Specification - API reference and integration guide
  4. Explore Implementation Guide - Step-by-step deployment

Research Foundations

Tractatus integrates concepts from:

  • Philosophy of language (Wittgenstein) - Limits and boundaries
  • Organizational theory (March, Simon) - Bounded rationality, decision premises
  • Deliberative democracy (Gutmann, Thompson) - Structured disagreement
  • Value pluralism (Berlin, Chang) - Incommensurable values
  • Systems architecture (Conway, Brooks) - Structural constraints and boundaries

See Research Foundations for academic grounding and citations.

Contributing

Tractatus is open source and welcomes contributions:

  • Code: GitHub pull requests (Node.js, tests required)
  • Research: Theoretical extensions, formal verification
  • Case studies: Document real-world applications
  • Documentation: Clarity improvements, translations

Repository: https://github.com/AgenticGovernance/tractatus
Issues: https://github.com/AgenticGovernance/tractatus/issues

Contact

Email: john.stroh.nz@pm.me
Website: https://agenticgovernance.digital


License

Copyright 2025 John Stroh

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Summary:

  • Commercial use allowed
  • Modification allowed
  • Distribution allowed
  • Patent grant included
  • Private use allowed
  • ⚠️ Must include license and copyright notice
  • ⚠️ Must state significant changes
  • No trademark rights granted
  • No liability or warranty

Document Metadata

  • Version: 0.5.0
  • Created: 2025-10-12
  • Last Modified: 2025-10-13
  • Author: John Stroh
  • Word Count: 1,372 words
  • Reading Time: ~7 minutes
  • Document ID: introduction-to-the-tractatus-framework
  • Status: Active

Next Steps: