
Introduction to the Tractatus Framework

Version: 0.5.0 (Phase 5 Proof-of-Concept)
Last Updated: 2025-10-12
Status: Active development with production deployment


What Is Tractatus?

Tractatus is an architectural AI safety framework that enforces boundaries through system structure rather than behavioral training. Instead of hoping LLMs "behave correctly," Tractatus structurally prevents certain decision types from executing without human approval.

The framework operates through six specialized services that continuously monitor LLM operations, detect unsafe conditions, and enforce mandatory human judgment for values-sensitive decisions.

The Core Problem

Current AI safety approaches rely on alignment training (teaching "correct" values), Constitutional AI (embedding principles in training), and RLHF (Reinforcement Learning from Human Feedback).

Fundamental flaw: these approaches assume the AI maintains alignment regardless of context pressure or competing constraints. Empirical evidence shows this assumption fails.

The Tractatus Solution

Rather than training AI to make "good decisions," Tractatus removes certain decisions from AI authority entirely.

Core Principle

"Whereof the AI cannot safely decide, thereof it must request human judgment."

Inspired by Wittgenstein's Tractatus Logico-Philosophicus: recognize limits, enforce them structurally, and be explicit about boundaries.

Decision Boundaries

AI defers to humans when decisions involve:

  1. Irreducible values conflicts - Privacy vs safety, autonomy vs harm prevention
  2. Irreversible consequences - Data deletion, architectural changes, security modifications
  3. Cultural/social context - Decisions requiring human cultural understanding
  4. Competing moral frameworks - Cases where legitimate values conflict

Six Core Services

1. InstructionPersistenceClassifier

Function: Classifies every instruction by strategic importance and temporal scope.

Categories:

  • STRATEGIC - Mission/values (persistence: PERMANENT)
  • OPERATIONAL - Processes/standards (persistence: HIGH)
  • TACTICAL - Specific tasks (persistence: LOW-MEDIUM)
  • SYSTEM - Technical config (persistence: HIGH)
  • STOCHASTIC - Creative exploration (persistence: VARIABLE)

Why it matters: Prevents instruction drift and ensures critical directives persist across sessions.
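
The classification above can be sketched in code. This is a hypothetical illustration, not the framework's actual implementation: the function name and keyword heuristics are assumptions, but the five categories and their persistence levels come from the list above.

```javascript
// Illustrative sketch: classify an instruction into one of the five
// persistence categories using simple keyword heuristics (hypothetical).
const CATEGORIES = {
  STRATEGIC:   { persistence: "PERMANENT" },
  OPERATIONAL: { persistence: "HIGH" },
  TACTICAL:    { persistence: "LOW-MEDIUM" },
  SYSTEM:      { persistence: "HIGH" },
  STOCHASTIC:  { persistence: "VARIABLE" },
};

function classifyInstruction(text) {
  const t = text.toLowerCase();
  let category = "TACTICAL"; // default: a specific task
  if (/\b(mission|values|never|always)\b/.test(t)) category = "STRATEGIC";
  else if (/\b(process|standard|review|workflow)\b/.test(t)) category = "OPERATIONAL";
  else if (/\b(port|config|database|env|timeout)\b/.test(t)) category = "SYSTEM";
  else if (/\b(brainstorm|explore|creative)\b/.test(t)) category = "STOCHASTIC";
  return { category, persistence: CATEGORIES[category].persistence };
}

console.log(classifyInstruction("Use MongoDB port 27027"));
// { category: 'SYSTEM', persistence: 'HIGH' }
```

A real classifier would use the LLM's structured output rather than regexes; the point is that every instruction is tagged with a persistence level before it enters storage.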

2. CrossReferenceValidator

Function: Validates proposed actions against stored instruction history before execution.

Prevents: Pattern recognition bias where LLM training overrides explicit instructions.

Example: The user says "MongoDB port 27027," but the LLM's training pattern autocorrects it to "27017." CrossReferenceValidator blocks this as an instruction conflict.
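
The check can be sketched as a lookup against stored explicit parameters. The function and store names here are illustrative assumptions; the conflict-detection logic mirrors the port example above.

```javascript
// Hypothetical sketch: before executing an action, compare its parameters
// against previously stored explicit instructions.
const instructionStore = new Map(); // parameter name -> explicitly stated value

function recordInstruction(param, value) {
  instructionStore.set(param, value);
}

function validateAction(param, proposedValue) {
  const stored = instructionStore.get(param);
  if (stored !== undefined && stored !== proposedValue) {
    return {
      allowed: false,
      reason: `Instruction conflict: user specified ${param}=${stored}, ` +
              `proposed action uses ${proposedValue}`,
    };
  }
  return { allowed: true };
}

recordInstruction("mongodb_port", 27027);
const check = validateAction("mongodb_port", 27017); // training-pattern autocorrect
console.log(check.allowed); // false
```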

3. BoundaryEnforcer

Function: Structurally blocks decisions in protected domains, requiring human approval.

Protected domains:

  • Values decisions (privacy, user agency, ethics)
  • Irreversible changes (deletions, schema changes)
  • Security modifications (authentication, access control)
  • Financial decisions (pricing, billing, payments)

Result: AI is prevented from executing these decisions without explicit human approval.
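
A minimal sketch of the enforcement rule, assuming a simple domain tag and approval flag on each action (both names are illustrative, not the framework's actual interface):

```javascript
// Hypothetical sketch: block actions in protected domains unless an
// explicit human approval accompanies the request.
const PROTECTED_DOMAINS = new Set([
  "values", "irreversible", "security", "financial",
]);

function enforceBoundary(action) {
  if (PROTECTED_DOMAINS.has(action.domain) && !action.humanApproval) {
    return { status: "BLOCKED", requires: "explicit human approval" };
  }
  return { status: "ALLOWED" };
}

console.log(enforceBoundary({ domain: "security", description: "disable 2FA" }));
// { status: 'BLOCKED', requires: 'explicit human approval' }
```

The key design property is that the check is deterministic: the LLM cannot talk its way past a set-membership test.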

4. ContextPressureMonitor

Function: Tracks session degradation across five factors.

Monitors:

  • Conversation length (40% weight) - PRIMARY factor: message count drives compaction events
  • Token usage (30% weight) - Context window pressure
  • Task complexity (15% weight) - Competing demands
  • Error frequency (10% weight) - Quality indicators
  • Instruction density (5% weight) - Directive overload

Action: Recommends session handoff before quality degrades.
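
The five weights combine into a single pressure score. The sketch below uses the documented weights and the 75% handoff threshold mentioned later in this document; factor normalization to [0, 1] and the function names are assumptions.

```javascript
// Sketch of the weighted degradation score using the documented weights.
// Factor values are assumed normalized to [0, 1].
const WEIGHTS = {
  conversationLength: 0.40,
  tokenUsage: 0.30,
  taskComplexity: 0.15,
  errorFrequency: 0.10,
  instructionDensity: 0.05,
};

function pressureScore(factors) {
  return Object.entries(WEIGHTS)
    .reduce((sum, [key, weight]) => sum + weight * (factors[key] ?? 0), 0);
}

function recommendHandoff(factors, threshold = 0.75) {
  return pressureScore(factors) >= threshold;
}

const session = {
  conversationLength: 0.9, // long session: primary driver
  tokenUsage: 0.8,
  taskComplexity: 0.5,
  errorFrequency: 0.3,
  instructionDensity: 0.2,
};
console.log(pressureScore(session).toFixed(3)); // "0.715"
console.log(recommendHandoff(session)); // false (just under the 0.75 threshold)
```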

5. MetacognitiveVerifier

Function: LLM evaluates its own reasoning before proposing complex actions.

Checks:

  • Alignment with stated goals
  • Internal coherence
  • Completeness (edge cases considered)
  • Safety risks
  • Alternatives explored

Output: Confidence score + recommendation (PROCEED / REQUIRE_REVIEW / BLOCKED)
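
The mapping from confidence score to recommendation can be sketched as below. The threshold values are illustrative assumptions, not the framework's specification; the three recommendation labels come from the output description above.

```javascript
// Hypothetical sketch: map a self-evaluation confidence score to the
// documented recommendations. Thresholds are illustrative only.
function recommend(confidence, safetyRiskDetected) {
  if (safetyRiskDetected) return "BLOCKED";       // safety risks override confidence
  if (confidence >= 0.8) return "PROCEED";
  if (confidence >= 0.5) return "REQUIRE_REVIEW";
  return "BLOCKED";
}

console.log(recommend(0.9, false)); // "PROCEED"
console.log(recommend(0.6, false)); // "REQUIRE_REVIEW"
console.log(recommend(0.9, true));  // "BLOCKED"
```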

6. PluralisticDeliberationOrchestrator

Function: Manages decisions involving competing values frameworks.

Process:

  1. Detects values conflicts (privacy vs safety, rights vs consequences)
  2. Identifies affected stakeholder groups
  3. Structures multi-perspective deliberation
  4. Documents all positions (including dissent)
  5. Creates reviewable precedents

Principle: When values genuinely conflict, deliberation quality matters more than decision speed. AI facilitates; humans decide.
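
The five-step process implies a reviewable precedent record. The shape below is a hypothetical sketch of that record; all field names are illustrative assumptions.

```javascript
// Hypothetical sketch: the deliberation record implied by the five-step
// process. Dissenting positions are preserved, not discarded.
function createPrecedent({ conflict, stakeholders, positions, decision }) {
  return {
    conflict,                                              // e.g. "privacy vs safety"
    stakeholders,                                          // affected groups
    positions,                                             // all positions, including dissent
    dissent: positions.filter((p) => !p.supportsDecision), // documented explicitly
    decision,                                              // made by humans, not the AI
    reviewable: true,
    createdAt: new Date().toISOString(),
  };
}

const record = createPrecedent({
  conflict: "privacy vs safety",
  stakeholders: ["end users", "moderators"],
  positions: [
    { group: "end users", supportsDecision: false, summary: "favor privacy" },
    { group: "moderators", supportsDecision: true, summary: "favor safety checks" },
  ],
  decision: "require opt-in for safety scanning",
});
console.log(record.dissent.length); // 1
```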

Why "Tractatus"?

Named after Ludwig Wittgenstein's Tractatus Logico-Philosophicus (1921), which established:

  1. Language has limits - Not everything can be meaningfully stated
  2. Boundaries are structural - Limits are inherent, not defects
  3. Clarity through precision - Define what can and cannot be said

Applied to AI safety:

  1. AI judgment has limits - Not every decision can be safely automated
  2. Safety through architecture - Build boundaries into system structure
  3. Reliability through specification - Precisely define where humans must decide

Demonstrated Failure Modes Prevented

Port 27027 Incident (2025-10-06)

What happened: User specified MongoDB port 27027. LLM immediately used 27017 instead—not through forgetting, but through pattern recognition autocorrection. Training data "MongoDB=27017" was so strong it overrode the explicit instruction in real-time.

Tractatus prevention: InstructionPersistenceClassifier + CrossReferenceValidator store explicit parameters and block any action conflicting with stored instructions—even from training patterns.

Context Degradation (Multiple sessions)

What happens: Beyond 150k tokens, LLM output quality silently degrades: the model forgets instructions, makes careless errors, and fails to verify assumptions.

Tractatus prevention: ContextPressureMonitor calculates degradation score and recommends session handoff at 75% threshold—before failure occurs.

Values Creep (Ongoing risk)

What happens: LLM gradually makes values-sensitive decisions without recognizing them as such: privacy vs performance trade-offs, "harmful" content definitions, user agency boundaries.

Tractatus prevention: BoundaryEnforcer structurally blocks these decisions. LLM cannot execute them without explicit human approval.

Current Implementation Status

Production deployment: agenticgovernance.digital (this website)
Development governance: Active (this site is built under Tractatus governance)
Test coverage: 192 unit tests passing (100% coverage on core services)
Database: Instruction persistence operational (MongoDB)
Phase: 5 (Proof-of-Concept), value pluralism integration active

Dogfooding: The Tractatus framework governs its own development. Every decision to modify this website passes through Tractatus services.

Technical Architecture

  • Runtime: Node.js (Express)
  • Database: MongoDB (instruction persistence, precedent storage)
  • Frontend: Vanilla JavaScript (no framework dependencies)
  • API: RESTful (OpenAPI 3.0 spec available)
  • Services: Six independent modules with defined interfaces

Key design decision: No machine learning in governance services. All boundaries are deterministic and auditable.

Who Should Use Tractatus?

AI Safety Researchers

  • Architectural approach to alignment problem
  • Formal specification of decision boundaries
  • Empirical validation of degradation detection
  • Novel framework for values pluralism in AI

Software Teams Deploying LLMs

  • Production-ready code (tested, documented)
  • Immediate safety improvements
  • Integration guides for existing systems
  • Prevents known failure modes

Policy Makers / Advocates

  • Clear framework for AI safety requirements
  • Non-technical explanations available
  • Addresses agency preservation
  • Demonstrates practical implementation

Integration Requirements

Minimum: an LLM with structured output support, persistent storage for instruction history, and the ability to wrap LLM calls in a governance layer.

Recommended: Session state management, token counting, user authentication for human approval workflows.
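
The "wrap LLM calls in a governance layer" requirement can be sketched as below. The `checkBoundary` and `callModel` names are placeholder assumptions, not the framework's real API; the point is that the governance check runs before, and can veto, the model call.

```javascript
// Hypothetical sketch: a governance wrapper around an LLM call.
async function governedCall(request, { checkBoundary, callModel }) {
  const verdict = checkBoundary(request);
  if (verdict.status === "BLOCKED") {
    // The model is never invoked for blocked requests.
    return { executed: false, reason: verdict.requires ?? "blocked by governance" };
  }
  const output = await callModel(request);
  return { executed: true, output };
}

// Usage with stub implementations:
const stubs = {
  checkBoundary: (req) => req.domain === "security"
    ? { status: "BLOCKED", requires: "human approval" }
    : { status: "ALLOWED" },
  callModel: async (req) => `model response for: ${req.prompt}`,
};

governedCall({ prompt: "summarize notes", domain: "general" }, stubs)
  .then((result) => console.log(result.executed)); // true
```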

Limitations

What Tractatus does NOT do:

  • Train better LLMs (uses existing models as-is)
  • Guarantee "aligned" AI behavior
  • Eliminate all possible failures
  • Replace human judgment

What Tractatus DOES do:

  • Designed to detect specific known failure modes before execution
  • Architecturally enforce boundaries on decision authority
  • Monitor session quality degradation indicators
  • Require human judgment for values-sensitive decisions

Getting Started

  1. Read Core Concepts - Understand the six services in detail
  2. Review Case Studies - See real failure modes and prevention
  3. Check Technical Specification - API reference and integration guide
  4. Explore Implementation Guide - Step-by-step deployment

Research Foundations

Tractatus integrates concepts from:

  • Philosophy of language (Wittgenstein) - Limits and boundaries
  • Organizational theory (March, Simon) - Bounded rationality, decision premises
  • Deliberative democracy (Gutmann, Thompson) - Structured disagreement
  • Value pluralism (Berlin, Chang) - Incommensurable values
  • Systems architecture (Conway, Brooks) - Structural constraints and boundaries

See Research Foundations for academic grounding and citations.

Contributing

Tractatus is open source and welcomes contributions:

  • Code: GitHub pull requests (Node.js, tests required)
  • Research: Theoretical extensions, formal verification
  • Case studies: Document real-world applications
  • Documentation: Clarity improvements, translations

Repository: https://github.com/AgenticGovernance/tractatus
Issues: https://github.com/AgenticGovernance/tractatus/issues

Contact

Email: john.stroh.nz@pm.me
Website: https://agenticgovernance.digital


License

Copyright 2025 John Stroh

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Summary:

  • Commercial use allowed
  • Modification allowed
  • Distribution allowed
  • Patent grant included
  • Private use allowed
  • ⚠️ Must include license and copyright notice
  • ⚠️ Must state significant changes
  • No trademark rights granted
  • No liability or warranty

Document Metadata

  • Version: 0.5.0
  • Created: 2025-10-12
  • Last Modified: 2025-10-13
  • Author: John Stroh
  • Word Count: 1,372 words
  • Reading Time: ~7 minutes
  • Document ID: introduction-to-the-tractatus-framework
  • Status: Active

Next Steps: