# Introduction to the Tractatus Framework
**Version:** 0.5.0 (Phase 5 Proof-of-Concept)
**Last Updated:** 2025-10-12
**Status:** Active development with production deployment
---
## What Is Tractatus?
Tractatus is an **architectural AI safety framework** that enforces boundaries through system structure rather than behavioral training. Instead of hoping LLMs "behave correctly," Tractatus makes certain decision types **structurally impossible to execute** without human approval.
The framework operates through six specialized services that continuously monitor LLM operations, detect unsafe conditions, and enforce mandatory human judgment for values-sensitive decisions.
## The Core Problem
Current AI safety approaches rely on alignment training (teaching "correct" values), Constitutional AI (embedding principles in training), and RLHF (Reinforcement Learning from Human Feedback).
**Fundamental flaw:** These assume AI maintains alignment regardless of context pressure or competing constraints. Empirical evidence shows this fails.
## The Tractatus Solution
Rather than training AI to make "good decisions," Tractatus **removes certain decisions from AI authority entirely**.
### Core Principle
> **"Whereof the AI cannot safely decide, thereof it must request human judgment."**
Inspired by Wittgenstein's Tractatus Logico-Philosophicus: recognize limits, enforce them structurally, and be explicit about boundaries.
### Decision Boundaries
AI defers to humans when decisions involve:
1. **Irreducible values conflicts** - Privacy vs safety, autonomy vs harm prevention
2. **Irreversible consequences** - Data deletion, architectural changes, security modifications
3. **Cultural/social context** - Decisions requiring human cultural understanding
4. **Competing moral frameworks** - Cases where legitimate values conflict
## Six Core Services
### 1. InstructionPersistenceClassifier
**Function:** Classifies every instruction by strategic importance and temporal scope.
**Quadrants:**
- **STRATEGIC** - Mission/values (persistence: PERMANENT)
- **OPERATIONAL** - Processes/standards (persistence: HIGH)
- **TACTICAL** - Specific tasks (persistence: LOW-MEDIUM)
- **SYSTEM** - Technical config (persistence: HIGH)
- **STOCHASTIC** - Creative exploration (persistence: VARIABLE)
**Why it matters:** Prevents instruction drift and ensures critical directives persist across sessions.
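The quadrant-to-persistence mapping above can be sketched in code. This is an illustrative sketch only: the names (`QUADRANTS`, `classifyInstruction`) and the toy keyword heuristic are assumptions, not the framework's actual API, which presumably uses richer signals than keywords.

```javascript
// Hypothetical quadrant-to-persistence mapping (names are illustrative).
const QUADRANTS = {
  STRATEGIC:   { persistence: 'PERMANENT' },
  OPERATIONAL: { persistence: 'HIGH' },
  TACTICAL:    { persistence: 'LOW-MEDIUM' },
  SYSTEM:      { persistence: 'HIGH' },
  STOCHASTIC:  { persistence: 'VARIABLE' },
};

// Toy keyword heuristic; the real classifier is assumed to use richer signals.
function classifyInstruction(text) {
  const t = text.toLowerCase();
  if (/mission|values|never|always/.test(t)) return 'STRATEGIC';
  if (/process|standard|review/.test(t))     return 'OPERATIONAL';
  if (/port|config|database|env/.test(t))    return 'SYSTEM';
  if (/brainstorm|explore|creative/.test(t)) return 'STOCHASTIC';
  return 'TACTICAL';
}

const quadrant = classifyInstruction('Use the non-default MongoDB port for this project');
console.log(quadrant, QUADRANTS[quadrant].persistence); // → SYSTEM HIGH
```

A SYSTEM-quadrant instruction keeps HIGH persistence, so an explicit configuration directive survives session boundaries instead of being treated as a one-off tactical request.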
### 2. CrossReferenceValidator
**Function:** Validates proposed actions against stored instruction history before execution.
**Prevents:** Pattern recognition bias where LLM training overrides explicit instructions.
**Example:** The user specifies a non-default MongoDB port for the project; the LLM's training pattern autocorrects it to 27017. CrossReferenceValidator blocks this as an instruction conflict.
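The check above can be sketched as a lookup against stored instructions. Everything here is an assumption for illustration: `validateAction`, the stored-instruction shape, and the port value 28500 (a placeholder, not the project's actual port).

```javascript
// Illustrative sketch; validateAction and the record shape are assumptions.
const instructionStore = [
  // 28500 is a placeholder standing in for the project's real non-default port.
  { key: 'mongodb.port', value: 28500, source: 'user' },
];

function validateAction(action) {
  // Any stored instruction with the same key but a different value is a conflict.
  const conflicts = instructionStore.filter(
    (inst) => inst.key === action.key && inst.value !== action.value
  );
  return conflicts.length === 0
    ? { ok: true }
    : { ok: false, reason: `conflicts with stored instruction: ${conflicts[0].key}` };
}

// The training-pattern default is blocked before execution:
const result = validateAction({ key: 'mongodb.port', value: 27017 });
console.log(result.ok); // → false
```

The point is that the conflict check is deterministic: the stored value wins over whatever the model's training pattern suggests.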
### 3. BoundaryEnforcer
**Function:** Structurally blocks decisions in protected domains, requiring human approval.
**Protected domains:**
- Values decisions (privacy, user agency, ethics)
- Irreversible changes (deletions, schema changes)
- Security modifications (authentication, access control)
- Financial decisions (pricing, billing, payments)
**Result:** AI is prevented from executing these decisions without explicit human approval.
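A minimal sketch of this blocking behavior, assuming a decision object tagged with a domain and an optional human-approval record; `PROTECTED_DOMAINS` and `enforceBoundary` are illustrative names, not the framework's actual interface.

```javascript
// Hedged sketch: names and shapes are assumptions.
const PROTECTED_DOMAINS = new Set(['values', 'irreversible', 'security', 'financial']);

function enforceBoundary(decision) {
  if (!PROTECTED_DOMAINS.has(decision.domain)) {
    return { status: 'ALLOWED' };
  }
  // Protected domains require an explicit human approval record.
  return decision.humanApproval
    ? { status: 'ALLOWED', approvedBy: decision.humanApproval.user }
    : { status: 'BLOCKED', reason: `human approval required for domain: ${decision.domain}` };
}

console.log(enforceBoundary({ domain: 'security', action: 'rotate auth keys' }).status); // → BLOCKED
```

Note the default direction: a protected-domain decision is blocked unless approval is present, rather than allowed unless someone objects.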
### 4. ContextPressureMonitor
**Function:** Tracks session degradation across five factors.
**Monitors:**
- Conversation length (40% weight) - PRIMARY factor: message count drives compaction events
- Token usage (30% weight) - Context window pressure
- Task complexity (15% weight) - Competing demands
- Error frequency (10% weight) - Quality indicators
- Instruction density (5% weight) - Directive overload
**Action:** Recommends session handoff before quality degrades.
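The weighted score can be sketched directly from the weights listed above; the 75% handoff threshold comes from the framework's stated behavior later in this document. Factor inputs normalized to 0–1 and the function name are assumptions for illustration.

```javascript
// Weights as stated in the factor list above.
const WEIGHTS = {
  conversationLength: 0.40,
  tokenUsage: 0.30,
  taskComplexity: 0.15,
  errorFrequency: 0.10,
  instructionDensity: 0.05,
};

// Each factor is assumed normalized to [0, 1].
function pressureScore(factors) {
  return Object.entries(WEIGHTS).reduce(
    (sum, [name, w]) => sum + w * (factors[name] ?? 0), 0
  );
}

const score = pressureScore({
  conversationLength: 0.9,  // long session: the primary driver
  tokenUsage: 0.95,
  taskComplexity: 0.6,
  errorFrequency: 0.3,
  instructionDensity: 0.4,
});
// Handoff is recommended at the 75% threshold:
console.log(score.toFixed(3), score >= 0.75 ? 'RECOMMEND_HANDOFF' : 'OK');
```

Because conversation length carries 40% of the weight, a long session pushes the score toward the threshold even when other factors are moderate.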
### 5. MetacognitiveVerifier
**Function:** LLM evaluates its own reasoning before proposing complex actions.
**Checks:**
- Alignment with stated goals
- Internal coherence
- Completeness (edge cases considered)
- Safety risks
- Alternatives explored
**Output:** Confidence score + recommendation (PROCEED / REQUIRE_REVIEW / BLOCKED)
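The checks-to-recommendation mapping might look like the sketch below. The scoring scheme and threshold values (0.5 safety floor, 0.8 proceed cutoff) are assumptions for illustration, not the framework's actual cutoffs.

```javascript
// Illustrative only; threshold values are assumptions.
function verifyReasoning(checks) {
  // checks: { alignment, coherence, completeness, safety, alternatives }, each in [0, 1]
  const values = Object.values(checks);
  const confidence = values.reduce((a, b) => a + b, 0) / values.length;

  let recommendation;
  if (checks.safety < 0.5) recommendation = 'BLOCKED';   // a safety risk dominates everything else
  else if (confidence >= 0.8) recommendation = 'PROCEED';
  else recommendation = 'REQUIRE_REVIEW';

  return { confidence, recommendation };
}

const verdict = verifyReasoning({
  alignment: 0.9, coherence: 0.85, completeness: 0.8, safety: 0.9, alternatives: 0.75,
});
console.log(verdict.recommendation); // → PROCEED
```

The asymmetry is deliberate in this sketch: a low safety check blocks outright regardless of how strong the other checks are.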
### 6. PluralisticDeliberationOrchestrator
**Function:** Manages decisions involving competing values frameworks.
**Process:**
1. Detects values conflicts (privacy vs safety, rights vs consequences)
2. Identifies affected stakeholder groups
3. Structures multi-perspective deliberation
4. Documents all positions (including dissent)
5. Creates reviewable precedents
**Principle:** When values genuinely conflict, deliberation quality matters more than decision speed. AI facilitates; humans decide.
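The reviewable-precedent step (5) can be sketched as a record that preserves every position, including dissent. Field names and the example stakeholders are illustrative assumptions.

```javascript
// Sketch of a deliberation precedent record; field names are illustrative.
function buildPrecedent(conflict, positions) {
  return {
    conflict,                                    // e.g. 'privacy vs safety'
    stakeholders: positions.map((p) => p.stakeholder),
    positions,                                   // every position retained
    dissent: positions.filter((p) => p.dissenting),
    decidedBy: 'human',                          // AI facilitates; humans decide
    createdAt: new Date().toISOString(),
  };
}

const precedent = buildPrecedent('privacy vs safety', [
  { stakeholder: 'users', view: 'minimize data retention', dissenting: false },
  { stakeholder: 'trust-and-safety', view: 'retain logs for abuse review', dissenting: true },
]);
console.log(precedent.dissent.length); // → 1
```

Keeping dissent as first-class data, rather than discarding it after the decision, is what makes the precedent reviewable later.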
## Why "Tractatus"?
Named after Ludwig Wittgenstein's *Tractatus Logico-Philosophicus* (1921), which established:
1. **Language has limits** - Not everything can be meaningfully stated
2. **Boundaries are structural** - Limits are inherent, not defects
3. **Clarity through precision** - Define what can and cannot be said
Applied to AI safety:
1. **AI judgment has limits** - Not every decision can be safely automated
2. **Safety through architecture** - Build boundaries into system structure
3. **Reliability through specification** - Precisely define where humans must decide
## Demonstrated Failure Modes Prevented
### Non-Default Project Port Incident (2025-10-06)
**What happened:** The user specified a non-default MongoDB port for the project. The LLM immediately used 27017 instead: not through forgetting, but through pattern-recognition autocorrection. The training-data association "MongoDB = 27017" was strong enough to override the explicit instruction in real time.
**Tractatus prevention:** InstructionPersistenceClassifier + CrossReferenceValidator store explicit parameters and block any action conflicting with stored instructions—even from training patterns.
### Context Degradation (Multiple sessions)
**What happens:** Beyond 150k tokens, LLM quality silently degrades: forgets instructions, makes careless errors, fails to verify assumptions.
**Tractatus prevention:** ContextPressureMonitor calculates degradation score and recommends session handoff at 75% threshold—before failure occurs.
### Values Creep (Ongoing risk)
**What happens:** LLM gradually makes values-sensitive decisions without recognizing them as such: privacy vs performance trade-offs, "harmful" content definitions, user agency boundaries.
**Tractatus prevention:** BoundaryEnforcer structurally blocks these decisions. LLM cannot execute them without explicit human approval.
## Current Implementation Status
**Production deployment:** agenticgovernance.digital (this website)
**Development governance:** Active (this website is built under Tractatus governance)
**Test coverage:** 192 unit tests passing (100% coverage on core services)
**Database:** Instruction persistence operational (MongoDB)
**Phase:** 5 PoC - Value pluralism integration active
**Dogfooding:** The Tractatus framework governs its own development. Every decision to modify this website passes through Tractatus services.
## Technical Architecture
- **Runtime:** Node.js (Express)
- **Database:** MongoDB (instruction persistence, precedent storage)
- **Frontend:** Vanilla JavaScript (no framework dependencies)
- **API:** RESTful (OpenAPI 3.0 spec available)
- **Services:** Six independent modules with defined interfaces
**Key design decision:** No machine learning in governance services. All boundaries are deterministic and auditable.
## Who Should Use Tractatus?
### AI Safety Researchers
- Architectural approach to alignment problem
- Formal specification of decision boundaries
- Empirical validation of degradation detection
- Novel framework for values pluralism in AI
### Software Teams Deploying LLMs
- Reference implementation code (tested, documented)
- Immediate safety improvements
- Integration guides for existing systems
- Prevents known failure modes
### Policy Makers / Advocates
- Clear framework for AI safety requirements
- Non-technical explanations available
- Addresses agency preservation
- Demonstrates practical implementation
## Integration Requirements
**Minimum:** LLM with structured output support, persistent storage for instruction history, ability to wrap LLM calls in governance layer.
**Recommended:** Session state management, token counting, user authentication for human approval workflows.
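The minimum requirement of "wrapping LLM calls in a governance layer" can be sketched as below. All names here (`governedCall`, `callLLM`, the stub services) are hypothetical; the real framework exposes the six services described earlier rather than these two stubs.

```javascript
// Minimal governance-wrapper sketch; every name here is hypothetical.
async function governedCall(prompt, services, callLLM) {
  const proposal = await callLLM(prompt);

  // 1. Validate the proposal against stored instructions before execution.
  const validation = services.validate(proposal);
  if (!validation.ok) return { status: 'BLOCKED', reason: validation.reason };

  // 2. Defer to a human when the proposal touches a protected domain.
  const boundary = services.enforce(proposal);
  if (boundary.status === 'BLOCKED') {
    return { status: 'NEEDS_HUMAN', reason: boundary.reason };
  }

  return { status: 'EXECUTED', result: proposal };
}

// Usage with stub services and a stub model:
const services = {
  validate: () => ({ ok: true }),
  enforce: (p) => (p.domain === 'security'
    ? { status: 'BLOCKED', reason: 'protected domain' }
    : { status: 'ALLOWED' }),
};

governedCall('rotate keys', services, async () => ({ domain: 'security' }))
  .then((r) => console.log(r.status)); // → NEEDS_HUMAN
```

The key property is that the model's output is a *proposal* until the governance checks pass; execution authority stays with the wrapper, not the model.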
## Limitations
**What Tractatus does NOT do:**
- Train better LLMs (uses existing models as-is)
- Ensure "aligned" AI behavior
- Eliminate the risk of failure entirely
- Replace human judgment
**What Tractatus DOES do:**
- Designed to detect specific known failure modes before execution
- Architecturally enforce boundaries on decision authority
- Monitor session quality degradation indicators
- Require human judgment for values-sensitive decisions
## Getting Started
1. **Read Core Concepts** - Understand the six services in detail
2. **Review Case Studies** - See real failure modes and prevention
3. **Check Technical Specification** - API reference and integration guide
4. **Explore Implementation Guide** - Step-by-step deployment
## Research Foundations
Tractatus integrates concepts from:
- **Philosophy of language** (Wittgenstein) - Limits and boundaries
- **Organizational theory** (March, Simon) - Bounded rationality, decision premises
- **Deliberative democracy** (Gutmann, Thompson) - Structured disagreement
- **Value pluralism** (Berlin, Chang) - Incommensurable values
- **Systems architecture** (Conway, Brooks) - Structural constraints and boundaries
See [Research Foundations](/docs.html) for academic grounding and citations.
## Contributing
Tractatus is open source and welcomes contributions:
- **Code:** Pull requests on Codeberg (Node.js, tests required)
- **Research:** Theoretical extensions, formal verification
- **Case studies:** Document real-world applications
- **Documentation:** Clarity improvements, translations
**Repository:** https://codeberg.org/mysovereignty/tractatus-framework
**Issues:** https://codeberg.org/mysovereignty/tractatus-framework/issues
## Contact
**Email:** john.stroh.nz@pm.me
**Website:** https://agenticgovernance.digital
---
## Licence
Copyright © 2026 John Stroh.
This work is licensed under the [Creative Commons Attribution 4.0 International Licence (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).
You are free to share, copy, redistribute, adapt, remix, transform, and build upon this material for any purpose, including commercially, provided you give appropriate attribution, provide a link to the licence, and indicate if changes were made.
**Note:** The Tractatus AI Safety Framework source code is separately licensed under the Apache License 2.0. This Creative Commons licence applies to the research paper text and figures only.
---
## Document Metadata
<div class="document-metadata">
- **Version:** 0.5.0
- **Created:** 2025-10-12
- **Last Modified:** 2025-10-13
- **Author:** John Stroh
- **Word Count:** 1,372 words
- **Reading Time:** ~7 minutes
- **Document ID:** introduction-to-the-tractatus-framework
- **Status:** Active
</div>
---
**Next Steps:**
- [Core Concepts: Deep Dive into Six Services →](/docs.html)
- [Case Studies: Real-World Failure Modes →](/docs.html)
- [Implementation Guide: Deploy Tractatus →](/docs.html)