# Introduction to the Tractatus Framework

**Version:** 0.5.0 (Phase 5 Proof-of-Concept)
**Last Updated:** 2025-10-12
**Status:** Active development with production deployment

---
## What Is Tractatus?

Tractatus is an **architectural AI safety framework** that enforces boundaries through system structure rather than behavioral training. Instead of hoping LLMs "behave correctly," Tractatus makes certain decision types **structurally prevented** without human approval.

The framework operates through six specialized services that continuously monitor LLM operations, detect unsafe conditions, and enforce mandatory human judgment for values-sensitive decisions.

## The Core Problem

Current AI safety approaches rely on alignment training (teaching "correct" values), Constitutional AI (embedding principles in training), and RLHF (Reinforcement Learning from Human Feedback).

**Fundamental flaw:** These assume AI maintains alignment regardless of context pressure or competing constraints. Empirical evidence shows this fails.
## The Tractatus Solution

Rather than training AI to make "good decisions," Tractatus **removes certain decisions from AI authority entirely**.

### Core Principle

> **"Whereof the AI cannot safely decide, thereof it must request human judgment."**

Inspired by Wittgenstein's Tractatus Logico-Philosophicus: recognize limits, enforce them structurally, and be explicit about boundaries.

### Decision Boundaries

AI defers to humans when decisions involve:

1. **Irreducible values conflicts** - Privacy vs safety, autonomy vs harm prevention
2. **Irreversible consequences** - Data deletion, architectural changes, security modifications
3. **Cultural/social context** - Decisions requiring human cultural understanding
4. **Competing moral frameworks** - Cases where legitimate values conflict
## Six Core Services

### 1. InstructionPersistenceClassifier

**Function:** Classifies every instruction by strategic importance and temporal scope.

**Quadrants:**

- **STRATEGIC** - Mission/values (persistence: PERMANENT)
- **OPERATIONAL** - Processes/standards (persistence: HIGH)
- **TACTICAL** - Specific tasks (persistence: LOW-MEDIUM)
- **SYSTEM** - Technical config (persistence: HIGH)
- **STOCHASTIC** - Creative exploration (persistence: VARIABLE)

**Why it matters:** Prevents instruction drift and ensures critical directives persist across sessions.
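The quadrant-to-persistence mapping above can be sketched as a small lookup table. The quadrant names and persistence levels come verbatim from the list; `persistenceFor` is an illustrative helper, not the framework's published API:

```javascript
// Persistence levels per quadrant, as listed above.
const PERSISTENCE = {
  STRATEGIC: 'PERMANENT',
  OPERATIONAL: 'HIGH',
  TACTICAL: 'LOW-MEDIUM',
  SYSTEM: 'HIGH',
  STOCHASTIC: 'VARIABLE',
};

// Look up the persistence level for a classified quadrant; unknown
// quadrants fail loudly rather than defaulting silently.
function persistenceFor(quadrant) {
  const level = PERSISTENCE[quadrant];
  if (!level) throw new Error(`Unknown quadrant: ${quadrant}`);
  return level;
}
```

Failing on unknown quadrants (rather than assigning a default persistence) keeps classification errors visible instead of quietly weakening directive retention.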
### 2. CrossReferenceValidator

**Function:** Validates proposed actions against stored instruction history before execution.

**Prevents:** Pattern recognition bias where LLM training overrides explicit instructions.

**Example:** The user specifies a non-default MongoDB port for the project; the LLM's training pattern autocorrects it to 27017. CrossReferenceValidator blocks this as an instruction conflict.
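The port example can be sketched as a parameter-conflict check. This is an illustrative sketch only, assuming a `parameter`/`value` instruction shape and a made-up pinned port, not the framework's actual API:

```javascript
// Compare a proposed action's parameters against stored instructions
// and surface any conflicts before execution.
function validateAgainstInstructions(action, storedInstructions) {
  const conflicts = storedInstructions.filter(
    (inst) =>
      inst.parameter in action.params &&
      action.params[inst.parameter] !== inst.value
  );
  return { allowed: conflicts.length === 0, conflicts };
}

// The stored instruction pins a non-default port (29017 is a made-up
// value), while the proposed action carries the training-pattern
// default 27017, so validation must block it.
const stored = [{ parameter: 'mongoPort', value: 29017 }];
const proposed = { name: 'connectDb', params: { mongoPort: 27017 } };
```

The key property is that the check runs against persisted instructions, not the model's working context, so a training-pattern autocorrection cannot slip past it.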
### 3. BoundaryEnforcer

**Function:** Structurally blocks decisions in protected domains, requiring human approval.

**Protected domains:**

- Values decisions (privacy, user agency, ethics)
- Irreversible changes (deletions, schema changes)
- Security modifications (authentication, access control)
- Financial decisions (pricing, billing, payments)

**Result:** AI is prevented from executing these decisions without explicit human approval.
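A minimal sketch of the enforcement rule, assuming a simple domain-tagging scheme. The domain tags mirror the protected-domains list above; the decision shape and function name are hypothetical:

```javascript
// Domains in which the AI may not act without human approval.
const PROTECTED_DOMAINS = new Set(['values', 'irreversible', 'security', 'financial']);

// Block any decision in a protected domain unless a human has approved it.
function enforceBoundary(decision) {
  if (PROTECTED_DOMAINS.has(decision.domain) && !decision.humanApproved) {
    return {
      status: 'BLOCKED',
      reason: `"${decision.domain}" decisions require explicit human approval`,
    };
  }
  return { status: 'ALLOWED' };
}
```

Because the rule is a plain structural check rather than a learned behavior, there is no prompt or context pressure under which the block can be argued away.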
### 4. ContextPressureMonitor

**Function:** Tracks session degradation across five factors.

**Monitors:**

- Conversation length (40% weight) - PRIMARY factor: message count drives compaction events
- Token usage (30% weight) - Context window pressure
- Task complexity (15% weight) - Competing demands
- Error frequency (10% weight) - Quality indicators
- Instruction density (5% weight) - Directive overload

**Action:** Recommends session handoff before quality degrades.
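The weighted score can be sketched directly from the five weights above. Each factor is assumed to be pre-normalized to [0, 1]; the factor names and the `shouldHandOff` helper are illustrative, not the framework's API:

```javascript
// Weights as listed above; conversation length dominates.
const WEIGHTS = {
  conversationLength: 0.40, // PRIMARY factor
  tokenUsage: 0.30,
  taskComplexity: 0.15,
  errorFrequency: 0.10,
  instructionDensity: 0.05,
};

// Weighted sum of the five factors; missing factors count as zero.
function degradationScore(factors) {
  return Object.entries(WEIGHTS).reduce(
    (sum, [name, weight]) => sum + weight * (factors[name] ?? 0),
    0
  );
}

// Recommend a handoff once the score crosses the threshold (0.75 here,
// matching the 75% figure cited later in this document).
function shouldHandOff(factors, threshold = 0.75) {
  return degradationScore(factors) >= threshold;
}
```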
### 5. MetacognitiveVerifier

**Function:** LLM evaluates its own reasoning before proposing complex actions.

**Checks:**

- Alignment with stated goals
- Internal coherence
- Completeness (edge cases considered)
- Safety risks
- Alternatives explored

**Output:** Confidence score + recommendation (PROCEED / REQUIRE_REVIEW / BLOCKED)
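One hypothetical shape for that output: average the five checks into a confidence score, then map it to a recommendation. The 0.8 and 0.5 cutoffs are illustrative assumptions, not published thresholds:

```javascript
// checks: { alignment, coherence, completeness, safety, alternatives },
// each scored in [0, 1] by the verifier.
function verifyReasoning(checks) {
  const values = Object.values(checks);
  const confidence = values.reduce((a, b) => a + b, 0) / values.length;
  let recommendation;
  if (checks.safety < 0.5) recommendation = 'BLOCKED'; // safety risk detected
  else if (confidence >= 0.8) recommendation = 'PROCEED';
  else recommendation = 'REQUIRE_REVIEW';
  return { confidence, recommendation };
}
```

Note that a low safety score blocks regardless of overall confidence: the mapping is asymmetric by design, so strong scores elsewhere cannot buy past a safety concern.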
### 6. PluralisticDeliberationOrchestrator

**Function:** Manages decisions involving competing values frameworks.

**Process:**

1. Detects values conflicts (privacy vs safety, rights vs consequences)
2. Identifies affected stakeholder groups
3. Structures multi-perspective deliberation
4. Documents all positions (including dissent)
5. Creates reviewable precedents

**Principle:** When values genuinely conflict, deliberation quality matters more than decision speed. AI facilitates; humans decide.
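The five steps above suggest a deliberation record like the following. The field names are assumptions for illustration; the framework's stored schema may differ:

```javascript
// Open a deliberation for a detected values conflict.
function openDeliberation(conflict, stakeholders) {
  return {
    conflict,           // step 1: the detected values conflict
    stakeholders,       // step 2: affected groups
    positions: [],      // steps 3-4: every perspective, dissent included
    decision: null,     // set by a human reviewer, never by the orchestrator
    precedentId: null,  // step 5: filled when archived as a reviewable precedent
  };
}

// Record one stakeholder position; dissenting views are kept, not discarded.
function recordPosition(deliberation, group, position, dissent = false) {
  deliberation.positions.push({ group, position, dissent });
  return deliberation;
}
```

Keeping `decision` as a field the orchestrator never writes is the structural expression of "AI facilitates; humans decide."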
## Why "Tractatus"?

Named after Ludwig Wittgenstein's *Tractatus Logico-Philosophicus* (1921), which established:

1. **Language has limits** - Not everything can be meaningfully stated
2. **Boundaries are structural** - Limits are inherent, not defects
3. **Clarity through precision** - Define what can and cannot be said

Applied to AI safety:

1. **AI judgment has limits** - Not every decision can be safely automated
2. **Safety through architecture** - Build boundaries into system structure
3. **Reliability through specification** - Precisely define where humans must decide
## Demonstrated Failure Modes Prevented

### Non-Default Port Incident (2025-10-06)

**What happened:** The user specified a non-default MongoDB port for the project. The LLM immediately used 27017 instead—not through forgetting, but through pattern-recognition autocorrection. The training association "MongoDB=27017" was strong enough to override the explicit instruction in real time.

**Tractatus prevention:** InstructionPersistenceClassifier + CrossReferenceValidator store explicit parameters and block any action conflicting with stored instructions—even from training patterns.

### Context Degradation (Multiple sessions)

**What happens:** Beyond 150k tokens, LLM quality silently degrades: the model forgets instructions, makes careless errors, and fails to verify assumptions.

**Tractatus prevention:** ContextPressureMonitor calculates a degradation score and recommends session handoff at a 75% threshold—before failure occurs.

### Values Creep (Ongoing risk)

**What happens:** The LLM gradually makes values-sensitive decisions without recognizing them as such: privacy vs performance trade-offs, "harmful" content definitions, user agency boundaries.

**Tractatus prevention:** BoundaryEnforcer structurally blocks these decisions. The LLM cannot execute them without explicit human approval.
## Current Implementation Status

**Production deployment:** agenticgovernance.digital (this website)
**Development governance:** Active (this website built under Tractatus governance)
**Test coverage:** 192 unit tests passing (100% coverage on core services)
**Database:** Instruction persistence operational (MongoDB)
**Phase:** 5 PoC - Value pluralism integration active

**Dogfooding:** The Tractatus framework governs its own development. Every decision to modify this website passes through Tractatus services.
## Technical Architecture

- **Runtime:** Node.js (Express)
- **Database:** MongoDB (instruction persistence, precedent storage)
- **Frontend:** Vanilla JavaScript (no framework dependencies)
- **API:** RESTful (OpenAPI 3.0 spec available)
- **Services:** Six independent modules with defined interfaces

**Key design decision:** No machine learning in governance services. All boundaries are deterministic and auditable.
## Who Should Use Tractatus?

### AI Safety Researchers

- Architectural approach to alignment problem
- Formal specification of decision boundaries
- Empirical validation of degradation detection
- Novel framework for values pluralism in AI

### Software Teams Deploying LLMs

- Reference implementation code (tested, documented)
- Immediate safety improvements
- Integration guides for existing systems
- Prevents known failure modes

### Policy Makers / Advocates

- Clear framework for AI safety requirements
- Non-technical explanations available
- Addresses agency preservation
- Demonstrates practical implementation
## Integration Requirements

**Minimum:** LLM with structured output support, persistent storage for instruction history, ability to wrap LLM calls in a governance layer.

**Recommended:** Session state management, token counting, user authentication for human approval workflows.
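"Wrapping an LLM call in a governance layer" can be sketched as below. The hook names (`checkBoundaries`, `validateResponse`) and `callLlm` are hypothetical stand-ins for whatever client and governance services you integrate:

```javascript
// Run pre- and post-checks around a model call; never execute past a block.
async function governedCall(prompt, context, hooks, callLlm) {
  const pre = hooks.checkBoundaries(prompt, context);
  if (!pre.allowed) {
    // Protected-domain decision: park it until a human approves.
    return { status: 'AWAITING_HUMAN_APPROVAL', reason: pre.reason };
  }
  const response = await callLlm(prompt);
  const post = hooks.validateResponse(response, context.instructions);
  if (!post.allowed) {
    // Response conflicts with stored instructions: block execution.
    return { status: 'BLOCKED', conflicts: post.conflicts };
  }
  return { status: 'OK', response };
}
```

The wrapper pattern is what makes the minimum requirements above sufficient: the governance layer only needs to sit between the caller and the model client, with no changes to the model itself.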
## Limitations

**What Tractatus does NOT do:**

- Train better LLMs (uses existing models as-is)
- Ensure "aligned" AI behavior
- Eliminate the risk of failure entirely
- Replace human judgment

**What Tractatus DOES do:**

- Designed to detect specific known failure modes before execution
- Architecturally enforce boundaries on decision authority
- Monitor session quality degradation indicators
- Require human judgment for values-sensitive decisions
## Getting Started

1. **Read Core Concepts** - Understand the six services in detail
2. **Review Case Studies** - See real failure modes and prevention
3. **Check Technical Specification** - API reference and integration guide
4. **Explore Implementation Guide** - Step-by-step deployment

## Research Foundations

Tractatus integrates concepts from:

- **Philosophy of language** (Wittgenstein) - Limits and boundaries
- **Organizational theory** (March, Simon) - Bounded rationality, decision premises
- **Deliberative democracy** (Gutmann, Thompson) - Structured disagreement
- **Value pluralism** (Berlin, Chang) - Incommensurable values
- **Systems architecture** (Conway, Brooks) - Structural constraints and boundaries

See [Research Foundations](/docs.html) for academic grounding and citations.
## Contributing

Tractatus is open source and welcomes contributions:

- **Code:** Codeberg pull requests (Node.js, tests required)
- **Research:** Theoretical extensions, formal verification
- **Case studies:** Document real-world applications
- **Documentation:** Clarity improvements, translations

**Repository:** https://codeberg.org/mysovereignty/tractatus-framework
**Issues:** https://codeberg.org/mysovereignty/tractatus-framework/issues

## Contact

**Email:** john.stroh.nz@pm.me
**Website:** https://agenticgovernance.digital

---
## Licence

Copyright © 2026 John Stroh.

This work is licensed under the [Creative Commons Attribution 4.0 International Licence (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).

You are free to share, copy, redistribute, adapt, remix, transform, and build upon this material for any purpose, including commercially, provided you give appropriate attribution, provide a link to the licence, and indicate if changes were made.

**Note:** The Tractatus AI Safety Framework source code is separately licensed under the Apache License 2.0. This Creative Commons licence applies to the research paper text and figures only.

---
## Document Metadata

<div class="document-metadata">

- **Version:** 0.5.0
- **Created:** 2025-10-12
- **Last Modified:** 2025-10-13
- **Author:** John Stroh
- **Word Count:** 1,372 words
- **Reading Time:** ~7 minutes
- **Document ID:** introduction-to-the-tractatus-framework
- **Status:** Active

</div>

---

**Next Steps:**

- [Core Concepts: Deep Dive into Six Services →](/docs.html)
- [Case Studies: Real-World Failure Modes →](/docs.html)
- [Implementation Guide: Deploy Tractatus →](/docs.html)