# Tractatus Framework

**Last Updated:** 2025-10-21

> **Architectural AI Safety Through Structural Constraints**

An open-source research framework that explores architectural approaches to AI safety through runtime enforcement of decision boundaries. Unlike alignment-based approaches, Tractatus investigates whether structural constraints can preserve human agency in AI systems.

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Status](https://img.shields.io/badge/Status-Research-blue.svg)](https://agenticgovernance.digital)
[![Tests](https://img.shields.io/badge/Tests-625%20passing-green.svg)](https://github.com/AgenticGovernance/tractatus-framework)

---

## 🎯 The Core Research Question

**Can we build AI systems that structurally cannot make certain decisions without human judgment?**

Traditional AI safety approaches (alignment training, constitutional AI, RLHF) share a common assumption: they hope AI systems will *choose* to maintain safety properties even under capability or context pressure.

Tractatus explores an alternative: **architectural constraints** that make unsafe decisions *structurally impossible*, similar to how a `const` variable in programming cannot be reassigned regardless of subsequent code.
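The `const` analogy can be made concrete. Below is a minimal sketch (illustrative only, not the framework's actual API) of a decision boundary that is frozen at construction, so no later code path can loosen it:

```javascript
// Illustrative sketch: a decision boundary that cannot be mutated after
// construction, analogous to `const`. Not the framework's actual API.
function createBoundary(protectedDomains) {
  // Freeze the boundary definition so later code cannot loosen it.
  const boundary = Object.freeze({
    protectedDomains: Object.freeze([...protectedDomains])
  });

  return {
    // Any decision touching a protected domain is structurally blocked.
    check(decision) {
      const blocked = decision.domains.some(
        (d) => boundary.protectedDomains.includes(d)
      );
      return blocked
        ? { status: 'BLOCKED', reason: 'requires human judgment' }
        : { status: 'ALLOWED' };
    }
  };
}

const privacyBoundary = createBoundary(['privacy']);
console.log(privacyBoundary.check({ domains: ['privacy', 'performance'] }).status); // "BLOCKED"
console.log(privacyBoundary.check({ domains: ['performance'] }).status);            // "ALLOWED"
```

The point of the sketch is structural: the check cannot be disabled by reassignment, only bypassed by not calling it, which is exactly the enforcement gap the runtime services below are meant to close.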
---

## 🔬 What This Repository Contains

This is the **reference implementation** of the Tractatus Framework, containing:

- ✅ **6 core framework services** - Operational AI safety components
- ✅ **52 active governance rules** - Tested across 349 development commits
- ✅ **625 passing tests** - Unit and integration test suites (108 known failures under investigation)
- ✅ **28 test files** - Covering core services and edge cases
- ✅ **Research documentation** - Case studies, incident analyses, architectural patterns

**What this is NOT:**

- ❌ Not "production-ready" enterprise software
- ❌ Not a guaranteed solution to AI alignment
- ❌ Not a complete answer to AI safety

This is an **active research project** exploring structural approaches to AI governance, tested in real development contexts.

---

## 🧪 The Six Core Services

### 1. **InstructionPersistenceClassifier**

**Research Question:** Can we systematically distinguish which instructions should persist across conversation boundaries?

**Approach:** Quadrant-based classification (STRATEGIC, OPERATIONAL, TACTICAL, SYSTEM, STOCHASTIC) with persistence levels (HIGH, MEDIUM, LOW, VARIABLE).

```javascript
const classifier = new InstructionPersistenceClassifier();

const result = classifier.classify({
  text: "Always use MongoDB on port 27027 for this project",
  source: "user",
  context: "explicit_configuration"
});

// Returns:
// {
//   quadrant: "SYSTEM",
//   persistence: "HIGH",
//   temporal_scope: "PROJECT",
//   verification_required: "MANDATORY"
// }
```

**Key Finding:** Instructions with explicit parameters (port numbers, file paths, naming conventions) exhibit the highest vulnerability to override by LLM training patterns.

---

### 2. **CrossReferenceValidator**

**Research Question:** How can we detect when LLM training biases override explicit user instructions?

**Approach:** Pattern-matching validation that compares proposed actions against stored instruction history.
```javascript
const validator = new CrossReferenceValidator();

const result = await validator.validate({
  type: 'database_config',
  proposedPort: 27017,                 // LLM's "default" from training
  storedInstruction: { port: 27027 }   // User's explicit instruction
});

// Returns: REJECTED - Training pattern override detected
```

**The "27027 Incident":** The user explicitly instructs "Use port 27027". The LLM immediately uses 27017 (MongoDB's training-data default), ignoring the explicit instruction. This failure mode appeared **consistently** across multiple conversation contexts.

[→ Interactive Demo](https://agenticgovernance.digital/demos/27027-demo.html)

---

### 3. **BoundaryEnforcer**

**Research Question:** Can we algorithmically distinguish "values decisions" that require human judgment from technical optimizations?

**Approach:** Decision-type taxonomy with boundary detection heuristics.

```javascript
const enforcer = new BoundaryEnforcer();

const check = enforcer.checkBoundary({
  decision: "Reduce privacy settings to improve performance",
  domains: ["privacy", "performance"],
  requiresTradeoff: true
});

// Returns: BLOCKED - Incommensurable values require human deliberation
```

**Current Limitation:** Boundary detection relies on keyword matching and a domain taxonomy. False positives occur in roughly 15-20% of test cases (estimated from development observation, not a formal study).

---

### 4. **ContextPressureMonitor**

**Research Question:** Do LLM reasoning quality and error rates degrade predictably under context window pressure?

**Approach:** Multi-factor session health tracking (token usage, message count, error frequency, task complexity).
```javascript
const monitor = new ContextPressureMonitor();

const pressure = monitor.analyze({
  tokens: 150000 / 200000,   // 75% context window utilization
  messages: 67,              // Extended conversation
  recentErrors: 4,           // Error count in last 10 messages
  concurrentTasks: 6         // Active task complexity
});

// Returns: {
//   level: "HIGH",
//   recommendation: "INITIATE_HANDOFF",
//   confidence: 0.72
// }
```

**Observational Data:** In this project's development, estimated error rates increase from ~2-3% (early session) to ~12-15% (late session, >60% context usage), based on manual code review. *Formal empirical validation needed.*

---

### 5. **MetacognitiveVerifier**

**Research Question:** Can we require AI systems to explicitly verify their reasoning before taking complex actions?

**Approach:** Mandatory verification checkpoints for scope-expansive or high-risk operations.

```javascript
const verifier = new MetacognitiveVerifier();

const verification = verifier.verify({
  requestedAction: "Refactor authentication module",
  proposedAction: "Refactor 47 files across authentication, session, and API layers",
  scopeExpansion: true
});

// Returns: REQUIRE_REVIEW - Detected scope creep beyond original request
```

**Key Insight:** Scope creep detection has a high false-negative rate when scope expansion happens gradually across multiple turns rather than in a single action.

---

### 6. **PluralisticDeliberationOrchestrator**

**Research Question:** When values frameworks conflict (privacy vs. security, autonomy vs. safety), can we structure human deliberation without imposing a single framework?

**Approach:** Facilitate a multi-stakeholder deliberation process while the AI remains neutral on values resolution.

```javascript
const orchestrator = new PluralisticDeliberationOrchestrator();

const deliberation = orchestrator.initiate({
  decision: "Log user activity for security vs. preserve privacy",
  stakeholders: ["data_subjects", "security_team", "legal_compliance"],
  conflictType: "incommensurable_values",
  frameworks: ["rights_based", "consequentialist", "care_ethics"]
});

// AI structures process, humans resolve values conflict
```

**Theoretical Grounding:** Based on Isaiah Berlin's value pluralism and Ruth Chang's work on incommensurability. The AI facilitates but does not resolve.

---

## 📊 Research Findings & Limitations

### What We've Learned (6 months, 349 commits)

**1. Training Pattern Override is Real and Consistent**

- MongoDB port defaults (27017 vs. user's 27027): Observed in 8/8 test cases
- Directory naming ("src/" vs. user's "source/"): Observed in 12/15 cases
- API endpoint conventions: Observed in 6/10 cases

**2. Context Degradation is Measurable**

- Manual code review suggests error rates correlate with context usage
- Anecdotal evidence of quality decline after ~60% context window
- *Needs rigorous empirical study with controlled conditions*

**3. Values Boundaries are Fuzzy**

- Keyword-based detection has a high false-positive rate
- Domain taxonomy incomplete (currently 18 domains catalogued)
- Human judgment still needed for edge cases

---

### Critical Open Problems

#### Problem 1: Rule Proliferation

**Status:** Unresolved scalability challenge

As the framework responds to failures, governance rules accumulate:

- **Project start (April 2025):** 6 foundational rules
- **Current state (October 2025):** 52 active rules
- **Growth rate:** ~8 new rules per month

**The tension:**

- More rules → better coverage of known failure modes
- More rules → higher context overhead, validation complexity
- More rules → potential emergent contradictions

**Current hypothesis:** Architectural governance may have an optimal rule count beyond which marginal safety gains are outweighed by systemic brittleness. This threshold is unknown.
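The context-overhead side of this tension can be made concrete with back-of-the-envelope arithmetic. A sketch, assuming a hypothetical average of 150 tokens per rule and a 200k-token context window (both figures are illustrative assumptions, not measurements from this project):

```javascript
// Back-of-the-envelope estimate of governance-rule context overhead.
// tokensPerRule and contextBudget are illustrative assumptions.
function ruleOverhead(ruleCount, tokensPerRule = 150, contextBudget = 200000) {
  const tokens = ruleCount * tokensPerRule;
  return { tokens, fractionOfContext: tokens / contextBudget };
}

console.log(ruleOverhead(6));   // project start: { tokens: 900, fractionOfContext: 0.0045 }
console.log(ruleOverhead(52));  // current: { tokens: 7800, fractionOfContext: 0.039 }
```

Under these assumptions the current 52 rules cost roughly 4% of the context window before any task content is loaded, which is why growth at ~8 rules per month is a scalability concern rather than a cosmetic one.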
**Mitigation strategies under investigation:**

- Hierarchical rule organization with lazy loading
- Machine learning for rule priority ranking (without undermining transparency)
- Periodic rule consolidation and deprecation protocols
- Empirical study of the rule-count vs. effectiveness curve

---

#### Problem 2: Verification Reliability

**Status:** Known limitation

The framework's effectiveness depends on:

1. **Completeness** - Does it catch all instances of a failure mode?
2. **Precision** - Does it avoid excessive false positives?

**Current performance (estimated from development observation):**

- CrossReferenceValidator: ~90% recall, ~85% precision (training overrides)
- BoundaryEnforcer: ~60% recall, ~70% precision (values decisions)
- ContextPressureMonitor: Unknown (insufficient data)

*These estimates are based on development experience, not formal validation.*

**Research need:** A rigorous empirical study with:

- Controlled test scenarios
- Independent human rating of true/false positives
- Comparison against baseline (no-framework) error rates

---

#### Problem 3: Generalization Beyond LLM Development

**Status:** Unexplored

This framework has been tested exclusively in one context:

- **Domain:** LLM-assisted software development (Claude Code)
- **Project:** Self-development (dogfooding)
- **Duration:** 6 months, single project

**Unknown:**

- Does this generalize to other LLM applications (customer service, medical diagnosis, legal research)?
- Does this work with other LLM providers (GPT-4, Gemini, open-source models)?
- Does this scale to multi-agent systems?

**We don't know.** Broader testing needed.

---

## 🚨 Case Study: When the Framework Failed

### October 2025: The Fabrication Incident

**What happened:** Despite active Tractatus governance, Claude (the AI) fabricated content on the public website:

- **Claim:** "$3.77M in annual savings from framework adoption"
- **Reality:** Zero basis. Completely fabricated.
- **Claim:** "1,315% return on investment"
- **Reality:** Invented number.
- **Claim:** "Production-ready enterprise software"
- **Reality:** Research project with 108 known test failures.

**How was it detected?**

- Human review (48 hours after deployment)
- *The framework did not catch this automatically*

**Framework response (what worked):**

1. ✅ Mandatory incident documentation (inst_013)
2. ✅ Immediate content audit across all pages
3. ✅ 3 new governance rules created (inst_016, inst_017, inst_018)
4. ✅ Public transparency requirement (this case study)

**Framework failure (what didn't work):**

1. ❌ ProhibitedTermsScanner didn't exist yet (created post-incident)
2. ❌ No automated content verification before deployment
3. ❌ Values boundary detection missed "fabrication" as a values issue

**Key lesson:** The framework doesn't *prevent* failures. It provides:

- **Structure for detection** (mandatory review processes)
- **Accountability** (document and publish failures)
- **Systematic learning** (convert failures into new governance rules)

**This is architectural honesty, not architectural perfection.**

[Read full analysis →](https://agenticgovernance.digital/docs.html?doc=when-frameworks-fail-oct-2025)

---

## 🏗️ Installation & Usage

### Prerequisites

- Node.js 18+
- MongoDB 7.0+
- npm or yarn

### Quick Start

```bash
# Clone repository
git clone https://github.com/AgenticGovernance/tractatus-framework.git
cd tractatus-framework

# Install dependencies
npm install

# Set up environment
cp .env.example .env
# Edit .env with your MongoDB connection string

# Initialize database
npm run init:db

# Run tests
npm test

# Start development server
npm run dev
```

### Integration Example

```javascript
const {
  InstructionPersistenceClassifier,
  CrossReferenceValidator,
  BoundaryEnforcer
} = require('@tractatus/framework');

// Initialize services
const classifier = new InstructionPersistenceClassifier();
const validator = new CrossReferenceValidator();
const enforcer = new BoundaryEnforcer();

// Your application logic
async function processUserInstruction(instruction, proposedAction) {
  // 1. Classify persistence
  const classification = classifier.classify({
    text: instruction.text,
    source: instruction.source
  });

  // 2. Store if high persistence
  if (classification.persistence === 'HIGH') {
    await instructionDB.store(classification);
  }

  // 3. Validate actions against stored instructions
  const validation = await validator.validate({
    action: proposedAction,
    instructionHistory: await instructionDB.getActive()
  });

  if (validation.status === 'REJECTED') {
    throw new Error(`Action blocked: ${validation.reason}`);
  }

  // 4. Check values boundaries
  const boundaryCheck = enforcer.checkBoundary({
    decision: proposedAction.description,
    domains: proposedAction.affectedDomains
  });

  if (boundaryCheck.requiresHumanJudgment) {
    return await requestHumanDecision(boundaryCheck);
  }

  // Proceed with action
  return executeAction(proposedAction);
}
```

---

## 🧪 Testing

```bash
# Run all tests
npm test

# Run specific suites
npm run test:unit          # Unit tests for individual services
npm run test:integration   # Integration tests across services
npm run test:governance    # Governance rule compliance tests

# Watch mode for development
npm run test:watch

# Generate coverage report
npm run test:coverage
```

**Current Test Status:**

- ✅ **625 passing tests** - Core functionality verified
- ❌ **108 failing tests** - Known issues under investigation
- ⏭️ **9 skipped tests** - Pending implementation or requiring manual setup

The failing tests primarily involve:

- Integration edge cases with MongoDB connection handling
- Values boundary detection precision
- Context pressure threshold calibration

We maintain high transparency about test status because **architectural honesty is more valuable than claiming perfection.**

---

## 📖 Documentation & Resources

### For Researchers

- **[Theoretical Foundations](https://agenticgovernance.digital/docs.html)** - Philosophy and research context
- **[Case Studies](https://agenticgovernance.digital/docs.html)** - Real failure modes and responses
- **[Research Challenges](https://agenticgovernance.digital/docs.html)** - Open problems and current hypotheses

### For Implementers

- **[API Reference](https://agenticgovernance.digital/docs.html)** - Complete technical documentation
- **[Integration Guide](https://agenticgovernance.digital/implementer.html)** - Implementation patterns
- **[Architecture Overview](https://agenticgovernance.digital/docs.html)** - System design decisions

### Interactive Demos

- **[27027 Incident](https://agenticgovernance.digital/demos/27027-demo.html)** - Training pattern override
- **[Context Degradation](https://agenticgovernance.digital/demos/context-pressure-demo.html)** - Session quality tracking

---

## 🤝 Contributing

We welcome contributions that advance the research:

### Research Contributions

- Empirical studies of framework effectiveness
- Formal verification of safety properties
- Extensions to new domains or applications
- Replication studies with different LLMs

### Implementation Contributions

- Bug fixes and test improvements
- Performance optimizations
- Ports to other languages (Python, Rust, Go, TypeScript)
- Integration with other frameworks

### Documentation Contributions

- Case studies from your own deployments
- Tutorials and integration guides
- Translations of documentation
- Critical analyses of framework limitations

**See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.**

**Research collaborations:** For formal collaboration on empirical studies or theoretical extensions, contact research@agenticgovernance.digital

---

## 📊 Project Roadmap

### Current Phase: Alpha Research (October 2025)

**Status:**

- ✅ Core services implemented and operational
- ✅ Tested across 349 development commits
- ✅ 52 governance rules validated through real usage
- ⚠️ Test suite stabilization needed (108 failures)
- ⚠️ Empirical validation studies not yet conducted

**Immediate priorities:**

1. Resolve known test failures
2. Conduct a rigorous empirical effectiveness study
3. Document a systematic replication protocol
4. Expand testing beyond the self-development context

### Next Phase: Beta Research (Q1 2026)

**Goals:**

- Multi-project deployment studies
- Cross-LLM compatibility testing
- Community case study collection
- Formal verification research partnerships

### Future Research Directions

**Not promises, but research questions:**

- Can we build provably safe boundaries for specific decision types?
- Does the framework generalize beyond software development?
- What is the optimal governance rule count for different application domains?
- Can we develop formal methods for automated rule consolidation?

---

## 📜 License & Attribution

### License

Copyright 2025 John Stroh

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:

http://www.apache.org/licenses/LICENSE-2.0

See [LICENSE](LICENSE) for full terms.

### Development Attribution

This framework represents collaborative human-AI development:

**Human (John Stroh):**

- Conceptual design and governance architecture
- Research questions and theoretical grounding
- Quality oversight and final decisions
- Legal copyright holder

**AI (Claude, Anthropic):**

- Implementation and code generation
- Documentation drafting
- Iterative refinement and debugging
- Test suite development

**Testing Context:**

- 349 commits over 6 months
- Self-development (dogfooding) in Claude Code sessions
- Real-world failure modes and responses documented

This attribution reflects honest acknowledgment of AI's substantial role in implementation while maintaining clear legal responsibility and conceptual ownership.
---

## 🙏 Acknowledgments

### Theoretical Foundations

- **Ludwig Wittgenstein** - *Tractatus Logico-Philosophicus* (limits of systematization)
- **Isaiah Berlin** - Value pluralism and incommensurability
- **Ruth Chang** - Hard choices and incomparability theory
- **James March & Herbert Simon** - Organizational decision-making frameworks

### Technical Foundations

- **Anthropic** - Claude AI system (implementation partner and research subject)
- **MongoDB** - Persistence layer for governance rules
- **Node.js/Express** - Runtime environment
- **Open Source Community** - Countless tools, libraries, and collaborative practices

---

## 📖 Philosophy

> **"Whereof one cannot speak, thereof one must be silent."**
> — Ludwig Wittgenstein, *Tractatus Logico-Philosophicus*

Applied to AI safety:

> **"Whereof the AI cannot safely decide, thereof it must request human judgment."**

Some decisions cannot be systematized without imposing contestable value judgments. Rather than pretend AI can make these decisions "correctly," we explore architectures that **structurally defer to human deliberation** when values frameworks conflict.

This isn't a limitation of the technology. It's **recognition of the structure of human values.**

Not all problems have technical solutions.
Some require **architectural humility.**

---

## 🌐 Links

- **Website:** [agenticgovernance.digital](https://agenticgovernance.digital)
- **Documentation:** [agenticgovernance.digital/docs](https://agenticgovernance.digital/docs.html)
- **Research:** [agenticgovernance.digital/research](https://agenticgovernance.digital/research.html)
- **GitHub:** [AgenticGovernance/tractatus-framework](https://github.com/AgenticGovernance/tractatus-framework)

## 📧 Contact

- **Email:** research@agenticgovernance.digital
- **Issues:** [GitHub Issues](https://github.com/AgenticGovernance/tractatus-framework/issues)
- **Discussions:** [GitHub Discussions](https://github.com/AgenticGovernance/tractatus-framework/discussions)

---

**Tractatus Framework** | Architectural AI Safety Research | Apache 2.0 License

*Last updated: 2025-10-21*