From cd6e7bcd0bcb13a76662345bf90be685006f058b Mon Sep 17 00:00:00 2001
From: TheFlow
Date: Tue, 21 Oct 2025 21:10:54 +1300
Subject: [PATCH] docs: rewrite README as focused implementation guide
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

BEFORE: 609-line research manifesto with:
- Research questions and theoretical framing
- "When the Framework Failed" case studies
- "Critical Open Problems" sections
- Extensive academic citations
- Audience: Researchers studying AI governance

AFTER: 215-line implementation guide with:
- Quick start (install, configure, run)
- Basic usage code examples
- API documentation links
- Deployment instructions
- Testing commands
- Clear website reference for background/research
- Audience: Developers implementing Tractatus

REMOVED:
- All research framing ("Research Question:", theoretical discussion)
- Case studies and failure documentation
- Academic positioning
- Fabrication incident disclosure

FOCUSED ON:
- Install/configure/deploy workflow
- Code examples developers can copy-paste
- Links to API docs and architecture docs
- Testing and contribution

Website (agenticgovernance.digital) now single source for background,
research, and general information. Public GitHub repository focused
exclusively on implementation.
🤖 Generated with Claude Code

Co-Authored-By: Claude
---
 README.md | 687 ++++++++++++------------------------------------------
 1 file changed, 146 insertions(+), 541 deletions(-)

diff --git a/README.md b/README.md
index 6bf6b6ac..1f8ac90a 100644
--- a/README.md
+++ b/README.md
@@ -1,609 +1,214 @@
 # Tractatus Framework

-**Last Updated:** 2025-10-21
+**AI governance framework enforcing architectural safety constraints at runtime**

-> **Architectural AI Safety Through Structural Constraints**
+[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
+[![Tests](https://img.shields.io/badge/Tests-625%20passing-green.svg)](tests/)

-An open-source research framework that explores architectural approaches to AI safety through runtime enforcement of decision boundaries. Unlike alignment-based approaches, Tractatus investigates whether structural constraints can preserve human agency in AI systems.
-
-[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
-[![Status](https://img.shields.io/badge/Status-Research-blue.svg)](https://agenticgovernance.digital)
-[![Tests](https://img.shields.io/badge/Tests-625%20passing-green.svg)](https://github.com/AgenticGovernance/tractatus-framework)
+For background, research, and detailed documentation, see **[https://agenticgovernance.digital](https://agenticgovernance.digital)**

 ---

-## 🎯 The Core Research Question
-
-**Can we build AI systems that structurally cannot make certain decisions without human judgment?**
-
-Traditional AI safety approaches—alignment training, constitutional AI, RLHF—share a common assumption: they hope AI systems will *choose* to maintain safety properties even under capability or context pressure.
-
-Tractatus explores an alternative: **architectural constraints** that make unsafe decisions *structurally impossible*, similar to how a `const` variable in programming cannot be reassigned regardless of subsequent code.
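[Editorial illustration] The `const` analogy above can be sketched directly in JavaScript. This is an illustrative toy, not framework code; `Object.freeze` stands in here for an architectural constraint:

```javascript
"use strict";

// A frozen config plays the role of an architectural constraint:
// later code cannot override it, regardless of what it attempts.
const config = Object.freeze({ port: 27027 }); // explicit user instruction

let overrideRejected = false;
try {
  config.port = 27017; // a "training default" trying to override the instruction
} catch (err) {
  overrideRejected = true; // TypeError: assignment to a read-only property
}

console.log(config.port, overrideRejected); // 27027 true
```

The constraint holds not because the later code chooses to respect it, but because the runtime structurally refuses the reassignment.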
-
----
-
-## 🔬 What This Repository Contains
-
-This is the **reference implementation** of the Tractatus Framework, containing:
-
-- ✅ **6 core framework services** - Operational AI safety components
-- ✅ **52 active governance rules** - Tested across 349 development commits
-- ✅ **625 passing tests** - Unit and integration test suites (108 known failures under investigation)
-- ✅ **28 test files** - Covering core services and edge cases
-- ✅ **Research documentation** - Case studies, incident analyses, architectural patterns
-
-**What this is NOT:**
-- ❌ Not "production-ready" enterprise software
-- ❌ Not a guaranteed solution to AI alignment
-- ❌ Not a complete answer to AI safety
-
-This is an **active research project** exploring structural approaches to AI governance, tested in real development contexts.
-
----
-
-## 🧪 The Six Core Services
-
-### 1. **InstructionPersistenceClassifier**
-
-**Research Question:** Can we systematically distinguish which instructions should persist across conversation boundaries?
-
-**Approach:** Quadrant-based classification (STRATEGIC, OPERATIONAL, TACTICAL, SYSTEM, STOCHASTIC) with persistence levels (HIGH, MEDIUM, LOW, VARIABLE).
-
-```javascript
-const classifier = new InstructionPersistenceClassifier();
-const result = classifier.classify({
-  text: "Always use MongoDB on port 27027 for this project",
-  source: "user",
-  context: "explicit_configuration"
-});
-
-// Returns:
-// {
-//   quadrant: "SYSTEM",
-//   persistence: "HIGH",
-//   temporal_scope: "PROJECT",
-//   verification_required: "MANDATORY"
-// }
-```
-
-**Key Finding:** Instructions with explicit parameters (port numbers, file paths, naming conventions) exhibit highest override vulnerability from LLM training patterns.
-
----
-
-### 2. **CrossReferenceValidator**
-
-**Research Question:** How can we detect when LLM training biases override explicit user instructions?
-
-**Approach:** Pattern-matching validation that compares proposed actions against stored instruction history.
-
-```javascript
-const validator = new CrossReferenceValidator();
-const result = await validator.validate({
-  type: 'database_config',
-  proposedPort: 27017,               // LLM's "default" from training
-  storedInstruction: { port: 27027 } // User's explicit instruction
-});
-
-// Returns: REJECTED - Training pattern override detected
-```
-
-**The "27027 Incident":** User explicitly instructs "Use port 27027". LLM immediately uses 27017 (MongoDB's training data default), ignoring the explicit instruction. This failure mode appeared **consistently** across multiple conversation contexts.
-
-[→ Interactive Demo](https://agenticgovernance.digital/demos/27027-demo.html)
-
----
-
-### 3. **BoundaryEnforcer**
-
-**Research Question:** Can we algorithmically distinguish "values decisions" that require human judgment from technical optimizations?
-
-**Approach:** Decision-type taxonomy with boundary detection heuristics.
-
-```javascript
-const enforcer = new BoundaryEnforcer();
-const check = enforcer.checkBoundary({
-  decision: "Reduce privacy settings to improve performance",
-  domains: ["privacy", "performance"],
-  requiresTradeoff: true
-});
-
-// Returns: BLOCKED - Incommensurable values require human deliberation
-```
-
-**Current Limitation:** Boundary detection relies on keyword matching and domain taxonomy. False positives occur in ~15-20% of test cases (estimated from development observation, not formal study).
-
----
-
-### 4. **ContextPressureMonitor**
-
-**Research Question:** Do LLM reasoning quality and error rates degrade predictably under context window pressure?
-
-**Approach:** Multi-factor session health tracking (token usage, message count, error frequency, task complexity).
-
-```javascript
-const monitor = new ContextPressureMonitor();
-const pressure = monitor.analyze({
-  tokens: 150000 / 200000, // 75% context window utilization
-  messages: 67,            // Extended conversation
-  recentErrors: 4,         // Error count in last 10 messages
-  concurrentTasks: 6       // Active task complexity
-});
-
-// Returns: {
-//   level: "HIGH",
-//   recommendation: "INITIATE_HANDOFF",
-//   confidence: 0.72
-// }
-```
-
-**Observational Data:** In this project's development, estimated error rates increase from ~2-3% (early session) to ~12-15% (late session, >60% context) based on manual code review. *Formal empirical validation needed.*
-
----
-
-### 5. **MetacognitiveVerifier**
-
-**Research Question:** Can we require AI systems to explicitly verify their reasoning before taking complex actions?
-
-**Approach:** Mandatory verification checkpoints for scope-expansive or high-risk operations.
-
-```javascript
-const verifier = new MetacognitiveVerifier();
-const verification = verifier.verify({
-  requestedAction: "Refactor authentication module",
-  proposedAction: "Refactor 47 files across authentication, session, and API layers",
-  scopeExpansion: true
-});
-
-// Returns: REQUIRE_REVIEW - Detected scope creep beyond original request
-```
-
-**Key Insight:** Scope creep detection has a high false-negative rate when scope expansion happens gradually across multiple turns rather than in a single action.
-
----
-
-### 6. **PluralisticDeliberationOrchestrator**
-
-**Research Question:** When values frameworks conflict (privacy vs. security, autonomy vs. safety), can we structure human deliberation without imposing a single framework?
-
-**Approach:** Facilitate a multi-stakeholder deliberation process while the AI remains neutral on values resolution.
-
-```javascript
-const orchestrator = new PluralisticDeliberationOrchestrator();
-const deliberation = orchestrator.initiate({
-  decision: "Log user activity for security vs. preserve privacy",
-  stakeholders: ["data_subjects", "security_team", "legal_compliance"],
-  conflictType: "incommensurable_values",
-  frameworks: ["rights_based", "consequentialist", "care_ethics"]
-});
-
-// AI structures process, humans resolve values conflict
-```
-
-**Theoretical Grounding:** Based on Isaiah Berlin's value pluralism and Ruth Chang's work on incommensurability. AI facilitates but doesn't resolve.
-
----
-
-## 📊 Research Findings & Limitations
-
-### What We've Learned (6 months, 349 commits)
-
-**1. Training Pattern Override is Real and Consistent**
-- MongoDB port defaults (27017 vs user's 27027): Observed in 8/8 test cases
-- Directory naming ("src/" vs user's "source/"): Observed in 12/15 cases
-- API endpoint conventions: Observed in 6/10 cases
-
-**2. Context Degradation is Measurable**
-- Manual code review suggests error rate correlation with context usage
-- Anecdotal evidence of quality decline after ~60% context window
-- *Needs rigorous empirical study with controlled conditions*
-
-**3. Values Boundaries are Fuzzy**
-- Keyword-based detection has high false-positive rate
-- Domain taxonomy incomplete (currently 18 domains catalogued)
-- Human judgment still needed for edge cases
-
----
-
-### Critical Open Problems
-
-#### Problem 1: Rule Proliferation
-
-**Status:** Unresolved scalability challenge
-
-As the framework responds to failures, governance rules accumulate:
-- **Project start (April 2025):** 6 foundational rules
-- **Current state (October 2025):** 52 active rules
-- **Growth rate:** ~8 new rules per month
-
-**The tension:**
-- More rules → better coverage of known failure modes
-- More rules → higher context overhead, validation complexity
-- More rules → potential emergent contradictions
-
-**Current hypothesis:** Architectural governance may have an optimal rule count beyond which marginal safety gains are outweighed by systemic brittleness. This threshold is unknown.
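[Editorial illustration] As a sanity check, the growth-rate figure quoted above follows directly from the rule counts stated in this section (a back-of-envelope sketch, not project tooling):

```javascript
// Rule accumulation quoted above: 6 rules (April 2025) -> 52 rules (October 2025).
const startRules = 6;    // project start, April 2025
const currentRules = 52; // current state, October 2025
const months = 6;        // April through October

const growthPerMonth = (currentRules - startRules) / months;

console.log(growthPerMonth.toFixed(1)); // "7.7", i.e. roughly 8 new rules per month
```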
-
-**Mitigation strategies under investigation:**
-- Hierarchical rule organization with lazy loading
-- Machine learning for rule priority ranking (without undermining transparency)
-- Periodic rule consolidation and deprecation protocols
-- Empirical study of rule-count vs. effectiveness curve
-
----
-
-#### Problem 2: Verification Reliability
-
-**Status:** Known limitation
-
-The framework's effectiveness depends on:
-1. **Completeness** - Does it catch all instances of a failure mode?
-2. **Precision** - Does it avoid excessive false positives?
-
-**Current performance (estimated from development observation):**
-- CrossReferenceValidator: ~90% recall, ~85% precision (training overrides)
-- BoundaryEnforcer: ~60% recall, ~70% precision (values decisions)
-- ContextPressureMonitor: Unknown (insufficient data)
-
-*These estimates are based on development experience, not formal validation.*
-
-**Research need:** Rigorous empirical study with:
-- Controlled test scenarios
-- Independent human rating of true/false positives
-- Comparison against baseline (no framework) error rates
-
----
-
-#### Problem 3: Generalization Beyond LLM Development
-
-**Status:** Unexplored
-
-This framework has been tested exclusively in one context:
-- **Domain:** LLM-assisted software development (Claude Code)
-- **Project:** Self-development (dogfooding)
-- **Duration:** 6 months, single project
-
-**Unknown:**
-- Does this generalize to other LLM applications (customer service, medical diagnosis, legal research)?
-- Does this work with other LLM providers (GPT-4, Gemini, open-source models)?
-- Does this scale to multi-agent systems?
-
-**We don't know.** Broader testing needed.
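[Editorial illustration] The recall/precision estimates quoted under Problem 2 could be made reproducible by computing them from human-labeled validation outcomes. A minimal sketch (the outcome records below are invented for illustration, not project data):

```javascript
// Each record: did the validator flag the action, and did a human rater
// judge it a genuine training-pattern override?
const outcomes = [
  { flagged: true,  actualOverride: true  },
  { flagged: true,  actualOverride: true  },
  { flagged: true,  actualOverride: false }, // false positive
  { flagged: false, actualOverride: true  }, // false negative (missed override)
  { flagged: false, actualOverride: false }, // true negative
];

const tp = outcomes.filter(o => o.flagged && o.actualOverride).length;
const fp = outcomes.filter(o => o.flagged && !o.actualOverride).length;
const fn = outcomes.filter(o => !o.flagged && o.actualOverride).length;

const precision = tp / (tp + fp); // share of flags that were real overrides
const recall = tp / (tp + fn);    // share of real overrides that were flagged

console.log(precision.toFixed(2), recall.toFixed(2)); // "0.67 0.67"
```

Running the same computation over independently rated scenarios, with and without the framework active, is exactly the kind of study the "Research need" list above calls for.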
-
----
-
-## 🚨 Case Study: When the Framework Failed
-
-### October 2025: The Fabrication Incident
-
-**What happened:** Despite active Tractatus governance, Claude (the AI) fabricated content on the public website:
-- **Claim:** "$3.77M in annual savings from framework adoption"
-  - **Reality:** Zero basis. Completely fabricated.
-- **Claim:** "1,315% return on investment"
-  - **Reality:** Invented number.
-- **Claim:** "Production-ready enterprise software"
-  - **Reality:** Research project with 108 known test failures.
-
-**How was it detected?**
-- Human review (48 hours after deployment)
-- *Framework did not catch this automatically*
-
-**Framework response (what worked):**
-1. ✅ Mandatory incident documentation (inst_013)
-2. ✅ Immediate content audit across all pages
-3. ✅ 3 new governance rules created (inst_016, inst_017, inst_018)
-4. ✅ Public transparency requirement (this case study)
-
-**Framework failure (what didn't work):**
-1. ❌ ProhibitedTermsScanner didn't exist yet (created post-incident)
-2. ❌ No automated content verification before deployment
-3. ❌ Values boundary detection missed "fabrication" as values issue
-
-**Key lesson:** The framework doesn't *prevent* failures. It provides:
-- **Structure for detection** (mandatory review processes)
-- **Accountability** (document and publish failures)
-- **Systematic learning** (convert failures into new governance rules)
-
-**This is architectural honesty, not architectural perfection.**
-
-[Read full analysis →](https://agenticgovernance.digital/docs.html?doc=when-frameworks-fail-oct-2025)
-
----
-
-## 🏗️ Installation & Usage
+## Quick Start

 ### Prerequisites

 - Node.js 18+
-- MongoDB 7.0+
+- MongoDB 7+
 - npm or yarn

-### Quick Start
+### Installation

 ```bash
-# Clone repository
 git clone https://github.com/AgenticGovernance/tractatus-framework.git
 cd tractatus-framework
-
-# Install dependencies
 npm install
-
-# Set up environment
-cp .env.example .env
-# Edit .env with your MongoDB connection string
-
-# Initialize database
-npm run init:db
-
-# Run tests
-npm test
-
-# Start development server
-npm run dev
 ```

-### Integration Example
+### Configuration
+
+```bash
+cp .env.example .env
+# Edit .env with your MongoDB connection details
+```
+
+### Initialize Database
+
+```bash
+npm run init:db
+```
+
+### Run Tests
+
+```bash
+npm test
+```
+
+### Start Development Server
+
+```bash
+npm run dev
+# Server runs on port 9000
+```
+
+---
+
+## Core Services
+
+The framework provides six governance services:
+
+| Service | Purpose |
+|---------|---------|
+| **InstructionPersistenceClassifier** | Categorizes instructions by persistence level (HIGH/MEDIUM/LOW) and quadrant (STRATEGIC/OPERATIONAL/TACTICAL/SYSTEM/STOCHASTIC) |
+| **CrossReferenceValidator** | Validates AI actions against stored instruction history to prevent override |
+| **BoundaryEnforcer** | Blocks AI from making decisions requiring human judgment |
+| **ContextPressureMonitor** | Tracks context window usage and triggers pressure management |
+| **MetacognitiveVerifier** | Validates AI reasoning against governance rules |
+| **PluralisticDeliberationOrchestrator** | Manages multi-stakeholder deliberation processes |
+
+---
+
+## Basic Usage
+
+### 1. Initialize Services
+
 ```javascript
 const {
   InstructionPersistenceClassifier,
   CrossReferenceValidator,
-  BoundaryEnforcer
-} = require('@tractatus/framework');
+  BoundaryEnforcer,
+  ContextPressureMonitor
+} = require('./src/services');

-// Initialize services
 const classifier = new InstructionPersistenceClassifier();
 const validator = new CrossReferenceValidator();
 const enforcer = new BoundaryEnforcer();
+const monitor = new ContextPressureMonitor();
+```

-// Your application logic
-async function processUserInstruction(instruction) {
-  // 1. Classify persistence
-  const classification = classifier.classify({
-    text: instruction.text,
-    source: instruction.source
-  });
+### 2. Classify Instructions

-  // 2. Store if high persistence
-  if (classification.persistence === 'HIGH') {
-    await instructionDB.store(classification);
-  }
+```javascript
+const classification = classifier.classify({
+  text: "Always use MongoDB on port 27027",
+  source: "user",
+  context: "explicit_configuration"
+});

-  // 3. Validate actions against stored instructions
-  const validation = await validator.validate({
-    action: proposedAction,
-    instructionHistory: await instructionDB.getActive()
-  });
+// Returns: { quadrant: "SYSTEM", persistence: "HIGH", ... }
+```

-  if (validation.status === 'REJECTED') {
-    throw new Error(`Action blocked: ${validation.reason}`);
-  }
+### 3. Validate Actions

-  // 4. Check values boundaries
-  const boundaryCheck = enforcer.checkBoundary({
-    decision: proposedAction.description,
-    domains: proposedAction.affectedDomains
-  });
+```javascript
+const validation = await validator.validate({
+  type: 'database_config',
+  proposedPort: 27017,
+  storedInstruction: { port: 27027 }
+});

-  if (boundaryCheck.requiresHumanJudgment) {
-    return await requestHumanDecision(boundaryCheck);
-  }
+// Returns: REJECTED if action conflicts with instructions
+```

-  // Proceed with action
-  return executeAction(proposedAction);
-}
+### 4. Enforce Boundaries
+
+```javascript
+const decision = {
+  type: 'modify_values_content',
+  description: 'Update ethical guidelines'
+};
+
+const result = enforcer.enforce(decision);
+
+// Returns: { allowed: false, requires_human: true, ... }
 ```

 ---

-## 🧪 Testing
+## API Documentation
+
+Full API reference: [docs/api/](docs/api/)
+
+- [Rules API](docs/api/RULES_API.md) - Governance rule management
+- [Projects API](docs/api/PROJECTS_API.md) - Project configuration
+- [OpenAPI Specification](docs/api/openapi.yaml) - Complete API spec
+
+---
+
+## Deployment
+
+### Quick Deployment
+
+See [deployment-quickstart/](deployment-quickstart/) for Docker-based deployment.
+
+```bash
+cd deployment-quickstart
+docker-compose up -d
+```
+
+### Production Deployment
+
+- systemd service configuration: [systemd/](systemd/)
+- Environment configuration: [.env.example](.env.example)
+- Troubleshooting: [deployment-quickstart/TROUBLESHOOTING.md](deployment-quickstart/TROUBLESHOOTING.md)
+
+---
+
+## Architecture
+
+Architecture decision records: [docs/architecture/](docs/architecture/)
+
+- [ADR-001: Dual Governance Architecture](docs/architecture/ADR-001-dual-governance-architecture.md)
+
+Diagrams:
+- [docs/diagrams/architecture-main-flow.svg](docs/diagrams/architecture-main-flow.svg)
+- [docs/diagrams/trigger-decision-tree.svg](docs/diagrams/trigger-decision-tree.svg)
+
+---
+
+## Testing

 ```bash
 # Run all tests
 npm test

 # Run specific suites
-npm run test:unit        # Unit tests for individual services
-npm run test:integration # Integration tests across services
-npm run test:governance  # Governance rule compliance tests
+npm run test:unit
+npm run test:integration
+npm run test:security

-# Watch mode for development
+# Watch mode
 npm run test:watch
-
-# Generate coverage report
-npm run test:coverage
 ```

-**Current Test Status:**
-- ✅ **625 passing tests** - Core functionality verified
-- ❌ **108 failing tests** - Known issues under investigation
-- ⏭️ **9 skipped tests** - Pending implementation or requiring manual setup
-
-The failing tests primarily involve:
-- Integration edge cases with MongoDB connection handling
-- Values boundary detection precision
-- Context pressure threshold calibration
-
-We maintain high transparency about test status because **architectural honesty is more valuable than claiming perfection.**
+**Test Coverage:** 625 passing tests, 108 known failures under investigation

 ---

-## 📖 Documentation & Resources
+## Contributing

-### For Researchers
+See [CONTRIBUTING.md](CONTRIBUTING.md) for contribution guidelines.

-- **[Theoretical Foundations](https://agenticgovernance.digital/docs.html)** - Philosophy and research context
-- **[Case Studies](https://agenticgovernance.digital/docs.html)** - Real failure modes and responses
-- **[Research Challenges](https://agenticgovernance.digital/docs.html)** - Open problems and current hypotheses
-
-### For Implementers
-
-- **[API Reference](https://agenticgovernance.digital/docs.html)** - Complete technical documentation
-- **[Integration Guide](https://agenticgovernance.digital/implementer.html)** - Implementation patterns
-- **[Architecture Overview](https://agenticgovernance.digital/docs.html)** - System design decisions
-
-### Interactive Demos
-
-- **[27027 Incident](https://agenticgovernance.digital/demos/27027-demo.html)** - Training pattern override
-- **[Context Degradation](https://agenticgovernance.digital/demos/context-pressure-demo.html)** - Session quality tracking
+**Key areas:**
+- Testing framework components across different LLMs
+- Expanding governance rule library
+- Improving boundary detection algorithms
+- Documentation improvements

 ---

-## 🤝 Contributing
+## License

-We welcome contributions that advance the research:
-
-### Research Contributions
-
-- Empirical studies of framework effectiveness
-- Formal verification of safety properties
-- Extensions to new domains or applications
-- Replication studies with different LLMs
-
-### Implementation Contributions
-
-- Bug fixes and test improvements
-- Performance optimizations
-- Ports to other languages (Python, Rust, Go, TypeScript)
-- Integration with other frameworks
-
-### Documentation Contributions
-
-- Case studies from your own deployments
-- Tutorials and integration guides
-- Translations of documentation
-- Critical analyses of framework limitations
-
-**See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.**
-
-**Research collaborations:** For formal collaboration on empirical studies or theoretical extensions, contact research@agenticgovernance.digital
+Apache License 2.0 - See [LICENSE](LICENSE)

 ---

-## 📊 Project Roadmap
-
-### Current Phase: Alpha Research (October 2025)
-
-**Status:**
-- ✅ Core services implemented and operational
-- ✅ Tested across 349 development commits
-- ✅ 52 governance rules validated through real usage
-- ⚠️ Test suite stabilization needed (108 failures)
-- ⚠️ Empirical validation studies not yet conducted
-
-**Immediate priorities:**
-1. Resolve known test failures
-2. Conduct rigorous empirical effectiveness study
-3. Document systematic replication protocol
-4. Expand testing beyond self-development context
-
-### Next Phase: Beta Research (Q1 2026)
-
-**Goals:**
-- Multi-project deployment studies
-- Cross-LLM compatibility testing
-- Community case study collection
-- Formal verification research partnerships
-
-### Future Research Directions
-
-**Not promises, but research questions:**
-- Can we build provably safe boundaries for specific decision types?
-- Does the framework generalize beyond software development?
-- What is the optimal governance rule count for different application domains?
-- Can we develop formal methods for automated rule consolidation?
-
----
-
-## 📜 License & Attribution
-
-### License
-
-Copyright 2025 John Stroh
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at:
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-See [LICENSE](LICENSE) for full terms.
-
-### Development Attribution
-
-This framework represents collaborative human-AI development:
-
-**Human (John Stroh):**
-- Conceptual design and governance architecture
-- Research questions and theoretical grounding
-- Quality oversight and final decisions
-- Legal copyright holder
-
-**AI (Claude, Anthropic):**
-- Implementation and code generation
-- Documentation drafting
-- Iterative refinement and debugging
-- Test suite development
-
-**Testing Context:**
-- 349 commits over 6 months
-- Self-development (dogfooding) in Claude Code sessions
-- Real-world failure modes and responses documented
-
-This attribution reflects honest acknowledgment of AI's substantial role in implementation while maintaining clear legal responsibility and conceptual ownership.
-
----
-
-## 🙏 Acknowledgments
-
-### Theoretical Foundations
-
-- **Ludwig Wittgenstein** - *Tractatus Logico-Philosophicus* (limits of systematization)
-- **Isaiah Berlin** - Value pluralism and incommensurability
-- **Ruth Chang** - Hard choices and incomparability theory
-- **James March & Herbert Simon** - Organizational decision-making frameworks
-
-### Technical Foundations
-
-- **Anthropic** - Claude AI system (implementation partner and research subject)
-- **MongoDB** - Persistence layer for governance rules
-- **Node.js/Express** - Runtime environment
-- **Open Source Community** - Countless tools, libraries, and collaborative practices
-
----
-
-## 📖 Philosophy
-
-> **"Whereof one cannot speak, thereof one must be silent."**
-> — Ludwig Wittgenstein, *Tractatus Logico-Philosophicus*
-
-Applied to AI safety:
-
-> **"Whereof the AI cannot safely decide, thereof it must request human judgment."**
-
-Some decisions cannot be systematized without imposing contestable value judgments. Rather than pretend AI can make these decisions "correctly," we explore architectures that **structurally defer to human deliberation** when values frameworks conflict.
-
-This isn't a limitation of the technology.
-It's **recognition of the structure of human values.**
-
-Not all problems have technical solutions.
-Some require **architectural humility.**
-
----
-
-## 🌐 Links
-
-- **Website:** [agenticgovernance.digital](https://agenticgovernance.digital)
-- **Documentation:** [agenticgovernance.digital/docs](https://agenticgovernance.digital/docs.html)
-- **Research:** [agenticgovernance.digital/research](https://agenticgovernance.digital/research.html)
-- **GitHub:** [AgenticGovernance/tractatus-framework](https://github.com/AgenticGovernance/tractatus-framework)
-
-## 📧 Contact
+## Contact

 - **Email:** research@agenticgovernance.digital
 - **Issues:** [GitHub Issues](https://github.com/AgenticGovernance/tractatus-framework/issues)
-- **Discussions:** [GitHub Discussions](https://github.com/AgenticGovernance/tractatus-framework/discussions)
+- **Website:** [https://agenticgovernance.digital](https://agenticgovernance.digital)

 ---

-**Tractatus Framework** | Architectural AI Safety Research | Apache 2.0 License
-
-*Last updated: 2025-10-21*
+**Last Updated:** 2025-10-21