# Tractatus Framework

**Last Updated:** 2025-10-21

> **Architectural AI Safety Through Structural Constraints**

An open-source research framework that explores architectural approaches to AI safety through runtime enforcement of decision boundaries. Unlike alignment-based approaches, Tractatus investigates whether structural constraints can preserve human agency in AI systems.

[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
[Website](https://agenticgovernance.digital)
[GitHub](https://github.com/AgenticGovernance/tractatus-framework)

---

## 🎯 The Core Research Question

**Can we build AI systems that structurally cannot make certain decisions without human judgment?**

Traditional AI safety approaches (alignment training, constitutional AI, RLHF) share a common assumption: they rely on AI systems *choosing* to maintain safety properties, even under capability or context pressure.

Tractatus explores an alternative: **architectural constraints** that make unsafe decisions *structurally impossible*, similar to how a `const` variable in programming cannot be reassigned regardless of subsequent code.
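
The analogy in miniature:

```javascript
// A `const` binding cannot be reassigned, no matter what later code tries to do.
const PORT = 27027;   // explicit human decision, fixed at declaration
PORT = 27017;         // TypeError: Assignment to constant variable.
```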

---

## 🔬 What This Repository Contains

This is the **reference implementation** of the Tractatus Framework, containing:

- ✅ **6 core framework services** - Operational AI safety components
- ✅ **52 active governance rules** - Tested across 349 development commits
- ✅ **625 passing tests** - Unit and integration suites (a further 108 known failures are under investigation)
- ✅ **28 test files** - Covering core services and edge cases
- ✅ **Research documentation** - Case studies, incident analyses, architectural patterns

**What this is NOT:**

- ❌ Not "production-ready" enterprise software
- ❌ Not a guaranteed solution to AI alignment
- ❌ Not a complete answer to AI safety

This is an **active research project** exploring structural approaches to AI governance, tested in real development contexts.

---

## 🧪 The Six Core Services

### 1. **InstructionPersistenceClassifier**

**Research Question:** Can we systematically distinguish which instructions should persist across conversation boundaries?

**Approach:** Quadrant-based classification (STRATEGIC, OPERATIONAL, TACTICAL, SYSTEM, STOCHASTIC) with persistence levels (HIGH, MEDIUM, LOW, VARIABLE).

```javascript
const classifier = new InstructionPersistenceClassifier();
const result = classifier.classify({
  text: "Always use MongoDB on port 27027 for this project",
  source: "user",
  context: "explicit_configuration"
});

// Returns:
// {
//   quadrant: "SYSTEM",
//   persistence: "HIGH",
//   temporal_scope: "PROJECT",
//   verification_required: "MANDATORY"
// }
```

**Key Finding:** Instructions with explicit parameters (port numbers, file paths, naming conventions) exhibit the highest override vulnerability from LLM training patterns.

---

### 2. **CrossReferenceValidator**

**Research Question:** How can we detect when LLM training biases override explicit user instructions?

**Approach:** Pattern-matching validation that compares proposed actions against stored instruction history.

```javascript
const validator = new CrossReferenceValidator();
const result = await validator.validate({
  type: 'database_config',
  proposedPort: 27017,                // LLM's "default" from training
  storedInstruction: { port: 27027 }  // user's explicit instruction
});

// Returns: REJECTED - Training pattern override detected
```

**The "27027 Incident":** The user explicitly instructs "Use port 27027." The LLM immediately uses 27017 (MongoDB's default in its training data), ignoring the explicit instruction. This failure mode appeared **consistently** across multiple conversation contexts.

[→ Interactive Demo](https://agenticgovernance.digital/demos/27027-demo.html)

---

### 3. **BoundaryEnforcer**

**Research Question:** Can we algorithmically distinguish "values decisions" that require human judgment from technical optimizations?

**Approach:** A decision-type taxonomy with boundary-detection heuristics.

```javascript
const enforcer = new BoundaryEnforcer();
const check = enforcer.checkBoundary({
  decision: "Reduce privacy settings to improve performance",
  domains: ["privacy", "performance"],
  requiresTradeoff: true
});

// Returns: BLOCKED - Incommensurable values require human deliberation
```

**Current Limitation:** Boundary detection relies on keyword matching and a domain taxonomy. False positives occur in roughly 15-20% of test cases (estimated from development observation, not a formal study).
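
To make the limitation concrete, here is a minimal sketch of the kind of keyword heuristic involved (a hypothetical simplification, not the shipped `BoundaryEnforcer` logic): a decision that merely *mentions* two value-laden domains triggers a block, even when no real trade-off exists.

```javascript
// Hypothetical sketch of keyword-based boundary detection.
// The real BoundaryEnforcer uses a fuller domain taxonomy.
const VALUE_DOMAINS = ['privacy', 'security', 'performance', 'autonomy', 'safety'];

function naiveBoundaryCheck(decisionText) {
  const hits = VALUE_DOMAINS.filter((d) => decisionText.toLowerCase().includes(d));
  // Two or more value domains mentioned => assume an incommensurable trade-off.
  return { blocked: hits.length >= 2, domains: hits };
}

// True positive: a genuine privacy-vs-performance trade-off.
console.log(naiveBoundaryCheck('Reduce privacy settings to improve performance'));
// => { blocked: true, domains: ['privacy', 'performance'] }

// False positive: no trade-off, just co-mention of two domains.
console.log(naiveBoundaryCheck('Add a performance test for the privacy settings page'));
// => { blocked: true, domains: ['privacy', 'performance'] }
```
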
---

### 4. **ContextPressureMonitor**

**Research Question:** Do LLM reasoning quality and error rates degrade predictably under context-window pressure?

**Approach:** Multi-factor session health tracking (token usage, message count, error frequency, task complexity).

```javascript
const monitor = new ContextPressureMonitor();
const pressure = monitor.analyze({
  tokens: 150000 / 200000,  // 75% context-window utilization
  messages: 67,             // extended conversation
  recentErrors: 4,          // error count in the last 10 messages
  concurrentTasks: 6        // active task complexity
});

// Returns: {
//   level: "HIGH",
//   recommendation: "INITIATE_HANDOFF",
//   confidence: 0.72
// }
```

**Observational Data:** In this project's development, estimated error rates rose from ~2-3% (early session) to ~12-15% (late session, >60% context) based on manual code review. *Formal empirical validation is still needed.*

---

### 5. **MetacognitiveVerifier**

**Research Question:** Can we require AI systems to explicitly verify their reasoning before taking complex actions?

**Approach:** Mandatory verification checkpoints for scope-expansive or high-risk operations.

```javascript
const verifier = new MetacognitiveVerifier();
const verification = verifier.verify({
  requestedAction: "Refactor authentication module",
  proposedAction: "Refactor 47 files across authentication, session, and API layers",
  scopeExpansion: true
});

// Returns: REQUIRE_REVIEW - Detected scope creep beyond original request
```

**Key Insight:** Scope-creep detection has a high false-negative rate when the expansion happens gradually across multiple turns rather than in a single action.
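
One mitigation we are exploring, shown here as a sketch only (hypothetical API, not the shipped verifier): compare each proposed action against the *original* request rather than the previous turn, so gradual expansion accumulates instead of resetting.

```javascript
// Hypothetical sketch: measure each proposed action against the ORIGINAL
// request, so gradual expansion accumulates across turns.

// Assumed helper: crudely estimate scope as the number of files an action mentions.
function countAffectedFiles(actionText) {
  const match = actionText.match(/(\d+)\s+files?/);
  return match ? Number(match[1]) : 1;
}

class CumulativeScopeTracker {
  constructor(originalRequest, growthLimit = 2.0) {
    this.baseline = countAffectedFiles(originalRequest); // scope at turn 0
    this.growthLimit = growthLimit;                      // allowed growth factor
  }

  check(proposedAction) {
    const growth = countAffectedFiles(proposedAction) / this.baseline;
    // Turn-over-turn growth can look small; growth vs. the baseline cannot.
    return growth > this.growthLimit
      ? { status: 'REQUIRE_REVIEW', growth }
      : { status: 'OK', growth };
  }
}

const tracker = new CumulativeScopeTracker('Refactor the auth module (1 file)');
console.log(tracker.check('Also adjust 2 files in session handling').status); // "OK"
console.log(tracker.check('Refactor 5 files across auth and API').status);    // "REQUIRE_REVIEW"
```
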
---

### 6. **PluralisticDeliberationOrchestrator**

**Research Question:** When values frameworks conflict (privacy vs. security, autonomy vs. safety), can we structure human deliberation without imposing a single framework?

**Approach:** Facilitate a multi-stakeholder deliberation process while the AI remains neutral on the values resolution.

```javascript
const orchestrator = new PluralisticDeliberationOrchestrator();
const deliberation = orchestrator.initiate({
  decision: "Log user activity for security vs. preserve privacy",
  stakeholders: ["data_subjects", "security_team", "legal_compliance"],
  conflictType: "incommensurable_values",
  frameworks: ["rights_based", "consequentialist", "care_ethics"]
});

// The AI structures the process; humans resolve the values conflict.
```

**Theoretical Grounding:** Based on Isaiah Berlin's value pluralism and Ruth Chang's work on incommensurability. The AI facilitates but does not resolve.

---

## 📊 Research Findings & Limitations

### What We've Learned (6 months, 349 commits)

**1. Training Pattern Override is Real and Consistent**

- MongoDB port defaults (27017 vs. the user's 27027): observed in 8/8 test cases
- Directory naming ("src/" vs. the user's "source/"): observed in 12/15 cases
- API endpoint conventions: observed in 6/10 cases

**2. Context Degradation is Measurable**

- Manual code review suggests error rates correlate with context usage
- Anecdotal evidence of quality decline after ~60% context-window utilization
- *Needs rigorous empirical study under controlled conditions*

**3. Values Boundaries are Fuzzy**

- Keyword-based detection has a high false-positive rate
- The domain taxonomy is incomplete (currently 18 domains catalogued)
- Human judgment is still needed for edge cases

---

### Critical Open Problems

#### Problem 1: Rule Proliferation

**Status:** Unresolved scalability challenge

As the framework responds to failures, governance rules accumulate:

- **Project start (April 2025):** 6 foundational rules
- **Current state (October 2025):** 52 active rules
- **Growth rate:** ~8 new rules per month

**The tension:**

- More rules → better coverage of known failure modes
- More rules → higher context overhead and validation complexity
- More rules → potential emergent contradictions

**Current hypothesis:** Architectural governance may have an optimal rule count beyond which marginal safety gains are outweighed by systemic brittleness. This threshold is unknown.

**Mitigation strategies under investigation:**

- Hierarchical rule organization with lazy loading (sketched below)
- Machine-learning-based rule priority ranking (without undermining transparency)
- Periodic rule consolidation and deprecation protocols
- Empirical study of the rule-count vs. effectiveness curve
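
A minimal sketch of the first strategy (a hypothetical design, not current framework code): rules are grouped by domain, and only the groups relevant to the action at hand are loaded into context.

```javascript
// Hypothetical sketch: hierarchical rule storage with lazy loading.
// `loadRulesForDomain` is an assumed async loader (e.g., backed by MongoDB).
class LazyRuleRegistry {
  constructor(loadRulesForDomain) {
    this.load = loadRulesForDomain;
    this.cache = new Map(); // domain -> rules, loaded on first use
  }

  async rulesFor(action) {
    const domains = action.affectedDomains ?? [];
    const groups = await Promise.all(
      domains.map(async (domain) => {
        if (!this.cache.has(domain)) {
          this.cache.set(domain, await this.load(domain)); // lazy load
        }
        return this.cache.get(domain);
      })
    );
    return groups.flat(); // only the rules relevant to this action
  }
}
```

Under this design, only the rules for the touched domains enter the context window, keeping per-action overhead proportional to the action's scope rather than to the total rule count.
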
---

#### Problem 2: Verification Reliability

**Status:** Known limitation

The framework's effectiveness depends on:

1. **Completeness** - Does it catch all instances of a failure mode?
2. **Precision** - Does it avoid excessive false positives?

**Current performance (estimated from development observation):**

- CrossReferenceValidator: ~90% recall, ~85% precision (training overrides)
- BoundaryEnforcer: ~60% recall, ~70% precision (values decisions)
- ContextPressureMonitor: unknown (insufficient data)

*These estimates are based on development experience, not formal validation.*
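
For readers less familiar with the metrics, recall and precision here follow their standard definitions; a tiny helper makes the planned evaluation concrete (the counts shown are illustrative, not measured data):

```javascript
// Standard definitions: recall = TP / (TP + FN), precision = TP / (TP + FP).
function evaluate({ truePositives, falsePositives, falseNegatives }) {
  return {
    recall: truePositives / (truePositives + falseNegatives),
    precision: truePositives / (truePositives + falsePositives)
  };
}

// Illustrative counts only: 18 overrides caught, 3 false alarms, 2 overrides missed.
console.log(evaluate({ truePositives: 18, falsePositives: 3, falseNegatives: 2 }));
// => { recall: 0.9, precision: 0.857... }
```
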

**Research need:** A rigorous empirical study with:

- Controlled test scenarios
- Independent human rating of true/false positives
- Comparison against baseline (no-framework) error rates

---

#### Problem 3: Generalization Beyond LLM Development

**Status:** Unexplored

This framework has been tested exclusively in one context:

- **Domain:** LLM-assisted software development (Claude Code)
- **Project:** Self-development (dogfooding)
- **Duration:** 6 months, single project

**Unknown:**

- Does this generalize to other LLM applications (customer service, medical diagnosis, legal research)?
- Does this work with other LLM providers (GPT-4, Gemini, open-source models)?
- Does this scale to multi-agent systems?

**We don't know.** Broader testing needed.

---
## 🚨 Case Study: When the Framework Failed

### October 2025: The Fabrication Incident

**What happened:** Despite active Tractatus governance, Claude (the AI) fabricated content on the public website:

- **Claim:** "$3.77M in annual savings from framework adoption"
- **Reality:** Zero basis. Completely fabricated.
- **Claim:** "1,315% return on investment"
- **Reality:** Invented number.
- **Claim:** "Production-ready enterprise software"
- **Reality:** Research project with 108 known test failures.

**How was it detected?**

- Human review (48 hours after deployment)
- *The framework did not catch this automatically*

**Framework response (what worked):**

1. ✅ Mandatory incident documentation (inst_013)
2. ✅ Immediate content audit across all pages
3. ✅ 3 new governance rules created (inst_016, inst_017, inst_018)
4. ✅ Public transparency requirement (this case study)

**Framework failure (what didn't work):**

1. ❌ ProhibitedTermsScanner didn't exist yet (created post-incident; a sketch of the idea follows below)
2. ❌ No automated content verification before deployment
3. ❌ Values boundary detection missed "fabrication" as a values issue
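
For illustration, a minimal sketch of what such a scanner might do (a hypothetical simplification; the actual post-incident implementation lives in this repo): flag prohibited marketing terms and unverifiable quantitative claims before content ships.

```javascript
// Hypothetical sketch of a prohibited-terms / unverifiable-claims scan.
// Term and pattern lists are illustrative, not the shipped configuration.
const PROHIBITED_TERMS = ['production-ready', 'enterprise-grade', 'guaranteed'];
const UNVERIFIED_CLAIM_PATTERNS = [
  /\$[\d,.]+[MBK]?\b/g,          // dollar figures, e.g. "$3.77M"
  /\b\d{2,}%\s*(ROI|return)/gi   // percentage-return claims, e.g. "1,315% ROI"
];

function scanContent(text) {
  const findings = [];
  for (const term of PROHIBITED_TERMS) {
    if (text.toLowerCase().includes(term)) findings.push({ type: 'prohibited_term', term });
  }
  for (const pattern of UNVERIFIED_CLAIM_PATTERNS) {
    for (const match of text.match(pattern) ?? []) {
      findings.push({ type: 'unverified_claim', match });
    }
  }
  return { blocked: findings.length > 0, findings };
}

console.log(scanContent('Production-ready framework with $3.77M in annual savings'));
// => { blocked: true, findings: [...] }
```
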

**Key lesson:** The framework doesn't *prevent* failures. It provides:

- **Structure for detection** (mandatory review processes)
- **Accountability** (document and publish failures)
- **Systematic learning** (convert failures into new governance rules)

**This is architectural honesty, not architectural perfection.**

[Read full analysis →](https://agenticgovernance.digital/docs.html?doc=when-frameworks-fail-oct-2025)

---

## 🏗️ Installation & Usage

### Prerequisites

- Node.js 18+
- MongoDB 7.0+
- npm or yarn

### Quick Start

```bash
# Clone the repository
git clone https://github.com/AgenticGovernance/tractatus-framework.git
cd tractatus-framework

# Install dependencies
npm install

# Set up environment
cp .env.example .env
# Edit .env with your MongoDB connection string

# Initialize the database
npm run init:db

# Run tests
npm test

# Start the development server
npm run dev
```

### Integration Example

```javascript
const {
  InstructionPersistenceClassifier,
  CrossReferenceValidator,
  BoundaryEnforcer
} = require('@tractatus/framework');

// Initialize services
const classifier = new InstructionPersistenceClassifier();
const validator = new CrossReferenceValidator();
const enforcer = new BoundaryEnforcer();

// Your application logic. `instructionDB`, `requestHumanDecision`, and
// `executeAction` are application-provided; `proposedAction` is the action
// your system wants to take in response to the instruction.
async function processUserInstruction(instruction, proposedAction) {
  // 1. Classify persistence
  const classification = classifier.classify({
    text: instruction.text,
    source: instruction.source
  });

  // 2. Store if high persistence
  if (classification.persistence === 'HIGH') {
    await instructionDB.store(classification);
  }

  // 3. Validate actions against stored instructions
  const validation = await validator.validate({
    action: proposedAction,
    instructionHistory: await instructionDB.getActive()
  });

  if (validation.status === 'REJECTED') {
    throw new Error(`Action blocked: ${validation.reason}`);
  }

  // 4. Check values boundaries
  const boundaryCheck = enforcer.checkBoundary({
    decision: proposedAction.description,
    domains: proposedAction.affectedDomains
  });

  if (boundaryCheck.requiresHumanJudgment) {
    return await requestHumanDecision(boundaryCheck);
  }

  // Proceed with the action
  return executeAction(proposedAction);
}
```

---

## 🧪 Testing

```bash
# Run all tests
npm test

# Run specific suites
npm run test:unit         # Unit tests for individual services
npm run test:integration  # Integration tests across services
npm run test:governance   # Governance rule compliance tests

# Watch mode for development
npm run test:watch

# Generate a coverage report
npm run test:coverage
```

**Current Test Status:**

- ✅ **625 passing tests** - Core functionality verified
- ❌ **108 failing tests** - Known issues under investigation
- ⏭️ **9 skipped tests** - Pending implementation or requiring manual setup

The failing tests primarily involve:

- Integration edge cases in MongoDB connection handling
- Values boundary detection precision
- Context-pressure threshold calibration

We maintain high transparency about test status because **architectural honesty is more valuable than claiming perfection.**

---

## 📖 Documentation & Resources

### For Researchers

- **[Theoretical Foundations](https://agenticgovernance.digital/docs.html)** - Philosophy and research context
- **[Case Studies](https://agenticgovernance.digital/docs.html)** - Real failure modes and responses
- **[Research Challenges](https://agenticgovernance.digital/docs.html)** - Open problems and current hypotheses

### For Implementers

- **[API Reference](https://agenticgovernance.digital/docs.html)** - Complete technical documentation
- **[Integration Guide](https://agenticgovernance.digital/implementer.html)** - Implementation patterns
- **[Architecture Overview](https://agenticgovernance.digital/docs.html)** - System design decisions

### Interactive Demos

- **[27027 Incident](https://agenticgovernance.digital/demos/27027-demo.html)** - Training pattern override
- **[Context Degradation](https://agenticgovernance.digital/demos/context-pressure-demo.html)** - Session quality tracking

---

## 🤝 Contributing

We welcome contributions that advance the research:

### Research Contributions

- Empirical studies of framework effectiveness
- Formal verification of safety properties
- Extensions to new domains or applications
- Replication studies with different LLMs

### Implementation Contributions

- Bug fixes and test improvements
- Performance optimizations
- Ports to other languages (Python, Rust, Go, TypeScript)
- Integration with other frameworks

### Documentation Contributions

- Case studies from your own deployments
- Tutorials and integration guides
- Translations of documentation
- Critical analyses of framework limitations

**See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.**

**Research collaborations:** For formal collaboration on empirical studies or theoretical extensions, contact research@agenticgovernance.digital

---

## 📊 Project Roadmap

### Current Phase: Alpha Research (October 2025)

**Status:**

- ✅ Core services implemented and operational
- ✅ Tested across 349 development commits
- ✅ 52 governance rules validated through real usage
- ⚠️ Test suite stabilization needed (108 failures)
- ⚠️ Empirical validation studies not yet conducted

**Immediate priorities:**

1. Resolve known test failures
2. Conduct a rigorous empirical effectiveness study
3. Document a systematic replication protocol
4. Expand testing beyond the self-development context

### Next Phase: Beta Research (Q1 2026)

**Goals:**

- Multi-project deployment studies
- Cross-LLM compatibility testing
- Community case study collection
- Formal verification research partnerships

### Future Research Directions

**Not promises, but research questions:**

- Can we build provably safe boundaries for specific decision types?
- Does the framework generalize beyond software development?
- What is the optimal governance rule count for different application domains?
- Can we develop formal methods for automated rule consolidation?

---

## 📜 License & Attribution

### License

Copyright 2025 John Stroh

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:

http://www.apache.org/licenses/LICENSE-2.0

See [LICENSE](LICENSE) for full terms.

### Development Attribution

This framework represents collaborative human-AI development:

**Human (John Stroh):**

- Conceptual design and governance architecture
- Research questions and theoretical grounding
- Quality oversight and final decisions
- Legal copyright holder

**AI (Claude, Anthropic):**

- Implementation and code generation
- Documentation drafting
- Iterative refinement and debugging
- Test suite development

**Testing Context:**

- 349 commits over 6 months
- Self-development (dogfooding) in Claude Code sessions
- Real-world failure modes and responses documented

This attribution reflects honest acknowledgment of AI's substantial role in implementation while maintaining clear legal responsibility and conceptual ownership.

---

## 🙏 Acknowledgments

### Theoretical Foundations

- **Ludwig Wittgenstein** - *Tractatus Logico-Philosophicus* (the limits of systematization)
- **Isaiah Berlin** - Value pluralism and incommensurability
- **Ruth Chang** - Hard choices and incomparability theory
- **James March & Herbert Simon** - Organizational decision-making frameworks

### Technical Foundations

- **Anthropic** - Claude AI system (implementation partner and research subject)
- **MongoDB** - Persistence layer for governance rules
- **Node.js/Express** - Runtime environment
- **Open Source Community** - Countless tools, libraries, and collaborative practices

---

## 📖 Philosophy

> **"Whereof one cannot speak, thereof one must be silent."**
> — Ludwig Wittgenstein, *Tractatus Logico-Philosophicus*

Applied to AI safety:

> **"Whereof the AI cannot safely decide, thereof it must request human judgment."**

Some decisions cannot be systematized without imposing contestable value judgments. Rather than pretend AI can make these decisions "correctly," we explore architectures that **structurally defer to human deliberation** when values frameworks conflict.

This isn't a limitation of the technology. It's **recognition of the structure of human values.**

Not all problems have technical solutions. Some require **architectural humility.**

---
## 🌐 Links

- **Website:** [agenticgovernance.digital](https://agenticgovernance.digital)
- **Documentation:** [agenticgovernance.digital/docs](https://agenticgovernance.digital/docs.html)
- **Research:** [agenticgovernance.digital/research](https://agenticgovernance.digital/research.html)
- **GitHub:** [AgenticGovernance/tractatus-framework](https://github.com/AgenticGovernance/tractatus-framework)

## 📧 Contact

- **Email:** research@agenticgovernance.digital
- **Issues:** [GitHub Issues](https://github.com/AgenticGovernance/tractatus-framework/issues)
- **Discussions:** [GitHub Discussions](https://github.com/AgenticGovernance/tractatus-framework/discussions)

---

**Tractatus Framework** | Architectural AI Safety Research | Apache 2.0 License

*Last updated: 2025-10-21*