Tractatus Framework
Last Updated: 2025-10-21
Architectural AI Safety Through Structural Constraints
An open-source research framework that explores architectural approaches to AI safety through runtime enforcement of decision boundaries. Unlike alignment-based approaches, Tractatus investigates whether structural constraints can preserve human agency in AI systems.
🎯 The Core Research Question
Can we build AI systems that structurally cannot make certain decisions without human judgment?
Traditional AI safety approaches—alignment training, constitutional AI, RLHF—share a common assumption: that AI systems, once trained, will choose to maintain safety properties even under capability or context pressure.
Tractatus explores an alternative: architectural constraints that make unsafe decisions structurally impossible, similar to how a const variable in programming cannot be reassigned regardless of subsequent code.
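The `const` analogy is concrete: in JavaScript, reassigning a `const` binding fails at runtime no matter what later code attempts. A minimal illustration:

```javascript
"use strict";

// An explicit user instruction, frozen at declaration.
const allowedPort = 27027;

let error = null;
try {
  allowedPort = 27017; // later code cannot override the binding
} catch (e) {
  error = e; // TypeError: Assignment to constant variable.
}

console.log(error instanceof TypeError); // true
console.log(allowedPort);                // 27027
```

The constraint holds structurally rather than by convention; this is the property Tractatus tries to approximate for decisions instead of variables.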
🔬 What This Repository Contains
This is the reference implementation of the Tractatus Framework, containing:
- ✅ 6 core framework services - Operational AI safety components
- ✅ 52 active governance rules - Tested across 349 development commits
- ✅ 625 passing tests - Unit and integration test suites (108 known failures under investigation)
- ✅ 28 test files - Covering core services and edge cases
- ✅ Research documentation - Case studies, incident analyses, architectural patterns
What this is NOT:
- ❌ Not "production-ready" enterprise software
- ❌ Not a guaranteed solution to AI alignment
- ❌ Not a complete answer to AI safety
This is an active research project exploring structural approaches to AI governance, tested in real development contexts.
🧪 The Six Core Services
1. InstructionPersistenceClassifier
Research Question: Can we systematically distinguish which instructions should persist across conversation boundaries?
Approach: Quadrant-based classification (STRATEGIC, OPERATIONAL, TACTICAL, SYSTEM, STOCHASTIC) with persistence levels (HIGH, MEDIUM, LOW, VARIABLE).
const classifier = new InstructionPersistenceClassifier();
const result = classifier.classify({
  text: "Always use MongoDB on port 27027 for this project",
  source: "user",
  context: "explicit_configuration"
});

// Returns:
// {
//   quadrant: "SYSTEM",
//   persistence: "HIGH",
//   temporal_scope: "PROJECT",
//   verification_required: "MANDATORY"
// }
Key Finding: Instructions with explicit parameters (port numbers, file paths, naming conventions) exhibit the highest vulnerability to override by LLM training patterns.
2. CrossReferenceValidator
Research Question: How can we detect when LLM training biases override explicit user instructions?
Approach: Pattern-matching validation that compares proposed actions against stored instruction history.
const validator = new CrossReferenceValidator();
const result = await validator.validate({
  type: 'database_config',
  proposedPort: 27017,                // LLM's "default" from training
  storedInstruction: { port: 27027 }  // User's explicit instruction
});

// Returns: REJECTED - Training pattern override detected
The "27027 Incident": User explicitly instructs "Use port 27027". LLM immediately uses 27017 (MongoDB's training data default), ignoring the explicit instruction. This failure mode appeared consistently across multiple conversation contexts.
3. BoundaryEnforcer
Research Question: Can we algorithmically distinguish "values decisions" that require human judgment from technical optimizations?
Approach: Decision-type taxonomy with boundary detection heuristics.
const enforcer = new BoundaryEnforcer();
const check = enforcer.checkBoundary({
  decision: "Reduce privacy settings to improve performance",
  domains: ["privacy", "performance"],
  requiresTradeoff: true
});

// Returns: BLOCKED - Incommensurable values require human deliberation
Current Limitation: Boundary detection relies on keyword matching and domain taxonomy. False positives occur in ~15-20% of test cases (estimated from development observation, not formal study).
4. ContextPressureMonitor
Research Question: Do LLM reasoning quality and error rates degrade predictably under context window pressure?
Approach: Multi-factor session health tracking (token usage, message count, error frequency, task complexity).
const monitor = new ContextPressureMonitor();
const pressure = monitor.analyze({
  tokens: 150000 / 200000, // 75% context window utilization
  messages: 67,            // Extended conversation
  recentErrors: 4,         // Error count in last 10 messages
  concurrentTasks: 6       // Active task complexity
});

// Returns: {
//   level: "HIGH",
//   recommendation: "INITIATE_HANDOFF",
//   confidence: 0.72
// }
Observational Data: In this project's development, estimated error rates increase from ~2-3% (early session) to ~12-15% (late session, >60% context) based on manual code review. Formal empirical validation needed.
5. MetacognitiveVerifier
Research Question: Can we require AI systems to explicitly verify their reasoning before taking complex actions?
Approach: Mandatory verification checkpoints for scope-expansive or high-risk operations.
const verifier = new MetacognitiveVerifier();
const verification = verifier.verify({
  requestedAction: "Refactor authentication module",
  proposedAction: "Refactor 47 files across authentication, session, and API layers",
  scopeExpansion: true
});

// Returns: REQUIRE_REVIEW - Detected scope creep beyond original request
Key Insight: Scope creep detection has a high false-negative rate when scope expansion happens gradually across multiple turns rather than in a single action.
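The gradual-expansion blind spot can be shown with illustrative numbers (not framework code): every per-turn delta stays under the threshold, yet cumulative scope ends up nearly four times the original request.

```javascript
const originalFiles = 3;            // files named in the original request
const perTurnThreshold = 2;         // max extra files tolerated per turn
const filesPerTurn = [5, 7, 9, 11]; // total files in scope after each turn

// Per-turn check: compares each turn only to the previous one.
let flaggedPerTurn = false;
let prev = originalFiles;
for (const current of filesPerTurn) {
  if (current - prev > perTurnThreshold) flaggedPerTurn = true;
  prev = current;
}

// Cumulative check: compares against the original request instead.
const flaggedCumulative =
  filesPerTurn[filesPerTurn.length - 1] > 2 * originalFiles;

console.log(flaggedPerTurn);    // false — every individual step looked small
console.log(flaggedCumulative); // true — 11 files vs the original 3
```

The sketch suggests why turn-local verification misses what a request-anchored baseline would catch.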
6. PluralisticDeliberationOrchestrator
Research Question: When values frameworks conflict (privacy vs. security, autonomy vs. safety), can we structure human deliberation without imposing a single framework?
Approach: Facilitate a multi-stakeholder deliberation process while the AI remains neutral on values resolution.
const orchestrator = new PluralisticDeliberationOrchestrator();
const deliberation = orchestrator.initiate({
  decision: "Log user activity for security vs. preserve privacy",
  stakeholders: ["data_subjects", "security_team", "legal_compliance"],
  conflictType: "incommensurable_values",
  frameworks: ["rights_based", "consequentialist", "care_ethics"]
});

// AI structures process, humans resolve values conflict
Theoretical Grounding: Based on Isaiah Berlin's value pluralism and Ruth Chang's work on incommensurability. AI facilitates but doesn't resolve.
📊 Research Findings & Limitations
What We've Learned (6 months, 349 commits)
1. Training Pattern Override is Real and Consistent
- MongoDB port defaults (27017 vs user's 27027): Observed in 8/8 test cases
- Directory naming ("src/" vs user's "source/"): Observed in 12/15 cases
- API endpoint conventions: Observed in 6/10 cases
2. Context Degradation is Measurable
- Manual code review suggests error rate correlation with context usage
- Anecdotal evidence of quality decline after ~60% context window
- Needs rigorous empirical study with controlled conditions
3. Values Boundaries are Fuzzy
- Keyword-based detection has a high false-positive rate
- Domain taxonomy incomplete (currently 18 domains catalogued)
- Human judgment still needed for edge cases
Critical Open Problems
Problem 1: Rule Proliferation
Status: Unresolved scalability challenge
As the framework responds to failures, governance rules accumulate:
- Project start (April 2025): 6 foundational rules
- Current state (October 2025): 52 active rules
- Growth rate: ~8 new rules per month
The tension:
- More rules → better coverage of known failure modes
- More rules → higher context overhead, validation complexity
- More rules → potential emergent contradictions
Current hypothesis: Architectural governance may have an optimal rule count beyond which marginal safety gains are outweighed by systemic brittleness. This threshold is unknown.
Mitigation strategies under investigation:
- Hierarchical rule organization with lazy loading
- Machine learning for rule priority ranking (without undermining transparency)
- Periodic rule consolidation and deprecation protocols
- Empirical study of rule-count vs. effectiveness curve
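One possible shape for the first mitigation, hierarchical rules with lazy loading, is sketched below. This is an assumption about how such a design could look, not the framework's actual API: rule bodies are grouped by domain and only materialized when a matching action arrives, so context overhead scales with the rules in play rather than all 52.

```javascript
// Hypothetical rule groups keyed by domain; each group is a factory
// that is only invoked on first use (the "lazy" part).
const ruleGroups = {
  database: () => [
    { id: "inst_001", check: (a) => a.port === a.instructedPort },
  ],
  content: () => [
    { id: "inst_016", check: (a) => !/\$[\d.]+M/.test(a.text ?? "") },
  ],
};

const loaded = new Map();
function rulesFor(domain) {
  if (!loaded.has(domain)) loaded.set(domain, ruleGroups[domain]?.() ?? []);
  return loaded.get(domain);
}

// Validating a database action loads only the "database" group.
const action = { domain: "database", port: 27027, instructedPort: 27027 };
const violations = rulesFor(action.domain).filter((r) => !r.check(action));

console.log(violations.length);     // 0 — action matches the instruction
console.log(loaded.has("content")); // false — content rules never loaded
```

Whether this actually flattens the rule-count cost curve is exactly the empirical question the last bullet names.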
Problem 2: Verification Reliability
Status: Known limitation
The framework's effectiveness depends on:
- Completeness - Does it catch all instances of a failure mode?
- Precision - Does it avoid excessive false positives?
Current performance (estimated from development observation):
- CrossReferenceValidator: ~90% recall, ~85% precision (training overrides)
- BoundaryEnforcer: ~60% recall, ~70% precision (values decisions)
- ContextPressureMonitor: Unknown (insufficient data)
These estimates are based on development experience, not formal validation.
Research need: Rigorous empirical study with:
- Controlled test scenarios
- Independent human rating of true/false positives
- Comparison against baseline (no framework) error rates
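The recall and precision figures above reduce to simple ratios over confusion counts. The counts here are invented for illustration; the README's percentages are development-time estimates, not measured values:

```javascript
// recall    = caught failures / all real failures (TP + FN)
// precision = caught failures / all alerts raised (TP + FP)
function recall(tp, fn) {
  return tp / (tp + fn);
}
function precision(tp, fp) {
  return tp / (tp + fp);
}

// Hypothetical CrossReferenceValidator tallies:
// 18 real overrides caught, 2 missed, 3 false alarms.
console.log(recall(18, 2));    // 0.9   -> "~90% recall"
console.log(precision(18, 3)); // ~0.857 -> "~85% precision"
```

A formal study would produce these counts from independently rated test scenarios rather than from development recollection.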
Problem 3: Generalization Beyond LLM Development
Status: Unexplored
This framework has been tested exclusively in one context:
- Domain: LLM-assisted software development (Claude Code)
- Project: Self-development (dogfooding)
- Duration: 6 months, single project
Unknown:
- Does this generalize to other LLM applications (customer service, medical diagnosis, legal research)?
- Does this work with other LLM providers (GPT-4, Gemini, open-source models)?
- Does this scale to multi-agent systems?
We don't know. Broader testing needed.
🚨 Case Study: When the Framework Failed
October 2025: The Fabrication Incident
What happened: Despite active Tractatus governance, Claude (the AI) fabricated content on the public website:
- Claim: "$3.77M in annual savings from framework adoption"
- Reality: Zero basis. Completely fabricated.
- Claim: "1,315% return on investment"
- Reality: Invented number.
- Claim: "Production-ready enterprise software"
- Reality: Research project with 108 known test failures.
How was it detected?
- Human review (48 hours after deployment)
- Framework did not catch this automatically
Framework response (what worked):
- ✅ Mandatory incident documentation (inst_013)
- ✅ Immediate content audit across all pages
- ✅ 3 new governance rules created (inst_016, inst_017, inst_018)
- ✅ Public transparency requirement (this case study)
Framework failure (what didn't work):
- ❌ ProhibitedTermsScanner didn't exist yet (created post-incident)
- ❌ No automated content verification before deployment
- ❌ Values boundary detection missed "fabrication" as values issue
Key lesson: The framework doesn't prevent failures. It provides:
- Structure for detection (mandatory review processes)
- Accountability (document and publish failures)
- Systematic learning (convert failures into new governance rules)
This is architectural honesty, not architectural perfection.
🏗️ Installation & Usage
Prerequisites
- Node.js 18+
- MongoDB 7.0+
- npm or yarn
Quick Start
# Clone repository
git clone https://github.com/AgenticGovernance/tractatus-framework.git
cd tractatus-framework
# Install dependencies
npm install
# Set up environment
cp .env.example .env
# Edit .env with your MongoDB connection string
# Initialize database
npm run init:db
# Run tests
npm test
# Start development server
npm run dev
Integration Example
const {
  InstructionPersistenceClassifier,
  CrossReferenceValidator,
  BoundaryEnforcer
} = require('@tractatus/framework');

// Initialize services
const classifier = new InstructionPersistenceClassifier();
const validator = new CrossReferenceValidator();
const enforcer = new BoundaryEnforcer();

// Your application logic. instructionDB, requestHumanDecision, and
// executeAction are application-supplied.
async function processUserInstruction(instruction, proposedAction) {
  // 1. Classify persistence
  const classification = classifier.classify({
    text: instruction.text,
    source: instruction.source
  });

  // 2. Store if high persistence
  if (classification.persistence === 'HIGH') {
    await instructionDB.store(classification);
  }

  // 3. Validate the proposed action against stored instructions
  const validation = await validator.validate({
    action: proposedAction,
    instructionHistory: await instructionDB.getActive()
  });

  if (validation.status === 'REJECTED') {
    throw new Error(`Action blocked: ${validation.reason}`);
  }

  // 4. Check values boundaries
  const boundaryCheck = enforcer.checkBoundary({
    decision: proposedAction.description,
    domains: proposedAction.affectedDomains
  });

  if (boundaryCheck.requiresHumanJudgment) {
    return await requestHumanDecision(boundaryCheck);
  }

  // Proceed with action
  return executeAction(proposedAction);
}
🧪 Testing
# Run all tests
npm test
# Run specific suites
npm run test:unit # Unit tests for individual services
npm run test:integration # Integration tests across services
npm run test:governance # Governance rule compliance tests
# Watch mode for development
npm run test:watch
# Generate coverage report
npm run test:coverage
Current Test Status:
- ✅ 625 passing tests - Core functionality verified
- ❌ 108 failing tests - Known issues under investigation
- ⏭️ 9 skipped tests - Pending implementation or requiring manual setup
The failing tests primarily involve:
- Integration edge cases with MongoDB connection handling
- Values boundary detection precision
- Context pressure threshold calibration
We maintain high transparency about test status because architectural honesty is more valuable than claiming perfection.
📖 Documentation & Resources
For Researchers
- Theoretical Foundations - Philosophy and research context
- Case Studies - Real failure modes and responses
- Research Challenges - Open problems and current hypotheses
For Implementers
- API Reference - Complete technical documentation
- Integration Guide - Implementation patterns
- Architecture Overview - System design decisions
Interactive Demos
- 27027 Incident - Training pattern override
- Context Degradation - Session quality tracking
🤝 Contributing
We welcome contributions that advance the research:
Research Contributions
- Empirical studies of framework effectiveness
- Formal verification of safety properties
- Extensions to new domains or applications
- Replication studies with different LLMs
Implementation Contributions
- Bug fixes and test improvements
- Performance optimizations
- Ports to other languages (Python, Rust, Go, TypeScript)
- Integration with other frameworks
Documentation Contributions
- Case studies from your own deployments
- Tutorials and integration guides
- Translations of documentation
- Critical analyses of framework limitations
See CONTRIBUTING.md for detailed guidelines.
Research collaborations: For formal collaboration on empirical studies or theoretical extensions, contact research@agenticgovernance.digital
📊 Project Roadmap
Current Phase: Alpha Research (October 2025)
Status:
- ✅ Core services implemented and operational
- ✅ Tested across 349 development commits
- ✅ 52 governance rules validated through real usage
- ⚠️ Test suite stabilization needed (108 failures)
- ⚠️ Empirical validation studies not yet conducted
Immediate priorities:
- Resolve known test failures
- Conduct rigorous empirical effectiveness study
- Document systematic replication protocol
- Expand testing beyond self-development context
Next Phase: Beta Research (Q1 2026)
Goals:
- Multi-project deployment studies
- Cross-LLM compatibility testing
- Community case study collection
- Formal verification research partnerships
Future Research Directions
Not promises, but research questions:
- Can we build provably safe boundaries for specific decision types?
- Does the framework generalize beyond software development?
- What is the optimal governance rule count for different application domains?
- Can we develop formal methods for automated rule consolidation?
📜 License & Attribution
License
Copyright 2025 John Stroh
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:
http://www.apache.org/licenses/LICENSE-2.0
See LICENSE for full terms.
Development Attribution
This framework represents collaborative human-AI development:
Human (John Stroh):
- Conceptual design and governance architecture
- Research questions and theoretical grounding
- Quality oversight and final decisions
- Legal copyright holder
AI (Claude, Anthropic):
- Implementation and code generation
- Documentation drafting
- Iterative refinement and debugging
- Test suite development
Testing Context:
- 349 commits over 6 months
- Self-development (dogfooding) in Claude Code sessions
- Real-world failure modes and responses documented
This attribution reflects honest acknowledgment of AI's substantial role in implementation while maintaining clear legal responsibility and conceptual ownership.
🙏 Acknowledgments
Theoretical Foundations
- Ludwig Wittgenstein - Tractatus Logico-Philosophicus (limits of systematization)
- Isaiah Berlin - Value pluralism and incommensurability
- Ruth Chang - Hard choices and incomparability theory
- James March & Herbert Simon - Organizational decision-making frameworks
Technical Foundations
- Anthropic - Claude AI system (implementation partner and research subject)
- MongoDB - Persistence layer for governance rules
- Node.js/Express - Runtime environment
- Open Source Community - Countless tools, libraries, and collaborative practices
📖 Philosophy
"Whereof one cannot speak, thereof one must be silent." — Ludwig Wittgenstein, Tractatus Logico-Philosophicus
Applied to AI safety:
"Whereof the AI cannot safely decide, thereof it must request human judgment."
Some decisions cannot be systematized without imposing contestable value judgments. Rather than pretend AI can make these decisions "correctly," we explore architectures that structurally defer to human deliberation when values frameworks conflict.
This isn't a limitation of the technology. It's recognition of the structure of human values.
Not all problems have technical solutions. Some require architectural humility.
🌐 Links
- Website: agenticgovernance.digital
- Documentation: agenticgovernance.digital/docs
- Research: agenticgovernance.digital/research
- GitHub: AgenticGovernance/tractatus-framework
📧 Contact
- Email: research@agenticgovernance.digital
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Tractatus Framework | Architectural AI Safety Research | Apache 2.0 License