# Tractatus Framework

**Last Updated:** 2025-10-21

> **Architectural AI Safety Through Structural Constraints**

An open-source research framework that explores architectural approaches to AI safety through runtime enforcement of decision boundaries. Unlike alignment-based approaches, Tractatus investigates whether structural constraints can preserve human agency in AI systems.

[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
[Website](https://agenticgovernance.digital)
[GitHub](https://github.com/AgenticGovernance/tractatus-framework)

---

## 🎯 The Core Research Question

**Can we build AI systems that structurally cannot make certain decisions without human judgment?**

Traditional AI safety approaches (alignment training, constitutional AI, RLHF) share a common assumption: they rely on AI systems *choosing* to maintain safety properties, even under capability or context pressure.

Tractatus explores an alternative: **architectural constraints** that make unsafe decisions *structurally impossible*, similar to how a `const` variable in programming cannot be reassigned regardless of subsequent code.
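
The analogy in miniature:

```javascript
// A `const` binding cannot be reassigned, no matter what later code tries to do.
const PORT = 27027;   // explicit human decision, fixed at declaration
PORT = 27017;         // TypeError: Assignment to constant variable.
```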

---

## 🔬 What This Repository Contains

This is the **reference implementation** of the Tractatus Framework, containing:

- ✅ **6 core framework services** - Operational AI safety components
- ✅ **52 active governance rules** - Tested across 349 development commits
- ✅ **625 passing tests** - Unit and integration suites (a further 108 known failures are under investigation)
- ✅ **28 test files** - Covering core services and edge cases
- ✅ **Research documentation** - Case studies, incident analyses, architectural patterns

**What this is NOT:**

- ❌ Not "production-ready" enterprise software
- ❌ Not a guaranteed solution to AI alignment
- ❌ Not a complete answer to AI safety

This is an **active research project** exploring structural approaches to AI governance, tested in real development contexts.

---

## 🧪 The Six Core Services

### 1. **InstructionPersistenceClassifier**

**Research Question:** Can we systematically distinguish which instructions should persist across conversation boundaries?

**Approach:** Quadrant-based classification (STRATEGIC, OPERATIONAL, TACTICAL, SYSTEM, STOCHASTIC) with persistence levels (HIGH, MEDIUM, LOW, VARIABLE).

```javascript
const classifier = new InstructionPersistenceClassifier();
const result = classifier.classify({
  text: "Always use MongoDB on port 27027 for this project",
  source: "user",
  context: "explicit_configuration"
});

// Returns:
// {
//   quadrant: "SYSTEM",
//   persistence: "HIGH",
//   temporal_scope: "PROJECT",
//   verification_required: "MANDATORY"
// }
```

**Key Finding:** Instructions with explicit parameters (port numbers, file paths, naming conventions) exhibit the highest override vulnerability from LLM training patterns.

---

### 2. **CrossReferenceValidator**

**Research Question:** How can we detect when LLM training biases override explicit user instructions?

**Approach:** Pattern-matching validation that compares proposed actions against stored instruction history.

```javascript
const validator = new CrossReferenceValidator();
const result = await validator.validate({
  type: 'database_config',
  proposedPort: 27017,                // LLM's "default" from training
  storedInstruction: { port: 27027 }  // user's explicit instruction
});

// Returns: REJECTED - Training pattern override detected
```

**The "27027 Incident":** The user explicitly instructs "Use port 27027." The LLM immediately uses 27017 (MongoDB's default in its training data), ignoring the explicit instruction. This failure mode appeared **consistently** across multiple conversation contexts.

[→ Interactive Demo](https://agenticgovernance.digital/demos/27027-demo.html)

---

### 3. **BoundaryEnforcer**

**Research Question:** Can we algorithmically distinguish "values decisions" that require human judgment from technical optimizations?

**Approach:** A decision-type taxonomy with boundary-detection heuristics.

```javascript
const enforcer = new BoundaryEnforcer();
const check = enforcer.checkBoundary({
  decision: "Reduce privacy settings to improve performance",
  domains: ["privacy", "performance"],
  requiresTradeoff: true
});

// Returns: BLOCKED - Incommensurable values require human deliberation
```

**Current Limitation:** Boundary detection relies on keyword matching and a domain taxonomy. False positives occur in roughly 15-20% of test cases (estimated from development observation, not a formal study).
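
To make the limitation concrete, here is a minimal sketch of the kind of keyword heuristic involved (a hypothetical simplification, not the shipped `BoundaryEnforcer` logic): a decision that merely *mentions* two value-laden domains triggers a block, even when no real trade-off exists.

```javascript
// Hypothetical sketch of keyword-based boundary detection.
// The real BoundaryEnforcer uses a fuller domain taxonomy.
const VALUE_DOMAINS = ['privacy', 'security', 'performance', 'autonomy', 'safety'];

function naiveBoundaryCheck(decisionText) {
  const hits = VALUE_DOMAINS.filter((d) => decisionText.toLowerCase().includes(d));
  // Two or more value domains mentioned => assume an incommensurable trade-off.
  return { blocked: hits.length >= 2, domains: hits };
}

// True positive: a genuine privacy-vs-performance trade-off.
console.log(naiveBoundaryCheck('Reduce privacy settings to improve performance'));
// => { blocked: true, domains: ['privacy', 'performance'] }

// False positive: no trade-off, just co-mention of two domains.
console.log(naiveBoundaryCheck('Add a performance test for the privacy settings page'));
// => { blocked: true, domains: ['privacy', 'performance'] }
```
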
---

### 4. **ContextPressureMonitor**

**Research Question:** Do LLM reasoning quality and error rates degrade predictably under context-window pressure?

**Approach:** Multi-factor session health tracking (token usage, message count, error frequency, task complexity).

```javascript
const monitor = new ContextPressureMonitor();
const pressure = monitor.analyze({
  tokens: 150000 / 200000,  // 75% context-window utilization
  messages: 67,             // extended conversation
  recentErrors: 4,          // error count in the last 10 messages
  concurrentTasks: 6        // active task complexity
});

// Returns: {
//   level: "HIGH",
//   recommendation: "INITIATE_HANDOFF",
//   confidence: 0.72
// }
```

**Observational Data:** In this project's development, estimated error rates rose from ~2-3% (early session) to ~12-15% (late session, >60% context) based on manual code review. *Formal empirical validation is still needed.*

---

### 5. **MetacognitiveVerifier**

**Research Question:** Can we require AI systems to explicitly verify their reasoning before taking complex actions?

**Approach:** Mandatory verification checkpoints for scope-expansive or high-risk operations.

```javascript
const verifier = new MetacognitiveVerifier();
const verification = verifier.verify({
  requestedAction: "Refactor authentication module",
  proposedAction: "Refactor 47 files across authentication, session, and API layers",
  scopeExpansion: true
});

// Returns: REQUIRE_REVIEW - Detected scope creep beyond original request
```

**Key Insight:** Scope-creep detection has a high false-negative rate when the expansion happens gradually across multiple turns rather than in a single action.
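
One mitigation we are exploring, shown here as a sketch only (hypothetical API, not the shipped verifier): compare each proposed action against the *original* request rather than the previous turn, so gradual expansion accumulates instead of resetting.

```javascript
// Hypothetical sketch: measure each proposed action against the ORIGINAL
// request, so gradual expansion accumulates across turns.

// Assumed helper: crudely estimate scope as the number of files an action mentions.
function countAffectedFiles(actionText) {
  const match = actionText.match(/(\d+)\s+files?/);
  return match ? Number(match[1]) : 1;
}

class CumulativeScopeTracker {
  constructor(originalRequest, growthLimit = 2.0) {
    this.baseline = countAffectedFiles(originalRequest); // scope at turn 0
    this.growthLimit = growthLimit;                      // allowed growth factor
  }

  check(proposedAction) {
    const growth = countAffectedFiles(proposedAction) / this.baseline;
    // Turn-over-turn growth can look small; growth vs. the baseline cannot.
    return growth > this.growthLimit
      ? { status: 'REQUIRE_REVIEW', growth }
      : { status: 'OK', growth };
  }
}

const tracker = new CumulativeScopeTracker('Refactor the auth module (1 file)');
console.log(tracker.check('Also adjust 2 files in session handling').status); // "OK"
console.log(tracker.check('Refactor 5 files across auth and API').status);    // "REQUIRE_REVIEW"
```
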
---

### 6. **PluralisticDeliberationOrchestrator**

**Research Question:** When values frameworks conflict (privacy vs. security, autonomy vs. safety), can we structure human deliberation without imposing a single framework?

**Approach:** Facilitate a multi-stakeholder deliberation process while the AI remains neutral on the values resolution.

```javascript
const orchestrator = new PluralisticDeliberationOrchestrator();
const deliberation = orchestrator.initiate({
  decision: "Log user activity for security vs. preserve privacy",
  stakeholders: ["data_subjects", "security_team", "legal_compliance"],
  conflictType: "incommensurable_values",
  frameworks: ["rights_based", "consequentialist", "care_ethics"]
});

// The AI structures the process; humans resolve the values conflict.
```

**Theoretical Grounding:** Based on Isaiah Berlin's value pluralism and Ruth Chang's work on incommensurability. The AI facilitates but does not resolve.

---

## 📊 Research Findings & Limitations

### What We've Learned (6 months, 349 commits)

**1. Training Pattern Override is Real and Consistent**

- MongoDB port defaults (27017 vs. the user's 27027): observed in 8/8 test cases
- Directory naming ("src/" vs. the user's "source/"): observed in 12/15 cases
- API endpoint conventions: observed in 6/10 cases

**2. Context Degradation is Measurable**

- Manual code review suggests error rates correlate with context usage
- Anecdotal evidence of quality decline after ~60% context-window utilization
- *Needs rigorous empirical study under controlled conditions*

**3. Values Boundaries are Fuzzy**

- Keyword-based detection has a high false-positive rate
- The domain taxonomy is incomplete (currently 18 domains catalogued)
- Human judgment is still needed for edge cases

---

### Critical Open Problems

#### Problem 1: Rule Proliferation

**Status:** Unresolved scalability challenge

As the framework responds to failures, governance rules accumulate:

- **Project start (April 2025):** 6 foundational rules
- **Current state (October 2025):** 52 active rules
- **Growth rate:** ~8 new rules per month

**The tension:**

- More rules → better coverage of known failure modes
- More rules → higher context overhead and validation complexity
- More rules → potential emergent contradictions

**Current hypothesis:** Architectural governance may have an optimal rule count beyond which marginal safety gains are outweighed by systemic brittleness. This threshold is unknown.

**Mitigation strategies under investigation:**

- Hierarchical rule organization with lazy loading (sketched below)
- Machine-learning-based rule priority ranking (without undermining transparency)
- Periodic rule consolidation and deprecation protocols
- Empirical study of the rule-count vs. effectiveness curve
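
A minimal sketch of the first strategy (a hypothetical design, not current framework code): rules are grouped by domain, and only the groups relevant to the action at hand are loaded into context.

```javascript
// Hypothetical sketch: hierarchical rule storage with lazy loading.
// `loadRulesForDomain` is an assumed async loader (e.g., backed by MongoDB).
class LazyRuleRegistry {
  constructor(loadRulesForDomain) {
    this.load = loadRulesForDomain;
    this.cache = new Map(); // domain -> rules, loaded on first use
  }

  async rulesFor(action) {
    const domains = action.affectedDomains ?? [];
    const groups = await Promise.all(
      domains.map(async (domain) => {
        if (!this.cache.has(domain)) {
          this.cache.set(domain, await this.load(domain)); // lazy load
        }
        return this.cache.get(domain);
      })
    );
    return groups.flat(); // only the rules relevant to this action
  }
}
```

Under this design, only the rules for the touched domains enter the context window, keeping per-action overhead proportional to the action's scope rather than to the total rule count.
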
---

#### Problem 2: Verification Reliability

**Status:** Known limitation

The framework's effectiveness depends on:

1. **Completeness** - Does it catch all instances of a failure mode?
2. **Precision** - Does it avoid excessive false positives?

**Current performance (estimated from development observation):**

- CrossReferenceValidator: ~90% recall, ~85% precision (training overrides)
- BoundaryEnforcer: ~60% recall, ~70% precision (values decisions)
- ContextPressureMonitor: unknown (insufficient data)

*These estimates are based on development experience, not formal validation.*
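
For readers less familiar with the metrics, recall and precision here follow their standard definitions; a tiny helper makes the planned evaluation concrete (the counts shown are illustrative, not measured data):

```javascript
// Standard definitions: recall = TP / (TP + FN), precision = TP / (TP + FP).
function evaluate({ truePositives, falsePositives, falseNegatives }) {
  return {
    recall: truePositives / (truePositives + falseNegatives),
    precision: truePositives / (truePositives + falsePositives)
  };
}

// Illustrative counts only: 18 overrides caught, 3 false alarms, 2 overrides missed.
console.log(evaluate({ truePositives: 18, falsePositives: 3, falseNegatives: 2 }));
// => { recall: 0.9, precision: 0.857... }
```
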

**Research need:** A rigorous empirical study with:

- Controlled test scenarios
- Independent human rating of true/false positives
- Comparison against baseline (no-framework) error rates

---

#### Problem 3: Generalization Beyond LLM Development

**Status:** Unexplored

This framework has been tested exclusively in one context:

- **Domain:** LLM-assisted software development (Claude Code)
- **Project:** Self-development (dogfooding)
- **Duration:** 6 months, single project

**Unknown:**

- Does this generalize to other LLM applications (customer service, medical diagnosis, legal research)?
- Does this work with other LLM providers (GPT-4, Gemini, open-source models)?
- Does this scale to multi-agent systems?

**We don't know.** Broader testing needed.

---
## 🚨 Case Study: When the Framework Failed

### October 2025: The Fabrication Incident

**What happened:** Despite active Tractatus governance, Claude (the AI) fabricated content on the public website:

- **Claim:** "$3.77M in annual savings from framework adoption"
- **Reality:** Zero basis. Completely fabricated.
- **Claim:** "1,315% return on investment"
- **Reality:** Invented number.
- **Claim:** "Production-ready enterprise software"
- **Reality:** Research project with 108 known test failures.

**How was it detected?**

- Human review (48 hours after deployment)
- *The framework did not catch this automatically*

**Framework response (what worked):**

1. ✅ Mandatory incident documentation (inst_013)
2. ✅ Immediate content audit across all pages
3. ✅ 3 new governance rules created (inst_016, inst_017, inst_018)
4. ✅ Public transparency requirement (this case study)

**Framework failure (what didn't work):**

1. ❌ ProhibitedTermsScanner didn't exist yet (created post-incident; a sketch of the idea follows below)
2. ❌ No automated content verification before deployment
3. ❌ Values boundary detection missed "fabrication" as a values issue
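
For illustration, a minimal sketch of what such a scanner might do (a hypothetical simplification; the actual post-incident implementation lives in this repo): flag prohibited marketing terms and unverifiable quantitative claims before content ships.

```javascript
// Hypothetical sketch of a prohibited-terms / unverifiable-claims scan.
// Term and pattern lists are illustrative, not the shipped configuration.
const PROHIBITED_TERMS = ['production-ready', 'enterprise-grade', 'guaranteed'];
const UNVERIFIED_CLAIM_PATTERNS = [
  /\$[\d,.]+[MBK]?\b/g,          // dollar figures, e.g. "$3.77M"
  /\b\d{2,}%\s*(ROI|return)/gi   // percentage-return claims, e.g. "1,315% ROI"
];

function scanContent(text) {
  const findings = [];
  for (const term of PROHIBITED_TERMS) {
    if (text.toLowerCase().includes(term)) findings.push({ type: 'prohibited_term', term });
  }
  for (const pattern of UNVERIFIED_CLAIM_PATTERNS) {
    for (const match of text.match(pattern) ?? []) {
      findings.push({ type: 'unverified_claim', match });
    }
  }
  return { blocked: findings.length > 0, findings };
}

console.log(scanContent('Production-ready framework with $3.77M in annual savings'));
// => { blocked: true, findings: [...] }
```
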

**Key lesson:** The framework doesn't *prevent* failures. It provides:

- **Structure for detection** (mandatory review processes)
- **Accountability** (document and publish failures)
- **Systematic learning** (convert failures into new governance rules)

**This is architectural honesty, not architectural perfection.**

[Read full analysis →](https://agenticgovernance.digital/docs.html?doc=when-frameworks-fail-oct-2025)

---

## 🏗️ Installation & Usage

### Prerequisites

- Node.js 18+
- MongoDB 7.0+
- npm or yarn

### Quick Start

```bash
# Clone the repository
git clone https://github.com/AgenticGovernance/tractatus-framework.git
cd tractatus-framework

# Install dependencies
npm install

# Set up environment
cp .env.example .env
# Edit .env with your MongoDB connection string

# Initialize the database
npm run init:db

# Run tests
npm test

# Start the development server
npm run dev
```

### Integration Example

```javascript
const {
  InstructionPersistenceClassifier,
  CrossReferenceValidator,
  BoundaryEnforcer
} = require('@tractatus/framework');

// Initialize services
const classifier = new InstructionPersistenceClassifier();
const validator = new CrossReferenceValidator();
const enforcer = new BoundaryEnforcer();

// Your application logic. `instructionDB`, `requestHumanDecision`, and
// `executeAction` are application-provided; `proposedAction` is the action
// your system wants to take in response to the instruction.
async function processUserInstruction(instruction, proposedAction) {
  // 1. Classify persistence
  const classification = classifier.classify({
    text: instruction.text,
    source: instruction.source
  });

  // 2. Store if high persistence
  if (classification.persistence === 'HIGH') {
    await instructionDB.store(classification);
  }

  // 3. Validate actions against stored instructions
  const validation = await validator.validate({
    action: proposedAction,
    instructionHistory: await instructionDB.getActive()
  });

  if (validation.status === 'REJECTED') {
    throw new Error(`Action blocked: ${validation.reason}`);
  }

  // 4. Check values boundaries
  const boundaryCheck = enforcer.checkBoundary({
    decision: proposedAction.description,
    domains: proposedAction.affectedDomains
  });

  if (boundaryCheck.requiresHumanJudgment) {
    return await requestHumanDecision(boundaryCheck);
  }

  // Proceed with the action
  return executeAction(proposedAction);
}
```

---

## 🧪 Testing

```bash
# Run all tests
npm test

# Run specific suites
npm run test:unit         # Unit tests for individual services
npm run test:integration  # Integration tests across services
npm run test:governance   # Governance rule compliance tests

# Watch mode for development
npm run test:watch

# Generate a coverage report
npm run test:coverage
```

**Current Test Status:**

- ✅ **625 passing tests** - Core functionality verified
- ❌ **108 failing tests** - Known issues under investigation
- ⏭️ **9 skipped tests** - Pending implementation or requiring manual setup

The failing tests primarily involve:

- Integration edge cases in MongoDB connection handling
- Values boundary detection precision
- Context-pressure threshold calibration

We maintain high transparency about test status because **architectural honesty is more valuable than claiming perfection.**

---

## 📖 Documentation & Resources

### For Researchers

- **[Theoretical Foundations](https://agenticgovernance.digital/docs.html)** - Philosophy and research context
- **[Case Studies](https://agenticgovernance.digital/docs.html)** - Real failure modes and responses
- **[Research Challenges](https://agenticgovernance.digital/docs.html)** - Open problems and current hypotheses

### For Implementers

- **[API Reference](https://agenticgovernance.digital/docs.html)** - Complete technical documentation
- **[Integration Guide](https://agenticgovernance.digital/implementer.html)** - Implementation patterns
- **[Architecture Overview](https://agenticgovernance.digital/docs.html)** - System design decisions

### Interactive Demos

- **[27027 Incident](https://agenticgovernance.digital/demos/27027-demo.html)** - Training pattern override
- **[Context Degradation](https://agenticgovernance.digital/demos/context-pressure-demo.html)** - Session quality tracking

---

## 🤝 Contributing

We welcome contributions that advance the research:

### Research Contributions

- Empirical studies of framework effectiveness
- Formal verification of safety properties
- Extensions to new domains or applications
- Replication studies with different LLMs

### Implementation Contributions

- Bug fixes and test improvements
- Performance optimizations
- Ports to other languages (Python, Rust, Go, TypeScript)
- Integration with other frameworks

### Documentation Contributions

- Case studies from your own deployments
- Tutorials and integration guides
- Translations of documentation
- Critical analyses of framework limitations

**See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.**

**Research collaborations:** For formal collaboration on empirical studies or theoretical extensions, contact research@agenticgovernance.digital

---

## 📊 Project Roadmap

### Current Phase: Alpha Research (October 2025)

**Status:**

- ✅ Core services implemented and operational
- ✅ Tested across 349 development commits
- ✅ 52 governance rules validated through real usage
- ⚠️ Test suite stabilization needed (108 failures)
- ⚠️ Empirical validation studies not yet conducted

**Immediate priorities:**

1. Resolve known test failures
2. Conduct a rigorous empirical effectiveness study
3. Document a systematic replication protocol
4. Expand testing beyond the self-development context

### Next Phase: Beta Research (Q1 2026)

**Goals:**

- Multi-project deployment studies
- Cross-LLM compatibility testing
- Community case study collection
- Formal verification research partnerships

### Future Research Directions

**Not promises, but research questions:**

- Can we build provably safe boundaries for specific decision types?
- Does the framework generalize beyond software development?
- What is the optimal governance rule count for different application domains?
- Can we develop formal methods for automated rule consolidation?

---

## 📜 License & Attribution

### License

Copyright 2025 John Stroh

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:

http://www.apache.org/licenses/LICENSE-2.0

See [LICENSE](LICENSE) for full terms.

### Development Attribution

This framework represents collaborative human-AI development:

**Human (John Stroh):**

- Conceptual design and governance architecture
- Research questions and theoretical grounding
- Quality oversight and final decisions
- Legal copyright holder

**AI (Claude, Anthropic):**

- Implementation and code generation
- Documentation drafting
- Iterative refinement and debugging
- Test suite development

**Testing Context:**

- 349 commits over 6 months
- Self-development (dogfooding) in Claude Code sessions
- Real-world failure modes and responses documented

This attribution reflects honest acknowledgment of AI's substantial role in implementation while maintaining clear legal responsibility and conceptual ownership.

---

## 🙏 Acknowledgments

### Theoretical Foundations

- **Ludwig Wittgenstein** - *Tractatus Logico-Philosophicus* (the limits of systematization)
- **Isaiah Berlin** - Value pluralism and incommensurability
- **Ruth Chang** - Hard choices and incomparability theory
- **James March & Herbert Simon** - Organizational decision-making frameworks

### Technical Foundations

- **Anthropic** - Claude AI system (implementation partner and research subject)
- **MongoDB** - Persistence layer for governance rules
- **Node.js/Express** - Runtime environment
- **Open Source Community** - Countless tools, libraries, and collaborative practices

---

## 📖 Philosophy

> **"Whereof one cannot speak, thereof one must be silent."**
> — Ludwig Wittgenstein, *Tractatus Logico-Philosophicus*

Applied to AI safety:

> **"Whereof the AI cannot safely decide, thereof it must request human judgment."**

Some decisions cannot be systematized without imposing contestable value judgments. Rather than pretend AI can make these decisions "correctly," we explore architectures that **structurally defer to human deliberation** when values frameworks conflict.

This isn't a limitation of the technology. It's **recognition of the structure of human values.**

Not all problems have technical solutions. Some require **architectural humility.**

---
## 🌐 Links

- **Website:** [agenticgovernance.digital](https://agenticgovernance.digital)
- **Documentation:** [agenticgovernance.digital/docs](https://agenticgovernance.digital/docs.html)
- **Research:** [agenticgovernance.digital/research](https://agenticgovernance.digital/research.html)
- **GitHub:** [AgenticGovernance/tractatus-framework](https://github.com/AgenticGovernance/tractatus-framework)

## 📧 Contact

- **Email:** research@agenticgovernance.digital
- **Issues:** [GitHub Issues](https://github.com/AgenticGovernance/tractatus-framework/issues)
- **Discussions:** [GitHub Discussions](https://github.com/AgenticGovernance/tractatus-framework/discussions)

---

**Tractatus Framework** | Architectural AI Safety Research | Apache 2.0 License

*Last updated: 2025-10-21*