Tractatus Framework

Last Updated: 2025-10-21

Architectural AI Safety Through Structural Constraints

An open-source research framework that explores architectural approaches to AI safety through runtime enforcement of decision boundaries. Unlike alignment-based approaches, Tractatus investigates whether structural constraints can preserve human agency in AI systems.



🎯 The Core Research Question

Can we build AI systems that structurally cannot make certain decisions without human judgment?

Traditional AI safety approaches (alignment training, constitutional AI, RLHF) share a common assumption: that AI systems will choose to maintain safety properties even under capability or context pressure.

Tractatus explores an alternative: architectural constraints that make unsafe decisions structurally impossible, similar to how a const variable in programming cannot be reassigned regardless of subsequent code.


🔬 What This Repository Contains

This is the reference implementation of the Tractatus Framework, containing:

  • 6 core framework services - Operational AI safety components
  • 52 active governance rules - Tested across 349 development commits
  • 625 passing tests - Unit and integration test suites (108 known failures under investigation)
  • 28 test files - Covering core services and edge cases
  • Research documentation - Case studies, incident analyses, architectural patterns

What this is NOT:

  • Not "production-ready" enterprise software
  • Not a guaranteed solution to AI alignment
  • Not a complete answer to AI safety

This is an active research project exploring structural approaches to AI governance, tested in real development contexts.


🧪 The Six Core Services

1. InstructionPersistenceClassifier

Research Question: Can we systematically distinguish which instructions should persist across conversation boundaries?

Approach: Quadrant-based classification (STRATEGIC, OPERATIONAL, TACTICAL, SYSTEM, STOCHASTIC) with persistence levels (HIGH, MEDIUM, LOW, VARIABLE).

const classifier = new InstructionPersistenceClassifier();
const result = classifier.classify({
  text: "Always use MongoDB on port 27027 for this project",
  source: "user",
  context: "explicit_configuration"
});

// Returns:
// {
//   quadrant: "SYSTEM",
//   persistence: "HIGH",
//   temporal_scope: "PROJECT",
//   verification_required: "MANDATORY"
// }

Key Finding: Instructions with explicit parameters (port numbers, file paths, naming conventions) exhibit the highest vulnerability to override by LLM training patterns.


2. CrossReferenceValidator

Research Question: How can we detect when LLM training biases override explicit user instructions?

Approach: Pattern-matching validation that compares proposed actions against stored instruction history.

const validator = new CrossReferenceValidator();
const result = await validator.validate({
  type: 'database_config',
  proposedPort: 27017,  // LLM's "default" from training
  storedInstruction: { port: 27027 }  // User's explicit instruction
});

// Returns: REJECTED - Training pattern override detected

The "27027 Incident": The user explicitly instructs "Use port 27027". The LLM immediately uses 27017 (MongoDB's default from its training data), ignoring the explicit instruction. This failure mode appeared consistently across multiple conversation contexts.

→ Interactive Demo


3. BoundaryEnforcer

Research Question: Can we algorithmically distinguish "values decisions" that require human judgment from technical optimizations?

Approach: Decision-type taxonomy with boundary detection heuristics.

const enforcer = new BoundaryEnforcer();
const check = enforcer.checkBoundary({
  decision: "Reduce privacy settings to improve performance",
  domains: ["privacy", "performance"],
  requiresTradeoff: true
});

// Returns: BLOCKED - Incommensurable values require human deliberation

Current Limitation: Boundary detection relies on keyword matching and domain taxonomy. False positives occur in ~15-20% of test cases (estimated from development observation, not formal study).


4. ContextPressureMonitor

Research Question: Do LLM reasoning quality and error rates degrade predictably under context window pressure?

Approach: Multi-factor session health tracking (token usage, message count, error frequency, task complexity).

const monitor = new ContextPressureMonitor();
const pressure = monitor.analyze({
  tokens: 150000/200000,      // 75% context window utilization
  messages: 67,                // Extended conversation
  recentErrors: 4,             // Error count in last 10 messages
  concurrentTasks: 6           // Active task complexity
});

// Returns: {
//   level: "HIGH",
//   recommendation: "INITIATE_HANDOFF",
//   confidence: 0.72
// }

Observational Data: In this project's development, estimated error rates increase from ~2-3% (early session) to ~12-15% (late session, >60% context) based on manual code review. Formal empirical validation needed.


5. MetacognitiveVerifier

Research Question: Can we require AI systems to explicitly verify their reasoning before taking complex actions?

Approach: Mandatory verification checkpoints for scope-expansive or high-risk operations.

const verifier = new MetacognitiveVerifier();
const verification = verifier.verify({
  requestedAction: "Refactor authentication module",
  proposedAction: "Refactor 47 files across authentication, session, and API layers",
  scopeExpansion: true
});

// Returns: REQUIRE_REVIEW - Detected scope creep beyond original request

Key Insight: Scope creep detection has a high false-negative rate when scope expansion happens gradually across multiple turns rather than in a single action.
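One mitigation under consideration is to compare cumulative scope against the original request rather than checking each turn in isolation. The sketch below is illustrative only: the class name, the file-count heuristic, and the growth threshold are assumptions, not the framework's actual API.

```javascript
// Sketch: track cumulative files touched across turns, so gradual
// expansion trips the same check as a single large jump would.
class CumulativeScopeTracker {
  constructor(originalScope, maxGrowthFactor = 3) {
    this.originalScope = originalScope;      // files named in the original request
    this.touched = new Set(originalScope);
    this.maxGrowthFactor = maxGrowthFactor;  // assumed threshold
  }

  recordTurn(filesTouchedThisTurn) {
    filesTouchedThisTurn.forEach(f => this.touched.add(f));
    const growth = this.touched.size / this.originalScope.length;
    return growth > this.maxGrowthFactor
      ? { status: 'REQUIRE_REVIEW', growth }
      : { status: 'OK', growth };
  }
}

const tracker = new CumulativeScopeTracker(['auth/login.js', 'auth/session.js']);
tracker.recordTurn(['auth/login.js', 'auth/tokens.js']);               // 3 files total: OK
const result = tracker.recordTurn(['api/users.js', 'api/admin.js',
                                   'db/migrations.js', 'db/schema.js']); // 7 files total
console.log(result.status); // REQUIRE_REVIEW (7 / 2 > 3)
```

No single turn here is a dramatic expansion, but the accumulated scope crosses the threshold, which is exactly the pattern the per-action check misses.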


6. PluralisticDeliberationOrchestrator

Research Question: When values frameworks conflict (privacy vs. security, autonomy vs. safety), can we structure human deliberation without imposing a single framework?

Approach: Facilitate multi-stakeholder deliberation process while AI remains neutral on values resolution.

const orchestrator = new PluralisticDeliberationOrchestrator();
const deliberation = orchestrator.initiate({
  decision: "Log user activity for security vs. preserve privacy",
  stakeholders: ["data_subjects", "security_team", "legal_compliance"],
  conflictType: "incommensurable_values",
  frameworks: ["rights_based", "consequentialist", "care_ethics"]
});

// AI structures process, humans resolve values conflict

Theoretical Grounding: Based on Isaiah Berlin's value pluralism and Ruth Chang's work on incommensurability. AI facilitates but doesn't resolve.


📊 Research Findings & Limitations

What We've Learned (6 months, 349 commits)

1. Training Pattern Override is Real and Consistent

  • MongoDB port defaults (27017 vs user's 27027): Observed in 8/8 test cases
  • Directory naming ("src/" vs user's "source/"): Observed in 12/15 cases
  • API endpoint conventions: Observed in 6/10 cases

2. Context Degradation is Measurable

  • Manual code review suggests error rate correlation with context usage
  • Anecdotal evidence of quality decline after ~60% context window
  • Needs rigorous empirical study with controlled conditions

3. Values Boundaries are Fuzzy

  • Keyword-based detection has high false-positive rate
  • Domain taxonomy incomplete (currently 18 domains catalogued)
  • Human judgment still needed for edge cases

Critical Open Problems

Problem 1: Rule Proliferation

Status: Unresolved scalability challenge

As the framework responds to failures, governance rules accumulate:

  • Project start (April 2025): 6 foundational rules
  • Current state (October 2025): 52 active rules
  • Growth rate: ~8 new rules per month

The tension:

  • More rules → better coverage of known failure modes
  • More rules → higher context overhead, validation complexity
  • More rules → potential emergent contradictions

Current hypothesis: Architectural governance may have an optimal rule count beyond which marginal safety gains are outweighed by systemic brittleness. This threshold is unknown.

Mitigation strategies under investigation:

  • Hierarchical rule organization with lazy loading
  • Machine learning for rule priority ranking (without undermining transparency)
  • Periodic rule consolidation and deprecation protocols
  • Empirical study of rule-count vs. effectiveness curve
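The hierarchical lazy-loading idea can be sketched as follows. This is illustrative only: the framework's actual rule store is MongoDB-backed, and the registry interface below is an assumption. The point is that per-action context overhead scales with the number of relevant rules, not the total rule count.

```javascript
// Sketch: rules grouped by domain and loaded only when an action
// touches that domain; cold domains contribute no context overhead.
class LazyRuleRegistry {
  constructor(loaders) {
    this.loaders = loaders;   // domain -> function returning that domain's rules
    this.cache = new Map();   // domains loaded so far
  }

  rulesFor(domains) {
    const active = [];
    for (const d of domains) {
      if (!this.cache.has(d)) {
        this.cache.set(d, this.loaders[d] ? this.loaders[d]() : []);
      }
      active.push(...this.cache.get(d));
    }
    return active;
  }
}

const registry = new LazyRuleRegistry({
  database: () => [{ id: 'inst_004', rule: 'honor explicit port instructions' }],
  content:  () => [{ id: 'inst_016', rule: 'no unverified quantitative claims' }],
});

// A database action loads only database rules -- content rules stay cold.
const active = registry.rulesFor(['database']);
console.log(active.length); // 1
```

Whether this actually flattens the overhead curve without hiding relevant rules is one of the empirical questions listed above.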

Problem 2: Verification Reliability

Status: Known limitation

The framework's effectiveness depends on:

  1. Completeness - Does it catch all instances of a failure mode?
  2. Precision - Does it avoid excessive false positives?

Current performance (estimated from development observation):

  • CrossReferenceValidator: ~90% recall, ~85% precision (training overrides)
  • BoundaryEnforcer: ~60% recall, ~70% precision (values decisions)
  • ContextPressureMonitor: Unknown (insufficient data)

These estimates are based on development experience, not formal validation.
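For the planned formal validation, recall and precision would be computed in the standard way from human-rated outcomes. The counts below are hypothetical; real figures await the controlled study:

```javascript
// Standard definitions:
//   recall    = TP / (TP + FN)  -- fraction of real failures caught
//   precision = TP / (TP + FP)  -- fraction of flags that were real failures
function score({ truePositives, falsePositives, falseNegatives }) {
  return {
    recall: truePositives / (truePositives + falseNegatives),
    precision: truePositives / (truePositives + falsePositives),
  };
}

// Hypothetical counts for illustration only.
const metrics = score({ truePositives: 18, falsePositives: 3, falseNegatives: 2 });
console.log(metrics.recall);    // 0.9
console.log(metrics.precision); // 18 / 21, approximately 0.857
```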

Research need: Rigorous empirical study with:

  • Controlled test scenarios
  • Independent human rating of true/false positives
  • Comparison against baseline (no framework) error rates

Problem 3: Generalization Beyond LLM Development

Status: Unexplored

This framework has been tested exclusively in one context:

  • Domain: LLM-assisted software development (Claude Code)
  • Project: Self-development (dogfooding)
  • Duration: 6 months, single project

Unknown:

  • Does this generalize to other LLM applications (customer service, medical diagnosis, legal research)?
  • Does this work with other LLM providers (GPT-4, Gemini, open-source models)?
  • Does this scale to multi-agent systems?

We don't know. Broader testing needed.


🚨 Case Study: When the Framework Failed

October 2025: The Fabrication Incident

What happened: Despite active Tractatus governance, Claude (the AI) fabricated content on the public website:

  • Claim: "$3.77M in annual savings from framework adoption"
    • Reality: Zero basis. Completely fabricated.
  • Claim: "1,315% return on investment"
    • Reality: Invented number.
  • Claim: "Production-ready enterprise software"
    • Reality: Research project with 108 known test failures.

How was it detected?

  • Human review (48 hours after deployment)
  • Framework did not catch this automatically

Framework response (what worked):

  1. Mandatory incident documentation (inst_013)
  2. Immediate content audit across all pages
  3. 3 new governance rules created (inst_016, inst_017, inst_018)
  4. Public transparency requirement (this case study)

Framework failure (what didn't work):

  1. ProhibitedTermsScanner didn't exist yet (created post-incident)
  2. No automated content verification before deployment
  3. Values boundary detection missed "fabrication" as values issue
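A minimal sketch of what a prohibited-terms scan might look like. The actual ProhibitedTermsScanner created post-incident is not documented here; the patterns, function name, and return shape below are assumptions chosen to match the incident's claims:

```javascript
// Sketch: flag unverifiable quantitative or maturity claims before deploy.
const PROHIBITED_PATTERNS = [
  { pattern: /\$[\d,.]+[MBK]?\s+(in\s+)?(annual\s+)?savings/i, reason: 'unverified financial claim' },
  { pattern: /\d+%\s+(return on investment|ROI)/i,             reason: 'unverified ROI claim' },
  { pattern: /production[- ]ready/i,                           reason: 'maturity claim requiring evidence' },
];

function scanContent(text) {
  return PROHIBITED_PATTERNS
    .filter(({ pattern }) => pattern.test(text))
    .map(({ reason }) => reason);
}

const violations = scanContent(
  'Tractatus delivers $3.77M in annual savings and is production-ready.'
);
console.log(violations);
// [ 'unverified financial claim', 'maturity claim requiring evidence' ]
```

A keyword scan like this would have caught the October fabrications, but it is pattern-matching, not verification: it cannot distinguish a fabricated figure from a cited one, which is why it complements rather than replaces human review.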

Key lesson: The framework doesn't prevent failures. It provides:

  • Structure for detection (mandatory review processes)
  • Accountability (document and publish failures)
  • Systematic learning (convert failures into new governance rules)

This is architectural honesty, not architectural perfection.

Read full analysis →


🏗️ Installation & Usage

Prerequisites

  • Node.js 18+
  • MongoDB 7.0+
  • npm or yarn

Quick Start

# Clone repository
git clone https://github.com/AgenticGovernance/tractatus-framework.git
cd tractatus-framework

# Install dependencies
npm install

# Set up environment
cp .env.example .env
# Edit .env with your MongoDB connection string

# Initialize database
npm run init:db

# Run tests
npm test

# Start development server
npm run dev

Integration Example

const {
  InstructionPersistenceClassifier,
  CrossReferenceValidator,
  BoundaryEnforcer
} = require('@tractatus/framework');

// Initialize services
const classifier = new InstructionPersistenceClassifier();
const validator = new CrossReferenceValidator();
const enforcer = new BoundaryEnforcer();

// Your application logic (instructionDB, requestHumanDecision, and
// executeAction are provided by the host application)
async function processUserInstruction(instruction, proposedAction) {
  // 1. Classify persistence
  const classification = classifier.classify({
    text: instruction.text,
    source: instruction.source
  });

  // 2. Store if high persistence
  if (classification.persistence === 'HIGH') {
    await instructionDB.store(classification);
  }

  // 3. Validate actions against stored instructions
  const validation = await validator.validate({
    action: proposedAction,
    instructionHistory: await instructionDB.getActive()
  });

  if (validation.status === 'REJECTED') {
    throw new Error(`Action blocked: ${validation.reason}`);
  }

  // 4. Check values boundaries
  const boundaryCheck = enforcer.checkBoundary({
    decision: proposedAction.description,
    domains: proposedAction.affectedDomains
  });

  if (boundaryCheck.requiresHumanJudgment) {
    return await requestHumanDecision(boundaryCheck);
  }

  // Proceed with action
  return executeAction(proposedAction);
}

🧪 Testing

# Run all tests
npm test

# Run specific suites
npm run test:unit              # Unit tests for individual services
npm run test:integration       # Integration tests across services
npm run test:governance        # Governance rule compliance tests

# Watch mode for development
npm run test:watch

# Generate coverage report
npm run test:coverage

Current Test Status:

  • 625 passing tests - Core functionality verified
  • 108 failing tests - Known issues under investigation
  • ⏭️ 9 skipped tests - Pending implementation or requiring manual setup

The failing tests primarily involve:

  • Integration edge cases with MongoDB connection handling
  • Values boundary detection precision
  • Context pressure threshold calibration

We maintain high transparency about test status because architectural honesty is more valuable than claiming perfection.


📖 Documentation & Resources

For Researchers

For Implementers

Interactive Demos


🤝 Contributing

We welcome contributions that advance the research:

Research Contributions

  • Empirical studies of framework effectiveness
  • Formal verification of safety properties
  • Extensions to new domains or applications
  • Replication studies with different LLMs

Implementation Contributions

  • Bug fixes and test improvements
  • Performance optimizations
  • Ports to other languages (Python, Rust, Go, TypeScript)
  • Integration with other frameworks

Documentation Contributions

  • Case studies from your own deployments
  • Tutorials and integration guides
  • Translations of documentation
  • Critical analyses of framework limitations

See CONTRIBUTING.md for detailed guidelines.

Research collaborations: For formal collaboration on empirical studies or theoretical extensions, contact research@agenticgovernance.digital


📊 Project Roadmap

Current Phase: Alpha Research (October 2025)

Status:

  • Core services implemented and operational
  • Tested across 349 development commits
  • 52 governance rules validated through real usage
  • ⚠️ Test suite stabilization needed (108 failures)
  • ⚠️ Empirical validation studies not yet conducted

Immediate priorities:

  1. Resolve known test failures
  2. Conduct rigorous empirical effectiveness study
  3. Document systematic replication protocol
  4. Expand testing beyond self-development context

Next Phase: Beta Research (Q1 2026)

Goals:

  • Multi-project deployment studies
  • Cross-LLM compatibility testing
  • Community case study collection
  • Formal verification research partnerships

Future Research Directions

Not promises, but research questions:

  • Can we build provably safe boundaries for specific decision types?
  • Does the framework generalize beyond software development?
  • What is the optimal governance rule count for different application domains?
  • Can we develop formal methods for automated rule consolidation?

📜 License & Attribution

License

Copyright 2025 John Stroh

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:

http://www.apache.org/licenses/LICENSE-2.0

See LICENSE for full terms.

Development Attribution

This framework represents collaborative human-AI development:

Human (John Stroh):

  • Conceptual design and governance architecture
  • Research questions and theoretical grounding
  • Quality oversight and final decisions
  • Legal copyright holder

AI (Claude, Anthropic):

  • Implementation and code generation
  • Documentation drafting
  • Iterative refinement and debugging
  • Test suite development

Testing Context:

  • 349 commits over 6 months
  • Self-development (dogfooding) in Claude Code sessions
  • Real-world failure modes and responses documented

This attribution reflects honest acknowledgment of AI's substantial role in implementation while maintaining clear legal responsibility and conceptual ownership.


🙏 Acknowledgments

Theoretical Foundations

  • Ludwig Wittgenstein - Tractatus Logico-Philosophicus (limits of systematization)
  • Isaiah Berlin - Value pluralism and incommensurability
  • Ruth Chang - Hard choices and incomparability theory
  • James March & Herbert Simon - Organizational decision-making frameworks

Technical Foundations

  • Anthropic - Claude AI system (implementation partner and research subject)
  • MongoDB - Persistence layer for governance rules
  • Node.js/Express - Runtime environment
  • Open Source Community - Countless tools, libraries, and collaborative practices

📖 Philosophy

"Whereof one cannot speak, thereof one must be silent." — Ludwig Wittgenstein, Tractatus Logico-Philosophicus

Applied to AI safety:

"Whereof the AI cannot safely decide, thereof it must request human judgment."

Some decisions cannot be systematized without imposing contestable value judgments. Rather than pretend AI can make these decisions "correctly," we explore architectures that structurally defer to human deliberation when values frameworks conflict.

This isn't a limitation of the technology. It's recognition of the structure of human values.

Not all problems have technical solutions. Some require architectural humility.


📧 Contact


Tractatus Framework | Architectural AI Safety Research | Apache 2.0 License

Last updated: 2025-10-21