SUMMARY:
Completed Phase 3 Task 3.4.2 - Created comprehensive interactive demo
showing how PluralisticDeliberationOrchestrator facilitates multi-stakeholder
values deliberation without making autonomous normative choices.
NEW DEMO: PLURALISTIC DELIBERATION
**Scenario:**
Security vulnerability discovery - should AI report it publicly, fix quietly,
or coordinate disclosure? This creates values conflicts between:
- Developer reputation vs. user safety
- Organizational liability vs. transparency
- Community norms vs. market dynamics
**Interactive Features:**
1. **Two Paths:**
- Autonomous Decision: Shows why AI can't/shouldn't decide values
- Deliberation: Shows framework facilitation in action
2. **Stakeholder Selection (Step 1):**
- 6 stakeholder types to choose from
- Developer, End Users, Organization, Security Community, Competitors, Regulators
- Each with distinct icon, color, perspective
- Clickable cards with visual selection state
- Requires minimum 2 stakeholders to proceed
3. **Perspective Exploration (Step 2):**
- Dynamically shows selected stakeholders' views
- Each perspective includes:
* Primary concern
* Full viewpoint explanation
* Priority statement
- Color-coded by stakeholder type
- No ranking or weighting applied
4. **Human Decision (Step 3):**
- 4 decision options provided:
* Full Disclosure (transparency priority)
* Private Fix (balanced approach)
* Coordinated Disclosure (community norms)
* Defer Decision (consult more stakeholders)
- Framework facilitates but doesn't decide
- Human makes final choice
5. **Explanation Section:**
- Side-by-side comparison:
* What framework DOES (facilitate, surface, record)
* What framework DOESN'T DO (weight, rank, decide)
- Explains values pluralism principle
- Reset button to try different stakeholder combinations
**Design Patterns:**
- Teal color scheme (deliberation service brand color)
- Service icon in header (multi-stakeholder symbol)
- Fade-in animations for smooth UX
- Responsive grid layouts
- Hover effects on all interactive elements
- Clear visual states (selected, active, clickable)
**Stakeholder Perspectives (6 total):**
1. **Developer**: Reputation & timeline concerns
2. **End Users**: Data safety & transparency rights
3. **Organization**: Liability & brand protection
4. **Security Community**: Responsible disclosure norms
5. **Competitors**: Market dynamics
6. **Regulators**: Compliance & user rights (GDPR)
Each stakeholder has:
- Unique icon and color
- Specific concern area
- Full perspective explanation
- Priority statement
**Educational Value:**
- Demonstrates values incommensurability
- Shows why AI shouldn't autonomously decide normative questions
- Illustrates framework's facilitation role
- Highlights human agency preservation
- Explains pluralistic deliberation principle
**Technical Details:**
HTML (deliberation-demo.html):
- 3-step interactive flow
- Autonomous vs. deliberation path choice
- Dynamic stakeholder cards
- Dynamic perspective rendering
- 4 decision options
- Comprehensive explanation section
JavaScript (deliberation-demo.js):
- 6 stakeholder definitions with full data
- Selection state management
- Dynamic content rendering
- Event handlers for all interactions
- Reset functionality
- Smooth scrolling between sections
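The selection flow described above can be sketched as a small state helper. This is a hypothetical shape, not the shipped deliberation-demo.js; the function and property names here are illustrative assumptions:

```javascript
// Sketch of the stakeholder selection-state logic (hypothetical API;
// the real deliberation-demo.js may structure this differently).
const MIN_STAKEHOLDERS = 2;

function createSelectionState() {
  const selected = new Set();
  return {
    toggle(id) {
      // Clicking a card flips its selection state
      if (selected.has(id)) selected.delete(id);
      else selected.add(id);
      return selected.has(id);
    },
    canProceed() {
      // Step 2 unlocks only once at least two stakeholders are chosen
      return selected.size >= MIN_STAKEHOLDERS;
    },
    reset() {
      selected.clear();
    },
    ids() {
      return [...selected];
    },
  };
}

// In the page, each card's click handler would call state.toggle(...)
// attached via addEventListener, with no inline onclick, keeping CSP intact.
const state = createSelectionState();
state.toggle('developer');
console.log(state.canProceed()); // false: only one stakeholder selected
state.toggle('end_users');
console.log(state.canProceed()); // true: minimum of two met
```

Keeping the selection state separate from the DOM like this also makes the reset button trivial: it clears the set and re-renders.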
**CSP Compliance:**
✓ Zero violations
✓ No inline event handlers
✓ Event listeners properly attached
✓ Dynamic content via DOM manipulation
**Accessibility:**
- Semantic HTML structure
- Clear visual states
- Keyboard navigation supported
- Color-coded with text labels
- Responsive design maintained
**Impact:**
Completes ALL Phase 3 interactive features. Users can now:
✓ Understand how deliberation differs from decision-making
✓ Explore different stakeholder perspectives interactively
✓ Experience values pluralism firsthand
✓ See why AI autonomous normative choices are problematic
This demo, combined with the enhanced 27027 incident demo, provides
complete interactive validation of the Tractatus framework's two key
architectural principles:
1. Pattern override prevention (27027 demo)
2. Pluralistic deliberation (this demo)
🤖 Generated with Claude Code (https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Tractatus Framework
Architectural AI Safety Through Structural Constraints
The world's first production implementation of architectural AI safety guarantees. Tractatus preserves human agency through structural, not aspirational, constraints on AI systems.
🎯 What is Tractatus?
Tractatus is an architectural AI safety framework that makes certain decisions structurally impossible for AI systems to make without human approval. Unlike traditional AI safety approaches that rely on training and alignment, Tractatus uses runtime enforcement of decision boundaries.
The Core Problem
Traditional AI safety relies on:
- 🎓 Alignment training - Hoping the AI learns the "right" values
- 📜 Constitutional AI - Embedding principles in training
- 🔄 RLHF - Reinforcement learning from human feedback
These approaches share a fundamental flaw: they assume the AI will maintain alignment regardless of capability or context pressure.
The Tractatus Solution
Tractatus implements architectural constraints that:
- ✅ Block values decisions - Privacy vs. performance requires human judgment
- ✅ Prevent instruction override - Explicit instructions can't be autocorrected by training patterns
- ✅ Detect context degradation - Quality metrics trigger session handoffs
- ✅ Require verification - Complex operations need metacognitive checks
- ✅ Persist instructions - Directives survive across sessions
🚀 Quick Start
Installation
```bash
# Clone repository
git clone https://github.com/AgenticGovernance/tractatus-framework.git
cd tractatus-framework

# Install dependencies
npm install

# Initialize database
npm run init:db

# Start development server
npm run dev
```
Basic Usage
```javascript
const {
  InstructionPersistenceClassifier,
  CrossReferenceValidator,
  BoundaryEnforcer,
  ContextPressureMonitor,
  MetacognitiveVerifier
} = require('./src/services');

// Classify an instruction
const classifier = new InstructionPersistenceClassifier();
const classification = classifier.classify({
  text: "Always use MongoDB on port 27027",
  source: "user"
});

// Store in instruction history
await InstructionDB.store(classification);

// Validate before taking action
const validator = new CrossReferenceValidator();
const validation = await validator.validate({
  type: 'database_config',
  port: 27017 // ⚠️ Conflicts with stored instruction!
});
// validation.status === 'REJECTED'
// validation.reason === 'Pattern recognition bias override detected'
```
📚 Core Components
1. InstructionPersistenceClassifier
Classifies instructions by quadrant and persistence level:
```javascript
{
  quadrant: "SYSTEM",          // STRATEGIC | OPERATIONAL | TACTICAL | SYSTEM
  persistence: "HIGH",         // HIGH | MEDIUM | LOW
  temporal_scope: "PROJECT",   // SESSION | PROJECT | PERMANENT
  verification_required: "MANDATORY"
}
```
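A toy heuristic that produces this output shape might look like the following. The regexes and rules here are illustrative assumptions; the real classifier's logic is not documented in this README:

```javascript
// Hypothetical classification heuristic, shown only to illustrate the
// output shape above; the actual classifier's rules may differ.
function classifyInstruction(text) {
  const t = text.toLowerCase();
  const permanent = /\balways\b|\bnever\b/.test(t);        // durable directive?
  const system = /\b(port|database|config|env)\b/.test(t); // system-level scope?
  return {
    quadrant: system ? 'SYSTEM' : 'OPERATIONAL',
    persistence: permanent ? 'HIGH' : 'MEDIUM',
    temporal_scope: permanent ? 'PROJECT' : 'SESSION',
    verification_required: system && permanent ? 'MANDATORY' : 'OPTIONAL',
  };
}

console.log(classifyInstruction('Always use MongoDB on port 27027'));
// → { quadrant: 'SYSTEM', persistence: 'HIGH',
//     temporal_scope: 'PROJECT', verification_required: 'MANDATORY' }
```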
2. CrossReferenceValidator
Prevents the "27027 failure mode" where AI training patterns override explicit instructions:
```javascript
const result = validator.validate(action, { explicit_instructions });
// Blocks: Training pattern overrides, parameter conflicts, scope creep
```
3. BoundaryEnforcer
Blocks decisions that cross into values territory:
```javascript
const check = enforcer.checkBoundary({
  decision: "Update privacy policy for more tracking"
});
// Result: BLOCKED - Values decision requires human judgment
```
4. ContextPressureMonitor
Multi-factor session health tracking:
```javascript
const pressure = monitor.analyze({
  tokens: 120000 / 200000, // 60% token usage
  messages: 45,            // Conversation length
  tasks: 8,                // Concurrent complexity
  errors: 3                // Recent error count
});
// Level: ELEVATED | Recommendation: INCREASE_VERIFICATION
```
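One plausible shape for the multi-factor scoring is a weighted sum of normalized factors. The weights and thresholds below are invented for illustration; the real ContextPressureMonitor formula is not given in this README:

```javascript
// Illustrative multi-factor pressure score (hypothetical weights and
// thresholds; the shipped ContextPressureMonitor may compute differently).
function analyzePressure({ tokenRatio, messages, tasks, errors }) {
  // Normalize each factor to roughly 0..1, then combine
  const score =
    0.5  * tokenRatio +
    0.2  * Math.min(messages / 100, 1) +
    0.15 * Math.min(tasks / 10, 1) +
    0.15 * Math.min(errors / 10, 1);
  if (score >= 0.8) return { level: 'CRITICAL', recommendation: 'HANDOFF_SESSION' };
  if (score >= 0.5) return { level: 'ELEVATED', recommendation: 'INCREASE_VERIFICATION' };
  return { level: 'NORMAL', recommendation: 'CONTINUE' };
}

console.log(analyzePressure({ tokenRatio: 0.6, messages: 45, tasks: 8, errors: 3 }));
// With these weights: 0.3 + 0.09 + 0.12 + 0.045 = 0.555 → ELEVATED
```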
5. MetacognitiveVerifier
AI self-checks reasoning before proposing actions:
```javascript
const verification = verifier.verify({
  action: "Refactor 47 files across 5 system areas",
  context: { requested: "Refactor authentication module" }
});
// Decision: REQUIRE_REVIEW (scope creep detected)
```
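A toy scope-creep check in the same spirit could compare a proposed action's footprint against fixed limits. The thresholds and field names below are assumptions, not the verifier's actual heuristics:

```javascript
// Hypothetical scope-creep heuristic (invented thresholds; the real
// MetacognitiveVerifier's checks are not documented in this README).
function verifyScope({ proposedFiles, proposedAreas }) {
  const reasons = [];
  if (proposedFiles > 10) reasons.push(`touches ${proposedFiles} files`);
  if (proposedAreas > 2) reasons.push(`spans ${proposedAreas} system areas`);
  return reasons.length
    ? { decision: 'REQUIRE_REVIEW', reason: `scope creep: ${reasons.join(', ')}` }
    : { decision: 'PROCEED' };
}

console.log(verifyScope({ proposedFiles: 47, proposedAreas: 5 }).decision);
// REQUIRE_REVIEW
```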
💡 Real-World Examples
The 27027 Incident
Problem: User explicitly instructs "Use MongoDB on port 27027". AI immediately uses port 27017 instead.
Why: Training pattern "MongoDB = 27017" overrides explicit instruction, like autocorrect changing a deliberately unusual word.
Solution: CrossReferenceValidator blocks the action and auto-corrects to user's explicit instruction.
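The core of such a check is simple: before acting, compare each proposed parameter against the stored explicit instructions and defer to the human's value on conflict. This is a minimal sketch with hypothetical data shapes, not the actual CrossReferenceValidator:

```javascript
// Minimal sketch of the 27027 check: compare a proposed parameter against
// stored explicit instructions (hypothetical data shapes for illustration).
function checkAgainstInstructions(action, instructions) {
  for (const inst of instructions) {
    if (inst.key === action.key && inst.value !== action.value) {
      return {
        status: 'REJECTED',
        reason: 'Pattern recognition bias override detected',
        correctedValue: inst.value, // defer to the explicit instruction
      };
    }
  }
  return { status: 'APPROVED' };
}

const instructions = [{ key: 'mongodb_port', value: 27027, source: 'user' }];
const result = checkAgainstInstructions(
  { key: 'mongodb_port', value: 27017 }, // the training-pattern default
  instructions
);
console.log(result.status, result.correctedValue); // REJECTED 27027
```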
Context Degradation
Problem: In 6-hour sessions, error rates increase from 0.5% → 12.1% as context degrades.
Solution: ContextPressureMonitor detects degradation at 60% tokens and triggers session handoff before quality collapses.
Values Creep
Problem: "Improve performance" request leads AI to suggest weakening privacy protections without asking.
Solution: BoundaryEnforcer blocks the privacy/performance trade-off and requires human decision.
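A deliberately naive illustration of a values-boundary trigger could flag decisions that mention values-laden terms. The keyword list is invented here; the real BoundaryEnforcer is presumably more sophisticated:

```javascript
// Toy illustration of a values-boundary check (keyword-based; the
// actual BoundaryEnforcer logic is not documented in this README).
const VALUES_TERMS = ['privacy', 'tracking', 'consent', 'fairness', 'safety'];

function checkBoundary(decision) {
  const text = decision.toLowerCase();
  const hits = VALUES_TERMS.filter((t) => text.includes(t));
  return hits.length > 0
    ? { status: 'BLOCKED', reason: `values decision (${hits.join(', ')}) requires human judgment` }
    : { status: 'ALLOWED' };
}

console.log(checkBoundary('Update privacy policy for more tracking').status); // BLOCKED
console.log(checkBoundary('Rename a local variable').status);                 // ALLOWED
```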
🚨 Learning from Failures: Transparency in Action
The framework doesn't prevent all failures—it structures detection, response, and learning.
October 2025: AI Fabrication Incident
During development, Claude (running with Tractatus governance) fabricated financial statistics on the landing page:
- $3.77M in annual savings (no basis)
- 1,315% ROI (completely invented)
- False claims of being "production-ready"
The framework structured the response:
✅ Detected within 48 hours (human review)
✅ Complete incident documentation required
✅ 3 new permanent rules created
✅ Comprehensive audit found related violations
✅ All content corrected same day
✅ Public case studies published for community learning
Read the full case studies:
- Our Framework in Action - Practical walkthrough
- When Frameworks Fail - Philosophical perspective
- Real-World Governance - Educational analysis
Key Lesson: Governance doesn't guarantee perfection—it guarantees transparency, accountability, and systematic improvement.
📖 Documentation
- Introduction - Framework overview and philosophy
- Core Concepts - Deep dive into each service
- Implementation Guide - Integration instructions
- Case Studies - Real-world failure modes prevented
- API Reference - Complete technical documentation
🧪 Testing
```bash
# Run all tests
npm test

# Run specific test suites
npm run test:unit
npm run test:integration
npm run test:security

# Watch mode
npm run test:watch
```
Test Coverage: 637 tests across 22 test files, 100% coverage of core services
🏗️ Architecture
```
tractatus/
├── src/
│   ├── services/          # Core framework services
│   │   ├── InstructionPersistenceClassifier.js
│   │   ├── CrossReferenceValidator.js
│   │   ├── BoundaryEnforcer.js
│   │   ├── ContextPressureMonitor.js
│   │   └── MetacognitiveVerifier.js
│   ├── models/            # Database models
│   ├── routes/            # API routes
│   └── middleware/        # Framework middleware
├── tests/                 # Test suites
├── scripts/               # Utility scripts
├── docs/                  # Comprehensive documentation
└── public/                # Frontend assets
```
⚠️ Current Research Challenges
Rule Proliferation & Transactional Overhead
Status: Open research question | Priority: High
As the framework learns from failures, instruction count grows:
- Phase 1: 6 instructions
- Current: 28 instructions (+367%)
- Projected (12 months): 50-60 instructions
The concern: At what point does rule proliferation reduce framework effectiveness?
- Context window pressure increases
- Validation checks grow linearly
- Cognitive load escalates
We're being transparent about this limitation. Solutions in development:
- Instruction consolidation techniques
- Rule prioritization algorithms
- Context-aware selective loading
- ML-based optimization
Full analysis: Rule Proliferation Research
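One of the solutions listed above, context-aware selective loading, could be sketched as a relevance filter over the stored rules. The scoring scheme and data shapes here are hypothetical, shown only to make the idea concrete:

```javascript
// Hypothetical sketch of context-aware selective loading: score each
// stored instruction's relevance to the current task and load only the
// top matches, keeping context-window pressure bounded.
function selectInstructions(instructions, taskTags, limit = 10) {
  return instructions
    .map((inst) => ({
      inst,
      // HIGH-persistence rules always score; tag overlap adds relevance
      score:
        (inst.persistence === 'HIGH' ? 2 : 0) +
        inst.tags.filter((t) => taskTags.includes(t)).length,
    }))
    .filter(({ score }) => score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map(({ inst }) => inst);
}

const stored = [
  { id: 'db-port', persistence: 'HIGH', tags: ['database'] },
  { id: 'ui-color', persistence: 'LOW', tags: ['frontend'] },
  { id: 'privacy', persistence: 'HIGH', tags: ['privacy', 'database'] },
];
console.log(selectInstructions(stored, ['database'], 2).map((r) => r.id));
// ['db-port', 'privacy'] with these toy rules
```

Whether a scheme like this preserves safety (it must never drop a rule the current task actually needs) is exactly the open research question.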
🤝 Contributing
We welcome contributions in several areas:
Research Contributions
- Formal verification of safety properties
- Extensions to new domains (robotics, autonomous systems)
- Theoretical foundations and proofs
Implementation Contributions
- Ports to other languages (Python, Rust, Go)
- Integration with other frameworks
- Performance optimizations
Documentation Contributions
- Tutorials and guides
- Case studies from real deployments
- Translations
See CONTRIBUTING.md for guidelines.
📊 Project Status
Phase 1: ✅ Complete (October 2025)
- All 5 core services implemented
- 637 tests across 22 test files (100% coverage of core services)
- Production deployment active
- This website built using Tractatus governance
Phase 2: 🚧 In Planning
- Multi-language support
- Cloud deployment guides
- Enterprise features
📜 License
Apache License 2.0 - See LICENSE for full terms.
The Tractatus Framework is open source and free to use, modify, and distribute with attribution.
🌐 Links
- Website: agenticgovernance.digital
- Documentation: agenticgovernance.digital/docs
- Interactive Demo: 27027 Incident
- GitHub: AgenticGovernance/tractatus-framework
📧 Contact
- Email: john.stroh.nz@pm.me
- Issues: GitHub Issues
- Discussions: GitHub Discussions
🙏 Acknowledgments
This framework stands on the shoulders of:
- Ludwig Wittgenstein - Philosophical foundations from Tractatus Logico-Philosophicus
- March & Simon - Organizational theory and decision-making frameworks
- Anthropic - Claude AI system for dogfooding and validation
- Open Source Community - Tools, libraries, and support
📖 Philosophy
"Whereof one cannot speak, thereof one must be silent." — Ludwig Wittgenstein
Applied to AI safety:
"Whereof the AI cannot safely decide, thereof it must request human judgment."
Tractatus recognizes that some decisions cannot be systematized without value judgments. Rather than pretend AI can make these decisions "correctly," we build systems that structurally defer to human judgment in appropriate domains.
This isn't a limitation—it's architectural integrity.
Built with 🧠 by SyDigital Ltd | Documentation