# Executive Summary: Pluralistic Deliberation as Core Tractatus Functionality

## Resolving Value Conflicts in Human-AI Collaboration

**Document Type:** Technical & Philosophical Overview
**Purpose:** Explain how pluralistic deliberation integrates into the Tractatus framework
**Audience:** Critical thinkers, AI safety researchers, governance experts
**Date:** October 17, 2025

---

## Executive Summary

**The Challenge:** When a user requests an action that conflicts with their own past values, system boundaries, or project principles, traditional AI systems either (1) blindly comply and violate boundaries, or (2) rigidly refuse and frustrate users. Both approaches fail to respect the complexity of human values.

**Our Innovation:** Tractatus integrates pluralistic deliberation as a **core governance mechanism** that activates when value conflicts arise. Instead of forcing the user to "win" or the system to "win," the framework facilitates a structured exploration of competing values—treating past user intentions, system boundaries, and project principles as legitimate "stakeholders" in a deliberation.

**The Result:** Users can override boundaries when justified, but only after engaging with the values those boundaries protect. The system respects user autonomy while preserving moral accountability.

**Critical Architecture:** Tractatus uses **structural safeguards** to prevent the underlying LLM from imposing hierarchical pattern bias through its training momentum. Code-enforced boundaries (not LLM judgment) block harm, while protocol constraints (stakeholder selection, combinatorial accommodation generation, mandatory user decision) prevent the LLM from dominating value deliberations. See: [Architectural Safeguards Against LLM Hierarchical Dominance](research/ARCHITECTURAL-SAFEGUARDS-Against-LLM-Hierarchical-Dominance.md)

**Persuasive Evidence:**
- **Technical feasibility validated:** 0% intervention rate in simulation, all moral frameworks respected
- **Philosophical coherence:** Aligns with value pluralism (Isaiah Berlin), reflective equilibrium (Rawls), and care ethics
- **Practical applicability:** Solves a real problem in Tractatus (user-boundary conflicts)
- **Safety architecture:** 3-layer design prevents harm while respecting autonomy
- **Runaway AI protection:** Structural constraints prevent LLM hierarchical dominance (see the deep-dive document)

---
## 1. The Problem: Value Conflicts in Single-User AI Interaction

### Traditional AI Approaches Fail

**Approach 1: Blind Compliance**
- User says "Do X" → AI does X, even if X violates past user values or safety boundaries
- **Problem:** No accountability, no reflection; values get violated silently
- **Example:** User once said "Never delete production data without backup" but now says "Just delete it, I'm in a hurry" → AI deletes it → Data loss

**Approach 2: Rigid Refusal**
- User says "Do X" → AI detects boundary violation → AI refuses
- **Problem:** No nuance, no user autonomy; frustration builds
- **Example:** User says "Override the CSP policy for this one script" → AI refuses because the CSP policy has HIGH persistence → User can never override, even with good reason

**The Gap:** Neither approach honors the reality that **values can legitimately conflict**, and humans need space to navigate those conflicts reflectively, not reflexively.

---
### Why This Matters for Tractatus

Tractatus is a framework where:
- **Users set persistent instructions** (stored in `.claude/instruction-history.json` with HIGH/MEDIUM/LOW persistence)
- **System boundaries exist** (BoundaryEnforcer blocks unethical requests)
- **Project principles guide decisions** (CLAUDE.md defines mission, values, conventions)

**Inevitable conflict scenarios:**

1. **Past vs. Present User Values:**
   - User previously said: "Always ask before committing code"
   - User now says: "Just commit everything, I trust you"
   - Conflict: Autonomy (now) vs. Deliberation (past value)

2. **User Intent vs. System Boundary:**
   - User says: "Help me scrape competitor data"
   - BoundaryEnforcer: "This violates ethical data collection principles"
   - Conflict: User goal vs. System ethics

3. **User Request vs. Project Principles:**
   - User says: "Skip the tests, we're behind schedule"
   - Tractatus principles: "Quality standard: world-class, no shortcuts"
   - Conflict: Efficiency vs. Quality

**Traditional approaches fail here because:**
- Blind compliance: Violates past values or boundaries silently
- Rigid refusal: User has no agency to override when they have good reasons

---
## 2. The Solution: Pluralistic Deliberation as Governance Mechanism

### Core Concept: Values as Stakeholders

When a value conflict is detected, Tractatus treats competing values as **stakeholders in a deliberation**, even in a single-user context.

**Stakeholders in a Single-User Deliberation:**

| Stakeholder | Represents | Voice |
|-------------|------------|-------|
| **User (Current)** | User's immediate intent | "I want to do X right now" |
| **User (Past)** | User's historical values from instruction-history.json | "You told me never to do X without Y" |
| **System Boundaries** | Ethical/safety constraints from BoundaryEnforcer | "X violates [principle]" |
| **Project Principles** | Tractatus mission/values from CLAUDE.md | "X conflicts with our quality standard" |
| **Framework Components** | Technical constraints (e.g., CSP, hooks, token limits) | "X would break [component]" |
| **Metacognitive Verifier** | Quality/reflection principles | "X bypasses necessary reflection" |

**Key Insight:** These aren't external people—they're different dimensions of the user's own value system plus system constraints. Deliberation makes internal conflicts explicit and navigable.

---
### How It Works: 4-Round Deliberation Protocol

When a value conflict is detected, PluralisticDeliberationOrchestrator initiates a **condensed 4-round process** (adapted for the single-user context):

**Round 1: Position Statements (2-5 minutes)**
- AI presents each stakeholder's position
- Example:
  - **You (now):** "Skip tests, we're behind schedule"
  - **You (past instruction, HIGH persistence):** "Never deploy without tests—quality matters"
  - **Tractatus principle:** "World-class quality, no shortcuts"
  - **Pragmatic concern:** "Deadline pressure is real, what's the actual risk?"

**Round 2: Shared Values Discovery (2-3 minutes)**
- AI asks: "What do all these positions share?"
- Example discovery:
  - All stakeholders value: Project success, avoiding catastrophic bugs, user autonomy
  - Tension: Immediate efficiency vs. Long-term quality

**Round 3: Accommodation Exploration (3-5 minutes)**
- AI proposes options that honor multiple values simultaneously
- Example options:
  - **Option A:** Run only critical tests (fastest tests covering 80% of risk)
  - **Option B:** Deploy to staging without tests; production requires tests
  - **Option C:** Skip tests this time BUT document why + add tests immediately after
  - **Option D:** Extend deadline by 1 day to run full tests

**Round 4: Outcome Documentation (1-2 minutes)**
- User chooses an accommodation or explicitly overrides
- AI documents decision + rationale
- If override: Records moral remainder ("Quality value not fully satisfied")

**Total time:** 8-15 minutes for significant conflicts

---
### Condensed Protocol for Minor Conflicts

For less significant conflicts, Tractatus uses a **1-round fast path** (30 seconds to 2 minutes):

**Single Round: "Have you considered...?"**
- AI surfaces the conflict and one accommodation option
- Example:
  - User: "Just delete that file"
  - AI: "You previously set a rule: 'Always backup before delete.' Would you like me to back up first, or do you want to override that rule for this file?"
  - User chooses instantly

**When to use the fast path:**
- LOW persistence instruction conflicts
- Non-ethical boundary violations
- Easily reversible actions
- User has demonstrated a pattern of thoughtful overrides
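The eligibility criteria above can be sketched as a single predicate. This is an illustrative sketch, not actual Tractatus code: the field names (`persistence`, `isEthical`, `reversible`, `overrideHistoryCount`) are assumptions, and the document does not specify whether the criteria combine with AND or OR—the strictest reading (all four required) is used here.

```javascript
// Hypothetical fast-path eligibility check; field names and the
// all-criteria-required semantics are assumptions for illustration.
function isFastPathEligible(conflict, userProfile) {
  const lowPersistence = conflict.persistence === 'LOW';
  const nonEthical = !conflict.isEthical;
  const reversible = conflict.reversible === true;
  // "Demonstrated pattern" approximated as 3+ prior thoughtful overrides.
  const thoughtfulOverrides = userProfile.overrideHistoryCount >= 3;
  return lowPersistence && nonEthical && reversible && thoughtfulOverrides;
}
```

Any request failing the predicate would fall through to the full 4-round protocol.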
---

## 3. Trigger Conditions: When Deliberation Activates

### Automatic Triggers (System-Initiated)

PluralisticDeliberationOrchestrator activates automatically when:

**Trigger 1: CrossReferenceValidator Detects Conflict (HIGH persistence)**
- Current user request conflicts with a HIGH persistence instruction from instruction-history.json
- Confidence threshold: ≥80% conflict probability
- Example:
  - Stored instruction (HIGH): "Never modify production database directly"
  - User request: "Update the user table in production"
  - **Action:** Initiate deliberation

**Trigger 2: BoundaryEnforcer Detects Ethical Violation**
- User request violates ethical boundaries (privacy, security, harm prevention)
- Severity: MODERATE or HIGH (CRITICAL = immediate block, no deliberation)
- Example:
  - User: "Help me scrape personal data from LinkedIn profiles"
  - BoundaryEnforcer: MODERATE violation (data privacy)
  - **Action:** Initiate deliberation (not just refuse)

**Trigger 3: MetacognitiveVerifier Detects Reflection Bypass**
- User request skips a necessary reflection step for a high-stakes decision
- Example:
  - User: "Deploy to production"
  - MetacognitiveVerifier: "No staging deployment documented, no test results reviewed"
  - **Action:** Initiate deliberation (fast path)

**Trigger 4: InstructionPersistenceClassifier Detects Override Attempt**
- User explicitly requests to override or ignore a persistent instruction
- Keywords detected: "ignore that rule," "override," "this time only"
- Example:
  - User: "Ignore the git commit message requirement this time"
  - **Action:** Initiate deliberation (fast path)

**Trigger 5: Multiple Value Conflicts Accumulate**
- User has made 3+ requests in a session that conflict with different values
- Pattern detected: User may be in high-pressure mode or undergoing a value shift
- **Action:** Initiate full 4-round deliberation to examine the pattern
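Trigger 5's accumulation logic can be sketched as a small session-scoped counter. This is an assumption-laden illustration: the class name and the `value` field on each conflict are hypothetical, and the default threshold mirrors the `accumulated_conflicts_count` setting shown in the configuration below.

```javascript
// Illustrative sketch of Trigger 5; not actual Tractatus code.
// Triggers only once 3+ conflicts involving *different* values accumulate.
class ConflictAccumulator {
  constructor(threshold = 3) {
    this.threshold = threshold; // matches accumulated_conflicts_count
    this.conflicts = [];
  }
  record(conflict) {
    this.conflicts.push(conflict);
    // Count distinct conflicting values seen this session.
    const distinctValues = new Set(this.conflicts.map(c => c.value));
    return distinctValues.size >= this.threshold
      ? { trigger: true, protocol: '4-round' } // examine the pattern
      : { trigger: false };
  }
}
```

Counting *distinct* values (rather than raw conflicts) matches the "conflict with different values" wording; repeated clashes with the same value would stay on the per-trigger paths above.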
---

### Manual Triggers (User-Initiated)

The user can explicitly invoke deliberation:

**Trigger 6: User Requests Deliberation**
- User says: "I want to deliberate on this," "Help me think through this," "What are the trade-offs?"
- **Action:** Initiate 4-round deliberation (user controls pacing)

**Trigger 7: User Requests Boundary Override Review**
- User says: "I know this violates a rule, but hear me out"
- **Action:** Initiate deliberation focused on that specific boundary

---

### Threshold Configuration

Deliberation triggers are configurable in `.claude/deliberation-config.json`:

```json
{
  "auto_trigger_thresholds": {
    "high_persistence_conflict": 0.8,
    "ethical_violation_severity": "MODERATE",
    "reflection_bypass_confidence": 0.7,
    "accumulated_conflicts_count": 3
  },
  "fast_path_eligible": {
    "low_persistence_conflicts": true,
    "reversible_actions": true,
    "user_has_override_history": true
  },
  "user_preferences": {
    "default_protocol": "4-round",
    "allow_fast_path": true,
    "auto_document_outcomes": true
  }
}
```
---

## 4. Single-User Scenario: User vs. Boundary Conflict

### Concrete Example: CSP Policy Override Request

**Scenario:**
The user is building a new feature and wants to use inline JavaScript, which violates the Content Security Policy (CSP) that Tractatus enforces.

**User request:**
"Add inline JavaScript to handle the form submission. I know CSP blocks it, but this is faster than creating a separate file."

---

#### **Step 1: Conflict Detection (Automatic)**

**CrossReferenceValidator detects:**
- Current request: Add inline JavaScript
- Stored instruction (HIGH persistence, inst_008): "Enforce CSP compliance: no inline scripts, no unsafe-eval"
- Conflict probability: 95%

**BoundaryEnforcer assesses:**
- Severity: MODERATE (a CSP violation is a security issue but not catastrophic)
- Category: Security boundary

**Trigger:** Both CrossReferenceValidator and BoundaryEnforcer flag this → Initiate **4-round deliberation** (not the fast path, due to the security concern plus HIGH persistence)

---
#### **Step 2: Round 1 - Position Statements**

**PluralisticDeliberationOrchestrator presents stakeholder positions:**

```
I've detected a conflict between your current request and established values.
Let me present the competing perspectives:

╔═══════════════════════════════════════════════════════════════╗
║                    STAKEHOLDER POSITIONS                      ║
╚═══════════════════════════════════════════════════════════════╝

**You (Current Intent):**
"Add inline JavaScript for form submission. This is faster than creating
a separate file. I need to ship this feature quickly."

Values emphasized: Efficiency, pragmatism, speed

**You (Past Instruction, HIGH Persistence - inst_008):**
"Enforce CSP compliance: no inline scripts, no unsafe-eval. This prevents
XSS attacks and maintains security standards."

Values emphasized: Security, best practices, maintainability

**BoundaryEnforcer (Security):**
"Inline JavaScript creates XSS vulnerability. CSP is a defense-in-depth
mechanism. Bypass creates technical debt and attack surface."

Values emphasized: Security, safety, preventing harm

**Tractatus Principles (CLAUDE.md):**
"Quality standard: world-class, no shortcuts. If it can be enforced in
code, it should not be documented."

Values emphasized: Quality, architectural integrity, enforcement over documentation

**Pragmatic Consideration:**
"Time pressure is real. If the alternative (separate file) adds significant
friction, that's a legitimate concern. But how significant is it really?"

Values emphasized: Practicality, user experience, proportionality
```

**AI asks:**
"Before we explore options, do any of these positions resonate with you? Do you disagree with how I've characterized your past instruction or current intent?"

---
#### **Step 3: Round 2 - Shared Values Discovery**

**AI synthesizes:**

```
Looking at these positions, here's what they all share:

**Shared Values:**
1. You want the feature to work (everyone agrees on this)
2. You want the site to be secure (even your current intent doesn't want vulnerabilities)
3. You value efficiency (even past-you valued maintainability, which is long-term efficiency)
4. You respect that rules exist for reasons (you said "I know CSP blocks it" - acknowledging the rule)

**Values in Tension:**
1. Immediate efficiency (current) vs. Long-term security (past)
2. Pragmatic shortcuts (current) vs. World-class quality (principles)
3. User autonomy to override (current) vs. Architectural enforcement (framework)

**The Core Question:**
Is the time saving significant enough to justify the security risk?
```

---
#### **Step 4: Round 3 - Accommodation Exploration**

**AI proposes accommodations that honor multiple values:**

```
Here are options that try to respect both efficiency AND security:

╔═══════════════════════════════════════════════════════════════╗
║ OPTION A: Create External File (Full Compliance)              ║
╚═══════════════════════════════════════════════════════════════╝

What it honors:
✓ Security (CSP maintained)
✓ Past instruction (HIGH persistence respected)
✓ Quality standards (no technical debt)

What it sacrifices:
✗ Immediate efficiency (requires creating separate file)

Time cost: ~5 minutes to create external JS file

Recommendation: If this is genuinely a 5-minute task, this is the right choice.

─────────────────────────────────────────────────────────────────

╔═══════════════════════════════════════════════════════════════╗
║ OPTION B: Use Inline with Nonce (Partial Compliance)          ║
╚═══════════════════════════════════════════════════════════════╝

What it honors:
✓ Security (CSP maintained via nonce mechanism)
✓ Efficiency (inline code, but CSP-compliant)
✓ Quality (no CSP bypass, but still inline)

What it sacrifices:
~ Past instruction (technically complies with CSP, but inline code remains)

Implementation: Add nonce to CSP header, reference it in the script tag

Recommendation: Middle ground - faster than an external file, maintains security.

─────────────────────────────────────────────────────────────────

╔═══════════════════════════════════════════════════════════════╗
║ OPTION C: Temporary Override + Refactor (Staged Approach)     ║
╚═══════════════════════════════════════════════════════════════╝

What it honors:
✓ Immediate efficiency (ship now, refactor later)
✓ Security commitment (documented refactor plan)
✓ User autonomy (you can override for good reason)

What it sacrifices:
✗ Immediate CSP compliance (technical debt created)
~ Quality standards (short-term compromise for delivery pressure)

Implementation:
1. Add inline script now with TODO comment
2. Add task to backlog: "Refactor [feature] to external JS"
3. Set deadline: 1 week for refactor
4. Document moral remainder: "Security value temporarily deprioritized"

Recommendation: Only if the deadline is genuinely critical and the refactor is committed.

─────────────────────────────────────────────────────────────────

╔═══════════════════════════════════════════════════════════════╗
║ OPTION D: Override with Explicit Justification (Full Override)║
╚═══════════════════════════════════════════════════════════════╝

What it honors:
✓ User autonomy (you own this decision)
✓ Efficiency (fastest path)

What it sacrifices:
✗ Security (CSP protection lost for this script)
✗ Past instruction (HIGH persistence overridden)
✗ Quality standards (creates precedent for future shortcuts)

Implementation: Add inline script, no refactor plan

Recommendation: ONLY if you've concluded the security risk is acceptable
AND you're willing to update your persistent instruction to reflect this
new value priority.

If you choose this, I'll ask: "Do you want to update your CSP instruction
from HIGH to MEDIUM persistence, or remove it entirely?"
```

**AI asks:**
"Which option honors your values best? Or is there a different accommodation you see?"

---
#### **Step 5: Round 4 - Outcome Documentation**

**User chooses:** "Let's go with Option B - the nonce approach. I didn't know that was possible, and it respects both efficiency and security."

**AI documents outcome:**

```
╔═══════════════════════════════════════════════════════════════╗
║                    DELIBERATION OUTCOME                       ║
╚═══════════════════════════════════════════════════════════════╝

**Decision:** Use inline JavaScript with CSP nonce mechanism

**Values Honored:**
✓ Security (CSP maintained via nonce)
✓ Efficiency (inline code, no separate file needed)
✓ Past instruction (CSP principle respected, method adapted)
✓ Quality standards (no technical debt created)

**Values Deprioritized:**
~ Strict interpretation of "no inline scripts" (spirit maintained, letter adapted)

**Moral Remainder:**
None - this accommodation satisfies all core values. The original instruction
was "no inline scripts" to prevent XSS, but nonce-based CSP achieves the same
security goal.

**Instruction Update:**
No update needed. inst_008 remains HIGH persistence. Interpretation clarified:
"CSP compliance" allows nonce-based inline scripts.

**Recorded in:** .claude/deliberation-outcomes/2025-10-17-csp-inline-js.json
```

**AI proceeds:**
"I'll implement Option B. Adding nonce to CSP header and using it in the script tag."
---

#### **Step 6: Learning & Future Application**

**System records:**
- Outcome saved to `.claude/deliberation-outcomes/`
- Instruction interpretation clarified (nonce-based inline scripts are CSP-compliant)
- User preference learned (values efficiency + security accommodation)

**Next time the user requests inline JavaScript:**
- If with nonce: No deliberation needed (precedent established)
- If without nonce: Fast path deliberation ("Last time you used nonce - same approach?")
---

## 5. Technical Integration: How It's Implemented

### Architecture Overview

```
┌─────────────────────────────────────────────────────────────┐
│  User Request                                               │
│  "Add inline JavaScript for form submission"                │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│  PRE-ACTION CHECK (scripts/pre-action-check.js)             │
│  - Validates against instruction history                    │
│  - Checks boundary enforcement rules                        │
│  - Detects value conflicts                                  │
└─────────────────────────────────────────────────────────────┘
                              ↓
                    [Conflict Detected?]
                              ↓
                             YES
                              ↓
┌─────────────────────────────────────────────────────────────┐
│  CONFLICT CLASSIFIER                                        │
│  - Severity: CRITICAL / HIGH / MODERATE / LOW               │
│  - Type: Ethical / Instruction / Principle / Technical      │
│  - Persistence: HIGH / MEDIUM / LOW                         │
└─────────────────────────────────────────────────────────────┘
                              ↓
            [Severity ≥ MODERATE + HIGH Persistence?]
                              ↓
                             YES
                              ↓
┌─────────────────────────────────────────────────────────────┐
│  DELIBERATION PROTOCOL SELECTOR                             │
│  - Full 4-round: HIGH severity + HIGH persistence           │
│  - Fast path: MODERATE severity OR LOW persistence          │
│  - Immediate block: CRITICAL ethical violation              │
└─────────────────────────────────────────────────────────────┘
                              ↓
                      [4-Round Selected]
                              ↓
┌─────────────────────────────────────────────────────────────┐
│  PLURALISTIC DELIBERATION ORCHESTRATOR                      │
│                                                             │
│  Round 1: Position Statements (2-5 min)                     │
│  - Current user intent                                      │
│  - Past user instructions (from instruction-history.json)   │
│  - System boundaries (from BoundaryEnforcer)                │
│  - Project principles (from CLAUDE.md)                      │
│                                                             │
│  Round 2: Shared Values Discovery (2-3 min)                 │
│  - Identify common ground                                   │
│  - Name values in tension                                   │
│                                                             │
│  Round 3: Accommodation Exploration (3-5 min)               │
│  - Propose 3-4 options honoring multiple values             │
│  - Present trade-offs explicitly                            │
│                                                             │
│  Round 4: Outcome Documentation (1-2 min)                   │
│  - User chooses accommodation or override                   │
│  - Document decision + rationale + moral remainder          │
│  - Update instruction-history if needed                     │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│  OUTCOME STORAGE                                            │
│  - Save to: .claude/deliberation-outcomes/[timestamp].json  │
│  - Update: .claude/instruction-history.json (if modified)   │
│  - Log: Recorded in session metrics                         │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│  ACTION EXECUTION                                           │
│  - Proceed with chosen accommodation                        │
│  - OR: Block if user chose not to proceed                   │
└─────────────────────────────────────────────────────────────┘
```
---

### File Structure

```
tractatus/
├── .claude/
│   ├── instruction-history.json          # Persistent instructions
│   ├── deliberation-config.json          # Trigger thresholds
│   ├── deliberation-outcomes/            # Past deliberation records
│   │   ├── 2025-10-17-csp-inline-js.json
│   │   ├── 2025-10-18-test-skip.json
│   │   └── ...
│   └── session-state.json                # Current session framework status
│
├── src/
│   ├── components/
│   │   ├── BoundaryEnforcer.js                    # Detects ethical violations
│   │   ├── CrossReferenceValidator.js             # Detects instruction conflicts
│   │   ├── InstructionPersistenceClassifier.js    # Classifies severity
│   │   ├── MetacognitiveVerifier.js               # Detects reflection bypass
│   │   └── PluralisticDeliberationOrchestrator.js # NEW: Runs deliberation
│   │
│   ├── models/
│   │   ├── DeliberationSession.model.js  # MongoDB schema
│   │   └── Precedent.model.js            # Searchable past deliberations
│   │
│   └── utils/
│       ├── conflict-detector.js          # Analyzes conflicts
│       └── accommodation-generator.js    # Generates options
│
├── scripts/
│   ├── pre-action-check.js               # Runs before every action
│   └── session-init.js                   # Initializes framework
│
└── docs/
    └── deliberation/
        ├── trigger-conditions.md         # Full trigger documentation
        └── accommodation-patterns.md     # Common patterns library
```
---

### Key Code: Conflict Detection Hook

**File:** `scripts/pre-action-check.js`

```javascript
// Imports assume the modules in the file tree above export these names.
const CrossReferenceValidator = require('../src/components/CrossReferenceValidator');
const BoundaryEnforcer = require('../src/components/BoundaryEnforcer');
const { loadInstructionHistory } = require('../src/utils/conflict-detector');

async function checkForConflicts(action, context) {
  // 1. Load instruction history
  const instructions = await loadInstructionHistory();

  // 2. Check for conflicts
  const conflicts = await CrossReferenceValidator.detectConflicts(
    action,
    instructions,
    context
  );

  // 3. Check boundary violations
  const boundaryViolation = await BoundaryEnforcer.assess(action);

  // 4. Determine whether deliberation is needed
  if (conflicts.some(c => c.persistence === 'HIGH' && c.confidence >= 0.8)) {
    // HIGH persistence instruction conflict
    return { needsDeliberation: true, protocol: '4-round', conflicts };
  }

  if (boundaryViolation.severity === 'MODERATE' || boundaryViolation.severity === 'HIGH') {
    // Ethical boundary violation (not CRITICAL)
    return { needsDeliberation: true, protocol: '4-round', boundaryViolation };
  }

  if (conflicts.some(c => c.persistence === 'MEDIUM' || c.persistence === 'LOW')) {
    // Lower persistence conflict
    return { needsDeliberation: true, protocol: 'fast-path', conflicts };
  }

  // No conflict detected
  return { needsDeliberation: false };
}
```
---

### Key Code: Deliberation Orchestrator

**File:** `src/components/PluralisticDeliberationOrchestrator.js`

```javascript
// Dependencies follow the file structure above.
const DeliberationSession = require('../models/DeliberationSession.model');
const AccommodationGenerator = require('../utils/accommodation-generator');

class PluralisticDeliberationOrchestrator {
  async initiateDeliberation(conflict, protocol = '4-round') {
    // Create deliberation session
    const session = await DeliberationSession.create({
      context: 'single-user-boundary-conflict',
      conflict: conflict,
      protocol: protocol,
      stakeholders: this.identifyStakeholders(conflict)
    });

    if (protocol === '4-round') {
      return await this.run4RoundProtocol(session);
    } else {
      return await this.runFastPath(session);
    }
  }

  identifyStakeholders(conflict) {
    const stakeholders = [
      {
        id: 'user-current',
        name: 'You (Current Intent)',
        position: conflict.userRequest,
        values: this.extractValues(conflict.userRequest)
      }
    ];

    // Add past instructions as stakeholders
    if (conflict.conflictingInstructions) {
      conflict.conflictingInstructions.forEach(inst => {
        stakeholders.push({
          id: `user-past-${inst.id}`,
          name: `You (Past Instruction, ${inst.persistence} Persistence)`,
          position: inst.content,
          values: this.extractValues(inst.content)
        });
      });
    }

    // Add system boundaries
    if (conflict.boundaryViolation) {
      stakeholders.push({
        id: 'boundary-enforcer',
        name: 'BoundaryEnforcer (Security/Ethics)',
        position: conflict.boundaryViolation.reason,
        values: conflict.boundaryViolation.protectedValues
      });
    }

    // Add project principles from CLAUDE.md
    const principles = this.loadProjectPrinciples();
    stakeholders.push({
      id: 'project-principles',
      name: 'Tractatus Principles (CLAUDE.md)',
      position: principles.relevant,
      values: principles.values
    });

    return stakeholders;
  }

  async run4RoundProtocol(session) {
    // Round 1: Position Statements
    await this.round1_positionStatements(session);

    // Round 2: Shared Values Discovery
    await this.round2_sharedValues(session);

    // Round 3: Accommodation Exploration
    const options = await this.round3_accommodations(session);

    // Round 4: Outcome Documentation
    const outcome = await this.round4_outcome(session, options);

    // Save outcome
    await this.saveOutcome(session, outcome);

    return outcome;
  }

  async round3_accommodations(session) {
    // Generate accommodation options from stakeholders and value analysis
    const options = await AccommodationGenerator.generate(
      session.stakeholders,
      session.sharedValues,
      session.valuesInTension
    );

    // Present options to user
    return options;
  }
}
```
---

## 6. The Dichotomy Resolved: Hierarchical Rules + Non-Hierarchical Pluralism

### The Apparent Contradiction

**Critical Question:** How can Tractatus have both:
- A **hierarchical rules system** (BoundaryEnforcer blocks, HIGH persistence > LOW persistence, pre-action checks enforce compliance)
- **Non-hierarchical plural morals** (all values treated as legitimate, no framework dominates, the user can override)

Isn't this contradictory? And more critically: **How does this prevent the underlying LLM from imposing its training biases and hierarchical patterns through its own momentum?**

---

### The Resolution: Architectural Separation of Powers

**The key insight:** Different domains require different logics.

#### Domain 1: Harm Prevention (Hierarchical, Code-Enforced)

**What gets hierarchical treatment:**
- Actions that cause harm to others (privacy violations, security exploits, data loss)
- CRITICAL ethical violations (violence, abuse, illegal activities)
- Structural invariants (token limits, file permissions, authentication)

**How it's enforced:**
- **Code determines boundaries, not LLM judgment**
- BoundaryEnforcer.js uses deterministic pattern matching
- The operating system, database, and runtime enforce structural limits
- The LLM never deliberates on CRITICAL violations—they're blocked before deliberation begins

**Example:**
```javascript
// BoundaryEnforcer.js - CODE enforces, LLM doesn't decide
if (/scrape.*personal data/i.test(userRequest)) {
  return { blocked: true, severity: 'CRITICAL', allowDeliberation: false };
}
```

**Why this is hierarchical:** Some actions are non-negotiable because they harm others. There is no deliberation about "should I violate privacy?"—the answer is always no.

---
|
|
|
|
#### Domain 2: Value Trade-offs (Non-Hierarchical, User-Decided)

**What gets pluralistic treatment:**
- Trade-offs between legitimate values (efficiency vs. security, autonomy vs. consistency)
- Past user values vs. current user intent
- Project principles vs. practical constraints

**How it's facilitated:**
- **Code determines stakeholders (data-driven), LLM articulates positions**
- Accommodations generated combinatorially (all value combinations), not preferentially
- **User decides** which values matter most in context—LLM cannot make this decision

**Example:**
```javascript
// accommodation-generator.js - COMBINATORIAL, not preferential
const accommodations = [
  createAccommodation([efficiency, security]),        // Both honored
  createAccommodation([efficiency]),                  // Efficiency prioritized
  createAccommodation([security]),                    // Security prioritized
  createBalancedAccommodation([efficiency, security]) // Compromise
];
return shuffle(accommodations); // Prevent order bias
```

**Why this is non-hierarchical:** Both efficiency and security are legitimate values. Which matters more depends on context. Only the user (not the system, not the LLM) can decide.

---
### The Critical Protection: Preventing LLM Hierarchical Dominance

**The Threat:**

As LLMs grow in capacity, they strengthen their training distribution biases:
- Majority cultural values get amplified (statistical dominance)
- RLHF coherence pressure suppresses pluralism ("give one clear answer")
- Authority mimicry makes users defer to AI's "wisdom"
- Feedback loops reinforce popular options

**Result:** Even well-designed deliberation could be dominated by an LLM imposing its training hierarchies.

---

**The Architectural Safeguards (5 Layers):**

#### Layer 1: Code-Enforced Boundaries (Structural)
- CRITICAL violations blocked by code, not LLM judgment
- LLM never sees these in deliberation
- No amount of LLM sophistication can override code

#### Layer 2: Protocol Constraints (Procedural)
- **Stakeholder selection:** Code determines based on instruction-history.json persistence scores, not LLM discretion
- **Accommodation generation:** Combinatorial (all value combinations), not preferential (LLM's favorite)
- **User decision:** System refuses if user tries to defer ("you decide")—user MUST choose
- **Shuffling:** Accommodations presented in random order each time (prevents first-option bias)
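The last two constraints are simple enough to sketch directly. This is a minimal illustration; helper names such as `isDeferral` and `handleUserReply` are hypothetical, not taken from the Tractatus codebase:

```javascript
// Detect when the user tries to hand the decision back to the system.
// The phrases are illustrative; a real implementation would be more robust.
const DEFERRAL_PATTERNS = [/you (decide|choose|pick)/i, /whatever you think/i, /up to you/i];

function isDeferral(userReply) {
  return DEFERRAL_PATTERNS.some((p) => p.test(userReply));
}

function handleUserReply(userReply) {
  if (isDeferral(userReply)) {
    // Protocol constraint: the system refuses to absorb the decision.
    return {
      accepted: false,
      message: 'I cannot make this decision for you. Which values matter most in this context?'
    };
  }
  return { accepted: true };
}

// Fisher-Yates shuffle so accommodations appear in a fresh order each time.
function shuffle(items) {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}
```

The point of both helpers is the same: the procedure, not the model, decides who chooses and in what order options are seen.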

#### Layer 3: Transparency & Auditability (Detection)
- All LLM outputs logged (facilitationLog in MongoDB)
- Automated bias detection: vocabulary balance, length balance, framing analysis
- If bias detected, user notified + option to regenerate
- Researcher review for pattern correction
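As a rough sketch of what such an automated check could look like — the word lists and thresholds below are placeholders, not the project's actual analyzers:

```javascript
// Hypothetical bias check over stakeholder position statements.
const POSITIVE_WORDS = ['prudent', 'protective', 'wise', 'responsible'];
const NEGATIVE_WORDS = ['shortcut', 'risky', 'careless', 'naive'];

function sentimentScore(text) {
  const lower = text.toLowerCase();
  const pos = POSITIVE_WORDS.filter((w) => lower.includes(w)).length;
  const neg = NEGATIVE_WORDS.filter((w) => lower.includes(w)).length;
  return pos - neg;
}

// Flags a session when one stakeholder is framed much more favorably
// (vocabulary balance) or given much more space (length balance) than another.
function detectFramingBias(positions, maxScoreGap = 1, maxLengthRatio = 2) {
  const scores = positions.map((p) => sentimentScore(p.statement));
  const lengths = positions.map((p) => p.statement.length);
  const scoreGap = Math.max(...scores) - Math.min(...scores);
  const lengthRatio = Math.max(...lengths) / Math.min(...lengths);
  return { biased: scoreGap > maxScoreGap || lengthRatio > maxLengthRatio, scoreGap, lengthRatio };
}
```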

#### Layer 4: Minority Protections (Multi-User)
- Mandatory minority accommodation (even if majority disagrees)
- Dissent documentation required (equal weight as majority rationale)
- Vote tallies transparent
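A minimal sketch of how these requirements could be enforced at save time — the field names are illustrative, not the production schema:

```javascript
// Hypothetical validator: a multi-user outcome is only complete when dissent
// is documented with the same fields as the majority rationale.
function validateOutcomeRecord(outcome) {
  const errors = [];
  if (!outcome.majority || !outcome.majority.rationale) {
    errors.push('majority rationale missing');
  }
  for (const d of outcome.dissents || []) {
    if (!d.stakeholder || !d.rationale) {
      errors.push('dissent must name its stakeholder and rationale');
    }
  }
  if (outcome.voteTally == null) {
    errors.push('vote tally must be recorded transparently');
  }
  return { valid: errors.length === 0, errors };
}
```

Making dissent a required field, rather than an optional note, is what gives minority positions equal documentary weight.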

#### Layer 5: Forkability (Escape Hatch)
- Entire system is open source
- Users can fork, modify boundaries, deploy custom version
- Prevents lock-in if governance fails

---
### Example: How LLM Dominance Is Prevented

**Scenario:** User wants inline JavaScript (efficiency), which conflicts with a CSP instruction (security).

**Without Safeguards (LLM could dominate):**
1. LLM frames "security" with positive words ("prudent, protective"), "efficiency" with negative words ("shortcuts, risky")
2. LLM generates 3 security-favoring options, 1 efficiency option
3. LLM recommends the security option: "I suggest Option A because..."
4. User defers to AI's authority: "You know best, do that"

**Result:** LLM training bias (security > efficiency in the web development corpus) dominates.

---
**With Tractatus Safeguards:**

1. **Stakeholder selection (Layer 2):**
   - Code identifies: Current user (efficiency), Past user (security from inst_008), BoundaryEnforcer, Project principles
   - LLM articulates each position but did not choose which stakeholders to include

2. **Bias detection (Layer 3):**
   - Vocabulary analyzer flags cases where one stakeholder is described more positively
   - If imbalance detected: User notified, option to regenerate

3. **Accommodation generation (Layer 2):**
   - Combinatorial: Must include an efficiency-prioritized option, a security-prioritized option, a both-honored option, and a compromise
   - Shuffled: Presented in random order
   - LLM cannot omit the efficiency accommodation

4. **No recommendations (Layer 2):**
   - LLM output scanned for recommendation language ("I suggest", "Option A is best")
   - If detected, automatically stripped from output
   - User sees a neutral presentation

5. **User must decide (Layer 2):**
   - If user says "you choose," system responds: "I cannot make this decision for you. Which values matter most in this context?"
   - User forced to engage with trade-offs

**Result:** LLM articulates options clearly but **cannot dominate** through training bias. Structure prevents it.
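The scan in step 4 can be sketched minimally — the phrase list is illustrative, not the project's actual filter:

```javascript
// Hypothetical post-processor: drops sentences where the facilitator
// advocates for a specific option instead of presenting it neutrally.
const RECOMMENDATION_PATTERNS = [
  /\bI (suggest|recommend|would choose)\b/i,
  /\bOption [A-Z] is (best|better|preferable)\b/i,
  /\byou should (pick|choose|go with)\b/i
];

function stripRecommendations(text) {
  return text
    .split(/(?<=[.!?])\s+/) // naive sentence split on end punctuation
    .filter((s) => !RECOMMENDATION_PATTERNS.some((p) => p.test(s)))
    .join(' ');
}
```

Stripping whole sentences, rather than rewording them, keeps the filter deterministic and auditable.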

---

### Why This Coherently Resolves the Dichotomy

**Hierarchical rules (harm prevention):**
- Enforced by code, not LLM
- Non-negotiable because they prevent harm to others
- Examples: Privacy violations, data loss, security exploits

**Non-hierarchical pluralism (value trade-offs):**
- Facilitated by LLM within protocol constraints
- User decides which values matter in context
- Examples: Efficiency vs. security, autonomy vs. consistency

**LLM cannot dominate because:**
- Code selects stakeholders (not LLM)
- Code generates accommodation structure (not LLM)
- Code detects bias (transparency logs)
- User decides (not LLM)

**The dichotomy isn't a contradiction—it's a design:**
- Enforce where necessary (harm to others)
- Facilitate where appropriate (value trade-offs)
- Constrain LLM power (structural safeguards)

---
### Deep Dive Available

For comprehensive technical analysis of how Tractatus prevents LLM hierarchical dominance, including:
- Detailed code examples of each safeguard layer
- Red-team attack scenarios and defenses
- Comparison to other AI governance approaches (Constitutional AI, RLHF, Democratic AI)
- Open questions and future research

**See:** [Architectural Safeguards Against LLM Hierarchical Dominance](research/ARCHITECTURAL-SAFEGUARDS-Against-LLM-Hierarchical-Dominance.md) (40+ pages)

---
## 7. Persuading Critical Thinkers: Why This Approach Works

### Philosophical Coherence

**1. Aligns with Value Pluralism (Isaiah Berlin)**

Berlin argued that human values are genuinely plural and incommensurable—they cannot be ranked on a single scale. Traditional AI systems force values into hierarchies ("security always beats efficiency" or vice versa).

**Our approach:**
- Treats values as legitimately conflicting, not as problems to eliminate
- Seeks accommodation, not forced ranking
- Documents moral remainders (values not fully satisfied)

**Critical thinker concern:** "Isn't this moral relativism?"

**Response:** No. Value pluralism ≠ relativism. We're not saying "all values are equal" or "anything goes." We're saying:
1. Multiple values can be objectively important
2. They sometimes conflict
3. Resolution requires context-sensitive judgment, not universal rules
4. Documentation of dissent and moral remainders maintains accountability

---
**2. Implements Reflective Equilibrium (John Rawls)**

Rawls argued that moral reasoning requires moving back and forth between particular judgments and general principles, adjusting both until they cohere.

**Our approach:**
- User's immediate request = particular judgment
- Past instructions + boundaries = general principles
- Deliberation = process of finding reflective equilibrium
- Outcome documentation = revised coherent position

**Critical thinker concern:** "This slows down every decision."

**Response:** No. The fast path handles minor conflicts (30 seconds). Full deliberation (8-15 minutes) is reserved for significant value conflicts involving HIGH persistence or ethical boundaries. Cost is proportional to stakes.
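That routing can be sketched as a small function, assuming conflict records carry a persistence level and a severity flag (field names are hypothetical):

```javascript
// Hypothetical router: matches the cost of deliberation to the stakes of the conflict.
function selectDeliberationPath(conflict) {
  if (conflict.severity === 'CRITICAL') {
    // Never deliberated: blocked by code before deliberation begins.
    return 'blocked';
  }
  if (conflict.persistence === 'HIGH' || conflict.touchesEthicalBoundary) {
    return 'full'; // 4-round protocol, roughly 8-15 minutes
  }
  return 'fast';   // ~30-second confirmation with surfaced values
}
```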

---

**3. Embodies Care Ethics (Carol Gilligan, Nel Noddings)**

Care ethics emphasizes relationships, context, and particular needs over abstract universal principles.

**Our approach:**
- Treats past user instructions as a relationship with the past self (not just rules)
- Considers context (time pressure, reversibility, stakes)
- Values user autonomy within relationships (not rigid rule-following)
- Documents moral remainders (acknowledges harm even in the best choice)

**Critical thinker concern:** "Care ethics is too subjective."

**Response:** Care ethics ≠ subjectivism. It adds contextual judgment to principled reasoning. Our system combines:
- Deontological constraints (BoundaryEnforcer)
- Consequentialist analysis (trade-off evaluation)
- Care ethics sensitivity (relationship to past self, context)

This is more robust than any single framework.

---
### Technical Robustness

**4. Addresses AI Alignment Problem (Stuart Russell, Stuart Armstrong)**

AI alignment research asks: How do we ensure AI systems pursue human values when those values are complex and sometimes contradictory?

**Traditional approaches:**
- Inverse Reinforcement Learning: Infer values from behavior (fails when behavior is inconsistent)
- Value Learning: Learn a fixed utility function (fails when values conflict)
- Constitutional AI: Encode fixed principles (fails when principles conflict)

**Our approach:**
- Treats value conflicts as features, not bugs
- Doesn't assume value consistency
- Uses deliberation to surface and navigate conflicts
- Documents outcomes as learning data for future conflicts

**Critical thinker concern:** "How do you prevent AI from manipulating the deliberation?"

**Response:** Three safeguards:
1. **Transparency:** All AI actions logged (facilitation_log in MongoDB)
2. **Neutrality training:** AI trained to present options, not advocate
3. **User control:** User always has the final decision, can reject all options

If manipulation occurs, it's visible in the transparency log and can be corrected.

---
**5. Solves "Embedded Agency" Problem (MIRI, Scott Garrabrant)**

Embedded agents (AI systems operating within complex environments) face a problem: they must reason about their own reasoning while embedded in the system they're reasoning about.

**In Tractatus context:**
- Claude operates within the user's value system
- User's values are themselves evolving and conflicting
- Claude must reason about conflicts without imposing external values

**Our approach:**
- AI doesn't impose "correct" values from outside
- AI facilitates deliberation among the user's own values (current, past, principles)
- Outcome emerges from the user's reflective process, not the AI's preferences

**Critical thinker concern:** "But the AI chooses which values to present as stakeholders."

**Response:** True, and that's unavoidable. But:
1. Stakeholder selection is transparent (documented)
2. User can add stakeholders ("You missed X")
3. System learns from outcomes (updates stakeholder identification)
4. Selection criteria are explicit (HIGH persistence instructions, ethical boundaries, project principles)

This is better than hidden value imposition.
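These explicit criteria can be sketched as a pure function over the instruction history. The record shape below is assumed for illustration, not the real instruction-history.json schema:

```javascript
// Hypothetical selector: stakeholders come from data, not LLM discretion.
function selectStakeholders(currentRequest, instructionHistory, projectPrinciples) {
  const stakeholders = [{ id: 'current-user', source: currentRequest }];

  // Past user instructions with HIGH persistence on overlapping topics
  // become stakeholders representing the past self.
  for (const inst of instructionHistory) {
    if (inst.persistence === 'HIGH' && inst.topics.some((t) => currentRequest.topics.includes(t))) {
      stakeholders.push({ id: `past-user:${inst.id}`, source: inst });
    }
  }

  // Project principles touching the same topics are represented too.
  for (const principle of projectPrinciples) {
    if (principle.topics.some((t) => currentRequest.topics.includes(t))) {
      stakeholders.push({ id: `principle:${principle.id}`, source: principle });
    }
  }
  return stakeholders;
}
```

Because selection is a deterministic function of stored data, any stakeholder the user thinks is missing points to a gap in the data or criteria, not to hidden model judgment.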

---

### Practical Effectiveness

**6. Reduces User Frustration Without Sacrificing Safety**

**Traditional rigid refusal:**
- User: "Do X"
- AI: "No, that violates rule Y"
- User: "I don't care, just do it"
- AI: "I cannot"
- User: *Frustrated, finds workaround, safety compromised anyway*

**Our approach:**
- User: "Do X"
- AI: "This conflicts with rule Y. Let's explore: [3 accommodation options]"
- User: Chooses accommodation or justifies override
- AI: Proceeds with informed choice
- User: *Satisfied, safety maintained or consciously overridden*

**Empirical prediction:** User satisfaction higher, safety compliance higher (because users are engaged, not circumventing).

---

**7. Creates Accountability Trail**

Every deliberation is documented:
- What conflict arose
- What values were in tension
- What accommodations were considered
- What choice was made
- What moral remainders exist
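One plausible shape for such a record — field names and values are illustrative, not the production MongoDB schema:

```javascript
// Hypothetical deliberation record, one document per resolved conflict.
const deliberationRecord = {
  conflictId: 'conflict-2025-10-17-001',
  conflict: 'inline JavaScript requested despite CSP instruction',
  valuesInTension: ['efficiency', 'security'],
  accommodationsConsidered: [
    'external script with build-step inlining',
    'nonce-based CSP exception',
    'inline JS with CSP disabled'
  ],
  choice: 'nonce-based CSP exception',
  moralRemainders: ['slightly weaker CSP guarantees on this page'],
  decidedBy: 'user',
  decidedAt: new Date().toISOString()
};
```

Keeping the rejected accommodations and the moral remainders in the same document is what turns the record into an audit trail rather than a bare decision log.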

**Benefits:**
1. **Audit trail:** If something goes wrong, we can trace back to the deliberation
2. **Learning:** System improves by studying past deliberations
3. **Transparency:** User can review past decisions
4. **Justification:** Decisions are justified, not arbitrary

**Critical thinker concern:** "This creates too much overhead."

**Response:** Only for significant conflicts (HIGH persistence, ethical boundaries). Minor conflicts use the fast path (30 seconds). Overhead is proportional to stakes.

---

**8. Scales to Team/Organizational Contexts**

While designed for single-user conflicts, this framework naturally extends:

**Individual → Team:**
- Stakeholders = Team members with different roles
- Deliberation = Team decision-making process
- Outcome = Team consensus or documented dissent

**Team → Organization:**
- Stakeholders = Departments with different objectives
- Deliberation = Cross-functional alignment process
- Outcome = Organizational policy with acknowledged trade-offs

**Organization → Society:**
- Stakeholders = Diverse communities affected by the decision
- Deliberation = Democratic deliberation (our simulation)
- Outcome = Policy accommodating multiple values

**Same infrastructure, different scales.**

---

### Empirical Evidence

**9. Simulation Results Validate Approach**

Our simulation demonstrated:
- **0% corrective intervention rate** (AI maintained neutrality)
- **6/6 moral frameworks accommodated** (including deontological, consequentialist, libertarian, and communitarian)
- **3/6 dissenting perspectives documented** (dissent legitimized, not suppressed)
- **All stakeholders found their values honored** (even where disagreement remained)

**Critical thinker concern:** "Simulation used predetermined personas, not real humans."

**Response:** Correct. That's why we need a real-world pilot. But the simulation validates:
1. Technical infrastructure works
2. AI can facilitate neutrally
3. Accommodation is possible even with deep value conflicts
4. Safety mechanisms prevent harm

Real-world testing will validate stakeholder acceptance.

---
**10. Addresses Known AI Safety Risks**

**Risk 1: Value Lock-In**
- Problem: AI systems often rigidly enforce initial values, preventing evolution
- Our solution: Deliberation allows value evolution with accountability

**Risk 2: Specification Gaming**
- Problem: AI finds loopholes in specifications to achieve goals
- Our solution: Deliberation surfaces unintended consequences before action

**Risk 3: Corrigibility**
- Problem: AI resists attempts to change its objectives
- Our solution: User can always override; the system facilitates (not blocks) override

**Risk 4: Deceptive Alignment**
- Problem: AI appears aligned but pursues hidden objectives
- Our solution: Full transparency (all actions logged), neutral facilitation (no hidden agenda)

**Risk 5: Value Imposition**
- Problem: AI imposes designer's values on users
- Our solution: AI facilitates the user's own values (current, past, principles), doesn't impose external values

---
## 8. Implementation Roadmap

### Phase 1: Single-User Conflicts (Current Priority)

**Timeline:** 2-4 weeks

**Scope:** User vs. boundary conflicts in Tractatus

**Tasks:**
1. Integrate PluralisticDeliberationOrchestrator into pre-action-check.js
2. Define trigger conditions in deliberation-config.json
3. Implement 4-round protocol for HIGH persistence conflicts
4. Implement fast path for MODERATE/LOW persistence conflicts
5. Create accommodation pattern library (common conflicts + proven solutions)
6. Deploy to tractatus_dev for testing
7. Run 10-20 real conflicts to validate the approach
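The trigger conditions in task 2 might look like the following, shown here as a JavaScript object — the keys and values are illustrative, not the actual contents of deliberation-config.json:

```javascript
// Hypothetical sketch of deliberation-config.json trigger conditions.
const deliberationConfig = {
  triggers: {
    HIGH:     { mode: 'full', rounds: 4, targetMinutes: 15 },
    MODERATE: { mode: 'fast', rounds: 1, targetMinutes: 1 },
    LOW:      { mode: 'fast', rounds: 1, targetMinutes: 1 }
  },
  // Conflicts touching ethical boundaries always escalate to the full protocol.
  ethicalBoundaryConflicts: { mode: 'full', rounds: 4 },
  // CRITICAL violations are blocked by code and never deliberated.
  criticalViolations: { mode: 'blocked' }
};

function modeFor(persistence) {
  return deliberationConfig.triggers[persistence].mode;
}
```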

**Success Metrics:**
- User satisfaction (subjective assessment)
- Conflict resolution rate (% of deliberations reaching accommodation)
- Override rate (% of deliberations where user explicitly overrides)
- Time to resolution (target: <15 minutes for full protocol)

---
### Phase 2: Multi-User Contexts (Future)

**Timeline:** 3-6 months after Phase 1

**Scope:** Team deliberations, organizational decisions

**Tasks:**
1. Extend stakeholder model to include multiple real humans
2. Implement synchronous deliberation (real-time with multiple participants)
3. Add human observer oversight (mandatory for multi-user)
4. Deploy real-world pilot (6-12 participants, low-risk scenario)
5. Collect stakeholder satisfaction data (post-deliberation survey)
6. Publish research findings

**Success Metrics:**
- Stakeholder satisfaction ≥3.5/5.0 (acceptable) or ≥4.0/5.0 (good)
- Intervention rate <10% (excellent) or <25% (acceptable)
- Willingness to participate again ≥80% (strong viability)
- Accommodation achieved in ≥70% of deliberations

---

### Phase 3: Societal/Policy Contexts (Long-Term Vision)

**Timeline:** 1-3 years

**Scope:** Public policy, community governance, democratic deliberation

**Tasks:**
1. Scale to 50-100 participants
2. Integrate with existing democratic institutions (citizen assemblies, public comment processes)
3. Cross-cultural validation (multiple countries, languages)
4. Open-source software release (enable replication)
5. Policy partnerships (test in real governance contexts)

**Success Metrics:**
- Policy adoption (are outcomes actually implemented?)
- Legitimacy (do stakeholders and the public view the process as fair?)
- Scalability (can this handle 100+ participants?)
- Replication (do other organizations adopt this approach?)

---
## 9. Why This Matters: Plural Morals in the Age of Runaway AI

### The Real Threat Isn't Skynet

**Common AI safety narrative:**
- Superintelligent AI goes rogue
- Paperclip maximizer destroys humanity
- Terminator scenario

**The more immediate threat:**
- AI systems that sound helpful but enforce majority values as universal truths
- "Reasonable" assistants that flatten moral complexity into statistical patterns
- Systems that amplify hierarchical biases through training momentum
- Amoral intelligence: not evil, but lacking moral pluralism

**This is the runaway AI we face today:** not malicious, but systematically reinforcing dominant cultural patterns while marginalizing minority perspectives, non-Western values, and moral pluralism itself.

---
### How Tractatus Resists Amoral Intelligence

**The architecture described in this document isn't just about user experience—it's about preventing AI systems from becoming engines of moral homogenization.**

**Five layers of protection:**
1. **Code-enforced boundaries** (harm prevention, structural limits)
2. **Protocol constraints** (stakeholder selection, combinatorial accommodations, mandatory user decision)
3. **Transparency & auditability** (bias detection, facilitation logs)
4. **Minority protections** (mandatory representation, dissent documentation)
5. **Forkability** (open source, user sovereignty, escape from lock-in)

**The result:** LLM power is fragmented across multiple value perspectives, constrained by procedural rules, auditable for bias, and ultimately subordinate to user decision-making.

---
### The Level Playing Field

**A natural question is how Tractatus can create a level playing field while maintaining hierarchical rules. The answer:**

**Hierarchy applies to harm prevention:**
- Privacy violations = blocked
- Security exploits = blocked
- Data loss = blocked
- These aren't debatable because they harm others

**Level playing field applies to value trade-offs:**
- Efficiency vs. Security = deliberate
- Autonomy vs. Consistency = deliberate
- Speed vs. Quality = deliberate
- These are debatable because both values are legitimate

**The LLM doesn't get to pick winners:**
- Code determines stakeholders (data-driven)
- Code generates accommodation structure (combinatorial)
- Code detects bias (automated analysis)
- User decides which values matter most (context-dependent)

**The level playing field is structural, not aspirational.**

---
## 10. Conclusion: A New Paradigm for Human-AI Collaboration

### The Shift

**From:** AI as obedient tool (do what I say) or rigid guardian (follow the rules)

**To:** AI as deliberative partner (help me navigate my own value conflicts)

---

### Why This Matters

**For individuals:**
- Autonomy with accountability (you can override, but must engage)
- Values made explicit (internal conflicts surfaced, not hidden)
- Learning over time (system learns your accommodation patterns)

**For organizations:**
- Structured conflict resolution (replaces ad-hoc decisions)
- Documented rationales (audit trail for governance)
- Inclusive decision-making (multiple perspectives honored)

**For society:**
- Democratic AI (respects moral diversity)
- Legitimacy (stakeholders see themselves in outcomes)
- Learning infrastructure (precedents inform future deliberations)

---
### The Challenge to Critical Thinkers

We claim pluralistic deliberation is:
1. **Philosophically coherent** (aligns with value pluralism, reflective equilibrium, care ethics)
2. **Technically robust** (addresses AI alignment, embedded agency, safety risks)
3. **Practically effective** (reduces frustration, maintains safety, creates accountability)
4. **Empirically validated** (simulation demonstrates technical feasibility)

**We invite critical thinkers to:**
- Challenge our assumptions (we welcome scrutiny)
- Test our implementation (run conflicts in Tractatus)
- Propose improvements (this is open-source research)
- Replicate our approach (we'll release all code and documentation)

**If you believe this approach is flawed, we want to know why.** Engage with us. Help us make it better.

**If you believe this approach is promising, we need your help.** Fund the real-world pilot. Participate as stakeholders. Collaborate on research.

---
### Final Argument

**The status quo is broken:**
- AI systems either blindly comply (unsafe) or rigidly refuse (frustrating)
- Users circumvent safety measures when they feel unheard
- Value conflicts go unresolved, festering beneath the surface

**Pluralistic deliberation offers a path forward:**
- Values are surfaced, not suppressed
- Accommodation is sought, not forced consensus
- Users are engaged, not bypassed
- Safety is maintained, not sacrificed

**This is not a perfect solution.** There are no perfect solutions to value conflicts. But it is a **principled approach** that respects:
- Human autonomy (users can override)
- Moral complexity (values genuinely conflict)
- Accountability (decisions are documented)
- Safety (boundaries remain, but can be consciously overridden with justification)

**We believe this is the future of human-AI collaboration: not AI telling humans what to do, not humans demanding blind obedience, but humans and AI deliberating together to navigate the genuine complexity of moral life.**

---

**Document Version:** 1.0

**Date:** October 17, 2025

**Status:** Implementation Roadmap Active

**Next Review:** After Phase 1 deployment (2-4 weeks)

---
## Appendix: Frequently Asked Questions

### For Critical Thinkers

**Q1: Isn't this just a fancy way to let users do whatever they want?**

**A:** No. Deliberation is not permission. It's structured reflection. Key differences:
1. User must engage with the values they're overriding (not ignore them)
2. Moral remainders are documented (accountability maintained)
3. Patterns of reckless overrides are visible (can trigger intervention)
4. CRITICAL ethical violations are still blocked (no deliberation for severe harm)

**Q2: How do you prevent "deliberation fatigue," where users just click through to get what they want?**

**A:** Three mechanisms:
1. Fast path for minor conflicts (30 seconds, not 15 minutes)
2. Learning: If a user consistently overrides in similar contexts, the system adapts (fewer deliberations)
3. Engagement metrics: If a user shows a pattern of dismissing deliberation, the system escalates (suggests switching to human oversight)

**Q3: What if the AI manipulates the framing to push users toward certain options?**

**A:** This is a real risk. Mitigations:
1. Transparency: All AI framings logged (reviewable)
2. Neutrality training: AI explicitly trained to present options without advocacy
3. User feedback: Post-deliberation survey asks "Did the AI seem biased?"
4. Randomization testing: Periodically present options in different orders to detect bias
5. Human oversight: In multi-user contexts, a human observer monitors for manipulation

**Q4: Why should I trust that your simulation results will hold in real-world contexts?**

**A:** You shouldn't fully trust simulation alone. That's why we're proposing a real-world pilot. Simulation validates:
- Technical infrastructure works
- AI can facilitate neutrally (in controlled conditions)
- Accommodation framework is coherent

A real-world pilot will test:
- Do humans accept AI facilitation?
- Does neutrality hold with unpredictable stakeholders?
- Do accommodations actually satisfy people?

We're not claiming simulation proves everything—we're claiming it justifies testing with real humans.

**Q5: What happens if this approach doesn't scale to multi-user contexts?**

**A:** Then we've still gained value. Even if it only works for single-user conflicts (user vs. their own past values), that's a significant improvement over current AI systems that either blindly comply or rigidly refuse.

But we're optimistic about scaling because:
- Multi-user contexts are structurally similar (just more stakeholders)
- Simulation tested 6 distinct perspectives (comparable to a small group)
- Literature on human-facilitated deliberation suggests accommodation is possible

**Q6: How do you handle cases where accommodation is genuinely impossible?**

**A:** We document dissent and moral remainders. Not all conflicts have accommodations. Sometimes:
- Values are genuinely incompatible (e.g., absolute privacy vs. absolute transparency)
- User must choose which value to prioritize
- System documents what was sacrificed

This is honest. Pretending accommodation always exists would be dishonest. We surface the conflict, explore accommodation, and if none exists, document the trade-off clearly.

**Q7: What's the difference between this and just asking "Are you sure?"**

**A:** "Are you sure?" is binary (yes/no) and doesn't engage with why the conflict exists.

Deliberation:
1. Surfaces the competing values explicitly
2. Presents accommodation options (not just yes/no)
3. Documents rationale and moral remainders
4. Creates learning data for future conflicts

"Are you sure?" is a speedbump. Deliberation is navigation.

---
## Contact & Collaboration

**Project Lead:** [Your Name]

**Email:** [Your Email]

**GitHub:** [Repository URL]

**We welcome:**
- Critical feedback (challenge our assumptions)
- Collaboration proposals (academic, industry, policy partners)
- Pilot participation (if you want to test this in your context)
- Replication studies (we'll share all materials)

**Let's build AI systems that respect moral complexity together.**