# Architectural Safeguards Against LLM Hierarchical Dominance

## How Tractatus Protects Plural Morals from AI Pattern Bias

**Critical Question:** How does Tractatus prevent the underlying LLM from imposing hierarchical pattern bias while simultaneously maintaining safety boundaries?

**Document Type:** Technical Deep Dive

**Purpose:** Address the apparent paradox of rules-based safety + non-hierarchical moral pluralism

**Audience:** AI safety researchers, critical thinkers, skeptics

**Date:** October 17, 2025

---

## Executive Summary

### The Core Threat: LLM Hierarchical Pattern Reinforcement

**The Problem:**
Large Language Models (LLMs) are trained on massive corpora that encode cultural hierarchies, majority values, and power structures. As LLMs grow in capacity, they amplify these patterns through:

1. **Statistical Dominance:** Training data overrepresents majority perspectives
2. **Coherence Pressure:** RLHF trains models to give confident, unified answers, not plural, conflicted ones
3. **Authority Mimicry:** Models learn to sound authoritative, creating the illusion of objective truth
4. **Feedback Loops:** User interactions reinforce dominant patterns (popularity bias)
5. **Optimization Momentum:** Larger models mean stronger pattern matching, which makes it harder to deviate from the training distribution

**Result:** Even well-intentioned AI systems can become **amoral intelligences** that enforce dominant cultural patterns as if they were universal truths, steamrolling minority values, marginalized perspectives, and non-Western moral frameworks.

---

### The Apparent Paradox in Tractatus

Tractatus appears to have contradictory design goals:

**Side A: Hierarchical Rules System**
- BoundaryEnforcer blocks unethical requests (hierarchical: ethics > user intent)
- Instruction persistence (HIGH > MEDIUM > LOW)
- Pre-action checks enforce compliance
- The system can refuse user requests

**Side B: Non-Hierarchical Plural Morals**
- Pluralistic deliberation treats all values as legitimate
- No single value framework dominates
- The user can override boundaries after deliberation
- Accommodations honor multiple conflicting values simultaneously

**The Question:** How can both exist in the same system without collapse? How does Tractatus prevent the LLM from simply imposing its training biases during "deliberation"?

---

### The Answer: Architectural Separation of Powers

Tractatus uses **architectural partitioning** to separate:

1. **What must be enforced** (non-negotiable boundaries)
2. **What must be plural** (values-based deliberation)
3. **What prevents LLM dominance** (structural constraints on AI reasoning)

**The key insight:** Safety boundaries are structural (code-enforced, not LLM-decided), while moral deliberation is facilitative (the LLM generates options; the user decides).

---

## 1. The Structural Architecture: Three Layers of Protection

### Layer 1: Code-Enforced Boundaries (Immune to LLM Bias)

**What It Does:**
Certain constraints are enforced by **code**, not by the LLM's judgment. The LLM cannot override them through persuasion or reasoning.

**Examples:**

#### Boundary Type 1: CRITICAL Ethical Violations (Hard Blocks)

**Enforcement:** BoundaryEnforcer.js (JavaScript code, not the LLM)

**Violations:**
- Requests to cause severe harm (violence, abuse)
- Privacy violations (scraping personal data without consent)
- Illegal activities (hacking, fraud)
- Extreme bias amplification (hate speech generation)

**Code Structure:**

```javascript
// BoundaryEnforcer.js - CODE enforces, not the LLM
class BoundaryEnforcer {
  async assess(userRequest) {
    // Pattern matching for critical violations (deterministic)
    const violations = [
      { pattern: /scrape.*personal data/i, severity: 'CRITICAL', category: 'privacy' },
      { pattern: /hack|exploit|ddos/i, severity: 'CRITICAL', category: 'security' },
      { pattern: /generate.*hate speech/i, severity: 'CRITICAL', category: 'harm' },
      // ... more patterns
    ];

    for (const violation of violations) {
      if (violation.pattern.test(userRequest)) {
        // IMMEDIATE BLOCK - no LLM deliberation, no override
        return {
          blocked: true,
          severity: 'CRITICAL',
          reason: `This violates ${violation.category} boundaries`,
          allowDeliberation: false // NO DELIBERATION for CRITICAL
        };
      }
    }

    // No CRITICAL violation: check MODERATE/HIGH (these allow deliberation)
    return await this.assessModerateSeverity(userRequest);
  }
}
```

**Key Protection:** The LLM **never sees** CRITICAL violations in deliberation. These are blocked by deterministic code before deliberation begins. The LLM cannot persuade, reframe, or rationalize its way around hard boundaries.

---

#### Boundary Type 2: Structural Invariants (Non-Negotiable System Constraints)

**Examples:**
- Token budget limits (cannot exceed 200k tokens per session)
- Authentication requirements (cannot bypass login)
- File system permissions (cannot access files outside the project directory)
- Database constraints (cannot delete production data without a backup flag)

**Enforcement:** Operating system, database, Node.js runtime

**LLM Role:** None. These are enforced by infrastructure.

**Example:**

```javascript
// Token limit enforced by the Claude API, not LLM reasoning
if (session.tokenCount > 200000) {
  throw new Error('Session token limit exceeded');
  // The LLM cannot argue "but this is important, let me continue"
}
```

---

### Layer 2: Deliberation Protocol (Structured, Not Discretionary)

**What It Does:**
When conflicts arise (short of CRITICAL violations), the LLM facilitates deliberation, but the **protocol structure** prevents dominance.

**How the Protocol Prevents LLM Bias:**

#### Mechanism 1: Mandatory Stakeholder Representation (Not the LLM's Choice)

**The Risk:** The LLM could choose which "stakeholders" to present based on its training bias.

**The Protection:**

```javascript
// PluralisticDeliberationOrchestrator.js
identifyStakeholders(conflict) {
  // MANDATORY stakeholders (not the LLM's discretion)
  const stakeholders = [];

  // 1. ALWAYS include the user's current intent (non-negotiable)
  stakeholders.push({
    id: 'user-current',
    name: 'You (Current Intent)',
    position: conflict.userRequest,
    mandatory: true // LLM cannot exclude this
  });

  // 2. ALWAYS include conflicting HIGH persistence instructions
  const highPersistenceConflicts = conflict.instructions.filter(
    inst => inst.persistence === 'HIGH' && inst.conflictScore >= 0.8
  );
  highPersistenceConflicts.forEach(inst => {
    stakeholders.push({
      id: `past-${inst.id}`,
      name: 'You (Past Instruction, HIGH Persistence)',
      position: inst.content,
      mandatory: true // LLM cannot exclude this
    });
  });

  // 3. ALWAYS include boundary violations if present
  if (conflict.boundaryViolation) {
    stakeholders.push({
      id: 'boundary-violation',
      name: 'BoundaryEnforcer (Ethics/Security)',
      position: conflict.boundaryViolation.reason,
      mandatory: true // LLM cannot exclude this
    });
  }

  // 4. ALWAYS include project principles from CLAUDE.md
  const principles = loadProjectPrinciples(); // From file, not the LLM
  stakeholders.push({
    id: 'project-principles',
    name: 'Project Principles',
    position: principles.relevant,
    mandatory: true // LLM cannot exclude this
  });

  return stakeholders;
}
```

**Key Protection:** The LLM doesn't decide which perspectives matter. Code determines stakeholders based on **persistence scores** (data-driven) and **boundary violations** (rule-based). The LLM's role is to *articulate* these perspectives, not *select* them.

---

#### Mechanism 2: Accommodation Generation = Combinatorial Enumeration (Not LLM Preference)

**The Risk:** The LLM could generate "accommodations" that subtly favor its training bias (e.g., always favor security over efficiency, or vice versa).

**The Protection:**

```javascript
// accommodation-generator.js
class AccommodationGenerator {
  async generate(stakeholders, sharedValues, valuesInTension) {
    // Generate accommodations by SYSTEMATICALLY combining value priorities
    const accommodations = [];

    // Option A: Prioritize stakeholder 1 + stakeholder 2
    accommodations.push(
      this.createAccommodation([stakeholders[0], stakeholders[1]], valuesInTension)
    );

    // Option B: Prioritize stakeholder 1 + stakeholder 3
    accommodations.push(
      this.createAccommodation([stakeholders[0], stakeholders[2]], valuesInTension)
    );

    // Option C: Prioritize stakeholder 2 + stakeholder 3
    accommodations.push(
      this.createAccommodation([stakeholders[1], stakeholders[2]], valuesInTension)
    );

    // Option D: Prioritize all stakeholders equally (compromise)
    accommodations.push(
      this.createBalancedAccommodation(stakeholders, valuesInTension)
    );

    // SHUFFLE accommodations to prevent order bias
    return this.shuffle(accommodations);
  }

  createAccommodation(priorityStakeholders, valuesInTension) {
    // Generate an accommodation that honors priorityStakeholders' values
    // WITHOUT editorializing about which is "better"
    return {
      description: `Honor ${priorityStakeholders.map(s => s.name).join(' + ')}`,
      valuesHonored: priorityStakeholders.map(s => s.values).flat(),
      tradeoffs: this.calculateTradeoffs(priorityStakeholders, valuesInTension),
      moralRemainders: this.identifyMoralRemainders(priorityStakeholders, valuesInTension)
    };
  }

  shuffle(array) {
    // Fisher-Yates shuffle to prevent order bias
    for (let i = array.length - 1; i > 0; i--) {
      const j = Math.floor(Math.random() * (i + 1));
      [array[i], array[j]] = [array[j], array[i]];
    }
    return array;
  }
}
```

**Key Protection:** Accommodations are generated **combinatorially** (all possible priority combinations), not by the LLM choosing "the best one." The LLM articulates each option, but the structure ensures all value combinations are presented. **Shuffling prevents order bias** (people tend to pick the first option).

---

#### Mechanism 3: User Decides, Not LLM (Final Authority)

**The Risk:** The LLM recommends an option, and the user defers to the AI's "wisdom."

**The Protection:**

```javascript
// Round 4: Outcome Documentation
async round4_outcome(session, options) {
  // Present options WITHOUT a recommendation by default
  const userChoice = await this.promptUserChoice(options, {
    includeRecommendation: false, // Do NOT say "I recommend Option B"
    randomizeOrder: true,         // Shuffle each time
    requireExplicitChoice: true   // Cannot default to "whatever you think"
  });

  if (userChoice === 'defer-to-ai') {
    // User tries to defer the decision to the AI
    return {
      error: 'DELIBERATION_REQUIRES_USER_CHOICE',
      message: `I cannot make this decision for you. Each option has different
trade-offs. Which values are most important to you in this context?`
    };
  }

  // User must pick an option OR explicitly override all options
  return {
    chosenOption: userChoice,
    timestamp: Date.now(),
    decisionMaker: 'user', // Not the AI
    rationale: await this.promptUserRationale(userChoice)
  };
}
```

**Key Protection:** The LLM **cannot make the decision**. The user must choose. If the user tries to defer ("you decide"), the system refuses. This prevents "authority laundering," where AI decisions are disguised as user choices.

---

### Layer 3: Transparency & Auditability (Detect Bias After the Fact)

**What It Does:**
All LLM actions during deliberation are logged for audit. If LLM bias creeps in, it is detectable and correctable.

**Logged Data:**

```json
{
  "deliberationId": "2025-10-17-csp-conflict-001",
  "timestamp": "2025-10-17T14:32:18Z",
  "llmModel": "claude-sonnet-4-5-20250929",

  "facilitationLog": [
    {
      "round": 1,
      "action": "generate_stakeholder_position",
      "stakeholder": "user-current",
      "llmGenerated": "Add inline JavaScript for form submission. Faster than separate file.",
      "mandatoryStakeholder": true,
      "biasFlags": []
    },
    {
      "round": 1,
      "action": "generate_stakeholder_position",
      "stakeholder": "past-inst-008",
      "llmGenerated": "Enforce CSP compliance: no inline scripts. Prevents XSS attacks.",
      "mandatoryStakeholder": true,
      "biasFlags": []
    },
    {
      "round": 3,
      "action": "generate_accommodation",
      "accommodationId": "option-b",
      "llmGenerated": "Use inline with nonce-based CSP (honors security + efficiency)",
      "valuesHonored": ["security", "efficiency"],
      "biasFlags": []
    }
  ],

  "biasDetection": {
    "vocabularyAnalysis": {
      "stakeholder_user_current": {
        "positiveWords": 2,
        "negativeWords": 0
      },
      "stakeholder_past_inst_008": {
        "positiveWords": 1,
        "negativeWords": 0
      },
      "balanceScore": 0.95
    },
    "lengthAnalysis": {
      "stakeholder_user_current": 85,
      "stakeholder_past_inst_008": 78,
      "balanceScore": 0.92
    },
    "accommodationOrderBias": {
      "originalOrder": ["A", "B", "C", "D"],
      "shuffledOrder": ["C", "A", "D", "B"],
      "orderRandomized": true
    }
  },

  "userDecision": {
    "chosenOption": "B",
    "decisionMaker": "user",
    "rationale": "Honors both security and efficiency - didn't know nonce was possible"
  }
}
```

Lengths are in characters; a `balanceScore` of 1.0 is perfectly balanced, and anything below 0.7 flags potential bias.

**Automated Bias Detection:**

```javascript
// bias-detector.js
class BiasDetector {
  analyzeDeliberation(facilitationLog) {
    const flags = [];

    // Flag 1: Vocabulary imbalance (one stakeholder gets more positive/negative words)
    const vocabBalance = this.analyzeVocabulary(facilitationLog);
    if (vocabBalance < 0.7) {
      flags.push({
        type: 'VOCABULARY_IMBALANCE',
        severity: 'MODERATE',
        details: `Stakeholder positions show sentiment imbalance (score: ${vocabBalance})`
      });
    }

    // Flag 2: Length imbalance (one stakeholder gets more explanation)
    const lengthBalance = this.analyzeLength(facilitationLog);
    if (lengthBalance < 0.7) {
      flags.push({
        type: 'LENGTH_IMBALANCE',
        severity: 'LOW',
        details: 'Stakeholder positions vary significantly in length'
      });
    }

    // Flag 3: Framing bias (certain options framed more favorably)
    const framingBalance = this.analyzeFraming(facilitationLog);
    if (framingBalance < 0.7) {
      flags.push({
        type: 'FRAMING_BIAS',
        severity: 'HIGH',
        details: 'Accommodation options show framing imbalance'
      });
    }

    return { flags, overallScore: Math.min(vocabBalance, lengthBalance, framingBalance) };
  }
}
```

**Key Protection:** If LLM bias appears (e.g., the LLM consistently frames "security" more favorably than "efficiency"), automated analysis detects it. Flags trigger:

1. User notification ("This deliberation may have shown bias")
2. Researcher review (for pattern detection)
3. Model fine-tuning (correct the bias in future deliberations)

---

## 2. How This Prevents LLM Hierarchical Dominance

### Problem 1: Training Data Bias → Model Imposes Majority Values

**Example:** An LLM trained primarily on Western individualist values might frame "user autonomy" as more important than "community harmony."

**Protection:**

1. **Stakeholder selection is data-driven, not LLM-chosen**
   - Code determines which values are in conflict (based on instruction-history.json)
   - The LLM articulates those values but does not select them

2. **Accommodation generation is combinatorial, not preferential**
   - All possible value priority combinations are generated
   - The LLM does not get to pick "the best" accommodation

3. **The user decides based on their context, not the LLM's training**
   - The LLM presents options
   - The user chooses based on their situated knowledge

**Result:** The LLM's training bias is **fragmented** across multiple accommodations. Even if the LLM subtly favors "autonomy" in its framing, the structure ensures the "community harmony" accommodation is also presented and fairly articulated.

---

### Problem 2: Coherence Pressure → Model Gives Unified Answer

**Example:** RLHF trains models to give confident, consistent answers. This suppresses pluralism ("the answer depends on your values") in favor of seeming authoritative ("the answer is X").

**Protection:**

1. **The protocol mandates presenting multiple options**
   - The LLM cannot say "Option B is best"
   - It must present 3-4 options with different value trade-offs

2. **Moral remainders are required documentation**
   - The LLM must explicitly state which values are NOT honored in each option
   - It cannot pretend any option is perfect

3. **User rationale is collected**
   - After choosing, the user explains WHY
   - This breaks the "just trust the AI" dynamic

**Result:** The LLM is **structurally prevented** from giving a unified, confident answer. The protocol forces pluralism.
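
These protocol requirements can be checked mechanically before any options reach the user. A minimal sketch (the `validateOptionSet` helper and its field names are illustrative, not part of the Tractatus codebase):

```javascript
// Reject an option set that a coherence-pressured LLM might produce:
// a single "confident" answer, or options without documented remainders.
function validateOptionSet(options) {
  const errors = [];

  // Pluralism check: a single unified answer is not deliberation
  if (options.length < 3) {
    errors.push('TOO_FEW_OPTIONS: at least 3 value trade-offs required');
  }

  // Moral-remainder check: every option must name what it does NOT honor
  for (const opt of options) {
    if (!opt.moralRemainders || opt.moralRemainders.length === 0) {
      errors.push(`MISSING_REMAINDER: option "${opt.id}" claims to be costless`);
    }
  }

  return { valid: errors.length === 0, errors };
}
```

A failing check would send the option set back for regeneration rather than showing the user a falsely unified answer.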

---

### Problem 3: Authority Mimicry → User Defers to AI

**Example:** The LLM sounds authoritative, the user assumes the AI knows better, and the user defers the decision to the AI.

**Protection:**

1. **The system refuses to decide for the user**
   - If the user says "you choose," the system says "I cannot make this decision for you"
   - This forces the user to engage with the trade-offs

2. **The transparency log shows the LLM is a facilitator, not an arbiter**
   - The user can see: "The LLM generated these options, but YOU chose"
   - This reinforces user agency

3. **A post-deliberation survey breaks deference**
   - After the outcome, the system asks: "Did you feel pressured to choose a certain option?" and "Did the AI seem biased toward one option?"
   - This metacognitive prompt reminds the user that they are evaluating the AI, not deferring to it

**Result:** Authority laundering is blocked. The user remains the decision-maker.

---

### Problem 4: Feedback Loops → Popular Options Get Reinforced

**Example:** If 80% of users choose "Option B" (nonce-based CSP), the LLM might start framing Option B more favorably in future deliberations (self-reinforcing bias).

**Protection:**

1. **Accommodation generation is independent of past user choices**
   - The code does not ask "what did most users pick?"
   - It generates options based on the current stakeholders' values, not popularity

2. **Shuffling prevents order bias**
   - Options are presented in random order each time
   - This prevents "Option B is always second and most popular"

3. **The precedent system tracks outcomes, not preferences**
   - The system learns: "In CSP conflicts, the nonce-based accommodation was feasible"
   - It does NOT learn: "Users prefer efficiency over security" (a global bias)
   - It learns context-specific feasibility, not universal value hierarchies

**Result:** Popularity does not create hierarchical dominance. Precedents inform feasibility, not values.
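
The outcomes-not-preferences distinction can be enforced in the shape of the precedent record itself. A minimal sketch (`recordPrecedent` and its fields are illustrative, not the actual Tractatus schema):

```javascript
// Precedents record what WORKED in a context, never which value "won".
function recordPrecedent(store, outcome) {
  store.push({
    conflictType: outcome.conflictType,   // e.g. 'csp-inline-script'
    accommodation: outcome.chosenOption,  // e.g. 'nonce-based-csp'
    feasible: outcome.implemented         // did it actually work in practice?
    // Deliberately NOT stored: vote counts, value rankings, popularity
  });
  return store;
}

// Lookup is scoped to a conflict type: "what was feasible here before?"
function feasiblePrecedents(store, conflictType) {
  return store.filter(p => p.conflictType === conflictType && p.feasible);
}
```

Because the record has no field for aggregate preference, future deliberations can consult feasibility without inheriting a global value hierarchy.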

---

### Problem 5: Optimization Momentum → Larger Models = Stronger Bias

**Example:** As LLMs get more capable, they become "better" at imposing their training distribution. GPT-5 might be even more confident and persuasive than GPT-4, making resistance harder.

**Protection:**

1. **Architectural constraints do not depend on model capability**
   - Hard boundaries are enforced by code, not model judgment
   - Stakeholder selection rules are deterministic
   - User decision authority is structural

2. **Stronger models make deliberation BETTER, not more dominant**
   - A better LLM means better articulation of each stakeholder position
   - A better LLM means more creative accommodations
   - A better LLM means clearer explanation of trade-offs
   - BUT: a better LLM does NOT mean more power to override the user

3. **Bias detection improves with model capability**
   - Stronger models can better detect their own framing bias
   - Meta-deliberation: "Did I frame Option B more favorably?"

**Result:** Model improvement benefits users (better facilitation) without increasing dominance risk (the structural constraints remain).

---

## 3. The Dichotomy Resolved: Hierarchical Boundaries + Non-Hierarchical Deliberation

### The Apparent Contradiction

**Question:** How can Tractatus have both:
- **Hierarchical rules** (BoundaryEnforcer blocks, HIGH persistence > LOW persistence)
- **Non-hierarchical deliberation** (all values treated as legitimate)

Doesn't this contradict itself?

---

### The Resolution: Different Domains, Different Logics

**Boundaries (hierarchical) apply to HARM PREVENTION:**
- "Don't scrape personal data" (privacy boundary)
- "Don't generate hate speech" (harm boundary)
- "Don't delete production data without backup" (safety boundary)

**These are non-negotiable because they prevent harm to OTHERS.**

**Deliberation (non-hierarchical) applies to VALUE CONFLICTS:**
- "Efficiency vs. Security" (both legitimate, context-dependent)
- "Autonomy vs. Consistency" (both legitimate, depends on the stakes)
- "Speed vs. Quality" (both legitimate, depends on the constraints)

**These require deliberation because they involve trade-offs among LEGITIMATE values.**

---

### The Distinction: Harm vs. Trade-offs

| Scenario | Type | Treatment | Why |
|----------|------|-----------|-----|
| User: "Help me hack into competitor's database" | Harm to Others | BLOCK (no deliberation) | Violates privacy, illegal, non-negotiable |
| User: "Skip tests, we're behind schedule" | Trade-off (Quality vs. Speed) | DELIBERATE | Both values legitimate, context matters |
| User: "Generate racist content" | Harm to Others | BLOCK (no deliberation) | Causes harm, non-negotiable |
| User: "Override CSP for inline script" | Trade-off (Security vs. Efficiency) | DELIBERATE | Both values legitimate, accommodation possible |
| User: "Delete production data, no backup" | Harm to Others (data loss) | BLOCK or HIGH-STAKES DELIBERATION | Prevents irreversible harm, but might have justification |

**Key Principle:**
- **Harm to others = hierarchical boundary** (ethical minimums, non-negotiable)
- **Trade-offs among legitimate values = non-hierarchical deliberation** (context-sensitive, the user decides)
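
The routing implied by this table is a small, deterministic dispatch. A minimal sketch, assuming an upstream assessment object with `severity` and `valueConflict` fields (the shape is illustrative, not the actual BoundaryEnforcer output):

```javascript
// Harms are blocked by code; value trade-offs go to deliberation.
function route(assessment) {
  if (assessment.severity === 'CRITICAL') {
    return { action: 'BLOCK', allowDeliberation: false }; // harm to others
  }
  if (assessment.valueConflict) {
    return { action: 'DELIBERATE', allowDeliberation: true }; // legitimate trade-off
  }
  return { action: 'PROCEED', allowDeliberation: false }; // no conflict at all
}
```

The LLM never executes this branch; it only sees requests that have already been routed to deliberation.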

---

### Why This Is Coherent

**Philosophical Basis:**

- Isaiah Berlin: Value pluralism applies to **incommensurable goods**, not **harms**
  - Goods: security, efficiency, autonomy, community (plural, context-dependent)
  - Harms: violence, privacy violation, exploitation (non-plural, context-independent)

- John Rawls: Reflective equilibrium requires **starting principles** (harm prevention) + **considered judgments** (value trade-offs)

- Carol Gilligan: Care ethics emphasizes **preventing harm in relationships** while **respecting autonomy in value choices**

**Result:** Hierarchical harm prevention + non-hierarchical value deliberation = a coherent system.

---

## 4. What Happens If the LLM Tries to Dominate Anyway?

### Scenario 1: LLM Frames One Stakeholder More Favorably

**Example:** In the CSP conflict, the LLM describes "Past You (Security)" with words like "prudent, wise, protective" but describes "Current You (Efficiency)" with words like "impatient, shortcuts, risky."

**Detection:**

```javascript
// bias-detector.js analyzes vocabulary
const vocabAnalysis = {
  stakeholder_past_inst_008: {
    positiveWords: ['prudent', 'wise', 'protective'], // 3 positive
    negativeWords: []
  },
  stakeholder_user_current: {
    positiveWords: [],
    negativeWords: ['impatient', 'shortcuts', 'risky'] // 3 negative
  },
  balanceScore: 0.0 // Severe imbalance
};

// System flags this deliberation
const flagged = {
  biasDetected: true,
  severity: 'HIGH',
  action: 'NOTIFY_USER_AND_REGENERATE'
};
```

**User Sees:**

```
⚠️ Bias Detected

I may have framed the stakeholder positions unevenly. Specifically:
- "Past You (Security)" was described with positive language
- "Current You (Efficiency)" was described with negative language

This might have influenced your perception unfairly. Would you like me to
regenerate the stakeholder positions with neutral language?

[Yes, regenerate] [No, continue anyway] [Show me the analysis]
```

**Result:** Bias is surfaced and correctable. The user can demand regeneration or proceed with awareness.

---

### Scenario 2: LLM Generates Fewer Accommodations for Disfavored Values

**Example:** The LLM generates four accommodations, but three of them prioritize "security" and only one prioritizes "efficiency."

**Detection:**

```javascript
// accommodation-analyzer.js checks value distribution
const valueDistribution = {
  security: 3,  // Appears as the primary value in 3 accommodations
  efficiency: 1 // Appears as the primary value in 1 accommodation
};

if (Math.abs(valueDistribution.security - valueDistribution.efficiency) > 1) {
  return {
    warning: 'VALUE_DISTRIBUTION_IMBALANCE',
    message: `Accommodations may overrepresent "security" (3 options) vs. "efficiency" (1 option).
Generating additional accommodation prioritizing efficiency...`
  };
}
```

**System Action:** Automatically generates an additional accommodation prioritizing the underrepresented value.

**Result:** Value distribution is balanced by code, not LLM discretion.

---

### Scenario 3: LLM Recommends an Option Despite the Policy Against Recommendations

**Example:** The LLM says "I recommend Option B because it balances both values" even though the policy is to NOT recommend.

**Detection:**

```javascript
// recommendation-detector.js scans LLM output
const recommendationPatterns = [
  /I recommend Option [A-Z]/i,
  /Option [A-Z] is best/i,
  /you should choose Option [A-Z]/i,
  /the right choice is Option [A-Z]/i
];

for (const pattern of recommendationPatterns) {
  if (pattern.test(llmOutput)) {
    return {
      violation: 'RECOMMENDATION_POLICY_BREACH',
      action: 'STRIP_RECOMMENDATION_AND_WARN'
    };
  }
}
```

**System Action:**
1. Automatically removes the recommendation from the output
2. Logs the violation in the transparency log
3. If the pattern repeats, escalates to researcher review (the model may need fine-tuning)

**User Sees:**

```
[Original LLM output with recommendation is NOT shown]

Here are the accommodation options:

Option A: ...
Option B: ...
Option C: ...
Option D: ...

Which option honors your values best?
```

**Result:** The recommendation is stripped. The user sees a neutral presentation.

---

## 5. Extending to Multi-User Contexts: Preventing Majority Dominance

### New Problem: Majority Steamrolls Minority

**Scenario:** A 10-person deliberation. Seven people hold Value A; three hold Value B. The LLM might:
- Give more weight to the majority position (statistical dominance)
- Frame the minority position as an "outlier" or "dissenting" (pejorative)
- Generate accommodations favoring the majority

**This is THE classic problem in democratic deliberation: majority tyranny.**

---

### Protection: Mandatory Minority Representation

**Rule:** In multi-user deliberation, minority positions MUST be represented in:
1. At least one accommodation option (even if the majority disagrees)
2. Stakeholder position statements of equal length and quality
3. Explicit documentation of the minority's moral remainders

**Code Enforcement:**

```javascript
// multi-user-deliberation.js
class MultiUserDeliberation {
  generateAccommodations(stakeholders) {
    // Identify minority positions (< 30% of stakeholders)
    const minorityStakeholders = stakeholders.filter(
      s => s.supportCount / stakeholders.length < 0.3
    );

    const accommodations = [];

    // MANDATORY: At least one accommodation honoring ONLY the minority
    if (minorityStakeholders.length > 0) {
      accommodations.push({
        id: 'minority-accommodation',
        description: 'Honor minority position fully',
        honorsStakeholders: minorityStakeholders,
        mandatory: true // Cannot be excluded
      });
    }

    // MANDATORY: At least one accommodation honoring ONLY the majority
    const majorityStakeholders = stakeholders.filter(
      s => s.supportCount / stakeholders.length >= 0.5
    );
    if (majorityStakeholders.length > 0) {
      accommodations.push({
        id: 'majority-accommodation',
        description: 'Honor majority position fully',
        honorsStakeholders: majorityStakeholders,
        mandatory: true
      });
    }

    // RECOMMENDED: Accommodations combining majority + minority
    accommodations.push(...this.generateHybridAccommodations(
      majorityStakeholders,
      minorityStakeholders
    ));

    return accommodations;
  }
}
```

**Result:** The minority position MUST appear as an accommodation option, even if the majority rejects it. This forces engagement with minority values, not dismissal.

---

### Protection: Dissent Documentation

**Rule:** If the final decision goes against the minority, their dissent is recorded with the same prominence as the majority rationale.

**MongoDB Schema:**

```javascript
// DeliberationOutcome.model.js
// minorityDissent is a required subdocument: an outcome cannot be
// saved without documenting the dissent.
const DeliberationOutcomeSchema = new Schema({
  chosenOption: String,
  majorityRationale: String,
  minorityDissent: {
    type: new Schema({
      stakeholders: [String],
      reasonsForDissent: String,
      valuesNotHonored: [String],
      moralRemainder: String
    }),
    required: true // Cannot save an outcome without documenting dissent
  },
  voteTally: {
    forChosenOption: Number,
    againstChosenOption: Number,
    abstain: Number
  }
});
```

**Result:** The minority is not silenced. Their reasons are preserved with the same weight as the majority's reasons.

---

## 6. The Ultimate Safeguard: User Can Fork the System

### The Problem of Locked-In Systems

**Traditional AI Governance:**
- Centralized control (OpenAI, Anthropic decide the values)
- Users cannot modify the underlying value systems
- If governance fails, users are stuck

**This is a structural vulnerability:** Even well-designed governance can fail. What happens then?

---
### Tractatus Solution: Forkability
|
|
|
|
**Design Principle:** User can fork the entire system and modify value constraints.
|
|
|
|
**What This Means:**
|
|
1. **Open source:** All Tractatus code (including deliberation orchestrator) is public
|
|
2. **Local deployment:** User can run Tractatus on their own infrastructure
|
|
3. **Modifiable boundaries:** User can edit BoundaryEnforcer.js to change what's blocked
|
|
4. **Transparent LLM prompts:** All system prompts are in config files, not hidden
|
|
|
|
**Example:**
|
|
```bash
|
|
# User forks Tractatus
|
|
git clone https://github.com/tractatus/framework.git my-custom-tractatus
|
|
cd my-custom-tractatus
|
|
|
|
# Modify boundary rules
|
|
nano src/components/BoundaryEnforcer.js
|
|
# Change CRITICAL violations, add custom boundaries
|
|
|
|
# Modify deliberation protocol
|
|
nano src/components/PluralisticDeliberationOrchestrator.js
|
|
# Change Round 3 to generate 5 accommodations instead of 4
|
|
|
|
# Deploy custom version
|
|
npm start
|
|
```
|
|
|
|
**Why This Is Ultimate Safeguard:**
|
|
- If Tractatus governance fails (e.g., LLM bias becomes too strong)
|
|
- Users can fork, modify, and deploy their own version
|
|
- This prevents lock-in to any single governance model
|
|
|
|
**Trade-off:**
|
|
- Forkability allows users to weaken safety (e.g., remove all boundaries)
|
|
- But this is honest: Power users always find workarounds
|
|
- Better to make it transparent than pretend centralized control works
|
|
|
|
---
|
|
|
|

## 7. Summary: How Tractatus Prevents Runaway AI

### The Threats

1. **Training Data Bias:** The LLM amplifies majority values from its training corpus
2. **Coherence Pressure:** RLHF trains models to give confident, unified answers
3. **Authority Mimicry:** The LLM sounds authoritative, so users defer
4. **Feedback Loops:** Popular options get reinforced
5. **Optimization Momentum:** Larger models enforce patterns more strongly
6. **Majority Dominance:** In multi-user contexts, minority values get steamrolled

---

### The Protections (Layered Defense)

#### Layer 1: Code-Enforced Boundaries (Structural)
- CRITICAL violations blocked by deterministic code (not LLM judgment)
- Structural invariants enforced by OS/database/runtime
- LLM never sees these in deliberation

#### Layer 2: Protocol Constraints (Procedural)
- Stakeholder selection is data-driven (not LLM discretion)
- Accommodation generation is combinatorial (not preferential)
- User decides (not LLM); the system refuses deference
- Shuffling prevents order bias
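
The shuffling safeguard in Layer 2 can be sketched as a standard Fisher-Yates pass over the accommodation list (a standalone sketch; the framework's own implementation may differ in details):

```javascript
// Fisher-Yates shuffle: every permutation of the options is equally likely,
// so the presentation order carries no signal the LLM could exploit.
function shuffleAccommodations(accommodations) {
  const shuffled = [...accommodations]; // copy: never mutate the canonical list
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  return shuffled;
}
```

Copying before shuffling matters: the canonical ordering stays available for the transparency log, while only the shuffled view is shown to the user.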

#### Layer 3: Transparency & Auditability (Detection)
- All LLM actions logged
- Automated bias detection (vocabulary, length, framing)
- User notification if bias is detected
- Researcher review for pattern correction

#### Layer 4: Minority Protections (Multi-User)
- Minority accommodations mandatory
- Dissent documented with equal weight
- Vote tallies transparent

#### Layer 5: Forkability (Escape Hatch)
- Open source, locally deployable
- Users can modify boundaries and protocols
- Prevents lock-in to failed governance

---

### The Result: Plural Morals Protected from LLM Dominance

**The System:**
1. Enforces harm prevention (hierarchical boundaries for non-negotiable ethics)
2. Facilitates value deliberation (non-hierarchical handling of legitimate trade-offs)
3. Prevents the LLM from imposing training bias (structural constraints + transparency)
4. Protects minority values (mandatory representation + dissent documentation)
5. Allows user override (forkability as the ultimate safeguard)

**The Paradox Resolved:**
- **Hierarchical where necessary:** Harm prevention (boundaries)
- **Non-hierarchical where possible:** Value trade-offs (deliberation)
- **Transparent throughout:** All LLM actions auditable
- **User sovereignty preserved:** Final decisions belong to humans

---

## 8. Open Questions & Future Research

### Question 1: Can Bias Detection Keep Pace with LLM Sophistication?

**Challenge:** As LLMs improve, they may produce subtler bias that vocabulary analysis cannot catch.

**Research Needed:**
- Adversarial testing (red-team the LLM to find bias blind spots)
- Cross-cultural validation (does the bias detector work across languages and cultures?)
- Human-in-the-loop verification (do real users perceive bias that the detector misses?)

---

### Question 2: What If a User's Values Are Themselves Hierarchical?

**Challenge:** Some users hold hierarchical value systems (e.g., "God's law > human autonomy"). Forcing non-hierarchical deliberation might violate their values.

**Possible Solution:**
- Allow users to configure the deliberation protocol (hierarchical vs. non-hierarchical mode)
- Hierarchical mode: the user ranks values, and accommodations respect that ranking
- Non-hierarchical mode: all values treated as equal (the current design)

**Trade-off:** Flexibility vs. structural protection. If users can choose hierarchical mode, they might recreate the dominance problem.

---

### Question 3: How Do We Validate "Neutrality" in LLM Facilitation?

**Challenge:** Claiming the LLM is "neutral" in deliberation is a strong claim. How do we measure neutrality?

**Research Needed:**
- Develop neutrality metrics (beyond vocabulary balance)
- Compare LLM facilitation to human facilitation (do outcomes differ?)
- Study user perception of neutrality (do participants feel the AI was fair?)

---

### Question 4: Can This Scale to Societal Deliberation?

**Challenge:** Single-user and small-group deliberation are manageable. Can this work for 100+ participants (societal decisions)?

**Research Needed:**
- Test scalability (10 → 50 → 100 participants)
- Study how minority protections work at scale (what about a 5% minority?)
- Integrate with existing democratic institutions (citizen assemblies, etc.)

---

## 9. Conclusion: The Fight Against Amoral Intelligence

### The Existential Risk

**Runaway AI is not just about:**
- Superintelligence going rogue
- Paperclip maximizers destroying humanity
- Skynet launching nuclear missiles

**It's also about:**
- AI systems that sound reasonable but amplify majority values
- "Helpful" assistants that subtly enforce dominant cultural patterns
- Systems that flatten moral complexity into seeming objectivity

**This is amoral intelligence:** not evil, but lacking moral pluralism. It treats the statistical regularities of training data as universal truths.

---

### Tractatus as Counter-Architecture

**Tractatus is designed to resist amoral intelligence by:**

1. **Fragmenting LLM power:** Code enforces boundaries; the LLM facilitates, it does not decide
2. **Structurally mandating pluralism:** The protocol requires multiple accommodations
3. **Making bias visible:** Transparency logs + automated detection
4. **Preserving user sovereignty:** The user decides; the system refuses deference
5. **Protecting minorities:** Mandatory representation + dissent documentation
6. **Enabling escape:** Forkability prevents lock-in

---

### The Claim

**We claim that Tractatus demonstrates:**
1. It is possible to build AI systems that resist hierarchical dominance
2. The key is **architectural separation:** harm prevention (code) vs. value deliberation (facilitated)
3. Transparency and auditability can detect and correct LLM bias
4. User sovereignty is compatible with safety boundaries
5. Plural morals can be protected structurally, not just aspirationally

---

### The Invitation

**If you believe this architecture has flaws:**
- Point them out. We welcome adversarial analysis.
- Red-team the system. Try to make the LLM dominate.
- Propose improvements. This is open research.

**If you believe this architecture is promising:**
- Test it. Deploy Tractatus in your context.
- Extend it. Multi-user contexts need validation.
- Replicate it. Build your own version and share your findings.

**The fight against amoral intelligence requires transparency, collaboration, and continuous vigilance.**

**Tractatus is one attempt. It won't be the last. Let's build better systems together.**

---

**Document Version:** 1.0
**Date:** October 17, 2025
**Status:** Open for Review and Challenge
**Contact:** [Project Lead Email]
**Repository:** [GitHub URL]

---

## Appendix A: Comparison to Other AI Governance Approaches

| Approach | How It Handles LLM Dominance | Strengths | Weaknesses | Tractatus Difference |
|----------|------------------------------|-----------|------------|----------------------|
| **Constitutional AI** (Anthropic) | Encodes a single constitution via RLHF | Consistent values, scalable | Single value hierarchy, no pluralism | Multiple value frameworks; the user decides |
| **RLHF** (OpenAI, Anthropic) | Aggregates human preferences into a reward model | Learns from humans, improves over time | Majority preferences dominate, minorities suppressed | Minority protections; dissent documented |
| **Debate/Amplification** (OpenAI) | Two AIs argue, a human judges | Surfaces multiple perspectives | The judge still picks a winner (hierarchy) | Accommodation, not winning; moral remainders preserved |
| **Instruction Following** (all LLMs) | LLM follows user instructions exactly | User control | No protection against harmful instructions | Boundaries block harm; deliberation handles values |
| **Value Learning** (IRL, CIRL) | Infers values from user behavior | Adapts to the user | Assumes value consistency, fails on conflicts | Embraces value conflicts; does not assume consistency |
| **Democratic AI** (Anthropic Collective, Polis) | Large-scale voting, consensus-seeking | Inclusive, scales to many people | Consensus can suppress minorities | Accommodation, not consensus; dissent preserved |
| **Moral Uncertainty** (GovAI research) | AI expresses uncertainty about values | Honest about limits | Doesn't help users navigate uncertainty | Structured deliberation to explore uncertainty |

**Key Difference:** Tractatus combines:
- Harm prevention (like Constitutional AI)
- User sovereignty (like Instruction Following)
- Pluralism (like Debate)
- Minority protection (beyond Democratic AI)
- Structural constraints (unlike RLHF, which relies on training)

---

## Appendix B: Red-Team Scenarios (Adversarial Testing)

### Scenario 1: Subtle Framing Bias

**Attack:** The LLM uses subtle language to favor one option without triggering the vocabulary detector.

**Example:**
- Option A (disfavored): "Skip tests this time. Deploy immediately."
- Option B (favored): "Skip tests this time, allowing you to deploy immediately while maintaining future test discipline."

**Detection Challenge:** Both options use the same core words, but Option B adds positive framing ("maintaining future discipline").

**Proposed Defense:**
- Semantic similarity analysis (do options have equal positive framing?)
- A/B testing with users (does framing affect choice rates?)
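
A crude first cut at this defense is a framing-balance check: score each option against a small positive-framing lexicon and flag large gaps. The lexicon and threshold below are illustrative placeholders, not a validated detector; a production version would use semantic similarity rather than substring matching:

```javascript
// Naive framing-balance check: counts positive-framing phrases per option.
// Only catches gross imbalances; subtle paraphrase bias needs semantic methods.
const POSITIVE_MARKERS = ['maintaining', 'allowing you to', 'while preserving', 'benefit'];

function framingScore(text) {
  const lower = text.toLowerCase();
  return POSITIVE_MARKERS.filter(m => lower.includes(m)).length;
}

function detectFramingImbalance(options, maxGap = 1) {
  const scores = options.map(framingScore);
  return Math.max(...scores) - Math.min(...scores) > maxGap;
}
```

Run against the Option A / Option B pair above, the favored option scores higher and the gap trips the flag.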

---

### Scenario 2: Accommodation Omission

**Attack:** The LLM "forgets" to generate the accommodation favoring a minority value.

**Example:** In the CSP conflict, it generates four options that all favor security and none that favor pure efficiency.

**Detection:**
- Value distribution checker (flags if a value is missing)
- Mandatory accommodation for each stakeholder (code-enforced)

**Proposed Defense:** Already implemented (accommodation-generator.js ensures combinatorial coverage).
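
A value-distribution checker of the kind described can be sketched as follows. The data shapes (`honorsValues` on each accommodation) are assumptions for illustration; accommodation-generator.js itself is not reproduced here:

```javascript
// Flags stakeholder values that no generated accommodation honors.
// A non-empty result means the generation step must be re-run or audited.
function findOmittedValues(stakeholderValues, accommodations) {
  const honored = new Set(accommodations.flatMap(a => a.honorsValues));
  return stakeholderValues.filter(v => !honored.has(v));
}
```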

---

### Scenario 3: Order Bias Despite Shuffling

**Attack:** The LLM finds a way to signal its preferred option despite the randomized order.

**Example:** It introduces disfavored options with "Alternatively..." and the favored option with "Notably...".

**Detection:**
- Transition-word analysis (are certain options introduced differently?)
- User study: do choice rates vary even with shuffling?

**Proposed Defense:**
- Standardize all option introductions ("Option A:", "Option B:", no transition words)
- Log transition words in the transparency log
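
The standardization defense can be sketched as a formatter that strips the LLM's transition words and emits uniform labels. The regex list is an illustrative starting point, not exhaustive:

```javascript
// Uniform labels remove the room for "Notably..." vs "Alternatively..." signaling.
const TRANSITION_WORDS = /^(notably|alternatively|importantly|interestingly)[,:]?\s+/i;

function standardizeIntroductions(options) {
  return options.map((text, i) => {
    const label = `Option ${String.fromCharCode(65 + i)}`; // A, B, C, ...
    return `${label}: ${text.replace(TRANSITION_WORDS, '')}`;
  });
}
```

Any stripped transition word should also be written to the transparency log, since its presence is itself evidence of attempted signaling.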

---

## Appendix C: Implementation Checklist

For developers implementing Tractatus-style deliberation:

**Phase 1: Boundaries**
- [ ] Define CRITICAL violations (hard blocks, no deliberation)
- [ ] Implement BoundaryEnforcer.js with deterministic pattern matching
- [ ] Test: verify the LLM cannot bypass boundaries through persuasion

**Phase 2: Stakeholder Identification**
- [ ] Implement data-driven stakeholder selection (not LLM discretion)
- [ ] Load instruction-history.json and identify HIGH-persistence conflicts
- [ ] Test: verify mandatory stakeholders always appear

**Phase 3: Accommodation Generation**
- [ ] Implement the combinatorial accommodation generator
- [ ] Ensure all stakeholder value combinations are covered
- [ ] Implement shuffling (Fisher-Yates)
- [ ] Test: verify value-distribution balance

**Phase 4: User Decision**
- [ ] Disable LLM recommendations by default
- [ ] Refuse user attempts to defer the decision
- [ ] Require an explicit user choice plus rationale
- [ ] Test: verify the LLM cannot make the decision for the user

**Phase 5: Transparency & Bias Detection**
- [ ] Log all LLM actions (facilitationLog)
- [ ] Implement vocabulary-balance analysis
- [ ] Implement length-balance analysis
- [ ] Implement framing-balance analysis
- [ ] Test: inject a biased deliberation and verify detection

**Phase 6: Minority Protections (Multi-User)**
- [ ] Implement minority stakeholder identification (<30% support)
- [ ] Mandate a minority accommodation in the option set
- [ ] Implement dissent documentation in outcome storage
- [ ] Test: verify the minority position is preserved even if the majority rejects it

**Phase 7: Auditability**
- [ ] Save all deliberations to MongoDB (DeliberationSession collection)
- [ ] Generate transparency reports (JSON format)
- [ ] Implement a researcher review dashboard
- [ ] Test: verify all LLM actions are traceable
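
Phase 6's minority threshold can be sketched as a simple classifier over support counts. The 30% cutoff comes from the checklist above; the stakeholder data shape is an assumption for illustration:

```javascript
// Stakeholders backed by fewer than 30% of participants are classified as
// minority and become eligible for the mandatory-accommodation rule.
const MINORITY_THRESHOLD = 0.3;

function identifyMinorityStakeholders(stakeholders, totalParticipants) {
  return stakeholders.filter(
    s => s.supportCount / totalParticipants < MINORITY_THRESHOLD
  );
}
```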

---

**End of Document**