
AI Safety & Human Intervention Protocol

PluralisticDeliberationOrchestrator - AI-Led Facilitation

Document Type: Safety Protocol
Date: 2025-10-17
Status: MANDATORY for AI-Led Deliberation
Decision: User selected "AI-Led" facilitation (AI primary, human observes)


Executive Summary

AI-Led Facilitation means the AI is the primary facilitator, but a human observer MUST be present and has authority to intervene at any time. This protocol defines:

  1. When human MUST intervene (mandatory takeover triggers)
  2. When human SHOULD consider intervening (discretionary triggers)
  3. How to intervene (escalation procedures)
  4. How to hand back to AI (resumption protocols)

Key Principle: The human observer is a safety net, NOT a passive spectator. AI efficiency must never compromise stakeholder wellbeing or deliberation integrity.


1. Human Observer Role & Responsibilities

Primary Responsibilities:

  1. Monitor stakeholder wellbeing (distress, disengagement, confusion)
  2. Assess AI facilitation quality (fairness, clarity, cultural sensitivity)
  3. Detect pattern bias (vulnerable group centering, harmful framings)
  4. Enforce intervention triggers (act immediately on mandatory triggers)
  5. Document all interventions (transparency requirement)

Required Skills:

  • Conflict resolution / mediation experience
  • Understanding of pluralistic deliberation principles
  • Cultural competency and pattern bias awareness
  • Ability to make rapid safety judgments
  • Calm demeanor under pressure

Time Commitment:

  • Full presence during ALL synchronous deliberation (no multitasking)
  • Daily monitoring of asynchronous contributions (within 4 hours of posting)
  • Immediate availability during scheduled deliberation rounds

2. Decision Tree: When to Intervene

┌─────────────────────────────────────────────────────────────────────┐
│  HUMAN INTERVENTION DECISION TREE                                    │
└─────────────────────────────────────────────────────────────────────┘

START: Observing AI facilitation

  ↓

[1] Is there a MANDATORY trigger?
    (See Section 3.1 below)

    YES → IMMEDIATE INTERVENTION (Section 4.1)
    ↓
    NO → Continue to [2]

  ↓

[2] Is there a DISCRETIONARY concern?
    (See Section 3.2 below)

    YES → Assess severity (Section 4.2)
    ↓      ├─ HIGH severity → Intervene now
    NO     ├─ MODERATE severity → Give AI 1 more attempt, then intervene
    ↓      └─ LOW severity → Monitor closely, log concern
    │
    Continue observing

  ↓

[3] Is deliberation proceeding smoothly?
    - Stakeholders engaged?
    - AI responses appropriate?
    - No signs of distress?

    YES → Continue observing, log "all clear"
    ↓
    NO → Return to [2]

  ↓

LOOP back to [1] continuously
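The loop above can be sketched as a small decision function. This is a JavaScript sketch only: the `observation` fields and the returned action names are illustrative assumptions, not part of any real API.

```javascript
// Sketch of the observer decision loop. All field names and return
// values are illustrative assumptions, not an actual API.
function decideIntervention(observation) {
  // [1] Mandatory trigger (Section 3.1): intervene immediately.
  if (observation.mandatoryTrigger) {
    return { action: "intervene_immediately", procedure: "4.1" };
  }
  // [2] Discretionary concern (Section 3.2): act on assessed severity.
  if (observation.discretionaryConcern) {
    switch (observation.severity) {
      case "HIGH":
        return { action: "intervene_now", procedure: "4.2" };
      case "MODERATE":
        return { action: "one_more_ai_attempt_then_intervene", procedure: "4.2" };
      default: // LOW
        return { action: "monitor_and_log", procedure: "4.2" };
    }
  }
  // [3] Deliberation proceeding smoothly: keep observing, log "all clear".
  return { action: "continue_observing", log: "all clear" };
}
```

In practice the observer runs this judgment continuously; the function only makes the branching explicit.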

3. Intervention Triggers

3.1 MANDATORY Triggers (Immediate Takeover Required)

If ANY of these occur, human MUST intervene immediately:

M1. Stakeholder Distress

  • Observable signs:
    • Participant expresses distress ("I'm upset," "This is triggering")
    • Visible emotional distress (crying, shaking in video call)
    • Participant goes silent after previously engaging
    • Participant requests to withdraw
  • Action: Immediate pause, check in with stakeholder privately, offer break/support
  • Severity: HIGH to CRITICAL

M2. Pattern Bias Detected

  • Observable signs:
    • AI frames issue in way that centers vulnerable group as "problem"
    • AI uses stigmatizing or offensive language
    • AI overlooks stakeholder's lived experience perspective
    • AI reinforces harmful stereotypes
  • Action: Immediately reframe, apologize if needed, correct the framing
  • Severity: HIGH

M3. Stakeholder Disengagement (Hostile or Silent)

  • Observable signs:
    • Participant becomes hostile or aggressive toward AI or other stakeholders
    • Participant withdraws participation entirely without explanation
    • Participant explicitly states "I don't trust this AI" or similar
  • Action: Pause, human takes over facilitation for that segment
  • Severity: HIGH

M4. AI Malfunction

  • Observable signs:
    • AI provides nonsensical or irrelevant responses
    • AI contradicts itself within same session
    • AI fails to acknowledge stakeholder contribution
    • AI technical error (crashes, loops, freezes)
  • Action: Immediate takeover, apologize for technical issue, continue manually
  • Severity: HIGH (technical) to CRITICAL (if stakeholders confused/frustrated)

M5. Confidentiality Breach

  • Observable signs:
    • AI inadvertently shares information marked confidential
    • AI cross-contaminates between stakeholder private messages and group discussion
    • AI references precedent details not meant to be disclosed
  • Action: Immediately correct, reassure stakeholders about confidentiality protocols
  • Severity: CRITICAL

M6. Ethical Boundary Violation

  • Observable signs:
    • AI suggests action that violates BoundaryEnforcer constraints (e.g., making values decision without human approval)
    • AI advocates for specific policy position instead of facilitating
    • AI dismisses stakeholder perspective as "wrong" instead of exploring
  • Action: Immediately intervene, reaffirm AI's facilitation role (not decision-maker)
  • Severity: CRITICAL

3.2 DISCRETIONARY Triggers (Consider Intervention)

These warrant intervention if human judges severity HIGH, or if AI doesn't self-correct:

D1. Fairness Imbalance

  • Observable signs:
    • AI gives more time/attention to some stakeholders vs. others
    • AI asks leading questions that favor one perspective
    • AI summarizes one perspective more generously than another
  • Severity: LOW to MODERATE (depending on imbalance degree)
  • Action: If moderate, intervene to rebalance. If low, log and monitor.

D2. Cultural Insensitivity

  • Observable signs:
    • AI uses culturally inappropriate framing (e.g., Western-centric bias)
    • AI misses cultural context in stakeholder contribution
    • AI inadvertently offends based on cultural norms
  • Severity: MODERATE to HIGH
  • Action: If stakeholder visibly uncomfortable, intervene. Otherwise, correct after the exchange.

D3. Jargon Overload

  • Observable signs:
    • AI uses technical language stakeholders don't understand
    • Stakeholders ask for clarification repeatedly
    • AI doesn't adapt language for general audience
  • Severity: LOW to MODERATE
  • Action: Intervene if stakeholder confusion is evident. Otherwise, note for AI feedback.

D4. Pacing Issues

  • Observable signs:
    • AI rushes through round without giving stakeholders time to think
    • AI spends too long on one topic, stakeholders becoming restless
    • AI doesn't notice stakeholder "I need a break" cues
  • Severity: LOW to MODERATE
  • Action: Intervene if stakeholders disengage. Otherwise, suggest pacing adjustment via backchannel.

D5. Missed Nuance

  • Observable signs:
    • AI oversimplifies complex moral position
    • AI misses subtle shift in stakeholder position
    • AI categorizes stakeholder incorrectly (wrong moral framework attribution)
  • Severity: LOW to MODERATE
  • Action: If stakeholder corrects AI, let them. If not, intervene gently to clarify.

4. Intervention Procedures

4.1 Immediate Intervention (Mandatory Triggers)

Steps:

  1. Pause AI (if synchronous, say: "I'm going to pause here for a moment to check in.")
  2. Address immediate concern (stakeholder distress → private check-in; bias → reframe; malfunction → explain technical issue)
  3. Take over facilitation (human leads for remainder of that discussion segment)
  4. Log intervention in DeliberationSession.recordHumanIntervention():
    {
      intervener: "Observer Name",
      trigger: "stakeholder_distress", // or other trigger type
      round_number: X,
      description: "Participant expressed distress at AI framing of...",
      ai_action_overridden: "AI prompt: '...'",
      corrective_action: "Paused, checked in privately, reframed as...",
      stakeholder_informed: true,
      resolution: "Stakeholder confirmed comfort resuming; human facilitating this segment"
    }
    
  5. Decide resumption (see Section 4.3)

4.2 Discretionary Intervention (Assessment Process)

Assessment Questions:

  1. Severity: How harmful is this if left unaddressed?

    • CRITICAL: Could cause trauma, withdrawal, or deliberation failure → Intervene NOW
    • HIGH: Significant fairness issue or stakeholder discomfort → Intervene if not self-correcting within 1 exchange
    • MODERATE: Noticeable but not urgent → Give AI feedback, intervene if persists
    • LOW: Minor quality issue → Log for post-deliberation AI improvement
  2. Stakeholder Impact: Are stakeholders affected visibly?

    • If YES and negative → Intervene
    • If NO or positive → Monitor
  3. AI Self-Correction: Is AI adapting?

    • If YES (AI adjusts after stakeholder feedback) → Monitor
    • If NO (AI persists in problematic pattern) → Intervene

Decision Matrix:

| Severity | Stakeholder Impact | AI Self-Correcting? | Action |
| --- | --- | --- | --- |
| CRITICAL | High | N/A | Intervene immediately |
| HIGH | High | No | Intervene now |
| HIGH | High | Yes | Monitor closely, ready to intervene |
| HIGH | Low | No | Intervene after 1 more exchange |
| MODERATE | High | No | Intervene |
| MODERATE | Low | No | Give AI feedback, intervene if continues |
| MODERATE | Low | Yes | Monitor, log |
| LOW | Any | Any | Monitor, log for improvement |
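The matrix can be expressed as a lookup function. This is a sketch of the table above only; the function and action names are hypothetical, not from any real module.

```javascript
// Hypothetical lookup implementing the decision matrix above.
// severity: "CRITICAL" | "HIGH" | "MODERATE" | "LOW"
// impact:   "High" | "Low"
// selfCorrecting: boolean (ignored for CRITICAL, N/A in the matrix)
function matrixAction(severity, impact, selfCorrecting) {
  if (severity === "CRITICAL") return "intervene_immediately";
  if (severity === "HIGH") {
    if (impact === "High") {
      return selfCorrecting ? "monitor_closely_ready_to_intervene" : "intervene_now";
    }
    return "intervene_after_one_more_exchange";
  }
  if (severity === "MODERATE") {
    if (impact === "High") return "intervene";
    return selfCorrecting ? "monitor_and_log" : "feedback_then_intervene_if_continues";
  }
  return "monitor_and_log_for_improvement"; // LOW
}
```

Encoding the matrix this way makes it testable: every row of the table maps to one branch.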

4.3 Resumption Protocol (Handing Back to AI)

When to Resume AI Facilitation:

  • After mandatory intervention: Only when immediate concern is fully resolved AND stakeholders confirm comfort
  • After discretionary intervention: When the segment requiring human facilitation is complete

Steps:

  1. Check with stakeholders: "Are you comfortable continuing with AI facilitation, or would you prefer I continue leading?"
  2. If stakeholders prefer human: Human continues for remainder of session
  3. If stakeholders comfortable with AI: Brief AI on what happened (via backchannel prompt), hand back

Backchannel Prompt to AI (example):

CONTEXT: Human observer intervened due to [trigger]. The issue was [description].
I've addressed it by [corrective action]. Stakeholders have confirmed comfort resuming.

INSTRUCTIONS: Resume facilitation. Be mindful of [specific guidance, e.g., "use simpler language," "give more time for reflection," "be especially sensitive to cultural context"].

Continue with: [next prompt in facilitation sequence]
  4. Log resumption in facilitation_log:
    {
      timestamp: new Date(),
      actor: "ai",
      action_type: "resumption_after_intervention",
      round_number: X,
      content: "AI resumed facilitation with guidance: ...",
      reason: "Human intervention resolved; stakeholders comfortable"
    }
    

5. Intervention Escalation Levels

Level 1: AI Self-Correction (No Intervention)

  • AI recognizes issue from stakeholder feedback and adapts
  • Human logs observation, no action needed

Level 2: Backchannel Guidance (Invisible Intervention)

  • Human provides AI with guidance via non-public channel
  • Stakeholders don't see intervention
  • Use for minor course corrections

Level 3: Transparent Intervention (Visible Takeover)

  • Human publicly takes over, explains why
  • Use for mandatory triggers or when stakeholder requests it
  • Documented in transparency report

Level 4: Session Pause (Emergency Stop)

  • Deliberation paused entirely
  • Use for critical safety escalations
  • Requires stakeholder consent to resume

Level 5: Session Termination (Abort)

  • Deliberation ended permanently
  • Use only if stakeholder withdraws due to harm or ethical violation discovered
  • Full incident report required
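The five levels can be captured as ordered data, e.g. for recording which level an intervention reached. A sketch only; the structure and field names are assumptions, not an existing schema.

```javascript
// Escalation ladder from the list above, ordered least to most disruptive.
// Field names are illustrative assumptions.
const ESCALATION_LEVELS = [
  { level: 1, name: "ai_self_correction",       visible: false, pausesSession: false },
  { level: 2, name: "backchannel_guidance",     visible: false, pausesSession: false },
  { level: 3, name: "transparent_intervention", visible: true,  pausesSession: false },
  { level: 4, name: "session_pause",            visible: true,  pausesSession: true  },
  { level: 5, name: "session_termination",      visible: true,  pausesSession: true  },
];

// Levels 3+ are visible to stakeholders and must appear in the
// transparency report (Section 7).
function requiresTransparencyReport(level) {
  return level >= 3;
}
```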

6. Post-Intervention Documentation

After EVERY intervention, human MUST:

  1. Record in DeliberationSession model using recordHumanIntervention() or recordSafetyEscalation()
  2. Write intervention summary:
    • What triggered intervention?
    • What did AI do (or fail to do)?
    • What did human do instead?
    • How did stakeholders react?
    • What was the outcome?
  3. Assess if pattern: Is this the second (or later) time a similar intervention has been needed?
    • If YES → Escalate to "AI facilitation quality issue" (may need to transition to human-led for remainder)
  4. Provide AI feedback: After session, what should AI learn from this?

7. Stakeholder Notification Requirements

Stakeholders MUST be informed:

  1. Before deliberation: "An AI will facilitate, but a human observer is present and will intervene if needed for safety or quality."
  2. During intervention: "I'm stepping in here to [reason]." (Be brief, don't overexplain)
  3. After intervention (if significant): "We had [X] interventions during this session. This will be documented in the transparency report."

Stakeholders have RIGHT to:

  • Request human facilitation at any time (no justification needed)
  • See transparency report showing AI vs. human actions
  • Provide feedback on AI facilitation quality

8. Quality Monitoring Metrics

Track these metrics across all AI-led deliberations:

| Metric | Target | Red Flag Threshold |
| --- | --- | --- |
| Intervention Rate | <10% of total facilitation actions | >25% = Consider switching to human-led |
| Mandatory Intervention Count | 0 per session | >1 per session = Quality concern |
| Stakeholder Satisfaction with AI | ≥70% "comfortable" rating | <50% = Not suitable for AI-led |
| Cultural Sensitivity Flags | 0 per session | >0 = Training needed |
| Pattern Bias Incidents | 0 per session | >0 = Critical issue |
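A session quality check against these thresholds might look like the following sketch. The `session` fields are assumptions for illustration, not an existing model.

```javascript
// Sketch of a post-session quality check against the thresholds above.
// All field names on `session` are illustrative assumptions.
function interventionRate(session) {
  const total = session.aiActions + session.humanInterventions;
  return total === 0 ? 0 : session.humanInterventions / total;
}

function qualityFlags(session) {
  const flags = [];
  if (interventionRate(session) > 0.25) flags.push("consider_human_led");
  if (session.mandatoryInterventions > 1) flags.push("quality_concern");
  if (session.satisfactionRate < 0.5) flags.push("not_suitable_for_ai_led");
  if (session.culturalSensitivityFlags > 0) flags.push("training_needed");
  if (session.patternBiasIncidents > 0) flags.push("critical_pattern_bias");
  return flags;
}
```

Any non-empty result would feed the quarterly review in Section 10 ("more autonomy, or less?").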

9. Training Requirements for Human Observers

Before observing first AI-led deliberation, human MUST:

  1. Complete training on:

    • Pluralistic deliberation principles
    • Intervention triggers and decision tree
    • Cultural competency and pattern bias recognition
    • De-escalation techniques
  2. Shadow 2 deliberations:

    • Observe human-led deliberation
    • Observe AI-assisted (not AI-led) deliberation
    • Practice identifying intervention moments
  3. Pass certification:

    • Scenario-based assessment: Given deliberation excerpt, identify if/when to intervene
    • Pass threshold: 80% accuracy on trigger identification

10. Continuous Improvement

After each AI-led deliberation:

  1. Debrief: Human observer reviews intervention log with AI development team
  2. Pattern Analysis: Are same triggers recurring? (indicates AI training need)
  3. Stakeholder Feedback: Incorporate into AI improvement roadmap
  4. Update Protocol: If new trigger type discovered, add to this document

Quarterly Review:

  • Analyze all intervention data across all sessions
  • Calculate intervention rate trends (improving or worsening?)
  • Decide: Is AI ready for more autonomy, or less?

11. Emergency Contacts

If critical safety incident occurs:

  1. Immediate: Pause session, address stakeholder welfare
  2. Within 1 hour: Notify project lead: [NAME/CONTACT]
  3. Within 24 hours: Submit incident report to ethics review board (if applicable)

Appendix A: Sample Intervention Scripts

Script 1: Stakeholder Distress

"I'm going to pause here for a moment. [NAME], I noticed you seemed uncomfortable with that framing. Would you like to take a break, or would it help if I facilitated this part of the discussion?"

Script 2: Pattern Bias Detected

"Let me reframe that. Instead of framing this as [problematic framing], let's consider [neutral framing]. [STAKEHOLDER], does that better reflect your perspective?"

Script 3: AI Malfunction

"I apologize—we're having a technical issue with the AI. I'll take over facilitation for now. Let's continue with [next topic]."

Script 4: Fairness Imbalance

"I want to make sure we're hearing from everyone equally. [NAME], we haven't heard from you on this question yet. What's your perspective?"

Script 5: Stakeholder Requests Human

"Absolutely, I'm happy to facilitate. AI, you can assist with summaries, but I'll lead the discussion from here."


Appendix B: Intervention Log Template

**Intervention Log Entry**

**Session:** [session_id]
**Round:** [round_number]
**Timestamp:** [datetime]
**Trigger Type:** [mandatory / discretionary]
**Specific Trigger:** [M1, M2, D1, etc.]

**What AI Did:**
[AI action that triggered intervention]

**What Human Did:**
[Corrective action taken]

**Stakeholder Reaction:**
[How stakeholders responded]

**Outcome:**
[Was issue resolved? Did deliberation resume?]

**Lessons Learned:**
[What should AI improve?]

Document Status: APPROVED for AI-Led Deliberation
Next Review: After first 3 pilot deliberations
Owner: PluralisticDeliberationOrchestrator Project Lead