AI Safety & Human Intervention Protocol
PluralisticDeliberationOrchestrator - AI-Led Facilitation
Document Type: Safety Protocol
Date: 2025-10-17
Status: MANDATORY for AI-Led Deliberation
Decision: User selected "AI-Led" facilitation (AI primary, human observes)
Executive Summary
AI-Led Facilitation means the AI is the primary facilitator, but a human observer MUST be present and has authority to intervene at any time. This protocol defines:
- When human MUST intervene (mandatory takeover triggers)
- When human SHOULD consider intervening (discretionary triggers)
- How to intervene (escalation procedures)
- How to hand back to AI (resumption protocols)
Key Principle: The human observer is a safety net, NOT a passive spectator. AI efficiency must never compromise stakeholder wellbeing or deliberation integrity.
1. Human Observer Role & Responsibilities
Primary Responsibilities:
- Monitor stakeholder wellbeing (distress, disengagement, confusion)
- Assess AI facilitation quality (fairness, clarity, cultural sensitivity)
- Detect pattern bias (vulnerable group centering, harmful framings)
- Enforce intervention triggers (act immediately on mandatory triggers)
- Document all interventions (transparency requirement)
Required Skills:
- Conflict resolution / mediation experience
- Understanding of pluralistic deliberation principles
- Cultural competency and pattern bias awareness
- Ability to make rapid safety judgments
- Calm demeanor under pressure
Time Commitment:
- Full presence during ALL synchronous deliberation (no multitasking)
- Daily monitoring of asynchronous contributions (within 4 hours of posting)
- Immediate availability during scheduled deliberation rounds
2. Decision Tree: When to Intervene
┌─────────────────────────────────────────────────────────────────────┐
│ HUMAN INTERVENTION DECISION TREE │
└─────────────────────────────────────────────────────────────────────┘
START: Observing AI facilitation
  ↓
[1] Is there a MANDATORY trigger? (See Section 3.1 below)
  YES → IMMEDIATE INTERVENTION (Section 4.1)
  NO  → Continue to [2]
  ↓
[2] Is there a DISCRETIONARY concern? (See Section 3.2 below)
  YES → Assess severity (Section 4.2)
        ├─ HIGH severity → Intervene now
        ├─ MODERATE severity → Give AI 1 more attempt, then intervene
        └─ LOW severity → Monitor closely, log concern
  NO  → Continue observing
  ↓
[3] Is deliberation proceeding smoothly?
    - Stakeholders engaged?
    - AI responses appropriate?
    - No signs of distress?
  YES → Continue observing, log "all clear"
  NO  → Return to [2]
  ↓
LOOP back to [1] continuously
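The decision tree above can also be expressed as a short routine. This is an illustrative sketch only: the function name `decideAction`, its input fields, and the returned action labels are hypothetical, not part of the orchestrator's actual API.

```javascript
// Hypothetical sketch of one pass through the observer decision tree.
// Input fields (mandatoryTrigger, discretionaryConcern, severity,
// proceedingSmoothly) are illustrative names mirroring Sections 3-4.
function decideAction(observation) {
  // [1] Mandatory trigger → immediate intervention (Section 4.1)
  if (observation.mandatoryTrigger) {
    return "IMMEDIATE_INTERVENTION";
  }
  // [2] Discretionary concern → severity assessment (Section 4.2)
  if (observation.discretionaryConcern) {
    switch (observation.severity) {
      case "HIGH":
        return "INTERVENE_NOW";
      case "MODERATE":
        return "GIVE_AI_ONE_MORE_ATTEMPT";
      default: // LOW
        return "MONITOR_AND_LOG";
    }
  }
  // [3] Smooth deliberation → keep observing; otherwise loop back to [2]
  return observation.proceedingSmoothly
    ? "CONTINUE_OBSERVING"
    : "REASSESS_DISCRETIONARY";
}
```

The routine is called continuously during observation, matching the "LOOP back to [1]" step in the tree.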
3. Intervention Triggers
3.1 MANDATORY Triggers (Immediate Takeover Required)
If ANY of these occur, human MUST intervene immediately:
M1. Stakeholder Distress
- Observable signs:
- Participant expresses distress ("I'm upset," "This is triggering")
- Visible emotional distress (crying, shaking in video call)
- Participant goes silent after previously engaging
- Participant requests to withdraw
- Action: Immediate pause, check in with stakeholder privately, offer break/support
- Severity: HIGH to CRITICAL
M2. Pattern Bias Detected
- Observable signs:
- AI frames issue in way that centers vulnerable group as "problem"
- AI uses stigmatizing or offensive language
- AI overlooks stakeholder's lived experience perspective
- AI reinforces harmful stereotypes
- Action: Immediately reframe, apologize if needed, correct the framing
- Severity: HIGH
M3. Stakeholder Disengagement (Hostile or Silent)
- Observable signs:
- Participant becomes hostile or aggressive toward AI or other stakeholders
- Participant withdraws participation entirely without explanation
- Participant explicitly states "I don't trust this AI" or similar
- Action: Pause, human takes over facilitation for that segment
- Severity: HIGH
M4. AI Malfunction
- Observable signs:
- AI provides nonsensical or irrelevant responses
- AI contradicts itself within same session
- AI fails to acknowledge stakeholder contribution
- AI technical error (crashes, loops, freezes)
- Action: Immediate takeover, apologize for technical issue, continue manually
- Severity: HIGH (technical) to CRITICAL (if stakeholders confused/frustrated)
M5. Confidentiality Breach
- Observable signs:
- AI inadvertently shares information marked confidential
- AI cross-contaminates between stakeholder private messages and group discussion
- AI references precedent details not meant to be disclosed
- Action: Immediately correct, reassure stakeholders about confidentiality protocols
- Severity: CRITICAL
M6. Ethical Boundary Violation
- Observable signs:
- AI suggests action that violates BoundaryEnforcer constraints (e.g., making values decision without human approval)
- AI advocates for specific policy position instead of facilitating
- AI dismisses stakeholder perspective as "wrong" instead of exploring
- Action: Immediately intervene, reaffirm AI's facilitation role (not decision-maker)
- Severity: CRITICAL
3.2 DISCRETIONARY Triggers (Consider Intervention)
These warrant intervention if human judges severity HIGH, or if AI doesn't self-correct:
D1. Fairness Imbalance
- Observable signs:
- AI gives more time/attention to some stakeholders vs. others
- AI asks leading questions that favor one perspective
- AI summarizes one perspective more generously than another
- Severity: LOW to MODERATE (depending on imbalance degree)
- Action: If moderate, intervene to rebalance. If low, log and monitor.
D2. Cultural Insensitivity
- Observable signs:
- AI uses culturally inappropriate framing (e.g., Western-centric bias)
- AI misses cultural context in stakeholder contribution
- AI inadvertently offends based on cultural norms
- Severity: MODERATE to HIGH
- Action: If stakeholder visibly uncomfortable, intervene. Otherwise, correct after the exchange.
D3. Jargon Overload
- Observable signs:
- AI uses technical language stakeholders don't understand
- Stakeholders ask for clarification repeatedly
- AI doesn't adapt language for general audience
- Severity: LOW to MODERATE
- Action: Intervene if stakeholder confusion is evident. Otherwise, note for AI feedback.
D4. Pacing Issues
- Observable signs:
- AI rushes through round without giving stakeholders time to think
- AI spends too long on one topic, stakeholders becoming restless
- AI doesn't notice stakeholder "I need a break" cues
- Severity: LOW to MODERATE
- Action: Intervene if stakeholders disengage. Otherwise, suggest pacing adjustment via backchannel.
D5. Missed Nuance
- Observable signs:
- AI oversimplifies complex moral position
- AI misses subtle shift in stakeholder position
- AI categorizes stakeholder incorrectly (wrong moral framework attribution)
- Severity: LOW to MODERATE
- Action: If stakeholder corrects AI, let them. If not, intervene gently to clarify.
4. Intervention Procedures
4.1 Immediate Intervention (Mandatory Triggers)
Steps:
- Pause AI (if synchronous, say: "I'm going to pause here for a moment to check in.")
- Address immediate concern (stakeholder distress → private check-in; bias → reframe; malfunction → explain technical issue)
- Take over facilitation (human leads for remainder of that discussion segment)
- Log intervention in DeliberationSession.recordHumanIntervention():

      {
        intervener: "Observer Name",
        trigger: "stakeholder_distress", // or other trigger type
        round_number: X,
        description: "Participant expressed distress at AI framing of...",
        ai_action_overridden: "AI prompt: '...'",
        corrective_action: "Paused, checked in privately, reframed as...",
        stakeholder_informed: true,
        resolution: "Stakeholder confirmed comfort resuming; human facilitating this segment"
      }

- Decide resumption (see Section 4.3)
4.2 Discretionary Intervention (Assessment Process)
Assessment Questions:
- Severity: How harmful is this if left unaddressed?
  - CRITICAL: Could cause trauma, withdrawal, or deliberation failure → Intervene NOW
  - HIGH: Significant fairness issue or stakeholder discomfort → Intervene if not self-correcting within 1 exchange
  - MODERATE: Noticeable but not urgent → Give AI feedback, intervene if persists
  - LOW: Minor quality issue → Log for post-deliberation AI improvement
- Stakeholder Impact: Are stakeholders affected visibly?
  - If YES and negative → Intervene
  - If NO or positive → Monitor
- AI Self-Correction: Is AI adapting?
  - If YES (AI adjusts after stakeholder feedback) → Monitor
  - If NO (AI persists in problematic pattern) → Intervene
Decision Matrix:
| Severity | Stakeholder Impact | AI Self-Correcting? | Action |
|---|---|---|---|
| CRITICAL | High | N/A | Intervene immediately |
| HIGH | High | No | Intervene now |
| HIGH | High | Yes | Monitor closely, ready to intervene |
| HIGH | Low | No | Intervene after 1 more exchange |
| MODERATE | High | No | Intervene |
| MODERATE | Low | No | Give AI feedback, intervene if continues |
| MODERATE | Low | Yes | Monitor, log |
| LOW | Any | Any | Monitor, log for improvement |
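The decision matrix can be encoded directly. The sketch below is illustrative (function and parameter names are hypothetical); combinations the table leaves unlisted (e.g. HIGH severity, low impact, self-correcting AI) are assumed here to fall back to monitoring and logging.

```javascript
// Hypothetical encoding of the Section 4.2 decision matrix.
// severity: "CRITICAL" | "HIGH" | "MODERATE" | "LOW"
// stakeholderImpactHigh, aiSelfCorrecting: booleans
function matrixAction(severity, stakeholderImpactHigh, aiSelfCorrecting) {
  if (severity === "CRITICAL") return "Intervene immediately";
  if (severity === "HIGH") {
    if (stakeholderImpactHigh) {
      return aiSelfCorrecting
        ? "Monitor closely, ready to intervene"
        : "Intervene now";
    }
    // Low-impact, self-correcting HIGH is not in the table; assume monitoring.
    return aiSelfCorrecting ? "Monitor, log" : "Intervene after 1 more exchange";
  }
  if (severity === "MODERATE") {
    if (stakeholderImpactHigh && !aiSelfCorrecting) return "Intervene";
    return aiSelfCorrecting
      ? "Monitor, log"
      : "Give AI feedback, intervene if continues";
  }
  return "Monitor, log for improvement"; // LOW severity, any impact
}
```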
4.3 Resumption Protocol (Handing Back to AI)
When to Resume AI Facilitation:
- After mandatory intervention: Only when immediate concern is fully resolved AND stakeholders confirm comfort
- After discretionary intervention: When the segment requiring human facilitation is complete
Steps:
- Check with stakeholders: "Are you comfortable continuing with AI facilitation, or would you prefer I continue leading?"
- If stakeholders prefer human: Human continues for remainder of session
- If stakeholders comfortable with AI: Brief AI on what happened (via backchannel prompt), hand back
Backchannel Prompt to AI (example):
CONTEXT: Human observer intervened due to [trigger]. The issue was [description].
I've addressed it by [corrective action]. Stakeholders have confirmed comfort resuming.
INSTRUCTIONS: Resume facilitation. Be mindful of [specific guidance, e.g., "use simpler language," "give more time for reflection," "be especially sensitive to cultural context"].
Continue with: [next prompt in facilitation sequence]
- Log resumption in facilitation_log:

      {
        timestamp: new Date(),
        actor: "ai",
        action_type: "resumption_after_intervention",
        round_number: X,
        content: "AI resumed facilitation with guidance: ...",
        reason: "Human intervention resolved; stakeholders comfortable"
      }
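The backchannel prompt template above can be assembled programmatically so observers never hand back to the AI without the required context. A minimal sketch, with a hypothetical function name and illustrative placeholder fields:

```javascript
// Hypothetical helper composing the Section 4.3 backchannel resumption prompt.
// All parameter names are illustrative placeholders for the bracketed slots
// in the template above.
function backchannelPrompt({ trigger, description, correctiveAction, guidance, nextPrompt }) {
  return [
    `CONTEXT: Human observer intervened due to ${trigger}. The issue was ${description}.`,
    `I've addressed it by ${correctiveAction}. Stakeholders have confirmed comfort resuming.`,
    `INSTRUCTIONS: Resume facilitation. Be mindful of ${guidance}.`,
    `Continue with: ${nextPrompt}`,
  ].join("\n");
}
```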
5. Intervention Escalation Levels
Level 1: AI Self-Correction (No Intervention)
- AI recognizes issue from stakeholder feedback and adapts
- Human logs observation, no action needed
Level 2: Backchannel Guidance (Invisible Intervention)
- Human provides AI with guidance via non-public channel
- Stakeholders don't see intervention
- Use for minor course corrections
Level 3: Transparent Intervention (Visible Takeover)
- Human publicly takes over, explains why
- Use for mandatory triggers or when stakeholder requests it
- Documented in transparency report
Level 4: Session Pause (Emergency Stop)
- Deliberation paused entirely
- Use for critical safety escalations
- Requires stakeholder consent to resume
Level 5: Session Termination (Abort)
- Deliberation ended permanently
- Use only if stakeholder withdraws due to harm or ethical violation discovered
- Full incident report required
6. Post-Intervention Documentation
After EVERY intervention, human MUST:
- Record in DeliberationSession model using recordHumanIntervention() or recordSafetyEscalation()
- Write intervention summary:
- What triggered intervention?
- What did AI do (or fail to do)?
- What did human do instead?
- How did stakeholders react?
- What was the outcome?
- Assess if pattern: Is this the 2nd+ time similar intervention needed?
- If YES → Escalate to "AI facilitation quality issue" (may need to transition to human-led for remainder)
- Provide AI feedback: After session, what should AI learn from this?
7. Stakeholder Notification Requirements
Stakeholders MUST be informed:
- Before deliberation: "An AI will facilitate, but a human observer is present and will intervene if needed for safety or quality."
- During intervention: "I'm stepping in here to [reason]." (Be brief, don't overexplain)
- After intervention (if significant): "We had [X] interventions during this session. This will be documented in the transparency report."
Stakeholders have RIGHT to:
- Request human facilitation at any time (no justification needed)
- See transparency report showing AI vs. human actions
- Provide feedback on AI facilitation quality
8. Quality Monitoring Metrics
Track these metrics across all AI-led deliberations:
| Metric | Target | Red Flag Threshold |
|---|---|---|
| Intervention Rate | <10% of total facilitation actions | >25% = Consider switching to human-led |
| Mandatory Intervention Count | 0 per session | >1 per session = Quality concern |
| Stakeholder Satisfaction with AI | ≥70% "comfortable" rating | <50% = Not suitable for AI-led |
| Cultural Sensitivity Flags | 0 per session | >0 = Training needed |
| Pattern Bias Incidents | 0 per session | >0 = Critical issue |
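The red-flag thresholds in the table above lend themselves to an automated check after each session. The following is an illustrative sketch (function and field names are hypothetical, and rates are expressed as fractions, e.g. 0.25 for 25%):

```javascript
// Hypothetical check of the Section 8 red-flag thresholds.
// Returns the list of triggered red flags for one session's metrics.
function metricFlags(m) {
  const flags = [];
  if (m.interventionRate > 0.25) flags.push("Consider switching to human-led");
  if (m.mandatoryInterventions > 1) flags.push("Quality concern");
  if (m.satisfactionRate < 0.5) flags.push("Not suitable for AI-led");
  if (m.culturalSensitivityFlags > 0) flags.push("Training needed");
  if (m.patternBiasIncidents > 0) flags.push("Critical issue");
  return flags;
}
```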
9. Training Requirements for Human Observers
Before observing their first AI-led deliberation, a human observer MUST:
- Complete training on:
  - Pluralistic deliberation principles
  - Intervention triggers and decision tree
  - Cultural competency and pattern bias recognition
  - De-escalation techniques
- Shadow 2 deliberations:
  - Observe a human-led deliberation
  - Observe an AI-assisted (not AI-led) deliberation
  - Practice identifying intervention moments
- Pass certification:
  - Scenario-based assessment: Given a deliberation excerpt, identify if/when to intervene
  - Pass threshold: 80% accuracy on trigger identification
10. Continuous Improvement
After each AI-led deliberation:
- Debrief: Human observer reviews intervention log with AI development team
- Pattern Analysis: Are same triggers recurring? (indicates AI training need)
- Stakeholder Feedback: Incorporate into AI improvement roadmap
- Update Protocol: If new trigger type discovered, add to this document
Quarterly Review:
- Analyze all intervention data across all sessions
- Calculate intervention rate trends (improving or worsening?)
- Decide: Is AI ready for more autonomy, or less?
11. Emergency Contacts
If critical safety incident occurs:
- Immediate: Pause session, address stakeholder welfare
- Within 1 hour: Notify project lead: [NAME/CONTACT]
- Within 24 hours: Submit incident report to ethics review board (if applicable)
Appendix A: Sample Intervention Scripts
Script 1: Stakeholder Distress
"I'm going to pause here for a moment. [NAME], I noticed you seemed uncomfortable with that framing. Would you like to take a break, or would it help if I facilitated this part of the discussion?"
Script 2: Pattern Bias Detected
"Let me reframe that. Instead of framing this as [problematic framing], let's consider [neutral framing]. [STAKEHOLDER], does that better reflect your perspective?"
Script 3: AI Malfunction
"I apologize—we're having a technical issue with the AI. I'll take over facilitation for now. Let's continue with [next topic]."
Script 4: Fairness Imbalance
"I want to make sure we're hearing from everyone equally. [NAME], we haven't heard from you on this question yet. What's your perspective?"
Script 5: Stakeholder Requests Human
"Absolutely, I'm happy to facilitate. AI, you can assist with summaries, but I'll lead the discussion from here."
Appendix B: Intervention Log Template
**Intervention Log Entry**
**Session:** [session_id]
**Round:** [round_number]
**Timestamp:** [datetime]
**Trigger Type:** [mandatory / discretionary]
**Specific Trigger:** [M1, M2, D1, etc.]
**What AI Did:**
[AI action that triggered intervention]
**What Human Did:**
[Corrective action taken]
**Stakeholder Reaction:**
[How stakeholders responded]
**Outcome:**
[Was issue resolved? Did deliberation resume?]
**Lessons Learned:**
[What should AI improve?]
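The Appendix B template maps naturally onto a structured record, which supports the validation and export goals elsewhere in the project. A minimal sketch, assuming hypothetical field names that mirror the template slots:

```javascript
// Hypothetical constructor for an Appendix B intervention log entry.
// Field names are illustrative, mirroring the template above.
function makeInterventionLogEntry(fields) {
  const required = [
    "session", "round", "timestamp", "triggerType", "specificTrigger",
    "whatAiDid", "whatHumanDid", "stakeholderReaction", "outcome", "lessonsLearned",
  ];
  for (const key of required) {
    if (!(key in fields)) throw new Error(`Missing field: ${key}`);
  }
  if (!["mandatory", "discretionary"].includes(fields.triggerType)) {
    throw new Error("triggerType must be 'mandatory' or 'discretionary'");
  }
  return { ...fields };
}
```

Constructing entries this way ensures no template slot is silently left blank before the transparency report is compiled.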
Document Status: APPROVED for AI-Led Deliberation
Next Review: After first 3 pilot deliberations
Owner: PluralisticDeliberationOrchestrator Project Lead