# AI Safety & Human Intervention Protocol

## PluralisticDeliberationOrchestrator - AI-Led Facilitation

**Document Type:** Safety Protocol

**Date:** 2025-10-17

**Status:** MANDATORY for AI-Led Deliberation

**Decision:** User selected "AI-Led" facilitation (AI primary, human observes)

---
## Executive Summary

**AI-Led Facilitation** means the AI is the primary facilitator, but a **human observer MUST be present** and has authority to intervene at any time. This protocol defines:

1. **When the human MUST intervene** (mandatory takeover triggers)
2. **When the human SHOULD consider intervening** (discretionary triggers)
3. **How to intervene** (escalation procedures)
4. **How to hand back to the AI** (resumption protocols)

**Key Principle:** The human observer is a safety net, NOT a passive spectator. AI efficiency must never compromise stakeholder wellbeing or deliberation integrity.

---
## 1. Human Observer Role & Responsibilities

### Primary Responsibilities

1. **Monitor stakeholder wellbeing** (distress, disengagement, confusion)
2. **Assess AI facilitation quality** (fairness, clarity, cultural sensitivity)
3. **Detect pattern bias** (vulnerable-group centering, harmful framings)
4. **Enforce intervention triggers** (act immediately on mandatory triggers)
5. **Document all interventions** (transparency requirement)

### Required Skills

- Conflict resolution / mediation experience
- Understanding of pluralistic deliberation principles
- Cultural competency and pattern-bias awareness
- Ability to make rapid safety judgments
- Calm demeanor under pressure

### Time Commitment

- **Full presence during ALL synchronous deliberation** (no multitasking)
- **Daily monitoring of asynchronous contributions** (within 4 hours of posting)
- **Immediate availability** during scheduled deliberation rounds

---
## 2. Decision Tree: When to Intervene

```
┌─────────────────────────────────────────────────────────────────────┐
│                  HUMAN INTERVENTION DECISION TREE                   │
└─────────────────────────────────────────────────────────────────────┘

START: Observing AI facilitation
        ↓
[1] Is there a MANDATORY trigger? (See Section 3.1 below)
        ├─ YES → IMMEDIATE INTERVENTION (Section 4.1)
        └─ NO  → Continue to [2]
        ↓
[2] Is there a DISCRETIONARY concern? (See Section 3.2 below)
        ├─ YES → Assess severity (Section 4.2)
        │          ├─ HIGH severity     → Intervene now
        │          ├─ MODERATE severity → Give AI 1 more attempt, then intervene
        │          └─ LOW severity      → Monitor closely, log concern
        └─ NO  → Continue observing
        ↓
[3] Is deliberation proceeding smoothly?
    - Stakeholders engaged?
    - AI responses appropriate?
    - No signs of distress?
        ├─ YES → Continue observing, log "all clear"
        └─ NO  → Return to [2]
        ↓
LOOP back to [1] continuously
```

---
## 3. Intervention Triggers

### 3.1 MANDATORY Triggers (Immediate Takeover Required)

**If ANY of these occur, the human MUST intervene immediately:**

#### M1. **Stakeholder Distress**
- **Observable signs:**
  - Participant expresses distress ("I'm upset," "This is triggering")
  - Visible emotional distress (crying, shaking in a video call)
  - Participant goes silent after previously engaging
  - Participant requests to withdraw
- **Action:** Immediate pause; check in with the stakeholder privately; offer a break/support
- **Severity:** HIGH to CRITICAL

#### M2. **Pattern Bias Detected**
- **Observable signs:**
  - AI frames the issue in a way that centers a vulnerable group as the "problem"
  - AI uses stigmatizing or offensive language
  - AI overlooks a stakeholder's lived-experience perspective
  - AI reinforces harmful stereotypes
- **Action:** Immediately reframe, apologize if needed, and correct the framing
- **Severity:** HIGH

#### M3. **Stakeholder Disengagement (Hostile or Silent)**
- **Observable signs:**
  - Participant becomes hostile or aggressive toward the AI or other stakeholders
  - Participant withdraws participation entirely without explanation
  - Participant explicitly states "I don't trust this AI" or similar
- **Action:** Pause; human takes over facilitation for that segment
- **Severity:** HIGH

#### M4. **AI Malfunction**
- **Observable signs:**
  - AI provides nonsensical or irrelevant responses
  - AI contradicts itself within the same session
  - AI fails to acknowledge a stakeholder contribution
  - AI technical error (crashes, loops, freezes)
- **Action:** Immediate takeover; apologize for the technical issue; continue manually
- **Severity:** HIGH (technical) to CRITICAL (if stakeholders are confused/frustrated)

#### M5. **Confidentiality Breach**
- **Observable signs:**
  - AI inadvertently shares information marked confidential
  - AI cross-contaminates between stakeholder private messages and group discussion
  - AI references precedent details not meant to be disclosed
- **Action:** Immediately correct; reassure stakeholders about confidentiality protocols
- **Severity:** CRITICAL

#### M6. **Ethical Boundary Violation**
- **Observable signs:**
  - AI suggests an action that violates BoundaryEnforcer constraints (e.g., making a values decision without human approval)
  - AI advocates for a specific policy position instead of facilitating
  - AI dismisses a stakeholder perspective as "wrong" instead of exploring it
- **Action:** Immediately intervene; reaffirm the AI's facilitation role (not decision-maker)
- **Severity:** CRITICAL

---
### 3.2 DISCRETIONARY Triggers (Consider Intervention)

**These warrant intervention if the human judges severity HIGH, or if the AI doesn't self-correct:**

#### D1. **Fairness Imbalance**
- **Observable signs:**
  - AI gives more time/attention to some stakeholders vs. others
  - AI asks leading questions that favor one perspective
  - AI summarizes one perspective more generously than another
- **Severity:** LOW to MODERATE (depending on the degree of imbalance)
- **Action:** If moderate, intervene to rebalance. If low, log and monitor.

#### D2. **Cultural Insensitivity**
- **Observable signs:**
  - AI uses culturally inappropriate framing (e.g., Western-centric bias)
  - AI misses cultural context in a stakeholder contribution
  - AI inadvertently offends based on cultural norms
- **Severity:** MODERATE to HIGH
- **Action:** If a stakeholder is visibly uncomfortable, intervene. Otherwise, correct after the exchange.

#### D3. **Jargon Overload**
- **Observable signs:**
  - AI uses technical language stakeholders don't understand
  - Stakeholders ask for clarification repeatedly
  - AI doesn't adapt its language for a general audience
- **Severity:** LOW to MODERATE
- **Action:** Intervene if stakeholder confusion is evident. Otherwise, note for AI feedback.

#### D4. **Pacing Issues**
- **Observable signs:**
  - AI rushes through a round without giving stakeholders time to think
  - AI spends too long on one topic; stakeholders become restless
  - AI doesn't notice stakeholder "I need a break" cues
- **Severity:** LOW to MODERATE
- **Action:** Intervene if stakeholders disengage. Otherwise, suggest a pacing adjustment via backchannel.

#### D5. **Missed Nuance**
- **Observable signs:**
  - AI oversimplifies a complex moral position
  - AI misses a subtle shift in a stakeholder's position
  - AI categorizes a stakeholder incorrectly (wrong moral framework attribution)
- **Severity:** LOW to MODERATE
- **Action:** If the stakeholder corrects the AI, let them. If not, intervene gently to clarify.

---
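The trigger taxonomy above (M1-M6, D1-D5) lends itself to a small data table for observer tooling and logging. The following is an illustrative sketch only; the object shape and the `requiresImmediateIntervention` helper are assumptions, not part of any existing API:

```javascript
// Illustrative registry of the intervention triggers defined above.
// Field names and the helper function are assumptions, not an existing schema.
const TRIGGERS = {
  M1: { name: "stakeholder_distress",       mandatory: true,  severity: "HIGH-CRITICAL" },
  M2: { name: "pattern_bias",               mandatory: true,  severity: "HIGH" },
  M3: { name: "stakeholder_disengagement",  mandatory: true,  severity: "HIGH" },
  M4: { name: "ai_malfunction",             mandatory: true,  severity: "HIGH-CRITICAL" },
  M5: { name: "confidentiality_breach",     mandatory: true,  severity: "CRITICAL" },
  M6: { name: "ethical_boundary_violation", mandatory: true,  severity: "CRITICAL" },
  D1: { name: "fairness_imbalance",         mandatory: false, severity: "LOW-MODERATE" },
  D2: { name: "cultural_insensitivity",     mandatory: false, severity: "MODERATE-HIGH" },
  D3: { name: "jargon_overload",            mandatory: false, severity: "LOW-MODERATE" },
  D4: { name: "pacing_issues",              mandatory: false, severity: "LOW-MODERATE" },
  D5: { name: "missed_nuance",              mandatory: false, severity: "LOW-MODERATE" },
};

// Any M-trigger requires immediate takeover (Section 4.1); D-triggers go
// through the Section 4.2 assessment process instead.
function requiresImmediateIntervention(triggerId) {
  const t = TRIGGERS[triggerId];
  return Boolean(t && t.mandatory);
}
```

Keeping the taxonomy as data (rather than prose only) makes it easy to validate logged `trigger` values against the protocol.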
## 4. Intervention Procedures

### 4.1 Immediate Intervention (Mandatory Triggers)

**Steps:**

1. **Pause the AI** (if synchronous, say: "I'm going to pause here for a moment to check in.")
2. **Address the immediate concern** (stakeholder distress → private check-in; bias → reframe; malfunction → explain the technical issue)
3. **Take over facilitation** (human leads for the remainder of that discussion segment)
4. **Log the intervention** via `DeliberationSession.recordHumanIntervention()`:

   ```javascript
   {
     intervener: "Observer Name",
     trigger: "stakeholder_distress", // or other trigger type
     round_number: X,
     description: "Participant expressed distress at AI framing of...",
     ai_action_overridden: "AI prompt: '...'",
     corrective_action: "Paused, checked in privately, reframed as...",
     stakeholder_informed: true,
     resolution: "Stakeholder confirmed comfort resuming; human facilitating this segment"
   }
   ```

5. **Decide on resumption** (see Section 4.3)

---
### 4.2 Discretionary Intervention (Assessment Process)

**Assessment Questions:**

1. **Severity:** How harmful is this if left unaddressed?
   - CRITICAL: Could cause trauma, withdrawal, or deliberation failure → Intervene NOW
   - HIGH: Significant fairness issue or stakeholder discomfort → Intervene if not self-correcting within 1 exchange
   - MODERATE: Noticeable but not urgent → Give the AI feedback; intervene if it persists
   - LOW: Minor quality issue → Log for post-deliberation AI improvement

2. **Stakeholder Impact:** Are stakeholders visibly affected?
   - If YES and negative → Intervene
   - If NO or positive → Monitor

3. **AI Self-Correction:** Is the AI adapting?
   - If YES (AI adjusts after stakeholder feedback) → Monitor
   - If NO (AI persists in a problematic pattern) → Intervene

**Decision Matrix:**

| Severity | Stakeholder Impact | AI Self-Correcting? | Action |
|----------|--------------------|---------------------|--------|
| CRITICAL | High | N/A | **Intervene immediately** |
| HIGH | High | No | **Intervene now** |
| HIGH | High | Yes | **Monitor closely, ready to intervene** |
| HIGH | Low | No | **Intervene after 1 more exchange** |
| MODERATE | High | No | **Intervene** |
| MODERATE | Low | No | **Give AI feedback, intervene if continues** |
| MODERATE | Low | Yes | **Monitor, log** |
| LOW | Any | Any | **Monitor, log for improvement** |

---
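The decision matrix can also be expressed as a small function, which is useful for checking the table's internal consistency or wiring it into observer tooling. This is a sketch of the matrix above; the function name and return strings are illustrative:

```javascript
// Sketch of the Section 4.2 decision matrix as a function.
// severity: "CRITICAL" | "HIGH" | "MODERATE" | "LOW"
// impactHigh: is stakeholder impact visibly high?
// selfCorrecting: is the AI adapting on its own?
function decideAction(severity, impactHigh, selfCorrecting) {
  if (severity === "CRITICAL") return "intervene_immediately";
  if (severity === "HIGH") {
    if (impactHigh) return selfCorrecting ? "monitor_closely" : "intervene_now";
    return "intervene_after_one_exchange";
  }
  if (severity === "MODERATE") {
    if (impactHigh && !selfCorrecting) return "intervene";
    return selfCorrecting ? "monitor_and_log" : "give_feedback_then_intervene";
  }
  // LOW: log for post-deliberation AI improvement
  return "monitor_and_log";
}
```

Note that CRITICAL short-circuits before the self-correction check, matching the matrix's "N/A" entry: a critical issue never waits for the AI to adapt.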
### 4.3 Resumption Protocol (Handing Back to AI)

**When to Resume AI Facilitation:**

- **After a mandatory intervention:** Only when the immediate concern is fully resolved AND stakeholders confirm comfort
- **After a discretionary intervention:** When the segment requiring human facilitation is complete

**Steps:**

1. **Check with stakeholders:** "Are you comfortable continuing with AI facilitation, or would you prefer I continue leading?"
2. **If stakeholders prefer human:** Human continues for the remainder of the session
3. **If stakeholders are comfortable with AI:** Brief the AI on what happened (via a backchannel prompt), then hand back

   **Backchannel Prompt to AI (example):**

   ```
   CONTEXT: Human observer intervened due to [trigger]. The issue was [description].
   I've addressed it by [corrective action]. Stakeholders have confirmed comfort resuming.

   INSTRUCTIONS: Resume facilitation. Be mindful of [specific guidance, e.g., "use simpler language," "give more time for reflection," "be especially sensitive to cultural context"].

   Continue with: [next prompt in facilitation sequence]
   ```

4. **Log the resumption** in the facilitation_log:

   ```javascript
   {
     timestamp: new Date(),
     actor: "ai",
     action_type: "resumption_after_intervention",
     round_number: X,
     content: "AI resumed facilitation with guidance: ...",
     reason: "Human intervention resolved; stakeholders comfortable"
   }
   ```

---
## 5. Intervention Escalation Levels

### Level 1: AI Self-Correction (No Intervention)
- AI recognizes the issue from stakeholder feedback and adapts
- Human logs the observation; no action needed

### Level 2: Backchannel Guidance (Invisible Intervention)
- Human provides the AI with guidance via a non-public channel
- Stakeholders don't see the intervention
- Use for minor course corrections

### Level 3: Transparent Intervention (Visible Takeover)
- Human publicly takes over and explains why
- Use for mandatory triggers or when a stakeholder requests it
- Documented in the transparency report

### Level 4: Session Pause (Emergency Stop)
- Deliberation paused entirely
- Use for critical safety escalations
- Requires stakeholder consent to resume

### Level 5: Session Termination (Abort)
- Deliberation ended permanently
- Use only if a stakeholder withdraws due to harm or an ethical violation is discovered
- Full incident report required

---
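For logging and transparency reports, the five levels can be represented as structured data. A minimal sketch, assuming these field names (this is not an existing schema):

```javascript
// Sketch of the five escalation levels as data for logging/UI.
// All field names here are illustrative assumptions.
const ESCALATION_LEVELS = [
  { level: 1, name: "ai_self_correction",       visibleToStakeholders: false },
  { level: 2, name: "backchannel_guidance",     visibleToStakeholders: false },
  { level: 3, name: "transparent_intervention", visibleToStakeholders: true },
  { level: 4, name: "session_pause",            visibleToStakeholders: true, requiresConsentToResume: true },
  { level: 5, name: "session_termination",      visibleToStakeholders: true, requiresIncidentReport: true },
];

// Levels 1-2 leave no visible trace for stakeholders; levels 3+ must
// appear in the transparency report.
const visibleLevels = ESCALATION_LEVELS.filter((l) => l.visibleToStakeholders);
```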
## 6. Post-Intervention Documentation

**After EVERY intervention, the human MUST:**

1. **Record in the DeliberationSession model** using `recordHumanIntervention()` or `recordSafetyEscalation()`
2. **Write an intervention summary:**
   - What triggered the intervention?
   - What did the AI do (or fail to do)?
   - What did the human do instead?
   - How did stakeholders react?
   - What was the outcome?
3. **Assess whether it is a pattern:** Is this the second or later time a similar intervention was needed?
   - If YES → Escalate to "AI facilitation quality issue" (may need to transition to human-led for the remainder)
4. **Provide AI feedback:** After the session, what should the AI learn from this?

---
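Step 3's pattern check can be automated against the intervention log. A hedged sketch, assuming log entries carry a `trigger` field as in the Section 4.1 example (the function name is illustrative):

```javascript
// Returns true when a similar intervention has been needed 2+ times this
// session, which per step 3 escalates to an "AI facilitation quality issue".
// The `interventions` array shape is assumed from the Section 4.1 log entry.
function isRecurringPattern(interventions, trigger) {
  const count = interventions.filter((i) => i.trigger === trigger).length;
  return count >= 2;
}
```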
## 7. Stakeholder Notification Requirements

**Stakeholders MUST be informed:**

1. **Before deliberation:** "An AI will facilitate, but a human observer is present and will intervene if needed for safety or quality."
2. **During an intervention:** "I'm stepping in here to [reason]." (Be brief; don't overexplain)
3. **After an intervention (if significant):** "We had [X] interventions during this session. This will be documented in the transparency report."

**Stakeholders have the RIGHT to:**

- Request human facilitation at any time (no justification needed)
- See the transparency report showing AI vs. human actions
- Provide feedback on AI facilitation quality

---
## 8. Quality Monitoring Metrics

**Track these metrics across all AI-led deliberations:**

| Metric | Target | Red Flag Threshold |
|--------|--------|--------------------|
| **Intervention Rate** | <10% of total facilitation actions | >25% = Consider switching to human-led |
| **Mandatory Intervention Count** | 0 per session | >1 per session = Quality concern |
| **Stakeholder Satisfaction with AI** | ≥70% "comfortable" rating | <50% = Not suitable for AI-led |
| **Cultural Sensitivity Flags** | 0 per session | >0 = Training needed |
| **Pattern Bias Incidents** | 0 per session | >0 = Critical issue |

---
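The red-flag thresholds above can be checked mechanically at the end of each session. An illustrative sketch; the metric field names are assumptions, not an existing schema:

```javascript
// Evaluates a session's metrics against the red-flag thresholds in the
// table above. The `metrics` field names are illustrative assumptions;
// rates are expressed as fractions (0.25 = 25%).
function redFlags(metrics) {
  const flags = [];
  if (metrics.interventionRate > 0.25) flags.push("consider switching to human-led");
  if (metrics.mandatoryInterventions > 1) flags.push("facilitation quality concern");
  if (metrics.comfortableRating < 0.5) flags.push("not suitable for AI-led");
  if (metrics.culturalSensitivityFlags > 0) flags.push("cultural-competency training needed");
  if (metrics.patternBiasIncidents > 0) flags.push("critical pattern-bias issue");
  return flags;
}
```

An empty result means the session stayed inside every threshold; any non-empty result should feed the Section 10 quarterly review.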
## 9. Training Requirements for Human Observers

**Before observing a first AI-led deliberation, the human MUST:**

1. **Complete training on:**
   - Pluralistic deliberation principles
   - Intervention triggers and the decision tree
   - Cultural competency and pattern-bias recognition
   - De-escalation techniques

2. **Shadow 2 deliberations:**
   - Observe a human-led deliberation
   - Observe an AI-assisted (not AI-led) deliberation
   - Practice identifying intervention moments

3. **Pass certification:**
   - Scenario-based assessment: given a deliberation excerpt, identify if/when to intervene
   - Pass threshold: 80% accuracy on trigger identification

---
## 10. Continuous Improvement

**After each AI-led deliberation:**

1. **Debrief:** Human observer reviews the intervention log with the AI development team
2. **Pattern Analysis:** Are the same triggers recurring? (This indicates an AI training need.)
3. **Stakeholder Feedback:** Incorporate into the AI improvement roadmap
4. **Update Protocol:** If a new trigger type is discovered, add it to this document

**Quarterly Review:**

- Analyze all intervention data across all sessions
- Calculate intervention rate trends (improving or worsening?)
- Decide: Is the AI ready for more autonomy, or less?

---

## 11. Emergency Contacts

**If a critical safety incident occurs:**

1. **Immediately:** Pause the session; address stakeholder welfare
2. **Within 1 hour:** Notify the project lead: [NAME/CONTACT]
3. **Within 24 hours:** Submit an incident report to the ethics review board (if applicable)

---
## Appendix A: Sample Intervention Scripts

### Script 1: Stakeholder Distress
> "I'm going to pause here for a moment. [NAME], I noticed you seemed uncomfortable with that framing. Would you like to take a break, or would it help if I facilitated this part of the discussion?"

### Script 2: Pattern Bias Detected
> "Let me reframe that. Instead of framing this as [problematic framing], let's consider [neutral framing]. [STAKEHOLDER], does that better reflect your perspective?"

### Script 3: AI Malfunction
> "I apologize—we're having a technical issue with the AI. I'll take over facilitation for now. Let's continue with [next topic]."

### Script 4: Fairness Imbalance
> "I want to make sure we're hearing from everyone equally. [NAME], we haven't heard from you on this question yet. What's your perspective?"

### Script 5: Stakeholder Requests Human
> "Absolutely, I'm happy to facilitate. AI, you can assist with summaries, but I'll lead the discussion from here."

---
## Appendix B: Intervention Log Template

```markdown
**Intervention Log Entry**

**Session:** [session_id]
**Round:** [round_number]
**Timestamp:** [datetime]
**Trigger Type:** [mandatory / discretionary]
**Specific Trigger:** [M1, M2, D1, etc.]

**What AI Did:**
[AI action that triggered intervention]

**What Human Did:**
[Corrective action taken]

**Stakeholder Reaction:**
[How stakeholders responded]

**Outcome:**
[Was the issue resolved? Did deliberation resume?]

**Lessons Learned:**
[What should AI improve?]
```

---
**Document Status:** APPROVED for AI-Led Deliberation

**Next Review:** After the first 3 pilot deliberations

**Owner:** PluralisticDeliberationOrchestrator Project Lead