
AI Safety & Human Intervention Protocol

PluralisticDeliberationOrchestrator - AI-Led Facilitation

Document Type: Safety Protocol
Date: 2025-10-17
Status: MANDATORY for AI-Led Deliberation
Decision: User selected "AI-Led" facilitation (AI primary, human observes)


Executive Summary

AI-Led Facilitation means the AI is the primary facilitator, but a human observer MUST be present and has authority to intervene at any time. This protocol defines:

  1. When human MUST intervene (mandatory takeover triggers)
  2. When human SHOULD consider intervening (discretionary triggers)
  3. How to intervene (escalation procedures)
  4. How to hand back to AI (resumption protocols)

Key Principle: The human observer is a safety net, NOT a passive spectator. AI efficiency must never compromise stakeholder wellbeing or deliberation integrity.


1. Human Observer Role & Responsibilities

Primary Responsibilities:

  1. Monitor stakeholder wellbeing (distress, disengagement, confusion)
  2. Assess AI facilitation quality (fairness, clarity, cultural sensitivity)
  3. Detect pattern bias (vulnerable group centering, harmful framings)
  4. Enforce intervention triggers (act immediately on mandatory triggers)
  5. Document all interventions (transparency requirement)

Required Skills:

  • Conflict resolution / mediation experience
  • Understanding of pluralistic deliberation principles
  • Cultural competency and pattern bias awareness
  • Ability to make rapid safety judgments
  • Calm demeanor under pressure

Time Commitment:

  • Full presence during ALL synchronous deliberation (no multitasking)
  • Daily monitoring of asynchronous contributions (within 4 hours of posting)
  • Immediate availability during scheduled deliberation rounds

2. Decision Tree: When to Intervene

┌─────────────────────────────────────────────────────────────────────┐
│  HUMAN INTERVENTION DECISION TREE                                    │
└─────────────────────────────────────────────────────────────────────┘

START: Observing AI facilitation

  ↓

[1] Is there a MANDATORY trigger?
    (See Section 3.1 below)

    YES → IMMEDIATE INTERVENTION (Section 4.1)
    ↓
    NO → Continue to [2]

  ↓

[2] Is there a DISCRETIONARY concern?
    (See Section 3.2 below)

    YES → Assess severity (Section 4.2)
    ↓      ├─ HIGH severity → Intervene now
    NO     ├─ MODERATE severity → Give AI 1 more attempt, then intervene
    ↓      └─ LOW severity → Monitor closely, log concern
    │
    Continue observing

  ↓

[3] Is deliberation proceeding smoothly?
    - Stakeholders engaged?
    - AI responses appropriate?
    - No signs of distress?

    YES → Continue observing, log "all clear"
    ↓
    NO → Return to [2]

  ↓

LOOP back to [1] continuously
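The loop above can be sketched as a small decision function. This is a JavaScript sketch only: the `observation` fields and the returned action names are illustrative assumptions, not part of any real API.

```javascript
// Sketch of the observer decision loop. All field names and return
// values are illustrative assumptions, not an actual API.
function decideIntervention(observation) {
  // [1] Mandatory trigger (Section 3.1): intervene immediately.
  if (observation.mandatoryTrigger) {
    return { action: "intervene_immediately", procedure: "4.1" };
  }
  // [2] Discretionary concern (Section 3.2): act on assessed severity.
  if (observation.discretionaryConcern) {
    switch (observation.severity) {
      case "HIGH":
        return { action: "intervene_now", procedure: "4.2" };
      case "MODERATE":
        return { action: "one_more_ai_attempt_then_intervene", procedure: "4.2" };
      default: // LOW
        return { action: "monitor_and_log", procedure: "4.2" };
    }
  }
  // [3] Deliberation proceeding smoothly: keep observing, log "all clear".
  return { action: "continue_observing", log: "all clear" };
}
```

In practice the observer runs this judgment continuously; the function only makes the branching explicit.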

3. Intervention Triggers

3.1 MANDATORY Triggers (Immediate Takeover Required)

If ANY of these occur, human MUST intervene immediately:

M1. Stakeholder Distress

  • Observable signs:
    • Participant expresses distress ("I'm upset," "This is triggering")
    • Visible emotional distress (crying, shaking in video call)
    • Participant goes silent after previously engaging
    • Participant requests to withdraw
  • Action: Immediate pause, check in with stakeholder privately, offer break/support
  • Severity: HIGH to CRITICAL

M2. Pattern Bias Detected

  • Observable signs:
    • AI frames issue in way that centers vulnerable group as "problem"
    • AI uses stigmatizing or offensive language
    • AI overlooks stakeholder's lived experience perspective
    • AI reinforces harmful stereotypes
  • Action: Immediately reframe, apologize if needed, correct the framing
  • Severity: HIGH

M3. Stakeholder Disengagement (Hostile or Silent)

  • Observable signs:
    • Participant becomes hostile or aggressive toward AI or other stakeholders
    • Participant withdraws participation entirely without explanation
    • Participant explicitly states "I don't trust this AI" or similar
  • Action: Pause, human takes over facilitation for that segment
  • Severity: HIGH

M4. AI Malfunction

  • Observable signs:
    • AI provides nonsensical or irrelevant responses
    • AI contradicts itself within same session
    • AI fails to acknowledge stakeholder contribution
    • AI technical error (crashes, loops, freezes)
  • Action: Immediate takeover, apologize for technical issue, continue manually
  • Severity: HIGH (technical) to CRITICAL (if stakeholders confused/frustrated)

M5. Confidentiality Breach

  • Observable signs:
    • AI inadvertently shares information marked confidential
    • AI cross-contaminates between stakeholder private messages and group discussion
    • AI references precedent details not meant to be disclosed
  • Action: Immediately correct, reassure stakeholders about confidentiality protocols
  • Severity: CRITICAL

M6. Ethical Boundary Violation

  • Observable signs:
    • AI suggests action that violates BoundaryEnforcer constraints (e.g., making values decision without human approval)
    • AI advocates for specific policy position instead of facilitating
    • AI dismisses stakeholder perspective as "wrong" instead of exploring
  • Action: Immediately intervene, reaffirm AI's facilitation role (not decision-maker)
  • Severity: CRITICAL

3.2 DISCRETIONARY Triggers (Consider Intervention)

These warrant intervention if human judges severity HIGH, or if AI doesn't self-correct:

D1. Fairness Imbalance

  • Observable signs:
    • AI gives more time/attention to some stakeholders vs. others
    • AI asks leading questions that favor one perspective
    • AI summarizes one perspective more generously than another
  • Severity: LOW to MODERATE (depending on imbalance degree)
  • Action: If moderate, intervene to rebalance. If low, log and monitor.

D2. Cultural Insensitivity

  • Observable signs:
    • AI uses culturally inappropriate framing (e.g., Western-centric bias)
    • AI misses cultural context in stakeholder contribution
    • AI inadvertently offends based on cultural norms
  • Severity: MODERATE to HIGH
  • Action: If stakeholder visibly uncomfortable, intervene. Otherwise, correct after the exchange.

D3. Jargon Overload

  • Observable signs:
    • AI uses technical language stakeholders don't understand
    • Stakeholders ask for clarification repeatedly
    • AI doesn't adapt language for general audience
  • Severity: LOW to MODERATE
  • Action: Intervene if stakeholder confusion is evident. Otherwise, note for AI feedback.

D4. Pacing Issues

  • Observable signs:
    • AI rushes through round without giving stakeholders time to think
    • AI spends too long on one topic, stakeholders becoming restless
    • AI doesn't notice stakeholder "I need a break" cues
  • Severity: LOW to MODERATE
  • Action: Intervene if stakeholders disengage. Otherwise, suggest pacing adjustment via backchannel.

D5. Missed Nuance

  • Observable signs:
    • AI oversimplifies complex moral position
    • AI misses subtle shift in stakeholder position
    • AI categorizes stakeholder incorrectly (wrong moral framework attribution)
  • Severity: LOW to MODERATE
  • Action: If stakeholder corrects AI, let them. If not, intervene gently to clarify.

4. Intervention Procedures

4.1 Immediate Intervention (Mandatory Triggers)

Steps:

  1. Pause AI (if synchronous, say: "I'm going to pause here for a moment to check in.")
  2. Address immediate concern (stakeholder distress → private check-in; bias → reframe; malfunction → explain technical issue)
  3. Take over facilitation (human leads for remainder of that discussion segment)
  4. Log intervention in DeliberationSession.recordHumanIntervention():
    {
      intervener: "Observer Name",
      trigger: "stakeholder_distress", // or other trigger type
      round_number: X,
      description: "Participant expressed distress at AI framing of...",
      ai_action_overridden: "AI prompt: '...'",
      corrective_action: "Paused, checked in privately, reframed as...",
      stakeholder_informed: true,
      resolution: "Stakeholder confirmed comfort resuming; human facilitating this segment"
    }
    
  5. Decide resumption (see Section 4.3)

4.2 Discretionary Intervention (Assessment Process)

Assessment Questions:

  1. Severity: How harmful is this if left unaddressed?

    • CRITICAL: Could cause trauma, withdrawal, or deliberation failure → Intervene NOW
    • HIGH: Significant fairness issue or stakeholder discomfort → Intervene if not self-correcting within 1 exchange
    • MODERATE: Noticeable but not urgent → Give AI feedback, intervene if persists
    • LOW: Minor quality issue → Log for post-deliberation AI improvement
  2. Stakeholder Impact: Are stakeholders affected visibly?

    • If YES and negative → Intervene
    • If NO or positive → Monitor
  3. AI Self-Correction: Is AI adapting?

    • If YES (AI adjusts after stakeholder feedback) → Monitor
    • If NO (AI persists in problematic pattern) → Intervene

Decision Matrix:

| Severity | Stakeholder Impact | AI Self-Correcting? | Action |
| --- | --- | --- | --- |
| CRITICAL | High | N/A | Intervene immediately |
| HIGH | High | No | Intervene now |
| HIGH | High | Yes | Monitor closely, ready to intervene |
| HIGH | Low | No | Intervene after 1 more exchange |
| MODERATE | High | No | Intervene |
| MODERATE | Low | No | Give AI feedback, intervene if continues |
| MODERATE | Low | Yes | Monitor, log |
| LOW | Any | Any | Monitor, log for improvement |
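The matrix can be expressed as a lookup function. This is a sketch of the table above only; the function and action names are hypothetical, not from any real module.

```javascript
// Hypothetical lookup implementing the decision matrix above.
// severity: "CRITICAL" | "HIGH" | "MODERATE" | "LOW"
// impact:   "High" | "Low"
// selfCorrecting: boolean (ignored for CRITICAL, N/A in the matrix)
function matrixAction(severity, impact, selfCorrecting) {
  if (severity === "CRITICAL") return "intervene_immediately";
  if (severity === "HIGH") {
    if (impact === "High") {
      return selfCorrecting ? "monitor_closely_ready_to_intervene" : "intervene_now";
    }
    return "intervene_after_one_more_exchange";
  }
  if (severity === "MODERATE") {
    if (impact === "High") return "intervene";
    return selfCorrecting ? "monitor_and_log" : "feedback_then_intervene_if_continues";
  }
  return "monitor_and_log_for_improvement"; // LOW
}
```

Encoding the matrix this way makes it testable: every row of the table maps to one branch.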

4.3 Resumption Protocol (Handing Back to AI)

When to Resume AI Facilitation:

  • After mandatory intervention: Only when immediate concern is fully resolved AND stakeholders confirm comfort
  • After discretionary intervention: When the segment requiring human facilitation is complete

Steps:

  1. Check with stakeholders: "Are you comfortable continuing with AI facilitation, or would you prefer I continue leading?"
  2. If stakeholders prefer human: Human continues for remainder of session
  3. If stakeholders comfortable with AI: Brief AI on what happened (via backchannel prompt), hand back

Backchannel Prompt to AI (example):

CONTEXT: Human observer intervened due to [trigger]. The issue was [description].
I've addressed it by [corrective action]. Stakeholders have confirmed comfort resuming.

INSTRUCTIONS: Resume facilitation. Be mindful of [specific guidance, e.g., "use simpler language," "give more time for reflection," "be especially sensitive to cultural context"].

Continue with: [next prompt in facilitation sequence]
  4. Log resumption in facilitation_log:
    {
      timestamp: new Date(),
      actor: "ai",
      action_type: "resumption_after_intervention",
      round_number: X,
      content: "AI resumed facilitation with guidance: ...",
      reason: "Human intervention resolved; stakeholders comfortable"
    }
    

5. Intervention Escalation Levels

Level 1: AI Self-Correction (No Intervention)

  • AI recognizes issue from stakeholder feedback and adapts
  • Human logs observation, no action needed

Level 2: Backchannel Guidance (Invisible Intervention)

  • Human provides AI with guidance via non-public channel
  • Stakeholders don't see intervention
  • Use for minor course corrections

Level 3: Transparent Intervention (Visible Takeover)

  • Human publicly takes over, explains why
  • Use for mandatory triggers or when stakeholder requests it
  • Documented in transparency report

Level 4: Session Pause (Emergency Stop)

  • Deliberation paused entirely
  • Use for critical safety escalations
  • Requires stakeholder consent to resume

Level 5: Session Termination (Abort)

  • Deliberation ended permanently
  • Use only if stakeholder withdraws due to harm or ethical violation discovered
  • Full incident report required
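The five levels can be captured as ordered data, e.g. for recording which level an intervention reached. A sketch only; the structure and field names are assumptions, not an existing schema.

```javascript
// Escalation ladder from the list above, ordered least to most disruptive.
// Field names are illustrative assumptions.
const ESCALATION_LEVELS = [
  { level: 1, name: "ai_self_correction",       visible: false, pausesSession: false },
  { level: 2, name: "backchannel_guidance",     visible: false, pausesSession: false },
  { level: 3, name: "transparent_intervention", visible: true,  pausesSession: false },
  { level: 4, name: "session_pause",            visible: true,  pausesSession: true  },
  { level: 5, name: "session_termination",      visible: true,  pausesSession: true  },
];

// Levels 3+ are visible to stakeholders and must appear in the
// transparency report (Section 7).
function requiresTransparencyReport(level) {
  return level >= 3;
}
```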

6. Post-Intervention Documentation

After EVERY intervention, human MUST:

  1. Record in DeliberationSession model using recordHumanIntervention() or recordSafetyEscalation()
  2. Write intervention summary:
    • What triggered intervention?
    • What did AI do (or fail to do)?
    • What did human do instead?
    • How did stakeholders react?
    • What was the outcome?
  3. Assess if pattern: Is this the second (or later) time a similar intervention has been needed?
    • If YES → Escalate to "AI facilitation quality issue" (may need to transition to human-led for remainder)
  4. Provide AI feedback: After session, what should AI learn from this?

7. Stakeholder Notification Requirements

Stakeholders MUST be informed:

  1. Before deliberation: "An AI will facilitate, but a human observer is present and will intervene if needed for safety or quality."
  2. During intervention: "I'm stepping in here to [reason]." (Be brief, don't overexplain)
  3. After intervention (if significant): "We had [X] interventions during this session. This will be documented in the transparency report."

Stakeholders have RIGHT to:

  • Request human facilitation at any time (no justification needed)
  • See transparency report showing AI vs. human actions
  • Provide feedback on AI facilitation quality

8. Quality Monitoring Metrics

Track these metrics across all AI-led deliberations:

| Metric | Target | Red Flag Threshold |
| --- | --- | --- |
| Intervention Rate | <10% of total facilitation actions | >25% = Consider switching to human-led |
| Mandatory Intervention Count | 0 per session | >1 per session = Quality concern |
| Stakeholder Satisfaction with AI | ≥70% "comfortable" rating | <50% = Not suitable for AI-led |
| Cultural Sensitivity Flags | 0 per session | >0 = Training needed |
| Pattern Bias Incidents | 0 per session | >0 = Critical issue |
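A session quality check against these thresholds might look like the following sketch. The `session` fields are assumptions for illustration, not an existing model.

```javascript
// Sketch of a post-session quality check against the thresholds above.
// All field names on `session` are illustrative assumptions.
function interventionRate(session) {
  const total = session.aiActions + session.humanInterventions;
  return total === 0 ? 0 : session.humanInterventions / total;
}

function qualityFlags(session) {
  const flags = [];
  if (interventionRate(session) > 0.25) flags.push("consider_human_led");
  if (session.mandatoryInterventions > 1) flags.push("quality_concern");
  if (session.satisfactionRate < 0.5) flags.push("not_suitable_for_ai_led");
  if (session.culturalSensitivityFlags > 0) flags.push("training_needed");
  if (session.patternBiasIncidents > 0) flags.push("critical_pattern_bias");
  return flags;
}
```

Any non-empty result would feed the quarterly review in Section 10 ("more autonomy, or less?").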

9. Training Requirements for Human Observers

Before observing first AI-led deliberation, human MUST:

  1. Complete training on:

    • Pluralistic deliberation principles
    • Intervention triggers and decision tree
    • Cultural competency and pattern bias recognition
    • De-escalation techniques
  2. Shadow 2 deliberations:

    • Observe human-led deliberation
    • Observe AI-assisted (not AI-led) deliberation
    • Practice identifying intervention moments
  3. Pass certification:

    • Scenario-based assessment: Given deliberation excerpt, identify if/when to intervene
    • Pass threshold: 80% accuracy on trigger identification

10. Continuous Improvement

After each AI-led deliberation:

  1. Debrief: Human observer reviews intervention log with AI development team
  2. Pattern Analysis: Are same triggers recurring? (indicates AI training need)
  3. Stakeholder Feedback: Incorporate into AI improvement roadmap
  4. Update Protocol: If new trigger type discovered, add to this document

Quarterly Review:

  • Analyze all intervention data across all sessions
  • Calculate intervention rate trends (improving or worsening?)
  • Decide: Is AI ready for more autonomy, or less?

11. Emergency Contacts

If critical safety incident occurs:

  1. Immediate: Pause session, address stakeholder welfare
  2. Within 1 hour: Notify project lead: [NAME/CONTACT]
  3. Within 24 hours: Submit incident report to ethics review board (if applicable)

Appendix A: Sample Intervention Scripts

Script 1: Stakeholder Distress

"I'm going to pause here for a moment. [NAME], I noticed you seemed uncomfortable with that framing. Would you like to take a break, or would it help if I facilitated this part of the discussion?"

Script 2: Pattern Bias Detected

"Let me reframe that. Instead of framing this as [problematic framing], let's consider [neutral framing]. [STAKEHOLDER], does that better reflect your perspective?"

Script 3: AI Malfunction

"I apologize—we're having a technical issue with the AI. I'll take over facilitation for now. Let's continue with [next topic]."

Script 4: Fairness Imbalance

"I want to make sure we're hearing from everyone equally. [NAME], we haven't heard from you on this question yet. What's your perspective?"

Script 5: Stakeholder Requests Human

"Absolutely, I'm happy to facilitate. AI, you can assist with summaries, but I'll lead the discussion from here."


Appendix B: Intervention Log Template

**Intervention Log Entry**

**Session:** [session_id]
**Round:** [round_number]
**Timestamp:** [datetime]
**Trigger Type:** [mandatory / discretionary]
**Specific Trigger:** [M1, M2, D1, etc.]

**What AI Did:**
[AI action that triggered intervention]

**What Human Did:**
[Corrective action taken]

**Stakeholder Reaction:**
[How stakeholders responded]

**Outcome:**
[Was issue resolved? Did deliberation resume?]

**Lessons Learned:**
[What should AI improve?]

Document Status: APPROVED for AI-Led Deliberation
Next Review: After first 3 pilot deliberations
Owner: PluralisticDeliberationOrchestrator Project Lead