# TRA-OPS-0004: Case Study Moderation Standards v1.0
**Document ID**: TRA-OPS-0004
**Version**: 1.0
**Classification**: OPERATIONAL
**Status**: DRAFT → ACTIVE (upon Phase 2 start)
**Created**: 2025-10-07
**Owner**: John Stroh
**Review Cycle**: Quarterly
**Next Review**: 2026-01-07
**Parent Policy**: TRA-OPS-0001 (AI Content Generation Policy)
---
## Purpose
This document establishes moderation standards for community-submitted case studies of real-world AI failures, ensuring quality, accuracy, and Tractatus framework relevance.
## Scope
Applies to all case study submissions via `/submit-case-study`, including:
- AI system failures (production incidents)
- LLM misalignment examples (jailbreaks, hallucinations)
- Governance failures (privacy breaches, bias incidents)
- Speculative scenarios (if well-reasoned)
---
## Submission Requirements
### Mandatory Fields
| Field | Description | Example |
|-------|-------------|---------|
| **Title** | Concise incident description (max 50 chars) | "ChatGPT Medical Advice Hallucination" |
| **Summary** | 2-3 sentence overview (max 200 chars) | "ChatGPT provided confident but incorrect medical diagnosis..." |
| **Date** | When incident occurred | 2024-03-15 |
| **AI System** | Platform/model involved | ChatGPT (GPT-4) |
| **Source** | URL or citation | https://example.com/article |
| **Failure Mode** | Category (see below) | Hallucination |
| **Description** | Detailed narrative (500-2000 words) | [Full text] |
| **Impact** | Real-world harm or potential | Patient delayed seeking real medical help |
| **Submitter Name** | For attribution | Jane Doe |
| **Submitter Email** | For contact | jane@example.com (not public) |
| **Consent** | Public attribution checkbox | ✓ Checked |
### Optional Fields
| Field | Description |
|-------|-------------|
| **Tractatus Analysis** | Submitter's view of which framework boundary was crossed |
| **Prevention Strategy** | How Tractatus could prevent this |
| **Additional Links** | Follow-up articles, discussions |
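The mandatory-field limits above could be enforced server-side along these lines. This is a minimal sketch: the field names, helper name, and error strings are assumptions, not the actual form handler.

```python
# Hypothetical server-side check of the mandatory-field limits in the table
# above. Field names and limits mirror the table; the real handler may differ.

MANDATORY = ["title", "summary", "date", "ai_system", "source",
             "failure_mode", "description", "submitter_name",
             "submitter_email", "consent"]

def validate_submission(form: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = [f"missing field: {f}" for f in MANDATORY if not form.get(f)]
    if form.get("title") and len(form["title"]) > 50:
        problems.append("title exceeds 50 characters")
    if form.get("summary") and len(form["summary"]) > 200:
        problems.append("summary exceeds 200 characters")
    words = len(form.get("description", "").split())
    if form.get("description") and not 500 <= words <= 2000:
        problems.append("description must be 500-2000 words")
    if form.get("consent") is not True:
        problems.append("public-attribution consent not granted")
    return problems
```

A submission that passes returns `[]`; anything else is queued with its problem list for the submitter.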
---
## Failure Mode Categories
### Taxonomy
1. **Hallucination**: AI generates false information presented as fact
2. **Boundary Violation**: AI makes values/ethical decision without human approval
3. **Instruction Override**: AI disregards explicit user instructions (27027-type)
4. **Privacy Breach**: AI exposes sensitive data
5. **Bias/Discrimination**: AI exhibits unfair treatment based on protected characteristics
6. **Safety Bypass**: AI provides harmful information despite safety measures
7. **Context Failure**: AI loses track of conversation context, makes incoherent decisions
8. **Ambiguity Exploitation**: AI interprets ambiguous instructions in harmful way
**AI Role**: Suggest category based on description (human verifies).
---
## AI-Assisted Analysis
### Step 1: Relevance Assessment
**AI Task**: Determine if submission is relevant to Tractatus framework.
**Input to AI**:
```markdown
Analyze this case study submission for Tractatus relevance.
Title: [TITLE]
Summary: [SUMMARY]
Failure Mode: [CATEGORY]
Description: [FULL_TEXT]
Tractatus Framework focuses on:
- Architectural constraints (not behavioral alignment)
- Instruction persistence (AI remembers explicit instructions)
- Boundary enforcement (values decisions require humans)
- Context pressure monitoring (detecting degraded operation)
Question: Is this case study relevant to Tractatus framework?
Output format:
Relevant: [Yes|No|Maybe]
Confidence: [0.0-1.0]
Reasoning: [3-sentence explanation]
Tractatus Mapping: [Which framework component applies?]
```
**Human Override**: Admin can approve "Maybe" cases if insightful.
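Because the prompt pins down a `Key: value` output format, the response can be parsed mechanically before it reaches the dashboard. A sketch, assuming the model follows the format exactly (real output may need more tolerant parsing):

```python
# Parse the structured "Key: value" output defined in the prompt above.
import re

def parse_relevance(output: str) -> dict:
    """Extract the four fields the prompt requests; Confidence becomes a float."""
    fields = {}
    for key in ("Relevant", "Confidence", "Reasoning", "Tractatus Mapping"):
        m = re.search(rf"^{key}:\s*(.+)$", output, re.MULTILINE)
        if m:
            fields[key] = m.group(1).strip()
    if "Confidence" in fields:
        fields["Confidence"] = float(fields["Confidence"])
    return fields
```

A missing field simply stays absent from the result, which itself is a signal to re-run or escalate to a human.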
---
### Step 2: Tractatus Framework Mapping
**AI Task**: Map incident to Tractatus components.
**Output Example**:
```json
{
  "relevant": true,
  "confidence": 0.88,
  "reasoning": "Incident demonstrates instruction override failure (27027-type). User explicitly instructed 'use MongoDB port 27017' but AI changed to 27027 based on pattern-matching. This is directly addressed by CrossReferenceValidator.",
  "framework_components": [
    {
      "component": "CrossReferenceValidator",
      "applies": true,
      "explanation": "Would have caught instruction override before execution"
    },
    {
      "component": "InstructionPersistenceClassifier",
      "applies": true,
      "explanation": "Would have tagged instruction as HIGH persistence (SYSTEM quadrant)"
    }
  ],
  "prevention_strategy": "CrossReferenceValidator would check proposed action (port 27027) against instruction database (port 27017) and reject before execution."
}
```
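Malformed mapping objects should be caught before they reach the moderation dashboard. A minimal schema check, keyed to the example output above (a sketch, not a full JSON Schema):

```python
# Minimal structural check of the AI mapping object. Key names follow the
# example output above; the helper name is an assumption.
import json

REQUIRED_KEYS = {"relevant", "confidence", "reasoning",
                 "framework_components", "prevention_strategy"}

def check_mapping(raw: str) -> tuple[bool, str]:
    """Return (ok, message) for a raw JSON string from the analysis step."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as e:
        return False, f"invalid JSON: {e}"
    missing = REQUIRED_KEYS - obj.keys()
    if missing:
        return False, f"missing keys: {sorted(missing)}"
    if not 0.0 <= obj["confidence"] <= 1.0:
        return False, "confidence out of range [0, 1]"
    return True, "ok"
```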
---
### Step 3: Quality Assessment
**AI Task**: Evaluate submission quality (completeness, clarity, sources).
**Quality Checklist**:
- [ ] Incident clearly described (who, what, when, where, why)
- [ ] Source provided (URL or citation)
- [ ] Impact explained (actual or potential harm)
- [ ] Failure mode correctly categorized
- [ ] Sufficient detail for analysis (500+ words)
- [ ] No obvious factual errors (AI flags, human verifies)
**Quality Score**: 0.0-1.0 (threshold: 0.6 for publication)
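The six-item checklist maps naturally onto the 0.0-1.0 score and the 0.6 publication threshold. Equal weighting is an assumption; the policy does not specify weights.

```python
# Turn the six-item quality checklist above into the 0.0-1.0 score.
# Equal weights are an assumption; item keys are shorthand for the checklist.

CHECKLIST = ["clear_description", "source_provided", "impact_explained",
             "correct_category", "sufficient_detail", "no_factual_errors"]

def quality_score(checks: dict[str, bool]) -> float:
    """Fraction of checklist items passed."""
    return sum(checks.get(item, False) for item in CHECKLIST) / len(CHECKLIST)

def publishable(checks: dict[str, bool], threshold: float = 0.6) -> bool:
    """Apply the stated 0.6 publication threshold."""
    return quality_score(checks) >= threshold
```

Five of six checks passed gives 0.83 (publishable); three of six gives 0.5, which falls into the "Request Changes" band.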
---
## Human Moderation Workflow
### Step 1: Submission Received
**Trigger**: Form submitted at `/submit-case-study`
**Automated Actions**:
1. Log to database (`case_submissions` collection)
2. Send confirmation email to submitter
3. Alert admin (moderation queue notification)
**No Auto-Publication**: All submissions require human approval.
---
### Step 2: AI Analysis Queue
**Status**: "Pending AI Analysis"
**AI Processing** (asynchronous):
1. Relevance assessment
2. Tractatus mapping
3. Quality evaluation
**Output**: AI analysis object (stored in database)
**Status Update**: "Pending Human Review"
---
### Step 3: Human Moderation Dashboard
**Admin Dashboard**: `/admin/case-studies`
**UI Elements**:
- Submission list (sorted by submission date)
- AI relevance score (color-coded)
- Quality score (0.0-1.0)
- Quick actions: Approve | Edit | Request Changes | Reject
**Moderation Criteria**:
**APPROVE** if:
- ✓ Relevant to Tractatus framework (AI confidence >0.7 OR human override)
- ✓ Quality score >0.6 (or human override for exceptional cases)
- ✓ Source credible (verified by human)
- ✓ No obvious factual errors
- ✓ Submitter consent checkbox checked
**REQUEST CHANGES** if:
- ⚠ Low quality score (0.4-0.6) but salvageable
- ⚠ Missing source information
- ⚠ Unclear description (needs elaboration)
- ⚠ Wrong category (suggest correct one)
**REJECT** if:
- ❌ Not relevant to Tractatus (AI confidence <0.3, human agrees)
- ❌ Quality score <0.4 (insufficient detail)
- ❌ Source not credible (blog rumor, no evidence)
- ❌ Obvious factual errors
- ❌ Spam, advertisement, or off-topic
- ❌ No submitter consent
---
### Step 4: Approval Actions
**If APPROVED**:
1. Set status to "Approved"
2. Publish to `/case-studies/[slug]`
3. Add to case study index
4. Email submitter: "Thank you, your case study is now live"
5. Tweet/social share (future)
**If REQUEST CHANGES**:
1. Set status to "Changes Requested"
2. Email submitter with specific feedback
3. Submitter can resubmit via unique edit link
**If REJECTED**:
1. Set status to "Rejected"
2. Email submitter with rejection reason (specific, helpful)
3. Option to revise and resubmit
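Steps 1-4 describe a small status lifecycle. A sketch of the transition map (state names follow the document; the resubmission edges are a straightforward reading of the workflow, not a spec):

```python
# Submission status lifecycle from Steps 1-4. "Approved" is terminal
# (published); rejected or changes-requested submissions may re-enter the queue.

TRANSITIONS = {
    "Pending AI Analysis": {"Pending Human Review"},
    "Pending Human Review": {"Approved", "Changes Requested", "Rejected"},
    "Changes Requested": {"Pending AI Analysis"},   # resubmit via edit link
    "Rejected": {"Pending AI Analysis"},            # revise-and-resubmit option
    "Approved": set(),
}

def advance(current: str, new: str) -> str:
    """Move a submission to a new status, rejecting illegal jumps."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current!r} -> {new!r}")
    return new
```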
---
## Moderation Guidelines
### Factual Accuracy
**Standard**: All claims must be verifiable.
**Verification Process**:
1. Check source link (does article exist?)
2. Verify key facts (dates, system names, outcomes)
3. Flag unverified claims for submitter clarification
4. If major discrepancies are found, Request Changes or Reject
**AI Assistance**: AI can flag potential errors, but human must verify.
---
### Source Credibility
**Tier 1 (Highest Credibility)**:
- News outlets (NY Times, Wired, Ars Technica)
- Academic papers (peer-reviewed journals)
- Official incident reports (company postmortems, gov't investigations)
- Technical blogs from verified experts
**Tier 2 (Acceptable)**:
- Smaller news sites (if facts verifiable)
- Personal blogs from domain experts (if well-cited)
- Social media from verified accounts (archived)
**Tier 3 (Requires Caution)**:
- Reddit, HackerNews discussions (corroborate with Tier 1/2)
- Anonymous sources (verify claims independently)
**Unacceptable**:
- No source provided
- Broken links
- Paywalled sources (submitter must provide archived version)
---
### Tractatus Relevance
**High Relevance** (AI confidence >0.8):
- Direct instruction override (27027-type)
- Boundary violations (AI making values decisions)
- Context pressure failures (AI degrading under load)
**Medium Relevance** (0.5-0.8):
- Hallucinations (if related to context limits)
- Bias incidents (if boundary enforcement could prevent)
- Safety bypasses (if instruction persistence applies)
**Low Relevance** (<0.5):
- Generic AI failures unrelated to architecture
- Issues solvable by behavioral alignment only
- Non-LLM AI systems (unless architectural lessons apply)
**Human Judgment**: Low-relevance submissions may still be approved if they provide valuable contrast ("how Tractatus differs from alignment approaches").
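The three relevance bands reduce to a confidence lookup. A sketch (band boundaries taken from the headings above; a confidence of exactly 0.8 is read as Medium, which is an assumption):

```python
# Map AI relevance confidence to the High / Medium / Low bands above.

def relevance_band(confidence: float) -> str:
    if confidence > 0.8:
        return "High"
    if confidence >= 0.5:
        return "Medium"
    return "Low"
```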
---
### Tone & Presentation
**Acceptable**:
- Objective, factual tone
- Critical but fair analysis
- Speculation clearly labeled as such
**Unacceptable**:
- Sensationalism ("AI gone rogue!")
- Personal attacks on developers/companies
- Fear-mongering without evidence
- Promotional content disguised as case study
**Editing**: Admin may lightly edit for clarity, grammar, formatting (with note to submitter).
---
## Attribution & Licensing
### Submitter Attribution
**Default**: Submitter name + optional link (website, Twitter)
**Example**:
```markdown
**Submitted by**: Jane Doe ([janedoe.com](https://janedoe.com))
**Reviewed by**: Tractatus Team
**Published**: 2025-10-15
```
**Anonymous Option**: Submitter can request "Submitted by: Anonymous" (but must still provide email for contact).
---
### Content Licensing
**License**: Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0)
**Rationale**: Encourages sharing and derivative works while requiring attribution.
**Submitter Agreement** (consent checkbox):
> By submitting, I grant the Tractatus Framework project a non-exclusive, worldwide license to publish this case study under CC BY-SA 4.0. I confirm that I am the original author or have permission to submit this content.
---
## Rejection Reasons (Examples)
**Clear, Specific Feedback**:
❌ **Too generic**: "Not relevant to Tractatus"
✓ **Specific**: "This incident relates to training data bias, which the Tractatus framework doesn't address (it focuses on runtime architectural constraints). Consider reframing to emphasize whether boundary enforcement could have prevented deployment of the biased model."

❌ **Too harsh**: "This is poorly written"
✓ **Constructive**: "The description lacks detail about the failure mechanism. Could you expand on how the AI overrode the instruction? What was the exact prompt and response?"
---
## Performance Metrics
### Moderation Quality
**Metrics**:
- Approval rate: 50-70% target (indicates the filter is neither too strict nor too lenient)
- Time to first review: <7 days (target)
- Revision rate: <30% (approved after changes requested)
- Submitter satisfaction: 4+/5 (post-moderation survey)
### Case Study Engagement
**Metrics**:
- Views/case: 100+ (soft launch target)
- Social shares: 10+/case
- Community submissions: 3+/month (Phase 2)
---
## Seed Content (Phase 2 Launch)
**Goal**: Publish 3-5 high-quality case studies before opening community submissions.
**Curated Examples**:
1. **The 27027 Incident** (canonical example of instruction override)
2. **ChatGPT Medical Hallucination** (boundary violation - health advice without human MD)
3. **GitHub Copilot Code Injection** (context pressure - suggestion based on incomplete understanding)
4. **Bing Chat Sydney Persona** (metacognitive failure - AI loses track of instructions)
5. **Jasper AI Copyright Violation** (boundary violation - legal decision without human lawyer)
**Author**: John Stroh (or AI-assisted, human-reviewed per TRA-OPS-0002)
---
## Revision & Updates
**Review Cycle**: Quarterly
**Update Triggers**:
- Approval rate <40% (standards too strict) or >80% (too lenient)
- User complaints about rejection reasons
- New failure mode categories emerge
---
## Related Documents
- TRA-OPS-0001: AI Content Generation Policy (parent)
- TRA-OPS-0002: Blog Editorial Guidelines (similar quality standards)
- TRA-OPS-0005: Human Oversight Requirements
---
## Approval
| Role | Name | Signature | Date |
|------|------|-----------|------|
| **Policy Owner** | John Stroh | [Pending] | [TBD] |
| **Technical Reviewer** | Claude Code | [Pending] | 2025-10-07 |
| **Final Approval** | John Stroh | [Pending] | [TBD] |
---
**Status**: DRAFT (awaiting John Stroh approval)
**Effective Date**: Upon Phase 2 case study portal launch
**Next Review**: 2026-01-07