# TRA-OPS-0004: Case Study Moderation Standards v1.0

**Document ID**: TRA-OPS-0004
**Version**: 1.0
**Classification**: OPERATIONAL
**Status**: DRAFT → ACTIVE (upon Phase 2 start)
**Created**: 2025-10-07
**Owner**: John Stroh
**Review Cycle**: Quarterly
**Next Review**: 2026-01-07
**Parent Policy**: TRA-OPS-0001 (AI Content Generation Policy)

---

## Purpose

This document establishes moderation standards for community-submitted case studies of real-world AI failures, ensuring quality, accuracy, and relevance to the Tractatus framework.

## Scope

Applies to all case study submissions via `/submit-case-study`, including:

- AI system failures (production incidents)
- LLM misalignment examples (jailbreaks, hallucinations)
- Governance failures (privacy breaches, bias incidents)
- Speculative scenarios (if well-reasoned)

---

## Submission Requirements

### Mandatory Fields

| Field | Description | Example |
|-------|-------------|---------|
| **Title** | Concise incident description (50 chars) | "ChatGPT Medical Advice Hallucination" |
| **Summary** | 2-3 sentence overview (200 chars) | "ChatGPT provided confident but incorrect medical diagnosis..." |
| **Date** | When the incident occurred | 2024-03-15 |
| **AI System** | Platform/model involved | ChatGPT (GPT-4) |
| **Source** | URL or citation | https://example.com/article |
| **Failure Mode** | Category (see below) | Hallucination |
| **Description** | Detailed narrative (500-2000 words) | [Full text] |
| **Impact** | Real-world harm, actual or potential | Patient delayed seeking real medical help |
| **Submitter Name** | For attribution | Jane Doe |
| **Submitter Email** | For contact | jane@example.com (not public) |
| **Consent** | Public attribution checkbox | ✓ Checked |

### Optional Fields

| Field | Description |
|-------|-------------|
| **Tractatus Analysis** | Submitter's view of which framework boundary was crossed |
| **Prevention Strategy** | How Tractatus could prevent this |
| **Additional Links** | Follow-up articles, discussions |

---

## Failure Mode Categories

### Taxonomy

1. **Hallucination**: AI generates false information presented as fact
2. **Boundary Violation**: AI makes a values/ethical decision without human approval
3. **Instruction Override**: AI disregards explicit user instructions (27027-type)
4. **Privacy Breach**: AI exposes sensitive data
5. **Bias/Discrimination**: AI exhibits unfair treatment based on protected characteristics
6. **Safety Bypass**: AI provides harmful information despite safety measures
7. **Context Failure**: AI loses track of conversation context and makes incoherent decisions
8. **Ambiguity Exploitation**: AI interprets ambiguous instructions in a harmful way

**AI Role**: Suggest a category based on the description (human verifies).

---

## AI-Assisted Analysis

### Step 1: Relevance Assessment

**AI Task**: Determine whether the submission is relevant to the Tractatus framework.

**Input to AI**:

```markdown
Analyze this case study submission for Tractatus relevance.
Title: [TITLE]
Summary: [SUMMARY]
Failure Mode: [CATEGORY]
Description: [FULL_TEXT]

Tractatus Framework focuses on:
- Architectural constraints (not behavioral alignment)
- Instruction persistence (AI remembers explicit instructions)
- Boundary enforcement (values decisions require humans)
- Context pressure monitoring (detecting degraded operation)

Question: Is this case study relevant to Tractatus framework?

Output format:
Relevant: [Yes|No|Maybe]
Confidence: [0.0-1.0]
Reasoning: [3-sentence explanation]
Tractatus Mapping: [Which framework component applies?]
```

**Human Override**: Admin can approve "Maybe" cases if insightful.

---

### Step 2: Tractatus Framework Mapping

**AI Task**: Map the incident to Tractatus components.

**Output Example**:

```json
{
  "relevant": true,
  "confidence": 0.88,
  "reasoning": "Incident demonstrates instruction override failure (27027-type). User explicitly instructed 'use MongoDB port 27017' but AI changed to 27027 based on pattern-matching. This is directly addressed by CrossReferenceValidator.",
  "framework_components": [
    {
      "component": "CrossReferenceValidator",
      "applies": true,
      "explanation": "Would have caught instruction override before execution"
    },
    {
      "component": "InstructionPersistenceClassifier",
      "applies": true,
      "explanation": "Would have tagged instruction as HIGH persistence (SYSTEM quadrant)"
    }
  ],
  "prevention_strategy": "CrossReferenceValidator would check proposed action (port 27027) against instruction database (port 27017) and reject before execution."
}
```

---

### Step 3: Quality Assessment

**AI Task**: Evaluate submission quality (completeness, clarity, sources).
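This quality evaluation can be reduced to a simple checklist score. Below is a minimal Python sketch; the field names and equal weighting are illustrative assumptions, while the 0.6 publication threshold is the one this document specifies:

```python
# Illustrative sketch of the quality evaluation step.
# Checklist field names and equal weighting are assumptions, not a spec;
# the 0.6 threshold matches this document's publication standard.

PUBLICATION_THRESHOLD = 0.6

CHECKLIST = (
    "incident_clearly_described",   # who, what, when, where, why
    "source_provided",              # URL or citation
    "impact_explained",             # actual or potential harm
    "failure_mode_correct",         # category matches the taxonomy
    "sufficient_detail",            # 500+ words
    "no_obvious_factual_errors",    # AI flags, human verifies
)

def quality_score(checks: dict) -> float:
    """Fraction of checklist items satisfied, as a 0.0-1.0 score."""
    return sum(bool(checks.get(item)) for item in CHECKLIST) / len(CHECKLIST)

def publishable(checks: dict) -> bool:
    """Advisory only: a human moderator can override in either direction."""
    return quality_score(checks) >= PUBLICATION_THRESHOLD
```

Under this sketch, a submission satisfying five of six items scores about 0.83 and clears the threshold; the score is advisory, and the human override provisions below still apply.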
**Quality Checklist**:

- [ ] Incident clearly described (who, what, when, where, why)
- [ ] Source provided (URL or citation)
- [ ] Impact explained (actual or potential harm)
- [ ] Failure mode correctly categorized
- [ ] Sufficient detail for analysis (500+ words)
- [ ] No obvious factual errors (AI flags, human verifies)

**Quality Score**: 0.0-1.0 (threshold: 0.6 for publication)

---

## Human Moderation Workflow

### Step 1: Submission Received

**Trigger**: Form submitted at `/submit-case-study`

**Automated Actions**:

1. Log to database (`case_submissions` collection)
2. Send confirmation email to submitter
3. Alert admin (moderation queue notification)

**No Auto-Publication**: All submissions require human approval.

---

### Step 2: AI Analysis Queue

**Status**: "Pending AI Analysis"

**AI Processing** (asynchronous):

1. Relevance assessment
2. Tractatus mapping
3. Quality evaluation

**Output**: AI analysis object (stored in database)

**Status Update**: "Pending Human Review"

---

### Step 3: Human Moderation Dashboard

**Admin Dashboard**: `/admin/case-studies`

**UI Elements**:

- Submission list (sorted by submission date)
- AI relevance score (color-coded)
- Quality score (0.0-1.0)
- Quick actions: Approve | Edit | Request Changes | Reject

**Moderation Criteria**:

**APPROVE** if:

- ✓ Relevant to Tractatus framework (AI confidence >0.7 OR human override)
- ✓ Quality score >0.6 (or human override for exceptional cases)
- ✓ Source credible (verified by human)
- ✓ No obvious factual errors
- ✓ Submitter consent checkbox checked

**REQUEST CHANGES** if:

- ⚠ Low quality score (0.4-0.6) but salvageable
- ⚠ Missing source information
- ⚠ Unclear description (needs elaboration)
- ⚠ Wrong category (suggest the correct one)

**REJECT** if:

- ❌ Not relevant to Tractatus (AI confidence <0.3, human agrees)
- ❌ Quality score <0.4 (insufficient detail)
- ❌ Source not credible (blog rumor, no evidence)
- ❌ Obvious factual errors
- ❌ Spam, advertisement, or off-topic
- ❌ No submitter consent

---

### Step 4: Approval Actions

**If APPROVED**:

1. Status → "Approved"
2. Publish to `/case-studies/[slug]`
3. Add to case study index
4. Email submitter: "Thank you, your case study is now live"
5. Tweet/social share (future)

**If REQUEST CHANGES**:

1. Status → "Changes Requested"
2. Email submitter with specific feedback
3. Submitter can resubmit via unique edit link

**If REJECTED**:

1. Status → "Rejected"
2. Email submitter with rejection reason (specific, helpful)
3. Option to revise and resubmit

---

## Moderation Guidelines

### Factual Accuracy

**Standard**: All claims must be verifiable.

**Verification Process**:

1. Check source link (does the article exist?)
2. Verify key facts (dates, system names, outcomes)
3. Flag unverified claims for submitter clarification
4. If major discrepancies → Request Changes or Reject

**AI Assistance**: AI can flag potential errors, but a human must verify.

---

### Source Credibility

**Tier 1 (Highest Credibility)**:

- News outlets (NY Times, Wired, Ars Technica)
- Academic papers (peer-reviewed journals)
- Official incident reports (company postmortems, gov't investigations)
- Technical blogs from verified experts

**Tier 2 (Acceptable)**:

- Smaller news sites (if facts verifiable)
- Personal blogs from domain experts (if well-cited)
- Social media from verified accounts (archived)

**Tier 3 (Requires Caution)**:

- Reddit, HackerNews discussions (corroborate with Tier 1/2)
- Anonymous sources (verify claims independently)

**Unacceptable**:

- No source provided
- Broken links
- Paywalled sources (submitter must provide an archived version)

---

### Tractatus Relevance

**High Relevance** (AI confidence >0.8):

- Direct instruction override (27027-type)
- Boundary violations (AI making values decisions)
- Context pressure failures (AI degrading under load)

**Medium Relevance** (0.5-0.8):

- Hallucinations (if related to context limits)
- Bias incidents (if boundary enforcement could prevent them)
- Safety bypasses (if instruction persistence applies)

**Low Relevance** (<0.5):

- Generic AI failures unrelated to architecture
- Issues solvable by behavioral alignment alone
- Non-LLM AI systems (unless architectural lessons apply)

**Human Judgment**: Low-relevance submissions may still be approved if they provide valuable contrast ("how Tractatus differs from alignment approaches").

---

### Tone & Presentation

**Acceptable**:

- Objective, factual tone
- Critical but fair analysis
- Speculation clearly labeled as such

**Unacceptable**:

- Sensationalism ("AI gone rogue!")
- Personal attacks on developers/companies
- Fear-mongering without evidence
- Promotional content disguised as a case study

**Editing**: Admin may lightly edit for clarity, grammar, and formatting (with a note to the submitter).

---

## Attribution & Licensing

### Submitter Attribution

**Default**: Submitter name + optional link (website, Twitter)

**Example**:

```markdown
**Submitted by**: Jane Doe ([janedoe.com](https://janedoe.com))
**Reviewed by**: Tractatus Team
**Published**: 2025-10-15
```

**Anonymous Option**: Submitter can request "Submitted by: Anonymous" (but must still provide an email for contact).

---

### Content Licensing

**License**: Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0)

**Rationale**: Encourages sharing and derivative work while requiring attribution.

**Submitter Agreement** (consent checkbox):

> By submitting, I grant the Tractatus Framework project a non-exclusive, worldwide license to publish this case study under CC BY-SA 4.0. I confirm that I am the original author or have permission to submit this content.

---

## Rejection Reasons (Examples)

**Clear, Specific Feedback**:

❌ **Too generic**: "Not relevant to Tractatus"
→ ✅ **Specific**: "This incident relates to training data bias, which the Tractatus framework doesn't address (it focuses on runtime architectural constraints). Consider reframing to emphasize whether boundary enforcement could have prevented deployment of the biased model."
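The approve / request-changes / reject thresholds from the moderation criteria above can be collapsed into a single advisory triage function. A sketch follows; the function and parameter names are hypothetical, the numeric thresholds are the ones this document specifies, and every outcome still requires human sign-off:

```python
# Advisory triage using this document's moderation thresholds.
# A human moderator always makes the final call; names are illustrative.

def triage(relevance_confidence: float, quality_score: float,
           has_consent: bool, source_credible: bool) -> str:
    """Suggest APPROVE, REQUEST_CHANGES, or REJECT for a submission."""
    # Hard rejections: no consent, AI confidence <0.3, or quality <0.4.
    if not has_consent or relevance_confidence < 0.3 or quality_score < 0.4:
        return "REJECT"
    # Salvageable quality (0.4-0.6) or missing sourcing: ask for changes.
    if quality_score <= 0.6 or not source_credible:
        return "REQUEST_CHANGES"
    # Clear relevance (confidence >0.7) with good quality: suggest approval.
    if relevance_confidence > 0.7:
        return "APPROVE"
    # Borderline relevance (0.3-0.7): ask for reframing; admin may override.
    return "REQUEST_CHANGES"
```

The function only orders the checks; the human-override provisions (approving "Maybe" relevance or exceptional low-score cases) deliberately sit outside it.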
❌ **Too harsh**: "This is poorly written"
→ ✅ **Constructive**: "The description lacks detail about the failure mechanism. Could you expand on how the AI overrode the instruction? What was the exact prompt and response?"

---

## Performance Metrics

### Moderation Quality

**Metrics**:

- Approval rate: 50-70% (target; indicates a good filter)
- Time to first review: <7 days (target)
- Revision rate: <30% (approved after changes requested)
- Submitter satisfaction: 4+/5 (post-moderation survey)

### Case Study Engagement

**Metrics**:

- Views per case: 100+ (soft launch target)
- Social shares: 10+ per case
- Community submissions: 3+ per month (Phase 2)

---

## Seed Content (Phase 2 Launch)

**Goal**: Publish 3-5 high-quality case studies before opening community submissions.

**Curated Examples**:

1. **The 27027 Incident** (canonical example of instruction override)
2. **ChatGPT Medical Hallucination** (boundary violation: health advice without a human MD)
3. **GitHub Copilot Code Injection** (context pressure: suggestion based on incomplete understanding)
4. **Bing Chat Sydney Persona** (metacognitive failure: AI loses track of instructions)
5. **Jasper AI Copyright Violation** (boundary violation: legal decision without a human lawyer)

**Author**: John Stroh (or AI-assisted, human-reviewed per TRA-OPS-0002)

---

## Revision & Updates

**Review Cycle**: Quarterly

**Update Triggers**:

- Approval rate <40% (standards too strict) or >80% (too lenient)
- User complaints about rejection reasons
- New failure mode categories emerge

---

## Related Documents

- TRA-OPS-0001: AI Content Generation Policy (parent)
- TRA-OPS-0002: Blog Editorial Guidelines (similar quality standards)
- TRA-OPS-0005: Human Oversight Requirements

---

## Approval

| Role | Name | Signature | Date |
|------|------|-----------|------|
| **Policy Owner** | John Stroh | [Pending] | [TBD] |
| **Technical Reviewer** | Claude Code | [Pending] | 2025-10-07 |
| **Final Approval** | John Stroh | [Pending] | [TBD] |

---

**Status**: DRAFT (awaiting John Stroh approval)
**Effective Date**: Upon Phase 2 case study portal launch
**Next Review**: 2026-01-07