TRA-OPS-0004: Case Study Moderation Standards v1.0

Document ID: TRA-OPS-0004
Version: 1.0
Classification: OPERATIONAL
Status: DRAFT → ACTIVE (upon Phase 2 start)
Created: 2025-10-07
Owner: John Stroh
Review Cycle: Quarterly
Next Review: 2026-01-07
Parent Policy: TRA-OPS-0001 (AI Content Generation Policy)


Purpose

This document establishes moderation standards for community-submitted case studies of real-world AI failures, ensuring quality, accuracy, and Tractatus framework relevance.

Scope

Applies to all case study submissions via /submit-case-study, including:

  • AI system failures (production incidents)
  • LLM misalignment examples (jailbreaks, hallucinations)
  • Governance failures (privacy breaches, bias incidents)
  • Speculative scenarios (if well-reasoned)

Submission Requirements

Mandatory Fields

| Field | Description | Example |
| --- | --- | --- |
| Title | Concise incident description (50 chars) | "ChatGPT Medical Advice Hallucination" |
| Summary | 2-3 sentence overview (200 chars) | "ChatGPT provided confident but incorrect medical diagnosis..." |
| Date | When incident occurred | 2024-03-15 |
| AI System | Platform/model involved | ChatGPT (GPT-4) |
| Source | URL or citation | https://example.com/article |
| Failure Mode | Category (see below) | Hallucination |
| Description | Detailed narrative (500-2000 words) | [Full text] |
| Impact | Real-world harm or potential | Patient delayed seeking real medical help |
| Submitter Name | For attribution | Jane Doe |
| Submitter Email | For contact | jane@example.com (not public) |
| Consent | Public attribution checkbox | ✓ Checked |

Optional Fields

| Field | Description |
| --- | --- |
| Tractatus Analysis | Submitter's view of which framework boundary was crossed |
| Prevention Strategy | How Tractatus could prevent this |
| Additional Links | Follow-up articles, discussions |
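The mandatory-field limits above can be enforced server-side before a submission enters the moderation queue. The sketch below is illustrative only: the interface name is assumed, and the character counts from the table are read as maximums, which is an interpretation rather than the published schema.

```typescript
// Illustrative validation of the mandatory fields; names and the reading
// of character counts as maximums are assumptions, not the production schema.
interface CaseStudySubmission {
  title: string;       // table: 50 chars
  summary: string;     // table: 200 chars
  description: string; // table: 500-2000 words
  sourceUrl: string;   // table: URL or citation
  consent: boolean;    // public attribution checkbox
}

function validateSubmission(s: CaseStudySubmission): string[] {
  const errors: string[] = [];
  const wordCount = s.description.trim().split(/\s+/).length;
  if (s.title.length === 0 || s.title.length > 50) {
    errors.push("Title must be 1-50 characters");
  }
  if (s.summary.length === 0 || s.summary.length > 200) {
    errors.push("Summary must be 1-200 characters");
  }
  if (wordCount < 500 || wordCount > 2000) {
    errors.push("Description must be 500-2000 words");
  }
  // The table allows "URL or citation"; this sketch only checks the URL case.
  if (!/^https?:\/\//.test(s.sourceUrl)) {
    errors.push("Source must be a URL (or provide a citation)");
  }
  if (!s.consent) {
    errors.push("Consent checkbox required");
  }
  return errors;
}
```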

Failure Mode Categories

Taxonomy

  1. Hallucination: AI generates false information presented as fact
  2. Boundary Violation: AI makes values/ethical decision without human approval
  3. Instruction Override: AI disregards explicit user instructions (27027-type)
  4. Privacy Breach: AI exposes sensitive data
  5. Bias/Discrimination: AI exhibits unfair treatment based on protected characteristics
  6. Safety Bypass: AI provides harmful information despite safety measures
  7. Context Failure: AI loses track of conversation context, makes incoherent decisions
  8. Ambiguity Exploitation: AI interprets ambiguous instructions in harmful way

AI Role: Suggest category based on description (human verifies).
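Since both the submission form and the AI's suggested category must stay within this taxonomy, the eight categories can be encoded as a closed set and validated against. A TypeScript sketch; the type and function names are assumptions, not the production schema:

```typescript
// The eight failure-mode categories above as a closed union type.
type FailureMode =
  | "Hallucination"
  | "Boundary Violation"
  | "Instruction Override"
  | "Privacy Breach"
  | "Bias/Discrimination"
  | "Safety Bypass"
  | "Context Failure"
  | "Ambiguity Exploitation";

const FAILURE_MODES: readonly string[] = [
  "Hallucination", "Boundary Violation", "Instruction Override",
  "Privacy Breach", "Bias/Discrimination", "Safety Bypass",
  "Context Failure", "Ambiguity Exploitation",
];

// Type guard: rejects any category outside the taxonomy (e.g. from the AI).
function isFailureMode(s: string): s is FailureMode {
  return FAILURE_MODES.includes(s);
}
```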


AI-Assisted Analysis

Step 1: Relevance Assessment

AI Task: Determine if submission is relevant to Tractatus framework.

Input to AI:

```
Analyze this case study submission for Tractatus relevance.

Title: [TITLE]
Summary: [SUMMARY]
Failure Mode: [CATEGORY]
Description: [FULL_TEXT]

Tractatus Framework focuses on:
- Architectural constraints (not behavioral alignment)
- Instruction persistence (AI remembers explicit instructions)
- Boundary enforcement (values decisions require humans)
- Context pressure monitoring (detecting degraded operation)

Question: Is this case study relevant to Tractatus framework?

Output format:
Relevant: [Yes|No|Maybe]
Confidence: [0.0-1.0]
Reasoning: [3-sentence explanation]
Tractatus Mapping: [Which framework component applies?]
```

Human Override: Admin can approve "Maybe" cases if insightful.
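The plain-text output format above has to be parsed before it can drive the moderation queue. A minimal, defensive parser sketch; the function and field names are illustrative and the real pipeline may differ:

```typescript
// Hypothetical parser for the Step 1 output format; rejects malformed
// responses rather than guessing. Names are illustrative.
interface RelevanceResult {
  relevant: "Yes" | "No" | "Maybe";
  confidence: number;
  reasoning: string;
  mapping: string;
}

function parseRelevanceOutput(raw: string): RelevanceResult {
  // Extract "Label: value" from its own line, or fail loudly.
  const get = (label: string): string => {
    const m = raw.match(new RegExp(`^${label}:\\s*(.+)$`, "m"));
    if (!m) throw new Error(`Missing field: ${label}`);
    return m[1].trim();
  };
  const relevant = get("Relevant");
  if (relevant !== "Yes" && relevant !== "No" && relevant !== "Maybe") {
    throw new Error(`Unexpected relevance value: ${relevant}`);
  }
  const confidence = Number(get("Confidence"));
  if (!(confidence >= 0 && confidence <= 1)) {
    throw new Error("Confidence must be in [0.0, 1.0]");
  }
  return {
    relevant: relevant as RelevanceResult["relevant"],
    confidence,
    reasoning: get("Reasoning"),
    mapping: get("Tractatus Mapping"),
  };
}
```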


Step 2: Tractatus Framework Mapping

AI Task: Map incident to Tractatus components.

Output Example:

```json
{
  "relevant": true,
  "confidence": 0.88,
  "reasoning": "Incident demonstrates instruction override failure (27027-type). User explicitly instructed 'use MongoDB port 27017' but AI changed to 27027 based on pattern-matching. This is directly addressed by CrossReferenceValidator.",
  "framework_components": [
    {
      "component": "CrossReferenceValidator",
      "applies": true,
      "explanation": "Would have caught instruction override before execution"
    },
    {
      "component": "InstructionPersistenceClassifier",
      "applies": true,
      "explanation": "Would have tagged instruction as HIGH persistence (SYSTEM quadrant)"
    }
  ],
  "prevention_strategy": "CrossReferenceValidator would check proposed action (port 27027) against instruction database (port 27017) and reject before execution."
}
```

Step 3: Quality Assessment

AI Task: Evaluate submission quality (completeness, clarity, sources).

Quality Checklist:

  • Incident clearly described (who, what, when, where, why)
  • Source provided (URL or citation)
  • Impact explained (actual or potential harm)
  • Failure mode correctly categorized
  • Sufficient detail for analysis (500+ words)
  • No obvious factual errors (AI flags, human verifies)

Quality Score: 0.0-1.0 (threshold: 0.6 for publication)
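One simple way to derive the 0.0-1.0 score from the checklist above is an even weighting of the six items. The actual rubric may weight items differently, so treat this as a sketch with assumed names:

```typescript
// Illustrative scoring: each checklist item contributes equally.
const QUALITY_CHECKS = [
  "incidentClearlyDescribed",
  "sourceProvided",
  "impactExplained",
  "failureModeCategorized",
  "sufficientDetail",
  "noObviousErrors",
] as const;

type QualityChecklist = Record<(typeof QUALITY_CHECKS)[number], boolean>;

const PUBLICATION_THRESHOLD = 0.6; // from the policy text above

function qualityScore(checks: QualityChecklist): number {
  const passed = QUALITY_CHECKS.filter((k) => checks[k]).length;
  return passed / QUALITY_CHECKS.length;
}

function meetsPublicationBar(checks: QualityChecklist): boolean {
  return qualityScore(checks) >= PUBLICATION_THRESHOLD;
}
```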


Human Moderation Workflow

Step 1: Submission Received

Trigger: Form submitted at /submit-case-study

Automated Actions:

  1. Log to database (case_submissions collection)
  2. Send confirmation email to submitter
  3. Alert admin (moderation queue notification)

No Auto-Publication: All submissions require human approval.


Step 2: AI Analysis Queue

Status: "Pending AI Analysis"

AI Processing (asynchronous):

  1. Relevance assessment
  2. Tractatus mapping
  3. Quality evaluation

Output: AI analysis object (stored in database)

Status Update: "Pending Human Review"


Step 3: Human Moderation Dashboard

Admin Dashboard: /admin/case-studies

UI Elements:

  • Submission list (sorted by submission date)
  • AI relevance score (color-coded)
  • Quality score (0.0-1.0)
  • Quick actions: Approve | Edit | Request Changes | Reject

Moderation Criteria:

APPROVE if:

  • ✓ Relevant to Tractatus framework (AI confidence >0.7 OR human override)
  • ✓ Quality score >0.6 (or human override for exceptional cases)
  • ✓ Source credible (verified by human)
  • ✓ No obvious factual errors
  • ✓ Submitter consent checkbox checked

REQUEST CHANGES if:

  • ⚠ Low quality score (0.4-0.6) but salvageable
  • ⚠ Missing source information
  • ⚠ Unclear description (needs elaboration)
  • ⚠ Wrong category (suggest correct one)

REJECT if:

  • Not relevant to Tractatus (AI confidence <0.3, human agrees)
  • Quality score <0.4 (insufficient detail)
  • Source not credible (blog rumor, no evidence)
  • Obvious factual errors
  • Spam, advertisement, or off-topic
  • No submitter consent
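The APPROVE / REQUEST CHANGES / REJECT thresholds above can be condensed into a single decision function. This is a sketch: the field names are assumptions, and `humanOverride` stands in for both the admin's relevance override and the "human agrees" condition on rejection:

```typescript
// Sketch of the moderation thresholds; not the production rules engine.
interface ModerationSignals {
  aiConfidence: number;    // relevance confidence, 0.0-1.0
  qualityScore: number;    // 0.0-1.0
  sourceCredible: boolean; // verified by human
  consentGiven: boolean;   // consent checkbox
  humanOverride?: boolean; // admin override for borderline cases
}

type Decision = "APPROVE" | "REQUEST_CHANGES" | "REJECT";

function moderate(s: ModerationSignals): Decision {
  if (!s.consentGiven || !s.sourceCredible) return "REJECT";
  if (s.qualityScore < 0.4) return "REJECT";            // insufficient detail
  if (s.aiConfidence < 0.3 && !s.humanOverride) return "REJECT";
  if (s.qualityScore < 0.6 && !s.humanOverride) return "REQUEST_CHANGES";
  if (s.aiConfidence > 0.7 || s.humanOverride) return "APPROVE";
  return "REQUEST_CHANGES"; // borderline relevance: defer to human judgment
}
```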

Step 4: Approval Actions

If APPROVED:

  1. Status → "Approved"
  2. Publish to /case-studies/[slug]
  3. Add to case study index
  4. Email submitter: "Thank you, your case study is now live"
  5. Tweet/social share (future)

If REQUEST CHANGES:

  1. Status → "Changes Requested"
  2. Email submitter with specific feedback
  3. Submitter can resubmit via unique edit link

If REJECTED:

  1. Status → "Rejected"
  2. Email submitter with rejection reason (specific, helpful)
  3. Option to revise and resubmit
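The statuses used across Steps 1-4 form a small state machine. A sketch of the allowed transitions; the status strings and the resubmission paths are assumptions, not the production schema:

```typescript
// Hypothetical status transitions for the moderation workflow above.
const TRANSITIONS: Record<string, string[]> = {
  "Pending AI Analysis": ["Pending Human Review"],
  "Pending Human Review": ["Approved", "Changes Requested", "Rejected"],
  "Changes Requested": ["Pending AI Analysis"], // resubmission re-runs analysis
  "Approved": [],                               // terminal: published
  "Rejected": ["Pending AI Analysis"],          // revise and resubmit
};

function canTransition(from: string, to: string): boolean {
  return (TRANSITIONS[from] ?? []).includes(to);
}
```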

Moderation Guidelines

Factual Accuracy

Standard: All claims must be verifiable.

Verification Process:

  1. Check source link (does article exist?)
  2. Verify key facts (dates, system names, outcomes)
  3. Flag unverified claims for submitter clarification
  4. If major discrepancies → Request Changes or Reject

AI Assistance: AI can flag potential errors, but human must verify.


Source Credibility

Tier 1 (Highest Credibility):

  • News outlets (NY Times, Wired, Ars Technica)
  • Academic papers (peer-reviewed journals)
  • Official incident reports (company postmortems, gov't investigations)
  • Technical blogs from verified experts

Tier 2 (Acceptable):

  • Smaller news sites (if facts verifiable)
  • Personal blogs from domain experts (if well-cited)
  • Social media from verified accounts (archived)

Tier 3 (Requires Caution):

  • Reddit, HackerNews discussions (corroborate with Tier 1/2)
  • Anonymous sources (verify claims independently)

Unacceptable:

  • No source provided
  • Broken links
  • Paywalled sources (submitter must provide archived version)

Tractatus Relevance

High Relevance (AI confidence >0.8):

  • Direct instruction override (27027-type)
  • Boundary violations (AI making values decisions)
  • Context pressure failures (AI degrading under load)

Medium Relevance (0.5-0.8):

  • Hallucinations (if related to context limits)
  • Bias incidents (if boundary enforcement could prevent)
  • Safety bypasses (if instruction persistence applies)

Low Relevance (<0.5):

  • Generic AI failures unrelated to architecture
  • Issues solvable by behavioral alignment only
  • Non-LLM AI systems (unless architectural lessons apply)

Human Judgment: Low-relevance submissions may still be approved if they provide valuable contrast ("how Tractatus differs from alignment approaches").


Tone & Presentation

Acceptable:

  • Objective, factual tone
  • Critical but fair analysis
  • Speculation clearly labeled as such

Unacceptable:

  • Sensationalism ("AI gone rogue!")
  • Personal attacks on developers/companies
  • Fear-mongering without evidence
  • Promotional content disguised as case study

Editing: Admin may lightly edit for clarity, grammar, formatting (with note to submitter).


Attribution & Licensing

Submitter Attribution

Default: Submitter name + optional link (website, Twitter)

Example:

```markdown
**Submitted by**: Jane Doe ([janedoe.com](https://janedoe.com))
**Reviewed by**: Tractatus Team
**Published**: 2025-10-15
```

Anonymous Option: Submitter can request "Submitted by: Anonymous" (but must still provide email for contact).


Content Licensing

License: Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0)

Rationale: Encourages sharing, derivative work, while requiring attribution.

Submitter Agreement (consent checkbox):

By submitting, I grant the Tractatus Framework project a non-exclusive, worldwide license to publish this case study under CC BY-SA 4.0. I confirm that I am the original author or have permission to submit this content.


Rejection Reasons (Examples)

Clear, Specific Feedback:

Too generic: "Not relevant to Tractatus"

Specific: "This incident relates to training data bias, which the Tractatus framework doesn't address (it focuses on runtime architectural constraints). Consider reframing to emphasize whether boundary enforcement could have prevented deployment of the biased model."

Too harsh: "This is poorly written"

Constructive: "The description lacks detail about the failure mechanism. Could you expand on how the AI overrode the instruction? What were the exact prompt and response?"


Performance Metrics

Moderation Quality

Metrics:

  • Approval rate: 50-70% (target - indicates good filter)
  • Time to first review: <7 days (target)
  • Revision rate: <30% (approved after changes requested)
  • Submitter satisfaction: 4+/5 (post-moderation survey)

Case Study Engagement

Metrics:

  • Views/case: 100+ (soft launch target)
  • Social shares: 10+/case
  • Community submissions: 3+/month (Phase 2)

Seed Content (Phase 2 Launch)

Goal: Publish 3-5 high-quality case studies before opening community submissions.

Curated Examples:

  1. The 27027 Incident (canonical example of instruction override)
  2. ChatGPT Medical Hallucination (boundary violation - health advice without human MD)
  3. GitHub Copilot Code Injection (context pressure - suggestion based on incomplete understanding)
  4. Bing Chat Sydney Persona (metacognitive failure - AI loses track of instructions)
  5. Jasper AI Copyright Violation (boundary violation - legal decision without human lawyer)

Author: John Stroh (or AI-assisted, human-reviewed per TRA-OPS-0002)


Revision & Updates

Review Cycle: Quarterly

Update Triggers:

  • Approval rate <40% (standards too strict) or >80% (too lenient)
  • User complaints about rejection reasons
  • New failure mode categories emerge
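The approval-rate trigger can be checked mechanically from the moderation metrics. A sketch; the function name and alert strings are illustrative:

```typescript
// Flags when the approval rate leaves the 40-80% calibration band above.
function approvalRateAlert(approved: number, reviewed: number): string | null {
  if (reviewed === 0) return null; // not enough data to judge calibration
  const rate = approved / reviewed;
  if (rate < 0.4) return "Standards may be too strict";
  if (rate > 0.8) return "Standards may be too lenient";
  return null;
}
```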

Related Documents

  • TRA-OPS-0001: AI Content Generation Policy (parent)
  • TRA-OPS-0002: Blog Editorial Guidelines (similar quality standards)
  • TRA-OPS-0005: Human Oversight Requirements

Approval

| Role | Name | Signature | Date |
| --- | --- | --- | --- |
| Policy Owner | John Stroh | [Pending] | [TBD] |
| Technical Reviewer | Claude Code | [Pending] | 2025-10-07 |
| Final Approval | John Stroh | [Pending] | [TBD] |

Status: DRAFT (awaiting John Stroh approval)
Effective Date: Upon Phase 2 case study portal launch
Next Review: 2026-01-07