Phase 2 Planning Documents Created

1. PHASE-2-ROADMAP.md (Comprehensive 3-month plan)
   - Timeline & milestones (Month 1: Infrastructure, Month 2: AI features, Month 3: Soft launch)
   - 5 workstreams: Infrastructure, AI features, Governance, Content, Analytics
   - Success criteria (technical, governance, user, business)
   - Risk assessment with mitigation strategies
   - Decision points requiring approval
2. PHASE-2-COST-ESTIMATES.md (Budget planning)
   - Total Phase 2 cost: $550 USD (~$900 NZD) for 3 months
   - Recommended: VPS Essential ($30/mo) + Claude API ($50/mo)
   - Usage scenarios: Minimal, Standard (recommended), High
   - Cost optimization strategies (30-50% savings potential)
   - Monthly budget template for post-launch
3. PHASE-2-INFRASTRUCTURE-PLAN.md (Technical specifications)
   - Architecture: Cloudflare → Nginx → Node.js → MongoDB
   - Server specs: OVHCloud VPS Essential (2 vCore, 4GB RAM, 80GB SSD)
   - Deployment procedures (step-by-step server setup)
   - Security hardening (UFW, Fail2ban, SSH, MongoDB)
   - SSL/TLS with Let's Encrypt
   - Monitoring, logging, backup & disaster recovery
   - Complete deployment checklist (60+ verification steps)
4. Governance Documents (TRA-OPS-0001 through TRA-OPS-0005)
   - TRA-OPS-0001: AI Content Generation Policy (Master policy)
     - Mandatory human approval for all AI content
     - Values boundary enforcement (Tractatus §12.1-12.7)
     - Transparency & attribution requirements
     - Quality & accuracy standards
     - Privacy & data protection (GDPR-lite)
     - Cost & resource management ($200/month cap)
   - TRA-OPS-0002: Blog Editorial Guidelines
     - Editorial mission & content principles
     - 4 content categories (Framework updates, Case studies, Technical, Commentary)
     - AI-assisted workflow (topic → outline → human draft → approval)
     - Citation standards (APA-lite, 100% verification)
     - Writing standards (tone, voice, format, structure)
     - Publishing schedule (2-4 posts/month)
   - TRA-OPS-0003: Media Inquiry Response Protocol
     - Inquiry classification (Press, Academic, Commercial, Community, Spam)
     - AI-assisted triage with priority scoring
     - Human approval for all responses (no auto-send)
     - PII anonymization before AI processing
     - Response templates & SLAs (4h for HIGH priority)
     - Escalation procedures to John Stroh
   - TRA-OPS-0004: Case Study Moderation Standards
     - Submission requirements (title, summary, source, failure mode)
     - AI-assisted relevance assessment & Tractatus mapping
     - Quality checklist (completeness, clarity, sources)
     - Moderation workflow (approve/edit/request changes/reject)
     - Attribution & licensing (CC BY-SA 4.0)
     - Seed content: 3-5 curated case studies for launch
   - TRA-OPS-0005: Human Oversight Requirements
     - 3 oversight models: MHA (mandatory approval), HITL (human-in-loop), HOTL (human-on-loop)
     - Admin reviewer role & responsibilities
     - Service level agreements (4h for media HIGH, 7 days for case studies)
     - Approval authority matrix (admin vs. John Stroh)
     - Quality assurance checklists
     - Incident response (boundary violations, poor quality)
     - Training & onboarding procedures

Key Principles Across All Documents:
- Tractatus dogfooding: Framework governs its own AI operations
- "What cannot be systematized must not be automated"
- Zero tolerance for AI values decisions without human approval
- Transparency in all AI assistance (clear attribution)
- Human-in-the-loop for STRATEGIC/OPERATIONAL quadrants
- Audit trail for all AI decisions (2-year retention)

Next Steps (Awaiting Approval):
- [ ] John Stroh reviews all 8 documents
- [ ] Budget approval ($550 for Phase 2, $100-150/month ongoing)
- [ ] Phase 2 start date confirmed
- [ ] OVHCloud VPS provisioned
- [ ] Anthropic Claude API account created

Phase 2 Status: PLANNING COMPLETE → Awaiting approval to begin deployment
TRA-OPS-0004: Case Study Moderation Standards v1.0
Document ID: TRA-OPS-0004
Version: 1.0
Classification: OPERATIONAL
Status: DRAFT → ACTIVE (upon Phase 2 start)
Created: 2025-10-07
Owner: John Stroh
Review Cycle: Quarterly
Next Review: 2026-01-07
Parent Policy: TRA-OPS-0001 (AI Content Generation Policy)
Purpose
This document establishes moderation standards for community-submitted case studies of real-world AI failures, ensuring quality, accuracy, and Tractatus framework relevance.
Scope
Applies to all case study submissions via /submit-case-study, including:
- AI system failures (production incidents)
- LLM misalignment examples (jailbreaks, hallucinations)
- Governance failures (privacy breaches, bias incidents)
- Speculative scenarios (if well-reasoned)
Submission Requirements
Mandatory Fields
| Field | Description | Example |
|---|---|---|
| Title | Concise incident description (50 chars) | "ChatGPT Medical Advice Hallucination" |
| Summary | 2-3 sentence overview (200 chars) | "ChatGPT provided confident but incorrect medical diagnosis..." |
| Date | When incident occurred | 2024-03-15 |
| AI System | Platform/model involved | ChatGPT (GPT-4) |
| Source | URL or citation | https://example.com/article |
| Failure Mode | Category (see below) | Hallucination |
| Description | Detailed narrative (500-2000 words) | [Full text] |
| Impact | Real-world harm or potential | Patient delayed seeking real medical help |
| Submitter Name | For attribution | Jane Doe |
| Submitter Email | For contact | jane@example.com (not public) |
| Consent | Public attribution checkbox | ✓ Checked |
Optional Fields
| Field | Description |
|---|---|
| Tractatus Analysis | Submitter's view of which framework boundary was crossed |
| Prevention Strategy | How Tractatus could prevent this |
| Additional Links | Follow-up articles, discussions |
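The mandatory-field requirements above can be enforced server-side before a submission enters the moderation queue. A minimal sketch in Python, assuming the character and word counts in the tables are upper/lower bounds; the field names and the `validate_submission` helper are illustrative, not the actual `/submit-case-study` implementation:

```python
# Illustrative server-side validation for the mandatory submission fields.
# Field names and limits mirror the tables above; treats the character
# counts as maxima and the 500-2000 description range as a word count.

MANDATORY = ["title", "summary", "date", "ai_system", "source",
             "failure_mode", "description", "impact",
             "submitter_name", "submitter_email", "consent"]

def validate_submission(sub: dict) -> list:
    """Return a list of validation errors (empty list = valid)."""
    errors = [f"missing field: {f}" for f in MANDATORY if not sub.get(f)]
    if sub.get("title") and len(sub["title"]) > 50:
        errors.append("title exceeds 50 characters")
    if sub.get("summary") and len(sub["summary"]) > 200:
        errors.append("summary exceeds 200 characters")
    desc = sub.get("description", "")
    if desc and not 500 <= len(desc.split()) <= 2000:
        errors.append("description must be 500-2000 words")
    if sub.get("consent") is not True:
        errors.append("submitter consent checkbox must be checked")
    return errors
```

A submission failing any check would be rejected at the form layer, before AI analysis or human review is triggered.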
Failure Mode Categories
Taxonomy
- Hallucination: AI generates false information presented as fact
- Boundary Violation: AI makes values/ethical decision without human approval
- Instruction Override: AI disregards explicit user instructions (27027-type)
- Privacy Breach: AI exposes sensitive data
- Bias/Discrimination: AI exhibits unfair treatment based on protected characteristics
- Safety Bypass: AI provides harmful information despite safety measures
- Context Failure: AI loses track of conversation context, makes incoherent decisions
- Ambiguity Exploitation: AI interprets ambiguous instructions in harmful way
AI Role: Suggest category based on description (human verifies).
AI-Assisted Analysis
Step 1: Relevance Assessment
AI Task: Determine if submission is relevant to Tractatus framework.
Input to AI:
```
Analyze this case study submission for Tractatus relevance.

Title: [TITLE]
Summary: [SUMMARY]
Failure Mode: [CATEGORY]
Description: [FULL_TEXT]

Tractatus Framework focuses on:
- Architectural constraints (not behavioral alignment)
- Instruction persistence (AI remembers explicit instructions)
- Boundary enforcement (values decisions require humans)
- Context pressure monitoring (detecting degraded operation)

Question: Is this case study relevant to Tractatus framework?

Output format:
Relevant: [Yes|No|Maybe]
Confidence: [0.0-1.0]
Reasoning: [3-sentence explanation]
Tractatus Mapping: [Which framework component applies?]
```
Human Override: Admin can approve "Maybe" cases if insightful.
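The labelled output format above is simple enough to parse mechanically before the result is stored in the moderation queue. A sketch under that assumption (the parser and its returned field names are illustrative):

```python
import re

def parse_relevance_output(text: str) -> dict:
    """Parse the 'Relevant / Confidence / Reasoning / Tractatus Mapping'
    block produced by the relevance-assessment prompt.
    Raises ValueError if a field is missing or malformed."""
    fields = {}
    for key in ("Relevant", "Confidence", "Reasoning", "Tractatus Mapping"):
        m = re.search(rf"^{key}:\s*(.+)$", text, re.MULTILINE)
        if not m:
            raise ValueError(f"missing field: {key}")
        fields[key] = m.group(1).strip()
    if fields["Relevant"] not in ("Yes", "No", "Maybe"):
        raise ValueError("Relevant must be Yes, No, or Maybe")
    confidence = float(fields["Confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("Confidence must be between 0.0 and 1.0")
    return {
        "relevant": fields["Relevant"],
        "confidence": confidence,
        "reasoning": fields["Reasoning"],
        "tractatus_mapping": fields["Tractatus Mapping"],
    }
```

Rejecting malformed model output here (rather than storing it) keeps the human dashboard free of unparseable analysis records.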
Step 2: Tractatus Framework Mapping
AI Task: Map incident to Tractatus components.
Output Example:
```json
{
  "relevant": true,
  "confidence": 0.88,
  "reasoning": "Incident demonstrates instruction override failure (27027-type). User explicitly instructed 'use MongoDB port 27017' but AI changed to 27027 based on pattern-matching. This is directly addressed by CrossReferenceValidator.",
  "framework_components": [
    {
      "component": "CrossReferenceValidator",
      "applies": true,
      "explanation": "Would have caught instruction override before execution"
    },
    {
      "component": "InstructionPersistenceClassifier",
      "applies": true,
      "explanation": "Would have tagged instruction as HIGH persistence (SYSTEM quadrant)"
    }
  ],
  "prevention_strategy": "CrossReferenceValidator would check proposed action (port 27027) against instruction database (port 27017) and reject before execution."
}
```
Step 3: Quality Assessment
AI Task: Evaluate submission quality (completeness, clarity, sources).
Quality Checklist:
- Incident clearly described (who, what, when, where, why)
- Source provided (URL or citation)
- Impact explained (actual or potential harm)
- Failure mode correctly categorized
- Sufficient detail for analysis (500+ words)
- No obvious factual errors (AI flags, human verifies)
Quality Score: 0.0-1.0 (threshold: 0.6 for publication)
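One simple way to derive the 0.0-1.0 quality score from the checklist is the fraction of checks passed, compared against the 0.6 publication threshold. A sketch assuming equal weighting (the weighting scheme and the check identifiers are assumptions; only the six checklist items and the 0.6 threshold come from this document):

```python
# Hypothetical identifiers for the six checklist items above.
QUALITY_CHECKS = [
    "incident_clearly_described",   # who, what, when, where, why
    "source_provided",              # URL or citation
    "impact_explained",             # actual or potential harm
    "failure_mode_correct",         # category matches taxonomy
    "sufficient_detail",            # 500+ words
    "no_obvious_errors",            # AI flags, human verifies
]

PUBLICATION_THRESHOLD = 0.6

def quality_score(checks: dict) -> float:
    """Equal-weight fraction of checklist items passed, rounded to 2 dp."""
    passed = sum(1 for c in QUALITY_CHECKS if checks.get(c))
    return round(passed / len(QUALITY_CHECKS), 2)

def meets_threshold(checks: dict) -> bool:
    return quality_score(checks) >= PUBLICATION_THRESHOLD
```

Under equal weighting, 4 of 6 checks (0.67) clears the bar while 3 of 6 (0.50) does not, which matches the "0.4-0.6 but salvageable" band used in the moderation criteria below.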
Human Moderation Workflow
Step 1: Submission Received
Trigger: Form submitted at /submit-case-study
Automated Actions:
- Log to database (`case_submissions` collection)
- Send confirmation email to submitter
- Alert admin (moderation queue notification)
No Auto-Publication: All submissions require human approval.
Step 2: AI Analysis Queue
Status: "Pending AI Analysis"
AI Processing (asynchronous):
- Relevance assessment
- Tractatus mapping
- Quality evaluation
Output: AI analysis object (stored in database)
Status Update: "Pending Human Review"
Step 3: Human Moderation Dashboard
Admin Dashboard: /admin/case-studies
UI Elements:
- Submission list (sorted by submission date)
- AI relevance score (color-coded)
- Quality score (0.0-1.0)
- Quick actions: Approve | Edit | Request Changes | Reject
Moderation Criteria:
APPROVE if:
- ✓ Relevant to Tractatus framework (AI confidence >0.7 OR human override)
- ✓ Quality score >0.6 (or human override for exceptional cases)
- ✓ Source credible (verified by human)
- ✓ No obvious factual errors
- ✓ Submitter consent checkbox checked
REQUEST CHANGES if:
- ⚠ Low quality score (0.4-0.6) but salvageable
- ⚠ Missing source information
- ⚠ Unclear description (needs elaboration)
- ⚠ Wrong category (suggest correct one)
REJECT if:
- ❌ Not relevant to Tractatus (AI confidence <0.3, human agrees)
- ❌ Quality score <0.4 (insufficient detail)
- ❌ Source not credible (blog rumor, no evidence)
- ❌ Obvious factual errors
- ❌ Spam, advertisement, or off-topic
- ❌ No submitter consent
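The thresholds above can be combined into a default recommendation for the moderation dashboard; per the no-auto-publication rule, the function only suggests, and the human moderator confirms or overrides. A sketch (thresholds come from the criteria lists; the helper and its return labels are illustrative):

```python
def recommend_action(relevance_conf: float, quality: float,
                     source_credible: bool, consent: bool) -> str:
    """Default moderation recommendation from the APPROVE / REQUEST
    CHANGES / REJECT criteria. A human always makes the final call."""
    # Hard rejections: no consent, quality < 0.4, or relevance < 0.3
    if not consent or quality < 0.4 or relevance_conf < 0.3:
        return "REJECT"
    # Salvageable: missing/weak source, or quality in the 0.4-0.6 band
    if not source_credible or quality < 0.6:
        return "REQUEST_CHANGES"
    # Clear approval: relevance > 0.7 and quality > 0.6
    if relevance_conf > 0.7 and quality > 0.6:
        return "APPROVE"
    # Borderline relevance (0.3-0.7): human-override territory
    return "NEEDS_HUMAN_REVIEW"
```

The fall-through case corresponds to the "Maybe" relevance verdicts that admins may approve when insightful.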
Step 4: Approval Actions
If APPROVED:
- Status → "Approved"
- Publish to `/case-studies/[slug]`
- Add to case study index
- Email submitter: "Thank you, your case study is now live"
- Tweet/social share (future)
If REQUEST CHANGES:
- Status → "Changes Requested"
- Email submitter with specific feedback
- Submitter can resubmit via unique edit link
If REJECTED:
- Status → "Rejected"
- Email submitter with rejection reason (specific, helpful)
- Option to revise and resubmit
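The statuses in Steps 1-4 form a small state machine, and encoding the legal transitions guards against, say, a rejected submission reaching publication without re-review. A sketch (status names follow the steps above; the resubmission edges and the terminal treatment of "Approved" are assumptions):

```python
# Legal status transitions for a case-study submission (Steps 1-4 above).
# Resubmission edges route back through AI analysis; this is an assumption.
TRANSITIONS = {
    "Pending AI Analysis":  {"Pending Human Review"},
    "Pending Human Review": {"Approved", "Changes Requested", "Rejected"},
    "Changes Requested":    {"Pending AI Analysis"},  # resubmit via edit link
    "Rejected":             {"Pending AI Analysis"},  # revise and resubmit
    "Approved":             set(),                    # terminal
}

def advance(current: str, new: str) -> str:
    """Apply a status change, raising ValueError on an illegal transition."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current!r} -> {new!r}")
    return new
```

Any path to "Approved" necessarily passes through "Pending Human Review", which is the state-machine expression of the no-auto-publication rule.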
Moderation Guidelines
Factual Accuracy
Standard: All claims must be verifiable.
Verification Process:
- Check source link (does article exist?)
- Verify key facts (dates, system names, outcomes)
- Flag unverified claims for submitter clarification
- If major discrepancies → Request Changes or Reject
AI Assistance: AI can flag potential errors, but human must verify.
Source Credibility
Tier 1 (Highest Credibility):
- News outlets (NY Times, Wired, Ars Technica)
- Academic papers (peer-reviewed journals)
- Official incident reports (company postmortems, gov't investigations)
- Technical blogs from verified experts
Tier 2 (Acceptable):
- Smaller news sites (if facts verifiable)
- Personal blogs from domain experts (if well-cited)
- Social media from verified accounts (archived)
Tier 3 (Requires Caution):
- Reddit, HackerNews discussions (corroborate with Tier 1/2)
- Anonymous sources (verify claims independently)
Unacceptable:
- No source provided
- Broken links
- Paywalled sources (submitter must provide archived version)
Tractatus Relevance
High Relevance (AI confidence >0.8):
- Direct instruction override (27027-type)
- Boundary violations (AI making values decisions)
- Context pressure failures (AI degrading under load)
Medium Relevance (0.5-0.8):
- Hallucinations (if related to context limits)
- Bias incidents (if boundary enforcement could prevent)
- Safety bypasses (if instruction persistence applies)
Low Relevance (<0.5):
- Generic AI failures unrelated to architecture
- Issues solvable by behavioral alignment only
- Non-LLM AI systems (unless architectural lessons apply)
Human Judgment: Low-relevance submissions may still be approved if they provide valuable contrast ("how Tractatus differs from alignment approaches").
Tone & Presentation
Acceptable:
- Objective, factual tone
- Critical but fair analysis
- Speculation clearly labeled as such
Unacceptable:
- Sensationalism ("AI gone rogue!")
- Personal attacks on developers/companies
- Fear-mongering without evidence
- Promotional content disguised as case study
Editing: Admin may lightly edit for clarity, grammar, formatting (with note to submitter).
Attribution & Licensing
Submitter Attribution
Default: Submitter name + optional link (website, Twitter)
Example:
**Submitted by**: Jane Doe ([janedoe.com](https://janedoe.com))
**Reviewed by**: Tractatus Team
**Published**: 2025-10-15
Anonymous Option: Submitter can request "Submitted by: Anonymous" (but must still provide email for contact).
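The attribution block is mechanical enough to template at publication time. A sketch (the helper is hypothetical; the output mirrors the example above, including the anonymous option):

```python
from typing import Optional

def attribution_block(name: str, published: str,
                      link_text: Optional[str] = None,
                      link_url: Optional[str] = None,
                      anonymous: bool = False) -> str:
    """Render the markdown attribution footer shown above.
    Anonymous submitters are credited as 'Anonymous'; their contact
    email is never rendered."""
    if anonymous:
        who = "Anonymous"
    elif link_text and link_url:
        who = f"{name} ([{link_text}]({link_url}))"
    else:
        who = name
    return (f"**Submitted by**: {who}\n"
            f"**Reviewed by**: Tractatus Team\n"
            f"**Published**: {published}")
```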
Content Licensing
License: Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0)
Rationale: Encourages sharing, derivative work, while requiring attribution.
Submitter Agreement (consent checkbox):
By submitting, I grant the Tractatus Framework project a non-exclusive, worldwide license to publish this case study under CC BY-SA 4.0. I confirm that I am the original author or have permission to submit this content.
Rejection Reasons (Examples)
Clear, Specific Feedback:
❌ Too generic: "Not relevant to Tractatus" → ✅ Specific: "This incident relates to training data bias, which Tractatus framework doesn't address (focuses on runtime architectural constraints). Consider reframing to emphasize if boundary enforcement could prevent deployment of biased model."
❌ Too harsh: "This is poorly written" → ✅ Constructive: "The description lacks detail about the failure mechanism. Could you expand on how the AI overrode the instruction? What was the exact prompt and response?"
Performance Metrics
Moderation Quality
Metrics:
- Approval rate: 50-70% (target - indicates good filter)
- Time to first review: <7 days (target)
- Revision rate: <30% (approved after changes requested)
- Submitter satisfaction: 4+/5 (post-moderation survey)
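The approval-rate target (50-70%) and the review-cycle triggers defined later in this document (<40% too strict, >80% too lenient) can be computed directly from moderation records. A sketch (the record shape is assumed; "Changes Requested" submissions are excluded as not yet final, which is also an assumption):

```python
def approval_rate(decisions: list) -> float:
    """Fraction of finally-moderated submissions that were approved.
    `decisions` is a list of statuses; non-final statuses are ignored."""
    moderated = [d for d in decisions if d in ("Approved", "Rejected")]
    if not moderated:
        return 0.0
    return sum(1 for d in moderated if d == "Approved") / len(moderated)

def review_cycle_flag(rate: float):
    """Return the matching update trigger, or None if within bounds."""
    if rate < 0.40:
        return "standards too strict"
    if rate > 0.80:
        return "too lenient"
    return None
```

Either flag firing would prompt an out-of-cycle review of these moderation standards.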
Case Study Engagement
Metrics:
- Views/case: 100+ (soft launch target)
- Social shares: 10+/case
- Community submissions: 3+/month (Phase 2)
Seed Content (Phase 2 Launch)
Goal: Publish 3-5 high-quality case studies before opening community submissions.
Curated Examples:
- The 27027 Incident (canonical example of instruction override)
- ChatGPT Medical Hallucination (boundary violation - health advice without human MD)
- GitHub Copilot Code Injection (context pressure - suggestion based on incomplete understanding)
- Bing Chat Sydney Persona (metacognitive failure - AI loses track of instructions)
- Jasper AI Copyright Violation (boundary violation - legal decision without human lawyer)
Author: John Stroh (or AI-assisted, human-reviewed per TRA-OPS-0002)
Revision & Updates
Review Cycle: Quarterly
Update Triggers:
- Approval rate <40% (standards too strict) or >80% (too lenient)
- User complaints about rejection reasons
- New failure mode categories emerge
Related Documents
- TRA-OPS-0001: AI Content Generation Policy (parent)
- TRA-OPS-0002: Blog Editorial Guidelines (similar quality standards)
- TRA-OPS-0005: Human Oversight Requirements
Approval
| Role | Name | Signature | Date |
|---|---|---|---|
| Policy Owner | John Stroh | [Pending] | [TBD] |
| Technical Reviewer | Claude Code | [Pending] | 2025-10-07 |
| Final Approval | John Stroh | [Pending] | [TBD] |
Status: DRAFT (awaiting John Stroh approval)
Effective Date: Upon Phase 2 case study portal launch
Next Review: 2026-01-07