tractatus/docs/PHASE-2-ROADMAP.md

# Phase 2 Roadmap: Production Deployment & AI-Powered Features

**Project**: Tractatus AI Safety Framework Website
**Phase**: 2 of 3
**Status**: Planning
**Created**: 2025-10-07
**Owner**: John Stroh
**Duration**: 2-3 months (estimated)

---

## Table of Contents

1. [Overview](#overview)
2. [Phase 1 Completion Summary](#phase-1-completion-summary)
3. [Phase 2 Objectives](#phase-2-objectives)
4. [Timeline & Milestones](#timeline--milestones)
5. [Workstreams](#workstreams)
6. [Success Criteria](#success-criteria)
7. [Risk Assessment](#risk-assessment)
8. [Decision Points](#decision-points)
9. [Dependencies](#dependencies)
10. [Budget Requirements](#budget-requirements)

---

## Overview

Phase 2 transitions the Tractatus Framework from a **local prototype** (Phase 1) to a **production-ready platform** with real users and AI-powered content features. This phase demonstrates the framework's capacity to govern its own AI operations through human-oversight workflows.

### Key Themes
- **Production Deployment**: OVHCloud hosting, domain configuration, SSL/TLS
- **AI Integration**: Claude API for blog curation, media triage, case studies
- **Dogfooding**: Tractatus framework governs all AI content generation
- **Security & Privacy**: Hardening, monitoring, privacy-respecting analytics
- **Soft Launch**: Initial user testing before public announcement (Phase 3)

---

## Phase 1 Completion Summary

**Completed**: 2025-10-07
**Status**: ✅ All objectives achieved

### Deliverables Completed
- ✅ MongoDB instance (port 27017, database `tractatus_dev`)
- ✅ Express application (port 9000, CSP-compliant)
- ✅ Document migration pipeline (12+ documents)
- ✅ Three audience paths (Researcher, Implementer, Advocate)
- ✅ Interactive demonstrations (27027, classification, boundary)
- ✅ Tractatus governance services (100% test coverage on core services)
  - InstructionPersistenceClassifier (85.3%)
  - CrossReferenceValidator (96.4%)
  - BoundaryEnforcer (100%)
  - ContextPressureMonitor (60.9%)
  - MetacognitiveVerifier (56.1%)
- ✅ Admin dashboard with moderation workflows
- ✅ API reference documentation
- ✅ WCAG AA accessibility compliance
- ✅ Mobile responsiveness optimization
- ✅ 118 integration tests (all passing)

### Technical Achievements
- CSP compliance: 100% (script-src 'self')
- Test coverage: 85.3%+ on Tractatus services
- Accessibility: WCAG AA compliant
- Performance: <2s page load times (local)
- Security: JWT authentication, role-based access control

---

## Phase 2 Objectives

### Primary Goals
1. **Deploy to production** on OVHCloud with domain `agenticgovernance.digital`
2. **Integrate Claude API** for AI-powered content features
3. **Implement human oversight workflows** via Tractatus framework
4. **Launch blog curation system** with moderation queue
5. **Enable media inquiry triage** with AI classification
6. **Create case study submission portal** for community contributions
7. **Soft launch** to initial user cohort (researchers, implementers)

### Non-Goals (Deferred to Phase 3)
- ❌ Koha donation system
- ❌ Multi-language translations
- ❌ Public marketing campaign
- ❌ Community forums/discussion boards
- ❌ Mobile app development

---

## Timeline & Milestones

### Month 1: Infrastructure & Deployment (Weeks 1-4)

**Week 1: Environment Setup**
- [ ] Provision OVHCloud VPS (specs TBD)
- [ ] Configure DNS for `agenticgovernance.digital` → production IP
- [ ] SSL/TLS certificates (Let's Encrypt)
- [ ] Firewall rules (UFW) and SSH hardening
- [ ] Create production MongoDB instance
- [ ] Set up systemd services (tractatus.service, mongodb-tractatus.service)

**Week 2: Application Deployment**
- [ ] Deploy Express application to production
- [ ] Configure Nginx reverse proxy (port 80/443 → 9000)
- [ ] Environment variables (.env.production)
- [ ] Production logging (file rotation, syslog)
- [ ] Database migration scripts (seed production data)
- [ ] Backup automation (MongoDB dumps, code snapshots)

**Week 3: Security Hardening**
- [ ] Fail2ban configuration (SSH, HTTP)
- [ ] Rate limiting (Nginx + application-level)
- [ ] Security headers audit (OWASP compliance)
- [ ] Vulnerability scanning (Trivy, npm audit)
- [ ] ProtonBridge email integration
- [ ] Admin notification system (email alerts)

**Week 4: Monitoring & Testing**
- [ ] Plausible Analytics deployment (self-hosted)
- [ ] Error tracking (Sentry or self-hosted alternative)
- [ ] Uptime monitoring (UptimeRobot or self-hosted)
- [ ] Performance baseline (Lighthouse, WebPageTest)
- [ ] Load testing (k6 or Artillery)
- [ ] Disaster recovery drill (restore from backup)

**Milestone 1**: Production environment live, accessible at `https://agenticgovernance.digital` ✅

---

### Month 2: AI-Powered Features (Weeks 5-8)

**Week 5: Claude API Integration**
- [ ] Anthropic API key setup (production account)
- [ ] ClaudeAPI.service refactoring for production
- [ ] Rate limiting and cost monitoring
- [ ] Error handling (API failures, timeout recovery)
- [ ] Prompt templates for blog/media/cases
- [ ] Token usage tracking and alerting

**Week 6: Blog Curation System**
- [ ] BlogCuration.service implementation
  - AI topic suggestion pipeline
  - Outline generation
  - Citation extraction
  - Draft formatting
- [ ] Human moderation workflow (approve/reject/edit)
- [ ] Blog post model (MongoDB schema)
- [ ] Blog UI (list, single post, RSS feed)
- [ ] OpenGraph/Twitter card metadata
- [ ] Seed content: 5-10 human-written posts

**Week 7: Media Inquiry Triage**
- [ ] MediaTriage.service implementation
  - Incoming inquiry classification (press, academic, commercial)
  - Priority scoring (high/medium/low)
  - Auto-draft response generation (for human approval)
- [ ] Media inquiry form (public-facing)
- [ ] Admin triage dashboard
- [ ] Email notification system
- [ ] Response templates

**Week 8: Case Study Portal**
- [ ] CaseSubmission.service implementation
  - Community submission form
  - AI relevance analysis (Tractatus framework mapping)
  - Failure mode categorization
- [ ] Case study moderation queue
- [ ] Public case study viewer
- [ ] Submission guidelines documentation
- [ ] Initial case studies (3-5 curated examples)

**Milestone 2**: All AI-powered features operational with human oversight ✅

---

### Month 3: Polish, Testing & Soft Launch (Weeks 9-12)

**Week 9: Governance Enforcement**
- [ ] Review all AI prompts against TRA-OPS-* policies
- [ ] Audit moderation workflows (Tractatus compliance)
- [ ] Test boundary enforcement (values decisions require humans)
- [ ] Cross-reference validator integration (AI content checks)
- [ ] MetacognitiveVerifier for complex AI operations
- [ ] Document AI decision audit trail

**Week 10: Content & Documentation**
- [ ] Final document migration review
- [ ] Cross-reference link validation
- [ ] PDF generation pipeline (downloads section)
- [ ] Citation index completion
- [ ] Privacy policy finalization
- [ ] Terms of service drafting
- [ ] About/Contact page updates

**Week 11: Testing & Optimization**
- [ ] End-to-end testing (user journeys)
- [ ] Performance optimization (CDN evaluation)
- [ ] Mobile testing (real devices)
- [ ] Browser compatibility (Firefox, Safari, Chrome)
- [ ] Accessibility re-audit (WCAG AA)
- [ ] Security penetration testing
- [ ] Load testing under realistic traffic

**Week 12: Soft Launch**
- [ ] Invite initial user cohort (20-50 users)
  - AI safety researchers
  - Academic institutions
  - Aligned organizations
- [ ] Collect feedback via structured surveys
- [ ] Monitor error rates and performance
- [ ] Iterate on UX issues
- [ ] Prepare for public launch (Phase 3)

**Milestone 3**: Soft launch complete, feedback collected, ready for public launch ✅

---

## Workstreams

### 1. Infrastructure & Deployment

**Owner**: Infrastructure Lead (or John Stroh)
**Duration**: Month 1 (Weeks 1-4)

#### Tasks
1. **Hosting Provision**
   - Select OVHCloud VPS tier (see Budget Requirements)
   - Provision server (Ubuntu 22.04 LTS recommended)
   - Configure DNS (A records, AAAA for IPv6)
   - Set up SSH key authentication (disable password auth)

2. **Web Server Configuration**
   - Install Nginx
   - Configure reverse proxy (port 9000 → 80/443)
   - SSL/TLS via Let's Encrypt (Certbot)
   - HTTP/2 and compression (gzip/brotli)
   - Security headers (CSP, HSTS, X-Frame-Options)

3. **Database Setup**
   - Install MongoDB 7.x
   - Configure authentication
   - Set up replication (optional for HA)
   - Automated backups (daily, 7-day retention)
   - Restore testing

4. **Application Deployment**
   - Git-based deployment workflow
   - Environment variables management
   - Systemd service configuration
   - Log rotation and management
   - Process monitoring (PM2 or systemd watchdog)

5. **Security Hardening**
   - UFW firewall (allow 22, 80, 443, deny all others)
   - Fail2ban (SSH, HTTP)
   - Unattended security updates
   - Intrusion detection (optional: OSSEC, Wazuh)
   - Regular security audits

**Deliverables**:
- Production server accessible at `https://agenticgovernance.digital`
- SSL/TLS A+ rating (SSL Labs)
- Automated backup system operational
- Monitoring dashboards configured

---

### 2. AI-Powered Features

**Owner**: AI Integration Lead (or John Stroh with Claude Code)
**Duration**: Month 2 (Weeks 5-8)

#### Tasks

##### 2.1 Claude API Integration
- **API Setup**
  - Anthropic production API key
  - Rate limiting configuration (requests/min, tokens/day)
  - Cost monitoring and alerting
  - Fallback handling (API downtime)

- **Service Architecture**
  - `ClaudeAPI.service.js` - Core API wrapper
  - Prompt template management
  - Token usage tracking
  - Error handling and retry logic

##### 2.2 Blog Curation System
- **AI Pipeline**
  - Topic suggestion (from AI safety news feeds)
  - Outline generation
  - Citation extraction and validation
  - Draft formatting (Markdown)

- **Human Oversight**
  - Moderation queue integration
  - Approve/Reject/Edit workflows
  - Tractatus boundary enforcement (AI cannot publish without approval)
  - Audit trail (who approved, when, why)

- **Publishing**
  - Blog post model (title, slug, content, author, published_at)
  - Blog list UI (pagination, filtering)
  - Single post viewer (comments optional)
  - RSS feed generation
  - Social media metadata (OpenGraph, Twitter cards)

##### 2.3 Media Inquiry Triage
- **AI Classification**
  - Inquiry type (press, academic, commercial, spam)
  - Priority scoring (urgency, relevance, reach)
  - Auto-draft responses (for human review)

- **Moderation Workflow**
  - Admin triage dashboard
  - Email notification to John Stroh
  - Response approval (before sending)
  - Contact management (CRM-lite)

##### 2.4 Case Study Portal
- **Community Submissions**
  - Public submission form (structured data)
  - AI relevance analysis (Tractatus applicability)
  - Failure mode categorization (27027-type, boundary violation, etc.)

- **Human Moderation**
  - Case review queue
  - Approve/Reject/Request Edits
  - Publication workflow
  - Attribution and licensing (CC BY-SA 4.0)

**Deliverables**:
- Blog system with 5-10 initial posts
- Media inquiry form with AI triage
- Case study portal with 3-5 examples
- All AI decisions subject to human approval

---

### 3. Governance & Policy

**Owner**: Governance Lead (or John Stroh)
**Duration**: Throughout Phase 2

#### Tasks
1. **Create TRA-OPS-* Documents**
   - TRA-OPS-0001: AI Content Generation Policy
   - TRA-OPS-0002: Blog Editorial Guidelines
   - TRA-OPS-0003: Media Inquiry Response Protocol
   - TRA-OPS-0004: Case Study Moderation Standards
   - TRA-OPS-0005: Human Oversight Requirements

2. **Tractatus Framework Enforcement**
   - Ensure all AI actions classified (STR/OPS/TAC/SYS/STO)
   - Cross-reference validator integration
   - Boundary enforcement (no AI values decisions)
   - Audit trail for AI decisions

3. **Legal & Compliance**
   - Privacy Policy (GDPR-lite, no tracking cookies)
   - Terms of Service
   - Content licensing (CC BY-SA 4.0 for community contributions)
   - Cookie policy (if analytics use cookies)

**Deliverables**:
- 5+ TRA-OPS-* governance documents
- Privacy Policy and Terms of Service
- Tractatus framework audit demonstrating compliance

---

### 4. Content & Documentation

**Owner**: Content Lead (or John Stroh)
**Duration**: Month 3 (Weeks 9-12)

#### Tasks
1. **Document Review**
   - Final review of all migrated documents
   - Cross-reference link validation
   - Formatting consistency
   - Citation completeness

2. **Blog Launch Content**
   - Write 5-10 seed blog posts (human-authored)
   - Topics: Framework introduction, 27027 incident, use cases, etc.
   - RSS feed implementation
   - Newsletter signup (optional)

3. **Legal Pages**
   - Privacy Policy
   - Terms of Service
   - About page (mission, values, Te Tiriti acknowledgment)
   - Contact page (ProtonMail integration)

4. **PDF Generation**
   - Automated PDF export for key documents
   - Download links in UI
   - Version tracking

**Deliverables**:
- All documents reviewed and polished
- 5-10 initial blog posts published
- Privacy Policy and Terms of Service live
- PDF downloads available

---

### 5. Analytics & Monitoring

**Owner**: Operations Lead (or John Stroh)
**Duration**: Month 1 & ongoing

#### Tasks
1. **Privacy-Respecting Analytics**
   - Deploy Plausible (self-hosted) or Matomo
   - No cookies, no tracking, GDPR-compliant
   - Metrics: page views, unique visitors, referrers
   - Geographic data (country-level only)

2. **Error Tracking**
   - Sentry (cloud) or self-hosted alternative (GlitchTip)
   - JavaScript error tracking
   - Server error logging
   - Alerting on critical errors

3. **Performance Monitoring**
   - Uptime monitoring (UptimeRobot or self-hosted)
   - Response time tracking
   - Database query performance
   - API usage metrics (Claude API tokens/day)

4. **Business Metrics**
   - Blog post views and engagement
   - Media inquiry volume
   - Case study submissions
   - Admin moderation activity

**Deliverables**:
- Analytics dashboard operational
- Error tracking with alerting
- Uptime monitoring (99.9% target)
- Monthly metrics report template

---

## Success Criteria

Phase 2 is considered **complete** when:

### Technical Success
- [ ] Production site live at `https://agenticgovernance.digital` with SSL/TLS
- [ ] All Phase 1 features operational in production
- [ ] Blog system publishing AI-curated content (with human approval)
- [ ] Media inquiry triage system processing requests
- [ ] Case study portal accepting community submissions
- [ ] Uptime: 99%+ over 30-day period
- [ ] Performance: <3s page load (95th percentile)
- [ ] Security: No critical vulnerabilities (OWASP Top 10)

### Governance Success
- [ ] All AI content requires human approval (0 auto-published posts)
- [ ] Tractatus framework audit shows 100% compliance
- [ ] TRA-OPS-* policies documented and enforced
- [ ] Boundary enforcer blocks values decisions by AI
- [ ] Audit trail for all AI decisions (who, what, when, why)

### User Success
- [ ] Soft launch cohort: 20-50 users
- [ ] User satisfaction: 4+/5 average rating
- [ ] Blog engagement: 50+ readers/post average
- [ ] Media inquiries: 5+ per month
- [ ] Case study submissions: 3+ per month
- [ ] Accessibility: WCAG AA maintained

### Business Success
- [ ] Monthly hosting costs <$100/month
- [ ] Claude API costs <$200/month
- [ ] Zero data breaches or security incidents
- [ ] Privacy policy: zero complaints
- [ ] Positive feedback from initial users

---

## Risk Assessment

### High-Risk Items

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| **Claude API costs exceed budget** | Medium | High | Implement strict rate limiting, token usage alerts, monthly spending cap |
| **Security breach (data leak)** | Low | Critical | Security audit, penetration testing, bug bounty program (Phase 3) |
| **AI generates inappropriate content** | Medium | High | Mandatory human approval, content filters, moderation queue |
| **Server downtime during soft launch** | Medium | Medium | Uptime monitoring, automated backups, disaster recovery plan |
| **GDPR/privacy compliance issues** | Low | High | Legal review, privacy-by-design, no third-party tracking |

### Medium-Risk Items

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| **OVHCloud service disruption** | Low | Medium | Multi-region backup plan, cloud provider diversification (Phase 3) |
| **Email delivery issues (ProtonBridge)** | Medium | Low | Fallback SMTP provider, email queue system |
| **Blog content quality concerns** | Medium | Medium | Editorial guidelines, human review, reader feedback loop |
| **Performance degradation under load** | Medium | Medium | Load testing, CDN evaluation, database optimization |
| **User confusion with UI/UX** | High | Low | User testing, clear documentation, onboarding flow |

### Low-Risk Items

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| **Domain registration issues** | Very Low | Low | Auto-renewal, registrar lock |
| **SSL certificate expiry** | Very Low | Low | Certbot auto-renewal, monitoring alerts |
| **Dependency vulnerabilities** | Medium | Very Low | Dependabot, regular npm audit |

---

## Decision Points

### Before Starting Phase 2

**Required Approvals from John Stroh:**

1. **Budget Approval** (see Budget Requirements section)
   - OVHCloud hosting: $30-80/month
   - Claude API: $50-200/month
   - Total: ~$100-300/month

2. **Timeline Confirmation**
   - Start date for Phase 2
   - Acceptable completion timeframe (2-3 months)
   - Soft launch target date

3. **Content Strategy**
   - Blog editorial guidelines (TRA-OPS-0002)
   - Media response protocol (TRA-OPS-0003)
   - Case study moderation standards (TRA-OPS-0004)

4. **Privacy Policy**
   - Final wording for data collection
   - Analytics tool selection (Plausible vs. Matomo)
   - Email handling practices

5. **Soft Launch Strategy**
   - Target user cohort (researchers, implementers, advocates)
   - Invitation method (email, social media)
   - Feedback collection process

### During Phase 2

**Interim Decisions:**

1. **Week 2**: VPS tier selection (based on performance testing)
2. **Week 5**: Claude API usage limits (tokens/day, cost caps)
3. **Week 8**: Blog launch readiness (sufficient seed content?)
4. **Week 10**: Soft launch invite list (who to include?)
5. **Week 12**: Phase 3 go/no-go decision

---

## Dependencies

### External Dependencies

1. **OVHCloud**
   - VPS availability in preferred region
   - DNS propagation (<24 hours)
   - Support response time (for issues)

2. **Anthropic**
   - Claude API production access
   - API stability and uptime
   - Pricing stability (no unexpected increases)

3. **Let's Encrypt**
   - Certificate issuance
   - Auto-renewal functionality

4. **ProtonMail**
   - ProtonBridge availability
   - Email delivery reliability

### Internal Dependencies

1. **Phase 1 Completion** ✅
   - All features tested and working
   - Clean codebase
   - Documentation complete

2. **Governance Documents**
   - TRA-OPS-* policies drafted (see Task 3)
   - Privacy Policy finalized
   - Terms of Service drafted

3. **Seed Content**
   - 5-10 initial blog posts (human-written)
   - 3-5 case studies (curated examples)
   - Documentation complete

4. **User Cohort**
   - 20-50 users identified for soft launch
   - Invitation list prepared
   - Feedback survey drafted

---

## Budget Requirements

**See separate document: PHASE-2-COST-ESTIMATES.md**

Summary:
- **One-time**: $50-200 (SSL, setup)
- **Monthly recurring**: $100-300 (hosting + API)
- **Total Phase 2 cost**: ~$500-1,200 (3 months)

---

## Phase 2 → Phase 3 Transition

### Exit Criteria
Phase 2 ends and Phase 3 begins when:
- All success criteria met (see Success Criteria section)
- Soft launch feedback incorporated
- Zero critical bugs outstanding
- Governance audit complete
- John Stroh approves public launch

### Phase 3 Preview
- Public launch and marketing campaign
- Koha donation system (micropayments)
- Multi-language translations
- Community forums/discussion
- Bug bounty program
- Academic partnerships

---

## Appendices

### Appendix A: Technology Stack (Production)

**Hosting**: OVHCloud VPS
**OS**: Ubuntu 22.04 LTS
**Web Server**: Nginx 1.24+
**Application**: Node.js 18+, Express 4.x
**Database**: MongoDB 7.x
**SSL/TLS**: Let's Encrypt (Certbot)
**Email**: ProtonMail + ProtonBridge
**Analytics**: Plausible (self-hosted) or Matomo
**Error Tracking**: Sentry (cloud) or GlitchTip (self-hosted)
**Monitoring**: UptimeRobot or self-hosted
**AI Integration**: Anthropic Claude API (Sonnet 4.5)

### Appendix B: Key Performance Indicators (KPIs)

**Technical KPIs**:
- Uptime: 99.9%+
- Response time: <3s (95th percentile)
- Error rate: <0.1%
- Security vulnerabilities: 0 critical

**User KPIs**:
- Unique visitors: 100+/month (soft launch)
- Blog readers: 50+/post average
- Media inquiries: 5+/month
- Case submissions: 3+/month

**Business KPIs**:
- Hosting costs: <$100/month
- API costs: <$200/month
- User satisfaction: 4+/5
- AI approval rate: 100% (all content human-approved)

### Appendix C: Rollback Plan

If Phase 2 encounters critical issues:

1. **Immediate**: Revert to Phase 1 (local prototype)
2. **Within 24h**: Root cause analysis
3. **Within 72h**: Fix deployed or timeline extended
4. **Escalation**: Consult security experts if breach suspected

---

**Document Version**: 1.0
**Last Updated**: 2025-10-07
**Next Review**: Start of Phase 2 (TBD)
**Owner**: John Stroh
**Contributors**: Claude Code (Anthropic Sonnet 4.5)