# Phase 2 Roadmap: Production Deployment & AI-Powered Features

**Project**: Tractatus AI Safety Framework Website
**Phase**: 2 of 3
**Status**: Planning
**Created**: 2025-10-07
**Owner**: John Stroh
**Duration**: 2-3 months (estimated)

---
## Table of Contents
1. [Overview](#overview)
2. [Phase 1 Completion Summary](#phase-1-completion-summary)
3. [Phase 2 Objectives](#phase-2-objectives)
4. [Timeline & Milestones](#timeline--milestones)
5. [Workstreams](#workstreams)
6. [Success Criteria](#success-criteria)
7. [Risk Assessment](#risk-assessment)
8. [Decision Points](#decision-points)
9. [Dependencies](#dependencies)
10. [Budget Requirements](#budget-requirements)
---
## Overview
Phase 2 transitions the Tractatus Framework from a **local prototype** (Phase 1) to a **production-ready platform** with real users and AI-powered content features. The phase is meant to demonstrate that the framework can govern its own AI operations through human-oversight workflows.
### Key Themes
- **Production Deployment**: OVHCloud hosting, domain configuration, SSL/TLS
- **AI Integration**: Claude API for blog curation, media triage, case studies
- **Dogfooding**: Tractatus framework governs all AI content generation
- **Security & Privacy**: Hardening, monitoring, privacy-respecting analytics
- **Soft Launch**: Initial user testing before public announcement (Phase 3)
---
## Phase 1 Completion Summary
**Completed**: 2025-10-07
**Status**: ✅ All objectives achieved
### Deliverables Completed
- ✅ MongoDB instance (port 27017, database `tractatus_dev`)
- ✅ Express application (port 9000, CSP-compliant)
- ✅ Document migration pipeline (12+ documents)
- ✅ Three audience paths (Researcher, Implementer, Advocate)
- ✅ Interactive demonstrations (27027, classification, boundary)
- ✅ Tractatus governance services (test coverage per service):
  - InstructionPersistenceClassifier (85.3%)
  - CrossReferenceValidator (96.4%)
  - BoundaryEnforcer (100%)
  - ContextPressureMonitor (60.9%)
  - MetacognitiveVerifier (56.1%)
- ✅ Admin dashboard with moderation workflows
- ✅ API reference documentation
- ✅ WCAG AA accessibility compliance
- ✅ Mobile responsiveness optimization
- ✅ 118 integration tests (all passing)
### Technical Achievements
- CSP compliance: 100% (script-src 'self')
- Test coverage: 56.1%-100% across the five Tractatus services (lowest: MetacognitiveVerifier; highest: BoundaryEnforcer)
- Accessibility: WCAG AA compliant
- Performance: <2s page load times (local)
- Security: JWT authentication, role-based access control
---
## Phase 2 Objectives
### Primary Goals
1. **Deploy to production** on OVHCloud with domain `agenticgovernance.digital`
2. **Integrate Claude API** for AI-powered content features
3. **Implement human oversight workflows** via Tractatus framework
4. **Launch blog curation system** with moderation queue
5. **Enable media inquiry triage** with AI classification
6. **Create case study submission portal** for community contributions
7. **Soft launch** to initial user cohort (researchers, implementers)
### Non-Goals (Deferred to Phase 3)
- Koha donation system
- Multi-language translations
- Public marketing campaign
- Community forums/discussion boards
- Mobile app development
---
## Timeline & Milestones
### Month 1: Infrastructure & Deployment (Weeks 1-4)
**Week 1: Environment Setup**
- [ ] Provision OVHCloud VPS (specs TBD)
- [ ] Point DNS for `agenticgovernance.digital` at the production IP
- [ ] SSL/TLS certificates (Let's Encrypt)
- [ ] Firewall rules (UFW) and SSH hardening
- [ ] Create production MongoDB instance
- [ ] Set up systemd services (tractatus.service, mongodb-tractatus.service)
**Week 2: Application Deployment**
- [ ] Deploy Express application to production
- [ ] Configure Nginx reverse proxy (ports 80/443 → 9000)
- [ ] Environment variables (.env.production)
- [ ] Production logging (file rotation, syslog)
- [ ] Database migration scripts (seed production data)
- [ ] Backup automation (MongoDB dumps, code snapshots)
**Week 3: Security Hardening**
- [ ] Fail2ban configuration (SSH, HTTP)
- [ ] Rate limiting (Nginx + application-level)
- [ ] Security headers audit (OWASP compliance)
- [ ] Vulnerability scanning (Trivy, npm audit)
- [ ] ProtonBridge email integration
- [ ] Admin notification system (email alerts)
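
The application-level half of the rate limiting listed for Week 3 could look roughly like the sketch below. It assumes a simple in-memory token bucket keyed by client (for example, IP address). `createRateLimiter`, its parameters, and the limits shown are hypothetical and not part of the existing codebase; a multi-process deployment would need a shared store instead of a `Map`.

```javascript
// Hypothetical in-memory token-bucket limiter (illustrative names and limits).
// Each key starts with `capacity` tokens and regains `refillPerSecond` tokens per second.
function createRateLimiter({ capacity, refillPerSecond, now = () => Date.now() }) {
  const buckets = new Map(); // key -> { tokens, updatedAt }

  return function allow(key) {
    const t = now();
    const bucket = buckets.get(key) ?? { tokens: capacity, updatedAt: t };

    // Refill in proportion to elapsed time, never above capacity.
    const elapsedSeconds = (t - bucket.updatedAt) / 1000;
    bucket.tokens = Math.min(capacity, bucket.tokens + elapsedSeconds * refillPerSecond);
    bucket.updatedAt = t;

    // Spend one token per request if one is available.
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;

    buckets.set(key, bucket);
    return allowed;
  };
}
```

In an Express app, `allow(req.ip)` could be called from a middleware that answers with HTTP 429 when it returns `false`. The clock is injectable so the limiter can be tested without waiting.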
**Week 4: Monitoring & Testing**
- [ ] Plausible Analytics deployment (self-hosted)
- [ ] Error tracking (Sentry or self-hosted alternative)
- [ ] Uptime monitoring (UptimeRobot or self-hosted)
- [ ] Performance baseline (Lighthouse, WebPageTest)
- [ ] Load testing (k6 or Artillery)
- [ ] Disaster recovery drill (restore from backup)
**Milestone 1**: Production environment live, accessible at `https://agenticgovernance.digital`
---
### Month 2: AI-Powered Features (Weeks 5-8)
**Week 5: Claude API Integration**
- [ ] Anthropic API key setup (production account)
- [ ] ClaudeAPI.service refactoring for production
- [ ] Rate limiting and cost monitoring
- [ ] Error handling (API failures, timeout recovery)
- [ ] Prompt templates for blog/media/cases
- [ ] Token usage tracking and alerting
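
For the error-handling item above, one common approach is to retry failed API calls with exponential backoff. The sketch below is a minimal version under stated assumptions: `withRetry` and `backoffDelayMs` are hypothetical helpers, the defaults (3 retries, 500 ms base delay) are placeholders, and it retries every error, whereas a real wrapper would probably retry only transient failures such as timeouts or HTTP 429/5xx responses.

```javascript
// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ...
function backoffDelayMs(attempt, baseDelayMs) {
  return baseDelayMs * 2 ** attempt;
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: calls `fn` up to `retries + 1` times, sleeping between attempts.
// Rethrows the last error if every attempt fails.
async function withRetry(fn, { retries = 3, baseDelayMs = 500, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      // Only wait if another attempt will follow.
      if (attempt < retries) await sleep(backoffDelayMs(attempt, baseDelayMs));
    }
  }
  throw lastError;
}
```

The `sleep` function is passed in so tests can record delays instead of waiting for them.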
**Week 6: Blog Curation System**
- [ ] BlogCuration.service implementation
  - AI topic suggestion pipeline
  - Outline generation
  - Citation extraction
  - Draft formatting
- [ ] Human moderation workflow (approve/reject/edit)
- [ ] Blog post model (MongoDB schema)
- [ ] Blog UI (list, single post, RSS feed)
- [ ] OpenGraph/Twitter card metadata
- [ ] Seed content: 5-10 human-written posts
**Week 7: Media Inquiry Triage**
- [ ] MediaTriage.service implementation
  - Incoming inquiry classification (press, academic, commercial)
  - Priority scoring (high/medium/low)
  - Auto-draft response generation (for human approval)
- [ ] Media inquiry form (public-facing)
- [ ] Admin triage dashboard
- [ ] Email notification system
- [ ] Response templates
**Week 8: Case Study Portal**
- [ ] CaseSubmission.service implementation
  - Community submission form
  - AI relevance analysis (Tractatus framework mapping)
  - Failure mode categorization
- [ ] Case study moderation queue
- [ ] Public case study viewer
- [ ] Submission guidelines documentation
- [ ] Initial case studies (3-5 curated examples)
**Milestone 2**: All AI-powered features operational with human oversight
---
### Month 3: Polish, Testing & Soft Launch (Weeks 9-12)
**Week 9: Governance Enforcement**
- [ ] Review all AI prompts against TRA-OPS-* policies
- [ ] Audit moderation workflows (Tractatus compliance)
- [ ] Test boundary enforcement (values decisions require humans)
- [ ] Cross-reference validator integration (AI content checks)
- [ ] MetacognitiveVerifier for complex AI operations
- [ ] Document AI decision audit trail
**Week 10: Content & Documentation**
- [ ] Final document migration review
- [ ] Cross-reference link validation
- [ ] PDF generation pipeline (downloads section)
- [ ] Citation index completion
- [ ] Privacy policy finalization
- [ ] Terms of service drafting
- [ ] About/Contact page updates
**Week 11: Testing & Optimization**
- [ ] End-to-end testing (user journeys)
- [ ] Performance optimization (CDN evaluation)
- [ ] Mobile testing (real devices)
- [ ] Browser compatibility (Firefox, Safari, Chrome)
- [ ] Accessibility re-audit (WCAG AA)
- [ ] Security penetration testing
- [ ] Load testing under realistic traffic
**Week 12: Soft Launch**
- [ ] Invite initial user cohort (20-50 users)
  - AI safety researchers
  - Academic institutions
  - Aligned organizations
- [ ] Collect feedback via structured surveys
- [ ] Monitor error rates and performance
- [ ] Iterate on UX issues
- [ ] Prepare for public launch (Phase 3)
**Milestone 3**: Soft launch complete, feedback collected, ready for public launch
---
## Workstreams
### 1. Infrastructure & Deployment
**Owner**: Infrastructure Lead (or John Stroh)
**Duration**: Month 1 (Weeks 1-4)
#### Tasks
1. **Hosting Provision**
   - Select OVHCloud VPS tier (see Budget Requirements)
   - Provision server (Ubuntu 22.04 LTS recommended)
   - Configure DNS (A records, AAAA for IPv6)
   - Set up SSH key authentication (disable password auth)
2. **Web Server Configuration**
   - Install Nginx
   - Configure reverse proxy (ports 80/443 → 9000)
   - SSL/TLS via Let's Encrypt (Certbot)
   - HTTP/2 and compression (gzip/brotli)
   - Security headers (CSP, HSTS, X-Frame-Options)
3. **Database Setup**
   - Install MongoDB 7.x
   - Configure authentication
   - Set up replication (optional for HA)
   - Automated backups (daily, 7-day retention)
   - Restore testing
4. **Application Deployment**
   - Git-based deployment workflow
   - Environment variables management
   - Systemd service configuration
   - Log rotation and management
   - Process monitoring (PM2 or systemd watchdog)
5. **Security Hardening**
   - UFW firewall (allow 22, 80, 443, deny all others)
   - Fail2ban (SSH, HTTP)
   - Unattended security updates
   - Intrusion detection (optional: OSSEC, Wazuh)
   - Regular security audits
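
The daily-backup, 7-day-retention policy under Database Setup implies a pruning step. A minimal sketch of that selection logic follows, assuming each backup is described by a name and a creation time; `backupsToDelete` and the backup record shape are hypothetical, and the actual deletion (and the `mongodump` run itself) is left out.

```javascript
// Hypothetical helper: returns the names of backups older than the retention window.
// `backups` is an array of { name: string, createdAt: Date }.
function backupsToDelete(backups, { retentionDays = 7, now = new Date() } = {}) {
  const cutoff = now.getTime() - retentionDays * 24 * 60 * 60 * 1000;
  // A backup created exactly at the cutoff is still kept.
  return backups
    .filter((backup) => backup.createdAt.getTime() < cutoff)
    .map((backup) => backup.name);
}
```

Keeping the selection separate from the deletion makes it easy to dry-run the policy and log what would be removed before anything is touched.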
**Deliverables**:
- Production server accessible at `https://agenticgovernance.digital`
- SSL/TLS A+ rating (SSL Labs)
- Automated backup system operational
- Monitoring dashboards configured
---
### 2. AI-Powered Features
**Owner**: AI Integration Lead (or John Stroh with Claude Code)
**Duration**: Month 2 (Weeks 5-8)
#### Tasks
##### 2.1 Claude API Integration
- **API Setup**
  - Anthropic production API key
  - Rate limiting configuration (requests/min, tokens/day)
  - Cost monitoring and alerting
  - Fallback handling (API downtime)
- **Service Architecture**
  - `ClaudeAPI.service.js` - Core API wrapper
  - Prompt template management
  - Token usage tracking
  - Error handling and retry logic
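
Token usage tracking with a tokens/day limit and alerting could be sketched as below. This is an assumption-laden illustration, not the design of `ClaudeAPI.service.js`: `createTokenBudget`, the 80% alert threshold, and the in-memory `Map` are all hypothetical, and days are counted in UTC.

```javascript
// Hypothetical daily token budget (in-memory; a real service would persist usage).
function createTokenBudget({ dailyLimit, alertRatio = 0.8 }) {
  const usedByDay = new Map(); // 'YYYY-MM-DD' (UTC) -> tokens used
  const dayKey = (date) => date.toISOString().slice(0, 10);

  return {
    // Check before calling the API whether `tokens` more would stay within the limit.
    canSpend(tokens, date = new Date()) {
      return (usedByDay.get(dayKey(date)) ?? 0) + tokens <= dailyLimit;
    },
    // Record actual usage after a call and report whether an alert should fire.
    record(tokens, date = new Date()) {
      const day = dayKey(date);
      const used = (usedByDay.get(day) ?? 0) + tokens;
      usedByDay.set(day, used);
      return {
        day,
        used,
        alert: used >= dailyLimit * alertRatio,
        exhausted: used >= dailyLimit,
      };
    },
  };
}
```

A caller would check `canSpend` with an estimated token count before each request and pass the usage reported by the API to `record` afterwards; the `alert` flag could feed the admin notification system.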
##### 2.2 Blog Curation System
- **AI Pipeline**
  - Topic suggestion (from AI safety news feeds)
  - Outline generation
  - Citation extraction and validation
  - Draft formatting (Markdown)
- **Human Oversight**
  - Moderation queue integration
  - Approve/Reject/Edit workflows
  - Tractatus boundary enforcement (AI cannot publish without approval)
  - Audit trail (who approved, when, why)
- **Publishing**
  - Blog post model (title, slug, content, author, published_at)
  - Blog list UI (pagination, filtering)
  - Single post viewer (comments optional)
  - RSS feed generation
  - Social media metadata (OpenGraph, Twitter cards)
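
The human-oversight rules above (approve/reject/edit, no publishing without approval, an audit trail of who, when, and why) can be modeled as a small state machine. The sketch below is one possible shape under stated assumptions: the status names, the `actor.type` field, and the `transition` function are hypothetical and do not describe the existing moderation code.

```javascript
// Hypothetical status graph for an AI-drafted blog post.
const TRANSITIONS = {
  draft: ['pending_review'],
  pending_review: ['approved', 'rejected', 'draft'], // 'draft' = sent back for edits
  approved: ['published'],
  rejected: [],
  published: [],
};

// Statuses that only a human actor may move a post into.
const HUMAN_ONLY = ['approved', 'rejected', 'published'];

// Returns a new post object with the updated status and an appended audit entry.
function transition(post, nextStatus, { actor, reason, at = new Date() }) {
  if (!TRANSITIONS[post.status]?.includes(nextStatus)) {
    throw new Error(`Illegal transition: ${post.status} -> ${nextStatus}`);
  }
  if (HUMAN_ONLY.includes(nextStatus) && actor.type !== 'human') {
    throw new Error(`Only a human may move a post to "${nextStatus}"`);
  }
  const entry = { from: post.status, to: nextStatus, actor: actor.id, reason, at: at.toISOString() };
  return { ...post, status: nextStatus, auditTrail: [...post.auditTrail, entry] };
}
```

Because `published` is reachable only from `approved`, and `approved` only through a human actor, an AI-generated draft cannot reach publication on its own in this model.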
##### 2.3 Media Inquiry Triage
- **AI Classification**
  - Inquiry type (press, academic, commercial, spam)
  - Priority scoring (urgency, relevance, reach)
  - Auto-draft responses (for human review)
- **Moderation Workflow**
  - Admin triage dashboard
  - Email notification to John Stroh
  - Response approval (before sending)
  - Contact management (CRM-lite)
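
Priority scoring from urgency, relevance, and reach could be as simple as a weighted sum mapped onto the high/medium/low bands. The sketch below assumes each factor has already been scored between 0 and 1 (for example, by the AI classifier); the weights, the thresholds, and the `priorityFor` name are hypothetical and would need tuning against real inquiries.

```javascript
// Hypothetical weights; they sum to 1 so the score stays within 0..1.
const DEFAULT_WEIGHTS = { urgency: 0.4, relevance: 0.4, reach: 0.2 };

// Combines three 0..1 factor scores into a priority level.
function priorityFor({ urgency, relevance, reach }, weights = DEFAULT_WEIGHTS) {
  const score =
    urgency * weights.urgency +
    relevance * weights.relevance +
    reach * weights.reach;

  // Illustrative thresholds for the high/medium/low bands.
  let level = 'low';
  if (score >= 0.7) level = 'high';
  else if (score >= 0.4) level = 'medium';

  return { score, level };
}
```

The resulting level is only a triage hint for the admin dashboard; the response itself still goes through human approval before sending.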
##### 2.4 Case Study Portal
- **Community Submissions**
  - Public submission form (structured data)
  - AI relevance analysis (Tractatus applicability)
  - Failure mode categorization (27027-type, boundary violation, etc.)
- **Human Moderation**
  - Case review queue
  - Approve/Reject/Request Edits
  - Publication workflow
  - Attribution and licensing (CC BY-SA 4.0)
**Deliverables**:
- Blog system with 5-10 initial posts
- Media inquiry form with AI triage
- Case study portal with 3-5 examples
- All AI decisions subject to human approval
---
### 3. Governance & Policy
**Owner**: Governance Lead (or John Stroh)
**Duration**: Throughout Phase 2
#### Tasks
1. **Create TRA-OPS-* Documents**
   - TRA-OPS-0001: AI Content Generation Policy
   - TRA-OPS-0002: Blog Editorial Guidelines
   - TRA-OPS-0003: Media Inquiry Response Protocol
   - TRA-OPS-0004: Case Study Moderation Standards
   - TRA-OPS-0005: Human Oversight Requirements
2. **Tractatus Framework Enforcement**
   - Ensure all AI actions classified (STR/OPS/TAC/SYS/STO)
   - Cross-reference validator integration
   - Boundary enforcement (no AI values decisions)
   - Audit trail for AI decisions
3. **Legal & Compliance**
   - Privacy Policy (GDPR-lite, no tracking cookies)
   - Terms of Service
   - Content licensing (CC BY-SA 4.0 for community contributions)
   - Cookie policy (if analytics use cookies)
**Deliverables**:
- 5+ TRA-OPS-* governance documents
- Privacy Policy and Terms of Service
- Tractatus framework audit demonstrating compliance
---
### 4. Content & Documentation
**Owner**: Content Lead (or John Stroh)
**Duration**: Month 3 (Weeks 9-12)
#### Tasks
1. **Document Review**
   - Final review of all migrated documents
   - Cross-reference link validation
   - Formatting consistency
   - Citation completeness
2. **Blog Launch Content**
   - Write 5-10 seed blog posts (human-authored)
   - Topics: Framework introduction, 27027 incident, use cases, etc.
   - RSS feed implementation
   - Newsletter signup (optional)
3. **Legal Pages**
   - Privacy Policy
   - Terms of Service
   - About page (mission, values, Te Tiriti acknowledgment)
   - Contact page (ProtonMail integration)
4. **PDF Generation**
   - Automated PDF export for key documents
   - Download links in UI
   - Version tracking
**Deliverables**:
- All documents reviewed and polished
- 5-10 initial blog posts published
- Privacy Policy and Terms of Service live
- PDF downloads available
---
### 5. Analytics & Monitoring
**Owner**: Operations Lead (or John Stroh)
**Duration**: Month 1 & ongoing
#### Tasks
1. **Privacy-Respecting Analytics**
   - Deploy Plausible (self-hosted) or Matomo
   - No cookies, no tracking, GDPR-compliant
   - Metrics: page views, unique visitors, referrers
   - Geographic data (country-level only)
2. **Error Tracking**
   - Sentry (cloud) or self-hosted alternative (GlitchTip)
   - JavaScript error tracking
   - Server error logging
   - Alerting on critical errors
3. **Performance Monitoring**
   - Uptime monitoring (UptimeRobot or self-hosted)
   - Response time tracking
   - Database query performance
   - API usage metrics (Claude API tokens/day)
4. **Business Metrics**
   - Blog post views and engagement
   - Media inquiry volume
   - Case study submissions
   - Admin moderation activity
**Deliverables**:
- Analytics dashboard operational
- Error tracking with alerting
- Uptime monitoring (99.9% target)
- Monthly metrics report template
---
## Success Criteria
Phase 2 is considered **complete** when:
### Technical Success
- [ ] Production site live at `https://agenticgovernance.digital` with SSL/TLS
- [ ] All Phase 1 features operational in production
- [ ] Blog system publishing AI-curated content (with human approval)
- [ ] Media inquiry triage system processing requests
- [ ] Case study portal accepting community submissions
- [ ] Uptime: 99%+ over 30-day period
- [ ] Performance: <3s page load (95th percentile)
- [ ] Security: No critical vulnerabilities (OWASP Top 10)
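
To make the uptime and page-load criteria concrete: a 99% target over 30 days allows 432 minutes (7.2 hours) of downtime, while the 99.9% figure used elsewhere in this roadmap allows about 43 minutes. The sketch below shows one way to compute both checks. The helper names are hypothetical, and the percentile uses the nearest-rank method, which is only one of several common definitions.

```javascript
// Downtime allowance in minutes for an uptime target given in percent.
function allowedDowntimeMinutes(uptimePercent, days) {
  const totalMinutes = days * 24 * 60;
  return (totalMinutes * (100 - uptimePercent)) / 100;
}

// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(values, p) {
  if (values.length === 0) throw new Error('percentile of empty sample');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Hypothetical check against the <3s (95th percentile) page-load criterion.
function meetsPageLoadTarget(loadTimesMs, targetMs = 3000) {
  return percentile(loadTimesMs, 95) < targetMs;
}
```

Page-load samples could come from the Lighthouse or WebPageTest runs planned for Week 4, and measured downtime from the uptime monitor.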
### Governance Success
- [ ] All AI content requires human approval (0 auto-published posts)
- [ ] Tractatus framework audit shows 100% compliance
- [ ] TRA-OPS-* policies documented and enforced
- [ ] Boundary enforcer blocks values decisions by AI
- [ ] Audit trail for all AI decisions (who, what, when, why)
### User Success
- [ ] Soft launch cohort: 20-50 users
- [ ] User satisfaction: 4+/5 average rating
- [ ] Blog engagement: 50+ readers/post average
- [ ] Media inquiries: 5+ per month
- [ ] Case study submissions: 3+ per month
- [ ] Accessibility: WCAG AA maintained
### Business Success
- [ ] Hosting costs <$100/month
- [ ] Claude API costs <$200/month
- [ ] Zero data breaches or security incidents
- [ ] Zero privacy-related complaints
- [ ] Positive feedback from initial users
---
## Risk Assessment
### High-Risk Items
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| **Claude API costs exceed budget** | Medium | High | Implement strict rate limiting, token usage alerts, monthly spending cap |
| **Security breach (data leak)** | Low | Critical | Security audit, penetration testing, bug bounty program (Phase 3) |
| **AI generates inappropriate content** | Medium | High | Mandatory human approval, content filters, moderation queue |
| **Server downtime during soft launch** | Medium | Medium | Uptime monitoring, automated backups, disaster recovery plan |
| **GDPR/privacy compliance issues** | Low | High | Legal review, privacy-by-design, no third-party tracking |
### Medium-Risk Items
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| **OVHCloud service disruption** | Low | Medium | Multi-region backup plan, cloud provider diversification (Phase 3) |
| **Email delivery issues (ProtonBridge)** | Medium | Low | Fallback SMTP provider, email queue system |
| **Blog content quality concerns** | Medium | Medium | Editorial guidelines, human review, reader feedback loop |
| **Performance degradation under load** | Medium | Medium | Load testing, CDN evaluation, database optimization |
| **User confusion with UI/UX** | High | Low | User testing, clear documentation, onboarding flow |
### Low-Risk Items
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| **Domain registration issues** | Very Low | Low | Auto-renewal, registrar lock |
| **SSL certificate expiry** | Very Low | Low | Certbot auto-renewal, monitoring alerts |
| **Dependency vulnerabilities** | Medium | Very Low | Dependabot, regular npm audit |
---
## Decision Points
### Before Starting Phase 2
**Required Approvals from John Stroh:**
1. **Budget Approval** (see Budget Requirements section)
   - OVHCloud hosting: $30-80/month
   - Claude API: $50-200/month
   - Total: ~$100-300/month
2. **Timeline Confirmation**
   - Start date for Phase 2
   - Acceptable completion timeframe (2-3 months)
   - Soft launch target date
3. **Content Strategy**
   - Blog editorial guidelines (TRA-OPS-0002)
   - Media response protocol (TRA-OPS-0003)
   - Case study moderation standards (TRA-OPS-0004)
4. **Privacy Policy**
   - Final wording for data collection
   - Analytics tool selection (Plausible vs. Matomo)
   - Email handling practices
5. **Soft Launch Strategy**
   - Target user cohort (researchers, implementers, advocates)
   - Invitation method (email, social media)
   - Feedback collection process
### During Phase 2
**Interim Decisions:**
1. **Week 2**: VPS tier selection (based on performance testing)
2. **Week 5**: Claude API usage limits (tokens/day, cost caps)
3. **Week 8**: Blog launch readiness (sufficient seed content?)
4. **Week 10**: Soft launch invite list (who to include?)
5. **Week 12**: Phase 3 go/no-go decision
---
## Dependencies
### External Dependencies
1. **OVHCloud**
   - VPS availability in preferred region
   - DNS propagation (<24 hours)
   - Support response time (for issues)
2. **Anthropic**
   - Claude API production access
   - API stability and uptime
   - Pricing stability (no unexpected increases)
3. **Let's Encrypt**
   - Certificate issuance
   - Auto-renewal functionality
4. **ProtonMail**
   - ProtonBridge availability
   - Email delivery reliability
### Internal Dependencies
1. **Phase 1 Completion**
   - All features tested and working
   - Clean codebase
   - Documentation complete
2. **Governance Documents**
   - TRA-OPS-* policies drafted (see Workstream 3: Governance & Policy)
   - Privacy Policy finalized
   - Terms of Service drafted
3. **Seed Content**
   - 5-10 initial blog posts (human-written)
   - 3-5 case studies (curated examples)
   - Documentation complete
4. **User Cohort**
   - 20-50 users identified for soft launch
   - Invitation list prepared
   - Feedback survey drafted
---
## Budget Requirements
**See separate document: PHASE-2-COST-ESTIMATES.md**
Summary:
- **One-time**: $50-200 (SSL, setup)
- **Monthly recurring**: $100-300 (hosting + API)
- **Total Phase 2 cost**: ~$500-1,200 (3 months)
---
## Phase 2 → Phase 3 Transition
### Exit Criteria
Phase 2 ends and Phase 3 begins when:
- All success criteria met (see Success Criteria section)
- Soft launch feedback incorporated
- Zero critical bugs outstanding
- Governance audit complete
- John Stroh approves public launch
### Phase 3 Preview
- Public launch and marketing campaign
- Koha donation system (micropayments)
- Multi-language translations
- Community forums/discussion
- Bug bounty program
- Academic partnerships
---
## Appendices
### Appendix A: Technology Stack (Production)
**Hosting**: OVHCloud VPS
**OS**: Ubuntu 22.04 LTS
**Web Server**: Nginx 1.24+
**Application**: Node.js 18+, Express 4.x
**Database**: MongoDB 7.x
**SSL/TLS**: Let's Encrypt (Certbot)
**Email**: ProtonMail + ProtonBridge
**Analytics**: Plausible (self-hosted) or Matomo
**Error Tracking**: Sentry (cloud) or GlitchTip (self-hosted)
**Monitoring**: UptimeRobot or self-hosted
**AI Integration**: Anthropic Claude API (Sonnet 4.5)
### Appendix B: Key Performance Indicators (KPIs)
**Technical KPIs**:
- Uptime: 99.9%+
- Response time: <3s (95th percentile)
- Error rate: <0.1%
- Security vulnerabilities: 0 critical
**User KPIs**:
- Unique visitors: 100+/month (soft launch)
- Blog readers: 50+/post average
- Media inquiries: 5+/month
- Case submissions: 3+/month
**Business KPIs**:
- Hosting costs: <$100/month
- API costs: <$200/month
- User satisfaction: 4+/5
- Human approval rate for AI-generated content: 100% (nothing published without human approval)
### Appendix C: Rollback Plan
If Phase 2 encounters critical issues:
1. **Immediate**: Revert to Phase 1 (local prototype)
2. **Within 24h**: Root cause analysis
3. **Within 72h**: Fix deployed or timeline extended
4. **Escalation**: Consult security experts if breach suspected
---
**Document Version**: 1.0
**Last Updated**: 2025-10-07
**Next Review**: Start of Phase 2 (TBD)
**Owner**: John Stroh
**Contributors**: Claude Code (Anthropic Sonnet 4.5)