tractatus/docs/PRODUCTION_DEPLOYMENT_CHECKLIST.md

Production Deployment Checklist

Project: Tractatus AI Safety Framework Website
Environment: Production (vps-93a693da.vps.ovh.net)
Domain: https://agenticgovernance.digital
Created: 2025-10-09
Status: Active Procedure


Overview

This checklist ensures safe, consistent deployments to production. Always follow this procedure to prevent security incidents, service disruptions, and data loss.

Deployment Philosophy:

  • Deploy early, deploy often
  • Test thoroughly before deploying
  • Verify after deploying
  • Document incidents and learn

Incident Prevention: This checklist was created after a security incident where sensitive Claude Code governance files were accidentally deployed to production. Following this procedure prevents similar incidents.


Pre-Deployment Checklist

1. Code Quality Verification

  • All tests passing locally

    npm test
    
    • Expected: All tests pass, no failures
    • If any tests fail: Fix before deploying
  • Test coverage acceptable

    npm test -- --coverage
    
    • Check critical services maintain 80%+ coverage
    • Confirm that new code has reasonable coverage
  • Linting passes (if linter configured)

    npm run lint
    # OR
    npx eslint src/
    

2. Security Verification

  • Run security audit

    npm audit
    
    • Review all vulnerabilities
    • Critical/High: Must fix or document why acceptable
    • Medium/Low: Review and plan fix if needed
    • If fixes are available: run npm audit fix, then re-test
  • Check for sensitive files in git

    git ls-files | grep -E '(CLAUDE|SESSION|\.env|SECRET|HANDOFF|CLOSEDOWN|_Maintenance_Guide)'
    
    • Expected: No matches (all sensitive files excluded)
    • If matches found: Review .gitignore and remove from git history
  • Verify .rsyncignore completeness

    cat .rsyncignore
    
    • Confirm excludes:
      • CLAUDE*.md, SESSION*.md, maintenance guides
      • .env, .env.local, .env.production.local
      • node_modules/, .git/, .claude/
      • Test files, coverage reports
      • Development-only files
  • Check environment secrets not in code

    grep -r "sk-ant-" src/ || echo "No API keys found ✓"
    grep -r "mongodb://tractatus" src/ || echo "No hardcoded DB URLs ✓"
    
    • Expected: No hardcoded secrets in source code
    • All secrets in .env files (which are excluded)
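The sensitive-file check above can be wrapped so that it fails loudly instead of relying on someone reading the grep output. The following is a minimal sketch, not part of the existing scripts: the helper name find_sensitive_files is hypothetical, and the pattern is copied from the git ls-files check in this section.

```shell
# Pattern copied from the checklist's "sensitive files in git" check.
SENSITIVE_PATTERN='(CLAUDE|SESSION|\.env|SECRET|HANDOFF|CLOSEDOWN|_Maintenance_Guide)'

# Hypothetical helper: reads file paths on stdin and prints the ones that match.
# Returns 0 when the list is clean, 1 when at least one sensitive path is found.
find_sensitive_files() {
  if grep -E "$SENSITIVE_PATTERN"; then
    return 1
  fi
  return 0
}
```

One possible use is `git ls-files | find_sensitive_files || exit 1` at the top of a deployment script, so that a match stops the run before anything is copied.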

3. Database Verification

  • Database migrations ready (if any)

    # Check if new migrations exist
    ls -la scripts/migrations/ | tail -5
    
    • If migrations exist: Plan migration execution
    • Document migration rollback procedure
  • Backup current database (for major changes)

    ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
      "mongodump --uri='mongodb://tractatus_user:PASSWORD@localhost:27017/tractatus_prod?authSource=tractatus_prod' --out=/tmp/backup-$(date +%Y%m%d-%H%M%S)"
    
    • Only needed for schema changes or major updates
    • Store backup location in deployment notes

4. Change Documentation

  • Review what's being deployed

    git log --oneline origin/main..HEAD
    
    • Confirm all commits are intentional
    • Verify no work-in-progress commits
  • Update CHANGELOG.md (if project uses one)

    • Document user-facing changes
    • Document breaking changes
    • Document security fixes
  • Commit all changes

    git status
    # If uncommitted changes exist, decide: commit or stash
    

Deployment Execution

Choose Deployment Method

Decision Matrix:

| What Changed | Script to Use | Command |
|---|---|---|
| Public HTML/CSS/JS only | deploy-frontend.sh | ./scripts/deploy-frontend.sh |
| Koha donation system | deploy-koha-to-production.sh | ./scripts/deploy-koha-to-production.sh |
| Full project (backend, routes, services) | deploy-full-project-SAFE.sh | ./scripts/deploy-full-project-SAFE.sh |
| Emergency rollback | Manual rsync | See rollback section |

Option 1: Frontend-Only Deployment

Use when only public-facing files changed (HTML, CSS, JS, images).

./scripts/deploy-frontend.sh

What it deploys:

  • public/ directory
  • Excludes: admin, backend code, config files

Safety level: Safest (public files only)

Option 2: Koha-Specific Deployment

Use when Koha donation system changed.

./scripts/deploy-koha-to-production.sh

What it deploys:

  • Koha controllers, services, routes
  • Koha frontend (public/koha.html)
  • Related middleware and models

Safety level: ⚠️ Moderate (includes backend code)

Option 3: Full Project Deployment (Most Common)

Use for backend changes, new features, or multi-component updates.

./scripts/deploy-full-project-SAFE.sh

Deployment steps:

  1. Script shows excluded patterns from .rsyncignore
  2. Review exclusions carefully - Verify sensitive files excluded
  3. Script shows dry-run summary
  4. Verify files to be deployed - Look for any unexpected files
  5. Confirm deployment (or Ctrl+C to abort)
  6. Script executes rsync with progress
  7. Deployment complete

What it deploys:

  • All source code (src/)
  • Public files (public/)
  • Configuration (package.json, etc.)
  • Documentation (docs/)
  • Scripts (scripts/)

What it excludes (via .rsyncignore):

  • Claude Code governance files (CLAUDE*.md, SESSION*.md)
  • Environment files (.env*)
  • Node modules (node_modules/)
  • Git repository (.git/)
  • Test files and coverage
  • Development-only files

Safety level: ⚠️ Use carefully (full codebase)

Deployment Verification During Execution

  • Watch for errors during deployment

    • Rsync errors (permission denied, connection failures)
    • File conflicts
    • Unexpected file deletions
  • Verify file count is reasonable

    • Frontend: ~50-100 files
    • Koha: ~20-30 files
    • Full: ~200-300 files (varies by project size)
    • If thousands of files: STOP - check .rsyncignore
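The file-count guidance above could be turned into a quick sanity check. This is a sketch under stated assumptions: check_file_count is a hypothetical helper, the upper bounds are the top of each rough range listed above, and 1000 is an assumed cutoff for "thousands of files".

```shell
# Hypothetical helper: compare a file count against the checklist's rough ranges.
# $1 = deployment type (frontend | koha | full), $2 = number of files to be sent.
check_file_count() {
  local type="$1" count="$2" max
  case "$type" in
    frontend) max=100 ;;
    koha)     max=30 ;;
    full)     max=300 ;;
    *) echo "unknown deployment type: $type" >&2; return 2 ;;
  esac
  if [ "$count" -ge 1000 ]; then
    # Assumed cutoff for "thousands of files"
    echo "STOP: $count files - check .rsyncignore"
    return 1
  elif [ "$count" -gt "$max" ]; then
    echo "WARN: $count files exceeds the usual ~$max for $type"
    return 0
  fi
  echo "OK: $count files"
}
```

The count itself would come from the dry-run summary that the deployment script prints; the ranges vary by project size, so treat a warning as a prompt to look closer rather than a failure.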

Post-Deployment Verification

1. Immediate Checks (< 2 minutes)

  • Restart application (if backend changes)

    ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
      "sudo systemctl restart tractatus"
    
  • Check service status

    ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
      "sudo systemctl status tractatus"
    
    • Expected: active (running)
    • If failed: Check logs immediately
  • Health endpoint check

    curl https://agenticgovernance.digital/health
    
    • Expected: {"status":"ok","timestamp":"..."} (200 OK)
    • If 500 or error: Check logs, may need rollback
  • Homepage loads

    curl -I https://agenticgovernance.digital
    
    • Expected: HTTP/2 200
    • If 404/500: Critical issue, check logs
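The health endpoint check above can be scripted so that a non-"ok" body is treated as a failure. A minimal sketch, assuming the response has the {"status":"ok",...} shape shown above; the helper name health_body_ok is hypothetical.

```shell
# Hypothetical helper: succeed only if the response body reports status "ok".
# $1 = response body returned by the /health endpoint.
health_body_ok() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}
```

It might be combined with curl as `body=$(curl -fsS https://agenticgovernance.digital/health) && health_body_ok "$body"`, where -f makes curl itself fail on HTTP error responses.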

2. Functional Checks (2-5 minutes)

3. Log Monitoring (5-15 minutes)

  • Monitor production logs for errors

    ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
      "sudo journalctl -u tractatus -f"
    
    • Watch for:
      • ERROR, CRITICAL log levels
      • Unhandled exceptions
      • Database connection failures
      • 500 errors on requests
    • Monitor for at least 5 minutes
    • If errors appear: Investigate immediately
  • Check for new error patterns

    ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
      "sudo journalctl -u tractatus --since '5 minutes ago' | grep -i error"
    
    • Compare to known errors (acceptable warnings)
    • New errors may indicate deployment issues
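Comparing against known errors is easier with a small filter. The sketch below assumes a plain-text file of known, acceptable messages (one substring per line); both the helper name new_errors and that file are hypothetical and do not exist in the project as far as this checklist states.

```shell
# Hypothetical helper: print error lines from stdin that contain none of the
# known-acceptable substrings listed (one per line) in the file given as $1.
new_errors() {
  local known_file="$1"
  # First grep keeps error lines (case-insensitive); second drops known ones.
  grep -i 'error' | grep -v -F -f "$known_file"
}
```

The output of the journalctl command above can be piped into it on the local machine; any line it prints is an error that is not on the known list and deserves a look.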

4. Analytics Check (Optional, 15+ minutes)

  • Verify Plausible Analytics tracking

  • Check Google Search Console (if configured)

    • Verify no new crawl errors
    • Check for 404 increases

Database Migration Procedure (If Needed)

Only required when schema changes or data migrations are needed.

Pre-Migration

  • Backup database (already done in pre-deployment)
  • Test migration on staging (if staging environment exists)
  • Review migration script
    cat scripts/migrations/YYYYMMDD-description.js
    

Execute Migration

ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
  "cd /var/www/tractatus && node scripts/migrations/YYYYMMDD-description.js"

Post-Migration

  • Verify migration success

    # Check migration completed
    # Check data integrity
    
  • Test affected features

    • Any features using migrated data

Migration Rollback (If Needed)

  • Restore database from backup

    ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
      "mongorestore --uri='...' /tmp/backup-TIMESTAMP"
    
  • Rollback code (see rollback section)


Rollback Procedure

Use if deployment causes critical issues that can't be quickly fixed.

When to Rollback

  • Application won't start
  • Critical features completely broken
  • Security vulnerability introduced
  • Data loss or corruption occurring
  • 500 errors on every request

How to Rollback

  1. Identify last known good commit

    git log --oneline -10
    # Find commit before problematic changes
    
  2. Checkout last good commit

    git checkout <commit-hash>
    
  3. Redeploy using same script

    # Use same deployment script as original deployment
    ./scripts/deploy-full-project-SAFE.sh
    
  4. Restart application

    ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
      "sudo systemctl restart tractatus"
    
  5. Verify rollback successful

    • Check health endpoint
    • Check homepage loads
    • Check logs for errors
  6. Return to main branch

    git checkout main
    

Post-Rollback

  • Document incident

    • What went wrong?
    • What was the impact?
    • How was it detected?
    • How long was it broken?
    • What was rolled back?
  • Create incident report (template below)

  • Fix issue in development

    • Reproduce locally
    • Fix root cause
    • Add tests to prevent recurrence
    • Re-deploy when ready

Incident Documentation Template

Create file: docs/incidents/YYYY-MM-DD-description.md

# Incident Report: [Brief Description]

**Date**: YYYY-MM-DD HH:MM (NZST)
**Severity**: [Critical / High / Medium / Low]
**Duration**: [X minutes/hours]
**Detected By**: [User report / Monitoring / Developer]

## Summary
[1-2 sentence summary of what went wrong]

## Timeline
- HH:MM - Deployment initiated
- HH:MM - Issue detected
- HH:MM - Rollback initiated
- HH:MM - Service restored

## Root Cause
[What caused the issue?]

## Impact
- User-facing impact: [What did users experience?]
- Data impact: [Was any data lost/corrupted?]
- Security impact: [Were any security boundaries crossed?]

## Resolution
[How was it fixed?]

## Prevention
[What changes prevent this from happening again?]

## Action Items
- [ ] Fix root cause
- [ ] Add tests
- [ ] Update deployment checklist
- [ ] Update monitoring

Deployment Log Template

Keep a deployment log in: docs/deployments/YYYY-MM.md

# Deployments: [Month Year]

## YYYY-MM-DD HH:MM - [Description]

**Deployed By**: [Name]
**Deployment Type**: [Frontend / Koha / Full]
**Commits Deployed**:
- abc123 - Description
- def456 - Description

**Pre-Deployment Checks**:
- [x] Tests passing
- [x] Security audit clean
- [x] No sensitive files

**Verification**:
- [x] Health check passed
- [x] Homepage loads
- [x] No errors in logs

**Issues**: None
**Rollback Required**: No
**Notes**: [Any relevant notes]

Emergency Procedures

Service Won't Start

  1. Check logs immediately

    ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
      "sudo journalctl -u tractatus -n 100"
    
  2. Common issues:

    • MongoDB connection failed → Check MongoDB running: sudo systemctl status mongod
    • Port already in use → Check for zombie processes: sudo lsof -i :9000
    • Missing environment variables → Check .env file exists
    • Syntax error in code → Rollback immediately
  3. Quick fixes:

    # Restart MongoDB if stopped
    sudo systemctl start mongod
    
    # Kill zombie processes
    sudo pkill -f 'node.*tractatus'
    
    # Restart application
    sudo systemctl restart tractatus
    

Database Connection Lost

  1. Verify MongoDB running

    ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
      "sudo systemctl status mongod"
    
  2. Check MongoDB logs

    ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
      "sudo journalctl -u mongod -n 50"
    
  3. Test connection manually

    ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
      "mongosh --host localhost --port 27017 --authenticationDatabase tractatus_prod -u tractatus_user"
    

High Error Rate

  1. Identify error pattern

    ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
      "sudo journalctl -u tractatus --since '10 minutes ago' | grep ERROR | sort | uniq -c | sort -rn | head -10"
    
  2. Check if all endpoints affected or specific routes

    # Check health endpoint
    curl https://agenticgovernance.digital/health
    
    # Check specific routes
    curl https://agenticgovernance.digital/api/documents
    
  3. Decision:

    • If isolated to one feature: Disable feature, investigate
    • If site-wide: Rollback immediately
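The decision above can be sketched as a small triage function. This is an illustration only: the name triage is hypothetical, and it assumes that a failing /health endpoint means a site-wide problem, which is a simplification of the judgment the checklist asks for.

```shell
# Hypothetical helper: map HTTP status codes to the checklist's decision.
# $1 = status code of /health, $2 = status code of a specific route.
triage() {
  local health="$1" route="$2"
  if [ "$health" != "200" ]; then
    echo "rollback"          # assumed site-wide: the health endpoint itself fails
  elif [ "$route" != "200" ]; then
    echo "disable-feature"   # isolated: only the specific route fails
  else
    echo "ok"
  fi
}
```

Status codes can be collected with `curl -s -o /dev/null -w '%{http_code}' <url>` for each endpoint before calling the function.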

Deployment Best Practices

DO:

  • Deploy during low-traffic hours (10am-2pm NZST, when US traffic is low)
  • Deploy small, focused changes (easier to debug)
  • Test thoroughly before deploying
  • Monitor logs after deployment
  • Document all deployments
  • Keep rollback procedure tested and ready
  • Communicate with team before major deployments

DON'T:

  • Deploy on Friday afternoon (limited time to fix issues)
  • Deploy multiple unrelated changes together
  • Skip testing "because it's a small change"
  • Deploy without checking logs afterward
  • Deploy when tired or rushed
  • Deploy without ability to rollback
  • Forget to restart services after backend changes

Deployment Timing Guidelines

Best Times (Low risk):

  • Monday-Thursday, 10am-2pm NZST
  • After morning coffee, before lunch
  • When you have 2+ hours to monitor

Acceptable Times (Medium risk):

  • Monday-Thursday, 2pm-5pm NZST
  • Early morning deployments (if you're alert)

Avoid Times (High risk):

  • Friday 3pm+ (weekend coverage issues)
  • Late evening (tired, less alert)
  • During known high-traffic events
  • When about to leave/travel
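The timing guidelines above can be encoded as a rough pre-flight hint. This is a sketch, not a rule: deploy_risk is a hypothetical helper, "late evening" is assumed to mean 20:00 or later, and slots the guidelines do not mention (including early mornings, which depend on how alert you are) are reported as "unspecified" rather than guessed.

```shell
# Hypothetical helper: classify a deployment slot using the timing guidelines.
# $1 = day of week (1=Monday .. 7=Sunday, as printed by `date +%u`)
# $2 = hour of day, 0-23, in New Zealand local time
deploy_risk() {
  local dow="$1" hour="$2"
  if [ "$dow" -eq 5 ] && [ "$hour" -ge 15 ]; then
    echo "high"            # Friday 3pm or later
  elif [ "$hour" -ge 20 ]; then
    echo "high"            # assumption: "late evening" means 20:00 or later
  elif [ "$dow" -le 4 ] && [ "$hour" -ge 10 ] && [ "$hour" -lt 14 ]; then
    echo "low"             # Monday-Thursday, 10am-2pm
  elif [ "$dow" -le 4 ] && [ "$hour" -ge 14 ] && [ "$hour" -lt 17 ]; then
    echo "medium"          # Monday-Thursday, 2pm-5pm
  else
    echo "unspecified"     # the guidelines do not cover this slot
  fi
}
```

A call such as `deploy_risk "$(TZ=Pacific/Auckland date +%u)" "$(TZ=Pacific/Auckland date +%H)"` evaluates the current moment; high-traffic events and travel plans still need a human check.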

Automation Opportunities (Future)

Potential Improvements:

  • Automated testing in CI/CD (GitHub Actions)
  • Automated deployment on merge to main (after tests pass)
  • Automated health checks post-deployment
  • Automated rollback on health check failure
  • Slack notifications for deployments
  • Blue-green deployment for zero-downtime
  • Canary deployments for gradual rollout

Not Ready Yet Because:

  • Need stable test suite (NOW READY - 380 tests passing)
  • Need monitoring in place (Next task - Option D)
  • Need error alerting (Next task - Option D)
  • Need staging environment (Future consideration)

Checklist Quick Reference

Pre-Deploy:

  • Tests pass
  • Security audit clean
  • No sensitive files
  • .rsyncignore verified

Deploy:

  • Choose correct script
  • Review dry-run
  • Execute deployment
  • Note any errors

Verify:

  • Service running
  • Health check OK
  • Homepage loads
  • Monitor logs 5-15min

Document:

  • Log deployment
  • Note any issues
  • Update team

Contact & Support

Production Access:

  • SSH: ubuntu@vps-93a693da.vps.ovh.net
  • Key: ~/.ssh/tractatus_deploy
  • Sudo: Available for systemctl, journalctl

Service Management:

  • Service: tractatus.service (systemd)
  • Status: sudo systemctl status tractatus
  • Logs: sudo journalctl -u tractatus -f
  • Restart: sudo systemctl restart tractatus

Database:

  • Host: localhost:27017
  • Database: tractatus_prod
  • Auth: tractatus_prod database
  • User: tractatus_user

Domain:

  • https://agenticgovernance.digital


Document Status: Active Procedure
Last Updated: 2025-10-09
Next Review: After major deployment or incident
Maintainer: Technical Lead (Claude Code + John Stroh)