Production Deployment Checklist

Project: Tractatus AI Safety Framework Website
Environment: Production (vps-93a693da.vps.ovh.net)
Domain: https://agenticgovernance.digital
Created: 2025-10-09
Status: Active Procedure

Overview

This checklist ensures safe, consistent deployments to production. Always follow this procedure to prevent security incidents, service disruptions, and data loss.

Deployment Philosophy:

Deploy early, deploy often
Test thoroughly before deploying
Verify after deploying
Document incidents and learn

Incident Prevention: This checklist was created after a security incident where sensitive Claude Code governance files were accidentally deployed to production. Following this procedure prevents similar incidents.

Pre-Deployment Checklist

1. Code Quality Verification

All tests passing locally
```
npm test
```
- Expected: All tests pass, no failures
- If any tests fail: Fix before deploying
Test coverage acceptable
```
npm test -- --coverage
```
- Check critical services maintain 80%+ coverage
- Review new code has reasonable coverage
Linting passes (if linter configured)
```
npm run lint
# OR
npx eslint src/
```

2. Security Verification

Run security audit
```
npm audit
```
- Review all vulnerabilities
- Critical/High: Must fix or document why acceptable
- Medium/Low: Review and plan fix if needed
- If fixes are available: run `npm audit fix`, then re-test
Check for sensitive files in git
```
git ls-files | grep -E '(CLAUDE|SESSION|\.env|SECRET|HANDOFF|CLOSEDOWN|_Maintenance_Guide)'
```
- Expected: No matches (all sensitive files excluded)
- If matches found: Review .gitignore and remove from git history
Verify .rsyncignore completeness
```
cat .rsyncignore
```
- Confirm excludes:
  - CLAUDE*.md, SESSION*.md, maintenance guides
  - .env, .env.local, .env.production.local
  - node_modules/, .git/, .claude/
  - Test files, coverage reports
  - Development-only files

Check environment secrets not in code
```
grep -r "sk-ant-" src/ || echo "No API keys found ✓"
grep -r "mongodb://tractatus" src/ || echo "No hardcoded DB URLs ✓"
```
- Expected: No hardcoded secrets in source code
- All secrets in .env files (which are excluded)
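The sensitive-file check above relies on someone reading the grep output. A minimal sketch of the same check as a pass/fail gate follows; the function name `find_sensitive_files` is hypothetical, and the pattern is copied from the command in this section.

```shell
# Sketch only: turns the sensitive-file grep into a pass/fail gate.
# find_sensitive_files is a hypothetical helper, not part of the project scripts.
SENSITIVE_PATTERN='(CLAUDE|SESSION|\.env|SECRET|HANDOFF|CLOSEDOWN|_Maintenance_Guide)'

find_sensitive_files() {
  # Reads one path per line on stdin.
  # Prints every matching path and returns 1 if any matched, 0 otherwise.
  local matches
  matches=$(grep -E "$SENSITIVE_PATTERN" || true)
  if [ -n "$matches" ]; then
    printf '%s\n' "$matches"
    return 1
  fi
  return 0
}
```

It could be used as `git ls-files | find_sensitive_files || exit 1` at the top of a deployment script, so that a match stops the run instead of scrolling past.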

3. Database Verification

Database migrations ready (if any)
```
# Check if new migrations exist
ls -la scripts/migrations/ | tail -5
```
- If migrations exist: Plan migration execution
- Document migration rollback procedure

Back up current database (for major changes)
```
ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
  "mongodump --uri='mongodb://tractatus_user:PASSWORD@localhost:27017/tractatus_prod?authSource=tractatus_prod' --out=/tmp/backup-$(date +%Y%m%d-%H%M%S)"
```
- Only needed for schema changes or major updates
- Store backup location in deployment notes

4. Change Documentation

Review what's being deployed
```
git log --oneline origin/main..HEAD
```
- Confirm all commits are intentional
- Verify no work-in-progress commits
Update CHANGELOG.md (if project uses one)
- Document user-facing changes
- Document breaking changes
- Document security fixes

Commit all changes
```
git status
# If uncommitted changes exist, decide: commit or stash
```

Deployment Execution

Choose Deployment Method

Decision Matrix:

| What Changed | Script to Use | Command |
|---|---|---|
| Public HTML/CSS/JS only | `deploy-frontend.sh` | `./scripts/deploy-frontend.sh` |
| Koha donation system | `deploy-koha-to-production.sh` | `./scripts/deploy-koha-to-production.sh` |
| Full project (backend, routes, services) | `deploy-full-project-SAFE.sh` | `./scripts/deploy-full-project-SAFE.sh` |
| Emergency rollback | Manual rsync | See rollback section |
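The decision matrix can be approximated from a list of changed paths. The sketch below is a heuristic under stated assumptions, not how the project scripts decide: it assumes everything under `public/` counts as frontend, and that Koha-related files contain "koha" in their path. The function name `choose_deploy_script` is hypothetical.

```shell
# Sketch only: suggests a script from the decision matrix.
# Assumptions: public/ holds only frontend files; Koha files have "koha" in the path.
choose_deploy_script() {
  # Reads changed paths on stdin, one per line.
  local paths
  paths=$(cat)
  if [ -z "$paths" ]; then
    echo "no changed files given" >&2
    return 1
  fi
  if ! printf '%s\n' "$paths" | grep -qv '^public/'; then
    # Every path is under public/
    echo "./scripts/deploy-frontend.sh"
  elif ! printf '%s\n' "$paths" | grep -qiv 'koha'; then
    # Every path mentions koha
    echo "./scripts/deploy-koha-to-production.sh"
  else
    echo "./scripts/deploy-full-project-SAFE.sh"
  fi
}
```

For example, `git diff --name-only origin/main..HEAD | choose_deploy_script` prints a suggestion. Koha changes that touch shared middleware or models will fall through to the full deployment, which is the more cautious outcome.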

Option 1: Frontend-Only Deployment

Use when only public-facing files changed (HTML, CSS, JS, images).

```
./scripts/deploy-frontend.sh
```

What it deploys:

public/ directory
Excludes: admin, backend code, config files

Safety level: ✅ Safest (public files only)

Option 2: Koha-Specific Deployment

Use when Koha donation system changed.

```
./scripts/deploy-koha-to-production.sh
```

What it deploys:

Koha controllers, services, routes
Koha frontend (public/koha.html)
Related middleware and models

Safety level: ⚠️ Moderate (includes backend code)

Option 3: Full Project Deployment (Most Common)

Use for backend changes, new features, or multi-component updates.

```
./scripts/deploy-full-project-SAFE.sh
```

Deployment steps:

1. Script shows excluded patterns from .rsyncignore
2. Review exclusions carefully: verify sensitive files are excluded
3. Script shows dry-run summary
4. Verify files to be deployed: look for any unexpected files
5. Confirm deployment (or Ctrl+C to abort)
6. Script executes rsync with progress
7. Deployment complete

What it deploys:

All source code (src/)
Public files (public/)
Configuration (package.json, etc.)
Documentation (docs/)
Scripts (scripts/)

What it excludes (via .rsyncignore):

Claude Code governance files (CLAUDE*.md, SESSION*.md)
Environment files (.env*)
Node modules (node_modules/)
Git repository (.git/)
Test files and coverage
Development-only files

Safety level: ⚠️ Use carefully (full codebase)

Deployment Verification During Execution

Watch for errors during deployment
- Rsync errors (permission denied, connection failures)
- File conflicts
- Unexpected file deletions
Verify file count is reasonable
- Frontend: ~50-100 files
- Koha: ~20-30 files
- Full: ~200-300 files (varies by project size)
- If thousands of files: STOP - check .rsyncignore
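The file-count guidance above can be made mechanical. This is a minimal sketch: the ceilings are the approximate upper bounds listed in this section, the hard stop at 1000 is one reading of "thousands of files", and `check_file_count` is a hypothetical helper.

```shell
# Sketch only: warns above the usual file count and hard-stops at 1000+.
check_file_count() {
  # $1 = deployment type (frontend | koha | full); file list on stdin, one path per line.
  local kind=$1 max count
  case "$kind" in
    frontend) max=100 ;;
    koha)     max=30 ;;
    full)     max=300 ;;
    *) echo "unknown deployment type: $kind" >&2; return 2 ;;
  esac
  # Count non-empty lines; grep -c exits 1 on zero matches, hence || true.
  count=$(grep -c . || true)
  if [ "$count" -ge 1000 ]; then
    echo "STOP: $count files, check .rsyncignore" >&2
    return 1
  fi
  if [ "$count" -gt "$max" ]; then
    echo "warning: $count files exceeds the usual ~$max for $kind" >&2
  fi
  echo "$count"
}
```

The input has to be one path per line. How the deployment scripts print their dry-run summary is not specified here, so producing that list (for instance with rsync's `--dry-run` and `--out-format='%n'` options) is left as an assumption for the caller.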

Post-Deployment Verification

1. Immediate Checks (< 2 minutes)

Restart application (if backend changes)
```
ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
  "sudo systemctl restart tractatus"
```
Check service status
```
ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
  "sudo systemctl status tractatus"
```
- Expected: active (running)
- If failed: Check logs immediately

Health endpoint check
```
curl https://agenticgovernance.digital/health
```
- Expected: {"status":"ok","timestamp":"..."} (200 OK)
- If 500 or error: Check logs, may need rollback
Homepage loads
```
curl -I https://agenticgovernance.digital
```
- Expected: HTTP/2 200
- If 404/500: Critical issue, check logs
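Right after a restart the service may need a few seconds before it answers, so a single `curl` can fail spuriously. A minimal sketch of a retrying health check follows. The expected body shape is taken from this section; the function names, retry count, and delay are assumptions.

```shell
# Sketch only: retry the health endpoint until it reports "status":"ok".
health_ok() {
  # $1 = response body. Succeeds only if it contains "status":"ok".
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

wait_for_health() {
  # $1 = URL, $2 = attempts (default 5), $3 = delay in seconds (default 3)
  local url=$1 attempts=${2:-5} delay=${3:-3} i=1 body
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors such as 500
    body=$(curl -fsS --max-time 10 "$url" 2>/dev/null) && health_ok "$body" && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

Usage: `wait_for_health https://agenticgovernance.digital/health || echo "Health check failed: check logs, consider rollback"`.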

2. Functional Checks (2-5 minutes)

Test primary user flows:
- Visit homepage: https://agenticgovernance.digital
- Navigate to Researcher path: https://agenticgovernance.digital/researcher.html
- Navigate to Implementer path: https://agenticgovernance.digital/implementer.html
- Navigate to Leader path: https://agenticgovernance.digital/leader.html
- Visit documentation: https://agenticgovernance.digital/docs.html
- Test interactive demo: https://agenticgovernance.digital/demos/27027-demo.html
Test navigation:
- Click navbar dropdown menus
- Mobile menu (resize browser or use DevTools)
- Footer links work
Test critical features (based on what changed):
- If Koha changed: Test donation flow (test mode)
- If admin changed: Test admin login
- If governance changed: Test governance API (with admin token)
- If documents changed: Test document retrieval

3. Log Monitoring (5-15 minutes)

Monitor production logs for errors
```
ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
  "sudo journalctl -u tractatus -f"
```
- Watch for:
  - ERROR, CRITICAL log levels
  - Unhandled exceptions
  - Database connection failures
  - 500 errors on requests
- Monitor for at least 5 minutes
- If errors appear: Investigate immediately

Check for new error patterns
```
ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
  "sudo journalctl -u tractatus --since '5 minutes ago' | grep -i error"
```
- Compare to known errors (acceptable warnings)
- New errors may indicate deployment issues
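Comparing against known errors is easier with a filter than by eye. The sketch below assumes a plain-text file of acceptable messages, one fixed string per line; both that file and the function name `new_errors` are hypothetical.

```shell
# Sketch only: print error lines that match none of the known, acceptable messages.
new_errors() {
  # $1 = file of known messages (fixed strings, one per line, no blank lines).
  # Log lines are read from stdin.
  local known=$1
  if [ -s "$known" ]; then
    grep -i 'error' | grep -vFf "$known"
  else
    # No known-errors file (or it is empty): every error line counts as new.
    grep -i 'error'
  fi
}
```

It can be fed the output of the `journalctl` command above, for example `... | new_errors known-errors.txt`. A non-zero exit status with no output means no new errors were found.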

4. Analytics Check (Optional, 15+ minutes)

Verify Plausible Analytics tracking
- Visit https://plausible.io/agenticgovernance.digital
- Confirm events are being tracked
- Check for unusual bounce rates or errors
Check Google Search Console (if configured)
- Verify no new crawl errors
- Check for 404 increases

Database Migration Procedure (If Needed)

Only required when schema changes or data migrations needed.

Pre-Migration

Back up database (see Pre-Deployment Checklist, 3. Database Verification)
Test migration on staging (if staging environment exists)

Review migration script
```
cat scripts/migrations/YYYYMMDD-description.js
```

Execute Migration

```
ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
  "cd /var/www/tractatus && node scripts/migrations/YYYYMMDD-description.js"
```

Post-Migration

Verify migration success

# Check migration completed
# Check data integrity

Test affected features
- Any features using migrated data

Migration Rollback (If Needed)

Restore database from backup
```
ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
  "mongorestore --uri='...' /tmp/backup-TIMESTAMP"
```

Rollback code (see rollback section)

Rollback Procedure

Use if deployment causes critical issues that can't be quickly fixed.

When to Rollback

Application won't start
Critical features completely broken
Security vulnerability introduced
Data loss or corruption occurring
500 errors on every request

How to Rollback

Identify last known good commit
```
git log --oneline -10
# Find commit before problematic changes
```

Checkout last good commit
```
git checkout <commit-hash>
```

Redeploy using same script
```
# Use same deployment script as original deployment
./scripts/deploy-full-project-SAFE.sh
```

Restart application
```
ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
  "sudo systemctl restart tractatus"
```

Verify rollback successful
- Check health endpoint
- Check homepage loads
- Check logs for errors
Return to main branch
```
git checkout main
```

Post-Rollback

Document incident
- What went wrong?
- What was the impact?
- How was it detected?
- How long was it broken?
- What was rolled back?
Create incident report (template below)
Fix issue in development
- Reproduce locally
- Fix root cause
- Add tests to prevent recurrence
- Re-deploy when ready

Incident Documentation Template

Create file: docs/incidents/YYYY-MM-DD-description.md

```
# Incident Report: [Brief Description]

**Date**: YYYY-MM-DD HH:MM (NZST)
**Severity**: [Critical / High / Medium / Low]
**Duration**: [X minutes/hours]
**Detected By**: [User report / Monitoring / Developer]

## Summary
[1-2 sentence summary of what went wrong]

## Timeline
- HH:MM - Deployment initiated
- HH:MM - Issue detected
- HH:MM - Rollback initiated
- HH:MM - Service restored

## Root Cause
[What caused the issue?]

## Impact
- User-facing impact: [What did users experience?]
- Data impact: [Was any data lost/corrupted?]
- Security impact: [Were any security boundaries crossed?]

## Resolution
[How was it fixed?]

## Prevention
[What changes prevent this from happening again?]

## Action Items
- [ ] Fix root cause
- [ ] Add tests
- [ ] Update deployment checklist
- [ ] Update monitoring
```

Deployment Log Template

Keep a deployment log in: docs/deployments/YYYY-MM.md

```
# Deployments: [Month Year]

## YYYY-MM-DD HH:MM - [Description]

**Deployed By**: [Name]
**Deployment Type**: [Frontend / Koha / Full]
**Commits Deployed**:
- abc123 - Description
- def456 - Description

**Pre-Deployment Checks**:
- [x] Tests passing
- [x] Security audit clean
- [x] No sensitive files

**Verification**:
- [x] Health check passed
- [x] Homepage loads
- [x] No errors in logs

**Issues**: None
**Rollback Required**: No
**Notes**: [Any relevant notes]
```

Emergency Procedures

Service Won't Start

Check logs immediately
```
ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
  "sudo journalctl -u tractatus -n 100"
```

Common issues:
- MongoDB connection failed → Check MongoDB is running: `sudo systemctl status mongod`
- Port already in use → Check for zombie processes: `sudo lsof -i :9000`
- Missing environment variables → Check the .env file exists
- Syntax error in code → Roll back immediately

Quick fixes:
```
# Restart MongoDB if stopped
sudo systemctl start mongod

# Kill zombie processes (pattern quoted so the shell does not glob-expand it)
sudo pkill -f 'node.*tractatus'

# Restart application
sudo systemctl restart tractatus
```

Database Connection Lost

Verify MongoDB running
```
ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
  "sudo systemctl status mongod"
```

Check MongoDB logs
```
ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
  "sudo journalctl -u mongod -n 50"
```

Test connection manually
```
# -t allocates a terminal so mongosh can prompt for the password
ssh -t -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
  "mongosh --host localhost --port 27017 --authenticationDatabase tractatus_prod -u tractatus_user"
```

High Error Rate

Identify error pattern
```
# -o cat prints only the message text, so identical errors are not kept apart by journal timestamps
ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
  "sudo journalctl -u tractatus --since '10 minutes ago' -o cat | grep ERROR | sort | uniq -c | sort -rn | head -10"
```

Check if all endpoints affected or specific routes
```
# Check health endpoint
curl https://agenticgovernance.digital/health

# Check specific routes
curl https://agenticgovernance.digital/api/documents
```

Decision:
- If isolated to one feature: Disable feature, investigate
- If site-wide: Roll back immediately

CRITICAL: HTML Caching Rules

MANDATORY REQUIREMENT: HTML files MUST be delivered fresh to users without requiring cache refresh.

The Problem

Service worker caching HTML files caused deployment failures where users saw OLD content even after deploying NEW code. Users should NEVER need to clear cache manually.

The Solution (Enforced as of 2025-10-17)

Service Worker (public/service-worker.js):

HTML files: Network-ONLY strategy (never cache, always fetch fresh)
Exception: /index.html only for offline fallback
Bump CACHE_VERSION constant whenever service worker logic changes

Server (src/server.js):

HTML files: Cache-Control: no-store, no-cache, must-revalidate, proxy-revalidate, max-age=0
This ensures browsers never cache HTML pages
CSS/JS: Long cache OK (use version parameters for cache-busting)

Version Manifest (public/version.json):

Update version number when deploying HTML changes
Service worker checks this for updates
Set forceUpdate: true for critical fixes

Deployment Rules for HTML Changes

When deploying HTML file changes:

Verify service worker never caches HTML (except index.html)
```
grep -A 10 "HTML files:" public/service-worker.js
# Should show: Network-ONLY strategy, no caching
```

Verify server sends no-cache headers
```
grep -A 3 "HTML files:" src/server.js
# Should show: no-store, no-cache, must-revalidate
```

Bump version.json if critical content changed
```
# Edit public/version.json
# Increment version: 1.1.2 → 1.1.3
# Update changelog
# Set forceUpdate: true
```

After deployment, verify headers in production
```
curl -s -I https://agenticgovernance.digital/koha.html | grep -i cache-control
# Expected: no-store, no-cache, must-revalidate

curl -s https://agenticgovernance.digital/koha.html | grep "<title>"
# Verify correct content showing
```

Test in incognito window
- Open https://agenticgovernance.digital in fresh incognito window
- Verify new content loads immediately
- No cache refresh should be needed
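The version bump in the steps above (1.1.2 → 1.1.3) is easy to get wrong by hand. A minimal sketch of the increment follows; `bump_patch` is a hypothetical helper that only computes the new string and does not edit public/version.json, whose exact layout is not shown in this checklist.

```shell
# Sketch only: increment the PATCH part of a MAJOR.MINOR.PATCH version string.
bump_patch() {
  # $1 = version such as 1.1.2; prints 1.1.3. Rejects anything not in N.N.N form.
  local major minor patch rest
  printf '%s' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$' || {
    echo "not a MAJOR.MINOR.PATCH version: $1" >&2
    return 1
  }
  major=${1%%.*}        # text before the first dot
  rest=${1#*.}          # text after the first dot
  minor=${rest%%.*}
  patch=${rest#*.}
  echo "$major.$minor.$((patch + 1))"
}
```

Usage: `bump_patch 1.1.2` prints `1.1.3`. The result still has to be written into public/version.json together with the changelog and `forceUpdate` fields.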

Testing Cache Behavior

Before deployment:
```
# Local: Verify server sends correct headers
# (-i because the header name may be capitalized as Cache-Control)
curl -s -I http://localhost:9000/koha.html | grep -i cache-control
# Expected: no-store, no-cache

# Verify service worker doesn't cache HTML
grep "endsWith('.html')" public/service-worker.js -A 10
# Should NOT cache responses, only fetch
```

After deployment:
```
# Production: Verify headers
curl -s -I https://agenticgovernance.digital/<file>.html | grep -i cache-control

# Production: Verify fresh content
curl -s https://agenticgovernance.digital/<file>.html | grep "<title>"
```
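The header checks above print the header for a person to read. A minimal sketch that turns the check into a pass/fail test follows. The required directives (`no-store`, `no-cache`, `must-revalidate`) are taken from this section; the function name `html_cache_headers_ok` is hypothetical.

```shell
# Sketch only: succeed when response headers forbid caching as this checklist requires.
html_cache_headers_ok() {
  # Reads raw response headers on stdin (for example from `curl -s -I URL`).
  local value
  # Strip CRs, keep the Cache-Control line, lowercase it for comparison.
  value=$(tr -d '\r' | grep -i '^cache-control:' | tr 'A-Z' 'a-z')
  case "$value" in *no-store*) ;; *) return 1 ;; esac
  case "$value" in *no-cache*) ;; *) return 1 ;; esac
  case "$value" in *must-revalidate*) ;; *) return 1 ;; esac
  return 0
}
```

Usage: `curl -s -I https://agenticgovernance.digital/koha.html | html_cache_headers_ok || echo "HTML is cacheable: fix before continuing"`.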

Incident Prevention

Lesson Learned (2025-10-17 Koha Deployment):

Deployed koha.html with reciprocal giving updates
Service worker cached old version
Users saw old content despite fresh deployment
Required THREE deployment attempts to fix
Root cause: Service worker was caching HTML with network-first strategy

Prevention:

Service worker now enforces network-ONLY for all HTML (except offline index.html)
Server enforces no-cache headers
This checklist records the requirement as a standing architectural rule

Deployment Best Practices

DO:

✅ Deploy during low-traffic hours (NZ: 10am-2pm NZST = low US traffic)
✅ Deploy small, focused changes (easier to debug)
✅ Test thoroughly before deploying
✅ Monitor logs after deployment
✅ Document all deployments
✅ Keep rollback procedure tested and ready
✅ Communicate with team before major deployments
✅ CRITICAL: Verify HTML cache headers before and after deployment
✅ CRITICAL: Test in incognito window after HTML deployments

DON'T:

❌ Deploy on Friday afternoon (limited time to fix issues)
❌ Deploy multiple unrelated changes together
❌ Skip testing "because it's a small change"
❌ Deploy without checking logs after
❌ Deploy when tired or rushed
❌ Deploy without ability to rollback
❌ Forget to restart services after backend changes
❌ CRITICAL: Never cache HTML files in service worker (except offline fallback)
❌ CRITICAL: Never ask users to clear their browser cache - fix it server-side

Deployment Timing Guidelines

Best Times (Low risk):

Monday-Thursday, 10am-2pm NZST
After morning coffee, before lunch
When you have 2+ hours to monitor

Acceptable Times (Medium risk):

Monday-Thursday, 2pm-5pm NZST
Early morning deployments (if you're alert)

Avoid Times (High risk):

Friday 3pm+ (weekend coverage issues)
Late evening (tired, less alert)
During known high-traffic events
When about to leave/travel

Automation Opportunities (Future)

Potential Improvements:

Automated testing in CI/CD (GitHub Actions)
Automated deployment on merge to main (after tests pass)
Automated health checks post-deployment
Automated rollback on health check failure
Slack notifications for deployments
Blue-green deployment for zero-downtime
Canary deployments for gradual rollout

Not Ready Yet Because:

Need stable test suite (✅ NOW READY - 380 tests passing)
Need monitoring in place (⏳ Next task - Option D)
Need error alerting (⏳ Next task - Option D)
Need staging environment (💡 Future consideration)

Checklist Quick Reference

Pre-Deploy:

Tests pass
Security audit clean
No sensitive files
.rsyncignore verified

Deploy:

Choose correct script
Review dry-run
Execute deployment
Note any errors

Verify:

Service running
Health check OK
Homepage loads
Monitor logs 5-15min

Document:

Log deployment
Note any issues
Update team

Contact & Support

Production Access:

SSH: ubuntu@vps-93a693da.vps.ovh.net
Key: ~/.ssh/tractatus_deploy
Sudo: Available for systemctl, journalctl

Service Management:

Service: tractatus.service (systemd)
Status: sudo systemctl status tractatus
Logs: sudo journalctl -u tractatus -f
Restart: sudo systemctl restart tractatus

Database:

Host: localhost:27017
Database: tractatus_prod
Auth: tractatus_prod database
User: tractatus_user

Domain:

Production: https://agenticgovernance.digital
Analytics: https://plausible.io/agenticgovernance.digital

Document Status: Active Procedure
Last Updated: 2025-10-09
Next Review: After major deployment or incident
Maintainer: Technical Lead (Claude Code + John Stroh)
