Production Deployment Checklist
Project: Tractatus AI Safety Framework Website
Environment: Production (vps-93a693da.vps.ovh.net)
Domain: https://agenticgovernance.digital
Created: 2025-10-09
Status: Active Procedure
Overview
This checklist ensures safe, consistent deployments to production. Always follow this procedure to prevent security incidents, service disruptions, and data loss.
Deployment Philosophy:
- Deploy early, deploy often
- Test thoroughly before deploying
- Verify after deploying
- Document incidents and learn
Incident Prevention: This checklist was created after a security incident where sensitive Claude Code governance files were accidentally deployed to production. Following this procedure prevents similar incidents.
Pre-Deployment Checklist
1. Code Quality Verification
- All tests passing locally
  npm test
  - Expected: All tests pass, no failures
  - If any tests fail: Fix before deploying
- Test coverage acceptable
  npm test -- --coverage
  - Check critical services maintain 80%+ coverage
  - Review new code has reasonable coverage
- Linting passes (if linter configured)
  npm run lint
  # OR
  npx eslint src/
2. Security Verification
- Run security audit
  npm audit
  - Review all vulnerabilities
  - Critical/High: Must fix or document why acceptable
  - Medium/Low: Review and plan fix if needed
  - If fixes available: npm audit fix, then re-test
- Check for sensitive files in git
  git ls-files | grep -E '(CLAUDE|SESSION|\.env|SECRET|HANDOFF|CLOSEDOWN|_Maintenance_Guide)'
  - Expected: No matches (all sensitive files excluded)
  - If matches found: Review .gitignore and remove from git history
- Verify .rsyncignore completeness
  cat .rsyncignore
  - Confirm excludes:
    - CLAUDE*.md, SESSION*.md, maintenance guides
    - .env, .env.local, .env.production.local
    - node_modules/, .git/, .claude/
    - Test files, coverage reports
    - Development-only files
- Check environment secrets not in code
  grep -r "sk-ant-" src/ || echo "No API keys found ✓"
  grep -r "mongodb://tractatus" src/ || echo "No hardcoded DB URLs ✓"
  - Expected: No hardcoded secrets in source code
  - All secrets in .env files (which are excluded)
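The sensitive-file check above can be wrapped in a small function so a deployment script can refuse to continue when it fails. This is a minimal sketch: the pattern is copied from the checklist, while the function names `find_sensitive_files` and `check_tracked_files` are hypothetical and are not part of the project's existing scripts.

```shell
# Pattern copied from the checklist's `git ls-files | grep -E` check.
SENSITIVE_PATTERN='(CLAUDE|SESSION|\.env|SECRET|HANDOFF|CLOSEDOWN|_Maintenance_Guide)'

# Reads file paths on stdin and prints those matching the pattern.
# Returns 0 when the list is clean, 1 when at least one path matches.
find_sensitive_files() {
  if grep -E "$SENSITIVE_PATTERN"; then
    return 1
  fi
  return 0
}

# Hypothetical wrapper: stop when git tracks anything that looks sensitive.
check_tracked_files() {
  if ! git ls-files | find_sensitive_files; then
    echo "Sensitive files are tracked by git; aborting." >&2
    return 1
  fi
  echo "No sensitive files tracked"
}
```

Reading paths from stdin keeps the same function usable for other lists, for example a dry-run file listing produced before deployment.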
3. Database Verification
- Database migrations ready (if any)
  # Check if new migrations exist
  ls -la scripts/migrations/ | tail -5
  - If migrations exist: Plan migration execution
  - Document migration rollback procedure
- Backup current database (for major changes)
  ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
    "mongodump --uri='mongodb://tractatus_user:PASSWORD@localhost:27017/tractatus_prod?authSource=tractatus_prod' --out=/tmp/backup-$(date +%Y%m%d-%H%M%S)"
  - Only needed for schema changes or major updates
  - Store backup location in deployment notes
4. Change Documentation
- Review what's being deployed
  git log --oneline origin/main..HEAD
  - Confirm all commits are intentional
  - Verify no work-in-progress commits
- Update CHANGELOG.md (if project uses one)
  - Document user-facing changes
  - Document breaking changes
  - Document security fixes
- Commit all changes
  git status
  # If uncommitted changes exist, decide: commit or stash
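Scanning the commit list for work-in-progress markers can be partly automated. The sketch below filters `git log --oneline` output for subjects that look unfinished. The keyword list (wip, fixup!, squash!, tmp, temp) is an assumption for illustration, not a project convention, and the function name is hypothetical.

```shell
# Reads `git log --oneline` output on stdin and prints commits whose
# subject starts with a keyword that suggests unfinished work.
# The keyword list is an assumption; extend it to match team habits.
find_wip_commits() {
  grep -iE '^[0-9a-f]+ (wip|fixup!|squash!|tmp|temp)([ :]|$)' || true
}

# Example:
#   git log --oneline origin/main..HEAD | find_wip_commits
```

Empty output means no commit subject matched; it does not replace reading the commit list.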
Deployment Execution
Choose Deployment Method
Decision Matrix:
| What Changed | Script to Use | Command |
|---|---|---|
| Public HTML/CSS/JS only | deploy-frontend.sh | ./scripts/deploy-frontend.sh |
| Koha donation system | deploy-koha-to-production.sh | ./scripts/deploy-koha-to-production.sh |
| Full project (backend, routes, services) | deploy-full-project-SAFE.sh | ./scripts/deploy-full-project-SAFE.sh |
| Emergency rollback | Manual rsync | See rollback section |
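The decision matrix can be approximated from a list of changed paths. The following is a rough sketch under stated assumptions: Koha files are assumed to contain "koha" in their path, everything else under public/ is treated as frontend, and a mix of Koha and other frontend changes falls back to the full deployment. The function name `choose_deploy_script` is hypothetical.

```shell
# Reads changed paths on stdin (for example from `git diff --name-only`)
# and prints the script suggested by the decision matrix.
# ASSUMPTIONS: Koha files have "koha" in their path; other files under
# public/ are frontend; anything else requires the full deployment.
choose_deploy_script() {
  local has_koha=0 has_public=0 has_other=0 path
  while IFS= read -r path; do
    [ -z "$path" ] && continue
    case "$path" in
      *[Kk]oha*) has_koha=1 ;;
      public/*)  has_public=1 ;;
      *)         has_other=1 ;;
    esac
  done
  if [ "$has_other" -eq 1 ] || { [ "$has_koha" -eq 1 ] && [ "$has_public" -eq 1 ]; }; then
    echo "./scripts/deploy-full-project-SAFE.sh"
  elif [ "$has_koha" -eq 1 ]; then
    echo "./scripts/deploy-koha-to-production.sh"
  elif [ "$has_public" -eq 1 ]; then
    echo "./scripts/deploy-frontend.sh"
  else
    echo "nothing to deploy" >&2
    return 1
  fi
}
```

Treat the result as a suggestion only; the path rules should be checked against the real repository layout before relying on it.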
Option 1: Frontend-Only Deployment
Use when only public-facing files changed (HTML, CSS, JS, images).
./scripts/deploy-frontend.sh
What it deploys:
- public/ directory
- Excludes: admin, backend code, config files
Safety level: ✅ Safest (public files only)
Option 2: Koha-Specific Deployment
Use when Koha donation system changed.
./scripts/deploy-koha-to-production.sh
What it deploys:
- Koha controllers, services, routes
- Koha frontend (public/koha.html)
- Related middleware and models
Safety level: ⚠️ Moderate (includes backend code)
Option 3: Full Project Deployment (Most Common)
Use for backend changes, new features, or multi-component updates.
./scripts/deploy-full-project-SAFE.sh
Deployment steps:
- Script shows excluded patterns from .rsyncignore
- Review exclusions carefully - Verify sensitive files excluded
- Script shows dry-run summary
- Verify files to be deployed - Look for any unexpected files
- Confirm deployment (or Ctrl+C to abort)
- Script executes rsync with progress
- Deployment complete
What it deploys:
- All source code (src/)
- Public files (public/)
- Configuration (package.json, etc.)
- Documentation (docs/)
- Scripts (scripts/)
What it excludes (via .rsyncignore):
- Claude Code governance files (CLAUDE*.md, SESSION*.md)
- Environment files (.env*)
- Node modules (node_modules/)
- Git repository (.git/)
- Test files and coverage
- Development-only files
Safety level: ⚠️ Use carefully (full codebase)
Deployment Verification During Execution
- Watch for errors during deployment
  - Rsync errors (permission denied, connection failures)
  - File conflicts
  - Unexpected file deletions
- Verify file count is reasonable
  - Frontend: ~50-100 files
  - Koha: ~20-30 files
  - Full: ~200-300 files (varies by project size)
  - If thousands of files: STOP - check .rsyncignore
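The file-count check can be made mechanical by counting entries in an rsync dry run. This sketch assumes the listing comes from `rsync --dry-run --itemize-changes`, where lines for transferred regular files begin with `<f` (sent) or `>f` (received). How the project's deployment scripts actually invoke rsync is not shown in this checklist, so the example command is hypothetical, as are the function names and the default limit of 1000.

```shell
# Counts regular files in `rsync --dry-run --itemize-changes` output
# read from stdin. Directory lines (e.g. "cd+++++++++ dir/") are skipped.
count_transferred_files() {
  grep -cE '^[<>]f' || true
}

# Returns 0 when the count is below the limit, 1 otherwise.
# The default limit of 1000 is an assumption based on the
# "thousands of files" warning above.
file_count_ok() {
  local count="$1" limit="${2:-1000}"
  if [ "$count" -ge "$limit" ]; then
    echo "STOP: $count files queued; check .rsyncignore" >&2
    return 1
  fi
  echo "OK: $count files queued"
}

# Hypothetical usage (remote path and flags are illustrative):
#   n=$(rsync -a --dry-run --itemize-changes --exclude-from=.rsyncignore \
#         ./ ubuntu@vps-93a693da.vps.ovh.net:/var/www/tractatus/ \
#       | count_transferred_files)
#   file_count_ok "$n" || exit 1
```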
Post-Deployment Verification
1. Immediate Checks (< 2 minutes)
- Restart application (if backend changes)
  ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
    "sudo systemctl restart tractatus"
- Check service status
  ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
    "sudo systemctl status tractatus"
  - Expected: active (running)
  - If failed: Check logs immediately
- Health endpoint check
  curl https://agenticgovernance.digital/health
  - Expected: {"status":"ok","timestamp":"..."} (200 OK)
  - If 500 or error: Check logs, may need rollback
- Homepage loads
  curl -I https://agenticgovernance.digital
  - Expected: HTTP/2 200
  - If 404/500: Critical issue, check logs
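Because the service may need a few seconds after a restart, the health check is easier to trust when it retries. The sketch below polls the endpoint and checks the body for the `"status":"ok"` field shown above. The function names, retry count, and delay are illustrative assumptions, and the body check is a plain pattern match, not a JSON parser.

```shell
# Returns 0 when a /health response body reports status "ok".
# Assumes the compact JSON shape shown in the checklist.
health_body_ok() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Polls a URL until it is healthy or the attempts run out.
# Defaults (5 attempts, 3 seconds apart) are illustrative.
wait_for_healthy() {
  local url="$1" attempts="${2:-5}" delay="${3:-3}" body i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors such as 500
    if body=$(curl -fsS --max-time 10 "$url") && health_body_ok "$body"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "still unhealthy after $attempts attempts" >&2
  return 1
}

# Example:
#   wait_for_healthy https://agenticgovernance.digital/health
```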
2. Functional Checks (2-5 minutes)
- Test primary user flows:
  - Visit homepage: https://agenticgovernance.digital
  - Navigate to Researcher path: https://agenticgovernance.digital/researcher.html
  - Navigate to Implementer path: https://agenticgovernance.digital/implementer.html
  - Navigate to Leader path: https://agenticgovernance.digital/leader.html
  - Visit documentation: https://agenticgovernance.digital/docs.html
  - Test interactive demo: https://agenticgovernance.digital/demos/27027-demo.html
- Test navigation:
  - Click navbar dropdown menus
  - Mobile menu (resize browser or use DevTools)
  - Footer links work
- Test critical features (based on what changed):
  - If Koha changed: Test donation flow (test mode)
  - If admin changed: Test admin login
  - If governance changed: Test governance API (with admin token)
  - If documents changed: Test document retrieval
3. Log Monitoring (5-15 minutes)
- Monitor production logs for errors
  ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
    "sudo journalctl -u tractatus -f"
  - Watch for:
    - ERROR, CRITICAL log levels
    - Unhandled exceptions
    - Database connection failures
    - 500 errors on requests
  - Monitor for at least 5 minutes
  - If errors appear: Investigate immediately
- Check for new error patterns
  ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
    "sudo journalctl -u tractatus --since '5 minutes ago' | grep -i error"
  - Compare to known errors (acceptable warnings)
  - New errors may indicate deployment issues
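Comparing against known errors is easier with a filter that hides the acceptable ones. This sketch assumes a hypothetical file of known-acceptable strings, one fixed string per line; the checklist does not say where known errors are recorded, so the file and the function name `new_errors` are illustrative only.

```shell
# Prints error lines from stdin that contain none of the strings listed
# in the known-errors file (one fixed string per line).
# The file location and format are assumptions.
new_errors() {
  local known_file="$1"
  if [ -s "$known_file" ]; then
    grep -i 'error' | grep -v -F -f "$known_file" || true
  else
    # No known-errors list available: report every error line.
    grep -i 'error' || true
  fi
}

# Example (known-errors.txt is a hypothetical file):
#   ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
#     "sudo journalctl -u tractatus --since '5 minutes ago'" \
#     | new_errors known-errors.txt
```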
4. Analytics Check (Optional, 15+ minutes)
- Verify Plausible Analytics tracking
  - Visit https://plausible.io/agenticgovernance.digital
  - Confirm events are being tracked
  - Check for unusual bounce rates or errors
- Check Google Search Console (if configured)
  - Verify no new crawl errors
  - Check for 404 increases
Database Migration Procedure (If Needed)
Only required when the deployment includes schema changes or data migrations.
Pre-Migration
- Backup database (already done in pre-deployment)
- Test migration on staging (if staging environment exists)
- Review migration script
  cat scripts/migrations/YYYYMMDD-description.js
Execute Migration
ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
"cd /var/www/tractatus && node scripts/migrations/YYYYMMDD-description.js"
Post-Migration
- Verify migration success
  # Check migration completed
  # Check data integrity
- Test affected features
  - Any features using migrated data
Migration Rollback (If Needed)
- Restore database from backup
  ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
    "mongorestore --uri='...' /tmp/backup-TIMESTAMP"
- Rollback code (see rollback section)
Rollback Procedure
Use if deployment causes critical issues that can't be quickly fixed.
When to Rollback
- Application won't start
- Critical features completely broken
- Security vulnerability introduced
- Data loss or corruption occurring
- 500 errors on every request
How to Rollback
- Identify last known good commit
  git log --oneline -10
  # Find commit before problematic changes
- Checkout last good commit
  git checkout <commit-hash>
- Redeploy using same script
  # Use same deployment script as original deployment
  ./scripts/deploy-full-project-SAFE.sh
- Restart application
  ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
    "sudo systemctl restart tractatus"
- Verify rollback successful
  - Check health endpoint
  - Check homepage loads
  - Check logs for errors
- Return to main branch
  git checkout main
Post-Rollback
- Document incident
  - What went wrong?
  - What was the impact?
  - How was it detected?
  - How long was it broken?
  - What was rolled back?
- Create incident report (template below)
- Fix issue in development
  - Reproduce locally
  - Fix root cause
  - Add tests to prevent recurrence
  - Re-deploy when ready
Incident Documentation Template
Create file: docs/incidents/YYYY-MM-DD-description.md
# Incident Report: [Brief Description]
**Date**: YYYY-MM-DD HH:MM (NZST)
**Severity**: [Critical / High / Medium / Low]
**Duration**: [X minutes/hours]
**Detected By**: [User report / Monitoring / Developer]
## Summary
[1-2 sentence summary of what went wrong]
## Timeline
- HH:MM - Deployment initiated
- HH:MM - Issue detected
- HH:MM - Rollback initiated
- HH:MM - Service restored
## Root Cause
[What caused the issue?]
## Impact
- User-facing impact: [What did users experience?]
- Data impact: [Was any data lost/corrupted?]
- Security impact: [Were any security boundaries crossed?]
## Resolution
[How was it fixed?]
## Prevention
[What changes prevent this from happening again?]
## Action Items
- [ ] Fix root cause
- [ ] Add tests
- [ ] Update deployment checklist
- [ ] Update monitoring
Deployment Log Template
Keep a deployment log in: docs/deployments/YYYY-MM.md
# Deployments: [Month Year]
## YYYY-MM-DD HH:MM - [Description]
**Deployed By**: [Name]
**Deployment Type**: [Frontend / Koha / Full]
**Commits Deployed**:
- abc123 - Description
- def456 - Description
**Pre-Deployment Checks**:
- [x] Tests passing
- [x] Security audit clean
- [x] No sensitive files
**Verification**:
- [x] Health check passed
- [x] Homepage loads
- [x] No errors in logs
**Issues**: None
**Rollback Required**: No
**Notes**: [Any relevant notes]
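Filling in the top of a log entry by hand is repetitive, so it can be generated. The sketch below prints the heading, deployer, type, and commit list in the template's layout, leaving the checklists and notes to be completed manually. The function name and argument order are assumptions, and the deployer name in the example is a placeholder.

```shell
# Prints the first part of a deployment log entry in the template layout.
# Arguments: timestamp, description, deployer, type (Frontend / Koha / Full).
# Commit lines are read from stdin in `git log --oneline` format.
deployment_log_entry() {
  local timestamp="$1" description="$2" deployer="$3" type="$4" hash subject
  printf '## %s - %s\n' "$timestamp" "$description"
  printf '**Deployed By**: %s\n' "$deployer"
  printf '**Deployment Type**: %s\n' "$type"
  printf '**Commits Deployed**:\n'
  while read -r hash subject; do
    if [ -n "$hash" ]; then
      printf '%s %s - %s\n' '-' "$hash" "$subject"
    fi
  done
  return 0
}

# Example (the deployer name is a placeholder):
#   git log --oneline origin/main..HEAD \
#     | deployment_log_entry "$(date '+%Y-%m-%d %H:%M')" "Docs update" "Jane" "Full" \
#     >> "docs/deployments/$(date +%Y-%m).md"
```

The commit range must be captured before pushing, since `origin/main..HEAD` is empty once the remote branch has caught up.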
Emergency Procedures
Service Won't Start
- Check logs immediately
  ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
    "sudo journalctl -u tractatus -n 100"
- Common issues:
  - MongoDB connection failed → Check MongoDB running: sudo systemctl status mongod
  - Port already in use → Check for stale processes holding the port: sudo lsof -i :9000
  - Missing environment variables → Check .env file exists
  - Syntax error in code → Rollback immediately
- Quick fixes:
  # Restart MongoDB if stopped
  sudo systemctl start mongod
  # Kill stale processes (pattern quoted so the shell does not expand it)
  sudo pkill -f 'node.*tractatus'
  # Restart application
  sudo systemctl restart tractatus
Database Connection Lost
- Verify MongoDB running
  ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
    "sudo systemctl status mongod"
- Check MongoDB logs
  ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
    "sudo journalctl -u mongod -n 50"
- Test connection manually
  ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
    "mongosh --host localhost --port 27017 --authenticationDatabase tractatus_prod -u tractatus_user"
High Error Rate
- Identify error pattern
  ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
    "sudo journalctl -u tractatus --since '10 minutes ago' | grep ERROR | sort | uniq -c | sort -rn | head -10"
- Check if all endpoints affected or specific routes
  # Check health endpoint
  curl https://agenticgovernance.digital/health
  # Check specific routes
  curl https://agenticgovernance.digital/api/documents
- Decision:
  - If isolated to one feature: Disable feature, investigate
  - If site-wide: Rollback immediately
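In journalctl's default output each line starts with a timestamp, host, and unit prefix, so identical error messages logged at different times may not collapse under `uniq -c`. The sketch below strips that prefix before counting. It assumes the default short format (for example `Oct 09 12:00:01 host unit[pid]: message`); the function name `top_errors` is hypothetical.

```shell
# Reads journal lines on stdin, keeps those containing ERROR, removes the
# default journalctl prefix, and prints the most frequent messages.
# Assumes the default "short" output format.
top_errors() {
  local limit="${1:-10}"
  grep 'ERROR' \
    | sed -E 's/^[A-Z][a-z]{2} +[0-9]{1,2} [0-9]{2}:[0-9]{2}:[0-9]{2} [^ ]+ [^ ]+: //' \
    | sort | uniq -c | sort -rn | head -n "$limit"
}

# Example:
#   ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
#     "sudo journalctl -u tractatus --since '10 minutes ago'" | top_errors
```

If the application writes its own timestamps inside the message, those would need to be stripped as well before the counts become meaningful.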
Deployment Best Practices
DO:
- ✅ Deploy during low-traffic hours (NZ: 10am-2pm NZST = low US traffic)
- ✅ Deploy small, focused changes (easier to debug)
- ✅ Test thoroughly before deploying
- ✅ Monitor logs after deployment
- ✅ Document all deployments
- ✅ Keep rollback procedure tested and ready
- ✅ Communicate with team before major deployments
DON'T:
- ❌ Deploy on Friday afternoon (limited time to fix issues)
- ❌ Deploy multiple unrelated changes together
- ❌ Skip testing "because it's a small change"
- ❌ Deploy without checking logs after
- ❌ Deploy when tired or rushed
- ❌ Deploy without ability to rollback
- ❌ Forget to restart services after backend changes
Deployment Timing Guidelines
Best Times (Low risk):
- Monday-Thursday, 10am-2pm NZST
- After morning coffee, before lunch
- When you have 2+ hours to monitor
Acceptable Times (Medium risk):
- Monday-Thursday, 2pm-5pm NZST
- Early morning deployments (if you're alert)
Avoid Times (High risk):
- Friday 3pm+ (weekend coverage issues)
- Late evening (tired, less alert)
- During known high-traffic events
- When about to leave/travel
Automation Opportunities (Future)
Potential Improvements:
- Automated testing in CI/CD (GitHub Actions)
- Automated deployment on merge to main (after tests pass)
- Automated health checks post-deployment
- Automated rollback on health check failure
- Slack notifications for deployments
- Blue-green deployment for zero-downtime
- Canary deployments for gradual rollout
Not Ready Yet Because:
- Need stable test suite (✅ NOW READY - 380 tests passing)
- Need monitoring in place (⏳ Next task - Option D)
- Need error alerting (⏳ Next task - Option D)
- Need staging environment (💡 Future consideration)
Checklist Quick Reference
Pre-Deploy:
- Tests pass
- Security audit clean
- No sensitive files
- .rsyncignore verified
Deploy:
- Choose correct script
- Review dry-run
- Execute deployment
- Note any errors
Verify:
- Service running
- Health check OK
- Homepage loads
- Monitor logs 5-15min
Document:
- Log deployment
- Note any issues
- Update team
Contact & Support
Production Access:
- SSH: ubuntu@vps-93a693da.vps.ovh.net
- Key: ~/.ssh/tractatus_deploy
- Sudo: Available for systemctl, journalctl
Service Management:
- Service: tractatus.service (systemd)
- Status: sudo systemctl status tractatus
- Logs: sudo journalctl -u tractatus -f
- Restart: sudo systemctl restart tractatus
Database:
- Host: localhost:27017
- Database: tractatus_prod
- Auth: tractatus_prod database
- User: tractatus_user
Domain:
- Production: https://agenticgovernance.digital
- Analytics: https://plausible.io/agenticgovernance.digital
Document Status: Active Procedure
Last Updated: 2025-10-09
Next Review: After major deployment or incident
Maintainer: Technical Lead (Claude Code + John Stroh)