docs: create comprehensive production deployment checklist
Add detailed deployment procedure to prevent security incidents and ensure consistent, safe deployments to production.

Includes:
- Pre-deployment verification (tests, security, sensitive file checks)
- Three deployment methods (frontend, Koha, full project)
- Post-deployment verification (health checks, log monitoring)
- Database migration procedure
- Emergency rollback procedure
- Incident documentation template
- Deployment log template
- Emergency procedures (service failures, DB issues)
- Best practices and timing guidelines

Created after security incident where sensitive Claude Code files were accidentally deployed. This checklist prevents similar incidents through:
- Mandatory .rsyncignore verification
- Sensitive file checks before deployment
- Dry-run review before execution
- Post-deployment monitoring

Status: Active procedure for all production deployments

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Parent: 20875e41fd
Commit: 91925d899c

1 changed file with 676 additions and 0 deletions:
docs/PRODUCTION_DEPLOYMENT_CHECKLIST.md (normal file, +676)

@@ -0,0 +1,676 @@

# Production Deployment Checklist

**Project**: Tractatus AI Safety Framework Website
**Environment**: Production (vps-93a693da.vps.ovh.net)
**Domain**: https://agenticgovernance.digital
**Created**: 2025-10-09
**Status**: Active Procedure

---

## Overview

This checklist ensures safe, consistent deployments to production. **Always follow this procedure** to prevent security incidents, service disruptions, and data loss.

**Deployment Philosophy**:
- Deploy early, deploy often
- Test thoroughly before deploying
- Verify after deploying
- Document incidents and learn

**Incident Prevention**: This checklist was created after a security incident where sensitive Claude Code governance files were accidentally deployed to production. Following this procedure prevents similar incidents.

---

## Pre-Deployment Checklist

### 1. Code Quality Verification

- [ ] **All tests passing locally**
  ```bash
  npm test
  ```
  - Expected: All tests pass, no failures
  - If any tests fail: Fix before deploying

- [ ] **Test coverage acceptable**
  ```bash
  npm test -- --coverage
  ```
  - Check that critical services maintain 80%+ coverage
  - Confirm that new code has reasonable coverage

- [ ] **Linting passes** (if linter configured)
  ```bash
  npm run lint
  # OR
  npx eslint src/
  ```

### 2. Security Verification

- [ ] **Run security audit**
  ```bash
  npm audit
  ```
  - Review all vulnerabilities
  - Critical/High: Must fix or document why acceptable
  - Medium/Low: Review and plan fix if needed
  - If fixes available: `npm audit fix` then re-test

- [ ] **Check for sensitive files in git**
  ```bash
  git ls-files | grep -E '(CLAUDE|SESSION|\.env|SECRET|HANDOFF|CLOSEDOWN|_Maintenance_Guide)'
  ```
  - Expected: No matches (all sensitive files excluded)
  - If matches found: Review .gitignore and remove from git history

- [ ] **Verify .rsyncignore completeness**
  ```bash
  cat .rsyncignore
  ```
  - Confirm excludes:
    - `CLAUDE*.md`, `SESSION*.md`, maintenance guides
    - `.env`, `.env.local`, `.env.production.local`
    - `node_modules/`, `.git/`, `.claude/`
    - Test files, coverage reports
    - Development-only files

- [ ] **Check environment secrets not in code**
  ```bash
  grep -r "sk-ant-" src/ || echo "No API keys found ✓"
  grep -r "mongodb://tractatus" src/ || echo "No hardcoded DB URLs ✓"
  ```
  - Expected: No hardcoded secrets in source code
  - All secrets in .env files (which are excluded)

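Reading `.rsyncignore` by eye is easy to get wrong under time pressure. As a minimal sketch, the check could be scripted so that a missing exclusion fails loudly. The `REQUIRED_PATTERNS` list and the `check_rsyncignore` helper are illustrative names, not part of the project, and the exact pattern spellings in the real `.rsyncignore` are an assumption.

```bash
#!/usr/bin/env bash
# Sketch: fail if .rsyncignore lacks any pattern this checklist requires.
# The list is taken from the bullets above; the exact spellings used in the
# real .rsyncignore are an assumption and should be adjusted to match it.
REQUIRED_PATTERNS='CLAUDE*.md
SESSION*.md
.env
node_modules/
.git/
.claude/'

# check_rsyncignore FILE
# Prints every required pattern that is absent; returns 1 if any is missing.
check_rsyncignore() {
  local file=$1 missing=0 pattern
  while IFS= read -r pattern; do
    # -F fixed string, -x whole-line match, -q quiet
    if ! grep -Fxq -- "$pattern" "$file"; then
      echo "MISSING: $pattern"
      missing=1
    fi
  done <<EOF
$REQUIRED_PATTERNS
EOF
  return "$missing"
}

# Example: check_rsyncignore .rsyncignore || exit 1
```

Matching whole lines with fixed strings keeps the check strict: a commented-out or misspelled pattern counts as missing.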
### 3. Database Verification

- [ ] **Database migrations ready** (if any)
  ```bash
  # Check if new migrations exist
  ls -la scripts/migrations/ | tail -5
  ```
  - If migrations exist: Plan migration execution
  - Document migration rollback procedure

- [ ] **Backup current database** (for major changes)
  ```bash
  # Replace PASSWORD with the real credential. $(date ...) is expanded by the
  # local shell before ssh runs, so the timestamp comes from your machine.
  ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
    "mongodump --uri='mongodb://tractatus_user:PASSWORD@localhost:27017/tractatus_prod?authSource=tractatus_prod' --out=/tmp/backup-$(date +%Y%m%d-%H%M%S)"
  ```
  - Only needed for schema changes or major updates
  - Store backup location in deployment notes

### 4. Change Documentation

- [ ] **Review what's being deployed**
  ```bash
  git log --oneline origin/main..HEAD
  ```
  - Confirm all commits are intentional
  - Verify no work-in-progress commits

- [ ] **Update CHANGELOG.md** (if project uses one)
  - Document user-facing changes
  - Document breaking changes
  - Document security fixes

- [ ] **Commit all changes**
  ```bash
  git status
  # If uncommitted changes exist, decide: commit or stash
  ```

---

## Deployment Execution

### Choose Deployment Method

**Decision Matrix:**

| What Changed | Script to Use | Command |
|--------------|---------------|---------|
| Public HTML/CSS/JS only | `deploy-frontend.sh` | `./scripts/deploy-frontend.sh` |
| Koha donation system | `deploy-koha-to-production.sh` | `./scripts/deploy-koha-to-production.sh` |
| Full project (backend, routes, services) | `deploy-full-project-SAFE.sh` | `./scripts/deploy-full-project-SAFE.sh` |
| Emergency rollback | Manual rsync | See rollback section |

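The decision matrix can be approximated from the list of changed paths. The sketch below is one possible mapping, not the project's own logic: `choose_deploy_script` is a hypothetical helper, and the rules it applies (a path containing "koha" counts as Koha, anything under `public/` counts as frontend) are assumptions.

```bash
#!/usr/bin/env bash
# choose_deploy_script: reads changed paths on stdin (one per line) and prints
# the deployment script suggested by the decision matrix.
# Assumed rules: every path mentions "koha" -> Koha script;
# every path is under public/ -> frontend script; anything else -> full deploy.
choose_deploy_script() {
  local all_public=1 all_koha=1 seen=0 path
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    seen=1
    case "$path" in
      *koha*|*Koha*) ;;
      *) all_koha=0 ;;
    esac
    case "$path" in
      public/*) ;;
      *) all_public=0 ;;
    esac
  done
  if [ "$seen" -eq 0 ]; then
    echo "no changed files" >&2
    return 1
  fi
  if [ "$all_koha" -eq 1 ]; then
    echo "./scripts/deploy-koha-to-production.sh"
  elif [ "$all_public" -eq 1 ]; then
    echo "./scripts/deploy-frontend.sh"
  else
    echo "./scripts/deploy-full-project-SAFE.sh"
  fi
}

# Example (LAST_DEPLOYED is a placeholder for the commit currently in production):
#   git diff --name-only "$LAST_DEPLOYED"..HEAD | choose_deploy_script
```

The output is only a suggestion; when in doubt, the full deployment with its dry-run review is the conservative choice.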
### Option 1: Frontend-Only Deployment

Use when only public-facing files changed (HTML, CSS, JS, images).

```bash
./scripts/deploy-frontend.sh
```

**What it deploys:**
- `public/` directory
- Excludes: admin, backend code, config files

**Safety level:** ✅ Safest (public files only)

### Option 2: Koha-Specific Deployment

Use when Koha donation system changed.

```bash
./scripts/deploy-koha-to-production.sh
```

**What it deploys:**
- Koha controllers, services, routes
- Koha frontend (public/koha.html)
- Related middleware and models

**Safety level:** ⚠️ Moderate (includes backend code)

### Option 3: Full Project Deployment (Most Common)

Use for backend changes, new features, or multi-component updates.

```bash
./scripts/deploy-full-project-SAFE.sh
```

**Deployment steps:**
1. Script shows excluded patterns from .rsyncignore
2. **Review exclusions carefully** - Verify sensitive files excluded
3. Script shows dry-run summary
4. **Verify files to be deployed** - Look for any unexpected files
5. Confirm deployment (or Ctrl+C to abort)
6. Script executes rsync with progress
7. Deployment complete

**What it deploys:**
- All source code (src/)
- Public files (public/)
- Configuration (package.json, etc.)
- Documentation (docs/)
- Scripts (scripts/)

**What it excludes** (via .rsyncignore):
- Claude Code governance files (CLAUDE*.md, SESSION*.md)
- Environment files (.env*)
- Node modules (node_modules/)
- Git repository (.git/)
- Test files and coverage
- Development-only files

**Safety level:** ⚠️ Use carefully (full codebase)

### Deployment Verification During Execution

- [ ] **Watch for errors during deployment**
  - Rsync errors (permission denied, connection failures)
  - File conflicts
  - Unexpected file deletions

- [ ] **Verify file count is reasonable**
  - Frontend: ~50-100 files
  - Koha: ~20-30 files
  - Full: ~200-300 files (varies by project size)
  - If thousands of files: STOP - check .rsyncignore

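The file-count and unexpected-file checks above can be partly mechanised by piping a dry-run file list through a small filter. This is a sketch under assumptions: `review_dry_run` is a hypothetical helper, the sensitive-name patterns are borrowed from the pre-deployment grep, and how the project's deploy scripts actually invoke rsync is not shown in this document.

```bash
#!/usr/bin/env bash
# review_dry_run MAX_FILES
# Reads one path per line on stdin, flags sensitive-looking names, prints the
# total, and returns non-zero if anything was flagged or the count exceeds
# MAX_FILES. The name patterns are assumptions based on the checks above.
review_dry_run() {
  local max_files=$1 count=0 flagged=0 path
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    count=$((count + 1))
    case "$path" in
      *CLAUDE*|*SESSION*|*.env*|*SECRET*|*HANDOFF*|*CLOSEDOWN*|.claude/*|*/.claude/*)
        echo "SENSITIVE: $path"
        flagged=1
        ;;
    esac
  done
  echo "files: $count"
  if [ "$count" -gt "$max_files" ]; then
    echo "STOP: more than $max_files files, check .rsyncignore"
    return 1
  fi
  [ "$flagged" -eq 0 ]
}

# Example (DEST is a placeholder; rsync lists directories too, so the count
# is slightly higher than the number of regular files):
#   rsync -an --out-format='%n' --exclude-from=.rsyncignore ./ "$DEST" | review_dry_run 300
```

A non-zero exit is a signal to stop and inspect, not proof of a leak; a clean run does not replace reading the dry-run output.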
---

## Post-Deployment Verification

### 1. Immediate Checks (< 2 minutes)

- [ ] **Restart application** (if backend changes)
  ```bash
  ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
    "sudo systemctl restart tractatus"
  ```

- [ ] **Check service status**
  ```bash
  ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
    "sudo systemctl status tractatus"
  ```
  - Expected: `active (running)`
  - If failed: Check logs immediately

- [ ] **Health endpoint check**
  ```bash
  curl https://agenticgovernance.digital/health
  ```
  - Expected: `{"status":"ok","timestamp":"..."}` (200 OK)
  - If 500 or error: Check logs, may need rollback

- [ ] **Homepage loads**
  ```bash
  curl -I https://agenticgovernance.digital
  ```
  - Expected: `HTTP/2 200`
  - If 404/500: Critical issue, check logs

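Right after a restart the service may need a few seconds before `/health` answers, so a single `curl` can report a false failure. A minimal retry sketch follows; `wait_for_health` is a hypothetical helper, and the attempt count, delay, and the exact `"status":"ok"` substring it looks for are assumptions based on the expected response shown above.

```bash
#!/usr/bin/env bash
# wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until the body contains "status":"ok" or the attempts run out.
wait_for_health() {
  local url=$1 attempts=${2:-5} delay=${3:-3} i=1 body
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP >= 400 as failure, -sS: quiet but still print errors,
    # --max-time: cap each individual request at 10 seconds
    if body=$(curl -fsS --max-time 10 "$url") &&
       printf '%s' "$body" | grep -q '"status":"ok"'; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  echo "unhealthy after $attempts attempt(s)" >&2
  return 1
}

# Example: wait_for_health https://agenticgovernance.digital/health 5 3
```

If the function still fails after all attempts, continue with the log checks and consider the rollback procedure.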
### 2. Functional Checks (2-5 minutes)

- [ ] **Test primary user flows:**
  - Visit homepage: https://agenticgovernance.digital
  - Navigate to Researcher path: https://agenticgovernance.digital/researcher.html
  - Navigate to Implementer path: https://agenticgovernance.digital/implementer.html
  - Navigate to Leader path: https://agenticgovernance.digital/leader.html
  - Visit documentation: https://agenticgovernance.digital/docs.html
  - Test interactive demo: https://agenticgovernance.digital/demos/27027-demo.html

- [ ] **Test navigation:**
  - Click navbar dropdown menus
  - Mobile menu (resize browser or use DevTools)
  - Footer links work

- [ ] **Test critical features** (based on what changed):
  - If Koha changed: Test donation flow (test mode)
  - If admin changed: Test admin login
  - If governance changed: Test governance API (with admin token)
  - If documents changed: Test document retrieval

### 3. Log Monitoring (5-15 minutes)

- [ ] **Monitor production logs for errors**
  ```bash
  ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
    "sudo journalctl -u tractatus -f"
  ```
  - Watch for:
    - ERROR, CRITICAL log levels
    - Unhandled exceptions
    - Database connection failures
    - 500 errors on requests
  - Monitor for at least 5 minutes
  - If errors appear: Investigate immediately

- [ ] **Check for new error patterns**
  ```bash
  ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
    "sudo journalctl -u tractatus --since '5 minutes ago' | grep -i error"
  ```
  - Compare to known errors (acceptable warnings)
  - New errors may indicate deployment issues

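Comparing against known errors is easier with an explicit allowlist. The sketch below filters log lines against a file of known, acceptable messages; `new_errors` and the allowlist file are hypothetical, and no such file is described in this document.

```bash
#!/usr/bin/env bash
# new_errors KNOWN_FILE
# Reads log lines on stdin and prints only the error lines that do not contain
# any of the fixed strings listed (one per line) in KNOWN_FILE.
# Exit status follows grep: 0 if new errors were printed, 1 if there were none.
# Note: a blank line in KNOWN_FILE matches everything and would hide all errors.
new_errors() {
  local known_file=$1
  # First keep lines mentioning "error" (case-insensitive), then drop lines
  # that contain a known, acceptable message (-F fixed strings, -f from file).
  grep -i 'error' | grep -v -F -f "$known_file"
}

# Example (docs/known-errors.txt is a hypothetical allowlist):
#   ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
#     "sudo journalctl -u tractatus --since '5 minutes ago'" | new_errors docs/known-errors.txt
```

Keeping the allowlist in version control makes each accepted warning a reviewed decision rather than a habit.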
### 4. Analytics Check (Optional, 15+ minutes)

- [ ] **Verify Plausible Analytics tracking**
  - Visit https://plausible.io/agenticgovernance.digital
  - Confirm events are being tracked
  - Check for unusual bounce rates or errors

- [ ] **Check Google Search Console** (if configured)
  - Verify no new crawl errors
  - Check for 404 increases

---

## Database Migration Procedure (If Needed)

Only required when schema changes or data migrations are needed.

### Pre-Migration

- [ ] **Backup database** (already done in pre-deployment)
- [ ] **Test migration on staging** (if staging environment exists)
- [ ] **Review migration script**
  ```bash
  cat scripts/migrations/YYYYMMDD-description.js
  ```

### Execute Migration

```bash
ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
  "cd /var/www/tractatus && node scripts/migrations/YYYYMMDD-description.js"
```

### Post-Migration

- [ ] **Verify migration success**
  ```bash
  # Check migration completed
  # Check data integrity
  ```

- [ ] **Test affected features**
  - Any features using migrated data

### Migration Rollback (If Needed)

- [ ] **Restore database from backup**
  ```bash
  ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
    "mongorestore --uri='...' /tmp/backup-TIMESTAMP"
  ```

- [ ] **Rollback code** (see rollback section)

---

## Rollback Procedure

Use if deployment causes critical issues that can't be quickly fixed.

### When to Rollback

- Application won't start
- Critical features completely broken
- Security vulnerability introduced
- Data loss or corruption occurring
- 500 errors on every request

### How to Rollback

1. **Identify last known good commit**
   ```bash
   git log --oneline -10
   # Find commit before problematic changes
   ```

2. **Checkout last good commit**
   ```bash
   git checkout <commit-hash>
   ```

3. **Redeploy using same script**
   ```bash
   # Use same deployment script as original deployment
   ./scripts/deploy-full-project-SAFE.sh
   ```

4. **Restart application**
   ```bash
   ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
     "sudo systemctl restart tractatus"
   ```

5. **Verify rollback successful**
   - Check health endpoint
   - Check homepage loads
   - Check logs for errors

6. **Return to main branch**
   ```bash
   git checkout main
   ```

### Post-Rollback

- [ ] **Document incident**
  - What went wrong?
  - What was the impact?
  - How was it detected?
  - How long was it broken?
  - What was rolled back?

- [ ] **Create incident report** (template below)

- [ ] **Fix issue in development**
  - Reproduce locally
  - Fix root cause
  - Add tests to prevent recurrence
  - Re-deploy when ready

---

## Incident Documentation Template

Create file: `docs/incidents/YYYY-MM-DD-description.md`

```markdown
# Incident Report: [Brief Description]

**Date**: YYYY-MM-DD HH:MM (NZST)
**Severity**: [Critical / High / Medium / Low]
**Duration**: [X minutes/hours]
**Detected By**: [User report / Monitoring / Developer]

## Summary
[1-2 sentence summary of what went wrong]

## Timeline
- HH:MM - Deployment initiated
- HH:MM - Issue detected
- HH:MM - Rollback initiated
- HH:MM - Service restored

## Root Cause
[What caused the issue?]

## Impact
- User-facing impact: [What did users experience?]
- Data impact: [Was any data lost/corrupted?]
- Security impact: [Were any security boundaries crossed?]

## Resolution
[How was it fixed?]

## Prevention
[What changes prevent this from happening again?]

## Action Items
- [ ] Fix root cause
- [ ] Add tests
- [ ] Update deployment checklist
- [ ] Update monitoring
```

---

## Deployment Log Template

Keep a deployment log in: `docs/deployments/YYYY-MM.md`

```markdown
# Deployments: [Month Year]

## YYYY-MM-DD HH:MM - [Description]

**Deployed By**: [Name]
**Deployment Type**: [Frontend / Koha / Full]
**Commits Deployed**:
- abc123 - Description
- def456 - Description

**Pre-Deployment Checks**:
- [x] Tests passing
- [x] Security audit clean
- [x] No sensitive files

**Verification**:
- [x] Health check passed
- [x] Homepage loads
- [x] No errors in logs

**Issues**: None
**Rollback Required**: No
**Notes**: [Any relevant notes]
```

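Log entries are more likely to be written if starting one takes a single command. As a rough sketch, a helper could append the entry header to the current month's file and leave the checklists for the operator to fill in. `log_deployment` is a hypothetical function, and it writes only a subset of the template fields.

```bash
#!/usr/bin/env bash
# log_deployment LOG_DIR DEPLOYER TYPE DESCRIPTION
# Appends an entry header to LOG_DIR/YYYY-MM.md (creating the file and its
# month heading on first use) and prints the path of the file it wrote to.
log_deployment() {
  local log_dir=$1 deployer=$2 type=$3 description=$4 file
  file="$log_dir/$(date +%Y-%m).md"
  mkdir -p "$log_dir"
  # Start the month's file with its heading the first time it is used.
  [ -f "$file" ] || printf '# Deployments: %s\n' "$(date '+%B %Y')" > "$file"
  {
    printf '\n## %s - %s\n\n' "$(date '+%Y-%m-%d %H:%M')" "$description"
    printf '**Deployed By**: %s\n' "$deployer"
    printf '**Deployment Type**: %s\n' "$type"
    printf '**Commits Deployed**:\n'
  } >> "$file"
  echo "$file"
}

# Example: log_deployment docs/deployments "Name" "Full" "Short description"
```

The remaining template sections (checks, verification, issues, notes) still need to be added by hand after the deployment.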
---

## Emergency Procedures

### Service Won't Start

1. **Check logs immediately**
   ```bash
   ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
     "sudo journalctl -u tractatus -n 100"
   ```

2. **Common issues:**
   - MongoDB connection failed → Check MongoDB running: `sudo systemctl status mongod`
   - Port already in use → Check for stale processes: `sudo lsof -i :9000`
   - Missing environment variables → Check .env file exists
   - Syntax error in code → Rollback immediately

3. **Quick fixes** (run on the server):
   ```bash
   # Restart MongoDB if stopped
   sudo systemctl start mongod

   # Kill stale node processes (pattern quoted so the shell does not expand it)
   sudo pkill -f 'node.*tractatus'

   # Restart application
   sudo systemctl restart tractatus
   ```

### Database Connection Lost

1. **Verify MongoDB running**
   ```bash
   ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
     "sudo systemctl status mongod"
   ```

2. **Check MongoDB logs**
   ```bash
   ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
     "sudo journalctl -u mongod -n 50"
   ```

3. **Test connection manually**
   ```bash
   # -t allocates a terminal for the interactive password prompt and shell
   ssh -t -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
     "mongosh --host localhost --port 27017 --authenticationDatabase tractatus_prod -u tractatus_user"
   ```

### High Error Rate

1. **Identify error pattern**
   ```bash
   # -o cat prints only the message text; without it every line carries a
   # journal timestamp and uniq -c cannot group identical errors
   ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
     "sudo journalctl -u tractatus --since '10 minutes ago' -o cat | grep ERROR | sort | uniq -c | sort -rn | head -10"
   ```

2. **Check if all endpoints affected or specific routes**
   ```bash
   # Check health endpoint
   curl https://agenticgovernance.digital/health

   # Check specific routes
   curl https://agenticgovernance.digital/api/documents
   ```

3. **Decision:**
   - If isolated to one feature: Disable feature, investigate
   - If site-wide: Rollback immediately

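The isolated-versus-site-wide decision can be made less subjective by probing a fixed set of routes and classifying the status codes. This is a sketch only: `classify_outage` is a hypothetical helper, and the rules it uses (any 5xx or curl's `000` counts as failing, all routes failing means site-wide) are assumptions.

```bash
#!/usr/bin/env bash
# classify_outage: reads "STATUS PATH" lines on stdin (e.g. "200 /health"),
# lists failing routes, and prints a verdict on the last line.
# Assumed rule: 5xx, or 000 (curl could not connect), counts as failing.
classify_outage() {
  local total=0 failing=0 status path
  while read -r status path; do
    [ -n "$status" ] || continue
    total=$((total + 1))
    case "$status" in
      5??|000)
        failing=$((failing + 1))
        echo "FAILING: $path ($status)"
        ;;
    esac
  done
  if [ "$failing" -eq 0 ]; then
    echo "healthy"
  elif [ "$failing" -eq "$total" ]; then
    echo "site-wide: roll back"
  else
    echo "isolated: disable the feature and investigate"
  fi
}

# Example, using the two routes from step 2:
#   for p in /health /api/documents; do
#     printf '%s %s\n' "$(curl -s -o /dev/null -w '%{http_code}' "https://agenticgovernance.digital$p")" "$p"
#   done | classify_outage
```

Two routes are a thin sample; adding the main user-facing pages to the loop would make the verdict more trustworthy.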
---

## Deployment Best Practices

### DO:
- ✅ Deploy during low-traffic hours (NZ: 10am-2pm NZST = low US traffic)
- ✅ Deploy small, focused changes (easier to debug)
- ✅ Test thoroughly before deploying
- ✅ Monitor logs after deployment
- ✅ Document all deployments
- ✅ Keep rollback procedure tested and ready
- ✅ Communicate with team before major deployments

### DON'T:
- ❌ Deploy on Friday afternoon (limited time to fix issues)
- ❌ Deploy multiple unrelated changes together
- ❌ Skip testing "because it's a small change"
- ❌ Deploy without checking logs after
- ❌ Deploy when tired or rushed
- ❌ Deploy without ability to rollback
- ❌ Forget to restart services after backend changes

### Deployment Timing Guidelines

**Best Times** (Low risk):
- Monday-Thursday, 10am-2pm NZST
- After morning coffee, before lunch
- When you have 2+ hours to monitor

**Acceptable Times** (Medium risk):
- Monday-Thursday, 2pm-5pm NZST
- Early morning deployments (if you're alert)

**Avoid Times** (High risk):
- Friday 3pm+ (weekend coverage issues)
- Late evening (tired, less alert)
- During known high-traffic events
- When about to leave/travel

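The clock-based parts of these guidelines can be encoded as a quick pre-flight check. The sketch below covers only the windows that are stated with explicit days and hours; `deploy_risk` is a hypothetical helper, and times the guidelines do not pin down (late evening, weekends, Friday before 3pm) are reported as "unlisted" rather than guessed.

```bash
#!/usr/bin/env bash
# deploy_risk DAY HOUR
# DAY is 1 (Monday) to 7 (Sunday), HOUR is 0 to 23 in New Zealand local time.
# Prints low, medium, high, or unlisted according to the guidelines above.
deploy_risk() {
  local day=$1 hour=$2
  hour=${hour#0}   # strip one leading zero so "09" is handled as 9
  if [ "$day" -ge 1 ] && [ "$day" -le 4 ]; then
    if [ "$hour" -ge 10 ] && [ "$hour" -lt 14 ]; then echo "low"; return; fi
    if [ "$hour" -ge 14 ] && [ "$hour" -lt 17 ]; then echo "medium"; return; fi
  fi
  if [ "$day" -eq 5 ] && [ "$hour" -ge 15 ]; then echo "high"; return; fi
  echo "unlisted"
}

# Example (Pacific/Auckland follows daylight saving, so it may differ from
# strict NZST by one hour in summer; treating the two as the same is an assumption):
#   deploy_risk "$(TZ=Pacific/Auckland date +%u)" "$(TZ=Pacific/Auckland date +%H)"
```

The result is a reminder, not a gate: fatigue, travel plans, and traffic events still need human judgment.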
---

## Automation Opportunities (Future)

### Potential Improvements:
- [ ] Automated testing in CI/CD (GitHub Actions)
- [ ] Automated deployment on merge to main (after tests pass)
- [ ] Automated health checks post-deployment
- [ ] Automated rollback on health check failure
- [ ] Slack notifications for deployments
- [ ] Blue-green deployment for zero-downtime
- [ ] Canary deployments for gradual rollout

### Not Ready Yet Because:
- Need stable test suite (✅ NOW READY - 380 tests passing)
- Need monitoring in place (⏳ Next task - Option D)
- Need error alerting (⏳ Next task - Option D)
- Need staging environment (💡 Future consideration)

---

## Checklist Quick Reference

**Pre-Deploy:**
- [ ] Tests pass
- [ ] Security audit clean
- [ ] No sensitive files
- [ ] .rsyncignore verified

**Deploy:**
- [ ] Choose correct script
- [ ] Review dry-run
- [ ] Execute deployment
- [ ] Note any errors

**Verify:**
- [ ] Service running
- [ ] Health check OK
- [ ] Homepage loads
- [ ] Monitor logs 5-15min

**Document:**
- [ ] Log deployment
- [ ] Note any issues
- [ ] Update team

---

## Contact & Support

**Production Access:**
- SSH: `ubuntu@vps-93a693da.vps.ovh.net`
- Key: `~/.ssh/tractatus_deploy`
- Sudo: Available for systemctl, journalctl

**Service Management:**
- Service: `tractatus.service` (systemd)
- Status: `sudo systemctl status tractatus`
- Logs: `sudo journalctl -u tractatus -f`
- Restart: `sudo systemctl restart tractatus`

**Database:**
- Host: localhost:27017
- Database: `tractatus_prod`
- Auth: tractatus_prod database
- User: `tractatus_user`

**Domain:**
- Production: https://agenticgovernance.digital
- Analytics: https://plausible.io/agenticgovernance.digital

---

**Document Status**: Active Procedure
**Last Updated**: 2025-10-09
**Next Review**: After major deployment or incident
**Maintainer**: Technical Lead (Claude Code + John Stroh)