- Update INCIDENT_RECOVERY_2026-01-19.md with complete recovery status
- Create VPS_RECOVERY_REFERENCE.md with step-by-step recovery guide
- Update remediation plan to show executed status
- Update OVH rescue mode doc with resolution notes

Documents the successful complete reinstall approach after multiple failed partial cleanup attempts. Includes attack indicators, banned software list, and verification checklist for future incidents.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Incident Recovery Report - 2026-01-19/20
Executive Summary
Status: COMPLETE RECOVERY (Updated 2026-01-20)
- Website: UP (https://agenticgovernance.digital/ responds HTTP 200)
- SSH Access: WORKING (via fresh VPS reinstall)
- Malware: ELIMINATED (complete OS reinstall)
- Application: FULLY RESTORED
- Database: MIGRATED from local backup (134 documents)
- SSL: VALID (Let's Encrypt, expires April 2026)
- Root Cause: PM2 process manager running Exodus botnet malware
Incident Timeline
| Date/Time | Event |
|---|---|
| 2025-12-09 | First botnet attack (Exodus via Docker/Umami) - 83Kpps/45Mbps |
| 2025-12-09 | Recovery claimed complete, Docker removed |
| 2026-01-18 11:38 UTC | Server working, services running |
| 2026-01-18 13:57 CET | Second attack detected - 171Kpps/51Mbps UDP to 15.184.38.247:9007 |
| 2026-01-18 | OVH forces rescue mode |
| 2026-01-18 23:44 CET | Third attack detected - 44Kpps/50Mbps UDP to 171.225.223.4:80 |
| 2026-01-19 ~00:00 UTC | Recovery session begins |
| 2026-01-19 ~00:10 UTC | Malware identified: PM2 running botnet |
| 2026-01-19 ~00:12 UTC | PM2 and umami-deployment removed |
| 2026-01-19 00:12 UTC | Server rebooted to normal mode |
| 2026-01-19 00:12 UTC | Website confirmed UP |
| 2026-01-19 00:12 UTC | SSH access BROKEN |
Attack Details
Attack 1 (2025-12-09)
- Type: DNS flood
- Rate: 83Kpps / 45Mbps
- Target: 171.225.223.108:53
- Source: Docker container (Umami Analytics)
- Malware: Exodus Botnet (Mirai variant)
Attack 2 (2026-01-18 13:57 CET)
- Type: UDP flood
- Rate: 171Kpps / 51Mbps
- Target: 15.184.38.247:9007
- Source: Unknown (likely PM2 managed process)
Attack 3 (2026-01-18 23:44 CET)
- Type: UDP flood
- Rate: 44Kpps / 50Mbps
- Target: 171.225.223.4:80
- Source: Unknown (likely PM2 managed process)
Root Cause Analysis
December 2025 Recovery Failure
The December recovery was incomplete. Claims made:
- "Docker removed" - TRUE (Docker binaries removed)
- "All malware cleaned" - FALSE
What was NOT removed in December:
- `/home/ubuntu/umami-deployment/` directory with cron jobs
- PM2 process manager (`pm2-ubuntu.service`)
- PostgreSQL service (part of Umami stack)
- Ubuntu crontab with umami backup/monitoring scripts
Persistence Mechanism
The botnet persisted via the PM2 process manager:
- Service: `/etc/systemd/system/pm2-ubuntu.service`
- Enabled: `/etc/systemd/system/multi-user.target.wants/pm2-ubuntu.service`
- Config: `/home/ubuntu/.pm2/dump.pm2`
- Logs: `/home/ubuntu/.pm2/pm2.log` (375 MB)
- Behavior: `pm2 resurrect` on boot restarts the saved processes
PM2 should NEVER have existed on this server. Project spec states "Systemd only (no PM2)".
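A minimal sketch of hunting for these persistence points from rescue mode, assuming the main disk is mounted at `/mnt` as in the recovery steps. The function name `find_pm2_persistence` is hypothetical. It extends the list above to any user: `pm2 startup` names its unit `pm2-<user>.service`, and `.pm2` state can sit in any home directory, including `/root`. User crontabs are reported for manual review rather than treated as malicious.

```bash
# Hypothetical helper: list PM2 and cron persistence artifacts under a
# mounted root (e.g. /mnt in rescue mode). Prints each path found;
# returns 1 if anything was found, 0 if the root looks clean.
find_pm2_persistence() {
  local root="${1:?usage: find_pm2_persistence ROOT}"
  local found=0 path

  # Units created by `pm2 startup` (pm2-<user>.service), including the
  # enablement symlinks in *.wants/ directories.
  while IFS= read -r path; do
    echo "systemd: $path"; found=1
  done < <(find "$root/etc/systemd/system" -name 'pm2-*.service' 2>/dev/null)

  # PM2 state directories; dump.pm2 holds the process list restored at boot.
  for path in "$root"/home/*/.pm2 "$root/root/.pm2"; do
    [[ -e "$path" ]] && { echo "pm2 dir: $path"; found=1; }
  done

  # Per-user crontabs (Debian/Ubuntu spool location), for manual review.
  for path in "$root"/var/spool/cron/crontabs/*; do
    [[ -f "$path" ]] && { echo "crontab: $path"; found=1; }
  done

  return "$found"
}
```

For example, `find_pm2_persistence /mnt || echo "persistence artifacts remain"` would be run before rebooting out of rescue mode.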
Recovery Actions Taken (2026-01-19)
Via OVH Rescue Mode
- Mounted main disk:
  ```bash
  mount /dev/sdb1 /mnt
  ```
- Removed PM2 completely:
  ```bash
  rm -rf /mnt/home/ubuntu/.pm2
  rm -f /mnt/etc/systemd/system/pm2-ubuntu.service
  rm -f /mnt/etc/systemd/system/multi-user.target.wants/pm2-ubuntu.service
  ```
- Removed umami-deployment:
  ```bash
  rm -rf /mnt/home/ubuntu/umami-deployment
  rm -f /mnt/var/spool/cron/crontabs/ubuntu
  ```
- Disabled PostgreSQL:
  ```bash
  rm -f /mnt/etc/systemd/system/multi-user.target.wants/postgresql.service
  ```
- Verified SSH keys present in `/mnt/home/ubuntu/.ssh/authorized_keys`
- Rebooted to normal mode
Status After Initial Cleanup (2026-01-19)
Working
- Website responds: https://agenticgovernance.digital/ (HTTP 200)
- nginx running
- tractatus service running (website works)
- mongod running (website works)
- Boot mode: LOCAL (not rescue)
Broken
- SSH access: Connection closes immediately after authentication
- KVM console: Returns to login prompt after password entry
- No shell access to server
Unknown
- Whether all malware is removed
- Whether another attack will occur
- Why SSH/shell access is broken
SSH Keys (Should Be Present)
Primary Key (theflow@the-flow)
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCZ8BH+Bx4uO9DTatRZ/YF5xveP/bTyiAWj+qTF7I+ugxgL9/ejSlW1tSn5Seo4XHoEPD5wZCaWig7m1LMezrRq8fDWHbeXkZltK01xhAPU0L0+OvVZMZacW6+vkNfKcNG9vrxV+K/VTPkT+00TRqlHbP8ZWj0OWd92XAoTroKVYMt4L9e7QeJOJmRmHI0uFaJ0Ufexr2gmZyYhgL2p7PP3oiAvM0xlnTwygl06c3iwXpHKWNydOYPSDs3MkVnDjptmWgKv/J+QXksarwEpA4Csc2dLnco+8KrtocUUcAunz6NJfypA0yNWWzf+/OeffkJ2Rueoe8t/lVffXdI7eVuFkmDufE7XMk9YAE/8+XVqok4OV0Q+bjpH8mKlBA3rNobnWs6obBVJD8/5aphE8NdCR4cgIeRSwieFhfzCl+GBZNvs4yuBdKvQQIfCRAKqTgbuc03XERAef6lJUuJrDjwzvvp1Nd8L7AqJoQS6kYGyxXPf/6nWTZtpxoobdGnJ2FZK6OIpAlsWx9LnybMGy19VfaR9JZSAkLdWxGPb6acNUb2xaaqyuXPo4sWpBM27n1HeKMv/7Oh4WL4zrAxDKfN38k1JsjJJVEABuN/pEOb7BCDnTMLKXlTunZgynAZJ/Dxn+zOAyfzaYSNBotlpYy1zj1AmzvS31L7LJy/aSBHuWw== theflow@the-flow
Deploy Key (tractatus-deploy)
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPdJcKMabIVQRqKqNIpzxHNgxMZ8NOD+9gVCk6dY5uV0 tractatus-deploy
Automated Deploy Key (added 2026-01-18)
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILPMcFAmLaRiLJLOD9EGJGm+EfdKu/Xb6p/+oBV/18HC tractatus-deploy-automated
Key Backup URL
Outstanding Issues (as of 2026-01-19, before the reinstall)
Critical
- No shell access - Cannot manage server without rescue mode
- Malware verification incomplete - Cannot confirm all malware removed
High
- SSH broken - Need to investigate via rescue mode:
  - Check `/var/log/auth.log`
  - Check `journalctl -u sshd`
  - Check PAM configuration
  - Check shell configuration
  - Check
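When SSH closes right after authentication and the console returns to the login prompt after the password, one common cause is a login shell that ends the session immediately, such as `/usr/sbin/nologin` or `/bin/false` in the last field of `/etc/passwd`. A minimal sketch of that check from rescue mode, assuming the disk is mounted at `/mnt`; the function name `check_login_shell` is hypothetical, and it complements rather than replaces reading `/var/log/auth.log`. It also flags a shell binary that is missing from the mounted system.

```bash
# Hypothetical helper: inspect USER's login shell inside a mounted root
# (e.g. /mnt in rescue mode). Prints a diagnosis; returns 1 if the shell
# looks unusable, 0 otherwise.
check_login_shell() {
  local root="${1:?usage: check_login_shell ROOT USER}"
  local user="${2:?usage: check_login_shell ROOT USER}"
  local entry shell
  # /etc/passwd fields: name:password:uid:gid:gecos:home:shell
  entry=$(grep -m1 "^${user}:" "$root/etc/passwd") || {
    echo "no passwd entry for $user"; return 1
  }
  shell=${entry##*:}       # last field
  shell=${shell:-/bin/sh}  # an empty shell field means /bin/sh
  case "$shell" in
    */nologin|*/false)
      echo "login disabled by shell: $shell"; return 1 ;;
  esac
  # Resolve the shell inside the mounted system, not the rescue system.
  # Absolute symlinks would still resolve against the rescue system.
  if [[ ! -x "$root$shell" ]]; then
    echo "shell missing or not executable in target: $shell"; return 1
  fi
  echo "ok: $user uses $shell"
}
```

For example, `check_login_shell /mnt ubuntu` prints one diagnosis line for the `ubuntu` account.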
Medium
- MongoDB log rotation - Not configured, caused 45GB disk fill previously
- fail2ban - May be blocking IPs aggressively
- No monitoring - No alerts for future attacks
Required Follow-up Actions (as of 2026-01-19)
- Re-enter rescue mode to fix SSH access
- Check auth logs to determine why connections close
- Configure MongoDB log rotation to prevent disk fill
- Verify no remaining malware with full filesystem scan
- Document all credentials in secure location
- Set up monitoring for future attack detection
Lessons Learned
December Recovery Failures
- Did not verify all services running on server
- Did not check for PM2 (shouldn't exist per spec)
- Did not remove umami-deployment directory
- Did not remove ubuntu crontab
- Falsely claimed complete recovery
Process Failures
- No verification checklist for recovery
- No documentation of what should/shouldn't exist on server
- No monitoring for attack recurrence
- Repeated SSH access issues due to poor key management
Server Specification (What SHOULD Exist)
Services (Systemd)
- tractatus.service - Node.js application
- nginx.service - Web server
- mongod.service - Database
- fail2ban.service - Intrusion prevention
Services (Should NOT Exist)
- pm2-ubuntu.service - REMOVED
- postgresql.service - REMOVED (was for Umami)
- docker.service - Should not exist
- Any umami/analytics services
Directories
- `/var/www/tractatus/` - Application
- `/home/ubuntu/` - User home
- `/home/ubuntu/.ssh/` - SSH keys
Directories (Should NOT Exist)
- `/home/ubuntu/umami-deployment/` - REMOVED
- `/home/ubuntu/.pm2/` - REMOVED
- `/var/lib/docker/` - Should not exist
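A minimal sketch of turning this specification into a repeatable check, assuming it runs against a root directory (`/` on the live server, `/mnt` from rescue mode). The array and function names are hypothetical, and the sketch only looks for unit files and paths; on the live system `systemctl is-enabled` and `systemctl is-active` give the stronger answer. The "any umami/analytics services" rule is not covered.

```bash
# Hypothetical spec check derived from the lists above.
REQUIRED_UNITS=(tractatus.service nginx.service mongod.service fail2ban.service)
BANNED_UNITS=(pm2-ubuntu.service postgresql.service docker.service)
BANNED_DIRS=(home/ubuntu/umami-deployment home/ubuntu/.pm2 var/lib/docker)

# Unit files may be local (/etc) or shipped by packages (/lib, /usr/lib).
unit_file_exists() {
  local root=$1 unit=$2 d
  for d in etc/systemd/system lib/systemd/system usr/lib/systemd/system; do
    [[ -e "$root/$d/$unit" ]] && return 0
  done
  return 1
}

# Prints one line per violation; returns the violation count (max 255).
check_server_spec() {
  local root="${1:?usage: check_server_spec ROOT}" bad=0 item
  for item in "${REQUIRED_UNITS[@]}"; do
    unit_file_exists "$root" "$item" || { echo "MISSING unit: $item"; bad=$((bad + 1)); }
  done
  for item in "${BANNED_UNITS[@]}"; do
    unit_file_exists "$root" "$item" && { echo "BANNED unit present: $item"; bad=$((bad + 1)); }
  done
  for item in "${BANNED_DIRS[@]}"; do
    [[ -e "$root/$item" ]] && { echo "BANNED path present: /$item"; bad=$((bad + 1)); }
  done
  return $(( bad > 255 ? 255 : bad ))
}
```

Returning the count rather than a plain pass/fail keeps the result usable both as a gate (`check_server_spec / || exit 1`) and as a quick measure of drift.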
OVH Reference Information
- Server: vps-93a693da.vps.ovh.net
- IP: 91.134.240.3
- Manager: https://www.ovh.com/manager/
- Attack Ref 1: [ref=1.39fdba94] (Jan 18 13:57)
- Attack Ref 2: [ref=1.39fdba94] (Jan 18 23:44)
- Rescue Ref: [ref=1.2378332d]
Claude Code Accountability
This incident represents multiple failures:
- December 2025: Incomplete malware removal, false claims of complete recovery
- January 2026: Failed to identify botnet attack as cause of issues
- January 2026: 8+ hours of user time wasted on repeated recovery
- January 2026: Failed to implement preventive measures after first incident
- January 2026: SSH access remains broken after recovery attempt
COMPLETE RECOVERY - 2026-01-20
What Was Done
After multiple failed partial cleanup attempts, the decision was made to perform a complete VPS reinstallation as recommended in the remediation plan.
Phase 1: VPS Reinstallation via OVH Manager
- User initiated complete OS reinstall from OVH Manager
- Fresh Ubuntu installation with new credentials
- All malware completely eliminated by full disk wipe
Phase 2: System Setup
```bash
# Security tools
apt install -y fail2ban rkhunter chkrootkit
# Daily security monitoring script (scheduled daily at 3 AM UTC)
/usr/local/bin/daily-security-check.sh
# MongoDB with log rotation (mongodb-org comes from MongoDB's own apt repository)
apt install -y mongodb-org
# Configured logrotate for /var/log/mongodb/
```
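The logrotate policy itself is not recorded here. A minimal sketch of one, assuming MongoDB logs to `/var/log/mongodb/mongod.log` (the `mongodb-org` default) with `logAppend: true` as in the packaged config. `copytruncate` is used because `mongod` keeps its log file open; the alternative is `systemLog.logRotate: reopen` plus a `SIGUSR1` to `mongod`. The helper name and the 14-day retention are assumptions.

```bash
# Hypothetical helper: write a logrotate policy for MongoDB logs.
# Defaults to /etc/logrotate.d/mongodb; pass another path to preview it.
#   daily / rotate 14       : keep two weeks of logs
#   compress/delaycompress  : gzip everything except the newest rotated file
#   missingok / notifempty  : skip quietly when there is nothing to rotate
#   copytruncate            : copy, then truncate in place, since mongod
#                             keeps the file open
write_mongodb_logrotate() {
  local target="${1:-/etc/logrotate.d/mongodb}"
  cat > "$target" <<'EOF'
/var/log/mongodb/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
EOF
}
```

After writing it, `logrotate -d /etc/logrotate.d/mongodb` shows what logrotate would do without touching any files.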
Phase 3: Application Deployment
- Created `/var/www/tractatus/` directory
- Created production `.env` file with `NODE_ENV=production`
- Deployed application via rsync from local (CLEAN source)
- Installed dependencies including `@anthropic-ai/sdk`
- Created systemd service (`/etc/systemd/system/tractatus.service`)
- Configured nginx with SSL reverse proxy
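The unit file contents are not reproduced in this report. A minimal sketch of what such a unit could look like, written through a helper so it can be previewed first. The entry point `src/server.js`, the `ubuntu` service user and the node path `/usr/bin/node` are all assumptions. `EnvironmentFile=` expects plain `KEY=value` lines, which is stricter than some `.env` loaders.

```bash
# Hypothetical helper: write a systemd unit for the tractatus app.
# ExecStart, the entry point and User= are assumptions; adjust before use.
write_tractatus_unit() {
  local target="${1:-/etc/systemd/system/tractatus.service}"
  cat > "$target" <<'EOF'
[Unit]
Description=Tractatus Node.js application
After=network.target mongod.service
Wants=mongod.service

[Service]
Type=simple
User=ubuntu
WorkingDirectory=/var/www/tractatus
EnvironmentFile=/var/www/tractatus/.env
ExecStart=/usr/bin/node src/server.js
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
}
```

`sudo systemctl daemon-reload && sudo systemctl enable --now tractatus` would then load and start it. Letting systemd supervise the process (`Restart=on-failure`) is consistent with the "Systemd only (no PM2)" rule.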
Phase 4: SSL Certificate
```bash
certbot --nginx -d agenticgovernance.digital
# Certificate valid until April 2026
```
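certbot packages normally install a renewal timer, but the report records only the expiry month. A minimal sketch of an expiry check using `openssl x509 -checkend`, assuming certbot's standard layout (`/etc/letsencrypt/live/<domain>/fullchain.pem`); the function name and the 14-day default are assumptions.

```bash
# Hypothetical helper: succeed if the certificate at CERT is still valid
# for at least DAYS more days (default 14). openssl's -checkend exits 0
# when the certificate will NOT expire within the given number of seconds.
cert_valid_for_days() {
  local cert="${1:?usage: cert_valid_for_days CERT [DAYS]}" days="${2:-14}"
  openssl x509 -in "$cert" -noout -checkend $(( days * 86400 )) >/dev/null
}
```

For example, `cert_valid_for_days /etc/letsencrypt/live/agenticgovernance.digital/fullchain.pem || echo "renew soon"`, while `certbot renew --dry-run` exercises the renewal path itself.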
Phase 5: Database Migration
```bash
# Local: Export database
mongodump --db tractatus_dev --out ~/tractatus-backup
# Transfer to VPS
rsync -avz ~/tractatus-backup/ ubuntu@vps:/tmp/tractatus-backup/
# VPS: Import to production
mongorestore --db tractatus /tmp/tractatus-backup/tractatus_dev/
# Result: 134 documents + 12 blog posts restored
```
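The restore relies on a local dump being copied intact onto a freshly reinstalled host, so a checksum manifest is a cheap way to confirm the transfer before running `mongorestore`. A minimal sketch using `sha256sum` from GNU coreutils; the helper names and the `SHA256SUMS` file name are assumptions.

```bash
# Hypothetical helpers: build and verify a SHA-256 manifest for a dump
# directory. Manifest paths are relative, so the directory can be moved.
make_dump_manifest() {
  local dir="${1:?usage: make_dump_manifest DIR}"
  ( cd "$dir" && find . -type f ! -name SHA256SUMS -print0 | sort -z \
      | xargs -0 -r sha256sum > SHA256SUMS )
}

# Exits non-zero if any file is missing or its contents changed.
verify_dump_manifest() {
  local dir="${1:?usage: verify_dump_manifest DIR}"
  ( cd "$dir" && sha256sum --quiet -c SHA256SUMS )
}
```

Run `make_dump_manifest ~/tractatus-backup` locally before the `rsync` (which then carries `SHA256SUMS` along), and `verify_dump_manifest /tmp/tractatus-backup` on the VPS before restoring.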
Phase 6: Admin Setup
```bash
node scripts/fix-admin-user.js
node scripts/seed-projects.js
```
Final System State (2026-01-20)
Services Running:
- `tractatus.service` - Node.js application (port 9000)
- `nginx.service` - Web server with SSL
- `mongod.service` - MongoDB database
- `fail2ban.service` - Intrusion prevention
Services Explicitly BANNED:
- PM2 - Never install (malware persistence vector)
- Docker - Never install (attack vector)
- PostgreSQL - Not needed (was for Umami)
Security Measures:
- SSH key authentication only (password disabled)
- UFW firewall enabled
- fail2ban active
- Daily security scan at 3 AM UTC (`/usr/local/bin/daily-security-check.sh`)
- rkhunter and chkrootkit installed
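The report does not reproduce `/usr/local/bin/daily-security-check.sh`. A minimal sketch of checks such a script could contain, assuming the banned-software list and the password-free SSH baseline above; all function names are hypothetical. The `rkhunter` and `chkrootkit` flags shown are those tools' non-interactive options. `check_password_auth_disabled` reads a single file and does not follow `Include` drop-ins (Ubuntu's default `sshd_config` includes `/etc/ssh/sshd_config.d/*.conf`), so on the live host `sshd -T | grep -i passwordauthentication` is the more reliable source.

```bash
# Hypothetical building blocks for a daily security check.

# Fail if any banned tool (PM2, Docker) is found on PATH.
check_banned_binaries() {
  local bin bad=0
  for bin in pm2 docker; do
    if command -v "$bin" >/dev/null 2>&1; then
      echo "BANNED binary found: $(command -v "$bin")"; bad=1
    fi
  done
  return "$bad"
}

# Fail unless the first PasswordAuthentication directive in the given
# file is "no" (sshd uses the first value it reads for each keyword).
# Assumes the whitespace-separated "Keyword value" form.
check_password_auth_disabled() {
  local cfg="${1:-/etc/ssh/sshd_config}" value
  value=$(awk 'tolower($1) == "passwordauthentication" { print tolower($2); exit }' "$cfg")
  [[ "$value" == "no" ]]
}

# Run both rootkit scanners non-interactively, reporting warnings only.
run_rootkit_scans() {
  rkhunter --check --skip-keypress --report-warnings-only
  chkrootkit -q
}
```

Scheduling could then be a single `/etc/cron.d` line such as `0 3 * * * root /usr/local/bin/daily-security-check.sh`. cron uses the server's local time zone, so this matches 3 AM UTC only if the server clock is set to UTC.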
Post-Recovery Improvements (same session):
- Removed all Umami analytics references from codebase (29 HTML files)
- Deleted `/public/js/components/umami-tracker.js`
- Updated privacy policy to reflect "No Analytics"
- Added Research Papers section to landing page
- Created `/korero-counter-arguments.html` page
- Fixed Tailwind CSS to include emerald gradient classes
Verification Completed
- SSH access works with key authentication
- Website responds correctly (HTTP 200)
- SSL certificate valid
- MongoDB running and accessible
- All documents migrated (134 total)
- Blog posts visible (12 posts)
- Admin user functional
- No PM2 installed
- No Docker installed
- Daily security scan configured
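The December failure was partly a claim of recovery without a checklist behind it. A minimal sketch of a checklist runner that turns items like those above into recorded PASS/FAIL results; the `check` function and the `CHECK_FAILURES` counter are hypothetical, and the example checks are commented out because they need a live server.

```bash
# Hypothetical checklist runner: each check runs a command, prints PASS or
# FAIL with its description, and counts failures in CHECK_FAILURES.
declare -i CHECK_FAILURES=0

check() {
  local description=$1
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'PASS  %s\n' "$description"
  else
    printf 'FAIL  %s\n' "$description"
    CHECK_FAILURES+=1
  fi
}

# Example checks mirroring the list above (run on the server itself):
# check "No PM2 installed"     bash -c '! command -v pm2'
# check "No Docker installed"  bash -c '! command -v docker'
# check "MongoDB running"      systemctl is-active --quiet mongod
# check "Website returns 200"  bash -c '[ "$(curl -s -o /dev/null -w "%{http_code}" https://agenticgovernance.digital/)" = 200 ]'
# exit $(( CHECK_FAILURES > 0 ))
```

Ending with a non-zero exit when any item fails means "verified" can only be reported when every check actually passed.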
Report Date: 2026-01-19 (initial) / 2026-01-20 (complete recovery)
Status: COMPLETE RECOVERY - All systems operational
Next Action: Resume normal development (/community project)