# Incident Recovery Report - 2026-01-19/20

## Executive Summary

**Status:** COMPLETE RECOVERY (Updated 2026-01-20)

- Website: UP (https://agenticgovernance.digital/ responds HTTP 200)
- SSH Access: WORKING (via fresh VPS reinstall)
- Malware: ELIMINATED (complete OS reinstall)
- Application: FULLY RESTORED
- Database: MIGRATED from local backup (134 documents)
- SSL: VALID (Let's Encrypt, expires April 2026)
- Root Cause: PM2 process manager running Exodus botnet malware

---

## Incident Timeline

| Date/Time | Event |
|-----------|-------|
| 2025-12-09 | First botnet attack (Exodus via Docker/Umami) - 83Kpps/45Mbps |
| 2025-12-09 | Recovery claimed complete, Docker removed |
| 2026-01-18 11:38 UTC | Server working, services running |
| 2026-01-18 13:57 CET | Second attack detected - 171Kpps/51Mbps UDP to 15.184.38.247:9007 |
| 2026-01-18 | OVH forces rescue mode |
| 2026-01-18 23:44 CET | Third attack detected - 44Kpps/50Mbps UDP to 171.225.223.4:80 |
| 2026-01-19 ~00:00 UTC | Recovery session begins |
| 2026-01-19 ~00:10 UTC | Malware identified: PM2 running botnet |
| 2026-01-19 ~00:12 UTC | PM2 and umami-deployment removed |
| 2026-01-19 00:12 UTC | Server rebooted to normal mode |
| 2026-01-19 00:12 UTC | Website confirmed UP |
| 2026-01-19 00:12 UTC | SSH access BROKEN |

---

## Attack Details

### Attack 1 (2025-12-09)

- **Type:** DNS flood
- **Rate:** 83Kpps / 45Mbps
- **Target:** 171.225.223.108:53
- **Source:** Docker container (Umami Analytics)
- **Malware:** Exodus Botnet (Mirai variant)

### Attack 2 (2026-01-18 13:57 CET)

- **Type:** UDP flood
- **Rate:** 171Kpps / 51Mbps
- **Target:** 15.184.38.247:9007
- **Source:** Unknown (likely PM2 managed process)

### Attack 3 (2026-01-18 23:44 CET)

- **Type:** UDP flood
- **Rate:** 44Kpps / 50Mbps
- **Target:** 171.225.223.4:80
- **Source:** Unknown (likely PM2 managed process)

---

## Root Cause Analysis

### December 2025 Recovery Failure

The December recovery was **incomplete**.
Claims made:

- "Docker removed" - TRUE (Docker binaries removed)
- "All malware cleaned" - FALSE

What was **NOT** removed in December:

1. `/home/ubuntu/umami-deployment/` directory with cron jobs
2. PM2 process manager (`pm2-ubuntu.service`)
3. PostgreSQL service (part of Umami stack)
4. Ubuntu crontab with umami backup/monitoring scripts

### Persistence Mechanism

The botnet persisted via **PM2 process manager**:

- Service: `/etc/systemd/system/pm2-ubuntu.service`
- Enabled: `/etc/systemd/system/multi-user.target.wants/pm2-ubuntu.service`
- Config: `/home/ubuntu/.pm2/dump.pm2`
- Logs: `/home/ubuntu/.pm2/pm2.log` (375 MB)
- Behavior: `pm2 resurrect` on boot restarts saved processes

PM2 should NEVER have existed on this server. Project spec states "Systemd only (no PM2)".

---

## Recovery Actions Taken (2026-01-19)

### Via OVH Rescue Mode

1. Mounted main disk: `mount /dev/sdb1 /mnt`
2. Removed PM2 completely:
   ```bash
   rm -rf /mnt/home/ubuntu/.pm2
   rm -f /mnt/etc/systemd/system/pm2-ubuntu.service
   rm -f /mnt/etc/systemd/system/multi-user.target.wants/pm2-ubuntu.service
   ```
3. Removed umami-deployment:
   ```bash
   rm -rf /mnt/home/ubuntu/umami-deployment
   rm -f /mnt/var/spool/cron/crontabs/ubuntu
   ```
4. Disabled PostgreSQL:
   ```bash
   rm -f /mnt/etc/systemd/system/multi-user.target.wants/postgresql.service
   ```
5. Verified SSH keys present in `/mnt/home/ubuntu/.ssh/authorized_keys`
6. Rebooted to normal mode

---

## Current Status

### Working

- Website responds: https://agenticgovernance.digital/ (HTTP 200)
- nginx running
- tractatus service running (website works)
- mongod running (website works)
- Boot mode: LOCAL (not rescue)

### Broken

- SSH access: Connection closes immediately after authentication
- KVM console: Returns to login prompt after password entry
- No shell access to server

### Unknown

- Whether all malware is removed
- Whether another attack will occur
- Why SSH/shell access is broken

---

## SSH Keys (Should Be Present)

### Primary Key (theflow@the-flow)

```
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCZ8BH+Bx4uO9DTatRZ/YF5xveP/bTyiAWj+qTF7I+ugxgL9/ejSlW1tSn5Seo4XHoEPD5wZCaWig7m1LMezrRq8fDWHbeXkZltK01xhAPU0L0+OvVZMZacW6+vkNfKcNG9vrxV+K/VTPkT+00TRqlHbP8ZWj0OWd92XAoTroKVYMt4L9e7QeJOJmRmHI0uFaJ0Ufexr2gmZyYhgL2p7PP3oiAvM0xlnTwygl06c3iwXpHKWNydOYPSDs3MkVnDjptmWgKv/J+QXksarwEpA4Csc2dLnco+8KrtocUUcAunz6NJfypA0yNWWzf+/OeffkJ2Rueoe8t/lVffXdI7eVuFkmDufE7XMk9YAE/8+XVqok4OV0Q+bjpH8mKlBA3rNobnWs6obBVJD8/5aphE8NdCR4cgIeRSwieFhfzCl+GBZNvs4yuBdKvQQIfCRAKqTgbuc03XERAef6lJUuJrDjwzvvp1Nd8L7AqJoQS6kYGyxXPf/6nWTZtpxoobdGnJ2FZK6OIpAlsWx9LnybMGy19VfaR9JZSAkLdWxGPb6acNUb2xaaqyuXPo4sWpBM27n1HeKMv/7Oh4WL4zrAxDKfN38k1JsjJJVEABuN/pEOb7BCDnTMLKXlTunZgynAZJ/Dxn+zOAyfzaYSNBotlpYy1zj1AmzvS31L7LJy/aSBHuWw== theflow@the-flow
```

### Deploy Key (tractatus-deploy)

```
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPdJcKMabIVQRqKqNIpzxHNgxMZ8NOD+9gVCk6dY5uV0 tractatus-deploy
```

### Automated Deploy Key (added 2026-01-18)

```
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILPMcFAmLaRiLJLOD9EGJGm+EfdKu/Xb6p/+oBV/18HC tractatus-deploy-automated
```

### Key Backup URL

https://paste.rs/nELRM

---

## Outstanding Issues

### Critical

1. **No shell access** - Cannot manage server without rescue mode
2. **Malware verification incomplete** - Cannot confirm all malware removed

### High

1. **SSH broken** - Need to investigate via rescue mode:
   - Check `/var/log/auth.log`
   - Check `journalctl -u sshd`
   - Check PAM configuration
   - Check shell configuration

### Medium

1. **MongoDB log rotation** - Not configured, caused 45GB disk fill previously
2. **fail2ban** - May be blocking IPs aggressively
3. **No monitoring** - No alerts for future attacks

---

## Required Follow-up Actions

1. **Re-enter rescue mode** to fix SSH access
2. **Check auth logs** to determine why connections close
3. **Configure MongoDB log rotation** to prevent disk fill
4. **Verify no remaining malware** with full filesystem scan
5. **Document all credentials** in secure location
6. **Set up monitoring** for future attack detection

---

## Lessons Learned

### December Recovery Failures

1. Did not verify all services running on server
2. Did not check for PM2 (shouldn't exist per spec)
3. Did not remove umami-deployment directory
4. Did not remove ubuntu crontab
5. Falsely claimed complete recovery

### Process Failures

1. No verification checklist for recovery
2. No documentation of what should/shouldn't exist on server
3. No monitoring for attack recurrence
4. Repeated SSH access issues due to poor key management

---

## Server Specification (What SHOULD Exist)

### Services (Systemd)

- tractatus.service - Node.js application
- nginx.service - Web server
- mongod.service - Database
- fail2ban.service - Intrusion prevention

### Services (Should NOT Exist)

- pm2-ubuntu.service - REMOVED
- postgresql.service - REMOVED (was for Umami)
- docker.service - Should not exist
- Any umami/analytics services

### Directories

- `/var/www/tractatus/` - Application
- `/home/ubuntu/` - User home
- `/home/ubuntu/.ssh/` - SSH keys

### Directories (Should NOT Exist)

- `/home/ubuntu/umami-deployment/` - REMOVED
- `/home/ubuntu/.pm2/` - REMOVED
- `/var/lib/docker/` - Should not exist

---

## OVH Reference Information

- **Server:** vps-93a693da.vps.ovh.net
- **IP:** 91.134.240.3
- **Manager:** https://www.ovh.com/manager/
- **Attack Ref 1:** [ref=1.39fdba94] (Jan 18 13:57)
- **Attack Ref 2:** [ref=1.39fdba94] (Jan 18 23:44)
- **Rescue Ref:** [ref=1.2378332d]

---

## Claude Code Accountability

This incident represents multiple failures:

1. **December 2025:** Incomplete malware removal, false claims of complete recovery
2. **January 2026:** Failed to identify botnet attack as cause of issues
3. **January 2026:** 8+ hours of user time wasted on repeated recovery
4. **January 2026:** Failed to implement preventive measures after first incident
5. **January 2026:** SSH access remains broken after recovery attempt

---

## COMPLETE RECOVERY - 2026-01-20

### What Was Done

After multiple failed partial cleanup attempts, the decision was made to perform a **complete VPS reinstallation** as recommended in the remediation plan.
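
Whichever recovery route is taken, the "Should NOT Exist" entries in the server specification above lend themselves to a scripted check. The following is a minimal sketch, not a script that was run during this incident: the function name `check_banned_artifacts` and its optional root-prefix argument are illustrative assumptions, while the paths are the ones named in this report. Passing `/mnt` mirrors the rescue-mode mount used on 2026-01-19; passing nothing checks the live filesystem.

```shell
#!/usr/bin/env bash
# Sketch: report any path that this report lists as "should NOT exist".
# check_banned_artifacts is a hypothetical helper, not part of the deployed tooling.

check_banned_artifacts() {
    local root="${1:-}"   # optional prefix, e.g. /mnt when working from rescue mode
    local found=0
    local path
    for path in \
        /etc/systemd/system/pm2-ubuntu.service \
        /etc/systemd/system/multi-user.target.wants/pm2-ubuntu.service \
        /etc/systemd/system/multi-user.target.wants/postgresql.service \
        /home/ubuntu/.pm2 \
        /home/ubuntu/umami-deployment \
        /var/lib/docker
    do
        # -e follows symlinks; -L additionally catches dangling "wants" symlinks
        if [ -e "${root}${path}" ] || [ -L "${root}${path}" ]; then
            echo "FOUND: ${root}${path}"
            found=1
        fi
    done
    # Exit status 0 means nothing banned was found, 1 means at least one hit
    return "$found"
}
```

A possible invocation from rescue mode would be `check_banned_artifacts /mnt || echo "cleanup incomplete"`. A clean result only shows that these known paths are absent; it does not prove the system is free of malware, which is why the full reinstall was still the safer choice.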

#### Phase 1: VPS Reinstallation via OVH Manager

- User initiated complete OS reinstall from OVH Manager
- Fresh Ubuntu installation with new credentials
- All malware completely eliminated by full disk wipe

#### Phase 2: System Setup

```bash
# Security tools
apt install -y fail2ban rkhunter chkrootkit

# Daily security monitoring script
/usr/local/bin/daily-security-check.sh

# MongoDB with log rotation
apt install -y mongodb-org
# Configured logrotate for /var/log/mongodb/
```

#### Phase 3: Application Deployment

1. Created `/var/www/tractatus/` directory
2. Created production `.env` file with NODE_ENV=production
3. Deployed application via rsync from local (CLEAN source)
4. Installed dependencies including `@anthropic-ai/sdk`
5. Created systemd service (`/etc/systemd/system/tractatus.service`)
6. Configured nginx with SSL reverse proxy

#### Phase 4: SSL Certificate

```bash
certbot --nginx -d agenticgovernance.digital
# Certificate valid until April 2026
```

#### Phase 5: Database Migration

```bash
# Local: Export database
mongodump --db tractatus_dev --out ~/tractatus-backup

# Transfer to VPS
rsync -avz ~/tractatus-backup/ ubuntu@vps:/tmp/tractatus-backup/

# VPS: Import to production
mongorestore --db tractatus /tmp/tractatus-backup/tractatus_dev/

# Result: 134 documents + 12 blog posts restored
```

#### Phase 6: Admin Setup

```bash
node scripts/fix-admin-user.js
node scripts/seed-projects.js
```

### Final System State (2026-01-20)

**Services Running:**

- `tractatus.service` - Node.js application (port 9000)
- `nginx.service` - Web server with SSL
- `mongod.service` - MongoDB database
- `fail2ban.service` - Intrusion prevention

**Services Explicitly BANNED:**

- PM2 - Never install (malware persistence vector)
- Docker - Never install (attack vector)
- PostgreSQL - Not needed (was for Umami)

**Security Measures:**

- SSH key authentication only (password disabled)
- UFW firewall enabled
- fail2ban active
- Daily security scan at 3 AM UTC (`/usr/local/bin/daily-security-check.sh`)
- rkhunter and chkrootkit installed

**Post-Recovery Improvements (same session):**

- Removed all Umami analytics references from codebase (29 HTML files)
- Deleted `/public/js/components/umami-tracker.js`
- Updated privacy policy to reflect "No Analytics"
- Added Research Papers section to landing page
- Created `/korero-counter-arguments.html` page
- Fixed Tailwind CSS to include emerald gradient classes

### Verification Completed

- [x] SSH access works with key authentication
- [x] Website responds correctly (HTTP 200)
- [x] SSL certificate valid
- [x] MongoDB running and accessible
- [x] All documents migrated (134 total)
- [x] Blog posts visible (12 posts)
- [x] Admin user functional
- [x] No PM2 installed
- [x] No Docker installed
- [x] Daily security scan configured

---

**Report Date:** 2026-01-19 (initial) / 2026-01-20 (complete recovery)

**Status:** COMPLETE RECOVERY - All systems operational

**Next Action:** Resume normal development (/community project)