- Documents three botnet attacks (Dec 2025, Jan 18 x2)
- Root cause: PM2 process manager running malware (should never have existed)
- December recovery was incomplete (umami-deployment, PM2 not removed)
- Current status: Website UP, SSH BROKEN
- Full SSH keys documented
- Lists all recovery actions taken
- Acknowledges Claude Code failures

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Incident Recovery Report - 2026-01-19
Executive Summary
Status: PARTIAL RECOVERY
- Website: UP (https://agenticgovernance.digital/ responds HTTP 200)
- SSH Access: BROKEN (connection closes after authentication)
- Malware: Known components REMOVED (PM2 and umami-deployment deleted); complete removal not yet verified
- Root Cause: PM2 process manager running botnet malware
Incident Timeline
| Date/Time | Event |
|---|---|
| 2025-12-09 | First botnet attack (Exodus via Docker/Umami) - 83Kpps/45Mbps |
| 2025-12-09 | Recovery claimed complete, Docker removed |
| 2026-01-18 11:38 UTC | Server working, services running |
| 2026-01-18 13:57 CET | Second attack detected - 171Kpps/51Mbps UDP to 15.184.38.247:9007 |
| 2026-01-18 | OVH forces rescue mode |
| 2026-01-18 23:44 CET | Third attack detected - 44Kpps/50Mbps UDP to 171.225.223.4:80 |
| 2026-01-19 ~00:00 UTC | Recovery session begins |
| 2026-01-19 ~00:10 UTC | Malware identified: PM2 running botnet |
| 2026-01-19 ~00:12 UTC | PM2 and umami-deployment removed |
| 2026-01-19 00:12 UTC | Server rebooted to normal mode |
| 2026-01-19 00:12 UTC | Website confirmed UP |
| 2026-01-19 00:12 UTC | SSH access BROKEN |
Attack Details
Attack 1 (2025-12-09)
- Type: DNS flood
- Rate: 83Kpps / 45Mbps
- Target: 171.225.223.108:53
- Source: Docker container (Umami Analytics)
- Malware: Exodus Botnet (Mirai variant)
Attack 2 (2026-01-18 13:57 CET)
- Type: UDP flood
- Rate: 171Kpps / 51Mbps
- Target: 15.184.38.247:9007
- Source: Unknown (likely a PM2-managed process)
Attack 3 (2026-01-18 23:44 CET)
- Type: UDP flood
- Rate: 44Kpps / 50Mbps
- Target: 171.225.223.4:80
- Source: Unknown (likely a PM2-managed process)
Root Cause Analysis
December 2025 Recovery Failure
The December recovery was incomplete. Claims made:
- "Docker removed" - TRUE (Docker binaries removed)
- "All malware cleaned" - FALSE
What was NOT removed in December:
- /home/ubuntu/umami-deployment/ directory with cron jobs
- PM2 process manager (pm2-ubuntu.service)
- PostgreSQL service (part of Umami stack)
- Ubuntu crontab with umami backup/monitoring scripts
Persistence Mechanism
The botnet persisted via PM2 process manager:
- Service: /etc/systemd/system/pm2-ubuntu.service
- Enabled: /etc/systemd/system/multi-user.target.wants/pm2-ubuntu.service
- Config: /home/ubuntu/.pm2/dump.pm2
- Logs: /home/ubuntu/.pm2/pm2.log (375 MB)
- Behavior: pm2 resurrect on boot restarts saved processes
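The persistence points listed above can be checked mechanically. The following is a minimal sketch, not the procedure used during this recovery: the function name `find_pm2_artifacts` is hypothetical, and it assumes PM2 units follow the `pm2-<user>.service` naming seen on this server. It takes a filesystem root so it can run against a rescue-mode mount such as /mnt.

```shell
#!/bin/sh
# Report PM2 persistence artifacts under a filesystem root.
# ROOT is "/" on a live system or a rescue-mode mount point such as /mnt.
# Prints each artifact found; exit status is 1 if any exist, 0 if none.
find_pm2_artifacts() (
    root="${1:-/}"
    root="${root%/}"          # avoid a double slash when ROOT is "/"
    found=0

    # systemd units and enablement symlinks named pm2-<user>.service
    for unit in "$root"/etc/systemd/system/pm2-*.service \
                "$root"/etc/systemd/system/*.wants/pm2-*.service; do
        # -e fails on dangling symlinks (common under a rescue mount),
        # so test -L as well
        if [ -e "$unit" ] || [ -L "$unit" ]; then
            echo "unit: $unit"
            found=1
        fi
    done

    # per-user PM2 state directories (dump.pm2 lives here)
    for dir in "$root"/home/*/.pm2 "$root"/root/.pm2; do
        if [ -d "$dir" ]; then
            echo "state: $dir"
            found=1
        fi
    done

    [ "$found" -eq 0 ]
)
```

From rescue mode, `find_pm2_artifacts /mnt` would list anything left behind; an empty result only shows that these specific paths are clean, not that the server is free of malware.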
PM2 should NEVER have existed on this server. Project spec states "Systemd only (no PM2)".
Recovery Actions Taken (2026-01-19)
Via OVH Rescue Mode
- Mounted main disk: mount /dev/sdb1 /mnt
- Removed PM2 completely:
    rm -rf /mnt/home/ubuntu/.pm2
    rm -f /mnt/etc/systemd/system/pm2-ubuntu.service
    rm -f /mnt/etc/systemd/system/multi-user.target.wants/pm2-ubuntu.service
- Removed umami-deployment:
    rm -rf /mnt/home/ubuntu/umami-deployment
    rm -f /mnt/var/spool/cron/crontabs/ubuntu
- Disabled PostgreSQL:
    rm -f /mnt/etc/systemd/system/multi-user.target.wants/postgresql.service
- Verified SSH keys present in /mnt/home/ubuntu/.ssh/authorized_keys
- Rebooted to normal mode
Current Status
Working
- Website responds: https://agenticgovernance.digital/ (HTTP 200)
- nginx running
- tractatus service running (website works)
- mongod running (website works)
- Boot mode: LOCAL (not rescue)
Broken
- SSH access: Connection closes immediately after authentication
- KVM console: Returns to login prompt after password entry
- No shell access to server
Unknown
- Whether all malware is removed
- Whether another attack will occur
- Why SSH/shell access is broken
SSH Keys (Should Be Present)
Primary Key (theflow@the-flow)
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCZ8BH+Bx4uO9DTatRZ/YF5xveP/bTyiAWj+qTF7I+ugxgL9/ejSlW1tSn5Seo4XHoEPD5wZCaWig7m1LMezrRq8fDWHbeXkZltK01xhAPU0L0+OvVZMZacW6+vkNfKcNG9vrxV+K/VTPkT+00TRqlHbP8ZWj0OWd92XAoTroKVYMt4L9e7QeJOJmRmHI0uFaJ0Ufexr2gmZyYhgL2p7PP3oiAvM0xlnTwygl06c3iwXpHKWNydOYPSDs3MkVnDjptmWgKv/J+QXksarwEpA4Csc2dLnco+8KrtocUUcAunz6NJfypA0yNWWzf+/OeffkJ2Rueoe8t/lVffXdI7eVuFkmDufE7XMk9YAE/8+XVqok4OV0Q+bjpH8mKlBA3rNobnWs6obBVJD8/5aphE8NdCR4cgIeRSwieFhfzCl+GBZNvs4yuBdKvQQIfCRAKqTgbuc03XERAef6lJUuJrDjwzvvp1Nd8L7AqJoQS6kYGyxXPf/6nWTZtpxoobdGnJ2FZK6OIpAlsWx9LnybMGy19VfaR9JZSAkLdWxGPb6acNUb2xaaqyuXPo4sWpBM27n1HeKMv/7Oh4WL4zrAxDKfN38k1JsjJJVEABuN/pEOb7BCDnTMLKXlTunZgynAZJ/Dxn+zOAyfzaYSNBotlpYy1zj1AmzvS31L7LJy/aSBHuWw== theflow@the-flow
Deploy Key (tractatus-deploy)
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPdJcKMabIVQRqKqNIpzxHNgxMZ8NOD+9gVCk6dY5uV0 tractatus-deploy
Automated Deploy Key (added 2026-01-18)
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILPMcFAmLaRiLJLOD9EGJGm+EfdKu/Xb6p/+oBV/18HC tractatus-deploy-automated
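A quick way to confirm that all three keys are still listed is to look for their comment labels in authorized_keys. This is a sketch under assumptions: the function name `check_authorized_keys` is hypothetical, and it assumes each label contains no spaces and is the last field of its key line. It checks labels only; comparing fingerprints (for example with `ssh-keygen -lf`) would be needed to confirm the key material itself.

```shell
#!/bin/sh
# Check that authorized_keys contains a key line for each expected label.
# Usage: check_authorized_keys FILE LABEL...
# Prints missing labels; exit status is nonzero if any label is missing.
check_authorized_keys() (
    file="$1"; shift
    missing=0
    for label in "$@"; do
        # the comment is the last whitespace-separated field of a key line
        if ! awk -v want="$label" '
                /^[[:space:]]*#/ { next }            # skip comment lines
                NF >= 3 && $NF == want { hit = 1 }
                END { exit (hit ? 0 : 1) }' "$file"; then
            echo "missing: $label"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
)
```

From rescue mode this could be called as `check_authorized_keys /mnt/home/ubuntu/.ssh/authorized_keys theflow@the-flow tractatus-deploy tractatus-deploy-automated`.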
Key Backup URL
Outstanding Issues
Critical
- No shell access - Cannot manage server without rescue mode
- Malware verification incomplete - Cannot confirm all malware removed
High
- SSH broken - Need to investigate via rescue mode:
  - Check /var/log/auth.log
  - Check journalctl -u sshd
  - Check PAM configuration
  - Check shell configuration
  - Check
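For the shell configuration check, one thing worth ruling out is a login shell that is missing or not executable, since both SSH and the KVM console drop the session right after a successful login. This is only one possible cause among several (PAM and profile scripts are others). The sketch below uses a hypothetical function name, `check_login_shell`, and reads the mounted system's passwd file from rescue mode.

```shell
#!/bin/sh
# Check that USER's login shell, as recorded in ROOT/etc/passwd, exists and
# is executable inside ROOT. Prints the shell path; nonzero exit on a problem.
# Usage: check_login_shell ROOT USER
check_login_shell() (
    root="${1:-/}"
    root="${root%/}"
    user="$2"
    shell=$(awk -F: -v u="$user" '$1 == u { print $7; exit }' "$root/etc/passwd")
    if [ -z "$shell" ]; then
        echo "no passwd entry (or empty shell field) for $user" >&2
        exit 2
    fi
    echo "$shell"
    case "$shell" in
        */nologin|*/false)
            echo "login is disabled by the shell setting" >&2
            exit 1 ;;
    esac
    # Note: a shell that is a symlink with an absolute target resolves
    # against the rescue system, not ROOT.
    if [ ! -x "$root$shell" ]; then
        echo "shell is missing or not executable under $root" >&2
        exit 1
    fi
)
```

Running `check_login_shell /mnt ubuntu` from rescue mode prints the configured shell and fails if it cannot be executed. The auth log on the mounted disk can be read directly at /mnt/var/log/auth.log; the journal is only readable from rescue mode if persistent journaling was enabled, in which case `journalctl -D /mnt/var/log/journal` points at it.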
Medium
- MongoDB log rotation - Not configured, caused 45GB disk fill previously
- fail2ban - May be blocking IPs aggressively
- No monitoring - No alerts for future attacks
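For the MongoDB log rotation item, a logrotate rule is one common approach. The sketch below writes such a rule to a file. Everything specific in it is an assumption rather than a detail of this server: the log path /var/log/mongodb/mongod.log, the mongodb user and group, the 7-day retention, and the function name `write_mongod_logrotate`. It also assumes mongod.conf sets `systemLog.logAppend: true` and `systemLog.logRotate: reopen`, so that mongod reopens the log file when it receives SIGUSR1.

```shell
#!/bin/sh
# Write a logrotate rule for mongod to DEST (default: /etc/logrotate.d/mongod).
# Assumed, not taken from this server: log path, mongodb user/group, retention.
write_mongod_logrotate() (
    dest="${1:-/etc/logrotate.d/mongod}"
    cat > "$dest" <<'EOF'
/var/log/mongodb/mongod.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    create 640 mongodb mongodb
    postrotate
        pkill -USR1 -x mongod || true
    endscript
}
EOF
)
```

After writing the rule, `logrotate -d /etc/logrotate.d/mongod` performs a dry run that shows what would be rotated without touching any files.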
Required Follow-up Actions
- Re-enter rescue mode to fix SSH access
- Check auth logs to determine why connections close
- Configure MongoDB log rotation to prevent disk fill
- Verify no remaining malware with full filesystem scan
- Document all credentials in secure location
- Set up monitoring for future attack detection
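All three attacks were outbound floods of 44Kpps or more, so even a crude check on the transmitted-packet rate would have flagged them. The sketch below is one possible starting point, not an existing setup: the function names, the interface name, and the threshold in the usage note are hypothetical. It reads the cumulative counter from /proc/net/dev and compares two samples.

```shell
#!/bin/sh
# Print the cumulative transmitted-packet counter for IFACE.
# STATS defaults to /proc/net/dev; a saved copy can be passed instead.
tx_packets() (
    iface="$1"
    stats="${2:-/proc/net/dev}"
    # after splitting off the "iface:" prefix, fields 2-9 are receive
    # counters, field 10 is tx bytes and field 11 is tx packets
    sed 's/:/ /' "$stats" | awk -v i="$iface" '$1 == i { print $11 }'
)

# Print the packet rate between two counter samples and exit 0 (alert)
# when it exceeds LIMIT packets per second.
# Usage: tx_rate_exceeds BEFORE AFTER SECONDS LIMIT
tx_rate_exceeds() (
    rate=$(( ($2 - $1) / $3 ))
    echo "$rate"
    [ "$rate" -gt "$4" ]
)
```

A cron job or systemd timer could sample `tx_packets eth0`, sleep 10 seconds, sample again, and raise an alert when `tx_rate_exceeds "$before" "$after" 10 20000` succeeds. The 20000 pps limit is a placeholder chosen to sit below the smallest recorded attack; it would need tuning against normal traffic.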
Lessons Learned
December Recovery Failures
- Did not verify all services running on server
- Did not check for PM2 (shouldn't exist per spec)
- Did not remove umami-deployment directory
- Did not remove ubuntu crontab
- Falsely claimed complete recovery
Process Failures
- No verification checklist for recovery
- No documentation of what should/shouldn't exist on server
- No monitoring for attack recurrence
- Repeated SSH access issues due to poor key management
Server Specification (What SHOULD Exist)
Services (Systemd)
- tractatus.service - Node.js application
- nginx.service - Web server
- mongod.service - Database
- fail2ban.service - Intrusion prevention
Services (Should NOT Exist)
- pm2-ubuntu.service - REMOVED
- postgresql.service - REMOVED (was for Umami)
- docker.service - Should not exist
- Any umami/analytics services
Directories
- /var/www/tractatus/ - Application
- /home/ubuntu/ - User home
- /home/ubuntu/.ssh/ - SSH keys
Directories (Should NOT Exist)
- /home/ubuntu/umami-deployment/ - REMOVED
- /home/ubuntu/.pm2/ - REMOVED
- /var/lib/docker/ - Should not exist
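The specification in this section could double as the verification checklist that the December recovery lacked. The following sketch audits a filesystem root against it. The function name `audit_server_spec` is hypothetical, and it assumes every required service is enabled through a symlink in multi-user.target.wants, which has not been confirmed for tractatus.service.

```shell
#!/bin/sh
# Audit ROOT against the server specification: required units enabled,
# forbidden units and directories absent.
# Prints each violation; exit status is nonzero if any are found.
audit_server_spec() (
    root="${1:-/}"
    root="${root%/}"
    wants="$root/etc/systemd/system/multi-user.target.wants"
    bad=0

    for unit in tractatus nginx mongod fail2ban; do
        if [ ! -e "$wants/$unit.service" ] && [ ! -L "$wants/$unit.service" ]; then
            echo "not enabled: $unit.service"
            bad=1
        fi
    done

    for unit in pm2-ubuntu postgresql docker; do
        # -L also catches symlinks that dangle under a rescue mount
        if [ -e "$wants/$unit.service" ] || [ -L "$wants/$unit.service" ]; then
            echo "forbidden unit enabled: $unit.service"
            bad=1
        fi
    done

    for dir in home/ubuntu/umami-deployment home/ubuntu/.pm2 var/lib/docker; do
        if [ -d "$root/$dir" ]; then
            echo "forbidden directory: /$dir"
            bad=1
        fi
    done

    [ "$bad" -eq 0 ]
)
```

Run as `audit_server_spec /mnt` from rescue mode, or `audit_server_spec /` on the live server once shell access is restored. A clean result covers only the items named in the specification.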
OVH Reference Information
- Server: vps-93a693da.vps.ovh.net
- IP: 91.134.240.3
- Manager: https://www.ovh.com/manager/
- Attack Ref 1: [ref=1.39fdba94] (Jan 18 13:57)
- Attack Ref 2: [ref=1.39fdba94] (Jan 18 23:44)
- Rescue Ref: [ref=1.2378332d]
Claude Code Accountability
This incident represents multiple failures:
- December 2025: Incomplete malware removal, false claims of complete recovery
- January 2026: Failed to identify botnet attack as cause of issues
- January 2026: 8+ hours of user time wasted on repeated recovery
- January 2026: Failed to implement preventive measures after first incident
- January 2026: SSH access remains broken after recovery attempt
Report Date: 2026-01-19
Status: PARTIAL RECOVERY - Website up, SSH broken
Next Action: Re-enter rescue mode to fix SSH access