tractatus/docs/INCIDENT_RECOVERY_2026-01-19.md
TheFlow 9b95c364d2 docs: Add incident recovery report 2026-01-19
- Documents three botnet attacks (Dec 2025, Jan 18 x2)
- Root cause: PM2 process manager running malware (should never have existed)
- December recovery was incomplete (umami-deployment, PM2 not removed)
- Current status: Website UP, SSH BROKEN
- Full SSH keys documented
- Lists all recovery actions taken
- Acknowledges Claude Code failures

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 13:28:59 +13:00

Incident Recovery Report - 2026-01-19

Executive Summary

Status: PARTIAL RECOVERY

  • Website: UP (https://agenticgovernance.digital/ responds HTTP 200)
  • SSH Access: BROKEN (connection closes after authentication)
  • Malware: Known persistence REMOVED (PM2 and umami-deployment deleted); full removal not yet verified
  • Root Cause: PM2 process manager running botnet malware

Incident Timeline

Date/Time                Event
2025-12-09               First botnet attack (Exodus via Docker/Umami) - 83Kpps/45Mbps
2025-12-09               Recovery claimed complete, Docker removed
2026-01-18 11:38 UTC     Server working, services running
2026-01-18 13:57 CET     Second attack detected - 171Kpps/51Mbps UDP to 15.184.38.247:9007
2026-01-18               OVH forces rescue mode
2026-01-18 23:44 CET     Third attack detected - 44Kpps/50Mbps UDP to 171.225.223.4:80
2026-01-19 ~00:00 UTC    Recovery session begins
2026-01-19 ~00:10 UTC    Malware identified: PM2 running botnet
2026-01-19 ~00:12 UTC    PM2 and umami-deployment removed
2026-01-19 00:12 UTC     Server rebooted to normal mode
2026-01-19 00:12 UTC     Website confirmed UP
2026-01-19 00:12 UTC     SSH access BROKEN
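
The timeline above mixes CET and UTC timestamps. As a minimal sketch for putting them on one clock, the helper below converts a CET wall-clock time to UTC. It assumes that "CET" in the attack notifications means a fixed UTC+1 offset (no daylight saving in January) and that GNU date is available; cet_to_utc is a hypothetical helper name, not part of any existing tooling on the server.

```shell
#!/bin/sh
# Convert a CET timestamp (assumed fixed UTC+1) to UTC.
# Relies on GNU date's -d option, which accepts a numeric zone suffix.
cet_to_utc() {
    date -u -d "$1 +0100" '+%Y-%m-%d %H:%M UTC'
}
```

Under those assumptions, cet_to_utc '2026-01-18 13:57' gives 2026-01-18 12:57 UTC and cet_to_utc '2026-01-18 23:44' gives 2026-01-18 22:44 UTC, which places the second attack a little over an hour after the 11:38 UTC "server working" entry.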

Attack Details

Attack 1 (2025-12-09)

  • Type: DNS flood
  • Rate: 83Kpps / 45Mbps
  • Target: 171.225.223.108:53
  • Source: Docker container (Umami Analytics)
  • Malware: Exodus Botnet (Mirai variant)

Attack 2 (2026-01-18 13:57 CET)

  • Type: UDP flood
  • Rate: 171Kpps / 51Mbps
  • Target: 15.184.38.247:9007
  • Source: Unknown (likely a PM2-managed process)

Attack 3 (2026-01-18 23:44 CET)

  • Type: UDP flood
  • Rate: 44Kpps / 50Mbps
  • Target: 171.225.223.4:80
  • Source: Unknown (likely a PM2-managed process)

Root Cause Analysis

December 2025 Recovery Failure

The December recovery was incomplete. Claims made:

  • "Docker removed" - TRUE (Docker binaries removed)
  • "All malware cleaned" - FALSE

What was NOT removed in December:

  1. /home/ubuntu/umami-deployment/ directory with cron jobs
  2. PM2 process manager (pm2-ubuntu.service)
  3. PostgreSQL service (part of Umami stack)
  4. Ubuntu crontab with umami backup/monitoring scripts

Persistence Mechanism

The botnet persisted via PM2 process manager:

  • Service: /etc/systemd/system/pm2-ubuntu.service
  • Enabled: /etc/systemd/system/multi-user.target.wants/pm2-ubuntu.service
  • Config: /home/ubuntu/.pm2/dump.pm2
  • Logs: /home/ubuntu/.pm2/pm2.log (375 MB)
  • Behavior: pm2 resurrect on boot restarts saved processes

PM2 should NEVER have existed on this server. Project spec states "Systemd only (no PM2)".
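
Because the persistence relied on a unit enabled under multi-user.target.wants, it may help to list what is enabled at boot while the disk is mounted offline. The following is a rough sketch, not a complete audit: it only looks at /etc/systemd/system/multi-user.target.wants, and list_enabled_units is a hypothetical helper name. Units enabled through other targets or vendor directories would not show up.

```shell
#!/bin/sh
# List unit names enabled for multi-user.target on a mounted (offline) root.
# $1 is the mount point of the server's disk (defaults to /mnt, as in rescue mode).
list_enabled_units() {
    root=${1:-/mnt}
    wants="$root/etc/systemd/system/multi-user.target.wants"
    [ -d "$wants" ] || return 0
    for link in "$wants"/*; do
        # Symlinks with absolute targets look dangling from rescue mode,
        # so accept both existing entries and bare symlinks.
        if [ -e "$link" ] || [ -L "$link" ]; then
            basename "$link"
        fi
    done
}
```

Running list_enabled_units /mnt from rescue mode and comparing the output with the expected service list is one way to spot an entry such as pm2-ubuntu.service before rebooting.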


Recovery Actions Taken (2026-01-19)

Via OVH Rescue Mode

  1. Mounted main disk: mount /dev/sdb1 /mnt

  2. Removed PM2 completely:

       rm -rf /mnt/home/ubuntu/.pm2
       rm -f /mnt/etc/systemd/system/pm2-ubuntu.service
       rm -f /mnt/etc/systemd/system/multi-user.target.wants/pm2-ubuntu.service

  3. Removed umami-deployment:

       rm -rf /mnt/home/ubuntu/umami-deployment
       rm -f /mnt/var/spool/cron/crontabs/ubuntu

  4. Disabled PostgreSQL:

       rm -f /mnt/etc/systemd/system/multi-user.target.wants/postgresql.service

  5. Verified SSH keys present in /mnt/home/ubuntu/.ssh/authorized_keys

  6. Rebooted to normal mode


Current Status

Working

  • Website responds: https://agenticgovernance.digital/ (HTTP 200)
  • nginx running
  • tractatus service running (website works)
  • mongod running (website works)
  • Boot mode: LOCAL (not rescue)

Broken

  • SSH access: Connection closes immediately after authentication
  • KVM console: Returns to login prompt after password entry
  • No shell access to server

Unknown

  • Whether all malware is removed
  • Whether another attack will occur
  • Why SSH/shell access is broken

SSH Keys (Should Be Present)

Primary Key (theflow@the-flow)

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCZ8BH+Bx4uO9DTatRZ/YF5xveP/bTyiAWj+qTF7I+ugxgL9/ejSlW1tSn5Seo4XHoEPD5wZCaWig7m1LMezrRq8fDWHbeXkZltK01xhAPU0L0+OvVZMZacW6+vkNfKcNG9vrxV+K/VTPkT+00TRqlHbP8ZWj0OWd92XAoTroKVYMt4L9e7QeJOJmRmHI0uFaJ0Ufexr2gmZyYhgL2p7PP3oiAvM0xlnTwygl06c3iwXpHKWNydOYPSDs3MkVnDjptmWgKv/J+QXksarwEpA4Csc2dLnco+8KrtocUUcAunz6NJfypA0yNWWzf+/OeffkJ2Rueoe8t/lVffXdI7eVuFkmDufE7XMk9YAE/8+XVqok4OV0Q+bjpH8mKlBA3rNobnWs6obBVJD8/5aphE8NdCR4cgIeRSwieFhfzCl+GBZNvs4yuBdKvQQIfCRAKqTgbuc03XERAef6lJUuJrDjwzvvp1Nd8L7AqJoQS6kYGyxXPf/6nWTZtpxoobdGnJ2FZK6OIpAlsWx9LnybMGy19VfaR9JZSAkLdWxGPb6acNUb2xaaqyuXPo4sWpBM27n1HeKMv/7Oh4WL4zrAxDKfN38k1JsjJJVEABuN/pEOb7BCDnTMLKXlTunZgynAZJ/Dxn+zOAyfzaYSNBotlpYy1zj1AmzvS31L7LJy/aSBHuWw== theflow@the-flow

Deploy Key (tractatus-deploy)

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPdJcKMabIVQRqKqNIpzxHNgxMZ8NOD+9gVCk6dY5uV0 tractatus-deploy

Automated Deploy Key (added 2026-01-18)

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILPMcFAmLaRiLJLOD9EGJGm+EfdKu/Xb6p/+oBV/18HC tractatus-deploy-automated

Key Backup URL

https://paste.rs/nELRM
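
To confirm from rescue mode that the keys listed above are actually present, a check along these lines could be used. This is a sketch under assumptions: key_present and missing_keys are hypothetical helper names, and the comparison is a plain fixed-string match on the base64 key body rather than a fingerprint comparison.

```shell
#!/bin/sh
# Succeed if the authorized_keys file ($1) contains the key body ($2).
key_present() {
    grep -qF -- "$2" "$1"
}

# Print the label of every expected key that is missing from the file.
# $1 = authorized_keys path; remaining arguments are LABEL=KEYBODY pairs.
missing_keys() {
    file=$1; shift
    for pair in "$@"; do
        name=${pair%%=*}   # text before the first "="
        blob=${pair#*=}    # text after the first "=" (keeps base64 padding)
        key_present "$file" "$blob" || echo "$name"
    done
}
```

For example, missing_keys /mnt/home/ubuntu/.ssh/authorized_keys 'deploy=AAAAC3NzaC1lZDI1NTE5AAAAIPdJcKMabIVQRqKqNIpzxHNgxMZ8NOD+9gVCk6dY5uV0' prints nothing when the deploy key is in place and prints deploy when it is not.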


Outstanding Issues

Critical

  1. No shell access - Cannot manage server without rescue mode
  2. Malware verification incomplete - Cannot confirm all malware removed

High

  1. SSH broken - Need to investigate via rescue mode:
    • Check /var/log/auth.log
    • Check journalctl -u sshd
    • Check PAM configuration
    • Check shell configuration
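
Both SSH and the KVM console drop the session after authentication succeeds, and one possible cause (among others, such as PAM or profile scripts) is a login shell that is missing or not executable. The sketch below checks that from rescue mode. It assumes the disk is mounted at a root such as /mnt; login_shell_of and check_login_shell are hypothetical helper names. A shell reached through an absolute symlink may be misreported, since the link would resolve against the rescue system.

```shell
#!/bin/sh
# Print the login shell recorded for a user in an offline passwd file.
# $1 = mounted root, $2 = user name; field 7 of passwd is the shell.
login_shell_of() {
    awk -F: -v u="$2" '$1 == u { print $7 }' "$1/etc/passwd"
}

# Report whether that shell exists and is executable on the mounted disk.
check_login_shell() {
    root=$1; user=$2
    shell=$(login_shell_of "$root" "$user")
    if [ -z "$shell" ]; then
        echo "no passwd entry or empty shell for $user"
        return 1
    fi
    if [ ! -x "$root$shell" ]; then
        echo "shell $shell is missing or not executable"
        return 1
    fi
    echo "shell $shell looks usable"
}
```

A call such as check_login_shell /mnt ubuntu would narrow the problem down before moving on to the auth logs and PAM configuration.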

Medium

  1. MongoDB log rotation - Not configured; logs previously filled the disk (45GB)
  2. fail2ban - May be blocking IPs aggressively
  3. No monitoring - No alerts for future attacks
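
For the MongoDB log rotation item, one common approach is a logrotate rule that signals mongod with SIGUSR1 after rotating. The sketch below writes such a rule to a file. The log path, the mongodb user and group, the 7-day retention, and the helper name write_mongod_logrotate are all assumptions to adjust for this server. This approach generally also expects mongod.conf to set systemLog.logAppend: true and systemLog.logRotate: reopen.

```shell
#!/bin/sh
# Write a logrotate rule for mongod to the file given as $1.
# Paths, user/group and retention below are assumptions, not values from the server.
write_mongod_logrotate() {
    cat > "$1" <<'EOF'
/var/log/mongodb/mongod.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    create 640 mongodb mongodb
    sharedscripts
    postrotate
        # SIGUSR1 asks mongod to rotate or reopen its log file
        /bin/kill -USR1 "$(pidof mongod)" 2>/dev/null || true
    endscript
}
EOF
}
```

The usual destination would be a file under /etc/logrotate.d/, for example write_mongod_logrotate /etc/logrotate.d/mongod, followed by a dry run with logrotate -d on that file.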

Required Follow-up Actions

  1. Re-enter rescue mode to fix SSH access
  2. Check auth logs to determine why connections close
  3. Configure MongoDB log rotation to prevent disk fill
  4. Verify no remaining malware with full filesystem scan
  5. Document all credentials in secure location
  6. Set up monitoring for future attack detection
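
As a starting point for the monitoring item, outbound packet rate can be sampled from the kernel's per-interface counters. This is a minimal sketch, not a full alerting setup: the helper names, the interface name, and any threshold are assumptions. The attacks in this report ran at 44Kpps and above, so a threshold well below that (for instance 10000 pps) might be a reasonable first guess.

```shell
#!/bin/sh
# Read the cumulative transmitted-packet counter for an interface.
# $1 = interface, $2 = sysfs net directory (defaults to /sys/class/net).
tx_packets() {
    cat "${2:-/sys/class/net}/$1/statistics/tx_packets"
}

# Packets per second between two counter readings taken $3 seconds apart.
pps_between() {
    echo $(( ($2 - $1) / $3 ))
}

# Sample the counter twice and print an alert if the rate exceeds the threshold.
# $1 = interface, $2 = threshold in pps, $3 = sampling interval in seconds.
check_tx_rate() {
    first=$(tx_packets "$1")
    sleep "$3"
    second=$(tx_packets "$1")
    rate=$(pps_between "$first" "$second" "$3")
    if [ "$rate" -gt "$2" ]; then
        echo "ALERT: $1 sending $rate pps (threshold $2)"
        return 1
    fi
}
```

A call like check_tx_rate eth0 10000 10 could run from a systemd timer, with the alert line forwarded to whatever notification channel is chosen.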

Lessons Learned

December Recovery Failures

  1. Did not verify all services running on server
  2. Did not check for PM2 (shouldn't exist per spec)
  3. Did not remove umami-deployment directory
  4. Did not remove ubuntu crontab
  5. Falsely claimed complete recovery

Process Failures

  1. No verification checklist for recovery
  2. No documentation of what should/shouldn't exist on server
  3. No monitoring for attack recurrence
  4. Repeated SSH access issues due to poor key management

Server Specification (What SHOULD Exist)

Services (Systemd)

  • tractatus.service - Node.js application
  • nginx.service - Web server
  • mongod.service - Database
  • fail2ban.service - Intrusion prevention

Services (Should NOT Exist)

  • pm2-ubuntu.service - REMOVED
  • postgresql.service - DISABLED (boot symlink removed; was for Umami)
  • docker.service - Should not exist
  • Any umami/analytics services

Directories

  • /var/www/tractatus/ - Application
  • /home/ubuntu/ - User home
  • /home/ubuntu/.ssh/ - SSH keys

Directories (Should NOT Exist)

  • /home/ubuntu/umami-deployment/ - REMOVED
  • /home/ubuntu/.pm2/ - REMOVED
  • /var/lib/docker/ - Should not exist
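
The "Should NOT Exist" lists above can be turned into a repeatable check. The sketch below reports any of those paths found under a given root, so it works both on the live server and against a disk mounted in rescue mode. audit_forbidden_paths is a hypothetical helper name, and the postgresql.service and docker.service symlink locations assume the units would be enabled through multi-user.target.wants.

```shell
#!/bin/sh
# Print every path from the "Should NOT Exist" lists that is present under a root.
# $1 = root to inspect ("/" on the live server, /mnt from rescue mode).
# Returns 1 if anything unexpected was found, 0 otherwise.
audit_forbidden_paths() {
    root=${1%/}
    found=0
    for path in \
        /home/ubuntu/umami-deployment \
        /home/ubuntu/.pm2 \
        /var/lib/docker \
        /etc/systemd/system/pm2-ubuntu.service \
        /etc/systemd/system/multi-user.target.wants/pm2-ubuntu.service \
        /etc/systemd/system/multi-user.target.wants/postgresql.service \
        /etc/systemd/system/multi-user.target.wants/docker.service; do
        # Accept dangling symlinks too, since link targets may not resolve offline.
        if [ -e "$root$path" ] || [ -L "$root$path" ]; then
            echo "UNEXPECTED: $path"
            found=1
        fi
    done
    return "$found"
}
```

Running audit_forbidden_paths /mnt before leaving rescue mode, and audit_forbidden_paths / afterwards, gives a simple pass/fail signal that could form part of a recovery verification checklist.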

OVH Reference Information

  • Server: vps-93a693da.vps.ovh.net
  • IP: 91.134.240.3
  • Manager: https://www.ovh.com/manager/
  • Attack Ref 1: [ref=1.39fdba94] (Jan 18 13:57)
  • Attack Ref 2: [ref=1.39fdba94] (Jan 18 23:44)
  • Rescue Ref: [ref=1.2378332d]

Claude Code Accountability

This incident represents multiple failures:

  1. December 2025: Incomplete malware removal, false claims of complete recovery
  2. January 2026: Failed to identify botnet attack as cause of issues
  3. January 2026: 8+ hours of user time wasted on repeated recovery
  4. January 2026: Failed to implement preventive measures after first incident
  5. January 2026: SSH access remains broken after recovery attempt

Report Date: 2026-01-19
Status: PARTIAL RECOVERY - Website up, SSH broken
Next Action: Re-enter rescue mode to fix SSH access