
Comprehensive Remediation Plan: agenticgovernance.digital

Date: 2026-01-19

Status: EXECUTED SUCCESSFULLY (2026-01-20)

UPDATE 2026-01-20: This remediation plan was successfully executed. Complete VPS reinstallation performed, all systems restored, security hardening applied. See docs/INCIDENT_RECOVERY_2026-01-19.md for full details.


Executive Summary

The agenticgovernance.digital VPS (vps-93a693da.vps.ovh.net) has been compromised three times by the same botnet infrastructure. Each prior "recovery" was incomplete, leaving persistence mechanisms that allowed reinfection.

Recommendation: COMPLETE REINSTALL

Based on security industry best practices and the pattern of recurring compromise, partial cleanup is no longer viable. A complete OS reinstall is the only way to guarantee all malware is removed.


Attack History Analysis

Timeline

| Date | Attack | Root Cause | Recovery Status |
|------|--------|------------|-----------------|
| 2025-12-09 | 83Kpps DNS flood (Exodus botnet via Docker/Umami) | Docker container compromise | INCOMPLETE - PM2, umami-deployment, cron jobs left |
| 2026-01-18 13:57 | 171Kpps UDP flood to 15.184.38.247:9007 | PM2 resurrected botnet processes | INCOMPLETE - SSH broken post-recovery |
| 2026-01-18 23:44 | 44Kpps UDP flood to 171.225.223.4:80 | Continued PM2 persistence | Server in rescue mode |
| 2026-01-19 (today) | OVH anti-hack triggered again | Unknown - likely same persistence | CURRENT INCIDENT |

What Was Missed in Each Recovery

December 2025 Recovery:

  • Docker removed ✓
  • PM2 process manager NOT removed ✗
  • /home/ubuntu/umami-deployment/ NOT removed ✗
  • Ubuntu crontab NOT cleared ✗
  • PostgreSQL service NOT disabled ✗

January 19 Recovery (earlier today):

  • PM2 removed ✓
  • umami-deployment removed ✓
  • PostgreSQL disabled ✓
  • But the server is in rescue mode AGAIN, which means something else was missed ✗

Malware Profile

Name: Exodus Botnet (Mirai variant)

C2 Server: 196.251.100.191 (South Africa)

Capabilities:

  • Multi-architecture binaries (x86, x86_64, ARM, MIPS, etc.)
  • UDP/DNS flood attacks
  • Self-replicating via PM2 process manager
  • Persistence through system services

PM2 as Persistence Mechanism:

  • PM2's resurrect feature auto-restarts saved processes on boot
  • Used by modern botnets like NodeCordRAT and Tsundere (2025)
  • Survives manual process termination
  • Creates systemd service (pm2-ubuntu.service)

Why Partial Cleanup Has Failed

Problem 1: Unknown Persistence Mechanisms

Each cleanup identified SOME persistence mechanisms but missed others. There may be:

  • Modified system binaries (rootkits)
  • Kernel modules
  • Hidden cron jobs in unexpected locations
  • Modified init scripts
  • SSH backdoors (could explain broken SSH)

Problem 2: No Baseline for Comparison

Without knowing exactly what should exist on a clean system, we cannot verify complete removal.

Problem 3: Forensic Limitations in Rescue Mode

Rescue mode provides limited visibility into:

  • Runtime state of malware
  • Memory-resident components
  • Kernel-level modifications

Expert Consensus

"Reinstalling a computer after it has been compromised can be a painstaking process, but it is the best way to be certain that everything an attacker left behind has been found." - UC Berkeley Information Security Office

"Rootkits are difficult to remove, and the only 100% sure fire way to remove a rootkit from a device that has been infected is to wipe the device and reinstall the operating system."


Phase 1: Data Backup (From Rescue Mode)

CRITICAL: Before reinstalling, back up essential data:

# 1. Boot into rescue mode via OVH Manager
# 2. Mount main disk (device name may differ; confirm with lsblk)
mkdir -p /mnt/vps
mount /dev/sdb1 /mnt/vps

# 3. Create backup directory
mkdir -p /tmp/tractatus-backup

# 4. Backup application code (verify hashes later)
tar -czf /tmp/tractatus-backup/app.tar.gz /mnt/vps/var/www/tractatus/

# 5. Backup MongoDB data
tar -czf /tmp/tractatus-backup/mongodb.tar.gz /mnt/vps/var/lib/mongodb/

# 6. Backup SSL certificates
tar -czf /tmp/tractatus-backup/ssl.tar.gz /mnt/vps/etc/letsencrypt/

# 7. Backup nginx config (for reference, will recreate)
cp /mnt/vps/etc/nginx/sites-available/tractatus /tmp/tractatus-backup/

# 8. Download backups to local machine
scp -r root@RESCUE_IP:/tmp/tractatus-backup/ ~/tractatus-recovery/

DO NOT BACKUP:

  • /home/ubuntu/.pm2/ (malware)
  • /home/ubuntu/umami-deployment/ (malware)
  • Any executables (may be compromised)
  • /var/lib/docker/ (attack vector)
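Step 4 above says to "verify hashes later"; recording checksums before the transfer makes that verification concrete. A minimal sketch, demonstrated on a throwaway file (in practice, run it inside /tmp/tractatus-backup before the scp step, then re-run `sha256sum -c` on the local machine):

```shell
# Record checksums of the backup archives, then verify the copies.
dir=$(mktemp -d)
echo "stand-in archive" > "$dir/app.tar.gz"   # stands in for the real tarball
( cd "$dir" && sha256sum app.tar.gz > SHA256SUMS )
# After downloading, verify the copies against the recorded checksums:
result=$(cd "$dir" && sha256sum -c SHA256SUMS)
echo "$result"
```

`sha256sum -c` reports `OK` per file when the downloaded copy matches the checksum recorded in rescue mode.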

Phase 2: VPS Reinstallation

  1. Via OVH Manager:

    • Navigate to VPS management
    • Select "Reinstall" option
    • Choose: Ubuntu 22.04 LTS (or latest LTS)
    • Wait for completion (~10 minutes)
  2. Retrieve New Root Password:

    • Check email for new credentials
    • Or use OVH password reset function

Phase 3: Fresh System Setup

Initial SSH Access:

ssh root@91.134.240.3

Step 1: System Updates

apt update && apt upgrade -y

Step 2: Create Non-Root User

adduser ubuntu
usermod -aG sudo ubuntu

Step 3: SSH Hardening

# Add authorized keys
mkdir -p /home/ubuntu/.ssh
cat > /home/ubuntu/.ssh/authorized_keys << 'EOF'
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCZ8BH+Bx4uO9DTatRZ... theflow@the-flow
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPdJcKMabIVQRqKqNIpzxHNgxMZ8NOD+9gVCk6dY5uV0 tractatus-deploy
EOF

chown -R ubuntu:ubuntu /home/ubuntu/.ssh
chmod 700 /home/ubuntu/.ssh
chmod 600 /home/ubuntu/.ssh/authorized_keys

# Harden SSH config
cat > /etc/ssh/sshd_config.d/hardening.conf << 'EOF'
PasswordAuthentication no
PermitRootLogin no
MaxAuthTries 3
LoginGraceTime 20
ClientAliveInterval 300
ClientAliveCountMax 2
EOF

# Validate the config before restarting to avoid locking yourself out
sshd -t && systemctl restart sshd
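File modes on the key material matter: with the default StrictModes setting, sshd rejects keys that sit in group- or world-writable paths. A quick sanity check of the modes just applied, demonstrated on a throwaway directory (on the server, point stat at /home/ubuntu/.ssh and /home/ubuntu/.ssh/authorized_keys instead):

```shell
# Confirm the directory is 700 and the key file is 600 (stat -c is GNU coreutils).
d=$(mktemp -d)
touch "$d/authorized_keys"
chmod 700 "$d"
chmod 600 "$d/authorized_keys"
stat -c '%a %n' "$d" "$d/authorized_keys"
```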

Step 4: Firewall Configuration

apt install -y ufw
ufw default deny incoming
ufw default allow outgoing
ufw allow 22/tcp comment 'SSH'
ufw allow 80/tcp comment 'HTTP'
ufw allow 443/tcp comment 'HTTPS'
# Block Docker ports (never needed)
ufw deny 2375/tcp comment 'Block Docker API'
ufw deny 2376/tcp comment 'Block Docker TLS'
ufw enable

Step 5: Intrusion Prevention

apt install -y fail2ban
cat > /etc/fail2ban/jail.local << 'EOF'
[sshd]
enabled = true
maxretry = 3
bantime = 24h
findtime = 1h
EOF

systemctl enable fail2ban
systemctl start fail2ban

Step 6: Install Required Software

# Node.js (via NodeSource)
curl -fsSL https://deb.nodesource.com/setup_20.x | bash -
apt install -y nodejs

# MongoDB
curl -fsSL https://pgp.mongodb.com/server-7.0.asc | gpg -o /usr/share/keyrings/mongodb-server-7.0.gpg --dearmor
echo "deb [ signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg ] https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/7.0 multiverse" > /etc/apt/sources.list.d/mongodb-org-7.0.list
apt update
apt install -y mongodb-org
systemctl enable mongod
systemctl start mongod

# nginx
apt install -y nginx
systemctl enable nginx

# certbot for SSL
apt install -y certbot python3-certbot-nginx

Phase 4: Application Deployment

Step 1: Prepare Application Directory

mkdir -p /var/www/tractatus
chown ubuntu:ubuntu /var/www/tractatus

Step 2: Deploy from CLEAN Local Source

# From local machine - deploy ONLY from verified clean source
cd ~/projects/tractatus
./scripts/deploy.sh --full

Step 3: Restore MongoDB Data

# NOTE: Phase 1 backed up raw data files (mongodb.tar.gz), not a mongodump
# archive. Raw files must be extracted into /var/lib/mongodb with mongod
# stopped and a matching major version. If a logical dump exists instead,
# and data integrity is verified:
mongorestore --db tractatus ~/tractatus-recovery/mongodb/tractatus/
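The Phase 1 tarball stores its members under the mnt/vps/var/lib/mongodb/ prefix (tar strips the leading slash), so a raw-file restore needs `--strip-components=5` to land the files directly in the data directory. A hypothetical sketch on throwaway paths; the real restore extracts into /var/lib/mongodb with mongod stopped, then runs `chown -R mongodb:mongodb /var/lib/mongodb` before starting mongod:

```shell
# Demonstrate the prefix-stripping extraction on stand-in paths.
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/mnt/vps/var/lib/mongodb"
echo data > "$src/mnt/vps/var/lib/mongodb/WiredTiger"   # stand-in data file
tar -czf "$src/mongodb.tar.gz" -C "$src" mnt/vps/var/lib/mongodb
# --strip-components=5 removes mnt/vps/var/lib/mongodb/ from each member
tar -xzf "$src/mongodb.tar.gz" --strip-components=5 -C "$dst"
ls "$dst"
```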

Step 4: SSL Certificate

certbot --nginx -d agenticgovernance.digital

Step 5: Create Systemd Service

cat > /etc/systemd/system/tractatus.service << 'EOF'
[Unit]
Description=Tractatus Application
After=network.target mongod.service

[Service]
Type=simple
User=ubuntu
WorkingDirectory=/var/www/tractatus
ExecStart=/usr/bin/node src/server.js
Restart=on-failure
RestartSec=10
Environment=NODE_ENV=production

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable tractatus
systemctl start tractatus
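The unit can optionally be sandboxed further with a systemd drop-in, which limits what a compromised Node process could touch. A sketch, not part of the original plan: place it in /etc/systemd/system/tractatus.service.d/hardening.conf, then run `systemctl daemon-reload && systemctl restart tractatus` (directive availability depends on the systemd version shipped with the chosen Ubuntu release):

```
[Service]
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/www/tractatus
```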

Phase 5: Monitoring Setup

Step 1: Log Rotation for MongoDB

cat > /etc/logrotate.d/mongodb << 'EOF'
/var/log/mongodb/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    sharedscripts
    postrotate
        /bin/kill -SIGUSR1 $(cat /var/lib/mongodb/mongod.lock 2>/dev/null) 2>/dev/null || true
    endscript
}
EOF

Step 2: Install Rootkit Scanner

apt install -y rkhunter chkrootkit lynis

# Run initial scan
rkhunter --update
rkhunter --check --skip-keypress
chkrootkit
lynis audit system
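The initial scan above is one-shot; the weekly scans listed under Long-Term Prevention can be scheduled via cron. A sketch using /etc/cron.d syntax (which requires the explicit user field); paths and times are assumptions, and `--report-warnings-only` keeps the rkhunter log to actionable output:

```
# /etc/cron.d/rootkit-scan (sketch): weekly Sunday-morning scans
0 5 * * 0 root /usr/bin/rkhunter --check --skip-keypress --report-warnings-only >> /var/log/rkhunter-weekly.log 2>&1
30 5 * * 0 root /usr/sbin/chkrootkit >> /var/log/chkrootkit-weekly.log 2>&1
```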

Step 3: Monitoring Script

cat > /usr/local/bin/security-check.sh << 'EOF'
#!/bin/bash
# Daily security check
LOG=/var/log/security-check.log
echo "=== Security Check $(date) ===" >> $LOG

# Check for unauthorized services
systemctl list-units --type=service --state=running | grep -v "systemd\|ssh\|nginx\|mongod\|tractatus\|fail2ban\|ufw" >> $LOG

# Check for unusual network connections (ss replaces netstat, which is not
# installed by default on Ubuntu 22.04)
ss -tlnp | grep -v "127.0.0.1\|mongodb\|node\|nginx" >> $LOG

# Check for PM2 (should never exist)
if command -v pm2 &> /dev/null; then
    echo "WARNING: PM2 DETECTED - SHOULD NOT EXIST" >> $LOG
fi

# Check for Docker (should never exist)
if command -v docker &> /dev/null; then
    echo "WARNING: DOCKER DETECTED - SHOULD NOT EXIST" >> $LOG
fi
EOF
chmod +x /usr/local/bin/security-check.sh

# Add to cron
echo "0 6 * * * root /usr/local/bin/security-check.sh" > /etc/cron.d/security-check

What Must NEVER Exist on This Server

| Component | Reason |
|-----------|--------|
| PM2 | Used for malware persistence |
| Docker | Attack vector (Umami compromise) |
| PostgreSQL | Only for Umami, not needed |
| Any analytics containers | Attack surface |
| Node packages outside app | Potential supply chain risk |

Verification Script:

#!/bin/bash
ALERT=0
if command -v pm2 &> /dev/null; then echo "ALERT: PM2 exists"; ALERT=1; fi
if command -v docker &> /dev/null; then echo "ALERT: Docker exists"; ALERT=1; fi
if [ -d "/home/ubuntu/.pm2" ]; then echo "ALERT: .pm2 directory exists"; ALERT=1; fi
if [ -d "/home/ubuntu/umami-deployment" ]; then echo "ALERT: umami-deployment exists"; ALERT=1; fi
if systemctl is-enabled postgresql &> /dev/null; then echo "ALERT: PostgreSQL enabled"; ALERT=1; fi
if [ $ALERT -eq 0 ]; then echo "Server is clean"; fi
# Non-zero exit lets cron/monitoring react to a dirty server
exit $ALERT

Post-Recovery Verification Checklist

  • SSH access works with key authentication
  • Password authentication is disabled
  • fail2ban is running and banning IPs
  • UFW is enabled with correct rules
  • nginx is serving the site
  • tractatus service is running
  • MongoDB is running and bound to 127.0.0.1
  • SSL certificate is valid
  • No PM2 installed
  • No Docker installed
  • No PostgreSQL installed
  • rkhunter scan is clean
  • chkrootkit scan is clean
  • Log rotation is configured
  • Daily security check cron is active

Credentials to Rotate

After reinstall, rotate all credentials:

  1. MongoDB admin password (if using authentication)
  2. Application secrets in .env
  3. Session secrets
  4. Any API keys

Important: Change passwords from a DIFFERENT machine, not the compromised server.


Long-Term Prevention

  1. Never install Docker - not needed for this application
  2. Never install PM2 - use systemd only
  3. Weekly security scans - rkhunter, chkrootkit
  4. Monitor outbound traffic - alert on unexpected destinations
  5. Keep system updated - enable unattended-upgrades
  6. Review SSH logs weekly - check for brute force patterns
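Item 5 can be pinned down concretely. A minimal sketch for enabling unattended security upgrades (assumes the unattended-upgrades package is installed via `apt install -y unattended-upgrades`; the stock Ubuntu 50unattended-upgrades defaults already target the security pocket):

```
# /etc/apt/apt.conf.d/20auto-upgrades (sketch)
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
```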

OVH Support Communication Template

Subject: Request to restore VPS to normal mode after reinstallation

Reference: [ref=1.2378332d]
Server: vps-93a693da.vps.ovh.net

We have identified the cause of the anti-hack triggers:
- Compromised Docker container running botnet malware
- PM2 process manager persisting malicious processes

We have completed a full OS reinstall and implemented:
- Hardened SSH configuration (key-only, no root)
- UFW firewall with minimal open ports
- fail2ban for intrusion prevention
- Removal of Docker and PM2

Please restore the VPS to normal boot mode.

Thank you.

Timeline Estimate

| Phase | Duration |
|-------|----------|
| Backup data | 30 min |
| VPS reinstall | 10 min |
| System setup | 45 min |
| Application deployment | 30 min |
| Verification | 30 min |
| Total | ~2.5 hours |

Document Author: Claude Code

Date: 2026-01-19

Status: Ready for implementation

Next Action: User decision on proceeding with complete reinstall