tractatus/deployment-quickstart/UPTIME_MONITORING_SETUP.md

External Uptime Monitoring Setup Guide

This guide explains how to set up external uptime monitoring for the Tractatus Umami Analytics instance.

Monitored Endpoints

Primary Monitoring Target

  • URL: https://analytics.agenticgovernance.digital/api/heartbeat
  • Expected Response: HTTP 200 OK
  • Purpose: Umami application health check

Secondary Monitoring Targets (Optional)

  • URL: https://agenticgovernance.digital/
  • Expected Response: HTTP 200 OK
  • Purpose: Main website availability
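The two targets above can also be probed by hand before any external service is configured. The following is a minimal sketch, not part of the deployment: the function names `http_status`, `status_label`, and `check_all` are hypothetical, and the 30-second limit is an assumption chosen to match the monitor timeout used later in this guide.

```shell
#!/bin/sh
# Print the HTTP status code for a URL ("000" if the request itself fails).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 30 "$1" || true
}

# Compare an observed status code with the expected one (200 by default).
status_label() {
  if [ "$1" = "${2:-200}" ]; then
    echo "UP"
  else
    echo "DOWN"
  fi
}

# Probe both endpoints listed in this guide and print one line per URL.
check_all() {
  for url in \
    "https://analytics.agenticgovernance.digital/api/heartbeat" \
    "https://agenticgovernance.digital/"
  do
    code=$(http_status "$url")
    printf '%s %s %s\n' "$(status_label "$code")" "$code" "$url"
  done
}
```

Calling `check_all` prints a label, the status code, and the URL for each target. Redirects are not followed, so a 301 or 302 is reported as DOWN, in line with the expected HTTP 200 response.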

This guide uses UptimeRobot, whose free plan provides:

  • 50 monitors
  • 5-minute check intervals
  • Email/SMS alerts
  • Status page generation

Setup Instructions

1. Create Account

  1. Visit https://uptimerobot.com
  2. Sign up for a free account
  3. Verify your email address

2. Add Analytics Monitor

  1. Click "Add New Monitor"

  2. Configure:

    • Monitor Type: HTTP(s)
    • Friendly Name: Tractatus Analytics (Umami)
    • URL: https://analytics.agenticgovernance.digital/api/heartbeat
    • Monitoring Interval: 5 minutes
    • Monitor Timeout: 30 seconds
    • HTTP Method: GET
    • Expected Status Code: 200
  3. Click "Create Monitor"

3. Add Main Website Monitor (Optional)

  1. Click "Add New Monitor"

  2. Configure:

    • Monitor Type: HTTP(s)
    • Friendly Name: Tractatus Website
    • URL: https://agenticgovernance.digital/
    • Monitoring Interval: 5 minutes
    • Monitor Timeout: 30 seconds
  3. Click "Create Monitor"

4. Configure Alert Contacts

  1. Go to "My Settings" → "Alert Contacts"
  2. Add email address for alerts
  3. (Optional) Add SMS number for critical alerts
  4. Configure alert preferences:
    • Alert When: Down
    • Alert After: 2 consecutive failures (10 minutes)
    • Re-Alert After: 30 minutes

5. Create Public Status Page (Optional)

  1. Go to "Status Pages"
  2. Click "Add Status Page"
  3. Configure:
    • Title: Tractatus Services Status
    • Custom Domain: (optional) status.agenticgovernance.digital
    • Monitors: Select both monitors
  4. Enable "Show Uptime Percentage"
  5. Enable "Show Response Times"

Alternative Services

  • Pingdom
  • Better Uptime
  • StatusCake

Internal Monitoring (Already Configured)

The following internal monitoring is already set up:

Docker Health Checks

  • Umami Container: curl -f http://localhost:3000/api/heartbeat

    • Interval: 10 seconds
    • Timeout: 5 seconds
    • Retries: 5
  • PostgreSQL Container: pg_isready -U $POSTGRES_USER -d $POSTGRES_DB

    • Interval: 5 seconds
    • Timeout: 5 seconds
    • Retries: 5
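To see what these health checks currently report, Docker's recorded health state can be read with `docker inspect`. The sketch below also estimates how long a failing container may take to be marked unhealthy. The helper names are hypothetical, and the formula (retries multiplied by interval plus timeout) is a rough upper bound rather than a documented guarantee.

```shell
# Print the health status Docker has recorded for a container
# (typically "starting", "healthy", or "unhealthy").
container_health() {
  docker inspect --format '{{.State.Health.Status}}' "$1"
}

# Rough upper bound, in seconds, before a failing container is marked unhealthy.
# $1 = interval, $2 = timeout, $3 = retries.
max_seconds_to_unhealthy() {
  echo $(( $3 * ($1 + $2) ))
}

max_seconds_to_unhealthy 10 5 5   # Umami settings: prints 75
max_seconds_to_unhealthy 5 5 5    # PostgreSQL settings: prints 50
```

On the server, `container_health tractatus-umami` and `container_health tractatus-umami-db` report the state of the two containers named in this guide.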

Automated Backups

  • Schedule: Daily at 2:00 AM
  • Retention: 7 days
  • Location: ~/umami-backups/
  • Script: ~/umami-deployment/backup-umami-db.sh
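The contents of backup-umami-db.sh are not shown in this guide, so the following is only a sketch of how a 7-day retention rule could be enforced. The function name `prune_old_backups` is hypothetical, the file name pattern is taken from the restore example later in this guide, and the crontab line in the comment is an assumed form of the 2:00 AM schedule.

```shell
# Assumed crontab entry for the daily 2:00 AM backup:
#   0 2 * * * $HOME/umami-deployment/backup-umami-db.sh

# Delete backup archives older than the retention window.
# $1 = backup directory, $2 = retention in days (7 in this guide).
prune_old_backups() {
  find "$1" -type f -name 'umami_backup_*.sql.gz' -mtime +"$2" -exec rm -f {} +
}
```

A call such as `prune_old_backups ~/umami-backups 7` would apply the rule to the backup location. Note that `find` counts age in whole 24-hour periods, so `-mtime +7` matches files at least eight days old.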

Disk Usage Monitoring

  • Schedule: Daily at 3:00 AM
  • Warning Threshold: 80% disk usage
  • Critical Threshold: 90% disk usage
  • Location: ~/umami-backups/disk-monitoring.log
  • Script: ~/umami-deployment/monitor-disk-usage.sh
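The threshold logic can be illustrated as follows. This is a sketch under stated assumptions, not the contents of monitor-disk-usage.sh: the function names are hypothetical, the root filesystem is assumed to be the one of interest, and usage exactly at a threshold is assumed to trigger it.

```shell
# Percentage of the filesystem containing path $1 that is in use (integer).
disk_usage_percent() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Map a usage percentage onto the thresholds from this guide.
classify_disk_usage() {
  if [ "$1" -ge 90 ]; then
    echo "CRITICAL"
  elif [ "$1" -ge 80 ]; then
    echo "WARNING"
  else
    echo "OK"
  fi
}

# Print one timestamped log-style line for the root filesystem.
pct=$(disk_usage_percent /)
echo "$(date '+%Y-%m-%d %H:%M:%S') $(classify_disk_usage "$pct") ${pct}%"
```

Redirecting the final line with `>> ~/umami-backups/disk-monitoring.log` would append it to the log location listed above.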

Verification

To verify monitoring is working:

  1. Check Endpoint Manually:

curl -I https://analytics.agenticgovernance.digital/api/heartbeat
# Should return: HTTP/2 200

  2. Test Alert Flow:

    • Stop Umami container: docker stop tractatus-umami
    • Wait for alert (should arrive within 10 minutes)
    • Restart container: docker start tractatus-umami
    • Verify recovery alert
  3. Check Internal Monitoring:

# View Docker health status
docker ps

# Check backup logs
tail -20 ~/umami-backups/backup.log

# Check disk monitoring logs
tail -20 ~/umami-backups/disk-monitoring.log
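After restarting the container during the alert test, it can be convenient to poll the heartbeat until it recovers instead of re-running curl by hand. The helper below is a hypothetical sketch of such a retry loop; it runs any command up to a fixed number of times.

```shell
# Run a command repeatedly until it succeeds or the attempts run out.
# $1 = maximum attempts, $2 = seconds to sleep between attempts,
# remaining arguments = the command to run.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 1
}

# Example (not executed here): wait up to 5 minutes for the heartbeat.
#   wait_until 30 10 curl -fsS -o /dev/null \
#     https://analytics.agenticgovernance.digital/api/heartbeat
```

The `-f` flag makes curl exit with an error on HTTP 4xx and 5xx responses, so the loop ends only when the endpoint answers successfully.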

Alert Response Procedures

Analytics Down (5+ minutes)

  1. Check Docker container status: docker ps -a (the -a flag also lists stopped containers)
  2. Check container logs: docker logs tractatus-umami
  3. Check PostgreSQL status: docker logs tractatus-umami-db
  4. If needed, restart: cd ~/umami-deployment && docker compose restart

High Disk Usage (>80%)

  1. Check backup retention: ls -lh ~/umami-backups/
  2. Remove old backups manually if needed
  3. Check PostgreSQL volume: docker exec tractatus-umami-db du -sh /var/lib/postgresql/data
  4. Consider database cleanup or server upgrade

Database Corruption

  1. Stop Umami: cd ~/umami-deployment && docker compose stop umami
  2. Restore from backup: ~/umami-deployment/restore-umami-db.sh ~/umami-backups/umami_backup_YYYYMMDD_HHMMSS.sql.gz
  3. Restart services (from ~/umami-deployment): docker compose up -d
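Before restoring, it helps to pick the newest backup and confirm the archive is readable. The helpers below are a hypothetical sketch; they rely only on the file naming pattern shown in the restore step and on `gzip -t`, which tests an archive without extracting it.

```shell
# Print the newest backup in directory $1. File names embed YYYYMMDD_HHMMSS,
# so a plain lexical sort is also chronological.
latest_backup() {
  ls "$1"/umami_backup_*.sql.gz 2>/dev/null | sort | tail -n 1
}

# Succeed only if the file is an intact gzip archive.
backup_is_intact() {
  gzip -t "$1" 2>/dev/null
}
```

For example, `backup=$(latest_backup ~/umami-backups)` followed by `backup_is_intact "$backup"` gives a candidate file to pass to restore-umami-db.sh. If the check fails, an older backup is the safer choice.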

Next Steps

  • Sign up for UptimeRobot
  • Add analytics.agenticgovernance.digital monitor
  • Configure email alerts
  • Test alert delivery
  • (Optional) Create public status page
  • Document response procedures in team wiki

Maintenance

  • Review monitoring logs monthly
  • Test restore procedure quarterly
  • Update alert contacts when team changes
  • Review disk usage trends monthly

Last Updated: 2025-10-29

Monitoring Status: Internal monitoring active, external monitoring pending user setup