# External Uptime Monitoring Setup Guide

This guide explains how to set up external uptime monitoring for the Tractatus Umami Analytics instance.

## Monitored Endpoints

### Primary Monitoring Target

- **URL**: `https://analytics.agenticgovernance.digital/api/heartbeat`
- **Expected Response**: HTTP 200 OK
- **Purpose**: Umami application health check

### Secondary Monitoring Targets (Optional)

- **URL**: `https://agenticgovernance.digital/`
- **Expected Response**: HTTP 200 OK
- **Purpose**: Main website availability
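
Before you set up an external service, you can run the same probe by hand. The sketch below is a hypothetical helper. The names `check_endpoint` and `check_all` are not existing project scripts. It assumes bash and curl, and it follows redirects with `-L`; drop that flag if a redirect should count as a failure. The timeout is the 30 seconds used for the monitors in this guide:

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the deployment: probe the monitored
# endpoints once and report UP/DOWN with the HTTP status code.

# check_endpoint URL: succeed only if URL finally answers HTTP 200.
check_endpoint() {
  local url=$1 code
  # -s silences progress, -o /dev/null discards the body, -L follows
  # redirects, and -w prints only the final status code. curl writes 000
  # when no HTTP response arrived (DNS failure, TLS error, refused, timeout).
  code=$(curl -s -L -o /dev/null -w '%{http_code}' --max-time 30 "$url")
  if [[ $code == 200 ]]; then
    echo "UP   $code $url"
  else
    echo "DOWN $code $url"
    return 1
  fi
}

# check_all: probe the primary and secondary targets; fail if either is down.
check_all() {
  local rc=0
  check_endpoint https://analytics.agenticgovernance.digital/api/heartbeat || rc=1
  check_endpoint https://agenticgovernance.digital/ || rc=1
  return "$rc"
}
```

Source the file and run `check_all`. A non-zero exit status means at least one target did not return 200 within 30 seconds.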

## Recommended Service: UptimeRobot (Free Tier)

UptimeRobot's free tier provides:

- 50 monitors
- 5-minute check intervals
- Email alerts (SMS and voice-call alerts require a paid plan)
- Status page generation

### Setup Instructions

#### 1. Create Account

1. Visit https://uptimerobot.com
2. Sign up for a free account
3. Verify your email address

#### 2. Add Analytics Monitor

1. Click "Add New Monitor"
2. Configure:
   - **Monitor Type**: HTTP(s)
   - **Friendly Name**: `Tractatus Analytics (Umami)`
   - **URL**: `https://analytics.agenticgovernance.digital/api/heartbeat`
   - **Monitoring Interval**: 5 minutes
   - **Monitor Timeout**: 30 seconds
   - **HTTP Method**: GET
   - **Expected Status Code**: 200
3. Click "Create Monitor"
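
If you prefer a script to the web UI, UptimeRobot also offers an HTTP API. The sketch below assumes UptimeRobot's v2 `newMonitor` endpoint, which takes form-encoded `api_key`, `format`, `type=1` (HTTP(s)), `url`, `friendly_name` and `interval` (in seconds). It also assumes your main API key is exported as `UPTIMEROBOT_API_KEY`, a variable name chosen for this example. Check the parameter names against UptimeRobot's current API documentation, since newer API versions may differ:

```bash
#!/usr/bin/env bash
# Sketch only: create an HTTP(s) monitor via UptimeRobot's v2 API.
# UPTIMEROBOT_API_KEY is an assumed variable holding your main API key.

# create_http_monitor NAME URL: prints the JSON response from UptimeRobot.
create_http_monitor() {
  local name=$1 url=$2
  # --data-urlencode sends a form-encoded POST and escapes spaces and
  # parentheses in the values. type=1 selects an HTTP(s) monitor, and
  # interval=300 seconds matches the 5-minute interval above.
  curl -sS https://api.uptimerobot.com/v2/newMonitor \
    --data-urlencode "api_key=${UPTIMEROBOT_API_KEY:?set UPTIMEROBOT_API_KEY first}" \
    --data-urlencode "format=json" \
    --data-urlencode "type=1" \
    --data-urlencode "friendly_name=${name}" \
    --data-urlencode "url=${url}" \
    --data-urlencode "interval=300"
}
```

For example: `create_http_monitor "Tractatus Analytics (Umami)" https://analytics.agenticgovernance.digital/api/heartbeat`. You can still adjust settings not passed here, such as alert contacts, in the web UI.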

#### 3. Add Main Website Monitor (Optional)

1. Click "Add New Monitor"
2. Configure:
   - **Monitor Type**: HTTP(s)
   - **Friendly Name**: `Tractatus Website`
   - **URL**: `https://agenticgovernance.digital/`
   - **Monitoring Interval**: 5 minutes
   - **Monitor Timeout**: 30 seconds
3. Click "Create Monitor"

#### 4. Configure Alert Contacts

1. Go to "My Settings" → "Alert Contacts"
2. Add an email address for alerts
3. (Optional, paid plans) Add an SMS number for critical alerts
4. Configure alert preferences:
   - **Alert When**: Down
   - **Alert After**: 2 consecutive failures (10 minutes at a 5-minute interval)
   - **Re-Alert After**: 30 minutes

#### 5. Create Public Status Page (Optional)

1. Go to "Status Pages"
2. Click "Add Status Page"
3. Configure:
   - **Title**: Tractatus Services Status
   - **Custom Domain**: (optional) status.agenticgovernance.digital
   - **Monitors**: Select both monitors
4. Enable "Show Uptime Percentage"
5. Enable "Show Response Times"

## Alternative Services

### Pingdom

- **Free Tier**: No permanent free tier (free trial only)
- **Check Interval**: 1 minute
- **URL**: https://www.pingdom.com

### Better Uptime (now Better Stack)

- **Free Tier**: 10 monitors
- **Check Interval**: 3 minutes
- **URL**: https://betteruptime.com

### StatusCake

- **Free Tier**: 10 monitors
- **Check Interval**: 5 minutes
- **URL**: https://www.statuscake.com

## Internal Monitoring (Already Configured)

The following internal monitoring is already set up:

### Docker Health Checks

- **Umami Container**: `curl -f http://localhost:3000/api/heartbeat`
  - Interval: 10 seconds
  - Timeout: 5 seconds
  - Retries: 5
- **PostgreSQL Container**: `pg_isready -U $POSTGRES_USER -d $POSTGRES_DB`
  - Interval: 5 seconds
  - Timeout: 5 seconds
  - Retries: 5
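
These health checks feed Docker's per-container health status. After a restart, you can therefore wait until Docker reports a container healthy instead of guessing. A minimal sketch, assuming the container names used elsewhere in this guide. `wait_for_healthy` is a hypothetical helper, and `docker inspect --format '{{.State.Health.Status}}'` only returns a value for containers that define a health check:

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll Docker's health status for one container.

# wait_for_healthy CONTAINER [TIMEOUT_SECONDS] [POLL_SECONDS]
wait_for_healthy() {
  local name=$1 timeout=${2:-120} poll=${3:-5} waited=0 status=""
  while (( waited < timeout )); do
    # Prints starting, healthy or unhealthy for containers with a health check.
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)
    if [[ $status == healthy ]]; then
      echo "$name healthy after ${waited}s"
      return 0
    fi
    sleep "$poll"
    waited=$(( waited + poll ))
  done
  echo "$name not healthy after ${timeout}s (last status: ${status:-unknown})"
  return 1
}
```

For example: `docker start tractatus-umami && wait_for_healthy tractatus-umami`. The 120-second default is an arbitrary choice; raise it if the container needs longer to start.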

### Automated Backups

- **Schedule**: Daily at 2:00 AM
- **Retention**: 7 days
- **Location**: `~/umami-backups/`
- **Script**: `~/umami-deployment/backup-umami-db.sh`
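
A backup job can fail silently. It is worth confirming now and then that the newest file is recent and readable. A minimal sketch, assuming backups are named `umami_backup_*.sql.gz` (the pattern used in the restore command later in this guide). `check_latest_backup` and its 26-hour window (one daily run plus slack) are illustrative choices, not part of `backup-umami-db.sh`:

```bash
#!/usr/bin/env bash
# Hypothetical freshness and integrity check for the nightly backup.

# check_latest_backup DIR [MAX_AGE_HOURS]
#   Succeeds if the newest umami_backup_*.sql.gz in DIR was modified within
#   MAX_AGE_HOURS (default 26) and passes a gzip integrity test.
check_latest_backup() {
  local dir=$1 max_age_hours=${2:-26} latest
  # ls -t sorts newest first; an unmatched glob makes ls fail with no output.
  latest=$(ls -t "$dir"/umami_backup_*.sql.gz 2>/dev/null | head -n 1)
  if [[ -z $latest ]]; then
    echo "FAIL: no backups found in $dir"
    return 1
  fi
  # find prints the path only if it was modified less than N minutes ago.
  if [[ -z $(find "$latest" -mmin -"$(( max_age_hours * 60 ))") ]]; then
    echo "FAIL: newest backup $latest is older than ${max_age_hours}h"
    return 1
  fi
  # gzip -t verifies the compressed stream without extracting it.
  if ! gzip -t "$latest" 2>/dev/null; then
    echo "FAIL: $latest is not a valid gzip file"
    return 1
  fi
  echo "OK: $latest"
}
```

For example, run `check_latest_backup ~/umami-backups` some time after the 2:00 AM job. A non-zero exit status flags a missing, stale or corrupt backup.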

### Disk Usage Monitoring

- **Schedule**: Daily at 3:00 AM
- **Warning Threshold**: 80% disk usage
- **Critical Threshold**: 90% disk usage
- **Log File**: `~/umami-backups/disk-monitoring.log`
- **Script**: `~/umami-deployment/monitor-disk-usage.sh`
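
The threshold logic comes down to reading the filesystem's use percentage and comparing it with the two limits. The sketch below is illustrative, not the contents of `monitor-disk-usage.sh`. It assumes the root filesystem is the one to watch and treats a value equal to a threshold as crossing it. The function names are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the 80% / 90% threshold check described above.
WARN_PCT=80
CRIT_PCT=90

# classify_usage PERCENT: print OK, WARNING or CRITICAL.
classify_usage() {
  local pct=$1
  if (( pct >= CRIT_PCT )); then
    echo CRITICAL
  elif (( pct >= WARN_PCT )); then
    echo WARNING
  else
    echo OK
  fi
}

# usage_percent [PATH]: use% of the filesystem holding PATH (default /).
# df -P keeps each filesystem on one line; column 5 looks like "42%".
usage_percent() {
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# report_disk [PATH]: one timestamped status line, suitable for a log file.
report_disk() {
  local pct
  pct=$(usage_percent "${1:-/}")
  echo "$(date '+%Y-%m-%d %H:%M:%S') $(classify_usage "$pct") ${1:-/} ${pct}%"
}
```

Running `report_disk /` prints one line with the timestamp, status, mount path and percentage, which can be appended to a log with `>>`.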

## Verification

To verify that monitoring is working:

1. **Check Endpoint Manually**:

   ```bash
   curl -I https://analytics.agenticgovernance.digital/api/heartbeat
   # Should return: HTTP/2 200
   ```

2. **Test Alert Flow**:
   - Stop the Umami container: `docker stop tractatus-umami`
   - Wait for the alert (it should arrive within about 10 minutes)
   - Restart the container: `docker start tractatus-umami`
   - Verify that the recovery alert arrives

3. **Check Internal Monitoring**:

   ```bash
   # View Docker health status (shown as healthy/unhealthy in the STATUS column)
   docker ps

   # Check backup logs
   tail -20 ~/umami-backups/backup.log

   # Check disk monitoring logs
   tail -20 ~/umami-backups/disk-monitoring.log
   ```

## Alert Response Procedures

### Analytics Down (5+ minutes)

1. Check container status, including stopped containers: `docker ps -a`
2. Check Umami container logs: `docker logs tractatus-umami`
3. Check PostgreSQL logs: `docker logs tractatus-umami-db`
4. If needed, restart: `cd ~/umami-deployment && docker compose restart`
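
When an alert arrives, you can collect the output of steps 1 to 3 in one pass and paste it into an incident note. A sketch assuming the container names above. `triage_umami` is a hypothetical helper; it only reads state and does not restart anything:

```bash
#!/usr/bin/env bash
# Hypothetical read-only triage helper for the "Analytics Down" steps.

UMAMI_CONTAINERS=(tractatus-umami tractatus-umami-db)

# triage_umami [LINES]: show container status plus the last LINES log lines.
triage_umami() {
  local lines=${1:-50} c
  echo "=== container status ==="
  # -a includes stopped containers; the name filter matches both containers.
  docker ps -a --filter 'name=tractatus-umami' --format '{{.Names}}: {{.Status}}'
  for c in "${UMAMI_CONTAINERS[@]}"; do
    echo "=== last ${lines} log lines: ${c} ==="
    # docker logs replays the container's stderr on stderr, so merge streams.
    docker logs --tail "$lines" "$c" 2>&1
  done
}
```

Run `triage_umami`, or `triage_umami 200` for more context. Save the output before restarting anything.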

### High Disk Usage (>80%)

1. Check backup retention: `ls -lh ~/umami-backups/`
2. Remove old backups manually if needed
3. Check PostgreSQL volume: `docker exec tractatus-umami-db du -sh /var/lib/postgresql/data`
4. Consider database cleanup or server upgrade
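
For step 2, list the candidates before deleting anything so you can review exactly what will be removed. A minimal sketch, assuming the `umami_backup_*.sql.gz` naming used in the restore command below and the 7-day retention above. `old_backups` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper for reviewing backups that fall outside retention.

# old_backups DIR [DAYS]: print backups older than the retention window.
# find -mtime +N matches files whose age, counted in whole 24-hour periods,
# is greater than N, so +7 selects files at least 8 days old.
old_backups() {
  local dir=$1 days=${2:-7}
  find "$dir" -maxdepth 1 -type f -name 'umami_backup_*.sql.gz' -mtime +"$days" -print
}
```

Review the list from `old_backups ~/umami-backups` first. Once it looks right, the same `find` expression with `-delete` in place of `-print` removes those files.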

### Database Corruption

1. Stop Umami: `cd ~/umami-deployment && docker compose stop umami`
2. Restore from backup, replacing `YYYYMMDD_HHMMSS` with the timestamp of the backup to restore: `~/umami-deployment/restore-umami-db.sh ~/umami-backups/umami_backup_YYYYMMDD_HHMMSS.sql.gz`
3. Restart services (from `~/umami-deployment`): `docker compose up -d`

## Next Steps

- [ ] Sign up for UptimeRobot
- [ ] Add the analytics.agenticgovernance.digital monitor
- [ ] Configure email alerts
- [ ] Test alert delivery
- [ ] (Optional) Create a public status page
- [ ] Document response procedures in the team wiki

## Maintenance

- Review monitoring logs monthly
- Test the restore procedure quarterly
- Update alert contacts when the team changes
- Review disk usage trends monthly

---

**Last Updated**: 2025-10-29

**Monitoring Status**: Internal monitoring active, external monitoring pending user setup