tractatus/docs/plans/DEGRADATION_SCORE_IMPLEMENTATION.md
TheFlow b5d17f9dbc feat: Add performance degradation detection to context pressure monitoring
Implements 5-metric weighted degradation score to detect performance issues:
- Error patterns (30%): Consecutive errors, clustering, severity
- Framework fade (25%): Component staleness detection
- Context quality (20%): Post-compaction degradation, session age
- Behavioral indicators (15%): Tool retry patterns
- Task completion (10%): Recent error rate

Degradation levels: LOW (<20%), MODERATE (20-40%), HIGH (40-60%), CRITICAL (60%+)

Displayed in 'ffs' command output with breakdown and recommendations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-04 16:30:13 +13:00


# Degradation Score Implementation Plan
**Problem**: The pressure gauge showed 3% while performance was severely degraded
**Root Cause**: Missing behavioral/quality metrics
**Framework Audit**: 690964aa9eac658bf5f14cb4
---
## Missing Metrics Identified
### 1. ERROR PATTERN ANALYSIS (30% weight)
- **Consecutive errors**: Track errors in sequence
- **Error clustering**: Detect error bursts (3+ in 10-minute window)
- **Error severity**: Weight by impact (blocked=3, warning=1)
- **Repeated failures**: Same tool/operation failing multiple times
### 2. FRAMEWORK FADE (25% weight)
- **Component staleness**: Time since MetacognitiveVerifier last used
- **BoundaryEnforcer usage**: Should be invoked for values decisions
- **Framework invocation rate**: Declining usage = fade
### 3. CONTEXT QUALITY (20% weight)
- **Post-compaction flag**: Session continued after compaction = quality loss
- **Knowledge domain shift**: Sudden change in task types
- **Session age**: Very long sessions = accumulated drift
### 4. BEHAVIORAL INDICATORS (15% weight)
- **Tool retry rate**: Same tool called 3+ times consecutively
- **Read without action**: Files read but not edited/used
- **Deployment thrashing**: Multiple restarts in short period
### 5. TASK COMPLETION (10% weight)
- **Time since last success**: How long since error-free completion
- **Success rate trend**: Declining completion rate
---
## Implementation
### File: `scripts/framework-components/ContextPressureMonitor.js`
Add new method:
```javascript
  /**
   * Calculate degradation score (0-100)
   * Combines behavioral and quality metrics
   */
  async calculateDegradationScore() {
    const scores = {
      errorPattern: await this._analyzeErrorPatterns(),    // 30%
      frameworkFade: await this._detectFrameworkFade(),    // 25%
      contextQuality: await this._assessContextQuality(),  // 20%
      behavioral: await this._analyzeBehavior(),           // 15%
      taskCompletion: await this._measureTaskCompletion()  // 10%
    };

    const degradationScore =
      scores.errorPattern * 0.30 +
      scores.frameworkFade * 0.25 +
      scores.contextQuality * 0.20 +
      scores.behavioral * 0.15 +
      scores.taskCompletion * 0.10;

    return {
      score: Math.round(degradationScore),
      level: this._getDegradationLevel(degradationScore),
      breakdown: scores,
      recommendation: this._getRecommendation(degradationScore)
    };
  }
  /**
   * Analyze error patterns (returns 0-100)
   */
  async _analyzeErrorPatterns() {
    // Fetch unfiltered logs so successful actions can break an error streak;
    // a pre-filtered error-only list would always appear consecutive.
    const recentLogs = await this.memoryProxy.getRecentAuditLogs({ limit: 50 });
    const isError = (e) => Boolean(e.decision?.blocked || e.decision?.errors);
    const recentErrors = recentLogs.filter(isError);

    // Longest run of consecutive errors
    let maxConsecutive = 0;
    let currentStreak = 0;
    for (const e of recentLogs) {
      if (isError(e)) {
        currentStreak++;
        maxConsecutive = Math.max(maxConsecutive, currentStreak);
      } else {
        currentStreak = 0;
      }
    }

    // Error clustering (3+ errors in 10-minute windows)
    const errorClusters = this._detectErrorClusters(recentErrors, 10 * 60 * 1000);

    // Error severity weighting (blocked = 3, warning = 1)
    const severityScore = recentErrors.reduce((sum, e) => {
      if (e.decision?.blocked) return sum + 3;
      return sum + 1;
    }, 0);

    // Combine metrics, each normalized to 0-100
    const consecutiveScore = Math.min(maxConsecutive * 10, 100);
    const clusterScore = Math.min(errorClusters.length * 15, 100);
    const severityScoreNormalized = Math.min(severityScore * 2, 100);
    return Math.round((consecutiveScore + clusterScore + severityScoreNormalized) / 3);
  }
  /**
   * Detect framework fade (returns 0-100)
   */
  async _detectFrameworkFade() {
    const criticalComponents = [
      'MetacognitiveVerifier',
      'BoundaryEnforcer',
      'PluralisticDeliberationOrchestrator'
    ];

    const componentActivity = await Promise.all(
      criticalComponents.map(async (service) => {
        const logs = await this.memoryProxy.getRecentAuditLogs({
          limit: 1,
          filter: { service }
        });
        if (logs.length === 0) return { service, ageMinutes: Infinity };
        const age = (Date.now() - logs[0].timestamp) / 1000 / 60;
        return { service, ageMinutes: age };
      })
    );

    // Score by minutes since last use:
    //   0-30 min  =   0 points
    //   30-60 min =  50 points
    //   60+ min   = 100 points (including never used)
    const scores = componentActivity.map(c => {
      if (c.ageMinutes === Infinity) return 100;
      if (c.ageMinutes < 30) return 0;
      if (c.ageMinutes < 60) return 50;
      return 100;
    });
    return Math.round(scores.reduce((a, b) => a + b, 0) / scores.length);
  }
  /**
   * Assess context quality (returns 0-100)
   */
  async _assessContextQuality() {
    const session = await this.memoryProxy.getSessionState();
    let score = 0;

    // Post-compaction flag (major degradation indicator)
    if (session.autoCompactions && session.autoCompactions.length > 0) {
      const lastCompaction = session.autoCompactions[session.autoCompactions.length - 1];
      const timeSinceCompaction = (Date.now() - lastCompaction.timestamp) / 1000 / 60;
      // Within 60 minutes of compaction = high risk
      if (timeSinceCompaction < 60) {
        score += 60;
      } else if (timeSinceCompaction < 120) {
        score += 30;
      }
    }

    // Session age (very long sessions accumulate drift)
    const sessionAgeHours = (Date.now() - session.startTime) / 1000 / 60 / 60;
    if (sessionAgeHours > 6) score += 40;
    else if (sessionAgeHours > 4) score += 20;

    return Math.min(score, 100);
  }
  /**
   * Analyze behavioral indicators (returns 0-100)
   */
  async _analyzeBehavior() {
    const recentActions = await this.memoryProxy.getRecentAuditLogs({ limit: 50 });

    // Tool retry rate: same tool called 3+ times consecutively.
    // The truthiness guard prevents entries with missing tool metadata
    // (undefined === undefined) from counting as retries.
    const toolCalls = recentActions.map(a => a.metadata?.tool);
    let retries = 0;
    for (let i = 2; i < toolCalls.length; i++) {
      if (toolCalls[i] && toolCalls[i] === toolCalls[i - 1] && toolCalls[i] === toolCalls[i - 2]) {
        retries++;
      }
    }
    return Math.min(retries * 20, 100);
  }
  /**
   * Measure task completion (returns 0-100)
   */
  async _measureTaskCompletion() {
    const recentErrors = await this.memoryProxy.getRecentAuditLogs({
      limit: 20,
      filter: { hasError: true }
    });
    // Simple metric: error rate across the last 20 actions
    const errorRate = (recentErrors.length / 20) * 100;
    return Math.round(errorRate);
  }

  /**
   * Map score to degradation level
   */
  _getDegradationLevel(score) {
    if (score >= 60) return 'CRITICAL';
    if (score >= 40) return 'HIGH';
    if (score >= 20) return 'MODERATE';
    return 'LOW';
  }

  /**
   * Map score to recommendation
   */
  _getRecommendation(score) {
    if (score >= 60) {
      return 'RECOMMEND SESSION RESTART - Quality severely degraded';
    }
    if (score >= 40) {
      return 'WARN USER - Performance declining, consider checkpoint review';
    }
    return 'Monitoring - No action needed';
  }
```
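The error-pattern method above calls `_detectErrorClusters()`, which this plan does not define. A minimal sketch, written as a standalone function so it can be tested in isolation (inside the class it would be `_detectErrorClusters(errorLogs, windowMs)`); the millisecond `timestamp` field is an assumption about the audit log shape:

```javascript
// Sketch of the _detectErrorClusters helper. Sorts error logs by timestamp,
// then greedily groups entries that fall within `windowMs` of the first entry
// in the current group; only groups of 3+ errors count as clusters.
function detectErrorClusters(errorLogs, windowMs) {
  const sorted = [...errorLogs].sort((a, b) => a.timestamp - b.timestamp);
  const clusters = [];
  let current = [];
  for (const log of sorted) {
    if (current.length === 0 || log.timestamp - current[0].timestamp <= windowMs) {
      current.push(log);
    } else {
      if (current.length >= 3) clusters.push(current);
      current = [log];
    }
  }
  if (current.length >= 3) clusters.push(current);
  return clusters;
}
```

With a 10-minute window, three errors inside ten minutes form one cluster; an isolated error an hour later does not.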
---
## Integration Points
### 1. Add to Pressure Analysis
Modify `analyzeContextPressure()` to include degradationScore:
```javascript
async analyzeContextPressure(tokenCount = null, tokenBudget = 200000) {
  // ... existing metrics ...

  const degradation = await this.calculateDegradationScore();

  return {
    level: this._determineLevel(overallScore),
    score: overallScore,
    degradation: degradation.score,
    degradationLevel: degradation.level,
    degradationBreakdown: degradation.breakdown,
    recommendation: degradation.recommendation,
    // ... rest of response
  };
}
```
### 2. Token Checkpoint Reporting
Update checkpoint messages to include degradation:
```
📊 Context Pressure: NORMAL (4%) | Degradation: HIGH (45%) | Tokens: 50000/200000
⚠️ WARNING: Framework fade detected - MetacognitiveVerifier unused for 45 minutes
```
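The checkpoint line could be assembled by a small helper; a sketch assuming the enriched result shape from `analyzeContextPressure()` (`level`, `score`, `degradation`, `degradationLevel`), with the helper name and the caller-supplied token counts as assumptions:

```javascript
// Hypothetical helper that builds the checkpoint status line from the
// enriched pressure result plus token counts passed in by the caller.
function formatCheckpointLine(pressure, tokenCount, tokenBudget) {
  return (
    `📊 Context Pressure: ${pressure.level} (${pressure.score}%) | ` +
    `Degradation: ${pressure.degradationLevel} (${pressure.degradation}%) | ` +
    `Tokens: ${tokenCount}/${tokenBudget}`
  );
}
```

Calling it with `{ level: 'NORMAL', score: 4, degradation: 45, degradationLevel: 'HIGH' }`, `50000`, `200000` reproduces the first line of the example above.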
### 3. Framework Stats (ffs)
Add degradation section to `scripts/framework-stats.js`:
```
⚠️ DEGRADATION ANALYSIS
Score: 45%
Level: HIGH
Breakdown:
• Error patterns: 30%
• Framework fade: 60% ← CRITICAL
• Context quality: 40%
• Behavioral: 20%
• Task completion: 15%
Recommendation: Consider checkpoint review
```
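One way to render this section in `framework-stats.js`, assuming the result object returned by `calculateDegradationScore()` (`score`, `level`, `breakdown`, `recommendation`); the function name and the 60-point CRITICAL flag threshold are assumptions for illustration:

```javascript
// Hypothetical renderer for the ffs degradation section. Flags any
// sub-score at 60+ with the "← CRITICAL" marker shown in the example.
function formatDegradationSection(d) {
  const labels = {
    errorPattern: 'Error patterns',
    frameworkFade: 'Framework fade',
    contextQuality: 'Context quality',
    behavioral: 'Behavioral',
    taskCompletion: 'Task completion'
  };
  const lines = [
    '⚠️ DEGRADATION ANALYSIS',
    `Score: ${d.score}%`,
    `Level: ${d.level}`,
    'Breakdown:'
  ];
  for (const [key, label] of Object.entries(labels)) {
    const value = d.breakdown[key];
    lines.push(`• ${label}: ${value}%${value >= 60 ? ' ← CRITICAL' : ''}`);
  }
  lines.push(`Recommendation: ${d.recommendation}`);
  return lines.join('\n');
}
```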
---
## Testing
### Test Case 1: Framework Fade Detection
- Session runs for 2 hours without MetacognitiveVerifier
- Degradation score should be HIGH (40%+)
### Test Case 2: Post-Compaction
- Session continues after compaction
- Context quality score should be 60+
- Overall degradation should be HIGH
### Test Case 3: Error Clustering
- 5 consecutive errors occur
- Error pattern score should be 50+
- User should see warning
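The level expectations in these test cases can be spot-checked against the threshold boundaries with a standalone copy of the mapping (re-declared here so it runs outside the class):

```javascript
// Standalone copy of the _getDegradationLevel thresholds from the plan:
// LOW (<20), MODERATE (20-40), HIGH (40-60), CRITICAL (60+).
function getDegradationLevel(score) {
  if (score >= 60) return 'CRITICAL';
  if (score >= 40) return 'HIGH';
  if (score >= 20) return 'MODERATE';
  return 'LOW';
}
```

A 45% score (Test Case 1's 40%+ expectation) maps to HIGH, and a post-compaction spike that pushes the weighted total to 60 crosses into CRITICAL.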
---
## Implementation Steps
1. **Add degradation methods** to ContextPressureMonitor.js
2. **Update analyzeContextPressure()** to calculate degradation
3. **Modify checkpoint reporting** to show degradation
4. **Update framework-stats.js** to display breakdown
5. **Test with real session data**
6. **Document in CLAUDE_Tractatus_Maintenance_Guide.md**
---
## Success Criteria
- ✅ Degradation score catches "random" performance drops
- ✅ Framework fade detected within 30 minutes
- ✅ Post-compaction quality loss flagged immediately
- ✅ User warned before performance becomes unacceptable
- ✅ False positive rate < 5%
---
**Estimated Implementation Time**: 4-6 hours
**Priority**: HIGH (governance integrity issue)
**Framework Audit ID**: 690964aa9eac658bf5f14cb4