Implements a 5-metric weighted degradation score to detect performance issues:

- Error patterns (30%): Consecutive errors, clustering, severity
- Framework fade (25%): Component staleness detection
- Context quality (20%): Post-compaction degradation, session age
- Behavioral indicators (15%): Tool retry patterns
- Task completion (10%): Recent error rate

Degradation levels: LOW (<20%), MODERATE (20-40%), HIGH (40-60%), CRITICAL (60%+)

Displayed in 'ffs' command output with breakdown and recommendations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

# Degradation Score Implementation Plan

**Problem**: Pressure gauge showed 3% but performance severely degraded
**Root Cause**: Missing behavioral/quality metrics
**Framework Audit**: 690964aa9eac658bf5f14cb4

---

## Missing Metrics Identified

### 1. ERROR PATTERN ANALYSIS (30% weight)
- **Consecutive errors**: Track errors in sequence
- **Error clustering**: Detect error bursts (3+ in 10-minute window)
- **Error severity**: Weight by impact (blocked=3, warning=1)
- **Repeated failures**: Same tool/operation failing multiple times

### 2. FRAMEWORK FADE (25% weight)
- **Component staleness**: Time since MetacognitiveVerifier last used
- **BoundaryEnforcer usage**: Should be invoked for values decisions
- **Framework invocation rate**: Declining usage = fade

### 3. CONTEXT QUALITY (20% weight)
- **Post-compaction flag**: Session continued after compaction = quality loss
- **Knowledge domain shift**: Sudden change in task types
- **Session age**: Very long sessions = accumulated drift

### 4. BEHAVIORAL INDICATORS (15% weight)
- **Tool retry rate**: Same tool called 3+ times consecutively
- **Read without action**: Files read but not edited/used
- **Deployment thrashing**: Multiple restarts in short period

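The `_analyzeBehavior()` method later in this plan covers only the tool retry rate. The other two indicators could be sketched roughly as follows. Everything here is an assumption for illustration: the helper names, the `metadata.file` field, the tool names `'Read'`, `'Edit'`, `'Write'` and `'restart'`, a numeric millisecond `timestamp`, and the default of 3 restarts in 10 minutes as the meaning of "multiple restarts in short period".

```javascript
// Hypothetical helpers; field names, tool names and thresholds are assumptions.

/** Count files that were read but not edited afterwards (logs assumed oldest-first). */
function countReadsWithoutAction(logs) {
  const pendingReads = new Set();
  for (const entry of logs) {
    const { tool, file } = entry.metadata ?? {};
    if (!file) continue;
    if (tool === 'Read') {
      pendingReads.add(file);
    } else if (tool === 'Edit' || tool === 'Write') {
      pendingReads.delete(file);
    }
  }
  return pendingReads.size;
}

/** True if `threshold` or more restarts fall inside any window of `windowMs`. */
function isDeploymentThrashing(logs, windowMs = 10 * 60 * 1000, threshold = 3) {
  const restarts = logs
    .filter((e) => e.metadata?.tool === 'restart')
    .map((e) => e.timestamp)
    .sort((a, b) => a - b);

  // Slide over sorted restart times; compare the first and last of each group
  for (let i = 0; i + threshold - 1 < restarts.length; i++) {
    if (restarts[i + threshold - 1] - restarts[i] <= windowMs) return true;
  }
  return false;
}
```

Both helpers return raw counts or flags; how they would be folded into the 0-100 behavioral score is left open here.
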
### 5. TASK COMPLETION (10% weight)
- **Time since last success**: How long since error-free completion
- **Success rate trend**: Declining completion rate

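The two task-completion signals above could be computed along these lines. This is a sketch under assumptions, not part of the planned method: the helper names are hypothetical, log entries are assumed to be ordered oldest-first with a numeric millisecond `timestamp`, and an entry counts as an error when `decision.blocked` or `decision.errors` is set, mirroring the check used in the implementation code below.

```javascript
// Hypothetical helpers; entry shape and ordering are assumptions.
const isError = (e) => Boolean(e.decision?.blocked || e.decision?.errors);

/** Minutes since the most recent error-free entry, or Infinity if there is none. */
function minutesSinceLastSuccess(logs, now = Date.now()) {
  const successes = logs.filter((e) => !isError(e));
  if (successes.length === 0) return Infinity;
  const latest = Math.max(...successes.map((e) => e.timestamp));
  return (now - latest) / 60000;
}

/** Compare the success rate of the older half of the window with the newer half. */
function successRateTrend(logs) {
  const rate = (entries) =>
    entries.length === 0
      ? 1
      : entries.filter((e) => !isError(e)).length / entries.length;

  const mid = Math.floor(logs.length / 2);
  const older = rate(logs.slice(0, mid));
  const newer = rate(logs.slice(mid));
  return { older, newer, declining: newer < older };
}
```
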
---

## Implementation

### File: `scripts/framework-components/ContextPressureMonitor.js`

Add new method:

```javascript
/**
 * Calculate degradation score (0-100)
 * Combines behavioral and quality metrics
 */
async calculateDegradationScore() {
  // Each sub-score is on a 0-100 scale
  const scores = {
    errorPattern: await this._analyzeErrorPatterns(),     // 30%
    frameworkFade: await this._detectFrameworkFade(),     // 25%
    contextQuality: await this._assessContextQuality(),   // 20%
    behavioral: await this._analyzeBehavior(),            // 15%
    taskCompletion: await this._measureTaskCompletion()   // 10%
  };

  // Weighted sum (weights add up to 1.0), rounded once so that the level
  // and recommendation always agree with the displayed score
  const degradationScore = Math.round(
    scores.errorPattern * 0.30 +
    scores.frameworkFade * 0.25 +
    scores.contextQuality * 0.20 +
    scores.behavioral * 0.15 +
    scores.taskCompletion * 0.10
  );

  return {
    score: degradationScore,
    level: this._getDegradationLevel(degradationScore),
    breakdown: scores,
    recommendation: this._getRecommendation(degradationScore)
  };
}

/**
 * Analyze error patterns (returns 0-100)
 */
async _analyzeErrorPatterns() {
  // Fetch unfiltered logs: a streak of consecutive errors can only be
  // measured if the successful actions between errors are present too
  const recentLogs = await this.memoryProxy.getRecentAuditLogs({ limit: 50 });

  const isError = (e) => Boolean(e.decision?.blocked || e.decision?.errors);
  const recentErrors = recentLogs.filter(isError);

  // Consecutive errors: longest unbroken run of error entries
  let maxConsecutive = 0;
  let currentStreak = 0;

  recentLogs.forEach((e) => {
    if (isError(e)) {
      currentStreak++;
      maxConsecutive = Math.max(maxConsecutive, currentStreak);
    } else {
      currentStreak = 0;
    }
  });

  // Error clustering (3+ errors in 10-minute windows)
  const errorClusters = this._detectErrorClusters(recentErrors, 10 * 60 * 1000);

  // Error severity weighting (blocked = 3, other error = 1)
  const severityScore = recentErrors.reduce(
    (sum, e) => sum + (e.decision?.blocked ? 3 : 1),
    0
  );

  // Combine metrics: cap each component at 100, then average
  const consecutiveScore = Math.min(maxConsecutive * 10, 100);
  const clusterScore = Math.min(errorClusters.length * 15, 100);
  const severityScoreNormalized = Math.min(severityScore * 2, 100);

  return Math.round((consecutiveScore + clusterScore + severityScoreNormalized) / 3);
}

/**
 * Detect framework fade (returns 0-100)
 */
async _detectFrameworkFade() {
  const criticalComponents = [
    'MetacognitiveVerifier',
    'BoundaryEnforcer',
    'PluralisticDeliberationOrchestrator'
  ];

  const componentActivity = await Promise.all(
    criticalComponents.map(async (service) => {
      const logs = await this.memoryProxy.getRecentAuditLogs({
        limit: 1,
        filter: { service }
      });

      // Never used: treat as infinitely stale
      if (logs.length === 0) return { service, ageMinutes: Infinity };

      // new Date(...) accepts epoch milliseconds, Date objects, and ISO strings
      const lastUsed = new Date(logs[0].timestamp).getTime();
      const age = (Date.now() - lastUsed) / 1000 / 60;
      return { service, ageMinutes: age };
    })
  );

  // Score: minutes since last use
  // 0-30 min = 0 points
  // 30-60 min = 50 points
  // 60+ min = 100 points
  const scores = componentActivity.map((c) => {
    if (c.ageMinutes === Infinity) return 100;
    if (c.ageMinutes < 30) return 0;
    if (c.ageMinutes < 60) return 50;
    return 100;
  });

  // Average across the critical components
  return Math.round(scores.reduce((a, b) => a + b, 0) / scores.length);
}

/**
 * Assess context quality (returns 0-100)
 */
async _assessContextQuality() {
  const session = await this.memoryProxy.getSessionState();
  const now = Date.now();

  let score = 0;

  // Post-compaction flag (major degradation indicator)
  if (session.autoCompactions && session.autoCompactions.length > 0) {
    const lastCompaction = session.autoCompactions[session.autoCompactions.length - 1];
    // new Date(...) accepts epoch milliseconds, Date objects, and ISO strings
    const minutesSinceCompaction =
      (now - new Date(lastCompaction.timestamp).getTime()) / 1000 / 60;

    // Within 60 minutes of compaction = high risk
    if (minutesSinceCompaction < 60) {
      score += 60;
    } else if (minutesSinceCompaction < 120) {
      score += 30;
    }
  }

  // Session age (very long sessions accumulate drift)
  const sessionAgeHours =
    (now - new Date(session.startTime).getTime()) / 1000 / 60 / 60;
  if (sessionAgeHours > 6) {
    score += 40;
  } else if (sessionAgeHours > 4) {
    score += 20;
  }

  return Math.min(score, 100);
}

/**
 * Analyze behavioral indicators (returns 0-100)
 */
async _analyzeBehavior() {
  const recentActions = await this.memoryProxy.getRecentAuditLogs({ limit: 50 });

  // Tool retry rate: count positions where the same tool appears
  // three times in a row
  const toolCalls = recentActions.map((a) => a.metadata?.tool);
  let retries = 0;
  for (let i = 2; i < toolCalls.length; i++) {
    // Skip entries without a tool name; otherwise three consecutive
    // undefined values would be counted as a retry
    if (
      toolCalls[i] !== undefined &&
      toolCalls[i] === toolCalls[i - 1] &&
      toolCalls[i] === toolCalls[i - 2]
    ) {
      retries++;
    }
  }

  const retryScore = Math.min(retries * 20, 100);
  return retryScore;
}

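/**
 * Measure task completion (returns 0-100)
 */
async _measureTaskCompletion() {
  // Fetch the last 20 actions unfiltered; querying only error entries
  // would return up to 20 errors regardless of how many actions succeeded
  const recentActions = await this.memoryProxy.getRecentAuditLogs({ limit: 20 });
  if (recentActions.length === 0) return 0;

  const errorCount = recentActions.filter(
    (a) => a.decision?.blocked || a.decision?.errors
  ).length;

  // Simple metric: error rate in last 20 actions
  const errorRate = (errorCount / recentActions.length) * 100;
  return Math.round(errorRate);
}
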
/**
 * Get degradation level
 */
_getDegradationLevel(score) {
  if (score >= 60) return 'CRITICAL';
  if (score >= 40) return 'HIGH';
  if (score >= 20) return 'MODERATE';
  return 'LOW';
}

/**
 * Get recommendation
 */
_getRecommendation(score) {
  if (score >= 60) {
    return 'RECOMMEND SESSION RESTART - Quality severely degraded';
  }
  if (score >= 40) {
    return 'WARN USER - Performance declining, consider checkpoint review';
  }
  return 'Monitoring - No action needed';
}
```
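`_analyzeErrorPatterns()` calls `this._detectErrorClusters(recentErrors, 10 * 60 * 1000)`, but the plan does not define that helper. One possible implementation is sketched here as a standalone function. It assumes each entry has a `timestamp` that `new Date()` can parse, takes the minimum cluster size of 3 from the "3+ in 10-minute window" rule, and returns an array of clusters so that the caller's `errorClusters.length` works. The greedy, non-overlapping grouping is a design choice of this sketch, not something the plan specifies.

```javascript
// Hypothetical implementation of the undefined _detectErrorClusters helper.
function detectErrorClusters(errors, windowMs, minSize = 3) {
  // Normalize timestamps and sort oldest-first; drop unparseable entries
  const sorted = errors
    .map((entry) => ({ entry, time: new Date(entry.timestamp).getTime() }))
    .filter((x) => !Number.isNaN(x.time))
    .sort((a, b) => a.time - b.time);

  const clusters = [];
  let start = 0;
  while (start < sorted.length) {
    // Extend the window while entries stay within windowMs of the first one
    let end = start;
    while (
      end + 1 < sorted.length &&
      sorted[end + 1].time - sorted[start].time <= windowMs
    ) {
      end++;
    }

    if (end - start + 1 >= minSize) {
      clusters.push(sorted.slice(start, end + 1).map((x) => x.entry));
      start = end + 1; // clusters do not overlap
    } else {
      start++;
    }
  }
  return clusters;
}
```
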
---

## Integration Points

### 1. Add to Pressure Analysis

Modify `analyzeContextPressure()` to include degradationScore:

```javascript
async analyzeContextPressure(tokenCount = null, tokenBudget = 200000) {
  // ... existing metrics ...

  const degradation = await this.calculateDegradationScore();

  return {
    level: this._determineLevel(overallScore),
    score: overallScore,
    degradation: degradation.score,
    degradationLevel: degradation.level,
    degradationBreakdown: degradation.breakdown,
    recommendation: degradation.recommendation,
    // ... rest of response
  };
}
```

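To get a feel for the numbers that will appear in `degradation` and `degradationLevel`, the weighting and thresholds from this plan can be reproduced as pure functions. The names `WEIGHTS`, `combineScores` and `levelFor` are illustrative only. The sketch shows that a single saturated metric is capped by its weight: framework fade alone tops out at 25 (MODERATE), so HIGH requires several metrics to be elevated together.

```javascript
// Illustrative re-statement of the weights and level thresholds in this plan.
const WEIGHTS = {
  errorPattern: 0.30,
  frameworkFade: 0.25,
  contextQuality: 0.20,
  behavioral: 0.15,
  taskCompletion: 0.10
};

function combineScores(scores) {
  // Missing sub-scores are treated as 0
  const total = Object.entries(WEIGHTS).reduce(
    (sum, [name, weight]) => sum + (scores[name] ?? 0) * weight,
    0
  );
  return Math.round(total);
}

function levelFor(score) {
  if (score >= 60) return 'CRITICAL';
  if (score >= 40) return 'HIGH';
  if (score >= 20) return 'MODERATE';
  return 'LOW';
}

const fadeOnly = combineScores({ frameworkFade: 100 });
console.log(fadeOnly, levelFor(fadeOnly)); // 25 MODERATE

const compactionOnly = combineScores({ contextQuality: 60 });
console.log(compactionOnly, levelFor(compactionOnly)); // 12 LOW

const combined = combineScores({
  errorPattern: 50,
  frameworkFade: 100,
  contextQuality: 60,
  behavioral: 20,
  taskCompletion: 20
});
console.log(combined, levelFor(combined)); // 57 HIGH
```
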
### 2. Token Checkpoint Reporting

Update checkpoint messages to include degradation:

```
📊 Context Pressure: NORMAL (4%) | Degradation: HIGH (45%) | Tokens: 50000/200000
⚠️ WARNING: Framework fade detected - MetacognitiveVerifier unused for 45 minutes
```

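A checkpoint line in that shape could be assembled from the fields `analyzeContextPressure()` returns. This is a sketch with hypothetical helper names; `tokenCount` and `tokenBudget` are assumed to be passed in alongside the result, the 30-minute warning threshold is borrowed from the fade scoring bands, and the wording for a component that was never used is invented for illustration.

```javascript
// Hypothetical formatters for the checkpoint message.
function formatCheckpoint({ level, score, degradation, degradationLevel, tokenCount, tokenBudget }) {
  return (
    `📊 Context Pressure: ${level} (${score}%)` +
    ` | Degradation: ${degradationLevel} (${degradation}%)` +
    ` | Tokens: ${tokenCount}/${tokenBudget}`
  );
}

// componentActivity has the { service, ageMinutes } shape built in _detectFrameworkFade()
function fadeWarnings(componentActivity, thresholdMinutes = 30) {
  return componentActivity
    .filter((c) => c.ageMinutes >= thresholdMinutes)
    .map((c) =>
      Number.isFinite(c.ageMinutes)
        ? `⚠️ WARNING: Framework fade detected - ${c.service} unused for ${Math.round(c.ageMinutes)} minutes`
        : `⚠️ WARNING: Framework fade detected - ${c.service} not used in the available logs`
    );
}
```

Emitting the warnings requires `_detectFrameworkFade()` to expose its per-component ages, which the method in this plan currently keeps internal.
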
### 3. Framework Stats (ffs)

Add degradation section to `scripts/framework-stats.js`:

```
⚠️  DEGRADATION ANALYSIS
Score: 37%
Level: MODERATE
Breakdown:
  • Error patterns: 30%
  • Framework fade: 60% ← CRITICAL
  • Context quality: 40%
  • Behavioral: 20%
  • Task completion: 15%
Recommendation: Monitoring - No action needed
```

The score is the weighted sum of the breakdown: 30 × 0.30 + 60 × 0.25 + 40 × 0.20 + 20 × 0.15 + 15 × 0.10 = 36.5, which rounds to 37.

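A renderer for this section might look like the following sketch. The function name, the label map, and the rule that any sub-score of 60 or more is tagged `← CRITICAL` are assumptions inferred from the sample output, not taken from `scripts/framework-stats.js`.

```javascript
// Hypothetical renderer for the degradation section of the ffs output.
const METRIC_LABELS = {
  errorPattern: 'Error patterns',
  frameworkFade: 'Framework fade',
  contextQuality: 'Context quality',
  behavioral: 'Behavioral',
  taskCompletion: 'Task completion'
};

// `result` has the shape returned by calculateDegradationScore()
function renderDegradation({ score, level, breakdown, recommendation }, flagAt = 60) {
  const rows = Object.entries(METRIC_LABELS).map(([key, label]) => {
    const value = breakdown[key];
    const flag = value >= flagAt ? ' ← CRITICAL' : '';
    return `  • ${label}: ${value}%${flag}`;
  });

  return [
    '⚠️  DEGRADATION ANALYSIS',
    `Score: ${score}%`,
    `Level: ${level}`,
    'Breakdown:',
    ...rows,
    `Recommendation: ${recommendation}`
  ].join('\n');
}

console.log(
  renderDegradation({
    score: 57,
    level: 'HIGH',
    breakdown: {
      errorPattern: 50,
      frameworkFade: 100,
      contextQuality: 60,
      behavioral: 20,
      taskCompletion: 20
    },
    recommendation: 'WARN USER - Performance declining, consider checkpoint review'
  })
);
```
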
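---

## Testing

### Test Case 1: Framework Fade Detection
- Session runs for 2 hours without MetacognitiveVerifier
- Framework fade score should be 33 if only MetacognitiveVerifier is stale, or 100 if all three critical components are
- Framework fade contributes at most 25 points (25% weight), so overall degradation reaches HIGH (40%+) only when other metrics are elevated as well

### Test Case 2: Post-Compaction
- Session continues after compaction
- Context quality score should be 60+
- Context quality contributes at most 20 points (20% weight), so overall degradation reaches HIGH only in combination with other metrics

### Test Case 3: Error Clustering
- 5 consecutive errors occur
- Consecutive-error component should be 50 (5 × 10); the combined error pattern score averages it with the cluster and severity components
- User should see a warning once the overall score reaches 40+
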
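The framework fade case can be exercised without a live session by stubbing `memoryProxy`. The sketch below checks the framework fade sub-score only, not the overall degradation level. The stub and the standalone `detectFrameworkFade` function are hypothetical: they copy the scoring bands from `_detectFrameworkFade()` and assume `getRecentAuditLogs` resolves to an array of entries with a millisecond `timestamp`.

```javascript
// Stub standing in for memoryProxy; the real interface may differ.
function makeStubProxy(lastUsedMinutesAgo, now) {
  return {
    async getRecentAuditLogs({ filter }) {
      const minutes = lastUsedMinutesAgo[filter.service];
      return minutes === undefined
        ? []
        : [{ timestamp: now - minutes * 60 * 1000 }];
    }
  };
}

// Standalone copy of the fade scoring bands from the plan, parameterised for testing
async function detectFrameworkFade(memoryProxy, now) {
  const components = [
    'MetacognitiveVerifier',
    'BoundaryEnforcer',
    'PluralisticDeliberationOrchestrator'
  ];
  const scores = await Promise.all(
    components.map(async (service) => {
      const logs = await memoryProxy.getRecentAuditLogs({ limit: 1, filter: { service } });
      if (logs.length === 0) return 100;
      const ageMinutes = (now - logs[0].timestamp) / 60000;
      if (ageMinutes < 30) return 0;
      if (ageMinutes < 60) return 50;
      return 100;
    })
  );
  return Math.round(scores.reduce((a, b) => a + b, 0) / scores.length);
}

async function runFadeCases() {
  const now = Date.now();
  // MetacognitiveVerifier idle for 2 hours, the other two used 5 minutes ago
  const oneStale = await detectFrameworkFade(
    makeStubProxy(
      { MetacognitiveVerifier: 120, BoundaryEnforcer: 5, PluralisticDeliberationOrchestrator: 5 },
      now
    ),
    now
  );
  // No component has any log entry
  const allStale = await detectFrameworkFade(makeStubProxy({}, now), now);
  return { oneStale, allStale };
}

runFadeCases().then((result) => console.log(result)); // { oneStale: 33, allStale: 100 }
```
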
---

## Implementation Steps

1. **Add degradation methods** to ContextPressureMonitor.js
2. **Update analyzeContextPressure()** to calculate degradation
3. **Modify checkpoint reporting** to show degradation
4. **Update framework-stats.js** to display breakdown
5. **Test with real session data**
6. **Document in CLAUDE_Tractatus_Maintenance_Guide.md**

---

## Success Criteria

- ✅ Degradation score catches "random" performance drops
- ✅ Framework fade detected within 30 minutes
- ✅ Post-compaction quality loss flagged immediately
- ✅ User warned before performance becomes unacceptable
- ✅ False positive rate < 5%

---

**Estimated Implementation Time**: 4-6 hours
**Priority**: HIGH (governance integrity issue)
**Framework Audit ID**: 690964aa9eac658bf5f14cb4