# Degradation Score Implementation Plan

**Problem**: The pressure gauge showed 3% while performance was severely degraded

**Root Cause**: The gauge tracks no behavioral or quality metrics

**Framework Audit**: 690964aa9eac658bf5f14cb4

---

## Missing Metrics Identified

### 1. ERROR PATTERN ANALYSIS (30% weight)
- **Consecutive errors**: Track errors that occur in sequence
- **Error clustering**: Detect error bursts (3+ in a 10-minute window)
- **Error severity**: Weight by impact (blocked=3, warning=1)
- **Repeated failures**: Same tool/operation failing multiple times

### 2. FRAMEWORK FADE (25% weight)
- **Component staleness**: Time since MetacognitiveVerifier was last used
- **BoundaryEnforcer usage**: Should be invoked for values decisions
- **Framework invocation rate**: Declining usage = fade

### 3. CONTEXT QUALITY (20% weight)
- **Post-compaction flag**: Session continued after compaction = quality loss
- **Knowledge domain shift**: Sudden change in task types
- **Session age**: Very long sessions = accumulated drift

### 4. BEHAVIORAL INDICATORS (15% weight)
- **Tool retry rate**: Same tool called 3+ times consecutively
- **Read without action**: Files read but not edited/used
- **Deployment thrashing**: Multiple restarts in a short period

### 5. TASK COMPLETION (10% weight)
- **Time since last success**: How long since the last error-free completion
- **Success rate trend**: Declining completion rate

---

## Implementation

### File: `scripts/framework-components/ContextPressureMonitor.js`

Add the following methods to the class:

```javascript
/**
 * Calculate degradation score (0-100)
 * Combines behavioral and quality metrics
 */
async calculateDegradationScore() {
  const scores = {
    errorPattern: await this._analyzeErrorPatterns(),     // 30%
    frameworkFade: await this._detectFrameworkFade(),     // 25%
    contextQuality: await this._assessContextQuality(),   // 20%
    behavioral: await this._analyzeBehavior(),            // 15%
    taskCompletion: await this._measureTaskCompletion()   // 10%
  };

  // Round once so the reported score and its level always agree
  const degradationScore = Math.round(
    scores.errorPattern * 0.30 +
    scores.frameworkFade * 0.25 +
    scores.contextQuality * 0.20 +
    scores.behavioral * 0.15 +
    scores.taskCompletion * 0.10
  );

  return {
    score: degradationScore,
    level: this._getDegradationLevel(degradationScore),
    breakdown: scores,
    recommendation: this._getRecommendation(degradationScore)
  };
}

/**
 * Analyze error patterns (returns 0-100)
 */
async _analyzeErrorPatterns() {
  // Fetch unfiltered logs: a streak can only be broken by non-error entries,
  // so counting over an errors-only list would treat every error as consecutive
  const recentLogs = await this.memoryProxy.getRecentAuditLogs({ limit: 50 });
  const isError = (e) => Boolean(e.decision?.blocked || e.decision?.errors);
  const recentErrors = recentLogs.filter(isError);

  // Consecutive errors: longest unbroken run of error entries
  let maxConsecutive = 0;
  let currentStreak = 0;
  for (const e of recentLogs) {
    if (isError(e)) {
      currentStreak++;
      maxConsecutive = Math.max(maxConsecutive, currentStreak);
    } else {
      currentStreak = 0;
    }
  }

  // Error clustering (3+ errors in 10-minute windows)
  const errorClusters = this._detectErrorClusters(recentErrors, 10 * 60 * 1000);

  // Error severity weighting (blocked=3, other errors=1)
  const severityScore = recentErrors.reduce((sum, e) => {
    if (e.decision?.blocked) return sum + 3;
    if (e.decision?.errors) return sum + 1;
    return sum;
  }, 0);

  // Combine metrics, each capped at 100
  const consecutiveScore = Math.min(maxConsecutive * 10, 100);
  const clusterScore = Math.min(errorClusters.length * 15, 100);
  const severityScoreNormalized = Math.min(severityScore * 2, 100);

  return Math.round((consecutiveScore + clusterScore + severityScoreNormalized) / 3);
}

/**
 * Detect framework fade (returns 0-100)
 */
async _detectFrameworkFade() {
  const criticalComponents = [
    'MetacognitiveVerifier',
    'BoundaryEnforcer',
    'PluralisticDeliberationOrchestrator'
  ];

  const componentActivity = await Promise.all(
    criticalComponents.map(async (service) => {
      const logs = await this.memoryProxy.getRecentAuditLogs({
        limit: 1,
        filter: { service }
      });

      if (logs.length === 0) return { service, ageMinutes: Infinity };

      // new Date(...) accepts epoch milliseconds, ISO strings, and Date objects
      const age = (Date.now() - new Date(logs[0].timestamp).getTime()) / 1000 / 60;
      return { service, ageMinutes: age };
    })
  );

  // Score: minutes since last use
  // 0-30 min  = 0 points
  // 30-60 min = 50 points
  // 60+ min   = 100 points (also used when the component was never invoked)
  const scores = componentActivity.map((c) => {
    if (c.ageMinutes < 30) return 0;
    if (c.ageMinutes < 60) return 50;
    return 100;
  });

  return Math.round(scores.reduce((a, b) => a + b, 0) / scores.length);
}

/**
 * Assess context quality (returns 0-100)
 */
async _assessContextQuality() {
  const session = await this.memoryProxy.getSessionState();

  let score = 0;

  // Post-compaction flag (major degradation indicator)
  if (session.autoCompactions && session.autoCompactions.length > 0) {
    const lastCompaction = session.autoCompactions[session.autoCompactions.length - 1];
    const timeSinceCompaction =
      (Date.now() - new Date(lastCompaction.timestamp).getTime()) / 1000 / 60;

    // Within 60 minutes of compaction = high risk
    if (timeSinceCompaction < 60) {
      score += 60;
    } else if (timeSinceCompaction < 120) {
      score += 30;
    }
  }

  // Session age (very long sessions accumulate drift)
  const sessionAge =
    (Date.now() - new Date(session.startTime).getTime()) / 1000 / 60 / 60; // hours
  if (sessionAge > 6) score += 40;
  else if (sessionAge > 4) score += 20;

  return Math.min(score, 100);
}

/**
 * Analyze behavioral indicators (returns 0-100)
 */
async _analyzeBehavior() {
  const recentActions = await this.memoryProxy.getRecentAuditLogs({ limit: 50 });

  // Tool retry rate: count positions where the same tool appears 3 times in a row
  const toolCalls = recentActions.map((a) => a.metadata?.tool);
  let retries = 0;
  for (let i = 2; i < toolCalls.length; i++) {
    // Skip entries without a tool name so that missing metadata is not
    // mistaken for three identical calls
    if (toolCalls[i] &&
        toolCalls[i] === toolCalls[i - 1] &&
        toolCalls[i] === toolCalls[i - 2]) {
      retries++;
    }
  }

  return Math.min(retries * 20, 100);
}

/**
 * Measure task completion (returns 0-100)
 */
async _measureTaskCompletion() {
  // Fetch the last 20 actions unfiltered; an errors-only query would return
  // up to 20 errors regardless of how many actions succeeded in between
  const recentActions = await this.memoryProxy.getRecentAuditLogs({ limit: 20 });
  if (recentActions.length === 0) return 0;

  // Simple metric: error rate in the last 20 actions
  const errorCount = recentActions.filter(
    (a) => a.decision?.blocked || a.decision?.errors
  ).length;

  return Math.round((errorCount / recentActions.length) * 100);
}

/**
 * Get degradation level
 */
_getDegradationLevel(score) {
  if (score >= 60) return 'CRITICAL';
  if (score >= 40) return 'HIGH';
  if (score >= 20) return 'MODERATE';
  return 'LOW';
}

/**
 * Get recommendation
 */
_getRecommendation(score) {
  if (score >= 60) {
    return 'RECOMMEND SESSION RESTART - Quality severely degraded';
  }
  if (score >= 40) {
    return 'WARN USER - Performance declining, consider checkpoint review';
  }
  return 'Monitoring - No action needed';
}
```

---

## Integration Points

### 1. Add to Pressure Analysis

Modify `analyzeContextPressure()` to include the degradation score:

```javascript
async analyzeContextPressure(tokenCount = null, tokenBudget = 200000) {
  // ... existing metrics ...

  const degradation = await this.calculateDegradationScore();

  return {
    level: this._determineLevel(overallScore),
    score: overallScore,
    degradation: degradation.score,
    degradationLevel: degradation.level,
    degradationBreakdown: degradation.breakdown,
    recommendation: degradation.recommendation,
    // ... rest of response
  };
}
```

### 2. Token Checkpoint Reporting

Update checkpoint messages to include degradation:

```
📊 Context Pressure: NORMAL (4%) | Degradation: HIGH (45%) | Tokens: 50000/200000
⚠️ WARNING: Framework fade detected - MetacognitiveVerifier unused for 45 minutes
```

### 3. Framework Stats (ffs)

Add a degradation section to `scripts/framework-stats.js`:

```
⚠️ DEGRADATION ANALYSIS
  Score: 45%
  Level: HIGH
  Breakdown:
    • Error patterns: 30%
    • Framework fade: 60% ← CRITICAL
    • Context quality: 40%
    • Behavioral: 20%
    • Task completion: 15%
  Recommendation: Consider checkpoint review
```

---

## Testing

### Test Case 1: Framework Fade Detection
- Session runs for 2 hours without invoking MetacognitiveVerifier
- Degradation score should be HIGH (40%+)

### Test Case 2: Post-Compaction
- Session continues after compaction
- Context quality score should be 60+
- Overall degradation should be HIGH

### Test Case 3: Error Clustering
- 5 consecutive errors occur
- Error pattern score should be 50+
- User should see a warning

---

## Implementation Steps

1. **Add degradation methods** to ContextPressureMonitor.js
2. **Update analyzeContextPressure()** to calculate degradation
3. **Modify checkpoint reporting** to show degradation
4. **Update framework-stats.js** to display the breakdown
5. **Test with real session data**
6. **Document in CLAUDE_Tractatus_Maintenance_Guide.md**

---

## Success Criteria

- ✅ Degradation score catches "random" performance drops
- ✅ Framework fade detected within 30 minutes
- ✅ Post-compaction quality loss flagged immediately
- ✅ User warned before performance becomes unacceptable
- ✅ False positive rate < 5%

---

**Estimated Implementation Time**: 4-6 hours

**Priority**: HIGH (governance integrity issue)

**Framework Audit ID**: 690964aa9eac658bf5f14cb4
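
---

## Helper Sketch: `_detectErrorClusters()`

`_analyzeErrorPatterns()` calls `this._detectErrorClusters(recentErrors, 10 * 60 * 1000)`, but the plan does not define that helper. The following is a minimal sketch of one way it could work, not the framework's actual implementation. It is written as a standalone function named `detectErrorClusters` for illustration, and it assumes that each audit log entry carries a `timestamp` that `new Date()` can parse and that a cluster means `minErrors` or more errors within `windowMs` of the first error in the burst. The `minErrors` parameter and the shape of the returned cluster objects are hypothetical.

```javascript
// Hypothetical helper: groups error entries into non-overlapping bursts.
// Assumes e.timestamp is epoch milliseconds, an ISO string, or a Date.
function detectErrorClusters(errors, windowMs, minErrors = 3) {
  // Normalize to epoch milliseconds, drop unparseable values, sort oldest first.
  const times = errors
    .map((e) => new Date(e.timestamp).getTime())
    .filter((t) => Number.isFinite(t))
    .sort((a, b) => a - b);

  const clusters = [];
  let i = 0;
  while (i < times.length) {
    // Extend j to cover every error within windowMs of the error at i.
    let j = i;
    while (j + 1 < times.length && times[j + 1] - times[i] <= windowMs) {
      j++;
    }
    const count = j - i + 1;
    if (count >= minErrors) {
      clusters.push({ start: times[i], end: times[j], count });
      i = j + 1; // Clusters do not overlap: resume after this burst.
    } else {
      i++;
    }
  }
  return clusters;
}

// Usage: three errors within 4 minutes, then an isolated one an hour later.
const minute = 60 * 1000;
const sample = [
  { timestamp: 0 },
  { timestamp: 2 * minute },
  { timestamp: 4 * minute },
  { timestamp: 64 * minute },
];
const clusters = detectErrorClusters(sample, 10 * minute);
console.log(clusters.length); // 1
```

Because `_analyzeErrorPatterns()` only reads `errorClusters.length`, any implementation that returns an array with one element per burst would fit the scoring code. Inside the class, the same body could be exposed as a `_detectErrorClusters(errors, windowMs)` method.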