tractatus/docs/plans/DEGRADATION_SCORE_IMPLEMENTATION.md
TheFlow b5d17f9dbc feat: Add performance degradation detection to context pressure monitoring
Implements 5-metric weighted degradation score to detect performance issues:
- Error patterns (30%): Consecutive errors, clustering, severity
- Framework fade (25%): Component staleness detection
- Context quality (20%): Post-compaction degradation, session age
- Behavioral indicators (15%): Tool retry patterns
- Task completion (10%): Recent error rate

Degradation levels: LOW (<20%), MODERATE (20-40%), HIGH (40-60%), CRITICAL (60%+)

Displayed in 'ffs' command output with breakdown and recommendations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-04 16:30:13 +13:00


Degradation Score Implementation Plan

Problem: Pressure gauge showed 3% but performance severely degraded
Root Cause: Missing behavioral/quality metrics
Framework Audit: 690964aa9eac658bf5f14cb4


Missing Metrics Identified

1. ERROR PATTERN ANALYSIS (30% weight)

  • Consecutive errors: Track errors in sequence
  • Error clustering: Detect error bursts (3+ in 10-minute window)
  • Error severity: Weight by impact (blocked=3, warning=1)
  • Repeated failures: Same tool/operation failing multiple times
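
Of these four signals, "repeated failures" is the only one the _analyzeErrorPatterns() sketch further down does not yet cover. A minimal sketch of how it could be counted, assuming each audit entry carries a hasError flag and a metadata.tool field (both field names are assumptions, not confirmed API):

```javascript
// Count distinct tools that failed repeatedly in recent audit logs.
// `hasError` and `metadata.tool` are assumed field names.
function countRepeatedFailures(entries, threshold = 2) {
  const failures = new Map();
  for (const e of entries) {
    if (!e.hasError || !e.metadata?.tool) continue;
    failures.set(e.metadata.tool, (failures.get(e.metadata.tool) || 0) + 1);
  }
  let repeated = 0;
  for (const count of failures.values()) {
    if (count >= threshold) repeated += 1;
  }
  return repeated; // number of tools failing `threshold`+ times
}
```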

2. FRAMEWORK FADE (25% weight)

  • Component staleness: Time since MetacognitiveVerifier last used
  • BoundaryEnforcer usage: Should be invoked for values decisions
  • Framework invocation rate: Declining usage = fade

3. CONTEXT QUALITY (20% weight)

  • Post-compaction flag: Session continued after compaction = quality loss
  • Knowledge domain shift: Sudden change in task types
  • Session age: Very long sessions = accumulated drift
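
A possible shape for the "knowledge domain shift" signal, comparing the task types in the latest window against the preceding one (how task types are extracted is an assumption; the real session state may expose this differently):

```javascript
// 0 = same domain mix between consecutive windows, 100 = complete shift.
function domainShiftScore(taskTypes, windowSize = 10) {
  const recent = new Set(taskTypes.slice(-windowSize));
  const earlier = new Set(taskTypes.slice(-2 * windowSize, -windowSize));
  if (recent.size === 0 || earlier.size === 0) return 0; // not enough history
  let overlap = 0;
  for (const t of recent) if (earlier.has(t)) overlap += 1;
  const jaccard = overlap / (recent.size + earlier.size - overlap);
  return Math.round((1 - jaccard) * 100);
}
```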

4. BEHAVIORAL INDICATORS (15% weight)

  • Tool retry rate: Same tool called 3+ times consecutively
  • Read without action: Files read but not edited/used
  • Deployment thrashing: Multiple restarts in short period
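
The "deployment thrashing" check can be sketched as a sliding window over restart timestamps (the window and threshold here are illustrative defaults, not tuned values):

```javascript
// True when `threshold` or more restarts fall inside one `windowMs` span.
// Timestamps are assumed to be in milliseconds.
function detectThrashing(restartTimestamps, windowMs = 10 * 60 * 1000, threshold = 3) {
  const sorted = [...restartTimestamps].sort((a, b) => a - b);
  for (let i = 0; i + threshold - 1 < sorted.length; i++) {
    if (sorted[i + threshold - 1] - sorted[i] <= windowMs) return true;
  }
  return false;
}
```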

5. TASK COMPLETION (10% weight)

  • Time since last success: How long since error-free completion
  • Success rate trend: Declining completion rate

Implementation

File: scripts/framework-components/ContextPressureMonitor.js

Add new method:

/**
 * Calculate degradation score (0-100)
 * Combines behavioral and quality metrics
 */
async calculateDegradationScore() {
  const scores = {
    errorPattern: await this._analyzeErrorPatterns(),      // 30%
    frameworkFade: await this._detectFrameworkFade(),       // 25%
    contextQuality: await this._assessContextQuality(),     // 20%
    behavioral: await this._analyzeBehavior(),              // 15%
    taskCompletion: await this._measureTaskCompletion()     // 10%
  };

  const degradationScore =
    scores.errorPattern * 0.30 +
    scores.frameworkFade * 0.25 +
    scores.contextQuality * 0.20 +
    scores.behavioral * 0.15 +
    scores.taskCompletion * 0.10;

  return {
    score: Math.round(degradationScore),
    level: this._getDegradationLevel(degradationScore),
    breakdown: scores,
    recommendation: this._getRecommendation(degradationScore)
  };
}

/**
 * Analyze error patterns (returns 0-100)
 */
async _analyzeErrorPatterns() {
  const recentErrors = await this.memoryProxy.getRecentAuditLogs({
    limit: 50,
    filter: { hasError: true }
  });

  // Longest run of consecutive errors
  let maxConsecutive = 0;
  let currentStreak = 0;

  recentErrors.forEach((e) => {
    if (e.decision?.blocked || e.decision?.errors) {
      currentStreak++;
      maxConsecutive = Math.max(maxConsecutive, currentStreak);
    } else {
      currentStreak = 0;
    }
  });

  // Error clustering (3+ errors in 10-minute windows)
  const errorClusters = this._detectErrorClusters(recentErrors, 10 * 60 * 1000);

  // Error severity weighting
  const severityScore = recentErrors.reduce((sum, e) => {
    if (e.decision?.blocked) return sum + 3;
    if (e.decision?.errors) return sum + 1;
    return sum;
  }, 0);

  // Combine metrics
  const consecutiveScore = Math.min(maxConsecutive * 10, 100);
  const clusterScore = Math.min(errorClusters.length * 15, 100);
  const severityScoreNormalized = Math.min(severityScore * 2, 100);

  return Math.round((consecutiveScore + clusterScore + severityScoreNormalized) / 3);
}
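
The _detectErrorClusters() helper called above is not spelled out in this plan; one way it could work, shown as a standalone function over entries with a millisecond timestamp field:

```javascript
// Group errors whose timestamps fall within `windowMs` of the cluster start,
// keeping only groups of 3+ errors (the plan's clustering criterion).
function detectErrorClusters(errors, windowMs) {
  const sorted = [...errors].sort((a, b) => a.timestamp - b.timestamp);
  const clusters = [];
  let current = [];
  for (const e of sorted) {
    if (current.length === 0 || e.timestamp - current[0].timestamp <= windowMs) {
      current.push(e);
    } else {
      if (current.length >= 3) clusters.push(current);
      current = [e];
    }
  }
  if (current.length >= 3) clusters.push(current);
  return clusters;
}
```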

/**
 * Detect framework fade (returns 0-100)
 */
async _detectFrameworkFade() {
  const criticalComponents = [
    'MetacognitiveVerifier',
    'BoundaryEnforcer',
    'PluralisticDeliberationOrchestrator'
  ];

  const componentActivity = await Promise.all(
    criticalComponents.map(async (service) => {
      const logs = await this.memoryProxy.getRecentAuditLogs({
        limit: 1,
        filter: { service }
      });

      if (logs.length === 0) return { service, ageMinutes: Infinity };

      const age = (Date.now() - logs[0].timestamp) / 1000 / 60;
      return { service, ageMinutes: age };
    })
  );

  // Score: minutes since last use
  // 0-30 min = 0 points
  // 30-60 min = 50 points
  // 60+ min = 100 points
  const scores = componentActivity.map(c => {
    if (c.ageMinutes === Infinity) return 100;
    if (c.ageMinutes < 30) return 0;
    if (c.ageMinutes < 60) return 50;
    return 100;
  });

  return Math.round(scores.reduce((a, b) => a + b, 0) / scores.length);
}

/**
 * Assess context quality (returns 0-100)
 */
async _assessContextQuality() {
  const session = await this.memoryProxy.getSessionState();

  let score = 0;

  // Post-compaction flag (major degradation indicator)
  if (session.autoCompactions && session.autoCompactions.length > 0) {
    const lastCompaction = session.autoCompactions[session.autoCompactions.length - 1];
    const timeSinceCompaction = (Date.now() - lastCompaction.timestamp) / 1000 / 60;

    // Within 60 minutes of compaction = high risk
    if (timeSinceCompaction < 60) {
      score += 60;
    } else if (timeSinceCompaction < 120) {
      score += 30;
    }
  }

  // Session age (very long sessions accumulate drift)
  const sessionAge = (Date.now() - session.startTime) / 1000 / 60 / 60; // hours
  if (sessionAge > 6) score += 40;
  else if (sessionAge > 4) score += 20;

  return Math.min(score, 100);
}

/**
 * Analyze behavioral indicators (returns 0-100)
 */
async _analyzeBehavior() {
  const recentActions = await this.memoryProxy.getRecentAuditLogs({ limit: 50 });

  // Tool retry rate
  const toolCalls = recentActions.map(a => a.metadata?.tool);
  let retries = 0;
  for (let i = 2; i < toolCalls.length; i++) {
    if (toolCalls[i] !== undefined &&
        toolCalls[i] === toolCalls[i - 1] &&
        toolCalls[i] === toolCalls[i - 2]) {
      retries++;
    }
  }

  return Math.min(retries * 20, 100);
}

/**
 * Measure task completion (returns 0-100)
 */
async _measureTaskCompletion() {
  const recentErrors = await this.memoryProxy.getRecentAuditLogs({
    limit: 20,
    filter: { hasError: true }
  });

  // Simple metric: error rate in last 20 actions
  const errorRate = (recentErrors.length / 20) * 100;
  return Math.round(errorRate);
}

/**
 * Get degradation level
 */
_getDegradationLevel(score) {
  if (score >= 60) return 'CRITICAL';
  if (score >= 40) return 'HIGH';
  if (score >= 20) return 'MODERATE';
  return 'LOW';
}

/**
 * Get recommendation
 */
_getRecommendation(score) {
  if (score >= 60) {
    return 'RECOMMEND SESSION RESTART - Quality severely degraded';
  }
  if (score >= 40) {
    return 'WARN USER - Performance declining, consider checkpoint review';
  }
  return 'Monitoring - No action needed';
}

Integration Points

1. Add to Pressure Analysis

Modify analyzeContextPressure() to include degradationScore:

async analyzeContextPressure(tokenCount = null, tokenBudget = 200000) {
  // ... existing metrics ...

  const degradation = await this.calculateDegradationScore();

  return {
    level: this._determineLevel(overallScore),
    score: overallScore,
    degradation: degradation.score,
    degradationLevel: degradation.level,
    degradationBreakdown: degradation.breakdown,
    recommendation: degradation.recommendation,
    // ... rest of response
  };
}

2. Token Checkpoint Reporting

Update checkpoint messages to include degradation:

📊 Context Pressure: NORMAL (4%) | Degradation: HIGH (45%) | Tokens: 50000/200000
⚠️  WARNING: Framework fade detected - MetacognitiveVerifier unused for 45 minutes
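
A formatter along these lines could emit the first checkpoint line from the analyzeContextPressure() result (a sketch only; the fade warning line would additionally need the component-activity data):

```javascript
// Render the one-line checkpoint summary from a pressure-analysis result.
// Expects the fields sketched in the analyzeContextPressure() return value.
function formatCheckpoint(pressure, tokens, budget) {
  return `📊 Context Pressure: ${pressure.level} (${pressure.score}%) | ` +
    `Degradation: ${pressure.degradationLevel} (${pressure.degradation}%) | ` +
    `Tokens: ${tokens}/${budget}`;
}
```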

3. Framework Stats (ffs)

Add degradation section to scripts/framework-stats.js:

⚠️  DEGRADATION ANALYSIS
  Score: 45%
  Level: HIGH
  Breakdown:
    • Error patterns: 45%
    • Framework fade: 60%  ← CRITICAL
    • Context quality: 60%
    • Behavioral: 20%
    • Task completion: 15%
  Recommendation: Consider checkpoint review
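
The weighted roll-up behind this summary can be exercised in isolation; a minimal sketch using the plan's 30/25/20/15/10 weights and level bands:

```javascript
// Combine per-metric scores (each 0-100) into the overall degradation score,
// using the weight split defined in this plan.
const WEIGHTS = {
  errorPattern: 0.30,
  frameworkFade: 0.25,
  contextQuality: 0.20,
  behavioral: 0.15,
  taskCompletion: 0.10
};

function combineDegradationScores(breakdown) {
  const total = Object.entries(WEIGHTS)
    .reduce((sum, [key, weight]) => sum + (breakdown[key] ?? 0) * weight, 0);
  return Math.round(total);
}

// Thresholds mirror the plan's LOW/MODERATE/HIGH/CRITICAL bands.
function degradationLevel(score) {
  if (score >= 60) return 'CRITICAL';
  if (score >= 40) return 'HIGH';
  if (score >= 20) return 'MODERATE';
  return 'LOW';
}
```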

Testing

Test Case 1: Framework Fade Detection

  • Session runs for 2 hours without MetacognitiveVerifier
  • Degradation score should be HIGH (40%+)

Test Case 2: Post-Compaction

  • Session continues after compaction
  • Context quality score should be 60+
  • Overall degradation should be HIGH

Test Case 3: Error Clustering

  • 5 consecutive errors occur
  • Error pattern score should be 50+
  • User should see warning

Implementation Steps

  1. Add degradation methods to ContextPressureMonitor.js
  2. Update analyzeContextPressure() to calculate degradation
  3. Modify checkpoint reporting to show degradation
  4. Update framework-stats.js to display breakdown
  5. Test with real session data
  6. Document in CLAUDE_Tractatus_Maintenance_Guide.md

Success Criteria

  • Degradation score catches "random" performance drops
  • Framework fade detected within 30 minutes
  • Post-compaction quality loss flagged immediately
  • User warned before performance becomes unacceptable
  • False positive rate < 5%

Estimated Implementation Time: 4-6 hours
Priority: HIGH (governance integrity issue)
Framework Audit ID: 690964aa9eac658bf5f14cb4