tractatus/docs/plans/DEGRADATION_SCORE_IMPLEMENTATION.md
TheFlow b5d17f9dbc feat: Add performance degradation detection to context pressure monitoring
Implements 5-metric weighted degradation score to detect performance issues:
- Error patterns (30%): Consecutive errors, clustering, severity
- Framework fade (25%): Component staleness detection
- Context quality (20%): Post-compaction degradation, session age
- Behavioral indicators (15%): Tool retry patterns
- Task completion (10%): Recent error rate

Degradation levels: LOW (<20%), MODERATE (20-40%), HIGH (40-60%), CRITICAL (60%+)

Displayed in 'ffs' command output with breakdown and recommendations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-04 16:30:13 +13:00


# Degradation Score Implementation Plan
**Problem**: The pressure gauge showed 3% while performance was severely degraded
**Root Cause**: Missing behavioral/quality metrics
**Framework Audit**: 690964aa9eac658bf5f14cb4
---
## Missing Metrics Identified
### 1. ERROR PATTERN ANALYSIS (30% weight)
- **Consecutive errors**: Track errors in sequence
- **Error clustering**: Detect error bursts (3+ in 10-minute window)
- **Error severity**: Weight by impact (blocked=3, warning=1)
- **Repeated failures**: Same tool/operation failing multiple times
### 2. FRAMEWORK FADE (25% weight)
- **Component staleness**: Time since MetacognitiveVerifier last used
- **BoundaryEnforcer usage**: Should be invoked for values decisions
- **Framework invocation rate**: Declining usage = fade
### 3. CONTEXT QUALITY (20% weight)
- **Post-compaction flag**: Session continued after compaction = quality loss
- **Knowledge domain shift**: Sudden change in task types
- **Session age**: Very long sessions = accumulated drift
### 4. BEHAVIORAL INDICATORS (15% weight)
- **Tool retry rate**: Same tool called 3+ times consecutively
- **Read without action**: Files read but not edited/used
- **Deployment thrashing**: Multiple restarts in short period
### 5. TASK COMPLETION (10% weight)
- **Time since last success**: How long since error-free completion
- **Success rate trend**: Declining completion rate
---
## Implementation
### File: `scripts/framework-components/ContextPressureMonitor.js`
Add new method:
```javascript
  /**
   * Calculate degradation score (0-100)
   * Combines behavioral and quality metrics
   */
  async calculateDegradationScore() {
    const scores = {
      errorPattern: await this._analyzeErrorPatterns(),    // 30%
      frameworkFade: await this._detectFrameworkFade(),    // 25%
      contextQuality: await this._assessContextQuality(),  // 20%
      behavioral: await this._analyzeBehavior(),           // 15%
      taskCompletion: await this._measureTaskCompletion()  // 10%
    };

    const degradationScore =
      scores.errorPattern * 0.30 +
      scores.frameworkFade * 0.25 +
      scores.contextQuality * 0.20 +
      scores.behavioral * 0.15 +
      scores.taskCompletion * 0.10;

    return {
      score: Math.round(degradationScore),
      level: this._getDegradationLevel(degradationScore),
      breakdown: scores,
      recommendation: this._getRecommendation(degradationScore)
    };
  }
  /**
   * Analyze error patterns (returns 0-100)
   */
  async _analyzeErrorPatterns() {
    // Fetch unfiltered logs so successful actions can break an error streak;
    // a pre-filtered error-only list would always appear consecutive.
    const recentLogs = await this.memoryProxy.getRecentAuditLogs({ limit: 50 });
    const isError = (e) => Boolean(e.decision?.blocked || e.decision?.errors);
    const recentErrors = recentLogs.filter(isError);

    // Longest run of consecutive errors
    let maxConsecutive = 0;
    let currentStreak = 0;
    for (const e of recentLogs) {
      if (isError(e)) {
        currentStreak++;
        maxConsecutive = Math.max(maxConsecutive, currentStreak);
      } else {
        currentStreak = 0;
      }
    }

    // Error clustering (3+ errors in 10-minute windows)
    const errorClusters = this._detectErrorClusters(recentErrors, 10 * 60 * 1000);

    // Error severity weighting (blocked = 3, warning = 1)
    const severityScore = recentErrors.reduce((sum, e) => {
      if (e.decision?.blocked) return sum + 3;
      return sum + 1;
    }, 0);

    // Combine metrics, each normalized to 0-100
    const consecutiveScore = Math.min(maxConsecutive * 10, 100);
    const clusterScore = Math.min(errorClusters.length * 15, 100);
    const severityScoreNormalized = Math.min(severityScore * 2, 100);
    return Math.round((consecutiveScore + clusterScore + severityScoreNormalized) / 3);
  }
  /**
   * Detect framework fade (returns 0-100)
   */
  async _detectFrameworkFade() {
    const criticalComponents = [
      'MetacognitiveVerifier',
      'BoundaryEnforcer',
      'PluralisticDeliberationOrchestrator'
    ];

    const componentActivity = await Promise.all(
      criticalComponents.map(async (service) => {
        const logs = await this.memoryProxy.getRecentAuditLogs({
          limit: 1,
          filter: { service }
        });
        if (logs.length === 0) return { service, ageMinutes: Infinity };
        const age = (Date.now() - logs[0].timestamp) / 1000 / 60;
        return { service, ageMinutes: age };
      })
    );

    // Score by minutes since last use:
    //   0-30 min  =   0 points
    //   30-60 min =  50 points
    //   60+ min   = 100 points (including never used)
    const scores = componentActivity.map(c => {
      if (c.ageMinutes === Infinity) return 100;
      if (c.ageMinutes < 30) return 0;
      if (c.ageMinutes < 60) return 50;
      return 100;
    });
    return Math.round(scores.reduce((a, b) => a + b, 0) / scores.length);
  }
  /**
   * Assess context quality (returns 0-100)
   */
  async _assessContextQuality() {
    const session = await this.memoryProxy.getSessionState();
    let score = 0;

    // Post-compaction flag (major degradation indicator)
    if (session.autoCompactions && session.autoCompactions.length > 0) {
      const lastCompaction = session.autoCompactions[session.autoCompactions.length - 1];
      const timeSinceCompaction = (Date.now() - lastCompaction.timestamp) / 1000 / 60;
      // Within 60 minutes of compaction = high risk
      if (timeSinceCompaction < 60) {
        score += 60;
      } else if (timeSinceCompaction < 120) {
        score += 30;
      }
    }

    // Session age (very long sessions accumulate drift)
    const sessionAgeHours = (Date.now() - session.startTime) / 1000 / 60 / 60;
    if (sessionAgeHours > 6) score += 40;
    else if (sessionAgeHours > 4) score += 20;

    return Math.min(score, 100);
  }
  /**
   * Analyze behavioral indicators (returns 0-100)
   */
  async _analyzeBehavior() {
    const recentActions = await this.memoryProxy.getRecentAuditLogs({ limit: 50 });

    // Tool retry rate: same tool called 3+ times consecutively.
    // The truthiness guard prevents entries with missing tool metadata
    // (undefined === undefined) from counting as retries.
    const toolCalls = recentActions.map(a => a.metadata?.tool);
    let retries = 0;
    for (let i = 2; i < toolCalls.length; i++) {
      if (toolCalls[i] && toolCalls[i] === toolCalls[i - 1] && toolCalls[i] === toolCalls[i - 2]) {
        retries++;
      }
    }
    return Math.min(retries * 20, 100);
  }
  /**
   * Measure task completion (returns 0-100)
   */
  async _measureTaskCompletion() {
    const recentErrors = await this.memoryProxy.getRecentAuditLogs({
      limit: 20,
      filter: { hasError: true }
    });
    // Simple metric: error rate across the last 20 actions
    const errorRate = (recentErrors.length / 20) * 100;
    return Math.round(errorRate);
  }

  /**
   * Map score to degradation level
   */
  _getDegradationLevel(score) {
    if (score >= 60) return 'CRITICAL';
    if (score >= 40) return 'HIGH';
    if (score >= 20) return 'MODERATE';
    return 'LOW';
  }

  /**
   * Map score to recommendation
   */
  _getRecommendation(score) {
    if (score >= 60) {
      return 'RECOMMEND SESSION RESTART - Quality severely degraded';
    }
    if (score >= 40) {
      return 'WARN USER - Performance declining, consider checkpoint review';
    }
    return 'Monitoring - No action needed';
  }
```
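The error-pattern method above calls `_detectErrorClusters()`, which this plan does not define. A minimal sketch, written as a standalone function so it can be tested in isolation (inside the class it would be `_detectErrorClusters(errorLogs, windowMs)`); the millisecond `timestamp` field is an assumption about the audit log shape:

```javascript
// Sketch of the _detectErrorClusters helper. Sorts error logs by timestamp,
// then greedily groups entries that fall within `windowMs` of the first entry
// in the current group; only groups of 3+ errors count as clusters.
function detectErrorClusters(errorLogs, windowMs) {
  const sorted = [...errorLogs].sort((a, b) => a.timestamp - b.timestamp);
  const clusters = [];
  let current = [];
  for (const log of sorted) {
    if (current.length === 0 || log.timestamp - current[0].timestamp <= windowMs) {
      current.push(log);
    } else {
      if (current.length >= 3) clusters.push(current);
      current = [log];
    }
  }
  if (current.length >= 3) clusters.push(current);
  return clusters;
}
```

With a 10-minute window, three errors inside ten minutes form one cluster; an isolated error an hour later does not.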
---
## Integration Points
### 1. Add to Pressure Analysis
Modify `analyzeContextPressure()` to include degradationScore:
```javascript
async analyzeContextPressure(tokenCount = null, tokenBudget = 200000) {
  // ... existing metrics ...

  const degradation = await this.calculateDegradationScore();

  return {
    level: this._determineLevel(overallScore),
    score: overallScore,
    degradation: degradation.score,
    degradationLevel: degradation.level,
    degradationBreakdown: degradation.breakdown,
    recommendation: degradation.recommendation,
    // ... rest of response
  };
}
```
### 2. Token Checkpoint Reporting
Update checkpoint messages to include degradation:
```
📊 Context Pressure: NORMAL (4%) | Degradation: HIGH (45%) | Tokens: 50000/200000
⚠️ WARNING: Framework fade detected - MetacognitiveVerifier unused for 45 minutes
```
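The checkpoint line could be assembled by a small helper; a sketch assuming the enriched result shape from `analyzeContextPressure()` (`level`, `score`, `degradation`, `degradationLevel`), with the helper name and the caller-supplied token counts as assumptions:

```javascript
// Hypothetical helper that builds the checkpoint status line from the
// enriched pressure result plus token counts passed in by the caller.
function formatCheckpointLine(pressure, tokenCount, tokenBudget) {
  return (
    `📊 Context Pressure: ${pressure.level} (${pressure.score}%) | ` +
    `Degradation: ${pressure.degradationLevel} (${pressure.degradation}%) | ` +
    `Tokens: ${tokenCount}/${tokenBudget}`
  );
}
```

Calling it with `{ level: 'NORMAL', score: 4, degradation: 45, degradationLevel: 'HIGH' }`, `50000`, `200000` reproduces the first line of the example above.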
### 3. Framework Stats (ffs)
Add degradation section to `scripts/framework-stats.js`:
```
⚠️ DEGRADATION ANALYSIS
Score: 45%
Level: HIGH
Breakdown:
• Error patterns: 30%
• Framework fade: 60% ← CRITICAL
• Context quality: 40%
• Behavioral: 20%
• Task completion: 15%
Recommendation: Consider checkpoint review
```
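One way to render this section in `framework-stats.js`, assuming the result object returned by `calculateDegradationScore()` (`score`, `level`, `breakdown`, `recommendation`); the function name and the 60-point CRITICAL flag threshold are assumptions for illustration:

```javascript
// Hypothetical renderer for the ffs degradation section. Flags any
// sub-score at 60+ with the "← CRITICAL" marker shown in the example.
function formatDegradationSection(d) {
  const labels = {
    errorPattern: 'Error patterns',
    frameworkFade: 'Framework fade',
    contextQuality: 'Context quality',
    behavioral: 'Behavioral',
    taskCompletion: 'Task completion'
  };
  const lines = [
    '⚠️ DEGRADATION ANALYSIS',
    `Score: ${d.score}%`,
    `Level: ${d.level}`,
    'Breakdown:'
  ];
  for (const [key, label] of Object.entries(labels)) {
    const value = d.breakdown[key];
    lines.push(`• ${label}: ${value}%${value >= 60 ? ' ← CRITICAL' : ''}`);
  }
  lines.push(`Recommendation: ${d.recommendation}`);
  return lines.join('\n');
}
```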
---
## Testing
### Test Case 1: Framework Fade Detection
- Session runs for 2 hours without MetacognitiveVerifier
- Degradation score should be HIGH (40%+)
### Test Case 2: Post-Compaction
- Session continues after compaction
- Context quality score should be 60+
- Overall degradation should be HIGH
### Test Case 3: Error Clustering
- 5 consecutive errors occur
- Error pattern score should be 50+
- User should see warning
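The level expectations in these test cases can be spot-checked against the threshold boundaries with a standalone copy of the mapping (re-declared here so it runs outside the class):

```javascript
// Standalone copy of the _getDegradationLevel thresholds from the plan:
// LOW (<20), MODERATE (20-40), HIGH (40-60), CRITICAL (60+).
function getDegradationLevel(score) {
  if (score >= 60) return 'CRITICAL';
  if (score >= 40) return 'HIGH';
  if (score >= 20) return 'MODERATE';
  return 'LOW';
}
```

A 45% score (Test Case 1's 40%+ expectation) maps to HIGH, and a post-compaction spike that pushes the weighted total to 60 crosses into CRITICAL.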
---
## Implementation Steps
1. **Add degradation methods** to ContextPressureMonitor.js
2. **Update analyzeContextPressure()** to calculate degradation
3. **Modify checkpoint reporting** to show degradation
4. **Update framework-stats.js** to display breakdown
5. **Test with real session data**
6. **Document in CLAUDE_Tractatus_Maintenance_Guide.md**
---
## Success Criteria
- ✅ Degradation score catches "random" performance drops
- ✅ Framework fade detected within 30 minutes
- ✅ Post-compaction quality loss flagged immediately
- ✅ User warned before performance becomes unacceptable
- ✅ False positive rate < 5%
---
**Estimated Implementation Time**: 4-6 hours
**Priority**: HIGH (governance integrity issue)
**Framework Audit ID**: 690964aa9eac658bf5f14cb4