feat: add inst_019 for improved context pressure monitoring

## Problem Identified
ContextPressureMonitor reports "NORMAL" (6.7%) pressure while frequent
compaction events occur. User observed disconnect between pressure scores
and actual session sustainability.

## Root Cause
Current monitor only tracks response token generation, NOT total context
window consumption:
-  Tracks: Response tokens, message counts
-  Missing: Tool result sizes, system prompts, function schemas

## Example from This Session
- Reported tokens: ~50k (25% of budget)
- Actual context used: ~90k+ tokens
  - instruction-history.json read twice (12k tokens)
  - concurrent-session-architecture doc (large)
  - Multiple bash outputs
  - System prompts and reminders

Result: Compaction at "NORMAL" pressure

## inst_019 Requirements
Track total context window consumption:
- Response tokens (current)
- User messages (current)
- Tool result sizes (NEW - estimate from file reads, grep, bash)
- System overhead (NEW - ~5k tokens baseline)
- Compaction risk prediction (NEW - warn when >70% context used)

## Implementation Timeline
- Priority: MEDIUM (doesn't block current work)
- Phase: 4 or 6 (validation engine or polish phase)
- Complexity: 4-6 hours (requires instrumentation of tool calls)

## Impact
- Better compaction prediction
- Earlier handoff warnings
- More accurate pressure reporting
- Reduced unexpected session terminations

Quadrant: OPERATIONAL | Persistence: HIGH | Session: 2025-10-10-api-memory-transition

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
TheFlow 2025-10-10 23:42:01 +13:00
parent 3c9919ee9b
commit 51f6712090

View file

@ -335,20 +335,48 @@
},
"active": true,
"notes": "CORRECTED 2025-10-10 - User clarified: 'Development tool' is the CORRECT classification (Tractatus helps developers build projects), not a limitation. The restriction is about honest testing/validation status, not tool category. Once adequately tested, 'production-ready development tool' is appropriate. Previous version incorrectly treated 'development framework' as early-stage status. Framework failure 2025-10-09: Claude claimed 'production-ready' without testing evidence."
},
{
"id": "inst_019",
"text": "ContextPressureMonitor MUST account for total context window consumption, not just response token counts. Tool results (file reads, grep outputs, bash results) can consume massive context (6k+ tokens per large file read). System prompts, function schemas, and cumulative tool results significantly increase actual context usage. When compaction events occur frequently despite 'NORMAL' pressure scores, this indicates critical underestimation. Enhanced monitoring should track: response tokens, user messages, tool result sizes, system overhead, and predict compaction risk when context exceeds 70% of window. Implement improved pressure scoring in Phase 4 or Phase 6.",
"timestamp": "2025-10-10T23:45:00Z",
"quadrant": "OPERATIONAL",
"persistence": "HIGH",
"temporal_scope": "PROJECT",
"verification_required": "MANDATORY",
"explicitness": 1.0,
"source": "user",
"session_id": "2025-10-10-api-memory-transition",
"parameters": {
"current_limitation": "underestimates_actual_context",
"missing_metrics": ["tool_result_sizes", "system_prompt_overhead", "function_schema_overhead", "cumulative_context"],
"symptom": "frequent_compaction_despite_normal_scores",
"required_tracking": {
"response_tokens": "current tracking",
"user_messages": "current tracking",
"tool_results": "NEW - size estimation needed",
"system_overhead": "NEW - approximate 5k tokens",
"compaction_risk": "NEW - predict when >70% context used"
},
"enhancement_phase": ["Phase 4", "Phase 6"],
"priority": "MEDIUM"
},
"active": true,
"notes": "IDENTIFIED 2025-10-10 - User observed frequent compaction events despite ContextPressureMonitor reporting 'NORMAL' (6.7%) pressure at 50k token checkpoint. Actual context consumption much higher due to tool results (reading instruction-history.json twice = 12k tokens, concurrent-session doc = large, multiple bash outputs). Current monitor only accurately tracks response generation, not total context window usage. This gap causes unexpected compactions and poor handoff timing. API Memory may reduce impact but won't eliminate root cause."
}
],
"stats": {
"total_instructions": 18,
"active_instructions": 18,
"total_instructions": 19,
"active_instructions": 19,
"by_quadrant": {
"STRATEGIC": 6,
"OPERATIONAL": 4,
"OPERATIONAL": 5,
"TACTICAL": 1,
"SYSTEM": 7,
"STOCHASTIC": 0
},
"by_persistence": {
"HIGH": 16,
"HIGH": 17,
"MEDIUM": 2,
"LOW": 0,
"VARIABLE": 0