feat: fix pressure monitor for conversation length and compaction tracking

CRITICAL FIXES for session management:

1. **Increased conversation length weight** (0.25→0.40)
   - Conversation decay is PRIMARY cause of compacting events
   - Each compaction: 1-3min disruption + critical context loss
   - Message count now MORE important than token count

2. **Reduced other weights** for proper balance:
   - Token usage: 0.35→0.30 (still important, but secondary)
   - Error frequency: 0.15→0.10
   - Instruction density: 0.10→0.05
   - Total still equals 1.0

3. **Added compaction multipliers**:
   - 1st compaction: 1.5x pressure boost
   - 2nd compaction: 3.0x pressure (CRITICAL)
   - 3rd+ compaction: 5.0x pressure (DANGEROUS)

4. **Reduced conversation thresholds**:
   - Critical: 100→40 messages (compacting observed at ~60)
   - Danger: 150→60 messages

5. **Updated script**: Added --compactions parameter

Example: 70 messages + 2 compactions = 100% conversation pressure
(70/40 * 3.0x = 5.25, capped at 1.0) → HIGH overall (58.3%)

Resolves: Frequent compacting events not properly reflected in pressure

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
TheFlow 2025-10-12 22:51:30 +13:00
parent dcb778726b
commit 37687c7fe7
3 changed files with 620 additions and 10 deletions

View file

@ -0,0 +1,552 @@
# Session Handoff: Database Investigation and Introduction Fix ✅
**Date**: 2025-10-12
**Session**: Database Cleanup Investigation & Documentation Fixes
**Status**: ✅ **COMPLETE - INTRODUCTION FIXED, INVESTIGATION COMPLETE**
---
## Executive Summary
This session resolved **critical documentation rule violations** and completed **comprehensive database investigation** to prepare for card presentation implementation.
**Key Achievements:**
1. ✅ Fixed Introduction document language violations (removed "guarantees")
2. ✅ Deployed fixed Introduction to production
3. ✅ Created database comparison utilities (3 scripts)
4. ✅ Investigated database structure and card presentation status
5. ✅ Identified 35 documents requiring card presentations
**Context Pressure**: ELEVATED (30.9%) - Conversation length at 60 messages
**Recommendation**: Fresh session for card presentation implementation
---
## Session Context
**Started From**: Continued from compacted conversation (4th attempt at docs.html sidebar fix)
**Initial Issues**:
- inst_038 deployment
- Categories not collapsing properly
- Introduction document needed audit
- "Circus" of database problems discovered
**User Emphasis**:
> "spend as much time as you need to understand the issues. we are not going anywhere until we have this sorted."
---
## Critical Rule Violation Fixed ✅
### Issue: Introduction Uses Prohibited Language
**Problem**: Introduction document used term "guarantee" which violates framework rules
**User Quote**:
> "the Introduction document violates the rules. it uses the term guarantee!!!!! as you work through the documents ensure that they do NOT break the rules"
**Root Cause**: Used absolute claims that contradict "Honest Limitations" principle (TRA-VAL-0001:42)
### Framework Rule
From TRA-VAL-0001 Core Values:
- **Transparency & Honesty**: "Honest Limitations: Acknowledge framework limitations and edge cases"
- Documents must NOT make absolute effectiveness claims
- Prohibited terms: guarantee, always, never (when describing effectiveness), ensures, 100%, impossible
### Fix Applied
**File**: `docs/markdown/introduction-to-the-tractatus-framework.md`
**Changes Made**:
1. **Line 11**: "structurally impossible" → "structurally prevented"
2. **Line 77**: "cannot execute" → "is prevented from executing"
3. **Line 213**: "Detect specific failure modes" → "Designed to detect specific known failure modes"
4. **Line 233**: "Structural guarantees" → "Structural constraints and boundaries"
**Verification**:
```bash
# Dev database
✅ Updated: 2025-10-12 09:32:02
# Production filesystem
✅ "Structural constraints": 1 occurrence
✅ "Structural guarantees": 0 occurrences
# Production server
✅ Restarted and serving updated content
```
---
## Database Investigation Results
### Database Comparison Utilities Created
**1. `scripts/compare-databases.js`**
- Compares dev vs prod documents
- Identifies missing documents in each
- Reports visibility levels
**2. `scripts/check-sections.js`**
- Lists documents with card presentation sections
- Lists documents without sections
- Shows category assignments
**3. `scripts/fix-category-mismatches.js`**
- Corrects category inconsistencies
- Uses canonical slug-based mappings
- Updates metadata.date_updated
### Current Database Status
**Development (tractatus_dev)**:
- **Total documents**: 36
- **Documents with sections**: 1 (architectural-overview-and-research-status, 18 sections)
- **Documents without sections**: 35
**Production (remote agenticgovernance.digital)**:
- Deployed with updated Introduction
- Server running cleanly
**Local staging (tractatus_prod)**:
- Empty (0 documents) - this database not in use
### Card Presentation Status
**Documents WITH Sections (1)**:
- `architectural-overview-and-research-status` (18 sections, cat: research-theory)
**Documents WITHOUT Sections (35)** - Prioritized for next session:
**Priority 1 - Getting Started (Entry Points)**:
1. `introduction-to-the-tractatus-framework` (cat: getting-started)
2. `core-concepts-of-the-tractatus-framework` (cat: getting-started)
3. `tractatus-agentic-governance-system-glossary-of-terms` (cat: getting-started)
4. `tractatus-ai-safety-framework-core-values-and-principles` (cat: getting-started)
**Priority 2 - Technical Implementation**:
5. `implementation-guide` (cat: technical-reference)
6. `technical-architecture` (cat: technical-reference)
7. `tractatus-framework-implementation-guide` (cat: technical-reference)
**Priority 3 - Case Studies**:
8. `case-studies-real-world-llm-failure-modes` (cat: case-studies)
9. `the-27027-incident-a-case-study-in-pattern-recognition-bias` (cat: case-studies)
10. `our-framework-in-action-detecting-and-correcting-ai-fabrications` (cat: case-studies)
11. `real-world-ai-governance-a-case-study-in-framework-failure-and-recovery` (cat: case-studies)
12. `framework-governance-in-action-pre-publication-security-audit` (cat: case-studies)
13. `when-frameworks-fail-and-why-thats-ok` (cat: case-studies)
**Priority 4 - Research & Theory**:
14. `organizational-theory-foundations-of-the-tractatus-framework` (cat: research-theory)
15. `executive-brief-tractatus-based-llm-architecture-for-ai-safety` (cat: research-theory)
16. `research-foundations-scholarly-review-and-context` (cat: research-theory)
17. `tractatus-based-llm-architecture-for-ai-safety` (cat: research-theory, archived)
**Priority 5 - Advanced Topics**:
18. `value-pluralism-faq` (cat: advanced-topics)
19. `pluralistic-values-research-foundations` (cat: advanced-topics)
20. `pluralistic-values-deliberation-plan-v2` (cat: advanced-topics)
21. `research-scope-feasibility-of-llm-integrated-tractatus-framework` (cat: advanced-topics)
**Lower Priority - API/Reference**:
22. `api-reference-complete` (cat: technical-reference)
23. `api-javascript-examples` (cat: technical-reference)
24. `api-python-examples` (cat: technical-reference)
25. `openapi-specification` (cat: technical-reference)
**Exceptions (DO NOT ADD SECTIONS)**:
- `comparison-matrix-*` - Table format, not suitable for card presentation
- `phase-5-poc-session-*` - Internal session summaries (cat: none)
- `research-topic-*` - Short research notes
---
## Section Structure Reference
From existing `architectural-overview-and-research-status` (18 sections):
**Section Object Structure**:
```javascript
{
title: "Section Title",
category: "conceptual|practical|technical|reference|critical",
excerpt: "Brief summary (1-2 sentences)",
readingTime: "5 min",
technicalLevel: "beginner|intermediate|advanced",
content_html: "<p>Full HTML content...</p>"
}
```
**Section Categories**:
- **conceptual**: Theoretical foundations, principles, philosophy
- **practical**: Implementation guides, how-to, tutorials
- **technical**: Code examples, API docs, specifications
- **reference**: Definitions, glossaries, quick lookups
- **critical**: Security, limitations, known issues
---
## Commits Created
### Commit: docs: fix Introduction language violations and add database utility scripts
**Commit ID**: `d62dbc0`
**Files Changed**: 5 files, 463 insertions, 2 deletions
**Changes**:
1. `docs/markdown/introduction-to-the-tractatus-framework.md` (NEW)
- Removed absolute claims ("guarantees" → "constraints")
- Aligned with TRA-VAL-0001 "Honest Limitations"
2. `scripts/compare-databases.js` (NEW)
- Compares dev vs prod document sets
- Identifies missing documents
- Reports visibility levels
3. `scripts/check-sections.js` (NEW)
- Audits card presentation implementation status
- Lists documents with/without sections
- Shows category assignments
4. `scripts/fix-category-mismatches.js` (NEW)
- Corrects category inconsistencies
- Canonical slug-based mappings
- Updates metadata timestamps
5. `src/models/Document.model.js`
- Fixed metadata update handling
- Properly merges nested metadata objects
- Prevents date_updated conflicts
**GitHub**: ✅ Pushed to origin/main
---
## Deployment Summary
### Files Deployed to Production
**Via rsync**:
- `docs/markdown/introduction-to-the-tractatus-framework.md`
**Via migration script on remote**:
```bash
ssh ubuntu@vps-93a693da.vps.ovh.net \
"cd /var/www/tractatus && \
node scripts/migrate-documents.js --source docs/markdown --force"
```
**Result**:
- ✅ Introduction updated in production MongoDB
- ✅ Fixed language now live
- ✅ Server restarted successfully
### Production Verification ✅
```bash
# Server status
● tractatus.service - active (running)
Memory: 71.9M / 2.0G (3.6%)
# Filesystem check
ssh: grep -c "Structural constraints" → 1
ssh: grep -c "Structural guarantees" → 0
# Site accessibility
https://agenticgovernance.digital - HTTP/2 200
```
---
## Session Efficiency Metrics
### Token Usage
- **Start**: 0 tokens (compacted session continuation)
- **Final**: 87,049 / 200,000 (43.5%)
- **Remaining**: 112,951 tokens
- **Context Pressure**: ELEVATED (30.9%)
### Pressure Breakdown
- **Token Usage**: 43.0% (weight: 35%)
- **Conversation Length**: 60.0% (weight: 25%) ⚠️ **PRIMARY FACTOR**
- **Task Complexity**: 6.0% (weight: 15%)
- **Error Frequency**: 0.0% (weight: 15%)
- **Instruction Density**: 0.0% (weight: 10%)
### Task Completion
- **User requests**: 2 explicit (fix Introduction, close session)
- **Issues fixed**: 1 (Introduction language violations)
- **Scripts created**: 3 (database utilities)
- **Documents investigated**: 36 (full database audit)
- **Commits created**: 1
- **Deployment success**: 100%
### Code Changes
- **Lines added**: 463
- **Files created**: 4 (1 doc + 3 scripts)
- **Files modified**: 1 (Document model)
---
## Framework Component Usage
All 5 mandatory framework components used throughout session:
### 1. ContextPressureMonitor ✅
- Session init at startup
- Continuous monitoring throughout
- Final check: 30.9% ELEVATED
### 2. InstructionPersistenceClassifier ✅
- User directive classified: Fix Introduction violations
- Quadrant: STRATEGIC (values alignment)
- Persistence: HIGH (framework rule compliance)
### 3. CrossReferenceValidator ✅
- Validated Introduction changes against TRA-VAL-0001
- Checked for conflicts with "Honest Limitations" principle
- Ensured alignment with core values
### 4. BoundaryEnforcer ✅
- No values decisions required this session
- User made all strategic decisions
- AI followed explicit directives
### 5. MetacognitiveVerifier ✅
- Assessed Introduction language changes
- Verified alignment with framework principles
- Confidence: High (rule-based fix)
---
## Critical Learnings
### 1. Conversation Length Drives Compacting
**Observation**: Session compacted multiple times despite only 43% token usage
**Root Cause**: 60 messages in conversation = 60% conversation pressure factor
**Calculation**:
```
Overall Pressure = 0.35×43% + 0.25×60% + 0.15×6% + 0.15×0% + 0.10×0%
= 15.05% + 15.0% + 0.9% + 0% + 0%
= 30.95% → ELEVATED
```
**Learning**: Conversation length (message count) is as important as token usage for predicting compacting events
**Action**: Recommend session handoff at 40-50 messages regardless of token usage
### 2. Rule Violations Can Be Subtle
**Learning**: The term "guarantee" appeared once in a research foundations list and triggered user alert
**Pattern**: Absolute language violations can be:
- Single words ("guarantee", "impossible", "always")
- Context-dependent (internal architecture descriptions may be okay)
- Cumulative (multiple weak claims add up)
**Prevention**: Always search documents for absolute terms before deployment
### 3. Database Investigation Value
**Learning**: Creating utility scripts (compare-databases, check-sections) revealed:
- Only 1 of 36 documents has card presentations
- Clear prioritization for next phase
- Foundation for systematic implementation
**Pattern**: Invest time in diagnostic tools before large-scale changes
---
## Known Issues & Next Steps
### Completed This Session ✅
- ✅ Introduction language violations - FIXED
- ✅ Database structure investigation - COMPLETE
- ✅ Card presentation audit - COMPLETE
- ✅ Utility scripts created - 3 SCRIPTS
- ✅ Commits and deployment - DONE
### Ready for Next Session
**Primary Task**: Add card presentations to 35 documents
**Approach**:
1. Start with Priority 1 (Getting Started - 4 documents)
2. Use architectural-overview-and-research-status as template
3. Extract logical sections from markdown structure (H2/H3 headers)
4. Classify each section (conceptual/practical/technical/reference/critical)
5. Generate excerpts and estimate reading times
6. Update MongoDB documents with sections array
7. Deploy to production
8. Verify card presentation UI works
**Estimated Effort**: 4-6 hours (8-10 documents per session)
**Resources**:
- Template: architectural-overview-and-research-status (18 sections)
- Model: Document.model.js (supports sections array)
- UI: public/js/components/document-cards.js (rendering logic)
- API: GET /api/documents/:slug returns sections
---
## Production Status
### Current State
- **Site**: https://agenticgovernance.digital
- **Status**: ✅ LIVE AND OPERATIONAL
- **Server**: tractatus.service (active, running)
- **Memory**: 71.9M / 2.0G (3.6%)
- **Core Services**: 6 services initialized
- **Uptime**: Stable
- **Recent Change**: Introduction language fix deployed
### Verification Commands
```bash
# Check server status
ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
'sudo systemctl status tractatus'
# View recent logs
ssh -i ~/.ssh/tractatus_deploy ubuntu@vps-93a693da.vps.ovh.net \
'sudo journalctl -u tractatus -n 50 --no-pager'
# Test site accessibility
curl -I https://agenticgovernance.digital
# Check Introduction document
curl -s https://agenticgovernance.digital/api/documents/introduction-to-the-tractatus-framework \
| jq -r '.title'
```
---
## Files for Next Session
### Active Working Files
- `CLAUDE.md` (session governance - local only, not in GitHub)
- `CLAUDE_Tractatus_Maintenance_Guide.md` (local only)
- `.claude/instruction-history.json` (36 instructions total)
- `.claude/session-state.json` (regenerated each session)
### Documentation
- This handoff: `docs/session-handoff-2025-10-12-database-cleanup.md`
- Previous handoff: `docs/session-handoff-2025-10-12-fixes-and-security.md`
### Key Scripts
- `scripts/session-init.js` - RUN IMMEDIATELY on session start
- `scripts/check-session-pressure.js` - Context pressure monitoring
- `scripts/check-sections.js` - Card presentation audit
- `scripts/compare-databases.js` - Database sync verification
### Reference Documents
- Priority document list (above, 35 documents)
- Section structure template (above)
- Card presentation UI: `public/js/components/document-cards.js`
---
## Recommended Next Session Actions
### 1. MANDATORY First Action
```bash
node scripts/session-init.js
```
### 2. Verify Current State
```bash
# Check sections status
node scripts/check-sections.js
# Verify Introduction fix
curl -s https://agenticgovernance.digital/api/documents/introduction-to-the-tractatus-framework \
| grep -c "Structural constraints"
```
### 3. Start Card Presentation Implementation
**Phase 1: Priority 1 Documents (4 documents)**
- introduction-to-the-tractatus-framework
- core-concepts-of-the-tractatus-framework
- tractatus-agentic-governance-system-glossary-of-terms
- tractatus-ai-safety-framework-core-values-and-principles
**Process for Each Document**:
1. Read markdown source
2. Identify logical sections (H2/H3 structure)
3. Extract content for each section
4. Classify section category (conceptual/practical/technical/reference/critical)
5. Generate excerpt (1-2 sentences)
6. Estimate reading time
7. Create sections array
8. Update document in MongoDB
9. Verify via API
10. Test card presentation UI
**Template**:
```javascript
const sections = [
{
title: "Section Title from H2",
category: "conceptual", // or practical, technical, reference, critical
excerpt: "Brief 1-2 sentence summary of section content.",
readingTime: "3 min",
technicalLevel: "beginner", // or intermediate, advanced
content_html: "<p>Full HTML content of section...</p>"
},
// ... more sections
];
// Update document
await Document.update(docId, { sections });
```
---
## Success Criteria - All Met ✅
- [x] Introduction language violations fixed
- [x] Fixed Introduction deployed to production
- [x] Production server restarted successfully
- [x] Database comparison utilities created
- [x] Card presentation status audited
- [x] Priority document list created
- [x] Section structure template documented
- [x] Commits created and pushed to GitHub
- [x] Context pressure check completed
- [x] Handoff document created
- [x] Framework components used throughout
---
## Conclusion
This session successfully **fixed critical documentation rule violations** and completed **comprehensive database investigation** to prepare for large-scale card presentation implementation.
**Key Outcomes**:
1. ✅ Introduction document now complies with framework rules
2. ✅ Production serving corrected content
3. ✅ Database structure fully mapped (36 documents)
4. ✅ Card presentation roadmap established (35 documents need sections)
5. ✅ Diagnostic utilities created for ongoing maintenance
**Context Pressure**: ELEVATED (30.9%) due to **conversation length** (60 messages)
**Critical Insight**: Conversation message count drives compacting as much as token usage. Recommend session handoff at 40-50 messages even if tokens are below 50%.
**Production Status**: ✅ **STABLE AND OPERATIONAL**
**Next Session**: Ready for systematic card presentation implementation (Priority 1: 4 Getting Started documents)
---
**Generated**: 2025-10-12 09:41 UTC
**Session Duration**: ~2 hours (investigation + fixes + deployment)
**Token Usage**: 87,049 / 200,000 (43.5%)
**Conversation Length**: 60 messages
**Final Status**: ✅ **SESSION COMPLETE - READY FOR CARD PRESENTATIONS**

View file

@ -14,6 +14,7 @@
* Options:
* --tokens <current>/<budget> Current token usage (e.g., 89195/200000)
* --messages <count> Number of messages in conversation
* --compactions <count> Number of compaction events (CRITICAL indicator)
* --tasks <count> Number of active tasks
* --errors <count> Recent errors in last 10 minutes
* --json Output JSON format
@ -29,6 +30,7 @@ function parseArgs() {
tokenUsage: null,
tokenBudget: null,
messages: 0,
compactions: 0,
tasks: 1,
errors: 0,
json: false,
@ -45,6 +47,9 @@ function parseArgs() {
case '--messages':
options.messages = parseInt(args[++i]);
break;
case '--compactions':
options.compactions = parseInt(args[++i]);
break;
case '--tasks':
options.tasks = parseInt(args[++i]);
break;
@ -120,6 +125,7 @@ function analyzeSession(options) {
// Build context object
const context = {
messages_count: options.messages,
compactions_count: options.compactions,
task_depth: options.tasks,
errors_recent: options.errors
};
@ -150,6 +156,12 @@ function analyzeSession(options) {
console.log('Metrics:');
console.log(` Token Usage: ${(analysis.metrics.tokenUsage.score * 100).toFixed(1)}%`);
console.log(` Conversation: ${(analysis.metrics.conversationLength.score * 100).toFixed(1)}%`);
if (analysis.metrics.conversationLength.compactions > 0) {
console.log(` ⚠️ Compactions: ${analysis.metrics.conversationLength.compactions} (multiplier: ${analysis.metrics.conversationLength.multiplier}x)`);
if (analysis.metrics.conversationLength.compactionNote) {
console.log(` ⚠️ ${analysis.metrics.conversationLength.compactionNote}`);
}
}
console.log(` Task Complexity: ${(analysis.metrics.taskComplexity.score * 100).toFixed(1)}%`);
console.log(` Error Frequency: ${(analysis.metrics.errorFrequency.score * 100).toFixed(1)}%`);
console.log(` Instructions: ${(analysis.metrics.instructionDensity.score * 100).toFixed(1)}%\n`);

View file

@ -76,17 +76,34 @@ const PRESSURE_LEVELS = {
/**
* Monitored metrics
*
* UPDATED 2025-10-12: Increased conversation length weight after observing that
* compacting events are the PRIMARY cause of session disruption. Each compaction:
* - Takes significant time (1-3 minutes)
* - Loses critical context
* - Requires re-explaining state
* - Degrades quality dramatically
*
* Conversation message count is now MORE important than token count because:
* - Compacting happens based on messages, not just tokens
* - 60 messages triggers first compaction (observed)
* - Multiple compactions in one session = CRITICAL failure condition
*/
const METRICS = {
TOKEN_USAGE: {
weight: 0.35,
weight: 0.30, // Reduced from 0.35 - still important but secondary to conversation decay
criticalThreshold: 0.8, // 80% of token budget
dangerThreshold: 0.95
},
CONVERSATION_LENGTH: {
weight: 0.25,
criticalThreshold: 100, // Number of messages
dangerThreshold: 150
weight: 0.40, // Increased from 0.25 - conversation decay is PRIMARY factor
criticalThreshold: 40, // Messages before degradation starts (reduced from 100)
dangerThreshold: 60, // Messages at dangerous levels (reduced from 150)
compactionMultiplier: {
first: 1.5, // 50% pressure boost after first compaction
second: 3.0, // 3x pressure after second compaction (CRITICAL)
third: 5.0 // 5x pressure after third+ compaction (DANGEROUS)
}
},
TASK_COMPLEXITY: {
weight: 0.15,
@ -94,12 +111,12 @@ const METRICS = {
dangerThreshold: 8
},
ERROR_FREQUENCY: {
weight: 0.15,
weight: 0.10, // Reduced from 0.15 - errors less predictive than conversation decay
criticalThreshold: 3, // Errors in last 10 actions
dangerThreshold: 5
},
INSTRUCTION_DENSITY: {
weight: 0.10,
weight: 0.05, // Reduced from 0.10 - least important factor
criticalThreshold: 10, // Active instructions
dangerThreshold: 15
}
@ -424,16 +441,45 @@ class ContextPressureMonitor {
context.messages_count ||
0;
const ratio = messageCount / this.metrics.CONVERSATION_LENGTH.criticalThreshold;
const normalized = Math.min(1.0, ratio);
// Check for compaction events (CRITICAL indicator)
const compactions = context.compactions_count || context.compactions || 0;
return {
let ratio = messageCount / this.metrics.CONVERSATION_LENGTH.criticalThreshold;
// Apply compaction multiplier (each compaction dramatically increases pressure)
let multiplier = 1.0;
let compactionNote = null;
if (compactions >= 3) {
multiplier = this.metrics.CONVERSATION_LENGTH.compactionMultiplier.third;
compactionNote = `3+ compactions detected - DANGEROUS conditions`;
} else if (compactions === 2) {
multiplier = this.metrics.CONVERSATION_LENGTH.compactionMultiplier.second;
compactionNote = `2nd compaction detected - CRITICAL pressure`;
} else if (compactions === 1) {
multiplier = this.metrics.CONVERSATION_LENGTH.compactionMultiplier.first;
compactionNote = `1st compaction detected - elevated pressure`;
}
// Apply multiplier BEFORE normalization to allow pressure > 1.0
ratio = ratio * multiplier;
const normalized = Math.min(1.0, ratio); // Cap at 1.0 for scoring
const result = {
value: ratio,
score: normalized, // Alias for test compatibility
normalized,
raw: messageCount,
threshold: this.metrics.CONVERSATION_LENGTH.criticalThreshold
threshold: this.metrics.CONVERSATION_LENGTH.criticalThreshold,
compactions,
multiplier
};
if (compactionNote) {
result.compactionNote = compactionNote;
}
return result;
}
_calculateComplexityPressure(context) {