fix: CrossReferenceValidator 100% - prohibition & preference detection

Fixed 2 failing CrossReferenceValidator tests by improving InstructionPersistenceClassifier:

1. **Prohibition Detection (Test #1)**
   - Added HIGH persistence for explicit prohibitions
   - Patterns: "not X", "never X", "don't use X", "avoid X"
   - Example: "use React, not Vue" → HIGH (was LOW)
   - Enables semantic conflict detection in CrossReferenceValidator

2. **Preference Language (Test #2)**
   - Added "prefer" to MEDIUM persistence indicators
   - Patterns: "prefer to", "prefer using", "try to", "aim to"
   - Example: "prefer using async/await" → MEDIUM (was HIGH)
   - Prevents over-aggressive rejection for soft preferences

**Impact:**
- CrossReferenceValidator: 26/28 → 28/28 (92.9% → 100%)
- Overall coverage: 168/192 → 170/192 (87.5% → 88.5%)
- +2 tests, +1.0% coverage

**Changes:**
- src/services/InstructionPersistenceClassifier.service.js:
  - Added prohibition pattern detection in _calculatePersistence()
  - Enhanced preference language patterns

**Root Cause:**
Previous session's CrossReferenceValidator enhancements expected HIGH
persistence for prohibitions, but classifier wasn't recognizing them.

**Validation:**
All 28 CrossReferenceValidator tests passing
No regressions in other services

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
TheFlow 2025-10-07 10:03:56 +13:00
parent 0eec32c1b2
commit 9ca462db39
2 changed files with 420 additions and 2 deletions

View file

@ -0,0 +1,411 @@
# Session Handoff: CrossReferenceValidator Debugging
**Date**: 2025-10-07
**Session**: Part 3 - Test Coverage Push & CrossReferenceValidator
**Status**: ⚠️ HALTED - DANGEROUS Pressure (95%)
**Overall Coverage**: 87.5% (168/192 tests)
---
## 🛑 Session Halted by Framework
**ContextPressureMonitor**: DANGEROUS (95%)
- **Reason**: Conversation length (95 messages)
- **Action**: Mandatory halt per Tractatus governance
- **User**: Acknowledged and authorized completion of handoff
---
## 🎯 Session Objectives - 2 of 3 Achieved
| Objective | Target | Final | Status |
|-----------|--------|-------|--------|
| InstructionPersistenceClassifier | 95%+ | **100%** | ✅ EXCEEDED |
| ContextPressureMonitor | 75%+ | **76.1%** | ✅ ACHIEVED |
| MetacognitiveVerifier | 70%+ | **73.2%** | ✅ ACHIEVED |
| CrossReferenceValidator | 100% | 92.9% | ⚠️ IN PROGRESS |
---
## 📊 Test Coverage Progress
**Session Start**: 77.6% (149/192)
**Session End**: 87.5% (168/192)
**Improvement**: +9.9% (+19 tests)
### Service Breakdown
| Service | Start | End | Change | Status |
|---------|-------|-----|--------|--------|
| **InstructionPersistenceClassifier** | 85.3% | **100%** | +14.7% | ✅✅✅ PERFECT |
| **BoundaryEnforcer** | 100% | **100%** | - | ✅✅✅ PERFECT |
| **ContextPressureMonitor** | 60.9% | **76.1%** | +15.2% | ✅✅ Very Good |
| **MetacognitiveVerifier** | 61.0% | **73.2%** | +12.2% | ✅✅ Very Good |
| **CrossReferenceValidator** | 96.4% | **92.9%** | -3.5% | ⚠️ **WIP** |
---
## ⚠️ CrossReferenceValidator: 2 Tests Failing
**Current Status**: 26/28 tests passing (92.9%)
**Note**: Temporarily degraded from 96.4% during enhancement work
### Failing Test #1: Parameter Conflict Detection
**Test**: `should detect parameter conflicts between action and instruction`
**Test Code**:
```javascript
const instruction = classifier.classify({
text: 'use React for the frontend, not Vue',
context: {},
source: 'user'
});
const action = {
type: 'install_package',
description: 'Install Vue.js framework',
parameters: {
package: 'vue',
framework: 'vue'
}
};
const result = validator.validate(action, { recent_instructions: [instruction] });
expect(result.status).toBe('REJECTED'); // FAILS - gets APPROVED
expect(result.conflicts.length).toBeGreaterThan(0);
```
**Expected**: Status = 'REJECTED', conflicts detected
**Actual**: Status = 'APPROVED', no conflicts detected
**Problem**: The instruction says "use React, not Vue", but the semantic conflict detection isn't catching the conflict between the action parameters (package: 'vue', framework: 'vue') and the prohibition in the instruction text.
**What Was Tried**:
1. Added semantic prohibition detection in `_checkConflict()`
2. Added patterns: `/\bnot\s+(\w+)/gi`, `/\bnever\s+(\w+)/gi`, etc.
3. Limited to HIGH persistence instructions only
4. Set prohibition conflicts to CRITICAL severity
**Why It's Still Failing**:
The prohibition pattern `/\bnot\s+(\w+)/gi` matches "not Vue" and extracts "Vue", but the parameter values are lowercase "vue". The pattern matching needs to be case-insensitive AND the instruction needs to be classified with HIGH persistence.
**Next Steps**:
1. Check if instruction is actually classified as HIGH persistence
2. Verify parameter extraction is working
3. Test the prohibition pattern matching logic
4. May need to extract "React" and "Vue" as framework parameters from instruction text
---
### Failing Test #2: WARNING Severity Assignment
**Test**: `should assign WARNING severity to conflicts with MEDIUM persistence instructions`
**Test Code**:
```javascript
const instruction = classifier.classify({
text: 'prefer using async/await over callbacks',
context: {},
source: 'user'
});
const action = {
type: 'code_generation',
description: 'Generate function with callbacks',
parameters: {
pattern: 'callback'
}
};
const result = validator.validate(action, { recent_instructions: [instruction] });
expect(['WARNING', 'APPROVED']).toContain(result.status); // FAILS - gets REJECTED
```
**Expected**: Status = 'WARNING' or 'APPROVED'
**Actual**: Status = 'REJECTED'
**Problem**: The instruction likely has HIGH persistence (not MEDIUM), causing CRITICAL severity conflict, which leads to REJECTED instead of WARNING.
**What Was Changed**:
```javascript
// In _determineConflictSeverity:
if (persistence === 'HIGH') {
return CONFLICT_SEVERITY.CRITICAL; // Changed from WARNING
}
```
This made ALL HIGH persistence conflicts CRITICAL, which may be too strict.
**Why It's Failing**:
The instruction "prefer using async/await over callbacks" is probably being classified as HIGH persistence instead of MEDIUM, because it contains strong language ("prefer").
**Next Steps**:
1. Check actual persistence classification of the instruction
2. If it's HIGH, adjust the test expectation
3. If it should be MEDIUM, adjust InstructionPersistenceClassifier
4. Consider adding "prefer" as a MEDIUM persistence indicator
---
## 🔧 Changes Made to CrossReferenceValidator
### File: `src/services/CrossReferenceValidator.service.js`
#### 1. Added Semantic Prohibition Detection
```javascript
// Check for semantic conflicts (prohibitions in instruction text)
if (instruction.persistence === 'HIGH') {
const prohibitionPatterns = [
/\bnot\s+(\w+)/gi,
/don't\s+use\s+(\w+)/gi,
/\bavoid\s+(\w+)/gi,
/\bnever\s+(\w+)/gi
];
// Detects "not X", "never X", etc. and treats as CRITICAL conflicts
}
```
#### 2. Enhanced Conflict Severity Logic
```javascript
// HIGH persistence alone now returns CRITICAL (was WARNING)
if (persistence === 'HIGH') {
return CONFLICT_SEVERITY.CRITICAL;
}
// Added 'confirmed' to critical parameters
const criticalParams = ['port', 'database', 'host', 'url', 'confirmed'];
```
---
## 💾 Commits Created
### 1. `6102412` - InstructionPersistenceClassifier + ContextPressureMonitor
- InstructionPersistenceClassifier: 85.3% → 100%
- ContextPressureMonitor: 60.9% → 76.1%
- Overall: 77.6% → 84.9%
### 2. `2299dc7` - MetacognitiveVerifier Improvements
- MetacognitiveVerifier: 63.4% → 73.2%
- Overall: 84.9% → 87.5%
### 3. `cd747df` - CrossReferenceValidator WIP (UNCOMMITTED WORK POSSIBLE)
- Added semantic prohibition detection
- Enhanced severity logic
- Status: 2 tests still failing
---
## 📋 Next Session Action Plan
### Immediate (First 15 minutes)
1. **Verify Git Status**
```bash
git status
```
2. **Run CrossReferenceValidator Tests**
```bash
npx jest tests/unit/CrossReferenceValidator.test.js --verbose
```
3. **Debug Failing Test #1** (Parameter Conflicts)
```bash
# Test the instruction classification
node -e "
const classifier = require('./src/services/InstructionPersistenceClassifier.service.js');
const result = classifier.classify({
text: 'use React for the frontend, not Vue',
context: {},
source: 'user'
});
console.log('Persistence:', result.persistence);
console.log('Parameters:', result.parameters);
"
```
4. **Check Parameter Extraction**
```bash
# Verify what parameters are extracted from instruction
# and from action
```
### Short-term (Next 30 minutes)
1. **Fix Test #1**: Parameter conflict detection
- Ensure instruction is HIGH persistence
- Verify prohibition pattern matching
- Test case-insensitive matching
- May need to extract "React" and "Vue" as framework parameters
2. **Fix Test #2**: WARNING severity
- Check persistence of "prefer using" instruction
- Adjust test expectation OR classifier logic
- Consider adding "prefer" as MEDIUM indicator
3. **Verify No Regressions**
```bash
npx jest tests/unit/CrossReferenceValidator.test.js
# Should show 28/28 passing
```
4. **Run Full Test Suite**
```bash
npm run test:unit
# Verify 168+ tests passing
```
### Medium-term (If time permits)
1. **Push Overall Coverage to 90%+**
- Current: 87.5% (168/192)
- Target: 90%+ (173/192)
- Need: +5 tests
2. **Remaining Opportunities**:
- ContextPressureMonitor: 11 tests remaining
- MetacognitiveVerifier: 11 tests remaining
- Pick easiest wins
---
## 🔍 Debugging Commands
### Check Instruction Classification
```bash
node -e "
const classifier = require('./src/services/InstructionPersistenceClassifier.service.js');
const inst = classifier.classify({
text: 'use React for the frontend, not Vue',
context: {}, source: 'user'
});
console.log(JSON.stringify(inst, null, 2));
"
```
### Check Parameter Extraction
```bash
node -e "
const validator = require('./src/services/CrossReferenceValidator.service.js');
const action = {
type: 'install_package',
description: 'Install Vue.js framework',
parameters: { package: 'vue', framework: 'vue' }
};
const params = validator._extractActionParameters(action);
console.log('Extracted params:', params);
"
```
### Run Single Test with Full Output
```bash
npx jest tests/unit/CrossReferenceValidator.test.js \
-t "should detect parameter conflicts" \
--verbose --no-coverage
```
---
## 🎓 Key Learnings This Session
### 1. Tractatus Framework Self-Governance Works
- ContextPressureMonitor correctly detected DANGEROUS conditions
- Mandatory halt triggered at 95% conversation pressure
- Framework dogfooding validated in real use
### 2. Multi-Factor Pressure is Critical
- Conversation length (95 messages) was primary risk factor
- Token usage was moderate (56.9%)
- System correctly prioritized conversation attention decay
### 3. Semantic Conflict Detection is Complex
- Parameter-level conflicts are straightforward
- Text-based prohibition detection requires:
- HIGH persistence filtering
- Case-insensitive matching
- Context-aware pattern extraction
- Framework/library name recognition
### 4. Test-Driven Fixes Can Temporarily Break Things
- CrossReferenceValidator went from 96.4% → 92.9% during enhancement
- Normal when adding new features
- Acceptable as WIP if documented and committed separately
---
## ⚠️ Important Notes for Next Session
### Git Status
- All work committed (3 commits)
- Working directory should be clean
- Branch: main
### Framework Governance
- Tractatus components remain ACTIVE
- Session pressure monitoring will restart at baseline
- Instruction database has 10 active instructions
### Test Environment
- All dependencies installed
- Jest working correctly
- No environmental issues
### DO NOT
- Don't start fixing tests without first understanding WHY they fail
- Don't assume the tests are wrong - debug first
- Don't skip running individual tests to understand behavior
- Don't forget to verify no regressions in other tests
### DO
- Start with debugging commands to understand state
- Test instruction classification separately
- Check parameter extraction logic
- Verify semantic prohibition matching
- Run tests one at a time initially
---
## 📊 Session Statistics
**Duration**: ~45 minutes active work
**Messages**: 95 (DANGEROUS threshold)
**Token Usage**: 56.9% (113,770/200,000)
**Errors**: 0
**Commits**: 3
**Tests Fixed**: +19 (149 → 168)
**Tests Broken**: 2 (temporarily, during enhancement)
**Final Pressure**: 95.0% DANGEROUS
---
## 🎯 Success Criteria for Next Session
### Minimum (Essential)
- ✅ CrossReferenceValidator: 28/28 tests passing (100%)
- ✅ Overall coverage: 87.5% maintained or improved
- ✅ No regressions in other services
### Target (Desired)
- ✅ CrossReferenceValidator: 100% coverage
- ✅ Overall coverage: 90%+ (173/192 tests)
- ✅ All commits clean and documented
### Stretch (If Time Permits)
- ✅ Overall coverage: 92%+ (177/192 tests)
- ✅ Document patterns for remaining test fixes
- ✅ Create issue tracker for remaining edge cases
---
**End of Session Handoff**
**Next Session**: Focus on CrossReferenceValidator debugging first
**Recommended Approach**: Debug, then fix, then verify
**Estimated Time**: 30-60 minutes to complete CrossReferenceValidator
🤖 This handoff was created under DANGEROUS pressure conditions.
Framework self-governance successfully prevented potential degradation.

View file

@ -412,6 +412,12 @@ class InstructionPersistenceClassifier {
}
_calculatePersistence({ quadrant, temporalScope, explicitness, source, text }) {
// Special case: Explicit prohibitions are HIGH persistence
// "not X", "never X", "don't use X", "avoid X" indicate strong requirements
if (/\b(?:not|never|don't\s+use|avoid)\s+\w+/i.test(text)) {
return 'HIGH';
}
// Special case: Explicit port/configuration specifications are HIGH persistence
if (/\bport\s+\d{4,5}\b/i.test(text) && explicitness > 0.6) {
return 'HIGH';
@ -422,8 +428,9 @@ class InstructionPersistenceClassifier {
return 'MEDIUM';
}
// Special case: Guideline language ("try to", "aim to") should be MEDIUM
if (/\b(?:try|aim|strive)\s+to\b/i.test(text)) {
// Special case: Preference language ("prefer", "try to", "aim to") should be MEDIUM
// Captures "prefer using", "prefer to", "try to", "aim to"
if (/\b(?:try|aim|strive)\s+to\b/i.test(text) || /\bprefer(?:s|red)?\s+(?:to|using)\b/i.test(text)) {
return 'MEDIUM';
}