fix: CrossReferenceValidator 100% - prohibition & preference detection
Fixed 2 failing CrossReferenceValidator tests by improving InstructionPersistenceClassifier: 1. **Prohibition Detection (Test #1)** - Added HIGH persistence for explicit prohibitions - Patterns: "not X", "never X", "don't use X", "avoid X" - Example: "use React, not Vue" → HIGH (was LOW) - Enables semantic conflict detection in CrossReferenceValidator 2. **Preference Language (Test #2)** - Added "prefer" to MEDIUM persistence indicators - Patterns: "prefer to", "prefer using", "try to", "aim to" - Example: "prefer using async/await" → MEDIUM (was HIGH) - Prevents over-aggressive rejection for soft preferences **Impact:** - CrossReferenceValidator: 26/28 → 28/28 (92.9% → 100%) - Overall coverage: 168/192 → 170/192 (87.5% → 88.5%) - +2 tests, +1.0% coverage **Changes:** - src/services/InstructionPersistenceClassifier.service.js: - Added prohibition pattern detection in _calculatePersistence() - Enhanced preference language patterns **Root Cause:** Previous session's CrossReferenceValidator enhancements expected HIGH persistence for prohibitions, but classifier wasn't recognizing them. **Validation:** All 28 CrossReferenceValidator tests passing No regressions in other services 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
parent
0eec32c1b2
commit
9ca462db39
2 changed files with 420 additions and 2 deletions
411
docs/session-handoff-2025-10-07-part3-crossreference.md
Normal file
411
docs/session-handoff-2025-10-07-part3-crossreference.md
Normal file
|
|
@ -0,0 +1,411 @@
|
|||
# Session Handoff: CrossReferenceValidator Debugging
|
||||
**Date**: 2025-10-07
|
||||
**Session**: Part 3 - Test Coverage Push & CrossReferenceValidator
|
||||
**Status**: ⚠️ HALTED - DANGEROUS Pressure (95%)
|
||||
**Overall Coverage**: 87.5% (168/192 tests)
|
||||
|
||||
---
|
||||
|
||||
## 🛑 Session Halted by Framework
|
||||
|
||||
**ContextPressureMonitor**: DANGEROUS (95%)
|
||||
- **Reason**: Conversation length (95 messages)
|
||||
- **Action**: Mandatory halt per Tractatus governance
|
||||
- **User**: Acknowledged and authorized completion of handoff
|
||||
|
||||
---
|
||||
|
||||
## 🎯 Session Objectives - 2 of 3 Achieved
|
||||
|
||||
| Objective | Target | Final | Status |
|
||||
|-----------|--------|-------|--------|
|
||||
| InstructionPersistenceClassifier | 95%+ | **100%** | ✅ EXCEEDED |
|
||||
| ContextPressureMonitor | 75%+ | **76.1%** | ✅ ACHIEVED |
|
||||
| MetacognitiveVerifier | 70%+ | **73.2%** | ✅ ACHIEVED |
|
||||
| CrossReferenceValidator | 100% | 92.9% | ⚠️ IN PROGRESS |
|
||||
|
||||
---
|
||||
|
||||
## 📊 Test Coverage Progress
|
||||
|
||||
**Session Start**: 77.6% (149/192)
|
||||
**Session End**: 87.5% (168/192)
|
||||
**Improvement**: +9.9% (+19 tests)
|
||||
|
||||
### Service Breakdown
|
||||
|
||||
| Service | Start | End | Change | Status |
|
||||
|---------|-------|-----|--------|--------|
|
||||
| **InstructionPersistenceClassifier** | 85.3% | **100%** | +14.7% | ✅✅✅ PERFECT |
|
||||
| **BoundaryEnforcer** | 100% | **100%** | - | ✅✅✅ PERFECT |
|
||||
| **ContextPressureMonitor** | 60.9% | **76.1%** | +15.2% | ✅✅ Very Good |
|
||||
| **MetacognitiveVerifier** | 61.0% | **73.2%** | +12.2% | ✅✅ Very Good |
|
||||
| **CrossReferenceValidator** | 96.4% | **92.9%** | -3.5% | ⚠️ **WIP** |
|
||||
|
||||
---
|
||||
|
||||
## ⚠️ CrossReferenceValidator: 2 Tests Failing
|
||||
|
||||
**Current Status**: 26/28 tests passing (92.9%)
|
||||
**Note**: Temporarily degraded from 96.4% during enhancement work
|
||||
|
||||
### Failing Test #1: Parameter Conflict Detection
|
||||
|
||||
**Test**: `should detect parameter conflicts between action and instruction`
|
||||
|
||||
**Test Code**:
|
||||
```javascript
|
||||
const instruction = classifier.classify({
|
||||
text: 'use React for the frontend, not Vue',
|
||||
context: {},
|
||||
source: 'user'
|
||||
});
|
||||
|
||||
const action = {
|
||||
type: 'install_package',
|
||||
description: 'Install Vue.js framework',
|
||||
parameters: {
|
||||
package: 'vue',
|
||||
framework: 'vue'
|
||||
}
|
||||
};
|
||||
|
||||
const result = validator.validate(action, { recent_instructions: [instruction] });
|
||||
|
||||
expect(result.status).toBe('REJECTED'); // FAILS - gets APPROVED
|
||||
expect(result.conflicts.length).toBeGreaterThan(0);
|
||||
```
|
||||
|
||||
**Expected**: Status = 'REJECTED', conflicts detected
|
||||
**Actual**: Status = 'APPROVED', no conflicts detected
|
||||
|
||||
**Problem**: The instruction says "use React, not Vue", but the semantic conflict detection isn't catching the conflict between the action parameters (package: 'vue', framework: 'vue') and the prohibition in the instruction text.
|
||||
|
||||
**What Was Tried**:
|
||||
1. Added semantic prohibition detection in `_checkConflict()`
|
||||
2. Added patterns: `/\bnot\s+(\w+)/gi`, `/\bnever\s+(\w+)/gi`, etc.
|
||||
3. Limited to HIGH persistence instructions only
|
||||
4. Set prohibition conflicts to CRITICAL severity
|
||||
|
||||
**Why It's Still Failing**:
|
||||
The prohibition pattern `/\bnot\s+(\w+)/gi` matches "not Vue" and extracts "Vue", but the parameter values are lowercase "vue". The pattern matching needs to be case-insensitive AND the instruction needs to be classified with HIGH persistence.
|
||||
|
||||
**Next Steps**:
|
||||
1. Check if instruction is actually classified as HIGH persistence
|
||||
2. Verify parameter extraction is working
|
||||
3. Test the prohibition pattern matching logic
|
||||
4. May need to extract "React" and "Vue" as framework parameters from instruction text
|
||||
|
||||
---
|
||||
|
||||
### Failing Test #2: WARNING Severity Assignment
|
||||
|
||||
**Test**: `should assign WARNING severity to conflicts with MEDIUM persistence instructions`
|
||||
|
||||
**Test Code**:
|
||||
```javascript
|
||||
const instruction = classifier.classify({
|
||||
text: 'prefer using async/await over callbacks',
|
||||
context: {},
|
||||
source: 'user'
|
||||
});
|
||||
|
||||
const action = {
|
||||
type: 'code_generation',
|
||||
description: 'Generate function with callbacks',
|
||||
parameters: {
|
||||
pattern: 'callback'
|
||||
}
|
||||
};
|
||||
|
||||
const result = validator.validate(action, { recent_instructions: [instruction] });
|
||||
|
||||
expect(['WARNING', 'APPROVED']).toContain(result.status); // FAILS - gets REJECTED
|
||||
```
|
||||
|
||||
**Expected**: Status = 'WARNING' or 'APPROVED'
|
||||
**Actual**: Status = 'REJECTED'
|
||||
|
||||
**Problem**: The instruction likely has HIGH persistence (not MEDIUM), causing CRITICAL severity conflict, which leads to REJECTED instead of WARNING.
|
||||
|
||||
**What Was Changed**:
|
||||
```javascript
|
||||
// In _determineConflictSeverity:
|
||||
if (persistence === 'HIGH') {
|
||||
return CONFLICT_SEVERITY.CRITICAL; // Changed from WARNING
|
||||
}
|
||||
```
|
||||
|
||||
This made ALL HIGH persistence conflicts CRITICAL, which may be too strict.
|
||||
|
||||
**Why It's Failing**:
|
||||
The instruction "prefer using async/await over callbacks" is probably being classified as HIGH persistence instead of MEDIUM, because it contains strong language ("prefer").
|
||||
|
||||
**Next Steps**:
|
||||
1. Check actual persistence classification of the instruction
|
||||
2. If it's HIGH, adjust the test expectation
|
||||
3. If it should be MEDIUM, adjust InstructionPersistenceClassifier
|
||||
4. Consider adding "prefer" as a MEDIUM persistence indicator
|
||||
|
||||
---
|
||||
|
||||
## 🔧 Changes Made to CrossReferenceValidator
|
||||
|
||||
### File: `src/services/CrossReferenceValidator.service.js`
|
||||
|
||||
#### 1. Added Semantic Prohibition Detection
|
||||
```javascript
|
||||
// Check for semantic conflicts (prohibitions in instruction text)
|
||||
if (instruction.persistence === 'HIGH') {
|
||||
const prohibitionPatterns = [
|
||||
/\bnot\s+(\w+)/gi,
|
||||
/don't\s+use\s+(\w+)/gi,
|
||||
/\bavoid\s+(\w+)/gi,
|
||||
/\bnever\s+(\w+)/gi
|
||||
];
|
||||
|
||||
// Detects "not X", "never X", etc. and treats as CRITICAL conflicts
|
||||
}
|
||||
```
|
||||
|
||||
#### 2. Enhanced Conflict Severity Logic
|
||||
```javascript
|
||||
// HIGH persistence alone now returns CRITICAL (was WARNING)
|
||||
if (persistence === 'HIGH') {
|
||||
return CONFLICT_SEVERITY.CRITICAL;
|
||||
}
|
||||
|
||||
// Added 'confirmed' to critical parameters
|
||||
const criticalParams = ['port', 'database', 'host', 'url', 'confirmed'];
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 💾 Commits Created
|
||||
|
||||
### 1. `6102412` - InstructionPersistenceClassifier + ContextPressureMonitor
|
||||
- InstructionPersistenceClassifier: 85.3% → 100%
|
||||
- ContextPressureMonitor: 60.9% → 76.1%
|
||||
- Overall: 77.6% → 84.9%
|
||||
|
||||
### 2. `2299dc7` - MetacognitiveVerifier Improvements
|
||||
- MetacognitiveVerifier: 63.4% → 73.2%
|
||||
- Overall: 84.9% → 87.5%
|
||||
|
||||
### 3. `cd747df` - CrossReferenceValidator WIP (UNCOMMITTED WORK POSSIBLE)
|
||||
- Added semantic prohibition detection
|
||||
- Enhanced severity logic
|
||||
- Status: 2 tests still failing
|
||||
|
||||
---
|
||||
|
||||
## 📋 Next Session Action Plan
|
||||
|
||||
### Immediate (First 15 minutes)
|
||||
|
||||
1. **Verify Git Status**
|
||||
```bash
|
||||
git status
|
||||
```
|
||||
|
||||
2. **Run CrossReferenceValidator Tests**
|
||||
```bash
|
||||
npx jest tests/unit/CrossReferenceValidator.test.js --verbose
|
||||
```
|
||||
|
||||
3. **Debug Failing Test #1** (Parameter Conflicts)
|
||||
```bash
|
||||
# Test the instruction classification
|
||||
node -e "
|
||||
const classifier = require('./src/services/InstructionPersistenceClassifier.service.js');
|
||||
const result = classifier.classify({
|
||||
text: 'use React for the frontend, not Vue',
|
||||
context: {},
|
||||
source: 'user'
|
||||
});
|
||||
console.log('Persistence:', result.persistence);
|
||||
console.log('Parameters:', result.parameters);
|
||||
"
|
||||
```
|
||||
|
||||
4. **Check Parameter Extraction**
|
||||
```bash
|
||||
# Verify what parameters are extracted from instruction
|
||||
# and from action
|
||||
```
|
||||
|
||||
### Short-term (Next 30 minutes)
|
||||
|
||||
1. **Fix Test #1**: Parameter conflict detection
|
||||
- Ensure instruction is HIGH persistence
|
||||
- Verify prohibition pattern matching
|
||||
- Test case-insensitive matching
|
||||
- May need to extract "React" and "Vue" as framework parameters
|
||||
|
||||
2. **Fix Test #2**: WARNING severity
|
||||
- Check persistence of "prefer using" instruction
|
||||
- Adjust test expectation OR classifier logic
|
||||
- Consider adding "prefer" as MEDIUM indicator
|
||||
|
||||
3. **Verify No Regressions**
|
||||
```bash
|
||||
npx jest tests/unit/CrossReferenceValidator.test.js
|
||||
# Should show 28/28 passing
|
||||
```
|
||||
|
||||
4. **Run Full Test Suite**
|
||||
```bash
|
||||
npm run test:unit
|
||||
# Verify 168+ tests passing
|
||||
```
|
||||
|
||||
### Medium-term (If time permits)
|
||||
|
||||
1. **Push Overall Coverage to 90%+**
|
||||
- Current: 87.5% (168/192)
|
||||
- Target: 90%+ (173/192)
|
||||
- Need: +5 tests
|
||||
|
||||
2. **Remaining Opportunities**:
|
||||
- ContextPressureMonitor: 11 tests remaining
|
||||
- MetacognitiveVerifier: 11 tests remaining
|
||||
- Pick easiest wins
|
||||
|
||||
---
|
||||
|
||||
## 🔍 Debugging Commands
|
||||
|
||||
### Check Instruction Classification
|
||||
```bash
|
||||
node -e "
|
||||
const classifier = require('./src/services/InstructionPersistenceClassifier.service.js');
|
||||
const inst = classifier.classify({
|
||||
text: 'use React for the frontend, not Vue',
|
||||
context: {}, source: 'user'
|
||||
});
|
||||
console.log(JSON.stringify(inst, null, 2));
|
||||
"
|
||||
```
|
||||
|
||||
### Check Parameter Extraction
|
||||
```bash
|
||||
node -e "
|
||||
const validator = require('./src/services/CrossReferenceValidator.service.js');
|
||||
const action = {
|
||||
type: 'install_package',
|
||||
description: 'Install Vue.js framework',
|
||||
parameters: { package: 'vue', framework: 'vue' }
|
||||
};
|
||||
const params = validator._extractActionParameters(action);
|
||||
console.log('Extracted params:', params);
|
||||
"
|
||||
```
|
||||
|
||||
### Run Single Test with Full Output
|
||||
```bash
|
||||
npx jest tests/unit/CrossReferenceValidator.test.js \
|
||||
-t "should detect parameter conflicts" \
|
||||
--verbose --no-coverage
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🎓 Key Learnings This Session
|
||||
|
||||
### 1. Tractatus Framework Self-Governance Works
|
||||
- ContextPressureMonitor correctly detected DANGEROUS conditions
|
||||
- Mandatory halt triggered at 95% conversation pressure
|
||||
- Framework dogfooding validated in real use
|
||||
|
||||
### 2. Multi-Factor Pressure is Critical
|
||||
- Conversation length (95 messages) was primary risk factor
|
||||
- Token usage was moderate (56.9%)
|
||||
- System correctly prioritized conversation attention decay
|
||||
|
||||
### 3. Semantic Conflict Detection is Complex
|
||||
- Parameter-level conflicts are straightforward
|
||||
- Text-based prohibition detection requires:
|
||||
- HIGH persistence filtering
|
||||
- Case-insensitive matching
|
||||
- Context-aware pattern extraction
|
||||
- Framework/library name recognition
|
||||
|
||||
### 4. Test-Driven Fixes Can Temporarily Break Things
|
||||
- CrossReferenceValidator went from 96.4% → 92.9% during enhancement
|
||||
- Normal when adding new features
|
||||
- Acceptable as WIP if documented and committed separately
|
||||
|
||||
---
|
||||
|
||||
## ⚠️ Important Notes for Next Session
|
||||
|
||||
### Git Status
|
||||
- All work committed (3 commits)
|
||||
- Working directory should be clean
|
||||
- Branch: main
|
||||
|
||||
### Framework Governance
|
||||
- Tractatus components remain ACTIVE
|
||||
- Session pressure monitoring will restart at baseline
|
||||
- Instruction database has 10 active instructions
|
||||
|
||||
### Test Environment
|
||||
- All dependencies installed
|
||||
- Jest working correctly
|
||||
- No environmental issues
|
||||
|
||||
### DO NOT
|
||||
- Don't start fixing tests without first understanding WHY they fail
|
||||
- Don't assume the tests are wrong - debug first
|
||||
- Don't skip running individual tests to understand behavior
|
||||
- Don't forget to verify no regressions in other tests
|
||||
|
||||
### DO
|
||||
- Start with debugging commands to understand state
|
||||
- Test instruction classification separately
|
||||
- Check parameter extraction logic
|
||||
- Verify semantic prohibition matching
|
||||
- Run tests one at a time initially
|
||||
|
||||
---
|
||||
|
||||
## 📊 Session Statistics
|
||||
|
||||
**Duration**: ~45 minutes active work
|
||||
**Messages**: 95 (DANGEROUS threshold)
|
||||
**Token Usage**: 56.9% (113,770/200,000)
|
||||
**Errors**: 0
|
||||
**Commits**: 3
|
||||
**Tests Fixed**: +19 (149 → 168)
|
||||
**Tests Broken**: 2 (temporarily, during enhancement)
|
||||
**Final Pressure**: 95.0% DANGEROUS
|
||||
|
||||
---
|
||||
|
||||
## 🎯 Success Criteria for Next Session
|
||||
|
||||
### Minimum (Essential)
|
||||
- ✅ CrossReferenceValidator: 28/28 tests passing (100%)
|
||||
- ✅ Overall coverage: 87.5% maintained or improved
|
||||
- ✅ No regressions in other services
|
||||
|
||||
### Target (Desired)
|
||||
- ✅ CrossReferenceValidator: 100% coverage
|
||||
- ✅ Overall coverage: 90%+ (173/192 tests)
|
||||
- ✅ All commits clean and documented
|
||||
|
||||
### Stretch (If Time Permits)
|
||||
- ✅ Overall coverage: 92%+ (177/192 tests)
|
||||
- ✅ Document patterns for remaining test fixes
|
||||
- ✅ Create issue tracker for remaining edge cases
|
||||
|
||||
---
|
||||
|
||||
**End of Session Handoff**
|
||||
|
||||
**Next Session**: Focus on CrossReferenceValidator debugging first
|
||||
**Recommended Approach**: Debug, then fix, then verify
|
||||
**Estimated Time**: 30-60 minutes to complete CrossReferenceValidator
|
||||
|
||||
🤖 This handoff was created under DANGEROUS pressure conditions.
|
||||
Framework self-governance successfully prevented potential degradation.
|
||||
|
|
@ -412,6 +412,12 @@ class InstructionPersistenceClassifier {
|
|||
}
|
||||
|
||||
_calculatePersistence({ quadrant, temporalScope, explicitness, source, text }) {
|
||||
// Special case: Explicit prohibitions are HIGH persistence
|
||||
// "not X", "never X", "don't use X", "avoid X" indicate strong requirements
|
||||
if (/\b(?:not|never|don't\s+use|avoid)\s+\w+/i.test(text)) {
|
||||
return 'HIGH';
|
||||
}
|
||||
|
||||
// Special case: Explicit port/configuration specifications are HIGH persistence
|
||||
if (/\bport\s+\d{4,5}\b/i.test(text) && explicitness > 0.6) {
|
||||
return 'HIGH';
|
||||
|
|
@ -422,8 +428,9 @@ class InstructionPersistenceClassifier {
|
|||
return 'MEDIUM';
|
||||
}
|
||||
|
||||
// Special case: Guideline language ("try to", "aim to") should be MEDIUM
|
||||
if (/\b(?:try|aim|strive)\s+to\b/i.test(text)) {
|
||||
// Special case: Preference language ("prefer", "try to", "aim to") should be MEDIUM
|
||||
// Captures "prefer using", "prefer to", "try to", "aim to"
|
||||
if (/\b(?:try|aim|strive)\s+to\b/i.test(text) || /\bprefer(?:s|red)?\s+(?:to|using)\b/i.test(text)) {
|
||||
return 'MEDIUM';
|
||||
}
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Reference in a new issue