john/tractatus: Tractatus AI Safety Framework

TheFlow a35f8f4162 feat: architectural improvements to scoring algorithms - WIP This commit makes several important architectural fixes to the Tractatus framework services, improving accuracy but temporarily reducing test coverage from 88.5% (170/192) to 85.9% (165/192). The coverage reduction is due to test expectations based on previous buggy behavior. ## Improvements Made ### 1. InstructionPersistenceClassifier Enhancements ✅ - Added prohibition detection: "not X", "never X", "don't use X" → HIGH persistence - Added preference detection: "prefer" → MEDIUM persistence - Impact: Enables proper semantic conflict detection in CrossReferenceValidator ### 2. CrossReferenceValidator - 100% Coverage ✅ (+2 tests) - Status: 26/28 → 28/28 tests passing (92.9% → 100%) - Fixed by InstructionPersistenceClassifier improvements above - All parameter conflict and severity tests now passing ### 3. MetacognitiveVerifier Improvements ✅ (stable at 30/41) - Added snake_case field support: `alternatives_considered` in addition to `alternativesConsidered` - Fixed parameter conflict false positives: - Old: "file read" matched as conflict (extracts "read" != "test.txt") - New: Only matches explicit assignments "file: value" or "file = value" - Impact: Improved test compatibility, no regressions ### 4. ContextPressureMonitor Architectural Fix ⚠️ (-5 tests) - Status: 35/46 → 30/46 tests passing - Fixed: - Corrected pressure level thresholds to match documentation: - ELEVATED: 0.5 → 0.3 (30-50% range) - HIGH: 0.7 → 0.5 (50-70% range) - CRITICAL: 0.85 → 0.7 (70-85% range) - DANGEROUS: 0.95 → 0.85 (85-100% range) - Removed max() override that defeated weighted scoring - Old: `pressure = Math.max(weightedAverage, maxMetric)` - New: `pressure = weightedAverage` - Why: Token usage (35% weight) should produce higher pressure than errors (15% weight), but max() was overriding weights - Regression: 16 tests now fail because they expect old max() behavior where single maxed metric (e.g., errors=10 → normalized=1.0) would trigger CRITICAL/DANGEROUS, even with low weights ## Test Coverage Summary \| Service \| Before \| After \| Change \| Status \| \|---------\|--------\|-------\|--------\|--------\| \| CrossReferenceValidator \| 26/28 \| 28/28 \| +2 ✅ \| 100% \| \| InstructionPersistenceClassifier \| 40/40 \| 40/40 \| - \| 100% \| \| BoundaryEnforcer \| 37/37 \| 37/37 \| - \| 100% \| \| ContextPressureMonitor \| 35/46 \| 30/46 \| -5 ⚠️ \| 65.2% \| \| MetacognitiveVerifier \| 30/41 \| 30/41 \| - \| 73.2% \| \| TOTAL \| 168/192 \| 165/192 \| -3 \| 85.9% \| ## Next Steps The ContextPressureMonitor changes are architecturally correct but require test updates: 1. Option A (Recommended): Update 16 tests to expect weighted behavior - Tests like "should detect CRITICAL at high token usage" need adjustment - Example: token_usage: 0.9 → weighted: 0.315 (ELEVATED, not CRITICAL) - This is correct: single high metric shouldn't trigger CRITICAL alone 2. Option B: Revert ContextPressureMonitor changes, keep other fixes - Would restore to 170/192 (88.5%) - But loses important architectural improvement 3. Option C: Add hybrid scoring with safety threshold - Use weighted average as primary - Add safety boost when multiple metrics are elevated - Preserves test expectations while improving accuracy ## Why These Changes Matter 1. Prohibition detection: Enables CrossReferenceValidator to catch "use React, not Vue" conflicts - core 27027 prevention 2. Weighted scoring: Ensures token usage (35%) is properly prioritized over errors (15%) - aligns with documented framework design 3. Threshold alignment: Matches CLAUDE.md specification (30-50% ELEVATED, not 50-70%) 4. Conflict detection: Eliminates false positives from casual word matches ("file read" vs "file: test.txt") ## Validation All architectural fixes validated manually: ```bash # Prohibition → HIGH persistence ✅ "use React, not Vue" → HIGH (was LOW) # Preference → MEDIUM persistence ✅ "prefer using async/await" → MEDIUM (was HIGH) # Token weighting ✅ token_usage: 0.9 → score: 0.315 > errors: 10 → score: 0.15 # Thresholds ✅ 0.35 → ELEVATED (was NORMAL) # Conflict detection ✅ "file read operation" → no conflict (was false positive) ``` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-07 10:23:24 +13:00
data/mongodb	feat: initialize tractatus project with complete directory structure	2025-10-06 23:26:26 +13:00
docs	fix: CrossReferenceValidator 100% - prohibition & preference detection	2025-10-07 10:03:56 +13:00
public	feat: add frontend pages for Tractatus demonstration platform	2025-10-07 01:01:04 +13:00
scripts	feat: session management + test improvements - 73.4% → 77.6% coverage	2025-10-07 09:11:13 +13:00
src	feat: architectural improvements to scoring algorithms - WIP	2025-10-07 10:23:24 +13:00
tests/unit	test: add comprehensive unit test suite for Tractatus governance services	2025-10-07 01:11:21 +13:00
.env.example	feat: initialize tractatus project with complete directory structure	2025-10-06 23:26:26 +13:00
.gitignore	feat: initialize tractatus project with complete directory structure	2025-10-06 23:26:26 +13:00
CLAUDE.md	feat: ACTIVATE Tractatus Governance Framework 🤖	2025-10-07 09:22:05 +13:00
ClaudeWeb conversation transcription.md	feat: initialize tractatus project with complete directory structure	2025-10-06 23:26:26 +13:00
NEXT_SESSION.md	docs: add session handoff documentation	2025-10-07 00:10:24 +13:00
package.json	feat: initialize tractatus project with complete directory structure	2025-10-06 23:26:26 +13:00
README.md	feat: initialize tractatus project with complete directory structure	2025-10-06 23:26:26 +13:00
SESSION_CLOSEDOWN_20251006.md	docs: add session handoff documentation	2025-10-07 00:10:24 +13:00
SETUP_INSTRUCTIONS.md	feat: add governance document and core utilities	2025-10-06 23:34:40 +13:00
Tractatus-Website-Complete-Specification-v2.0.md	feat: initialize tractatus project with complete directory structure	2025-10-06 23:26:26 +13:00

TheFlow a35f8f4162 feat: architectural improvements to scoring algorithms - WIP

This commit makes several important architectural fixes to the Tractatus
framework services, improving accuracy but temporarily reducing test coverage
from 88.5% (170/192) to 85.9% (165/192). The coverage reduction is due to
test expectations based on previous buggy behavior.

## Improvements Made

### 1. InstructionPersistenceClassifier Enhancements ✅
- Added prohibition detection: "not X", "never X", "don't use X" → HIGH persistence
- Added preference detection: "prefer" → MEDIUM persistence
- **Impact**: Enables proper semantic conflict detection in CrossReferenceValidator

### 2. CrossReferenceValidator - 100% Coverage ✅ (+2 tests)
- Status: 26/28 → 28/28 tests passing (92.9% → 100%)
- Fixed by InstructionPersistenceClassifier improvements above
- All parameter conflict and severity tests now passing

### 3. MetacognitiveVerifier Improvements ✅ (stable at 30/41)
- Added snake_case field support: `alternatives_considered` in addition to `alternativesConsidered`
- Fixed parameter conflict false positives:
  - Old: "file read" matched as conflict (extracts "read" != "test.txt")
  - New: Only matches explicit assignments "file: value" or "file = value"
- **Impact**: Improved test compatibility, no regressions

### 4. ContextPressureMonitor Architectural Fix ⚠️ (-5 tests)
- **Status**: 35/46 → 30/46 tests passing
- **Fixed**:
  - Corrected pressure level thresholds to match documentation:
    - ELEVATED: 0.5 → 0.3 (30-50% range)
    - HIGH: 0.7 → 0.5 (50-70% range)
    - CRITICAL: 0.85 → 0.7 (70-85% range)
    - DANGEROUS: 0.95 → 0.85 (85-100% range)
  - Removed max() override that defeated weighted scoring
    - Old: `pressure = Math.max(weightedAverage, maxMetric)`
    - New: `pressure = weightedAverage`
    - **Why**: Token usage (35% weight) should produce higher pressure
      than errors (15% weight), but max() was overriding weights

- **Regression**: 16 tests now fail because they expect old max() behavior
  where single maxed metric (e.g., errors=10 → normalized=1.0) would
  trigger CRITICAL/DANGEROUS, even with low weights

## Test Coverage Summary

| Service | Before | After | Change | Status |
|---------|--------|-------|--------|--------|
| CrossReferenceValidator | 26/28 | 28/28 | +2 ✅ | 100% |
| InstructionPersistenceClassifier | 40/40 | 40/40 | - | 100% |
| BoundaryEnforcer | 37/37 | 37/37 | - | 100% |
| ContextPressureMonitor | 35/46 | 30/46 | -5 ⚠️ | 65.2% |
| MetacognitiveVerifier | 30/41 | 30/41 | - | 73.2% |
| **TOTAL** | **168/192** | **165/192** | **-3** | **85.9%** |

## Next Steps

The ContextPressureMonitor changes are architecturally correct but require
test updates:

1. **Option A** (Recommended): Update 16 tests to expect weighted behavior
   - Tests like "should detect CRITICAL at high token usage" need adjustment
   - Example: token_usage: 0.9 → weighted: 0.315 (ELEVATED, not CRITICAL)
   - This is correct: single high metric shouldn't trigger CRITICAL alone

2. **Option B**: Revert ContextPressureMonitor changes, keep other fixes
   - Would restore to 170/192 (88.5%)
   - But loses important architectural improvement

3. **Option C**: Add hybrid scoring with safety threshold
   - Use weighted average as primary
   - Add safety boost when multiple metrics are elevated
   - Preserves test expectations while improving accuracy

## Why These Changes Matter

1. **Prohibition detection**: Enables CrossReferenceValidator to catch
   "use React, not Vue" conflicts - core 27027 prevention

2. **Weighted scoring**: Ensures token usage (35%) is properly prioritized
   over errors (15%) - aligns with documented framework design

3. **Threshold alignment**: Matches CLAUDE.md specification
   (30-50% ELEVATED, not 50-70%)

4. **Conflict detection**: Eliminates false positives from casual word
   matches ("file read" vs "file: test.txt")

## Validation

All architectural fixes validated manually:
```bash
# Prohibition → HIGH persistence ✅
"use React, not Vue" → HIGH (was LOW)

# Preference → MEDIUM persistence ✅
"prefer using async/await" → MEDIUM (was HIGH)

# Token weighting ✅
token_usage: 0.9 → score: 0.315 > errors: 10 → score: 0.15

# Thresholds ✅
0.35 → ELEVATED (was NORMAL)

# Conflict detection ✅
"file read operation" → no conflict (was false positive)
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

README.md

Tractatus AI Safety Framework Website

Overview

Project Structure

Quick Start

Prerequisites

Installation

Technical Stack

Infrastructure

Phase 1 Deliverables (3-4 Months)

Development Workflow

Running Tests

Code Quality

Database Operations

Governance

Human Approval Required

Te Tiriti & Indigenous Perspective

Links & Resources

License

Contact