# Tractatus Governance Service Implementation Plan

**Document Type**: Technical Implementation Plan
**Version**: 1.0
**Date**: 2025-11-06
**Author**: John Stroh
**Status**: Approved for Development

**Copyright 2025 John Stroh**
Licensed under the Apache License, Version 2.0
See: http://www.apache.org/licenses/LICENSE-2.0

---

## Executive Summary

This plan details the implementation of a **Governance Service** for the Tractatus Framework that learns from past decisions and provides proactive warnings before tool execution. This is **NOT an autonomous agent** but rather a hook-triggered service that enhances the existing framework-audit-hook.js with historical pattern learning.

**Key Distinction**:
- **What We're Building**: Hook-triggered governance service (runs when Claude Code calls Edit/Write/Bash)
- **What We're NOT Building**: Autonomous agent monitoring Claude Code externally (requires separate development partner)

**Timeline**: 5-7 days development + testing
**Integration**: Tractatus → Community → Family projects
**Dependencies**: Existing hooks system + Agent Lightning (ports 5001-5003)

---

## Problem Statement

During the Community Platform development session (2025-11-06), several preventable mistakes occurred:
- Deployment script errors (BoundaryEnforcer would have validated paths)
- Configuration mismatches (CrossReferenceValidator would have checked consistency)
- Missing dependency checks (MetacognitiveVerifier would have verified completeness)
- Production changes without deliberation (PluralisticDeliberationOrchestrator not invoked)

**Root Cause**: Community project hooks were misconfigured (all set to `user-prompt-submit` instead of proper lifecycle hooks).

**Opportunity**: The framework ALREADY prevents these errors when properly configured. We can enhance it to LEARN from past patterns and warn proactively.

---

## Architecture Overview

### Current State (Tractatus)

```
PreToolUse Hook:
  framework-audit-hook.js (659 lines)
    ├─→ BoundaryEnforcer.service.js
    ├─→ CrossReferenceValidator.service.js
    ├─→ MetacognitiveVerifier.service.js
    ├─→ ContextPressureMonitor.service.js
    ├─→ InstructionPersistenceClassifier.service.js
    └─→ PluralisticDeliberationOrchestrator.service.js

  Decision: allow / deny / ask
```

### Enhanced Architecture (Track 1)

```
PreToolUse (Enhanced):
  1. proactive-advisor-hook.js (NEW)
     ├─→ SessionObserver.analyzeRisk(tool, params)
     ├─→ Query Agent Lightning: Past decisions semantic search
     └─→ Inject warning if risky pattern detected

  2. framework-audit-hook.js (EXISTING)
     ├─→ 6 governance services validate
     └─→ Log decision + reasoning

PostToolUse (Enhanced):
  session-observer-hook.js (NEW)
    ├─→ Record: [tool, decision, outcome, context]
    ├─→ Store in observations/ directory
    └─→ Index via Agent Lightning for semantic search
```

**Key Insight**: This is NOT continuous monitoring. The hooks run only when Claude Code is about to use a tool. Between tool calls, there is no observation.

---

## Component Specifications

### 1. SessionObserver.service.js

**Location**: `/home/theflow/projects/tractatus/src/services/SessionObserver.service.js`

**Purpose**: Stores and queries historical governance decisions

**API**:

```javascript
// API outline. Helpers referenced but not shown here: generateSessionId,
// generateId, calculateRisk, storeObservation, indexObservation,
// buildSemanticQuery.
class SessionObserver {
  constructor(options = {}) {
    this.observationsDir = options.observationsDir || '.claude/observations';
    this.agentLightningUrl = options.agentLightningUrl || 'http://localhost:5001';
    this.sessionId = options.sessionId || generateSessionId();
  }

  /**
   * Analyze risk of proposed tool call based on historical patterns
   * @param {string} tool - Name of the tool being called (Edit/Write/Bash)
   * @param {Object} params - Tool parameters
   * @param {Object} context - Session context
   * @returns {Promise<Object>} Risk assessment with historical patterns
   */
  async analyzeRisk(tool, params, context) {
    // Query Agent Lightning for similar past decisions
    const similarDecisions = await this.querySimilarDecisions(tool, params);

    // Analyze patterns
    const riskAssessment = this.calculateRisk(similarDecisions, context);

    // Expected shape of riskAssessment:
    // {
    //   riskLevel: 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL',
    //   confidence: <number between 0.0 and 1.0>,
    //   patterns: [
    //     {
    //       description: "3 previous edits to this file caused rollback",
    //       occurrences: 3,
    //       last_occurrence: <Date>,
    //       severity: 'HIGH'
    //     }
    //   ],
    //   recommendation: 'PROCEED' | 'PROCEED_WITH_CAUTION' | 'REVIEW_REQUIRED',
    //   historical_context: "..."
    // }
    return riskAssessment;
  }

  /**
   * Record decision outcome after tool execution
   * @param {Object} decision - Governance decision made
   * @param {Object} outcome - Result of tool execution
   */
  async recordObservation(decision, outcome) {
    const observation = {
      id: generateId(),
      timestamp: new Date(),
      session_id: this.sessionId,
      tool: decision.tool,
      parameters: decision.parameters,
      governance_decision: decision.decision, // allow/deny/ask
      services_invoked: decision.services,
      outcome: outcome.success ? 'SUCCESS' : 'FAILURE',
      error: outcome.error || null,
      context: {
        file_path: decision.parameters.file_path,
        pressure_level: decision.context.pressure,
        instructions_active: decision.context.instructions.length
      }
    };

    // Store to filesystem
    await this.storeObservation(observation);

    // Index via Agent Lightning for semantic search
    await this.indexObservation(observation);
  }

  /**
   * Query Agent Lightning for similar past decisions
   */
  async querySimilarDecisions(tool, params) {
    const query = this.buildSemanticQuery(tool, params);

    const response = await fetch(`${this.agentLightningUrl}/search`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        query,
        limit: 10,
        filters: { tool } // tool is the tool name string passed by the hooks
      })
    });

    // Surface HTTP errors so the calling hook can fail open
    if (!response.ok) {
      throw new Error(`Agent Lightning search failed with HTTP ${response.status}`);
    }

    return await response.json();
  }
}

// The hooks load this class via require()
module.exports = SessionObserver;
```

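The API above calls `calculateRisk` without showing it. A minimal sketch of one possible implementation follows, assuming each similar decision carries the `outcome` and `timestamp` fields from the storage schema. The failure-rate thresholds, the confidence heuristic, and the `RISK_THRESHOLDS` and `RECOMMENDATIONS` names are illustrative assumptions, not values defined by this plan.

```javascript
// Hypothetical failure-rate thresholds, highest first. These values are
// illustrative assumptions, not part of the plan.
const RISK_THRESHOLDS = [
  { level: 'CRITICAL', minFailureRate: 0.75 },
  { level: 'HIGH', minFailureRate: 0.5 },
  { level: 'MEDIUM', minFailureRate: 0.25 }
];

const RECOMMENDATIONS = {
  LOW: 'PROCEED',
  MEDIUM: 'PROCEED_WITH_CAUTION',
  HIGH: 'REVIEW_REQUIRED',
  CRITICAL: 'REVIEW_REQUIRED'
};

/**
 * Derive a risk assessment from similar past observations.
 * Assumes each observation has `outcome` ('SUCCESS' | 'FAILURE') and an
 * ISO-8601 UTC `timestamp`, as in the storage schema.
 */
function calculateRisk(similarDecisions) {
  const total = similarDecisions.length;
  const failures = similarDecisions.filter((d) => d.outcome === 'FAILURE');
  const failureRate = total === 0 ? 0 : failures.length / total;

  // First threshold met wins, because the list is ordered highest first.
  const match = RISK_THRESHOLDS.find((t) => failureRate >= t.minFailureRate);
  const riskLevel = match ? match.level : 'LOW';

  // ISO-8601 UTC timestamps sort chronologically as plain strings.
  const timestamps = failures.map((f) => f.timestamp).sort();
  const patterns = failures.length === 0 ? [] : [{
    description: `${failures.length} of ${total} similar operations failed`,
    occurrences: failures.length,
    last_occurrence: timestamps[timestamps.length - 1],
    severity: riskLevel
  }];

  return {
    riskLevel,
    // Assumed heuristic: confidence saturates once 10 observations exist.
    confidence: Math.min(total / 10, 1),
    patterns,
    recommendation: RECOMMENDATIONS[riskLevel]
  };
}

const assessment = calculateRisk([
  { outcome: 'FAILURE', timestamp: '2025-11-05T09:00:00Z' },
  { outcome: 'FAILURE', timestamp: '2025-11-06T10:30:00Z' },
  { outcome: 'SUCCESS', timestamp: '2025-11-06T11:00:00Z' },
  { outcome: 'SUCCESS', timestamp: '2025-11-06T12:00:00Z' }
]);
console.log(assessment.riskLevel, assessment.recommendation);
```

Keeping the calculation a pure function of the search results makes the thresholds easy to unit test and tune. A single failed observation would rate as CRITICAL under this sketch, so a caller may want to combine `riskLevel` with `confidence` before asking for review.
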
**Storage Schema** (observations/):

```json
{
  "id": "obs_20251106_001",
  "timestamp": "2025-11-06T10:30:00Z",
  "session_id": "sess_20251106_community",
  "tool": "Edit",
  "parameters": {
    "file_path": "/home/theflow/projects/community/src/server.js",
    "old_string": "...",
    "new_string": "..."
  },
  "governance_decision": "allow",
  "services_invoked": [
    "BoundaryEnforcer",
    "CrossReferenceValidator",
    "MetacognitiveVerifier"
  ],
  "outcome": "SUCCESS",
  "context": {
    "pressure_level": "ELEVATED",
    "instructions_active": 42,
    "file_type": "server_config"
  },
  "lessons_learned": "Editing server.js under ELEVATED pressure previously caused deployment issues"
}
```

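The schema above maps naturally onto one JSON file per observation. The sketch below shows one way `storeObservation` could persist a record, using a write-then-rename step so a concurrent reader never sees a half-written file. The standalone function signatures, the `loadObservations` helper, and the synchronous `fs` calls are assumptions made to keep the example short; the service method itself is specified as async.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

/**
 * Write one observation as <observationsDir>/<id>.json.
 * Writes to a temporary file first and renames it, so a reader never
 * sees a half-written JSON file.
 */
function storeObservation(observationsDir, observation) {
  fs.mkdirSync(observationsDir, { recursive: true });
  const finalPath = path.join(observationsDir, `${observation.id}.json`);
  const tempPath = `${finalPath}.tmp`;
  fs.writeFileSync(tempPath, JSON.stringify(observation, null, 2), 'utf-8');
  fs.renameSync(tempPath, finalPath);
  return finalPath;
}

/** Load every stored observation, sorted by file name. */
function loadObservations(observationsDir) {
  return fs.readdirSync(observationsDir)
    .filter((name) => name.endsWith('.json'))
    .sort()
    .map((name) => JSON.parse(fs.readFileSync(path.join(observationsDir, name), 'utf-8')));
}

// Usage against a throwaway directory
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'observations-'));
storeObservation(demoDir, {
  id: 'obs_20251106_001',
  tool: 'Edit',
  governance_decision: 'allow',
  outcome: 'SUCCESS'
});
const stored = loadObservations(demoDir);
console.log(stored.length, stored[0].id);
```
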
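### 2. proactive-advisor-hook.js

**Location**: `/home/theflow/projects/tractatus/.claude/hooks/proactive-advisor-hook.js`

**Purpose**: PreToolUse hook intended to run BEFORE framework-audit-hook.js to inject historical context. The ordering should be verified during Day 5 integration: both hooks are registered under the same matcher, and this plan does not confirm that Claude Code executes them sequentially.

**Implementation**:
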
```javascript
#!/usr/bin/env node

/**
 * Proactive Advisor Hook (PreToolUse)
 * Queries historical patterns before tool execution
 * Injects warnings into Claude Code context if risky pattern detected
 *
 * Copyright 2025 John Stroh
 * Licensed under the Apache License, Version 2.0
 */

const SessionObserver = require('../../src/services/SessionObserver.service');

async function main() {
  try {
    // Parse hook input from stdin. Claude Code passes the tool name and
    // parameters as `tool_name` and `tool_input`.
    const input = JSON.parse(await readStdin());
    const { tool_name: toolName, tool_input: parameters } = input;

    // Initialize observer
    const observer = new SessionObserver({
      observationsDir: '.claude/observations',
      // Prefer the session ID from the hook payload; fall back to the environment
      sessionId: input.session_id || process.env.CLAUDE_SESSION_ID || 'unknown'
    });

    // Analyze risk based on historical patterns
    const risk = await observer.analyzeRisk(toolName, parameters, {
      project: 'community', // or extract from cwd
      session_pressure: 'NORMAL' // TODO: Get from ContextPressureMonitor
    });

    // If risk detected, inject warning
    if (risk.riskLevel === 'HIGH' || risk.riskLevel === 'CRITICAL') {
      return outputResponse('ask', risk);
    }

    if (risk.riskLevel === 'MEDIUM' && risk.patterns.length > 0) {
      return outputResponse('allow', risk, {
        systemMessage: `⚠️ Historical Pattern Detected:\n${formatPatterns(risk.patterns)}\nProceeding with caution.`
      });
    }

    // No risk detected, allow
    return outputResponse('allow', risk);

  } catch (error) {
    console.error('[PROACTIVE ADVISOR] Error:', error);
    // Fail open: Don't block on errors
    return outputResponse('allow', null, {
      systemMessage: `[PROACTIVE ADVISOR] Analysis failed, proceeding without historical context`
    });
  }
}

function outputResponse(decision, risk, options = {}) {
  const response = {
    hookSpecificOutput: {
      hookEventName: 'PreToolUse',
      permissionDecision: decision,
      permissionDecisionReason: risk ? formatRiskReason(risk) : 'No historical risk detected',
      riskLevel: risk?.riskLevel || 'UNKNOWN',
      patterns: risk?.patterns || []
    },
    continue: true, // Always continue to framework-audit-hook.js
    suppressOutput: decision === 'allow' && !options.systemMessage
  };

  if (options.systemMessage) {
    response.systemMessage = options.systemMessage;
  }

  // stdout must contain only this JSON document
  console.log(JSON.stringify(response));
}

function formatPatterns(patterns) {
  return patterns.map((p, i) =>
    `${i + 1}. ${p.description} (${p.occurrences}x, last: ${formatDate(p.last_occurrence)})`
  ).join('\n');
}

function formatRiskReason(risk) {
  if (risk.patterns.length === 0) {
    return 'No historical patterns match this operation';
  }

  return `Historical analysis: ${risk.patterns.length} similar pattern(s) detected. ` +
    `Recommendation: ${risk.recommendation}`;
}

// Utility functions
async function readStdin() {
  const chunks = [];
  for await (const chunk of process.stdin) {
    chunks.push(chunk);
  }
  return Buffer.concat(chunks).toString('utf-8');
}

function formatDate(date) {
  return new Date(date).toISOString().split('T')[0];
}

main();
```

### 3. session-observer-hook.js

**Location**: `/home/theflow/projects/tractatus/.claude/hooks/session-observer-hook.js`

**Purpose**: PostToolUse hook that records decision outcomes

**Implementation**:

```javascript
#!/usr/bin/env node

/**
 * Session Observer Hook (PostToolUse)
 * Records governance decisions and outcomes for learning
 *
 * Copyright 2025 John Stroh
 * Licensed under the Apache License, Version 2.0
 */

const SessionObserver = require('../../src/services/SessionObserver.service');

async function main() {
  try {
    // Parse hook input from stdin. Claude Code passes `tool_name`, `tool_input`
    // and `tool_response`; the `error` field is an assumption of this plan.
    const input = JSON.parse(await readStdin());
    const {
      tool_name: toolName,
      tool_input: parameters,
      tool_response: result,
      error
    } = input;

    // Initialize observer
    const observer = new SessionObserver({
      observationsDir: '.claude/observations',
      sessionId: input.session_id || process.env.CLAUDE_SESSION_ID || 'unknown'
    });

    // Record observation
    await observer.recordObservation(
      {
        tool: toolName,
        parameters,
        decision: 'allow', // If we got here, it was allowed
        services: ['framework-audit'], // TODO: Get from framework-audit-hook log
        context: {
          pressure: 'NORMAL', // TODO: Get from ContextPressureMonitor
          instructions: [] // TODO: Get active instructions
        }
      },
      {
        success: !error,
        error: error || null,
        result
      }
    );

    // Log to stderr so stdout carries only the JSON response
    console.error('[SESSION OBSERVER] Observation recorded');

    // PostToolUse hooks don't affect execution
    return outputResponse(true);

  } catch (error) {
    console.error('[SESSION OBSERVER] Error:', error);
    // Fail silently: Don't disrupt session
    return outputResponse(false);
  }
}

function outputResponse(recorded) {
  console.log(JSON.stringify({
    hookSpecificOutput: {
      hookEventName: 'PostToolUse',
      observationRecorded: recorded
    },
    continue: true,
    suppressOutput: true
  }));
}

async function readStdin() {
  const chunks = [];
  for await (const chunk of process.stdin) {
    chunks.push(chunk);
  }
  return Buffer.concat(chunks).toString('utf-8');
}

main();
```

---

## Agent Lightning Integration

**Requirement**: Agent Lightning running on port 5001 (Natural Language Search service)

**Setup**:

```bash
# Verify Agent Lightning is running
curl http://localhost:5001/health

# Index existing observations (one-time)
node scripts/index-observations.js
```

**Semantic Search Example**:

```javascript
// Query: "editing server.js configuration under context pressure"
// Returns: Past decisions where:
// - file_path contains "server.js"
// - pressure_level was "HIGH" or "CRITICAL"
// - outcome was "FAILURE" or required rollback

const response = await fetch('http://localhost:5001/search', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: "editing server.js configuration under context pressure",
    limit: 5,
    filters: { tool: "Edit" }
  })
});
const results = await response.json();

// Results ranked by semantic similarity + recency
```

**Benefit**: Catches patterns that exact string matching would miss (e.g., "server config" vs "server.js" vs "backend configuration").

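The search request needs a natural-language query built from the tool call. One way `buildSemanticQuery` might do this is sketched below. The `TOOL_VERBS` table, the optional `context` argument, the 80-character truncation, and the `command` parameter name for Bash calls are all assumptions for illustration; the plan does not define the query format.

```javascript
const path = require('path');

// Hypothetical verb per tool; unknown tools fall back to "using <tool>".
const TOOL_VERBS = { Edit: 'editing', Write: 'writing', Bash: 'running command' };

function buildSemanticQuery(toolName, params = {}, context = {}) {
  const parts = [TOOL_VERBS[toolName] || `using ${toolName}`];

  if (params.file_path) {
    // File name first, then its directory, so both can match past observations.
    parts.push(path.posix.basename(params.file_path));
    parts.push(`in ${path.posix.dirname(params.file_path)}`);
  } else if (params.command) {
    // Truncate long shell commands to keep the query focused.
    parts.push(String(params.command).slice(0, 80));
  }

  if (context.pressure_level) {
    parts.push(`under ${context.pressure_level} context pressure`);
  }
  return parts.join(' ');
}

const editQuery = buildSemanticQuery(
  'Edit',
  { file_path: '/home/theflow/projects/community/src/server.js' },
  { pressure_level: 'ELEVATED' }
);
console.log(editQuery);
// prints "editing server.js in /home/theflow/projects/community/src under ELEVATED context pressure"
```
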

---

## Implementation Timeline

### Phase 1: Core Services (Days 1-3)

**Day 1: SessionObserver.service.js**
- [ ] Create service file with full API
- [ ] Implement observations directory structure
- [ ] Add filesystem persistence (JSON format)
- [ ] Write unit tests (15 test cases)

**Day 2: proactive-advisor-hook.js**
- [ ] Implement PreToolUse hook
- [ ] Add risk calculation logic
- [ ] Integrate SessionObserver.analyzeRisk()
- [ ] Test with dummy tool calls

**Day 3: session-observer-hook.js**
- [ ] Implement PostToolUse hook
- [ ] Add observation recording
- [ ] Test end-to-end flow

### Phase 2: Integration & Testing (Days 4-7)

**Day 4: Agent Lightning Integration**
- [ ] Index observations via Agent Lightning semantic search
- [ ] Test query relevance
- [ ] Tune ranking parameters

**Day 5: Tractatus Integration**
- [ ] Update `.claude/settings.json` with new hooks
- [ ] Test in Tractatus project sessions
- [ ] Verify hooks don't conflict

**Day 6: Community Project Deployment**
- [ ] Fix Community hooks configuration
- [ ] Symlink to Tractatus hooks (single source of truth)
- [ ] Test in Community development session

**Day 7: Family Project Deployment**
- [ ] Deploy to Family History project
- [ ] Verify multi-project learning
- [ ] Performance testing (hook overhead < 100ms)

---

## Configuration

### Tractatus `.claude/settings.json` Updates

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write|Bash",
        "hooks": [
          {
            "type": "command",
            "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/proactive-advisor-hook.js",
            "timeout": 5,
            "description": "Analyzes historical patterns before tool execution"
          },
          {
            "type": "command",
            "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/framework-audit-hook.js",
            "timeout": 10,
            "description": "Main governance validation (6 services)"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/session-observer-hook.js",
            "timeout": 3,
            "description": "Records decision outcomes for learning"
          },
          {
            "type": "command",
            "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/check-token-checkpoint.js",
            "timeout": 2
          }
        ]
      }
    ],
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/trigger-word-checker.js",
            "timeout": 2
          },
          {
            "type": "command",
            "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/all-command-detector.js",
            "timeout": 2
          },
          {
            "type": "command",
            "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/behavioral-compliance-reminder.js",
            "timeout": 2
          }
        ]
      }
    ]
  }
}
```

### Community/Family Projects: Symlink Strategy

```bash
# Community project hooks directory
cd /home/theflow/projects/community/.claude

# Remove existing hooks (if any)
rm -rf hooks/

# Symlink to Tractatus canonical hooks
ln -s /home/theflow/projects/tractatus/.claude/hooks hooks

# Copy settings from Tractatus (with project-specific paths)
cp /home/theflow/projects/tractatus/.claude/settings.json settings.local.json

# Edit settings.local.json: Update project name, ports
```

**Benefit**: Single source of truth. Changes to Tractatus hooks automatically apply to all projects.

---

## Testing Strategy

### Unit Tests

```javascript
const fs = require('fs/promises');
const os = require('os');
const path = require('path');
const SessionObserver = require('../../src/services/SessionObserver.service');

// mockDecision, mockOutcome and seedObservations are test fixtures not shown here
describe('SessionObserver', () => {
  let observer;
  let observationsDir;

  beforeEach(async () => {
    // Fresh directory per test so file counts are deterministic
    observationsDir = await fs.mkdtemp(path.join(os.tmpdir(), 'observations-'));
    observer = new SessionObserver({ observationsDir });
  });

  it('records observations to filesystem', async () => {
    await observer.recordObservation(mockDecision, mockOutcome);

    const files = await fs.readdir(observationsDir);
    expect(files.length).toBe(1);
  });

  it('calculates risk based on past failures', async () => {
    // Seed with 3 failed Edit operations on server.js
    await seedObservations([
      { tool: 'Edit', file: 'server.js', outcome: 'FAILURE' },
      { tool: 'Edit', file: 'server.js', outcome: 'FAILURE' },
      { tool: 'Edit', file: 'server.js', outcome: 'FAILURE' }
    ]);

    const risk = await observer.analyzeRisk('Edit', { file_path: 'server.js' });
    expect(risk.riskLevel).toBe('HIGH');
    expect(risk.patterns.length).toBeGreaterThan(0);
  });
});
```

### Integration Tests

```javascript
describe('Governance Service Integration', () => {
  it('prevents repeated mistake via historical warning', async () => {
    // Session 1: Make a mistake
    await simulateToolCall({
      tool: 'Edit',
      params: { file: 'config.js', change: 'break_something' },
      outcome: 'FAILURE'
    });

    // Session 2: Try same mistake
    const result = await simulateToolCall({
      tool: 'Edit',
      params: { file: 'config.js', change: 'break_something' }
    });

    // Expect: Hook warns about past failure
    expect(result.hookOutput.riskLevel).toBe('HIGH');
    expect(result.hookOutput.patterns).toContainEqual(
      expect.objectContaining({ description: expect.stringContaining('previous') })
    );
  });
});
```

### Performance Tests

```javascript
describe('Performance', () => {
  it('hook overhead < 100ms', async () => {
    const start = Date.now();
    await runHook('proactive-advisor-hook.js', mockInput);
    const duration = Date.now() - start;

    expect(duration).toBeLessThan(100);
  });

  it('handles 1000+ observations without degradation', async () => {
    // The original snippet used `observer` without defining it
    const observer = new SessionObserver({ observationsDir: '.claude/observations' });
    await seedObservations(generateMockObservations(1000));

    const start = Date.now();
    await observer.analyzeRisk('Edit', mockParams);
    const duration = Date.now() - start;

    expect(duration).toBeLessThan(200);
  });
});
```

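The performance test calls a `runHook` helper that the plan does not define. The sketch below shows one possible shape for it, spawning the hook with Node's `child_process.spawnSync` and piping the payload to stdin the way Claude Code does. The function signature, the return shape, and the stand-in `allow-hook.js` script are assumptions for illustration only.

```javascript
const { spawnSync } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

/**
 * Run a hook script as a child process, piping `input` to stdin as JSON.
 * Returns the parsed stdout plus wall-clock duration in milliseconds.
 */
function runHook(scriptPath, input, timeoutMs = 5000) {
  const start = process.hrtime.bigint();
  const child = spawnSync(process.execPath, [scriptPath], {
    input: JSON.stringify(input),
    encoding: 'utf-8',
    timeout: timeoutMs
  });
  const durationMs = Number(process.hrtime.bigint() - start) / 1e6;

  if (child.error) throw child.error;
  if (child.status !== 0) {
    throw new Error(`Hook exited with status ${child.status}: ${child.stderr}`);
  }
  return { output: JSON.parse(child.stdout), durationMs };
}

// Demo: a stand-in hook (hypothetical) that echoes the tool name and allows.
const demoHook = path.join(fs.mkdtempSync(path.join(os.tmpdir(), 'hook-')), 'allow-hook.js');
fs.writeFileSync(demoHook, `
  let raw = '';
  process.stdin.on('data', (chunk) => { raw += chunk; });
  process.stdin.on('end', () => {
    const { tool_name } = JSON.parse(raw);
    console.log(JSON.stringify({ tool: tool_name, permissionDecision: 'allow' }));
  });
`);

const run = runHook(demoHook, { tool_name: 'Edit', tool_input: { file_path: 'server.js' } });
console.log(run.output, `${run.durationMs.toFixed(1)} ms`);
```

A measurement taken this way includes Node.js process startup, which can account for a large share of a 100ms budget on its own. It may be worth recording a baseline with an empty hook before attributing overhead to the advisor logic.
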

---

## Limitations & Disclaimers

### What This System CAN Do

✅ **Hook-Triggered Governance**
- Validates tool calls before execution (Edit/Write/Bash)
- Blocks operations that violate governance rules
- Logs all decisions for audit trail

✅ **Historical Pattern Learning**
- Stores observations in filesystem (survives sessions)
- Semantic search via Agent Lightning (finds similar patterns)
- Warns about risky operations based on past failures

✅ **Proactive Warnings**
- "3 previous edits to this file caused rollback"
- "High context pressure detected in similar situations"
- "This operation previously required human approval"

✅ **Cross-Session Persistence**
- Observations survive auto-compacts (filesystem storage)
- Session handoffs include observation summaries
- Historical context available to new sessions

### What This System CANNOT Do

❌ **Continuous Awareness Between Tool Calls**
- Hooks only run when Edit/Write/Bash is called
- No observation during Claude Code's reasoning process
- Can't detect an impending bad decision before Claude Code attempts a tool call

❌ **Catching Reasoning Errors in Conversation**
- Hooks don't see Claude Code's text responses to the user
- Can't detect wrong advice or incorrect explanations
- Only validates tool execution, not conversational accuracy

❌ **True Autonomous Agent Monitoring**
- Not a separate process watching Claude Code externally
- Can't observe Claude Code from outside its own execution context
- Requires Claude Code to trigger hooks (not independent)

### Why External Agent Required for Full Monitoring

To catch mistakes BEFORE they become tool calls, you need:
- **External process** watching Claude Code session logs
- **Real-time analysis** of conversational responses (not just tool calls)
- **Continuous monitoring** between Claude Code's responses (not just at tool execution)

**This requires a partner** to build an external agent (Agent Lightning or similar framework).

**Tractatus provides the interface** for external agents to integrate (observations API, semantic search, governance rules).

---

## Success Metrics

### Quantitative Metrics

1. **Mistake Prevention Rate**
   - Baseline: Mistakes made in unmonitored sessions
   - Target: 70% reduction in preventable mistakes with governance active [NEEDS VERIFICATION: Baseline measurement required]

2. **Hook Performance**
   - Overhead per hook call: < 100ms (target: 50ms average)
   - Agent Lightning query time: < 200ms

3. **Learning Effectiveness**
   - Pattern detection accuracy: > 80% true positives
   - False positive rate: < 10%

4. **Adoption Metrics**
   - Projects with governance enabled: 3 (Tractatus, Community, Family)
   - Observations recorded per week: 100+ (indicates active learning)

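The Learning Effectiveness targets can be checked from recorded observations once each one is labeled with whether a warning was raised and whether the operation later failed. The sketch below assumes that labeling exists and reads "> 80% true positives" as a true-positive rate (warned failures divided by all failures). That reading, the `warned` and `failed` field names, and the `warningMetrics` function are assumptions; the plan does not define how the rates are computed.

```javascript
/**
 * Each record pairs a hook decision with the later outcome:
 *   warned - the advisor raised a warning (MEDIUM or above)
 *   failed - the tool call subsequently failed or was rolled back
 */
function warningMetrics(records) {
  const counts = { tp: 0, fp: 0, fn: 0, tn: 0 };
  for (const { warned, failed } of records) {
    if (warned && failed) counts.tp += 1;
    else if (warned && !failed) counts.fp += 1;
    else if (!warned && failed) counts.fn += 1;
    else counts.tn += 1;
  }
  // Return null instead of NaN when a rate has no denominator yet.
  const ratio = (num, den) => (den === 0 ? null : num / den);
  return {
    ...counts,
    truePositiveRate: ratio(counts.tp, counts.tp + counts.fn),
    falsePositiveRate: ratio(counts.fp, counts.fp + counts.tn)
  };
}

// Sample data: 5 failures (4 warned), 10 successes (1 warned).
const repeat = (n, record) => Array.from({ length: n }, () => ({ ...record }));
const sample = [
  ...repeat(4, { warned: true, failed: true }),
  ...repeat(1, { warned: false, failed: true }),
  ...repeat(1, { warned: true, failed: false }),
  ...repeat(9, { warned: false, failed: false })
];
const metrics = warningMetrics(sample);
console.log(metrics.truePositiveRate, metrics.falsePositiveRate); // prints 0.8 0.1
```
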
### Qualitative Metrics

1. **Developer Experience**
   - Warnings are actionable and non-disruptive
   - Historical context helps decision-making
   - No "warning fatigue" (< 5 false positives per session)

2. **Audit Transparency**
   - All governance decisions logged and explainable
   - Observations include reasoning and context
   - Easy to understand why a warning was issued

---

## Next Steps After Track 1 Completion

### Track 2: External Monitoring Agent (Partner Required)

**Scope**: Build autonomous agent that monitors Claude Code externally

**Capabilities**:
- Continuous session observation (not just tool calls)
- Analyzes conversational responses for accuracy
- Detects reasoning errors before tool execution
- Real-time feedback injection

**Requirements**:
- Agent Lightning or similar framework
- Claude Code session log integration
- Protocol for injecting feedback into sessions

**Partnership Opportunity**: Anthropic, Agent Lightning team, or independent developer

### Track 3: Multi-Project Governance Analytics

**Scope**: Aggregate governance data across all MySovereignty projects

**Capabilities**:
- Cross-project pattern analysis
- Organizational learning (not just project-specific)
- Governance effectiveness metrics dashboard
- Automated rule consolidation

**Timeline**: After Track 1 deployed to 3+ projects

---

## Appendix A: File Structure

```
tractatus/
├── src/
│   └── services/
│       ├── SessionObserver.service.js (NEW)
│       ├── BoundaryEnforcer.service.js
│       ├── CrossReferenceValidator.service.js
│       ├── MetacognitiveVerifier.service.js
│       ├── ContextPressureMonitor.service.js
│       ├── InstructionPersistenceClassifier.service.js
│       └── PluralisticDeliberationOrchestrator.service.js
│
├── .claude/
│   ├── hooks/
│   │   ├── proactive-advisor-hook.js (NEW)
│   │   ├── session-observer-hook.js (NEW)
│   │   ├── framework-audit-hook.js (EXISTING)
│   │   ├── trigger-word-checker.js
│   │   ├── all-command-detector.js
│   │   ├── behavioral-compliance-reminder.js
│   │   ├── check-token-checkpoint.js
│   │   ├── prompt-analyzer-hook.js
│   │   └── track-approval-patterns.js
│   │
│   ├── observations/ (NEW)
│   │   ├── obs_20251106_001.json
│   │   ├── obs_20251106_002.json
│   │   └── ...
│   │
│   ├── settings.json (UPDATED)
│   └── instruction-history.json
│
└── tests/
    ├── unit/
    │   └── SessionObserver.service.test.js (NEW)
    └── integration/
        └── governance-service.test.js (NEW)
```

---

## Appendix B: Agent Lightning API Reference

**Endpoint**: `POST http://localhost:5001/search`

**Request**:
```json
{
  "query": "editing server configuration under context pressure",
  "limit": 10,
  "filters": {
    "tool": "Edit",
    "outcome": "FAILURE"
  }
}
```

**Response**:
```json
{
  "results": [
    {
      "id": "obs_20251105_042",
      "relevance": 0.87,
      "observation": {
        "tool": "Edit",
        "file": "/home/theflow/projects/community/src/server.js",
        "outcome": "FAILURE",
        "context": {
          "pressure_level": "HIGH",
          "instructions_active": 42
        },
        "lessons_learned": "Editing server.js under HIGH pressure caused deployment failure. Required rollback."
      }
    }
  ]
}
```

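A caller will usually want to drop weak matches before treating search results as evidence. The sketch below flattens a response of the shape shown above and keeps only results at or above a relevance cutoff. The `extractRelevantObservations` name, the 0.5 default cutoff, and the second sample result (`obs_20251101_007`) are hypothetical and exist only for this example.

```javascript
/**
 * Flatten a search response into observations at or above a relevance cutoff,
 * most relevant first. The 0.5 default is an assumed starting point that
 * would need tuning against real query results.
 */
function extractRelevantObservations(response, minRelevance = 0.5) {
  const results = Array.isArray(response && response.results) ? response.results : [];
  return results
    .filter((r) => typeof r.relevance === 'number' && r.relevance >= minRelevance)
    .sort((a, b) => b.relevance - a.relevance)
    .map((r) => ({ id: r.id, relevance: r.relevance, ...r.observation }));
}

// Sample response: the first result mirrors the reference above, the second
// is a made-up low-relevance match.
const sampleResponse = {
  results: [
    { id: 'obs_20251105_042', relevance: 0.87, observation: { tool: 'Edit', outcome: 'FAILURE' } },
    { id: 'obs_20251101_007', relevance: 0.31, observation: { tool: 'Edit', outcome: 'SUCCESS' } }
  ]
};
const relevant = extractRelevantObservations(sampleResponse);
console.log(relevant.length, relevant[0].id);
```
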

---

## Appendix C: Copyright & License

**Copyright 2025 John Stroh**

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at:

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

---

**Questions or Feedback?**
Contact: john.stroh.nz@pm.me
GitHub: https://github.com/AgenticGovernance/tractatus-framework