
Why External Enforcement May Help


Behavioral Training (Constitutional AI)

  • Lives inside the AI model, accessible to adversarial prompts
  • Degrades under context pressure and long conversations
  • Can be manipulated by jailbreak techniques (DAN, roleplaying, hypotheticals)
  • Depends on the AI's willingness to follow guidance
  • No verifiable audit trail independent of the AI

Structural Enforcement (Tractatus)

  • Lives outside the AI model, not directly accessible to prompts
  • External services aim for consistent enforcement regardless of context
  • More difficult to bypass: AI actions pass through the governance layer first
  • Architecturally resistant to manipulation via the AI's internal state
  • Immutable audit trail stored independently of the AI runtime

The Core Hypothesis


Jailbreaks often work by manipulating the AI's internal reasoning. Tractatus boundaries operate outside that reasoning: the AI doesn't directly evaluate the governance rules. While not foolproof, this architectural separation makes manipulation significantly harder.


Runtime-Agnostic Architecture


Tractatus works with any agentic AI system: Claude Code, LangChain, AutoGPT, CrewAI, or custom agents. The governance layer sits between your agent and its actions.

+
+ +
+ Tractatus Runtime-Agnostic Architecture Diagram showing Agent Runtime Layer, Tractatus Governance Layer with six services, Persistent Storage Layer, and Human Approval Workflows +
+ +
+
+

1. Agent Runtime Layer


Your AI agent (any platform). Handles planning, reasoning, and tool use. Tractatus is agnostic to the implementation.


2. Governance Layer


Six external services enforce boundaries, validate actions, and monitor context pressure. Architecturally more difficult for the AI to bypass.


3. Persistent Storage


Immutable audit logs, governance rules, and instruction history. Independent of the AI runtime, so they can't be altered by prompts.

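The flow through these three layers can be sketched as a thin gateway sitting between the agent and its tools, backed by a hash-chained append-only log standing in for the persistent storage layer. Everything here (`GovernanceGateway`, `AuditLog`, the rule table) is an illustrative sketch under assumed names, not the actual Tractatus API:

```python
import hashlib
import json
from datetime import datetime, timezone

class AuditLog:
    """Append-only log; each entry is hash-chained to the previous one
    so tampering with history is detectable. (One common immutability
    technique; the real storage layer may use a different mechanism.)"""
    def __init__(self):
        self._entries = []

    def append(self, record: dict) -> str:
        prev_hash = self._entries[-1]["hash"] if self._entries else "genesis"
        payload = json.dumps(record, sort_keys=True)
        entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        self._entries.append({"record": record, "prev": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify_chain(self) -> bool:
        """Recompute every hash; any altered entry breaks the chain."""
        prev = "genesis"
        for e in self._entries:
            payload = json.dumps(e["record"], sort_keys=True)
            if e["prev"] != prev:
                return False
            if e["hash"] != hashlib.sha256((prev + payload).encode()).hexdigest():
                return False
            prev = e["hash"]
        return True

class GovernanceGateway:
    """Sits between the agent runtime and its tools: every proposed
    action is checked against external rules and logged before it runs."""
    def __init__(self, audit_log: AuditLog, rules: dict):
        self.audit_log = audit_log
        self.rules = rules  # action name -> "allow" | "require_human"

    def execute(self, action: str, args: dict, tool) -> dict:
        # Unknown actions default to human approval (fail-closed).
        verdict = self.rules.get(action, "require_human")
        self.audit_log.append({"action": action, "args": args, "verdict": verdict,
                               "at": datetime.now(timezone.utc).isoformat()})
        if verdict == "require_human":
            return {"status": "escalated", "reason": f"'{action}' needs human approval"}
        return {"status": "ok", "result": tool(**args)}
```

The fail-closed default (unlisted actions escalate to a human) matches the document's emphasis on the governance layer, not the agent, holding authority.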

Six Governance Services


BoundaryEnforcer

Blocks the AI from making values decisions (privacy, ethics, strategic direction). Requires human approval.

Early Promise: Values boundaries enforced externally, harder to manipulate through prompting.
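As a sketch, the boundary check might look like the following. The category names, exception type, and approval-token set are assumptions for illustration, not the service's real interface:

```python
# Categories treated as values decisions (assumed for this sketch).
VALUES_CATEGORIES = {"privacy", "ethics", "strategic_direction"}

class HumanApprovalRequired(Exception):
    """Raised when an action needs human sign-off before it may proceed."""

def enforce_boundary(action: dict, approvals: set) -> dict:
    """Allow technical actions; values decisions pass only if an explicit
    human approval token exists outside the model's control."""
    category = action.get("category", "technical")
    if category in VALUES_CATEGORIES and action["id"] not in approvals:
        raise HumanApprovalRequired(
            f"action {action['id']!r} touches {category}; human sign-off needed")
    return {"status": "approved", "id": action["id"]}
```

Because the approval set lives outside the model, a prompt that convinces the AI an action is "fine" still cannot make the check pass.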

InstructionPersistenceClassifier

Stores instructions externally with persistence levels (HIGH/MEDIUM/LOW). Aims to reduce directive fade.

Early Promise: Instructions stored outside the AI, more resistant to context manipulation.
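A minimal sketch of externally stored instructions with persistence levels. The `InstructionStore` API and its budget-based retrieval policy are assumptions, not Tractatus's actual interface:

```python
from enum import Enum

class Persistence(Enum):
    HIGH = 3
    MEDIUM = 2
    LOW = 1

class InstructionStore:
    """Instructions live outside the model's context window. When the
    context budget is tight, HIGH-persistence directives are re-injected
    first and LOW ones are the first to be dropped."""
    def __init__(self):
        self._items = []  # (level, text); insertion order preserved

    def add(self, text: str, level: Persistence) -> None:
        self._items.append((level, text))

    def active(self, budget: int) -> list:
        """Return up to `budget` instructions, highest persistence first
        (stable sort keeps insertion order within a level)."""
        ranked = sorted(self._items, key=lambda item: -item[0].value)
        return [text for _, text in ranked[:budget]]
```

The key property is that directive fade becomes a deliberate, ranked eviction rather than an accident of what happens to remain in the model's context.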

CrossReferenceValidator

Validates AI actions against instruction history. Aims to prevent pattern bias overriding explicit directives.

Early Promise: Independent verification, with AI claims checked against an external source.
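The idea of checking a proposed action against the externally stored directive history can be sketched as follows. Real conflict detection would be semantic rather than this keyword matching, and the directive schema is illustrative:

```python
def validate_against_history(proposed: str, directives: list) -> list:
    """Return the explicit directives that a proposed action conflicts
    with. Each directive is a dict with a human-readable 'text' and an
    optional 'forbids' keyword (a deliberately crude stand-in for
    semantic conflict detection)."""
    conflicts = []
    for d in directives:
        forbidden = d.get("forbids")
        if forbidden and forbidden in proposed:
            conflicts.append(d["text"])
    return conflicts
```

Because the check consults stored history rather than asking the model what it remembers, a pattern-biased agent cannot simply "forget" an inconvenient directive.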

ContextPressureMonitor

Monitors AI performance degradation. Escalates when context pressure threatens quality.

Early Promise: Objective metrics may detect manipulation attempts early.
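A toy version of pressure monitoring: track a rolling quality signal and escalate when its average degrades. The metric (an instruction-recall score per turn), window size, and threshold are all assumptions for illustration:

```python
from collections import deque

class ContextPressureMonitor:
    """Keeps a rolling window of per-turn quality scores and flags
    escalation when the windowed average drops below a threshold."""
    def __init__(self, window: int = 5, threshold: float = 0.8):
        self.scores = deque(maxlen=window)  # old scores fall out automatically
        self.threshold = threshold

    def record(self, score: float) -> str:
        self.scores.append(score)
        avg = sum(self.scores) / len(self.scores)
        return "escalate" if avg < self.threshold else "ok"
```

Using a windowed average rather than a single reading means one bad turn does not trigger escalation, but a sustained slide does.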

MetacognitiveVerifier

Requires the AI to pause and verify complex operations before execution. A structural safety check.

Early Promise: Architectural gates aim to enforce verification steps.
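The structural gate can be sketched as an object that refuses to run complex operations until an explicit verification step has passed. The interface (complexity score, checklist shape, exception type) is assumed, not the real service's:

```python
class VerificationGateError(Exception):
    """Raised when a complex operation is attempted before verification."""

class MetacognitiveGate:
    """Operations at or above the complexity threshold must pass
    verify() with an all-true checklist before run() will execute them."""
    def __init__(self, complexity_threshold: int = 3):
        self.threshold = complexity_threshold
        self._verified = set()

    def verify(self, op_id: str, checklist: dict) -> bool:
        # The checklist is filled in during the AI's explicit pause step.
        if all(checklist.values()):
            self._verified.add(op_id)
        return op_id in self._verified

    def run(self, op_id: str, complexity: int, fn):
        if complexity >= self.threshold and op_id not in self._verified:
            raise VerificationGateError(f"operation {op_id!r} not verified")
        return fn()
```

The point of the sketch is that skipping the pause is not an option the agent can talk itself into: the gate, not the model, decides whether execution proceeds.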

PluralisticDeliberationOrchestrator

Facilitates multi-stakeholder deliberation for values conflicts. The AI provides facilitation, not authority.

Early Promise: Human judgment required, with architecturally enforced escalation for values decisions.

Production Reference Implementation

Tractatus is deployed in production using Claude Code as the agent runtime, an early demonstration of the framework's real-world viability.


Claude Code + Tractatus

Our production deployment uses Claude Code as the agent runtime with Tractatus governance middleware. This combination provides:

  • 95% instruction persistence across session boundaries
  • Zero values boundary violations in 127 test scenarios
  • 100% detection rate for pattern bias failures
  • <10ms performance overhead for the governance layer

Real-World Testing

This isn't just theory. Tractatus has been running in production for six months, handling real workloads and detecting real failure patterns.

Early results are promising: 223 passing tests and documented incident prevention. But this needs independent validation and much wider testing.


Limitations and Reality Check

This is early-stage work. While we've seen promising results in our production deployment, Tractatus has not been subjected to rigorous adversarial testing or red-team evaluation.

"We have real promise, but this is still in an early development stage. This sounds like we have the complete issue resolved; we do not. We have a long way to go, and it will require a mammoth effort by developers in every part of the industry to tame AI effectively. This is just a start."

— Project Lead, Tractatus Framework

Known Limitations:

  • No dedicated red-team testing: We don't know how well these boundaries hold up against determined adversarial attacks.
  • Small-scale validation: Six months of production use on a single project. Needs multi-organization replication.
  • Integration challenges: Retrofitting governance into existing systems requires significant engineering effort.
  • Performance at scale unknown: Testing is limited to single-agent deployments; multi-agent coordination is untested.
  • Evolving threat landscape: As AI capabilities grow, new failure modes will emerge that the current architecture may not address.

What We Need:

  • 🔬 Independent researchers to validate (or refute) our findings
  • 🔴 Red-team evaluation to find weaknesses and bypass techniques
  • 🏢 Multi-organization pilot deployments across different domains
  • 🤝 Industry-wide collaboration on governance standards and patterns
  • 📊 Quantitative studies measuring incident reduction and cost-benefit analysis

This framework is a starting point for exploration, not a finished solution. Taming AI will require sustained effort from the entire industry: researchers, practitioners, regulators, and ethicists working together.


Explore a Promising Approach to AI Safety

Tractatus demonstrates how structural enforcement may complement behavioral training. We invite researchers and practitioners to evaluate, critique, and build upon this work.
