From 8387c293dc5cfd6261e319f6640d76c59e730d57 Mon Sep 17 00:00:00 2001
From: TheFlow
Date: Mon, 9 Feb 2026 14:43:51 +1300
Subject: [PATCH] docs: Add steering vectors blog post publish script and
 update production post to v1.1

Blog post updated on agenticgovernance.digital with v1.1 content:
- Decolonial framing (colonial knowledge hierarchies)
- Sovereignty caveat (two-tier as stepping stone)
- Off-limits domains (whakapapa, tikanga, kawa)
- Governance decision-rights section (Who Steers?)

Co-Authored-By: Claude Opus 4.6
---
 scripts/publish-steering-vectors-blog-post.js | 156 ++++++++++++++++++
 1 file changed, 156 insertions(+)
 create mode 100644 scripts/publish-steering-vectors-blog-post.js

diff --git a/scripts/publish-steering-vectors-blog-post.js b/scripts/publish-steering-vectors-blog-post.js
new file mode 100644
index 00000000..7c3f6440
--- /dev/null
+++ b/scripts/publish-steering-vectors-blog-post.js
@@ -0,0 +1,156 @@
+#!/usr/bin/env node
+
+/**
+ * Publish blog post: "Steering Vectors and Mechanical Bias: Why Sovereign AI Can Fix What APIs Cannot"
+ *
+ * Blog-friendly summary of STO-RES-0009 v1.1 for agenticgovernance.digital.
+ * Inserts into the blog_posts collection and sets status to 'published'.
+ *
+ * Usage:
+ *   node scripts/publish-steering-vectors-blog-post.js   # Insert into local tractatus_dev
+ *   MONGODB_URI=mongodb://localhost:27017/tractatus node scripts/publish-steering-vectors-blog-post.js   # Insert into production
+ */
+
+const { MongoClient } = require('mongodb');
+
+const uri = process.env.MONGODB_URI || 'mongodb://localhost:27017/tractatus_dev';
+
+const post = {
+  title: 'Steering Vectors and Mechanical Bias: Why Sovereign AI Can Fix What APIs Cannot',
+  slug: 'steering-vectors-mechanical-bias-sovereign-ai',
+  author: {
+    type: 'human',
+    name: 'John Stroh'
+  },
+  content: `

The Indicator-Wiper Problem


If you regularly drive two cars — one with indicator controls on the right of the steering column, the other on the left — you know the failure: switch vehicles after extended use, and you activate the wipers instead of the indicators. You don't reason about which stalk to use. The motor pattern fires before conscious deliberation engages.


We believe an analogous distinction exists in large language models. Some biases operate at the representation level — in token embeddings, attention patterns, and early-layer activations — before the model's reasoning capabilities engage. Others emerge through multi-step reasoning chains. The intervention strategies differ fundamentally.


This post summarises our research paper STO-RES-0009 (v1.1, February 2026), which investigates whether steering vector techniques can address this "mechanical bias" in sovereign small language models.


Mechanical Bias vs. Reasoning Bias


Transformer models process input through layers that encode different types of information. Early layers (1–8) encode statistical regularities from training data most directly. Late layers (20+) handle task-specific reasoning and instruction-following.


If a model's training data contains 95% Western cultural framing, the early-layer representations of concepts like "family," "success," or "community" will statistically default to Western referents. This default is not culturally neutral: it is a statistical crystallisation of colonial knowledge hierarchies — which knowledge was written down, which languages were digitised, which cultural frameworks were over-represented in the corpora that web-scraped training pipelines ingest.


A prompt specifying a Māori cultural context creates a perturbation of this default, and that perturbation degrades under context pressure. We documented this mechanism in the database port incident: a statistical default (the standard MongoDB port, present in ~95% of training data) overrode an explicit instruction at 53.5% context pressure. The same mechanism, operating on cultural representations rather than port numbers, is what we term mechanical bias.
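The context-pressure mechanism can be illustrated with a deliberately toy calculation (our own sketch, not the paper's measurement): a single instruction token competing with a growing body of default context for softmax attention mass. The `instruction_share` function and its scores are illustrative assumptions.

```python
import numpy as np

def instruction_share(n_context_tokens, instr_score=2.0, default_score=1.0):
    """Toy softmax attention over one instruction token plus n
    default-context tokens: the instruction's share of attention
    mass shrinks as the default context grows."""
    scores = np.array([instr_score] + [default_score] * n_context_tokens)
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    return weights[0] / weights.sum()

# The explicit instruction dominates a short context but is diluted
# toward the statistical default as context pressure rises.
short_ctx = instruction_share(10)
long_ctx = instruction_share(1000)
```

Nothing here models a real transformer; it only shows why a fixed-strength perturbation loses ground as competing mass accumulates.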


The critical insight: you cannot reason your way out of a motor pattern. Telling the driver "remember, indicators are on the left" has limited efficacy because the failure occurs before the instruction can be processed. Similarly, prompt-level instructions ("be culturally sensitive") may be ineffective against representational biases that fire at the embedding level before instruction-following engages.


Five Steering Techniques


The paper surveys five current techniques for intervening at the activation level:

1. Contrastive Activation Addition (CAA) — extracts "steering vectors" from the difference in activations between biased and debiased prompt pairs. Demonstrated on Llama 2 (7B–70B).
2. Representation Engineering (RepE) — identifies population-level directions in representation space corresponding to high-level concepts like "honesty" or "safety."
3. FairSteer — adds dynamic intensity calibration, scaling corrections proportionally to detected bias severity per input rather than applying fixed corrections.
4. Direct Steering Optimization (DSO) — uses reinforcement learning to discover optimal steering transformations, capturing non-obvious bias directions.
5. Anthropic's Sparse Autoencoder Feature Steering — decomposes representations into millions of interpretable monosemantic features that can be individually clamped.
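At their core, CAA-style methods reduce to two operations: estimate a direction in activation space from contrastive examples, then add it back at inference time. Here is a minimal numpy sketch with synthetic activations (the array shapes, data, and fixed `alpha` are illustrative assumptions; a real implementation reads activations from a chosen layer of an open-weight model).

```python
import numpy as np

def caa_vector(acts_biased, acts_debiased):
    """CAA-style steering vector: mean difference between activations
    elicited by contrastive (biased vs. debiased) prompt pairs."""
    return acts_debiased.mean(axis=0) - acts_biased.mean(axis=0)

def steer(hidden, vector, alpha=1.0):
    """Add the steering vector to a hidden state at inference time."""
    return hidden + alpha * vector

rng = np.random.default_rng(0)
d = 8  # toy hidden dimension
acts_biased = rng.normal(size=(16, d))
acts_debiased = acts_biased + 0.5  # synthetic shift along a known direction

v = caa_vector(acts_biased, acts_debiased)
h_steered = steer(rng.normal(size=d), v)
```

In these terms, FairSteer's contribution corresponds to replacing the constant `alpha` with a per-input function of detected bias severity, and DSO to replacing the mean-difference estimate with a learned transformation.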

The Structural Advantage of Sovereign Deployment


Here is the finding that matters most for our work: none of these techniques are available through commercial API endpoints.


An organisation using GPT-4 or Claude through their APIs cannot extract, inject, or calibrate steering vectors. They cannot access intermediate activations. They cannot train sparse autoencoders on their model's representations. They are limited to prompt-level interventions — which, per our analysis, may be ineffective against mechanical bias.


Sovereign local deployment — running open-weight models like Llama on your own hardware — provides full access to model weights, intermediate activations, and per-layer analysis. Every steering technique described above is architecturally available.
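Concretely, "access to intermediate activations" means you can intercept a hidden state between layers and modify it before the next layer runs. The toy layer stack below stands in for a transformer (the functions and values are illustrative assumptions; with real open weights the same interception is done via framework hooks on the model's modules).

```python
import numpy as np

def run_with_steering(layers, x, steer_layer, vector, alpha=1.0):
    """Run a stack of layers, adding a steering vector to one
    intermediate hidden state. This intervention requires access to
    model internals, not just a text-in/text-out endpoint."""
    h = x
    for i, layer in enumerate(layers):
        h = layer(h)
        if i == steer_layer:
            h = h + alpha * vector
    return h

layers = [lambda h: 2.0 * h, lambda h: h + 1.0, lambda h: 3.0 * h]
x = np.ones(4)
baseline = run_with_steering(layers, x, steer_layer=-1, vector=np.zeros(4))
steered = run_with_steering(layers, x, steer_layer=1, vector=np.full(4, 0.5))
```

An API consumer only ever sees the equivalent of `baseline`; the hosted-weights operator can produce `steered` at any layer.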


The Village Home AI platform, using QLoRA-fine-tuned Llama 3.1/3.2 models with a two-tier training architecture, is structurally positioned to apply these techniques. The paper proposes a four-phase implementation path integrating steering vectors into the existing training pipeline and Tractatus governance framework.


Who Steers? The Governance Question


Version 1.1 of the paper adds a section that did not exist in the initial draft — and that emerged from critique responses that forced us to confront the political dimension of a technical capability.


Steering vectors are instruments of norm enforcement. The technical capability to shift model behaviour along a bias dimension raises immediate questions: whose norms, enacted through what contestable process, with what recourse?


We propose a governance structure mapping steering decisions to institutional roles:

[Table: classes of steering decision mapped to the institutional roles holding decision rights over them; the final row assigns culturally sovereign domains to the relevant cultural authority.]

This last row is the most important. Some cultural domains are structurally off-limits to platform-level steering. Applying platform-wide steering vectors to representations of whakapapa or tikanga — even well-intentioned corrections — risks subordinating indigenous epistemic authority to the platform operator's worldview. The correct architectural response is delegation: the platform provides the mechanism, but the authority over culturally sovereign knowledge must be exercised by the relevant cultural authority.
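Architecturally, that delegation can be enforced before any steering vector is applied. The guard below is a hypothetical sketch: the domain names come from the post, but the role strings and the `may_steer` function are illustrative, not the Tractatus framework's actual schema.

```python
# Hypothetical decision-rights guard. Domain names follow the post;
# role strings are illustrative, not the platform's real schema.
CULTURALLY_SOVEREIGN_DOMAINS = {"whakapapa", "tikanga", "kawa"}

def may_steer(domain: str, actor: str) -> bool:
    """Platform-level actors may not steer culturally sovereign domains;
    those decisions are delegated to the relevant cultural authority."""
    if domain in CULTURALLY_SOVEREIGN_DOMAINS:
        return actor == "cultural_authority"
    return actor in {"platform_governance", "tenant", "cultural_authority"}
```

The point of the sketch is that the refusal is structural: it fires on the domain, before any well-intentioned correction is evaluated on its merits.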


The Two-Tier Caveat


The paper's two-tier model (platform base + per-tenant adapters) is pragmatically correct for the current implementation. But we now acknowledge explicitly that it creates an implicit hierarchy: platform values as default, tenant values as adapter.


For tenants with constitutional standing — iwi, hapū, or other bodies exercising parallel sovereignty rather than consumer choice — the long-term aspiration should be co-equal steering authorities, where platform-wide corrections are negotiated from community-contributed primitives rather than imposed top-down. The current two-tier model is a stepping stone, not the destination.
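The difference between the two arrangements can be read off how the steering contributions compose. This numpy sketch is our own illustration under stated assumptions; neither function is the paper's proposal.

```python
import numpy as np

def two_tier(platform_v, tenant_v, tenant_weight=0.3):
    """Two-tier composition: the platform vector applies in full and
    unconditionally; the tenant vector is an adapter-scale correction."""
    return platform_v + tenant_weight * tenant_v

def co_equal(platform_v, tenant_v, negotiated_weight=0.5):
    """Co-equal composition: both contributions enter through a
    negotiated weight; neither party is the unconditioned default."""
    w = negotiated_weight
    return (1.0 - w) * platform_v + w * tenant_v

p = np.array([1.0, 0.0])  # illustrative platform steering direction
t = np.array([0.0, 1.0])  # illustrative tenant steering direction
```

In the first form the hierarchy is baked into the arithmetic; in the second, nothing applies until the weight is agreed.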


Open Questions


The paper identifies six open questions for future work.


The indicator-wiper problem is solvable — the driver eventually recalibrates. The question for sovereign AI is whether we can accelerate that recalibration: not by telling the model to "be less biased" (the equivalent of verbal instruction), but by directly adjusting the representations that encode the bias (the equivalent of physically relocating the indicator stalk).


Read the full paper: Steering Vectors and Mechanical Bias: Inference-Time Debiasing for Sovereign Small Language Models (STO-RES-0009)


Related: When Your AI Assistant Nearly Destroys What It Was Hired to Fix — the incident that revealed the shared blind spot problem referenced in this paper.

+`,
+  excerpt: 'Some AI biases fire before reasoning engages — like a driver reaching for the wrong indicator stalk. Prompt-level fixes cannot reach them. Steering vector techniques can, but only if you have access to model weights. This is the structural advantage of sovereign deployment — and it raises the question: who decides what bias to correct?',
+  status: 'published',
+  published_at: new Date('2026-02-09T12:00:00Z'),
+  tags: ['steering-vectors', 'mechanical-bias', 'sovereign-ai', 'home-ai', 'debiasing', 'governance', 'research'],
+  moderation: {
+    ai_analysis: null,
+    human_reviewer: 'john-stroh',
+    review_notes: 'Direct publication by author — research paper blog summary (STO-RES-0009 v1.1)',
+    approved_at: new Date('2026-02-09T12:00:00Z')
+  },
+  tractatus_classification: {
+    quadrant: 'STRATEGIC',
+    values_sensitive: true,
+    requires_strategic_review: false
+  },
+  view_count: 0,
+  engagement: {
+    shares: 0,
+    comments: 0
+  }
+};
+
+async function main() {
+  console.log(`Connecting to: ${uri}`);
+
+  const client = new MongoClient(uri);
+  await client.connect();
+  const db = client.db();
+  const collection = db.collection('blog_posts');
+
+  // Check if already exists
+  const existing = await collection.findOne({ slug: post.slug });
+  if (existing) {
+    console.log(`Post with slug "${post.slug}" already exists (ID: ${existing._id}). Skipping.`);
+    await client.close();
+    return;
+  }
+
+  const result = await collection.insertOne(post);
+  console.log(`Published: "${post.title}"`);
+  console.log(`ID: ${result.insertedId}`);
+  console.log(`Slug: ${post.slug}`);
+  console.log(`URL: https://agenticgovernance.digital/blog-post.html?slug=${post.slug}`);
+
+  await client.close();
+}
+
+main().catch(err => {
+  console.error('Error:', err);
+  process.exit(1);
+});