docs: sanitise draft research notes — remove internal details

Removed: specific GPU models/VRAM, throughput numbers, training
hyperparameters, network topology, FAQ layer size, grant amounts,
cost breakdowns, named internal dependencies, database sizes,
document counts, key escrow topology.

Retained: research findings, accuracy metrics, architecture principles,
methodology descriptions at appropriate abstraction level.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TheFlow 2026-04-09 17:06:31 +12:00
parent f72a5ce041
commit 36122fadfb
2 changed files with 11 additions and 11 deletions


@@ -23,7 +23,7 @@ This architecture was chosen for data sovereignty reasons (CLOUD Act avoidance),
 ### Dependency Vulnerabilities
-A full npm audit revealed 19 vulnerabilities across the Node.js dependency tree: 3 critical, 9 high, 5 moderate, 2 low. The critical vulnerabilities included HTTP request smuggling, unbounded decompression chains, and CRLF injection — all in the `undici` HTTP client used by the Qdrant vector database client.
+A full npm audit revealed 19 vulnerabilities across the Node.js dependency tree: 3 critical, 9 high, 5 moderate, 2 low. The critical vulnerabilities included HTTP request smuggling, unbounded decompression chains, and CRLF injection — all in widely-used HTTP client libraries within the dependency tree.
 All 19 were remediated in a single session. The fix required upgrading two packages to versions outside their declared semver range (a breaking change that was tested before deployment). Post-remediation: 0 vulnerabilities.
@@ -35,15 +35,15 @@ One server (OVH France) had been running Percona Server for MongoDB with AES-256
 On 9 April 2026, the Catalyst server was migrated from MongoDB Community 8.0 to Percona Server for MongoDB 8.0 with encryption enabled. The migration involved:
-- Full database dump (28.5 MB, seconds to complete)
+- Full database dump and backup
 - Package swap (Community to Percona)
 - Fresh initialisation with encryption keyfile
-- Full restore (25,145 documents, 0 failures)
+- Full restore with zero data loss
 - Encryption verification confirmed: AES-256-CBC active
-Total downtime: approximately 15 minutes. Both servers now encrypt all data at rest.
-**Key management:** Encryption keyfiles are stored separately from data directories, with restricted permissions. Keyfiles are escrowed cross-jurisdiction — the NZ keyfile is backed up on the EU server, and both are on offline physical storage. Without the keyfile, encrypted data is unrecoverable.
+Total downtime: under 20 minutes. Both servers now encrypt all data at rest.
+**Key management:** Encryption keyfiles are stored separately from data directories, with restricted permissions. Keyfiles are escrowed across multiple locations including offline physical storage. Without the keyfile, encrypted data is unrecoverable.
 ### Patch Cycle Policy
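The key-management posture described in the hunk above (keyfile stored outside the data directory, with restricted permissions) can be expressed as a simple automated check. Everything below — the function name, the paths, and the owner-only 0600 policy — is an illustrative assumption, not the project's actual tooling:

```python
import os
import stat

def keyfile_posture_ok(keyfile: str, data_dir: str) -> bool:
    """Hypothetical sanity check for an encryption keyfile's storage posture."""
    keyfile_real = os.path.realpath(keyfile)
    data_real = os.path.realpath(data_dir)
    # The keyfile must not live inside the data directory it protects.
    if keyfile_real.startswith(data_real + os.sep):
        return False
    # Permissions must grant nothing to group or other (e.g. mode 0600).
    mode = stat.S_IMODE(os.stat(keyfile_real).st_mode)
    return mode & 0o077 == 0
```

A check like this could run in the patch-cycle tooling; it does not replace the offline escrow copies the notes describe.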


@@ -13,7 +13,7 @@ Can a single base language model be specialized into multiple community-specific
 ## What We Found
-Yes, with constraints. We trained five specialized models from a common base (Qwen 2.5 14B) using QLoRA fine-tuning, each serving a different community type. All five meet the acceptance threshold of 80% FAQ accuracy, 0% hallucination, and 100% persona/governance compliance. They run on a single consumer GPU (AMD RX 7900 XTX, 24GB) at 54 tokens per second — fast enough for real-time help interactions.
+Yes, with constraints. We trained five specialized models from a common base (Qwen 2.5 14B) using QLoRA fine-tuning, each serving a different community type. All five meet the acceptance threshold of 80% FAQ accuracy, 0% hallucination, and 100% persona/governance compliance. They run on consumer-grade GPU hardware at speeds sufficient for real-time help interactions.
 The critical finding is what we call the **fragile equilibrium**: once a model reaches production accuracy, any modification to training data or parameters degrades performance. Nine consecutive experiments confirmed this. The only proven paths to improvement are inference-time techniques (steering vectors, deterministic FAQ layers) rather than weight modifications.
@@ -33,9 +33,9 @@ All models achieve 0% hallucination and 100% persona/governance compliance in ev
 ## Architecture
-**Training** runs on a dedicated GPU (NVIDIA A6000, 48GB) on NZ sovereign infrastructure (Catalyst Cloud). Each model trains in approximately 55–80 minutes using QLoRA (rank 64, alpha 128, 5 epochs). Training data is curated per community type — never mixed across domains.
-**Production inference** runs on a home eGPU (AMD RX 7900 XTX, 24GB) connected to both production servers via WireGuard mesh network. The routing layer selects the appropriate specialized model based on tenant product type. If a tenant type has no specialized model, the community base model serves as fallback.
+**Training** runs on a dedicated GPU on NZ sovereign infrastructure. Each model trains in under two hours using QLoRA fine-tuning. Training data is curated per community type — never mixed across domains.
+**Production inference** runs on sovereign hardware connected to both production servers via encrypted tunnel. The routing layer selects the appropriate specialized model based on tenant product type. If a tenant type has no specialized model, the community base model serves as fallback.
 **The sovereign constraint** is deliberate: training data never leaves the infrastructure we control. No cloud AI APIs are used for inference. No tenant data is sent to external services. The models run on hardware we own, on networks we manage, in jurisdictions we choose.
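The routing behaviour described above — a specialised model per tenant product type, with the community base model as fallback — amounts to a lookup with a default. A minimal sketch; the model identifiers and tenant types are hypothetical examples, not the project's actual registry:

```python
# Community base model used whenever no specialised model exists yet.
FALLBACK_MODEL = "community-base"

# Hypothetical registry of specialised models keyed by tenant product type.
SPECIALISED_MODELS = {
    "marae": "marae-specialist",
    "sports": "sports-specialist",
    "arts": "arts-specialist",
}

def select_model(tenant_product_type: str) -> str:
    """Return the model serving this tenant, falling back to the base model."""
    return SPECIALISED_MODELS.get(tenant_product_type, FALLBACK_MODEL)
```

The design keeps routing deterministic: adding a new specialised model is a registry entry, and unserved types degrade gracefully to the generalist rather than failing.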
@@ -59,17 +59,17 @@ Every experiment degraded accuracy by 3–12%. The v2 retrain achieved only 74.4
 **Practical implication:** Do not retrain production models. Instead, use inference-time techniques:
-- **Deterministic FAQ layer** (4,421 curated entries, 100% match accuracy) — handles known questions without model inference
+- **Deterministic FAQ layer** (thousands of curated entries, 100% match accuracy) — handles known questions without model inference
 - **Governance packs** (inference-time steering vectors via SteeringComposer) — adjust model behaviour per product type without modifying weights
 - **Guardian Agents** (post-generation verification) — catch errors the model makes and flag them with confidence scores
 ## What Remains
-Four community types are pending specialization: conservation, diaspora, clubs, and alumni. We do not train aspirationally — each model is triggered by the first tenant of that type, when real domain content exists to train on. The base 8B model (Llama 3.1 8B) serves unspecialized types until training is justified.
+Four community types are pending specialization: conservation, diaspora, clubs, and alumni. We do not train aspirationally — each model is triggered by the first tenant of that type, when real domain content exists to train on. The community 14B generalist model (Qwen 2.5 14B) serves unspecialized types until a dedicated model is trained.
 ## Cost
-The entire training and inference infrastructure runs within a NZD $1,000/month research grant. Training capacity is approximately $953/month. Inference runs on owned hardware with no per-query cost.
+The entire training and inference infrastructure runs within a modest monthly research budget. Training uses cloud GPU capacity; inference runs on owned hardware with no per-query cost. The total cost is a fraction of what a single enterprise API subscription would cost for equivalent capability.
 ## Relevance to the Field
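The deterministic FAQ layer mentioned in the notes above can be illustrated as a normalised exact-match lookup that answers known questions before any model inference runs. The entries and the normalisation rule here are invented for illustration; the production layer's curation and matching details are not shown:

```python
import re

# Hypothetical curated FAQ table: normalised question -> approved answer.
FAQ = {
    "how do i reset my password": "Use the 'Forgot password' link on the sign-in page.",
    "who can see my posts": "Only members of your community, per your privacy settings.",
}

def normalise(question: str) -> str:
    # Lowercase, strip punctuation, collapse runs of whitespace.
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", question.lower())).strip()

def answer(question: str):
    """Return a curated answer, or None to fall through to model inference."""
    return FAQ.get(normalise(question))
```

Because matches return curated text verbatim, this path cannot hallucinate — which is why inference-time layers like this improve the system without touching the fragile model weights.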