fix: update village-ai.html — replace stale 3B/8B architecture with current

Replaced Two-Model Architecture (3B/8B) with Specialized Model Architecture
(five production 14B models by community type). Updated Training Tiers:
Tier 2 now describes product-type specialization, not per-tenant adapters.
Fixed infrastructure section: WireGuard inference is live not planned,
model size corrected to 14B. Updated limitations and production timeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
TheFlow 2026-04-09 17:29:41 +12:00
parent 36122fadfb
commit b7f2245ec4


@@ -120,38 +120,60 @@
 </div>
 </section>
-<!-- Two-Model Architecture -->
+<!-- Specialized Model Architecture -->
 <section class="mb-10">
-<h2 class="text-3xl font-bold text-gray-900 mb-4" data-i18n="two_model.heading">Two-Model Architecture</h2>
-<p class="text-gray-700 mb-4" data-i18n-html="two_model.intro">
-Village AI uses two models of different sizes, routed by task complexity. This is not a fallback mechanism &mdash; each model is optimised for its role.
+<h2 class="text-3xl font-bold text-gray-900 mb-4">Specialized Model Architecture</h2>
+<p class="text-gray-700 mb-4">
+Village AI uses multiple specialized models, each fine-tuned for a specific community type. The routing layer selects the appropriate model based on the tenant&rsquo;s product type. All models operate under the same governance stack.
 </p>
-<div class="grid grid-cols-1 md:grid-cols-2 gap-6">
-<div class="bg-white rounded-lg shadow-sm p-6 border-l-4 border-blue-500">
-<h3 class="text-lg font-bold text-gray-900 mb-2" data-i18n-html="two_model.fast_title">3B Model &mdash; Fast Assistant</h3>
-<span class="inline-block bg-green-100 text-green-800 text-xs font-semibold px-2 py-0.5 rounded mb-2" data-i18n="two_model.fast_badge">Operational</span>
-<p class="text-gray-700 text-sm mb-3" data-i18n="two_model.fast_desc">
-Handles help queries, tooltips, error explanations, short summaries, and translation. Target response time: under 5 seconds complete.
+<div class="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-6">
+<div class="bg-white rounded-lg shadow-sm p-6 border-l-4 border-teal-500">
+<h3 class="text-lg font-bold text-gray-900 mb-2">Community &amp; Governance</h3>
+<span class="inline-block bg-green-100 text-green-800 text-xs font-semibold px-2 py-0.5 rounded mb-2">Production</span>
+<p class="text-gray-700 text-sm">
+Generalist model serving neighbourhood communities, governance bodies, and committees. Also serves as fallback for community types without a dedicated model.
 </p>
-<p class="text-gray-500 text-xs" data-i18n="two_model.fast_routing">
-Routing triggers: simple queries, known FAQ patterns, single-step tasks.
-</p>
 </div>
+<div class="bg-white rounded-lg shadow-sm p-6 border-l-4 border-emerald-500">
+<h3 class="text-lg font-bold text-gray-900 mb-2">Wh&#257;nau &amp; Indigenous</h3>
+<span class="inline-block bg-green-100 text-green-800 text-xs font-semibold px-2 py-0.5 rounded mb-2">Production</span>
+<p class="text-gray-700 text-sm">
+Trained on te reo M&#257;ori content, whakapapa structures, and tikanga documentation. Highest indigenous domain accuracy across all variants.
+</p>
+</div>
 <div class="bg-white rounded-lg shadow-sm p-6 border-l-4 border-purple-500">
-<h3 class="text-lg font-bold text-gray-900 mb-2" data-i18n-html="two_model.deep_title">8B Model &mdash; Deep Reasoning</h3>
-<span class="inline-block bg-amber-100 text-amber-800 text-xs font-semibold px-2 py-0.5 rounded mb-2" data-i18n="two_model.deep_badge">Planned</span>
-<p class="text-gray-700 text-sm mb-3" data-i18n="two_model.deep_desc">
-Handles life story generation, year-in-review narratives, complex summarisation, and sensitive correspondence. Target response time: under 90 seconds.
+<h3 class="text-lg font-bold text-gray-900 mb-2">Episcopal &amp; Parish</h3>
+<span class="inline-block bg-green-100 text-green-800 text-xs font-semibold px-2 py-0.5 rounded mb-2">Production</span>
+<p class="text-gray-700 text-sm">
+Trained on Anglican parish governance, Book of Common Prayer, vestry procedures, and liturgical calendar. Serves parish and diocesan communities.
 </p>
-<p class="text-gray-500 text-xs" data-i18n="two_model.deep_routing">
-Routing triggers: keywords like "everything about", multi-source retrieval, grief/trauma markers.
-</p>
 </div>
+<div class="bg-white rounded-lg shadow-sm p-6 border-l-4 border-blue-500">
+<h3 class="text-lg font-bold text-gray-900 mb-2">Family &amp; Heritage</h3>
+<span class="inline-block bg-green-100 text-green-800 text-xs font-semibold px-2 py-0.5 rounded mb-2">Production</span>
+<p class="text-gray-700 text-sm">
+Trained on family storytelling, genealogy, heritage preservation, and inter-generational content. Highest overall FAQ accuracy.
+</p>
+</div>
+<div class="bg-white rounded-lg shadow-sm p-6 border-l-4 border-indigo-500">
+<h3 class="text-lg font-bold text-gray-900 mb-2">Business &amp; Professional</h3>
+<span class="inline-block bg-green-100 text-green-800 text-xs font-semibold px-2 py-0.5 rounded mb-2">Production</span>
+<p class="text-gray-700 text-sm">
+Trained on CRM, invoicing, time tracking, and professional services content. Serves business tenants and platform operations.
+</p>
+</div>
+<div class="bg-white rounded-lg shadow-sm p-6 border-l-4 border-gray-300">
+<h3 class="text-lg font-bold text-gray-900 mb-2">Additional Types</h3>
+<span class="inline-block bg-amber-100 text-amber-800 text-xs font-semibold px-2 py-0.5 rounded mb-2">Trigger-based</span>
+<p class="text-gray-700 text-sm">
+Conservation, diaspora, clubs, and alumni models are trained when the first tenant of that type is established. Until then, the community generalist model serves.
+</p>
+</div>
 </div>
-<p class="text-gray-600 text-sm mt-4" data-i18n-html="two_model.footer">
-Both models operate under the same governance stack. Routing governance is designed; ContextPressureMonitor override capability is planned.
+<p class="text-gray-600 text-sm mt-4">
+All models are fine-tuned from the same base using QLoRA. Training data is curated per community type and never mixed across domains. A deterministic FAQ layer handles known questions without model inference. Steering vectors adjust model behaviour at inference time without modifying weights.
 </p>
 </section>
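The product-type routing with generalist fallback described in the new section can be sketched in a few lines. The model identifiers and the `select_model` helper below are illustrative assumptions, not the production names:

```python
# Sketch of the routing layer: tenant product type -> specialized model.
# All model identifiers here are hypothetical placeholders.
SPECIALIZED_MODELS = {
    "community": "community-governance-14b",
    "whanau": "whanau-indigenous-14b",
    "episcopal": "episcopal-parish-14b",
    "family": "family-heritage-14b",
    "business": "business-professional-14b",
}

# The Community & Governance generalist serves types without a dedicated model.
FALLBACK_MODEL = SPECIALIZED_MODELS["community"]

def select_model(product_type: str) -> str:
    """Pick the fine-tuned model for a tenant's product type,
    falling back to the community generalist otherwise."""
    return SPECIALIZED_MODELS.get(product_type, FALLBACK_MODEL)
```

Under this sketch, a conservation or alumni tenant routes to the generalist until a dedicated model for that type is trained.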
@@ -178,14 +200,14 @@
 <div class="bg-white rounded-lg shadow-sm p-6 border-l-4 border-teal-500">
 <div class="flex items-baseline justify-between mb-2">
-<h3 class="text-lg font-bold text-gray-900" data-i18n="training_tiers.tier2_title">Tier 2: Tenant Adapters</h3>
-<span class="text-xs bg-teal-100 text-teal-800 px-2 py-1 rounded" data-i18n="training_tiers.tier2_badge">Per community</span>
+<h3 class="text-lg font-bold text-gray-900">Tier 2: Product-Type Specialization</h3>
+<span class="text-xs bg-teal-100 text-teal-800 px-2 py-1 rounded">Per community type</span>
 </div>
-<p class="text-gray-700 text-sm mb-2" data-i18n-html="training_tiers.tier2_desc">
-Each community trains a lightweight LoRA adapter on its own content &mdash; stories, documents, photos, and events that members have explicitly consented to include. This allows Village AI to answer questions like "What stories has Grandma shared?" without accessing any other community's data.
+<p class="text-gray-700 text-sm mb-2">
+Each community type (wh&#257;nau, episcopal, business, family, etc.) has a dedicated fine-tuned model trained on domain-specific content. The model learns the vocabulary, governance patterns, and cultural framing appropriate to that community type. Tenant data isolation is maintained &mdash; no tenant&rsquo;s content is used in another tenant&rsquo;s training data.
 </p>
-<p class="text-gray-500 text-xs" data-i18n-html="training_tiers.tier2_update">
-Adapters are small (50&ndash;100MB). Consent is per-content-item. Content marked "only me" is never included regardless of consent. Training method: QLoRA fine-tuning with governance-validated data.
+<p class="text-gray-500 text-xs">
+Specialization is triggered when the first tenant of a new type is established. Training method: QLoRA fine-tuning with governance-validated, curated corpora.
 </p>
 </div>
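The isolation rule in the updated Tier 2 text can be pictured as a corpus filter: a record enters a community type's training set only if it matches that type and has passed governance validation. The record fields below are assumptions for illustration, not the actual data schema:

```python
# Sketch: assemble a per-community-type training corpus with isolation.
# The record fields ("community_type", "governance_validated") are hypothetical.
def build_corpus(records: list[dict], community_type: str) -> list[str]:
    """Keep governance-validated records of one community type only;
    content from other types (and other tenants' domains) is never mixed in."""
    return [
        r["text"]
        for r in records
        if r["community_type"] == community_type and r["governance_validated"]
    ]

records = [
    {"text": "vestry minutes guide", "community_type": "episcopal", "governance_validated": True},
    {"text": "invoicing walkthrough", "community_type": "business", "governance_validated": True},
    {"text": "draft liturgy notes", "community_type": "episcopal", "governance_validated": False},
]
```

Filtering the sample records for "episcopal" admits only the validated vestry entry; the business record and the unvalidated draft are excluded.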
@@ -293,7 +315,7 @@
 <div class="bg-amber-50 rounded-lg p-5 border border-amber-200 mt-4">
 <p class="text-amber-900 text-sm" data-i18n-html="dual_layer.caveat">
-<strong>Honest caveat:</strong> Layer A (inherent governance via training) has been empirically validated across multiple training runs with consistent governance compliance. Layer B (active governance via Village codebase) has been operating in production for 5 months. The dual-layer thesis is demonstrating results, though evaluation remains self-reported. Independent audit is planned.
+<strong>Honest caveat:</strong> Layer A (inherent governance via training) has been empirically validated across multiple training runs with consistent governance compliance. Layer B (active governance via Village codebase) has been operating in production since October 2025. The dual-layer thesis is demonstrating results, though evaluation remains self-reported. Independent audit is planned.
 </p>
 </div>
@@ -503,17 +525,17 @@
 <div class="bg-white rounded-lg shadow-sm p-5 border border-gray-200">
 <h3 class="text-lg font-bold text-gray-900 mb-2" data-i18n="infrastructure.remote_title">Remote Inference</h3>
 <ul class="text-gray-700 text-sm space-y-2">
-<li data-i18n="infrastructure.remote_item1">Model weights deployed to production server (OVH France)</li>
-<li data-i18n="infrastructure.remote_item2">Inference via Ollama on production server</li>
-<li data-i18n="infrastructure.remote_item3">Home GPU inference via WireGuard VPN (planned)</li>
-<li data-i18n="infrastructure.remote_item4">CPU-based inference provides baseline availability</li>
+<li>Specialized model weights served from sovereign GPU infrastructure</li>
+<li>Inference via Ollama, routed by tenant product type</li>
+<li>GPU inference via encrypted WireGuard tunnel to both production servers</li>
+<li>Production servers in EU (France) and NZ (Catalyst Cloud)</li>
 </ul>
 </div>
 </div>
 <div class="bg-gray-50 rounded-lg p-5 border border-gray-200 mt-4">
 <p class="text-gray-700 text-sm" data-i18n-html="infrastructure.why_consumer">
-<strong>Why consumer hardware?</strong> The SLL thesis is that sovereign AI training should be accessible, not reserved for organisations with data centre budgets. A single consumer GPU can fine-tune a 7B model efficiently via QLoRA. The entire training infrastructure fits on a desk.
+<strong>Why consumer hardware?</strong> The SLL thesis is that sovereign AI training should be accessible, not reserved for organisations with data centre budgets. Consumer-grade GPUs can fine-tune 14B models efficiently via QLoRA. The entire inference infrastructure fits on a desk.
 </p>
 </div>
 </section>
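A back-of-envelope check on the consumer-hardware claim (every figure below is a rough assumption, not a measured value): 4-bit quantisation stores half a byte per parameter, so 14B weights occupy about 7 GB, leaving headroom on a typical 24 GB consumer GPU for LoRA adapters, optimiser state, and activations.

```python
# Rough VRAM estimate for 4-bit QLoRA fine-tuning of a 14B model.
# Every figure here is an assumption for illustration, not a measurement.
params = 14e9
weights_gb = params * 0.5 / 1e9        # 4-bit quantisation ~= 0.5 bytes/param
adapter_params = 200e6                 # assumed total LoRA adapter size
# Adapters train in 16-bit with Adam-style optimiser state (~8 bytes/param).
adapter_gb = adapter_params * (2 + 8) / 1e9
total_gb = weights_gb + adapter_gb     # excludes activations and KV cache
print(round(weights_gb, 1), round(total_gb, 1))
```

Even with activations and KV cache on top of the ~9 GB estimated here, the workload stays within a single 24 GB consumer card under these assumptions.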
@@ -691,7 +713,7 @@
 <ul class="space-y-3 text-amber-800">
 <li class="flex items-start">
 <span class="mr-2 font-bold">&bull;</span>
-<span data-i18n-html="limitations.item1"><strong>Early-stage training:</strong> Multiple QLoRA fine-tuning runs have been completed. A production model is deployed with governance compliance and bias metrics meeting targets. Evaluation is self-reported. Independent audit is planned.</span>
+<span><strong>Production training:</strong> Multiple specialized models are deployed across five community types, each with governance compliance and bias metrics meeting targets. Evaluation is self-reported. Independent audit is planned.</span>
 </li>
 <li class="flex items-start">
 <span class="mr-2 font-bold">&bull;</span>