tractatus/scripts/upload-document.js
TheFlow a4db3e62ec
chore(vendor-policy): sweep project-self GitHub URLs to Codeberg (partial)
Addresses the documentation-layer gap after Phase A/B moved the git REMOTE from
GitHub to Codeberg but left ~100 project-self GitHub URLs embedded in markdown,
HTML, JS, and Python files. The remote-layer migration was generalised as
"GitHub is gone from the codebase" without verifying the content layer.

22 files swept in this commit. 27 additional files hold pre-existing inst_016/017/018
or inst_084 debt that would transfer on touch (hook whole-file scan). Those
await a companion hygiene-first commit before their GitHub->Codeberg flip
can land cleanly.

Sweep scope this commit:
  - README.md, SECURITY.md
  - 3 For-Claude-Web bundle files (GitHub URLs noted as "separate concern" in
    today's earlier licence-swap commits)
  - docs/markdown/deployment-guide.md
  - docs/AUTOMATED_SYNC_SETUP, PLURALISM_CHECKLIST, github/AGENT_LIGHTNING_README
  - docs/business-intelligence/governance-bi-tools
  - docs/outreach/EXECUTIVE-BRIEF-BI-GOVERNANCE (+ v2)
  - docs/research/ARCHITECTURAL-SAFEGUARDS-*
  - email-templates/README.md, base-template.html
  - 3 scripts/seed-*-blog-post.js (blog-seeding scripts)
  - scripts/upload-document.js
  - SESSION_HANDOFF_2025-10-23_FRAMEWORK_ANALYSIS.md
  - SECURITY_INCIDENT_POST_MORTEM_2025-10-21.md

Pattern swaps (longest-first):
  github.com/AgenticGovernance/tractatus-framework/issues -> codeberg.org/mysovereignty/tractatus-framework/issues
  github.com/AgenticGovernance/tractatus-framework/discussions -> .../issues (Codeberg has no discussions feature)
  github.com/AgenticGovernance/tractatus-framework.git -> codeberg.org/mysovereignty/tractatus-framework.git
  github.com/AgenticGovernance/tractatus-framework -> codeberg.org/mysovereignty/tractatus-framework
  git@github.com:AgenticGovernance/... -> git@codeberg.org:mysovereignty/...
  github.com/AgenticGovernance/tractatus (old org/repo path) -> codeberg.org/mysovereignty/tractatus-framework
  AgenticGovernance/tractatus-framework (bare) -> mysovereignty/tractatus-framework
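
Why longest-first matters: the old-path pattern (github.com/AgenticGovernance/tractatus) is a prefix of the full repository pattern, so applying it first would turn ".../tractatus-framework" into ".../tractatus-framework-framework". A minimal single-pass sketch of the idea follows. The names SWAPS, escapeRegExp and sweep are hypothetical and are not taken from the sweep tooling actually used for this commit, and the table holds only the swaps that are spelled out in full above.

```javascript
// Hypothetical swap table: only the patterns written out in full in this message.
const SWAPS = new Map([
  ['github.com/AgenticGovernance/tractatus-framework/issues',
   'codeberg.org/mysovereignty/tractatus-framework/issues'],
  ['github.com/AgenticGovernance/tractatus-framework.git',
   'codeberg.org/mysovereignty/tractatus-framework.git'],
  ['github.com/AgenticGovernance/tractatus-framework',
   'codeberg.org/mysovereignty/tractatus-framework'],
  ['github.com/AgenticGovernance/tractatus',
   'codeberg.org/mysovereignty/tractatus-framework'],
  ['AgenticGovernance/tractatus-framework',
   'mysovereignty/tractatus-framework']
]);

// Escape regex metacharacters so each pattern is matched literally.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one alternation with the longest patterns first, then replace in a
// single pass so already-rewritten text is never matched a second time.
function sweep(text, swaps = SWAPS) {
  const patterns = [...swaps.keys()].sort((a, b) => b.length - a.length);
  const matcher = new RegExp(patterns.map(escapeRegExp).join('|'), 'g');
  return text.replace(matcher, (hit) => swaps.get(hit));
}

module.exports = { sweep, SWAPS };
```

Usage, under the same assumptions: sweep(fileContents) returns the rewritten text and leaves unrelated GitHub URLs (for example third-party sponsor links) untouched, because only the listed project-self patterns are matched.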

Hook validator update (scripts/hook-validators/validate-credentials.js):
  PROTECTED_VALUES.github_org:  'AgenticGovernance'  -> 'mysovereignty'
  PROTECTED_VALUES.license:     'Apache License 2.0' -> EUPL-1.2 long form
  URL detection regex:          /github\.com\/.../   -> /codeberg\.org\/.../
  Placeholder checks + error messages updated to reflect Codeberg as
  authoritative post-migration host. Key names (e.g. `github_org`) retained
  for backward compatibility with validate-file-edit.js.

Held back from this commit (27 files total, documented reasons):

  11 historical session handoffs / closedown docs / incident reports
    (2025-10 through 2026-02) — modifying them rewrites the record to contain
    URLs that did not exist at the time of writing, AND ownership of their
    pre-existing inst_084 exposures transfers on touch.

  8 live-content docs with pre-existing inst_084 debt (port/API-endpoint/
    file-path exposures): docs/markdown/case-studies.md, technical-architecture,
    introduction-to-the-tractatus-framework, implementation-guide-v1.1,
    docs/plans/integrated-implementation-roadmap-2025, docs/governance/*,
    docs/ANTHROPIC_*, docs/GOVERNANCE_SERVICE_*, docs/RESEARCH_DOCUMENTATION_*,
    deployment-quickstart/*.

  8 live-content docs with pre-existing inst_016/017/018 debt:
    CHANGELOG.md, CONTRIBUTING.md, docs/LAUNCH_ANNOUNCEMENT, LAUNCH_CHECKLIST,
    PHASE_4_REPOSITORY_ANALYSIS, PHASE_6_SUMMARY, docs/plans/research-enhancement-
    roadmap-2025, docs/case-studies/pre-publication-audit-oct-2025.

  Also NOT in this commit (separate concerns):
  - scripts/add-inst-084-github-url-protection.js (detection-rule logic needs
    framework-level decision on post-migration semantics).
  - .claude/* (framework state).
  - docs/PRODUCTION_DOCUMENTS_EXPORT.json (DB dump).
  - package-lock.json (npm sponsor URLs, third-party).
  - .git/config embedded credentials (requires out-of-band rotation on both
    remote hosts + auth-strategy decision; user-action task).

Context: today's EUPL-1.2 sweep closed the licence-text-content layer
(5c386d0d / 6d49bfbf / ab0a6af4 / 4c1a26e8). This commit starts closing the
matching vendor-URL-content layer. Next: hygiene-first pass on the 16
live-content docs held back, then a second URL-flip pass on them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 10:53:13 +12:00


#!/usr/bin/env node
/**
* Upload Document to Tractatus Docs System
*
* One-command script to:
* 1. Upload markdown file to database
* 2. Generate PDF automatically
* 3. Configure for docs.html sidebar
* 4. Create card rendering metadata
* 5. Set up download links
*
* Usage:
* node scripts/upload-document.js <markdown-file> [options]
*
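* Options:
* --category <cat> REQUIRED. Category (getting-started, technical-reference, research-theory, etc.)
* --type <type> REQUIRED. Document type (working-paper, case-study, technical-report, guide, reference, brief)
* --audience <aud> Target audience (general, researcher, implementer, leader, etc.)
* --title <title> Override document title (extracted from the first H1 if not provided)
* --author <author> Document author (default: Agentic Governance Research Team)
* --tags <tags> Comma-separated tags
* --no-pdf Skip PDF generation
* --pdf-dir <dir> Custom PDF output directory (default: docs/research/)
* --order <num> Display order (lower = higher priority, default: 999)
* --force Overwrite existing document
* --contact <email> Contact email (default: research@agenticgovernance.digital)
* --licence <type> Licence type: Apache-2.0 or CC-BY-4.0 (inferred from category if not set)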
*/
require('dotenv').config();
const fs = require('fs').promises;
const path = require('path');
const { spawn } = require('child_process');
const { connect, close } = require('../src/utils/db.util');
const Document = require('../src/models/Document.model');
const { markdownToHtml, extractTOC, generateSlug } = require('../src/utils/markdown.util');
// Parse command line arguments
const args = process.argv.slice(2);
if (args.length === 0 || args[0] === '--help' || args[0] === '-h') {
console.log(`
Usage: node scripts/upload-document.js <markdown-file> [options]
Options:
--category <cat> REQUIRED. Category: getting-started, technical-reference, research-theory,
advanced-topics, case-studies, business-leadership
--type <type> REQUIRED. Document type: working-paper, case-study, technical-report,
guide, reference, brief
--audience <aud> Audience: general, researcher, implementer, leader, advocate, developer
--title <title> Override document title (extracted from H1 if not provided)
--author <author> Document author (default: Agentic Governance Research Team)
--tags <tags> Comma-separated tags
--no-pdf Skip PDF generation
--pdf-dir <dir> Custom PDF output directory (default: docs/research/)
--order <num> Display order (lower = higher priority, default: 999)
--force Overwrite existing document
--contact <email> Contact email (default: research@agenticgovernance.digital)
--licence <type> Licence type: Apache-2.0 or CC-BY-4.0 (SPDX identifiers)
Inferred from category if not set: research-theory/advanced-topics/case-studies → CC-BY-4.0, others → Apache-2.0
Categories:
- getting-started 🚀 Getting Started
- technical-reference 🔌 Technical Reference
- research-theory 🔬 Theory & Research
- advanced-topics 🎓 Advanced Topics
- case-studies 📊 Case Studies
- business-leadership 💼 Business & Leadership
Examples:
# Upload research paper
node scripts/upload-document.js docs/research/my-paper.md \\
--category research-theory \\
--type working-paper \\
--audience researcher \\
--tags "ai-safety,governance,research"
# Upload technical guide (no PDF)
node scripts/upload-document.js docs/guides/setup.md \\
--category getting-started \\
--type guide \\
--audience developer \\
--no-pdf
# Upload with custom order
node scripts/upload-document.js docs/important.md \\
--category getting-started \\
--order 1 \\
--force
`);
process.exit(0);
}
// Extract markdown file path
const mdFilePath = args[0];
if (!mdFilePath) {
console.error('❌ Error: No markdown file specified');
process.exit(1);
}
// Parse options
const options = {
category: null,
audience: 'general',
document_type: null,
title: null,
author: 'Agentic Governance Research Team',
tags: [],
generatePDF: true,
pdfDir: 'docs/research',
order: 999,
force: false,
contact: 'research@agenticgovernance.digital',
licence: null // SPDX: 'Apache-2.0' or 'CC-BY-4.0' — inferred from category if not set
};
for (let i = 1; i < args.length; i++) {
switch (args[i]) {
case '--category':
options.category = args[++i];
break;
case '--audience':
options.audience = args[++i];
break;
case '--type':
options.document_type = args[++i];
break;
case '--title':
options.title = args[++i];
break;
case '--author':
options.author = args[++i];
break;
case '--tags':
options.tags = args[++i].split(',').map(t => t.trim());
break;
case '--no-pdf':
options.generatePDF = false;
break;
case '--pdf-dir':
options.pdfDir = args[++i];
break;
case '--order':
options.order = parseInt(args[++i], 10);
break;
case '--force':
options.force = true;
break;
case '--contact':
options.contact = args[++i];
break;
case '--licence':
case '--license':
options.licence = args[++i];
break;
}
}
// Validate category
const VALID_CATEGORIES = [
'getting-started',
'technical-reference',
'research-theory',
'advanced-topics',
'case-studies',
'business-leadership'
];
if (!options.category) {
console.error('❌ Error: --category is required');
console.log('Valid categories:', VALID_CATEGORIES.join(', '));
process.exit(1);
}
if (!VALID_CATEGORIES.includes(options.category)) {
console.error(`❌ Error: Invalid category "${options.category}"`);
console.log('Valid categories:', VALID_CATEGORIES.join(', '));
process.exit(1);
}
// Licence: infer from category if not explicitly provided
const VALID_LICENCES = ['Apache-2.0', 'CC-BY-4.0'];
const CC_BY_CATEGORIES = ['research-theory', 'advanced-topics', 'case-studies'];
if (!options.licence) {
options.licence = CC_BY_CATEGORIES.includes(options.category) ? 'CC-BY-4.0' : 'Apache-2.0';
console.log(`📜 Licence (inferred from category): ${options.licence}`);
} else {
if (!VALID_LICENCES.includes(options.licence)) {
console.error(`❌ Error: Invalid licence "${options.licence}"`);
console.log('Valid licences:', VALID_LICENCES.join(', '));
process.exit(1);
}
console.log(`📜 Licence: ${options.licence}`);
}
/**
* Generate PDF from markdown
*/
async function generatePDF(mdPath, outputDir) {
const mdFileName = path.basename(mdPath, '.md');
const pdfFileName = `${mdFileName}.pdf`;
const pdfPath = path.join(outputDir, pdfFileName);
console.log(`📄 Generating PDF: ${pdfPath}`);
// Create Python script for PDF generation
const pythonScript = `
import sys
import markdown
from weasyprint import HTML, CSS
from pathlib import Path
try:
    from PyPDF2 import PdfReader, PdfWriter
    has_pypdf2 = True
except ImportError:
    has_pypdf2 = False
md_path = sys.argv[1]
pdf_path = sys.argv[2]
title = sys.argv[3]
author = sys.argv[4]
# Read markdown
with open(md_path, 'r', encoding='utf-8') as f:
    md_content = f.read()
# Convert to HTML
html_content = markdown.markdown(
md_content,
extensions=[
'markdown.extensions.tables',
'markdown.extensions.fenced_code',
'markdown.extensions.toc',
'markdown.extensions.sane_lists'
]
)
# Wrap in HTML
full_html = f"""
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>{title}</title>
</head>
<body>
{html_content}
</body>
</html>
"""
# CSS styling
css = CSS(string="""
@page {
size: Letter;
margin: 1in;
@bottom-center {
content: counter(page);
font-size: 10pt;
color: #666;
}
}
body {
font-family: "Georgia", "Times New Roman", serif;
font-size: 11pt;
line-height: 1.6;
color: #333;
}
h1 {
font-size: 24pt;
font-weight: bold;
color: #1976d2;
margin-top: 24pt;
margin-bottom: 12pt;
page-break-after: avoid;
border-bottom: 2px solid #1976d2;
padding-bottom: 4pt;
}
h2 {
font-size: 18pt;
font-weight: bold;
color: #1976d2;
margin-top: 20pt;
margin-bottom: 10pt;
page-break-after: avoid;
border-bottom: 2px solid #1976d2;
padding-bottom: 4pt;
}
h3 {
font-size: 14pt;
font-weight: bold;
color: #424242;
margin-top: 16pt;
margin-bottom: 8pt;
page-break-after: avoid;
}
p {
margin-top: 0;
margin-bottom: 10pt;
text-align: justify;
}
table {
width: 100%;
border-collapse: collapse;
margin: 12pt 0;
page-break-inside: avoid;
}
th {
background-color: #1976d2;
color: white;
font-weight: bold;
padding: 8pt;
text-align: left;
border: 1px solid #1976d2;
}
td {
padding: 6pt;
border: 1px solid #ddd;
}
code {
font-family: "Courier New", monospace;
font-size: 10pt;
background-color: #f5f5f5;
padding: 2pt 4pt;
border-radius: 2pt;
}
""")
# Generate PDF
HTML(string=full_html).write_pdf(pdf_path, stylesheets=[css])
# Add metadata if PyPDF2 is available
if has_pypdf2:
    reader = PdfReader(pdf_path)
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    writer.add_metadata({
        '/Title': title,
        '/Author': author,
        '/Creator': 'Tractatus Framework',
        '/Producer': 'WeasyPrint'
    })
    with open(pdf_path, 'wb') as f:
        writer.write(f)
print(f"✓ PDF generated: {pdf_path}")
`;
return new Promise((resolve, reject) => {
const python = spawn('python3', [
'-c',
pythonScript,
mdPath,
pdfPath,
options.title || 'Tractatus Document',
options.author
]);
python.stdout.on('data', (data) => {
console.log(data.toString().trim());
});
python.stderr.on('data', (data) => {
console.error(data.toString().trim());
});
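// Reject if python3 cannot be started at all (e.g. not installed);
// without this handler the 'error' event would be thrown as an uncaught exception
python.on('error', (err) => {
reject(new Error(`Failed to start python3: ${err.message}`));
});
python.on('close', (code) => {
if (code === 0) {
resolve(pdfPath);
} else {
reject(new Error(`PDF generation failed with code ${code}`));
}
});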
});
}
/**
* Add license and metadata to markdown file
*
* Licence type is determined by --licence option or inferred from --category:
* CC-BY-4.0: research-theory, advanced-topics, case-studies
* Apache-2.0: getting-started, technical-reference, business-leadership
*/
async function addLicenseAndMetadata(mdPath) {
const content = await fs.readFile(mdPath, 'utf-8');
// Check if already has license
if (content.includes('## License') || content.includes('## Licence') ||
content.includes('Apache License') || content.includes('CC BY 4.0') ||
content.includes('Creative Commons')) {
console.log('⚠️ Document already has license section');
return;
}
const year = new Date().getFullYear();
const docTitle = options.title || path.basename(mdPath, '.md');
let licenceBlock;
if (options.licence === 'CC-BY-4.0') {
licenceBlock = `
---
## Contact
**Research Inquiries:** ${options.contact}
**Website:** https://agenticgovernance.digital
---
## Licence
Copyright © ${year} John Stroh.
This work is licensed under the [Creative Commons Attribution 4.0 International Licence (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).
You are free to share, copy, redistribute, adapt, remix, transform, and build upon this material for any purpose, including commercially, provided you give appropriate attribution, provide a link to the licence, and indicate if changes were made.
**Suggested citation:**
Stroh, J., & Claude (Anthropic). (${year}). *${docTitle}*. Agentic Governance Digital. https://agenticgovernance.digital
**Note:** The Tractatus AI Safety Framework source code is separately licensed under the Apache License 2.0. This Creative Commons licence applies to the research paper text and figures only.`;
} else {
licenceBlock = `
---
## Contact
**Research Inquiries:** ${options.contact}
**Website:** https://agenticgovernance.digital
**Repository:** https://codeberg.org/mysovereignty/tractatus-framework
---
## License
Copyright ${year} Agentic Governance Initiative
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.`;
}
const metadata = `
---
## Document Metadata
<div class="document-metadata">
- **Version:** 1.0
- **Created:** ${new Date().toISOString().split('T')[0]}
- **Last Modified:** ${new Date().toISOString().split('T')[0]}
- **Author:** ${options.author}
- **Licence:** ${options.licence}
- **Document ID:** ${generateSlug(options.title || path.basename(mdPath, '.md'))}
- **Status:** Active
</div>
`;
await fs.writeFile(mdPath, content + licenceBlock + metadata, 'utf-8');
console.log(`✓ Added ${options.licence} licence and metadata to markdown file`);
}
/**
* Main upload function
*/
async function uploadDocument() {
try {
console.log('\n=== Tractatus Document Upload ===\n');
// SECURITY: Require --category and --type for public documents
const validTypes = ['working-paper', 'case-study', 'technical-report', 'guide', 'reference', 'brief'];
if (!options.category) {
console.error(`❌ Error: --category is required. Available: ${VALID_CATEGORIES.join(', ')}`);
process.exit(1);
}
if (!options.document_type || !validTypes.includes(options.document_type)) {
console.error(`❌ Error: --type is required. Available: ${validTypes.join(', ')}`);
process.exit(1);
}
// Verify markdown file exists
const mdPath = path.resolve(mdFilePath);
try {
await fs.access(mdPath);
} catch (err) {
console.error(`❌ Error: File not found: ${mdPath}`);
process.exit(1);
}
console.log(`📄 Processing: ${mdPath}`);
// Add license and metadata
await addLicenseAndMetadata(mdPath);
// Read markdown content
const rawContent = await fs.readFile(mdPath, 'utf-8');
// Extract title from first H1 or use provided title
let title = options.title;
if (!title) {
const h1Match = rawContent.match(/^#\s+(.+)$/m);
title = h1Match ? h1Match[1] : path.basename(mdPath, '.md');
}
console.log(`📌 Title: ${title}`);
console.log(`📂 Category: ${options.category}`);
console.log(`👥 Audience: ${options.audience}`);
// Generate PDF if requested
let pdfPath = null;
let pdfWebPath = null;
if (options.generatePDF) {
try {
const outputDir = path.resolve(options.pdfDir);
await fs.mkdir(outputDir, { recursive: true });
pdfPath = await generatePDF(mdPath, outputDir);
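// Convert to web path (relative to public/). Note: this assumes the PDF directory
// lives under public/; any other location yields a path containing ".."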
pdfWebPath = '/' + path.relative(path.resolve('public'), pdfPath);
console.log(`✓ PDF available at: ${pdfWebPath}`);
} catch (err) {
console.error(`⚠️ PDF generation failed: ${err.message}`);
console.log(' Continuing without PDF...');
}
}
// Convert markdown to HTML
const htmlContent = markdownToHtml(rawContent);
// Extract table of contents
const tableOfContents = extractTOC(rawContent);
// Generate slug
const slug = generateSlug(title);
console.log(`🔗 Slug: ${slug}`);
// Connect to database
await connect();
// Check if document already exists
const existing = await Document.findBySlug(slug);
if (existing && !options.force) {
console.error(`\n❌ Error: Document already exists with slug: ${slug}`);
console.log(' Use --force to overwrite');
await close();
process.exit(1);
}
// Create document object
const doc = {
title: title,
slug: slug,
quadrant: null,
persistence: 'HIGH',
audience: options.audience,
document_type: options.document_type,
visibility: 'public',
status: 'current',
category: options.category,
licence: options.licence,
order: options.order,
content_html: htmlContent,
content_markdown: rawContent,
toc: tableOfContents,
security_classification: {
contains_credentials: false,
contains_financial_info: false,
contains_vulnerability_info: false,
contains_infrastructure_details: false,
requires_authentication: false
},
metadata: {
author: options.author,
version: '1.0',
document_code: null,
related_documents: [],
tags: options.tags
},
translations: {},
search_index: rawContent.toLowerCase(),
download_formats: {}
};
// Add PDF download if available
if (pdfWebPath) {
doc.download_formats.pdf = pdfWebPath;
}
// Create or update document
if (existing && options.force) {
await Document.update(existing._id, doc);
console.log(`\n✅ Document updated successfully!`);
} else {
await Document.create(doc);
console.log(`\n✅ Document created successfully!`);
}
console.log(`\n📊 Document Details:`);
console.log(` Title: ${doc.title}`);
console.log(` Slug: ${doc.slug}`);
console.log(` Category: ${doc.category}`);
console.log(` Audience: ${doc.audience}`);
console.log(` Order: ${doc.order}`);
console.log(` Tags: ${doc.metadata.tags.join(', ') || 'none'}`);
if (pdfWebPath) {
console.log(` PDF: ${pdfWebPath}`);
}
console.log(`\n✅ Document is now available at:`);
console.log(` https://agenticgovernance.digital/docs.html?doc=${slug}`);
console.log(` https://agenticgovernance.digital/docs.html?category=${doc.category}`);
console.log(`\n💡 Next Steps:`);
console.log(` 1. Clear browser cache (Ctrl+Shift+R or Cmd+Shift+R)`);
console.log(` 2. Visit docs.html to see your document in the sidebar`);
console.log(` 3. Document will appear under "${options.category}" category`);
await close();
} catch (error) {
console.error('\n❌ Upload failed:', error.message);
console.error(error.stack);
process.exit(1);
}
}
// Run if called directly
if (require.main === module) {
uploadDocument();
}
module.exports = uploadDocument;