Contributing to Tractatus Framework
Status: Alpha Research Project (October 2025)
Thank you for your interest in contributing to architectural AI safety research. Tractatus welcomes contributions that advance our understanding of structural constraints in AI systems.
🎯 What We're Building
Tractatus explores whether architectural constraints can make certain AI decisions structurally impossible without human judgment. Unlike alignment-based approaches that hope AI will choose safety, we investigate whether safety can be enforced through system architecture.
This is active research, not production software.
We welcome contributions that:
- Advance research questions with empirical rigor
- Improve implementation quality and test coverage
- Document real-world failure modes and responses
- Challenge assumptions with evidence
- Replicate findings in new contexts
🔬 Types of Contributions
Research Contributions (Highest Value)
Empirical Studies
- Controlled experiments testing framework effectiveness
- Comparative analysis with baseline (no framework) conditions
- Measurement of false positive/false negative rates
- Cross-LLM compatibility testing (GPT-4, Gemini, open-source models)
- Multi-domain generalization studies
Theoretical Work
- Formal verification of safety properties
- Proofs of correctness for specific boundary conditions
- Extensions of value pluralism theory to AI systems
- Analysis of rule proliferation dynamics
Replication Studies
- Independent validation of our findings
- Testing in different application domains
- Deployment in production contexts with documented results
Format: Submit as a GitHub issue tagged research, with methodology, data, and findings. We'll work with you on publication if results are significant.
Implementation Contributions
High Priority
- Fix failing tests - We have 108 known failures that need investigation
- Improve test coverage - Focus on edge cases and integration scenarios
- Performance optimization - Rule validation overhead, MongoDB query efficiency
- Cross-platform testing - Windows, macOS compatibility verification
Medium Priority
- Language ports (Python, Rust, Go, TypeScript)
- Integration examples (Express, FastAPI, Spring Boot)
- Enhanced logging and observability
- API documentation improvements
Lower Priority
- UI enhancements (currently minimal by design)
- Developer experience improvements
- Build system optimizations
Documentation Contributions
Critical Needs
- Case studies from real deployments (with data)
- Failure mode documentation (what went wrong and why)
- Integration tutorials with working code examples
- Critical analyses of framework limitations
Standard Needs
- Corrections to existing documentation
- Clarity improvements
- Code comment additions
- API reference updates
🚀 Getting Started
Prerequisites
Required
- Node.js 18+ (tested on 18.x and 20.x)
- MongoDB 7.0+ (critical - earlier versions have compatibility issues)
- Git
- 8GB RAM minimum (for local MongoDB + tests)
Helpful
- Understanding of organizational decision theory (March & Simon)
- Familiarity with value pluralism (Berlin, Chang)
- Experience with LLM-assisted development contexts
Local Development Setup
# 1. Fork and clone
git clone git@github.com:YOUR_USERNAME/tractatus-framework.git
cd tractatus-framework
# 2. Install dependencies
npm install
# 3. Set up environment
cp .env.example .env
# Edit .env - ensure MongoDB connection string is correct
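#    (A local connection string usually looks like mongodb://localhost:27017/<database-name>.
#     The exact variable names are whatever .env.example defines - the key below is
#     illustrative only, not necessarily the real one:)
#    MONGODB_URI=mongodb://localhost:27017/tractatus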
# 4. Start MongoDB (if not running)
# macOS: brew services start mongodb-community
# Ubuntu: sudo systemctl start mongod
# Windows: net start MongoDB
# 5. Initialize database with test data
npm run init:db
# 6. Run tests to verify setup
npm test
# Expected: 625 passing, 108 failing (known issues)
# If you get different numbers, something is wrong
# 7. Start development server
npm start
# Runs on http://localhost:9000
Project Structure
tractatus-framework/
├── src/
│ ├── services/ # 6 core framework components
│ │ ├── InstructionPersistenceClassifier.service.js
│ │ ├── CrossReferenceValidator.service.js
│ │ ├── BoundaryEnforcer.service.js
│ │ ├── ContextPressureMonitor.service.js
│ │ ├── MetacognitiveVerifier.service.js
│ │ └── PluralisticDeliberationOrchestrator.service.js
│ ├── models/ # MongoDB schemas
│ ├── routes/ # API endpoints
│ ├── controllers/ # Request handlers
│ ├── middleware/ # Express middleware
│ └── server.js # Application entry point
├── tests/
│ ├── unit/ # Service unit tests
│ └── integration/ # API integration tests
├── public/ # Frontend (vanilla JS, no framework)
├── docs/ # Research documentation
└── scripts/ # Utilities and migrations
Key files to understand:
- src/services/ContextPressureMonitor.service.js - Session health tracking (good entry point)
- src/services/CrossReferenceValidator.service.js - Training pattern override detection
- tests/unit/ContextPressureMonitor.test.js - Example test structure
- .env.example - Required configuration variables
📝 Contribution Process
1. Before You Start
For significant work (new features, architectural changes, research studies):
- Open a GitHub Discussion or Issue first
- Describe your proposal with:
- Problem being addressed
- Proposed approach
- Expected outcomes
- Resource requirements
- Wait for feedback before investing significant time
For minor fixes (typos, small bugs, documentation corrections):
- Just submit a PR with clear description
2. Development Workflow
# Create feature branch
git checkout -b research/empirical-validation-study
# or
git checkout -b fix/mongodb-connection-pool
# or
git checkout -b docs/integration-tutorial
# Make changes iteratively
# ... edit files ...
# Run tests frequently
npm test
# Verify no regressions
npm run test:unit
npm run test:integration
# Commit with clear messages
git add .
git commit -m "fix(validation): resolve race condition in CrossReferenceValidator
Issue: Concurrent validation requests caused inconsistent results
Root cause: Shared state in validator instance
Solution: Make validation stateless, pass context explicitly
Tested with 100 concurrent requests - no failures
Fixes #123"
# Push to your fork
git push origin research/empirical-validation-study
3. Pull Request Guidelines
Title Format:
type(scope): brief description
Examples:
fix(tests): resolve MongoDB connection timeout in integration tests
feat(validation): add configurable threshold for context pressure
docs(README): correct test count and clarify maturity status
research(replication): independent validation of 27027 failure mode
Types:
- fix - Bug fixes
- feat - New features
- docs - Documentation only
- test - Test additions/fixes
- refactor - Code restructuring
- research - Research contributions
- chore - Build/tooling changes
PR Description Must Include:
## Problem
Clear description of what issue this addresses
## Solution
How you solved it and why this approach
## Testing
What tests were added/modified
How you verified the fix
## Breaking Changes
List any breaking changes (or "None")
## Research Context (if applicable)
Methodology, data, findings
## Checklist
- [ ] Tests added/updated
- [ ] All tests passing locally
- [ ] Documentation updated
- [ ] No unintended breaking changes
- [ ] Commit messages follow conventions
4. Code Review Process
- Automated checks run first (tests, linting)
- Maintainer review for:
- Alignment with research goals
- Code quality and test coverage
- Documentation completeness
- Architectural consistency
- Feedback provided within 7 days (usually faster)
- Iteration if changes needed
- Merge when approved
Review criteria:
- Does this advance research questions?
- Is it tested thoroughly?
- Is documentation clear and honest?
- Does it maintain architectural integrity?
🧪 Testing Standards
Unit Tests (Required)
Every new function/method must have unit tests.
// tests/unit/NewService.test.js
const { NewService } = require('../../src/services/NewService.service');
describe('NewService', () => {
describe('criticalFunction', () => {
it('should handle normal case correctly', () => {
const service = new NewService();
const result = service.criticalFunction({ input: 'test' });
expect(result.status).toBe('success');
expect(result.data).toBeDefined();
});
it('should handle edge case: empty input', () => {
const service = new NewService();
expect(() => service.criticalFunction({}))
.toThrow('Input required');
});
it('should handle edge case: invalid input type', () => {
const service = new NewService();
const result = service.criticalFunction({ input: 123 });
expect(result.status).toBe('error');
expect(result.error).toContain('Expected string');
});
});
});
Testing requirements:
- Test normal operation
- Test edge cases (empty, null, invalid types)
- Test error conditions
- Mock external dependencies (MongoDB, APIs) - see the mock sketch after this list
- Use descriptive test names
- One assertion per test (generally)
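To illustrate the mocking requirement above, here is a minimal sketch assuming the suite uses Jest (the describe/it/expect style shown above). The model path, its findActive method, and the loadActiveRules call are hypothetical placeholders, not the framework's real API - substitute the actual dependency of the service under test.
// tests/unit/NewService.mocked.test.js
// Sketch only: the mocked model path and the findActive/loadActiveRules names
// are illustrative placeholders for whatever MongoDB model the service uses.
jest.mock('../../src/models/Example.model', () => ({
  findActive: jest.fn()
}));
const exampleModel = require('../../src/models/Example.model');
const { NewService } = require('../../src/services/NewService.service');
describe('NewService with mocked persistence', () => {
  beforeEach(() => {
    jest.clearAllMocks();
  });
  it('should read rules from the model without a running MongoDB instance', async () => {
    // Canned data returned by the mock - no database connection is made
    exampleModel.findActive.mockResolvedValue([{ id: 'rule-1', active: true }]);
    const service = new NewService();
    const result = await service.loadActiveRules();
    expect(exampleModel.findActive).toHaveBeenCalledTimes(1);
    expect(result).toHaveLength(1);
  });
});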
Integration Tests (For API Changes)
// tests/integration/api.newEndpoint.test.js
const request = require('supertest');
const app = require('../../src/server');
const db = require('../helpers/db-test-helper');
describe('POST /api/new-endpoint', () => {
beforeAll(async () => {
await db.connect();
});
afterAll(async () => {
await db.cleanup();
await db.disconnect();
});
it('should create resource successfully', async () => {
const response = await request(app)
.post('/api/new-endpoint')
.send({ data: 'test' })
.expect(201);
expect(response.body.id).toBeDefined();
// Verify database state
const saved = await db.findById(response.body.id);
expect(saved.data).toBe('test');
});
});
Running Tests
# All tests (current status: 625 pass, 108 fail)
npm test
# Unit tests only
npm run test:unit
# Integration tests only
npm run test:integration
# Watch mode (auto-rerun on changes)
npm run test:watch
# Coverage report
npm run test:coverage
Expectations:
- New code: 100% coverage required
- Bug fixes: Add a test that would have caught the bug (see the sketch after this list)
- Integration tests: Must use test database, not production
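As a sketch of the bug-fix expectation above: write a regression test that fails on the unfixed code and passes once the fix lands. The service, method, error message, and issue number below are hypothetical placeholders.
// tests/unit/NewService.regression.test.js
// Hypothetical regression test: before the fix, criticalFunction(null) crashed
// with an unhandled TypeError; after the fix it must throw the documented
// 'Input required' error. This test fails on the old code and passes on the new.
const { NewService } = require('../../src/services/NewService.service');
describe('NewService.criticalFunction - regression for issue #NNN', () => {
  it('should throw a descriptive error for null input instead of crashing', () => {
    const service = new NewService();
    expect(() => service.criticalFunction(null)).toThrow('Input required');
  });
});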
📚 Documentation Standards
Code Documentation
Use JSDoc for all public functions:
/**
* Validates a proposed action against stored instruction history
*
* This prevents the "27027 failure mode" where LLM training patterns
* override explicit user instructions (e.g., MongoDB port 27017 vs
* user's explicit instruction to use 27027).
*
* @param {Object} action - Proposed action to validate
* @param {string} action.type - Action type (e.g., 'database_config')
* @param {Object} action.parameters - Action-specific parameters
* @param {Array<Instruction>} instructionHistory - Active instructions
* @returns {Promise<ValidationResult>} Validation outcome
* @throws {ValidationError} If action type is unsupported
*
* @example
* const result = await validator.validate({
* type: 'database_config',
* parameters: { port: 27017 }
* }, instructionHistory);
*
* if (result.status === 'REJECTED') {
* console.log(result.reason); // "Training pattern override detected"
* }
*/
async validate(action, instructionHistory) {
// Implementation...
}
Comment complex logic:
// Edge case: When context window is 95%+ full, quality degrades rapidly.
// Empirical observation across 50+ sessions suggests threshold should be
// 60% for ELEVATED, 75% for HIGH. These values are NOT proven optimal.
if (tokenUsage > 0.60) {
// ...
}
Research Documentation
For research contributions, include:
- Methodology - How the study was conducted
- Data - Sample sizes, measurements, statistical methods
- Findings - What was discovered (with error bars/confidence intervals)
- Limitations - What the study didn't prove
- Replication - Enough detail for others to replicate
Example structure:
# Empirical Validation of CrossReferenceValidator
## Research Question
Does the CrossReferenceValidator reduce training pattern override frequency?
## Methodology
- Controlled experiment: 100 test cases with known override patterns
- Conditions: (A) No validator, (B) Validator enabled
- LLM: Claude 3.5 Sonnet
- Measurement: Override rate per 100 interactions
- Statistical test: Chi-square test for independence
## Results
- Condition A (no validator): 23/100 overrides (23%)
- Condition B (validator enabled): 3/100 overrides (3%)
- p < 0.001, effect size: large (Cramér's V = 0.42)
## Limitations
- Single LLM tested (generalization unclear)
- Synthetic test cases (may not reflect real usage)
- Short sessions (long-term drift not measured)
- Observer bias (researcher knew test purpose)
## Conclusion
Strong evidence that validator reduces training pattern overrides in
controlled conditions with Claude 3.5. Replication with other LLMs
and real-world deployments needed.
## Data & Code
- Raw data: [link to CSV]
- Analysis script: [link to R/Python script]
- Test prompts: [link to test suite]
⚖️ Research Ethics & Integrity
Required Standards
Transparency
- Acknowledge all limitations
- Report negative results (what didn't work)
- Disclose conflicts of interest
- Share data and methodology
Accuracy
- No fabricated statistics or results
- Clearly distinguish observation from proof
- Use appropriate statistical methods
- Acknowledge uncertainty
Attribution
- Cite all sources
- Credit collaborators
- Acknowledge AI assistance in implementation
- Reference prior work
What We Reject
- ❌ Fabricated data or statistics
- ❌ Selective reporting (hiding negative results)
- ❌ Plagiarism or insufficient attribution
- ❌ Overclaiming ("proves", "guarantees" without rigorous evidence)
- ❌ Undisclosed conflicts of interest
AI-Assisted Contributions
We welcome AI-assisted contributions with proper disclosure:
This code was generated with assistance from [Claude/GPT-4/etc] and
subsequently reviewed and tested by [human contributor name].
Testing: [description of validation performed]
Be honest about:
- What the AI generated vs. what you wrote
- What testing/validation you performed
- Any limitations you're aware of
🚫 What We Don't Accept
Technical
- Code without tests
- Breaking changes without migration path
- Commits that reduce test coverage
- Violations of existing architectural patterns
- Features that bypass safety constraints
Process
- PRs without description or context
- Unconstructive criticism without alternatives
- Ignoring review feedback
- Force-pushing over maintainer commits
Content
- Disrespectful or discriminatory language
- Marketing hyperbole or unsubstantiated claims
- Promises of features/capabilities that don't exist
- Plagiarized content
📞 Getting Help
Technical Questions
- Open a GitHub Discussion (preferred)
- Tag with an appropriate label (question, help-wanted)
Research Collaboration
- Email: research@agenticgovernance.digital
- Include: Research question, proposed methodology, timeline
Bug Reports
- Open GitHub Issue
- Include: Steps to reproduce, expected vs actual behavior, environment
Security Issues
- Email: research@agenticgovernance.digital
- Do NOT open public issue for security vulnerabilities
🏆 Recognition
Contributors are acknowledged through:
Code Contributors
- GitHub contributors list (automatic)
- Release notes for significant contributions
- In-code attribution for major features
Research Contributors
- Co-authorship on papers (if applicable)
- Citation in research documentation
- Acknowledgment in published materials
All forms of contribution are valued - code, documentation, research, community support, and critical feedback all advance the project.
📜 License
By contributing, you agree that your contributions will be licensed under Apache License 2.0 (see LICENSE file).
You retain copyright to your contributions. The Apache 2.0 license grants the project and users broad permissions while protecting contributors from liability.
🎓 Learning Resources
For New Contributors
Start here:
- Read README.md - Understand project goals and current state
- Browse existing issues - See what needs work
- Review test files - Understand code patterns
- Try local setup - Get environment working
Recommended reading:
- March & Simon - Organizations (1958) - Organizational decision theory foundations
- Isaiah Berlin - Two Concepts of Liberty (1958) - Value pluralism
- Ruth Chang - Hard Choices (2013) - Incommensurability theory
Project-specific:
- Case Studies - Real-world examples
- API Documentation - Technical reference
- Existing tests - Best way to understand how code works
For Researchers
Academic context:
- AI safety through architectural constraints (vs. alignment)
- Value pluralism in AI system design
- Organizational theory applied to AI governance
- Empirical validation of governance frameworks
Open research questions:
- What is the optimal rule count before brittleness?
- Can boundary detection be made more precise?
- Does this generalize beyond software development contexts?
- How to measure framework effectiveness rigorously?
Thank you for contributing to architectural AI safety research.
Last updated: 2025-10-21