Contributing to Tractatus Framework
Status: Alpha Research Project (October 2025)
Thank you for your interest in contributing to architectural AI safety research. Tractatus welcomes contributions that advance our understanding of structural constraints in AI systems.
🎯 What We're Building
Tractatus explores whether architectural constraints can make certain AI decisions structurally impossible without human judgment. Unlike alignment-based approaches that hope AI will choose safety, we investigate whether safety can be enforced through system architecture.
This is active research, not production software.
We welcome contributions that:
- Advance research questions with empirical rigor
- Improve implementation quality and test coverage
- Document real-world failure modes and responses
- Challenge assumptions with evidence
- Replicate findings in new contexts
🔬 Types of Contributions
Research Contributions (Highest Value)
Empirical Studies
- Controlled experiments testing framework effectiveness
- Comparative analysis with baseline (no framework) conditions
- Measurement of false positive/false negative rates
- Cross-LLM compatibility testing (GPT-4, Gemini, open-source models)
- Multi-domain generalization studies
Theoretical Work
- Formal verification of safety properties
- Proofs of correctness for specific boundary conditions
- Extensions of value pluralism theory to AI systems
- Analysis of rule proliferation dynamics
Replication Studies
- Independent validation of our findings
- Testing in different application domains
- Deployment in production contexts with documented results
Format: Submit as a GitHub issue tagged research, with methodology, data, and findings. We'll work with you on publication if results are significant.
Implementation Contributions
High Priority
- Fix failing tests - We have 108 known failures that need investigation
- Improve test coverage - Focus on edge cases and integration scenarios
- Performance optimization - Rule validation overhead, MongoDB query efficiency
- Cross-platform testing - Windows, macOS compatibility verification
Medium Priority
- Language ports (Python, Rust, Go, TypeScript)
- Integration examples (Express, FastAPI, Spring Boot)
- Enhanced logging and observability
- API documentation improvements
Lower Priority
- UI enhancements (currently minimal by design)
- Developer experience improvements
- Build system optimizations
Documentation Contributions
Critical Needs
- Case studies from real deployments (with data)
- Failure mode documentation (what went wrong and why)
- Integration tutorials with working code examples
- Critical analyses of framework limitations
Standard Needs
- Corrections to existing documentation
- Clarity improvements
- Code comment additions
- API reference updates
🚀 Getting Started
Prerequisites
Required
- Node.js 18+ (tested on 18.x and 20.x)
- MongoDB 7.0+ (critical - earlier versions have compatibility issues)
- Git
- 8GB RAM minimum (for local MongoDB + tests)
Helpful
- Understanding of organizational decision theory (March & Simon)
- Familiarity with value pluralism (Berlin, Chang)
- Experience with LLM-assisted development contexts
Local Development Setup
# 1. Fork and clone
git clone git@github.com:YOUR_USERNAME/tractatus-framework.git
cd tractatus-framework
# 2. Install dependencies
npm install
# 3. Set up environment
cp .env.example .env
# Edit .env - ensure MongoDB connection string is correct
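#    (A local connection string usually looks like mongodb://localhost:27017/<database-name>.
#     The exact variable names are whatever .env.example defines - the key below is
#     illustrative only, not necessarily the real one:)
#    MONGODB_URI=mongodb://localhost:27017/tractatus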
# 4. Start MongoDB (if not running)
# macOS: brew services start mongodb-community
# Ubuntu: sudo systemctl start mongod
# Windows: net start MongoDB
# 5. Initialize database with test data
npm run init:db
# 6. Run tests to verify setup
npm test
# Expected: 625 passing, 108 failing (known issues)
# If you get different numbers, something is wrong
# 7. Start development server
npm start
# Runs on http://localhost:9000
Project Structure
tractatus-framework/
├── src/
│ ├── services/ # 6 core framework components
│ │ ├── InstructionPersistenceClassifier.service.js
│ │ ├── CrossReferenceValidator.service.js
│ │ ├── BoundaryEnforcer.service.js
│ │ ├── ContextPressureMonitor.service.js
│ │ ├── MetacognitiveVerifier.service.js
│ │ └── PluralisticDeliberationOrchestrator.service.js
│ ├── models/ # MongoDB schemas
│ ├── routes/ # API endpoints
│ ├── controllers/ # Request handlers
│ ├── middleware/ # Express middleware
│ └── server.js # Application entry point
├── tests/
│ ├── unit/ # Service unit tests
│ └── integration/ # API integration tests
├── public/ # Frontend (vanilla JS, no framework)
├── docs/ # Research documentation
└── scripts/ # Utilities and migrations
Key files to understand:
- src/services/ContextPressureMonitor.service.js - Session health tracking (good entry point)
- src/services/CrossReferenceValidator.service.js - Training pattern override detection
- tests/unit/ContextPressureMonitor.test.js - Example test structure
- .env.example - Required configuration variables
📝 Contribution Process
1. Before You Start
For significant work (new features, architectural changes, research studies):
- Open a GitHub Discussion or Issue first
- Describe your proposal with:
- Problem being addressed
- Proposed approach
- Expected outcomes
- Resource requirements
- Wait for feedback before investing significant time
For minor fixes (typos, small bugs, documentation corrections):
- Just submit a PR with clear description
2. Development Workflow
# Create feature branch
git checkout -b research/empirical-validation-study
# or
git checkout -b fix/mongodb-connection-pool
# or
git checkout -b docs/integration-tutorial
# Make changes iteratively
# ... edit files ...
# Run tests frequently
npm test
# Verify no regressions
npm run test:unit
npm run test:integration
# Commit with clear messages
git add .
git commit -m "fix(validation): resolve race condition in CrossReferenceValidator
Issue: Concurrent validation requests caused inconsistent results
Root cause: Shared state in validator instance
Solution: Make validation stateless, pass context explicitly
Tested with 100 concurrent requests - no failures
Fixes #123"
# Push to your fork
git push origin research/empirical-validation-study
3. Pull Request Guidelines
Title Format:
type(scope): brief description
Examples:
fix(tests): resolve MongoDB connection timeout in integration tests
feat(validation): add configurable threshold for context pressure
docs(README): correct test count and clarify maturity status
research(replication): independent validation of 27027 failure mode
Types:
- fix - Bug fixes
- feat - New features
- docs - Documentation only
- test - Test additions/fixes
- refactor - Code restructuring
- research - Research contributions
- chore - Build/tooling changes
PR Description Must Include:
## Problem
Clear description of what issue this addresses
## Solution
How you solved it and why this approach
## Testing
What tests were added/modified
How you verified the fix
## Breaking Changes
List any breaking changes (or "None")
## Research Context (if applicable)
Methodology, data, findings
## Checklist
- [ ] Tests added/updated
- [ ] All tests passing locally
- [ ] Documentation updated
- [ ] No unintended breaking changes
- [ ] Commit messages follow conventions
4. Code Review Process
- Automated checks run first (tests, linting)
- Maintainer review for:
- Alignment with research goals
- Code quality and test coverage
- Documentation completeness
- Architectural consistency
- Feedback provided within 7 days (usually faster)
- Iteration if changes needed
- Merge when approved
Review criteria:
- Does this advance research questions?
- Is it tested thoroughly?
- Is documentation clear and honest?
- Does it maintain architectural integrity?
🧪 Testing Standards
Unit Tests (Required)
Every new function/method must have unit tests.
// tests/unit/NewService.test.js
const { NewService } = require('../../src/services/NewService.service');
describe('NewService', () => {
describe('criticalFunction', () => {
it('should handle normal case correctly', () => {
const service = new NewService();
const result = service.criticalFunction({ input: 'test' });
expect(result.status).toBe('success');
expect(result.data).toBeDefined();
});
it('should handle edge case: empty input', () => {
const service = new NewService();
expect(() => service.criticalFunction({}))
.toThrow('Input required');
});
it('should handle edge case: invalid input type', () => {
const service = new NewService();
const result = service.criticalFunction({ input: 123 });
expect(result.status).toBe('error');
expect(result.error).toContain('Expected string');
});
});
});
Testing requirements:
- Test normal operation
- Test edge cases (empty, null, invalid types)
- Test error conditions
- Mock external dependencies (MongoDB, APIs) - see the mock sketch after this list
- Use descriptive test names
- One assertion per test (generally)
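To illustrate the mocking requirement above, here is a minimal sketch assuming the suite uses Jest (the describe/it/expect style shown above). The model path, its findActive method, and the loadActiveRules call are hypothetical placeholders, not the framework's real API - substitute the actual dependency of the service under test.
// tests/unit/NewService.mocked.test.js
// Sketch only: the mocked model path and the findActive/loadActiveRules names
// are illustrative placeholders for whatever MongoDB model the service uses.
jest.mock('../../src/models/Example.model', () => ({
  findActive: jest.fn()
}));
const exampleModel = require('../../src/models/Example.model');
const { NewService } = require('../../src/services/NewService.service');
describe('NewService with mocked persistence', () => {
  beforeEach(() => {
    jest.clearAllMocks();
  });
  it('should read rules from the model without a running MongoDB instance', async () => {
    // Canned data returned by the mock - no database connection is made
    exampleModel.findActive.mockResolvedValue([{ id: 'rule-1', active: true }]);
    const service = new NewService();
    const result = await service.loadActiveRules();
    expect(exampleModel.findActive).toHaveBeenCalledTimes(1);
    expect(result).toHaveLength(1);
  });
});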
Integration Tests (For API Changes)
// tests/integration/api.newEndpoint.test.js
const request = require('supertest');
const app = require('../../src/server');
const db = require('../helpers/db-test-helper');
describe('POST /api/new-endpoint', () => {
beforeAll(async () => {
await db.connect();
});
afterAll(async () => {
await db.cleanup();
await db.disconnect();
});
it('should create resource successfully', async () => {
const response = await request(app)
.post('/api/new-endpoint')
.send({ data: 'test' })
.expect(201);
expect(response.body.id).toBeDefined();
// Verify database state
const saved = await db.findById(response.body.id);
expect(saved.data).toBe('test');
});
});
Running Tests
# All tests (current status: 625 pass, 108 fail)
npm test
# Unit tests only
npm run test:unit
# Integration tests only
npm run test:integration
# Watch mode (auto-rerun on changes)
npm run test:watch
# Coverage report
npm run test:coverage
Expectations:
- New code: 100% coverage required
- Bug fixes: Add a test that would have caught the bug (see the sketch after this list)
- Integration tests: Must use test database, not production
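As a sketch of the bug-fix expectation above: write a regression test that fails on the unfixed code and passes once the fix lands. The service, method, error message, and issue number below are hypothetical placeholders.
// tests/unit/NewService.regression.test.js
// Hypothetical regression test: before the fix, criticalFunction(null) crashed
// with an unhandled TypeError; after the fix it must throw the documented
// 'Input required' error. This test fails on the old code and passes on the new.
const { NewService } = require('../../src/services/NewService.service');
describe('NewService.criticalFunction - regression for issue #NNN', () => {
  it('should throw a descriptive error for null input instead of crashing', () => {
    const service = new NewService();
    expect(() => service.criticalFunction(null)).toThrow('Input required');
  });
});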
📚 Documentation Standards
Code Documentation
Use JSDoc for all public functions:
/**
* Validates a proposed action against stored instruction history
*
* This prevents the "27027 failure mode" where LLM training patterns
* override explicit user instructions (e.g., MongoDB port 27017 vs
* user's explicit instruction to use 27027).
*
* @param {Object} action - Proposed action to validate
* @param {string} action.type - Action type (e.g., 'database_config')
* @param {Object} action.parameters - Action-specific parameters
* @param {Array<Instruction>} instructionHistory - Active instructions
* @returns {Promise<ValidationResult>} Validation outcome
* @throws {ValidationError} If action type is unsupported
*
* @example
* const result = await validator.validate({
* type: 'database_config',
* parameters: { port: 27017 }
* }, instructionHistory);
*
* if (result.status === 'REJECTED') {
* console.log(result.reason); // "Training pattern override detected"
* }
*/
async validate(action, instructionHistory) {
// Implementation...
}
Comment complex logic:
// Edge case: When context window is 95%+ full, quality degrades rapidly.
// Empirical observation across 50+ sessions suggests threshold should be
// 60% for ELEVATED, 75% for HIGH. These values are NOT proven optimal.
if (tokenUsage > 0.60) {
// ...
}
Research Documentation
For research contributions, include:
- Methodology - How the study was conducted
- Data - Sample sizes, measurements, statistical methods
- Findings - What was discovered (with error bars/confidence intervals)
- Limitations - What the study didn't prove
- Replication - Enough detail for others to replicate
Example structure:
# Empirical Validation of CrossReferenceValidator
## Research Question
Does the CrossReferenceValidator reduce training pattern override frequency?
## Methodology
- Controlled experiment: 100 test cases with known override patterns
- Conditions: (A) No validator, (B) Validator enabled
- LLM: Claude 3.5 Sonnet
- Measurement: Override rate per 100 interactions
- Statistical test: Chi-square test for independence
## Results
- Condition A (no validator): 23/100 overrides (23%)
- Condition B (validator enabled): 3/100 overrides (3%)
- p < 0.001, effect size: large (Cramér's V = 0.42)
## Limitations
- Single LLM tested (generalization unclear)
- Synthetic test cases (may not reflect real usage)
- Short sessions (long-term drift not measured)
- Observer bias (researcher knew test purpose)
## Conclusion
Strong evidence that validator reduces training pattern overrides in
controlled conditions with Claude 3.5. Replication with other LLMs
and real-world deployments needed.
## Data & Code
- Raw data: [link to CSV]
- Analysis script: [link to R/Python script]
- Test prompts: [link to test suite]
⚖️ Research Ethics & Integrity
Required Standards
Transparency
- Acknowledge all limitations
- Report negative results (what didn't work)
- Disclose conflicts of interest
- Share data and methodology
Accuracy
- No fabricated statistics or results
- Clearly distinguish observation from proof
- Use appropriate statistical methods
- Acknowledge uncertainty
Attribution
- Cite all sources
- Credit collaborators
- Acknowledge AI assistance in implementation
- Reference prior work
What We Reject
- ❌ Fabricated data or statistics
- ❌ Selective reporting (hiding negative results)
- ❌ Plagiarism or insufficient attribution
- ❌ Overclaiming ("proves", "guarantees" without rigorous evidence)
- ❌ Undisclosed conflicts of interest
AI-Assisted Contributions
We welcome AI-assisted contributions with proper disclosure:
This code was generated with assistance from [Claude/GPT-4/etc] and
subsequently reviewed and tested by [human contributor name].
Testing: [description of validation performed]
Be honest about:
- What the AI generated vs. what you wrote
- What testing/validation you performed
- Any limitations you're aware of
🚫 What We Don't Accept
Technical
- Code without tests
- Breaking changes without migration path
- Commits that reduce test coverage
- Violations of existing architectural patterns
- Features that bypass safety constraints
Process
- PRs without description or context
- Unconstructive criticism without alternatives
- Ignoring review feedback
- Force-pushing over maintainer commits
Content
- Disrespectful or discriminatory language
- Marketing hyperbole or unsubstantiated claims
- Promises of features/capabilities that don't exist
- Plagiarized content
📞 Getting Help
Technical Questions
- Open a GitHub Discussion (preferred)
- Tag with an appropriate label (question, help-wanted)
Research Collaboration
- Email: research@agenticgovernance.digital
- Include: Research question, proposed methodology, timeline
Bug Reports
- Open GitHub Issue
- Include: Steps to reproduce, expected vs actual behavior, environment
Security Issues
- Email: research@agenticgovernance.digital
- Do NOT open public issue for security vulnerabilities
🏆 Recognition
Contributors are acknowledged through:
Code Contributors
- GitHub contributors list (automatic)
- Release notes for significant contributions
- In-code attribution for major features
Research Contributors
- Co-authorship on papers (if applicable)
- Citation in research documentation
- Acknowledgment in published materials
All forms of contribution are valued - code, documentation, research, community support, and critical feedback all advance the project.
📜 License
By contributing, you agree that your contributions will be licensed under Apache License 2.0 (see LICENSE file).
You retain copyright to your contributions. The Apache 2.0 license grants the project and users broad permissions while protecting contributors from liability.
🎓 Learning Resources
For New Contributors
Start here:
- Read README.md - Understand project goals and current state
- Browse existing issues - See what needs work
- Review test files - Understand code patterns
- Try local setup - Get environment working
Recommended reading:
- March & Simon - Organizations (1958) - Organizational decision theory foundations
- Isaiah Berlin - Two Concepts of Liberty (1958) - Value pluralism
- Ruth Chang - Hard Choices (2013) - Incommensurability theory
Project-specific:
- Case Studies - Real-world examples
- API Documentation - Technical reference
- Existing tests - Best way to understand how code works
For Researchers
Academic context:
- AI safety through architectural constraints (vs. alignment)
- Value pluralism in AI system design
- Organizational theory applied to AI governance
- Empirical validation of governance frameworks
Open research questions:
- What is the optimal rule count before brittleness?
- Can boundary detection be made more precise?
- Does this generalize beyond software development contexts?
- How to measure framework effectiveness rigorously?
Thank you for contributing to architectural AI safety research.
Last updated: 2025-10-21