# Contributing to Tractatus Framework
**Status:** Alpha Research Project (October 2025)

Thank you for your interest in contributing to architectural AI safety research. Tractatus welcomes contributions that advance our understanding of structural constraints in AI systems.

---

## 🎯 What We're Building

Tractatus explores whether **architectural constraints** can make certain AI decisions structurally impossible without human judgment. Unlike alignment-based approaches that hope AI will choose safety, we investigate whether safety can be enforced through system architecture.

**This is active research, not production software.**

We welcome contributions that:

- Advance research questions with empirical rigor
- Improve implementation quality and test coverage
- Document real-world failure modes and responses
- Challenge assumptions with evidence
- Replicate findings in new contexts

---
## 🔬 Types of Contributions

### Research Contributions (Highest Value)

**Empirical Studies**
- Controlled experiments testing framework effectiveness
- Comparative analysis against baseline (no-framework) conditions
- Measurement of false-positive and false-negative rates
- Cross-LLM compatibility testing (GPT-4, Gemini, open-source models)
- Multi-domain generalization studies

**Theoretical Work**
- Formal verification of safety properties
- Proofs of correctness for specific boundary conditions
- Extensions of value pluralism theory to AI systems
- Analysis of rule proliferation dynamics

**Replication Studies**
- Independent validation of our findings
- Testing in different application domains
- Deployment in production contexts with documented results

**Format**: Submit as a GitHub issue tagged `research` with methodology, data, and findings. We'll work with you toward publication if the results are significant.
### Implementation Contributions

**High Priority**
1. **Fix failing tests** - We have 108 known failures that need investigation
2. **Improve test coverage** - Focus on edge cases and integration scenarios
3. **Performance optimization** - Rule-validation overhead, MongoDB query efficiency
4. **Cross-platform testing** - Windows and macOS compatibility verification

**Medium Priority**
- Language ports (Python, Rust, Go, TypeScript)
- Integration examples (Express, FastAPI, Spring Boot)
- Enhanced logging and observability
- API documentation improvements

**Lower Priority**
- UI enhancements (currently minimal by design)
- Developer experience improvements
- Build system optimizations

### Documentation Contributions

**Critical Needs**
- Case studies from real deployments (with data)
- Failure-mode documentation (what went wrong and why)
- Integration tutorials with working code examples
- Critical analyses of framework limitations

**Standard Needs**
- Corrections to existing documentation
- Clarity improvements
- Code comment additions
- API reference updates

---
## 🚀 Getting Started

### Prerequisites

**Required**
- Node.js 18+ (tested on 18.x and 20.x)
- MongoDB 7.0+ (critical - earlier versions have compatibility issues)
- Git
- 8 GB RAM minimum (for local MongoDB + tests)

**Helpful**
- Understanding of organizational decision theory (March & Simon)
- Familiarity with value pluralism (Berlin, Chang)
- Experience with LLM-assisted development contexts
### Local Development Setup

```bash
# 1. Fork and clone
git clone git@github.com:YOUR_USERNAME/tractatus-framework.git
cd tractatus-framework

# 2. Install dependencies
npm install

# 3. Set up environment
cp .env.example .env
# Edit .env - ensure the MongoDB connection string is correct

# 4. Start MongoDB (if not running)
# macOS:   brew services start mongodb-community
# Ubuntu:  sudo systemctl start mongod
# Windows: net start MongoDB

# 5. Initialize database with test data
npm run init:db

# 6. Run tests to verify setup
npm test
# Expected: 625 passing, 108 failing (known issues)
# A different count usually indicates a setup problem

# 7. Start development server
npm start
# Runs on http://localhost:9000
```
### Project Structure

```
tractatus-framework/
├── src/
│   ├── services/        # 6 core framework components
│   │   ├── InstructionPersistenceClassifier.service.js
│   │   ├── CrossReferenceValidator.service.js
│   │   ├── BoundaryEnforcer.service.js
│   │   ├── ContextPressureMonitor.service.js
│   │   ├── MetacognitiveVerifier.service.js
│   │   └── PluralisticDeliberationOrchestrator.service.js
│   ├── models/          # MongoDB schemas
│   ├── routes/          # API endpoints
│   ├── controllers/     # Request handlers
│   ├── middleware/      # Express middleware
│   └── server.js        # Application entry point
├── tests/
│   ├── unit/            # Service unit tests
│   └── integration/     # API integration tests
├── public/              # Frontend (vanilla JS, no framework)
├── docs/                # Research documentation
└── scripts/             # Utilities and migrations
```
**Key files to understand:**
- `src/services/ContextPressureMonitor.service.js` - Session health tracking (a good entry point)
- `src/services/CrossReferenceValidator.service.js` - Training pattern override detection
- `tests/unit/ContextPressureMonitor.test.js` - Example test structure
- `.env.example` - Required configuration variables
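To make the training-pattern override problem concrete before you read the real service, here is a minimal, hypothetical sketch of the kind of check `CrossReferenceValidator` performs. The function name, data shapes, and lookup table are illustrative assumptions, not the service's actual API:

```javascript
// Hypothetical sketch: detect when a proposed value silently reverts to a
// common training-data default, overriding an explicit user instruction.
// (Illustrative only - not the actual CrossReferenceValidator API.)

// Well-known defaults an LLM tends to fall back to.
const TRAINING_DEFAULTS = {
  'database_config.port': 27017, // MongoDB's canonical port
};

function detectOverride(action, instructionHistory) {
  for (const instruction of instructionHistory) {
    const key = `${action.type}.${instruction.parameter}`;
    const proposed = action.parameters[instruction.parameter];

    // The proposed value contradicts an explicit instruction AND matches a
    // well-known default: likely a training-pattern override.
    if (
      proposed !== undefined &&
      proposed !== instruction.value &&
      proposed === TRAINING_DEFAULTS[key]
    ) {
      return {
        status: 'REJECTED',
        reason: `Training pattern override detected: proposed ${proposed}, ` +
                `but user explicitly instructed ${instruction.value}`,
      };
    }
  }
  return { status: 'APPROVED' };
}

// Example: user explicitly asked for port 27027; the proposal reverts to 27017.
const result = detectOverride(
  { type: 'database_config', parameters: { port: 27017 } },
  [{ parameter: 'port', value: 27027 }]
);
console.log(result.status); // REJECTED
```

The real service works against MongoDB-backed instruction history, but the core idea is the same: an explicit instruction outranks a statistically common default.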
---
## 📝 Contribution Process

### 1. Before You Start

**For significant work** (new features, architectural changes, research studies):
1. Open a GitHub Discussion or Issue first
2. Describe your proposal, covering:
   - The problem being addressed
   - Your proposed approach
   - Expected outcomes
   - Resource requirements
3. Wait for feedback before investing significant time

**For minor fixes** (typos, small bugs, documentation corrections):
- Just submit a PR with a clear description
### 2. Development Workflow

```bash
# Create a feature branch
git checkout -b research/empirical-validation-study
# or
git checkout -b fix/mongodb-connection-pool
# or
git checkout -b docs/integration-tutorial

# Make changes iteratively
# ... edit files ...

# Run tests frequently
npm test

# Verify no regressions
npm run test:unit
npm run test:integration

# Commit with clear messages
git add .
git commit -m "fix(validation): resolve race condition in CrossReferenceValidator

Issue: Concurrent validation requests caused inconsistent results
Root cause: Shared state in validator instance
Solution: Make validation stateless, pass context explicitly

Tested with 100 concurrent requests - no failures

Fixes #123"

# Push to your fork
git push origin research/empirical-validation-study
```
### 3. Pull Request Guidelines

**Title Format:**
```
type(scope): brief description

Examples:
fix(tests): resolve MongoDB connection timeout in integration tests
feat(validation): add configurable threshold for context pressure
docs(README): correct test count and clarify maturity status
research(replication): independent validation of 27027 failure mode
```
**Types:**
- `fix` - Bug fixes
- `feat` - New features
- `docs` - Documentation only
- `test` - Test additions/fixes
- `refactor` - Code restructuring
- `research` - Research contributions
- `chore` - Build/tooling changes

**PR Description Must Include:**

```markdown
## Problem
Clear description of the issue this addresses

## Solution
How you solved it and why you chose this approach

## Testing
What tests were added or modified
How you verified the fix

## Breaking Changes
List any breaking changes (or "None")

## Research Context (if applicable)
Methodology, data, findings

## Checklist
- [ ] Tests added/updated
- [ ] All tests passing locally
- [ ] Documentation updated
- [ ] No unintended breaking changes
- [ ] Commit messages follow conventions
```
### 4. Code Review Process

1. **Automated checks** run first (tests, linting)
2. **Maintainer review** covers:
   - Alignment with research goals
   - Code quality and test coverage
   - Documentation completeness
   - Architectural consistency
3. **Feedback** provided within 7 days (usually faster)
4. **Iteration** if changes are needed
5. **Merge** when approved

**Review criteria:**
- Does this advance the research questions?
- Is it tested thoroughly?
- Is the documentation clear and honest?
- Does it maintain architectural integrity?

---
## 🧪 Testing Standards

### Unit Tests (Required)

**Every new function or method must have unit tests.**

```javascript
// tests/unit/NewService.test.js
const { NewService } = require('../../src/services/NewService.service');

describe('NewService', () => {
  describe('criticalFunction', () => {
    it('should handle the normal case correctly', () => {
      const service = new NewService();
      const result = service.criticalFunction({ input: 'test' });

      expect(result.status).toBe('success');
      expect(result.data).toBeDefined();
    });

    it('should handle edge case: empty input', () => {
      const service = new NewService();
      expect(() => service.criticalFunction({}))
        .toThrow('Input required');
    });

    it('should handle edge case: invalid input type', () => {
      const service = new NewService();
      const result = service.criticalFunction({ input: 123 });

      expect(result.status).toBe('error');
      expect(result.error).toContain('Expected string');
    });
  });
});
```
**Testing requirements:**
- Test normal operation
- Test edge cases (empty, null, invalid types)
- Test error conditions
- Mock external dependencies (MongoDB, APIs)
- Use descriptive test names
- One assertion per test (generally)
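For the "mock external dependencies" requirement, one common pattern is to inject the MongoDB collection through the constructor so tests can substitute a hand-rolled fake. `RuleStore` and its methods below are hypothetical (not the framework's API), and in a real Jest test you would typically build the mock with `jest.fn()` instead:

```javascript
// Sketch: mock a MongoDB dependency by injecting it through the constructor.
// (RuleStore and its method names are hypothetical, not the framework's API.)
class RuleStore {
  constructor(collection) {
    this.collection = collection; // injected, so tests never touch a real DB
  }
  async getActiveRules() {
    return this.collection.find({ active: true }).toArray();
  }
}

// Hand-rolled fake that records the query it received.
function makeMockCollection(docs) {
  const calls = [];
  return {
    calls,
    find(query) {
      calls.push(query);
      return { toArray: async () => docs };
    },
  };
}

// Usage in a test body:
async function demo() {
  const mock = makeMockCollection([{ id: 'r1', active: true }]);
  const store = new RuleStore(mock);
  const rules = await store.getActiveRules();
  // mock.calls[0] now holds the query; rules holds the injected docs
  return { query: mock.calls[0], count: rules.length };
}
```

Dependency injection keeps unit tests fast and deterministic; only integration tests should exercise a real (test) database.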
### Integration Tests (For API Changes)

```javascript
// tests/integration/api.newEndpoint.test.js
const request = require('supertest');
const app = require('../../src/server');
const db = require('../helpers/db-test-helper');

describe('POST /api/new-endpoint', () => {
  beforeAll(async () => {
    await db.connect();
  });

  afterAll(async () => {
    await db.cleanup();
    await db.disconnect();
  });

  it('should create the resource successfully', async () => {
    const response = await request(app)
      .post('/api/new-endpoint')
      .send({ data: 'test' })
      .expect(201);

    expect(response.body.id).toBeDefined();

    // Verify database state
    const saved = await db.findById(response.body.id);
    expect(saved.data).toBe('test');
  });
});
```
### Running Tests

```bash
# All tests (current status: 625 pass, 108 fail)
npm test

# Unit tests only
npm run test:unit

# Integration tests only
npm run test:integration

# Watch mode (auto-rerun on changes)
npm run test:watch

# Coverage report
npm run test:coverage
```
**Expectations:**
- New code: 100% coverage required
- Bug fixes: Add a test that would have caught the bug
- Integration tests: Must use the test database, never production

---
## 📚 Documentation Standards

### Code Documentation

**Use JSDoc for all public functions:**

```javascript
/**
 * Validates a proposed action against stored instruction history.
 *
 * This prevents the "27027 failure mode", where LLM training patterns
 * override explicit user instructions (e.g., MongoDB's default port 27017
 * silently replacing the user's explicit instruction to use 27027).
 *
 * @param {Object} action - Proposed action to validate
 * @param {string} action.type - Action type (e.g., 'database_config')
 * @param {Object} action.parameters - Action-specific parameters
 * @param {Array<Instruction>} instructionHistory - Active instructions
 * @returns {Promise<ValidationResult>} Validation outcome
 * @throws {ValidationError} If the action type is unsupported
 *
 * @example
 * const result = await validator.validate({
 *   type: 'database_config',
 *   parameters: { port: 27017 }
 * }, instructionHistory);
 *
 * if (result.status === 'REJECTED') {
 *   console.log(result.reason); // "Training pattern override detected"
 * }
 */
async validate(action, instructionHistory) {
  // Implementation...
}
```
**Comment complex logic:**

```javascript
// Edge case: when the context window is 95%+ full, quality degrades rapidly.
// Empirical observation across 50+ sessions suggests the threshold should be
// 60% for ELEVATED and 75% for HIGH. These values are NOT proven optimal.
if (tokenUsage > 0.60) {
  // ...
}
```
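The partial snippet above can be rounded out into a single pure function that captures those thresholds. This is an illustrative sketch only: the names are not the actual `ContextPressureMonitor` API, and the 0.60/0.75 values are the empirically observed, unproven thresholds from the comment:

```javascript
// Sketch of context-pressure classification using the thresholds described
// above (0.60 for ELEVATED, 0.75 for HIGH). Illustrative names only;
// not the actual ContextPressureMonitor API.
function classifyPressure(tokenUsage) {
  if (tokenUsage < 0 || tokenUsage > 1) {
    throw new RangeError('tokenUsage must be a fraction between 0 and 1');
  }
  if (tokenUsage > 0.75) return 'HIGH';     // quality degrades rapidly
  if (tokenUsage > 0.60) return 'ELEVATED'; // start warning the session
  return 'NORMAL';
}

console.log(classifyPressure(0.5));  // NORMAL
console.log(classifyPressure(0.65)); // ELEVATED
console.log(classifyPressure(0.9));  // HIGH
```

Keeping the classification pure (no I/O, no shared state) makes it trivially unit-testable, which matters given that the thresholds themselves are open research questions.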
### Research Documentation

For research contributions, include:

1. **Methodology** - How the study was conducted
2. **Data** - Sample sizes, measurements, statistical methods
3. **Findings** - What was discovered (with error bars or confidence intervals)
4. **Limitations** - What the study did not prove
5. **Replication** - Enough detail for others to replicate

**Example structure:**

```markdown
# Empirical Validation of CrossReferenceValidator

## Research Question
Does the CrossReferenceValidator reduce training pattern override frequency?

## Methodology
- Controlled experiment: 100 test cases with known override patterns
- Conditions: (A) No validator, (B) Validator enabled
- LLM: Claude 3.5 Sonnet
- Measurement: Override rate per 100 interactions
- Statistical test: Chi-square test for independence

## Results
- Condition A (no validator): 23/100 overrides (23%)
- Condition B (validator enabled): 3/100 overrides (3%)
- p < 0.001, effect size: medium (Cramér's V ≈ 0.30)

## Limitations
- Single LLM tested (generalization unclear)
- Synthetic test cases (may not reflect real usage)
- Short sessions (long-term drift not measured)
- Observer bias (researcher knew the test purpose)

## Conclusion
Strong evidence that the validator reduces training pattern overrides in
controlled conditions with Claude 3.5. Replication with other LLMs
and real-world deployments is needed.

## Data & Code
- Raw data: [link to CSV]
- Analysis script: [link to R/Python script]
- Test prompts: [link to test suite]
```
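The chi-square step in a template like this can be checked mechanically. Here is a minimal sketch of the standard Pearson statistic for a 2×2 table, using the illustrative counts above (the function is ours, not part of the framework):

```javascript
// Pearson chi-square for a 2x2 table of [overrides, non-overrides] per
// condition, plus Cramér's V as an effect size. Standard formulas; the
// counts below are the illustrative ones from the template above.
function chiSquare2x2([[a, b], [c, d]]) {
  const n = a + b + c + d;
  const table = [[a, b], [c, d]];
  const rows = [a + b, c + d];
  const cols = [a + c, b + d];
  let chi2 = 0;
  for (let i = 0; i < 2; i++) {
    for (let j = 0; j < 2; j++) {
      // Expected count under independence: (row total * column total) / n
      const expected = (rows[i] * cols[j]) / n;
      chi2 += (table[i][j] - expected) ** 2 / expected;
    }
  }
  const cramersV = Math.sqrt(chi2 / n); // min(r-1, c-1) = 1 for a 2x2 table
  return { chi2, cramersV };
}

// Condition A: 23 overrides / 77 clean; Condition B: 3 overrides / 97 clean.
const { chi2 } = chiSquare2x2([[23, 77], [3, 97]]);
// chi2 exceeds 10.83, the critical value for p < 0.001 at df = 1
```

Publishing a small script like this alongside the raw counts is exactly what the "Analysis script" link in the template is for: it lets replicators verify the statistics from the data rather than trusting the prose.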
---

## ⚖️ Research Ethics & Integrity

### Required Standards

**Transparency**
- Acknowledge all limitations
- Report negative results (what didn't work)
- Disclose conflicts of interest
- Share data and methodology

**Accuracy**
- No fabricated statistics or results
- Clearly distinguish observation from proof
- Use appropriate statistical methods
- Acknowledge uncertainty

**Attribution**
- Cite all sources
- Credit collaborators
- Acknowledge AI assistance in implementation
- Reference prior work

### What We Reject

- ❌ Fabricated data or statistics
- ❌ Selective reporting (hiding negative results)
- ❌ Plagiarism or insufficient attribution
- ❌ Overclaiming ("proves", "guarantees" without rigorous evidence)
- ❌ Undisclosed conflicts of interest

### AI-Assisted Contributions

**We welcome AI-assisted contributions** with proper disclosure:

```
This code was generated with assistance from [Claude/GPT-4/etc.] and
subsequently reviewed and tested by [human contributor name].

Testing: [description of validation performed]
```

Be honest about:
- What the AI generated vs. what you wrote
- What testing/validation you performed
- Any limitations you're aware of

---
## 🚫 What We Don't Accept

### Technical

- Code without tests
- Breaking changes without a migration path
- Commits that reduce test coverage
- Violations of existing architectural patterns
- Features that bypass safety constraints

### Process

- PRs without description or context
- Unconstructive criticism without alternatives
- Ignoring review feedback
- Force-pushing over maintainer commits

### Content

- Disrespectful or discriminatory language
- Marketing hyperbole or unsubstantiated claims
- Promises of features or capabilities that don't exist
- Plagiarized content

---
## 📞 Getting Help

**Technical Questions**
- Open a GitHub Discussion (preferred)
- Tag with an appropriate label (`question`, `help-wanted`)

**Research Collaboration**
- Email: research@agenticgovernance.digital
- Include: research question, proposed methodology, timeline

**Bug Reports**
- Open a GitHub Issue
- Include: steps to reproduce, expected vs. actual behavior, environment

**Security Issues**
- Email: research@agenticgovernance.digital
- Do NOT open a public issue for security vulnerabilities

---
## 🏆 Recognition

Contributors are acknowledged through:

**Code Contributors**
- GitHub contributors list (automatic)
- Release notes for significant contributions
- In-code attribution for major features

**Research Contributors**
- Co-authorship on papers (if applicable)
- Citation in research documentation
- Acknowledgment in published materials

**All forms of contribution are valued** - code, documentation, research, community support, and critical feedback all advance the project.

---
## 📜 License

By contributing, you agree that your contributions will be licensed under the Apache License 2.0 (see the LICENSE file).

You retain copyright to your contributions. The Apache 2.0 license grants the project and its users broad permissions while protecting contributors from liability.

---
## 🎓 Learning Resources

### For New Contributors

**Start here:**
1. Read [README.md](README.md) - Understand the project goals and current state
2. Browse [existing issues](https://github.com/AgenticGovernance/tractatus-framework/issues) - See what needs work
3. Review [test files](tests/) - Understand code patterns
4. Try the [local setup](#local-development-setup) - Get your environment working

**Recommended reading:**
- March & Simon - *Organizations* (1958) - Foundations of organizational decision theory
- Isaiah Berlin - *Two Concepts of Liberty* (1958) - Value pluralism
- Ruth Chang - *Hard Choices* (2013) - Incommensurability theory

**Project-specific:**
- [Case Studies](https://agenticgovernance.digital/docs.html) - Real-world examples
- [API Documentation](https://agenticgovernance.digital/docs.html) - Technical reference
- Existing tests - The best way to understand how the code works

### For Researchers

**Academic context:**
- AI safety through architectural constraints (vs. alignment)
- Value pluralism in AI system design
- Organizational theory applied to AI governance
- Empirical validation of governance frameworks

**Open research questions:**
- What is the optimal rule count before brittleness sets in?
- Can boundary detection be made more precise?
- Does this generalize beyond software development contexts?
- How can framework effectiveness be measured rigorously?

---
**Thank you for contributing to architectural AI safety research.**

*Last updated: 2025-10-21*