# Contributing to Tractatus Framework
**Status:** Alpha Research Project (October 2025)

Thank you for your interest in contributing to architectural AI safety research. Tractatus welcomes contributions that advance our understanding of structural constraints in AI systems.

---

## 🎯 What We're Building

Tractatus explores whether **architectural constraints** can make certain AI decisions structurally impossible without human judgment. Unlike alignment-based approaches that hope AI will choose safety, we investigate whether safety can be enforced through system architecture.

**This is active research, not production software.**

We welcome contributions that:

- Advance research questions with empirical rigor
- Improve implementation quality and test coverage
- Document real-world failure modes and responses
- Challenge assumptions with evidence
- Replicate findings in new contexts

---
## 🔬 Types of Contributions

### Research Contributions (Highest Value)

**Empirical Studies**
- Controlled experiments testing framework effectiveness
- Comparative analysis against baseline (no-framework) conditions
- Measurement of false-positive and false-negative rates
- Cross-LLM compatibility testing (GPT-4, Gemini, open-source models)
- Multi-domain generalization studies

**Theoretical Work**
- Formal verification of safety properties
- Proofs of correctness for specific boundary conditions
- Extensions of value pluralism theory to AI systems
- Analysis of rule proliferation dynamics

**Replication Studies**
- Independent validation of our findings
- Testing in different application domains
- Deployment in production contexts with documented results

**Format**: Submit as a GitHub issue tagged `research` with methodology, data, and findings. We'll work with you toward publication if the results are significant.
### Implementation Contributions

**High Priority**
1. **Fix failing tests** - We have 108 known failures that need investigation
2. **Improve test coverage** - Focus on edge cases and integration scenarios
3. **Performance optimization** - Rule-validation overhead, MongoDB query efficiency
4. **Cross-platform testing** - Windows and macOS compatibility verification

**Medium Priority**
- Language ports (Python, Rust, Go, TypeScript)
- Integration examples (Express, FastAPI, Spring Boot)
- Enhanced logging and observability
- API documentation improvements

**Lower Priority**
- UI enhancements (currently minimal by design)
- Developer experience improvements
- Build system optimizations

### Documentation Contributions

**Critical Needs**
- Case studies from real deployments (with data)
- Failure-mode documentation (what went wrong and why)
- Integration tutorials with working code examples
- Critical analyses of framework limitations

**Standard Needs**
- Corrections to existing documentation
- Clarity improvements
- Code comment additions
- API reference updates

---
## 🚀 Getting Started

### Prerequisites

**Required**
- Node.js 18+ (tested on 18.x and 20.x)
- MongoDB 7.0+ (critical - earlier versions have compatibility issues)
- Git
- 8 GB RAM minimum (for local MongoDB + tests)

**Helpful**
- Understanding of organizational decision theory (March & Simon)
- Familiarity with value pluralism (Berlin, Chang)
- Experience with LLM-assisted development contexts
### Local Development Setup

```bash
# 1. Fork and clone
git clone git@github.com:YOUR_USERNAME/tractatus-framework.git
cd tractatus-framework

# 2. Install dependencies
npm install

# 3. Set up environment
cp .env.example .env
# Edit .env - ensure the MongoDB connection string is correct

# 4. Start MongoDB (if not running)
# macOS:   brew services start mongodb-community
# Ubuntu:  sudo systemctl start mongod
# Windows: net start MongoDB

# 5. Initialize database with test data
npm run init:db

# 6. Run tests to verify setup
npm test
# Expected: 625 passing, 108 failing (known issues)
# A different count usually indicates a setup problem

# 7. Start development server
npm start
# Runs on http://localhost:9000
```
### Project Structure

```
tractatus-framework/
├── src/
│   ├── services/        # 6 core framework components
│   │   ├── InstructionPersistenceClassifier.service.js
│   │   ├── CrossReferenceValidator.service.js
│   │   ├── BoundaryEnforcer.service.js
│   │   ├── ContextPressureMonitor.service.js
│   │   ├── MetacognitiveVerifier.service.js
│   │   └── PluralisticDeliberationOrchestrator.service.js
│   ├── models/          # MongoDB schemas
│   ├── routes/          # API endpoints
│   ├── controllers/     # Request handlers
│   ├── middleware/      # Express middleware
│   └── server.js        # Application entry point
├── tests/
│   ├── unit/            # Service unit tests
│   └── integration/     # API integration tests
├── public/              # Frontend (vanilla JS, no framework)
├── docs/                # Research documentation
└── scripts/             # Utilities and migrations
```
**Key files to understand:**
- `src/services/ContextPressureMonitor.service.js` - Session health tracking (a good entry point)
- `src/services/CrossReferenceValidator.service.js` - Training pattern override detection
- `tests/unit/ContextPressureMonitor.test.js` - Example test structure
- `.env.example` - Required configuration variables
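To make the training-pattern override problem concrete before you read the real service, here is a minimal, hypothetical sketch of the kind of check `CrossReferenceValidator` performs. The function name, data shapes, and lookup table are illustrative assumptions, not the service's actual API:

```javascript
// Hypothetical sketch: detect when a proposed value silently reverts to a
// common training-data default, overriding an explicit user instruction.
// (Illustrative only - not the actual CrossReferenceValidator API.)

// Well-known defaults an LLM tends to fall back to.
const TRAINING_DEFAULTS = {
  'database_config.port': 27017, // MongoDB's canonical port
};

function detectOverride(action, instructionHistory) {
  for (const instruction of instructionHistory) {
    const key = `${action.type}.${instruction.parameter}`;
    const proposed = action.parameters[instruction.parameter];

    // The proposed value contradicts an explicit instruction AND matches a
    // well-known default: likely a training-pattern override.
    if (
      proposed !== undefined &&
      proposed !== instruction.value &&
      proposed === TRAINING_DEFAULTS[key]
    ) {
      return {
        status: 'REJECTED',
        reason: `Training pattern override detected: proposed ${proposed}, ` +
                `but user explicitly instructed ${instruction.value}`,
      };
    }
  }
  return { status: 'APPROVED' };
}

// Example: user explicitly asked for port 27027; the proposal reverts to 27017.
const result = detectOverride(
  { type: 'database_config', parameters: { port: 27017 } },
  [{ parameter: 'port', value: 27027 }]
);
console.log(result.status); // REJECTED
```

The real service works against MongoDB-backed instruction history, but the core idea is the same: an explicit instruction outranks a statistically common default.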
---
## 📝 Contribution Process

### 1. Before You Start

**For significant work** (new features, architectural changes, research studies):
1. Open a GitHub Discussion or Issue first
2. Describe your proposal, covering:
   - The problem being addressed
   - Your proposed approach
   - Expected outcomes
   - Resource requirements
3. Wait for feedback before investing significant time

**For minor fixes** (typos, small bugs, documentation corrections):
- Just submit a PR with a clear description
### 2. Development Workflow

```bash
# Create a feature branch
git checkout -b research/empirical-validation-study
# or
git checkout -b fix/mongodb-connection-pool
# or
git checkout -b docs/integration-tutorial

# Make changes iteratively
# ... edit files ...

# Run tests frequently
npm test

# Verify no regressions
npm run test:unit
npm run test:integration

# Commit with clear messages
git add .
git commit -m "fix(validation): resolve race condition in CrossReferenceValidator

Issue: Concurrent validation requests caused inconsistent results
Root cause: Shared state in validator instance
Solution: Make validation stateless, pass context explicitly

Tested with 100 concurrent requests - no failures

Fixes #123"

# Push to your fork
git push origin research/empirical-validation-study
```
### 3. Pull Request Guidelines

**Title Format:**
```
type(scope): brief description

Examples:
fix(tests): resolve MongoDB connection timeout in integration tests
feat(validation): add configurable threshold for context pressure
docs(README): correct test count and clarify maturity status
research(replication): independent validation of 27027 failure mode
```
**Types:**
- `fix` - Bug fixes
- `feat` - New features
- `docs` - Documentation only
- `test` - Test additions/fixes
- `refactor` - Code restructuring
- `research` - Research contributions
- `chore` - Build/tooling changes

**PR Description Must Include:**

```markdown
## Problem
Clear description of the issue this addresses

## Solution
How you solved it and why you chose this approach

## Testing
What tests were added or modified
How you verified the fix

## Breaking Changes
List any breaking changes (or "None")

## Research Context (if applicable)
Methodology, data, findings

## Checklist
- [ ] Tests added/updated
- [ ] All tests passing locally
- [ ] Documentation updated
- [ ] No unintended breaking changes
- [ ] Commit messages follow conventions
```
### 4. Code Review Process

1. **Automated checks** run first (tests, linting)
2. **Maintainer review** covers:
   - Alignment with research goals
   - Code quality and test coverage
   - Documentation completeness
   - Architectural consistency
3. **Feedback** provided within 7 days (usually faster)
4. **Iteration** if changes are needed
5. **Merge** when approved

**Review criteria:**
- Does this advance the research questions?
- Is it tested thoroughly?
- Is the documentation clear and honest?
- Does it maintain architectural integrity?

---
## 🧪 Testing Standards

### Unit Tests (Required)

**Every new function or method must have unit tests.**

```javascript
// tests/unit/NewService.test.js
const { NewService } = require('../../src/services/NewService.service');

describe('NewService', () => {
  describe('criticalFunction', () => {
    it('should handle the normal case correctly', () => {
      const service = new NewService();
      const result = service.criticalFunction({ input: 'test' });

      expect(result.status).toBe('success');
      expect(result.data).toBeDefined();
    });

    it('should handle edge case: empty input', () => {
      const service = new NewService();
      expect(() => service.criticalFunction({}))
        .toThrow('Input required');
    });

    it('should handle edge case: invalid input type', () => {
      const service = new NewService();
      const result = service.criticalFunction({ input: 123 });

      expect(result.status).toBe('error');
      expect(result.error).toContain('Expected string');
    });
  });
});
```
**Testing requirements:**
- Test normal operation
- Test edge cases (empty, null, invalid types)
- Test error conditions
- Mock external dependencies (MongoDB, APIs)
- Use descriptive test names
- One assertion per test (generally)
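For the "mock external dependencies" requirement, one common pattern is to inject the MongoDB collection through the constructor so tests can substitute a hand-rolled fake. `RuleStore` and its methods below are hypothetical (not the framework's API), and in a real Jest test you would typically build the mock with `jest.fn()` instead:

```javascript
// Sketch: mock a MongoDB dependency by injecting it through the constructor.
// (RuleStore and its method names are hypothetical, not the framework's API.)
class RuleStore {
  constructor(collection) {
    this.collection = collection; // injected, so tests never touch a real DB
  }
  async getActiveRules() {
    return this.collection.find({ active: true }).toArray();
  }
}

// Hand-rolled fake that records the query it received.
function makeMockCollection(docs) {
  const calls = [];
  return {
    calls,
    find(query) {
      calls.push(query);
      return { toArray: async () => docs };
    },
  };
}

// Usage in a test body:
async function demo() {
  const mock = makeMockCollection([{ id: 'r1', active: true }]);
  const store = new RuleStore(mock);
  const rules = await store.getActiveRules();
  // mock.calls[0] now holds the query; rules holds the injected docs
  return { query: mock.calls[0], count: rules.length };
}
```

Dependency injection keeps unit tests fast and deterministic; only integration tests should exercise a real (test) database.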
### Integration Tests (For API Changes)

```javascript
// tests/integration/api.newEndpoint.test.js
const request = require('supertest');
const app = require('../../src/server');
const db = require('../helpers/db-test-helper');

describe('POST /api/new-endpoint', () => {
  beforeAll(async () => {
    await db.connect();
  });

  afterAll(async () => {
    await db.cleanup();
    await db.disconnect();
  });

  it('should create the resource successfully', async () => {
    const response = await request(app)
      .post('/api/new-endpoint')
      .send({ data: 'test' })
      .expect(201);

    expect(response.body.id).toBeDefined();

    // Verify database state
    const saved = await db.findById(response.body.id);
    expect(saved.data).toBe('test');
  });
});
```
### Running Tests

```bash
# All tests (current status: 625 pass, 108 fail)
npm test

# Unit tests only
npm run test:unit

# Integration tests only
npm run test:integration

# Watch mode (auto-rerun on changes)
npm run test:watch

# Coverage report
npm run test:coverage
```
**Expectations:**
- New code: 100% coverage required
- Bug fixes: Add a test that would have caught the bug
- Integration tests: Must use the test database, never production

---
## 📚 Documentation Standards

### Code Documentation

**Use JSDoc for all public functions:**

```javascript
/**
 * Validates a proposed action against stored instruction history.
 *
 * This prevents the "27027 failure mode", where LLM training patterns
 * override explicit user instructions (e.g., MongoDB's default port 27017
 * silently replacing the user's explicit instruction to use 27027).
 *
 * @param {Object} action - Proposed action to validate
 * @param {string} action.type - Action type (e.g., 'database_config')
 * @param {Object} action.parameters - Action-specific parameters
 * @param {Array<Instruction>} instructionHistory - Active instructions
 * @returns {Promise<ValidationResult>} Validation outcome
 * @throws {ValidationError} If the action type is unsupported
 *
 * @example
 * const result = await validator.validate({
 *   type: 'database_config',
 *   parameters: { port: 27017 }
 * }, instructionHistory);
 *
 * if (result.status === 'REJECTED') {
 *   console.log(result.reason); // "Training pattern override detected"
 * }
 */
async validate(action, instructionHistory) {
  // Implementation...
}
```
**Comment complex logic:**

```javascript
// Edge case: when the context window is 95%+ full, quality degrades rapidly.
// Empirical observation across 50+ sessions suggests the threshold should be
// 60% for ELEVATED and 75% for HIGH. These values are NOT proven optimal.
if (tokenUsage > 0.60) {
  // ...
}
```
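The partial snippet above can be rounded out into a single pure function that captures those thresholds. This is an illustrative sketch only: the names are not the actual `ContextPressureMonitor` API, and the 0.60/0.75 values are the empirically observed, unproven thresholds from the comment:

```javascript
// Sketch of context-pressure classification using the thresholds described
// above (0.60 for ELEVATED, 0.75 for HIGH). Illustrative names only;
// not the actual ContextPressureMonitor API.
function classifyPressure(tokenUsage) {
  if (tokenUsage < 0 || tokenUsage > 1) {
    throw new RangeError('tokenUsage must be a fraction between 0 and 1');
  }
  if (tokenUsage > 0.75) return 'HIGH';     // quality degrades rapidly
  if (tokenUsage > 0.60) return 'ELEVATED'; // start warning the session
  return 'NORMAL';
}

console.log(classifyPressure(0.5));  // NORMAL
console.log(classifyPressure(0.65)); // ELEVATED
console.log(classifyPressure(0.9));  // HIGH
```

Keeping the classification pure (no I/O, no shared state) makes it trivially unit-testable, which matters given that the thresholds themselves are open research questions.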
### Research Documentation

For research contributions, include:

1. **Methodology** - How the study was conducted
2. **Data** - Sample sizes, measurements, statistical methods
3. **Findings** - What was discovered (with error bars or confidence intervals)
4. **Limitations** - What the study did not prove
5. **Replication** - Enough detail for others to replicate

**Example structure:**

```markdown
# Empirical Validation of CrossReferenceValidator

## Research Question
Does the CrossReferenceValidator reduce training pattern override frequency?

## Methodology
- Controlled experiment: 100 test cases with known override patterns
- Conditions: (A) No validator, (B) Validator enabled
- LLM: Claude 3.5 Sonnet
- Measurement: Override rate per 100 interactions
- Statistical test: Chi-square test for independence

## Results
- Condition A (no validator): 23/100 overrides (23%)
- Condition B (validator enabled): 3/100 overrides (3%)
- p < 0.001, effect size: medium (Cramér's V ≈ 0.30)

## Limitations
- Single LLM tested (generalization unclear)
- Synthetic test cases (may not reflect real usage)
- Short sessions (long-term drift not measured)
- Observer bias (researcher knew the test purpose)

## Conclusion
Strong evidence that the validator reduces training pattern overrides in
controlled conditions with Claude 3.5. Replication with other LLMs
and real-world deployments is needed.

## Data & Code
- Raw data: [link to CSV]
- Analysis script: [link to R/Python script]
- Test prompts: [link to test suite]
```
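The chi-square step in a template like this can be checked mechanically. Here is a minimal sketch of the standard Pearson statistic for a 2×2 table, using the illustrative counts above (the function is ours, not part of the framework):

```javascript
// Pearson chi-square for a 2x2 table of [overrides, non-overrides] per
// condition, plus Cramér's V as an effect size. Standard formulas; the
// counts below are the illustrative ones from the template above.
function chiSquare2x2([[a, b], [c, d]]) {
  const n = a + b + c + d;
  const table = [[a, b], [c, d]];
  const rows = [a + b, c + d];
  const cols = [a + c, b + d];
  let chi2 = 0;
  for (let i = 0; i < 2; i++) {
    for (let j = 0; j < 2; j++) {
      // Expected count under independence: (row total * column total) / n
      const expected = (rows[i] * cols[j]) / n;
      chi2 += (table[i][j] - expected) ** 2 / expected;
    }
  }
  const cramersV = Math.sqrt(chi2 / n); // min(r-1, c-1) = 1 for a 2x2 table
  return { chi2, cramersV };
}

// Condition A: 23 overrides / 77 clean; Condition B: 3 overrides / 97 clean.
const { chi2 } = chiSquare2x2([[23, 77], [3, 97]]);
// chi2 exceeds 10.83, the critical value for p < 0.001 at df = 1
```

Publishing a small script like this alongside the raw counts is exactly what the "Analysis script" link in the template is for: it lets replicators verify the statistics from the data rather than trusting the prose.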
---

## ⚖️ Research Ethics & Integrity

### Required Standards

**Transparency**
- Acknowledge all limitations
- Report negative results (what didn't work)
- Disclose conflicts of interest
- Share data and methodology

**Accuracy**
- No fabricated statistics or results
- Clearly distinguish observation from proof
- Use appropriate statistical methods
- Acknowledge uncertainty

**Attribution**
- Cite all sources
- Credit collaborators
- Acknowledge AI assistance in implementation
- Reference prior work

### What We Reject

- ❌ Fabricated data or statistics
- ❌ Selective reporting (hiding negative results)
- ❌ Plagiarism or insufficient attribution
- ❌ Overclaiming ("proves", "guarantees" without rigorous evidence)
- ❌ Undisclosed conflicts of interest

### AI-Assisted Contributions

**We welcome AI-assisted contributions** with proper disclosure:

```
This code was generated with assistance from [Claude/GPT-4/etc.] and
subsequently reviewed and tested by [human contributor name].

Testing: [description of validation performed]
```

Be honest about:
- What the AI generated vs. what you wrote
- What testing/validation you performed
- Any limitations you're aware of

---
## 🚫 What We Don't Accept

### Technical

- Code without tests
- Breaking changes without a migration path
- Commits that reduce test coverage
- Violations of existing architectural patterns
- Features that bypass safety constraints

### Process

- PRs without description or context
- Unconstructive criticism without alternatives
- Ignoring review feedback
- Force-pushing over maintainer commits

### Content

- Disrespectful or discriminatory language
- Marketing hyperbole or unsubstantiated claims
- Promises of features or capabilities that don't exist
- Plagiarized content

---
## 📞 Getting Help

**Technical Questions**
- Open a GitHub Discussion (preferred)
- Tag with an appropriate label (`question`, `help-wanted`)

**Research Collaboration**
- Email: research@agenticgovernance.digital
- Include: research question, proposed methodology, timeline

**Bug Reports**
- Open a GitHub Issue
- Include: steps to reproduce, expected vs. actual behavior, environment

**Security Issues**
- Email: research@agenticgovernance.digital
- Do NOT open a public issue for security vulnerabilities

---
## 🏆 Recognition

Contributors are acknowledged through:

**Code Contributors**
- GitHub contributors list (automatic)
- Release notes for significant contributions
- In-code attribution for major features

**Research Contributors**
- Co-authorship on papers (if applicable)
- Citation in research documentation
- Acknowledgment in published materials

**All forms of contribution are valued** - code, documentation, research, community support, and critical feedback all advance the project.

---
## 📜 License

By contributing, you agree that your contributions will be licensed under the Apache License 2.0 (see the LICENSE file).

You retain copyright to your contributions. The Apache 2.0 license grants the project and its users broad permissions while protecting contributors from liability.

---
## 🎓 Learning Resources

### For New Contributors

**Start here:**
1. Read [README.md](README.md) - Understand the project goals and current state
2. Browse [existing issues](https://github.com/AgenticGovernance/tractatus-framework/issues) - See what needs work
3. Review [test files](tests/) - Understand code patterns
4. Try the [local setup](#local-development-setup) - Get your environment working

**Recommended reading:**
- March & Simon - *Organizations* (1958) - Foundations of organizational decision theory
- Isaiah Berlin - *Two Concepts of Liberty* (1958) - Value pluralism
- Ruth Chang - *Hard Choices* (2013) - Incommensurability theory

**Project-specific:**
- [Case Studies](https://agenticgovernance.digital/docs.html) - Real-world examples
- [API Documentation](https://agenticgovernance.digital/docs.html) - Technical reference
- Existing tests - The best way to understand how the code works

### For Researchers

**Academic context:**
- AI safety through architectural constraints (vs. alignment)
- Value pluralism in AI system design
- Organizational theory applied to AI governance
- Empirical validation of governance frameworks

**Open research questions:**
- What is the optimal rule count before brittleness sets in?
- Can boundary detection be made more precise?
- Does this generalize beyond software development contexts?
- How can framework effectiveness be measured rigorously?

---
**Thank you for contributing to architectural AI safety research.**

*Last updated: 2025-10-21*