paiml/paiml-mcp-agent-toolkit: Pragmatic AI Labs MCP Agent Toolkit - an MCP server designed to make code with agents more deterministic

PMAT


What is PMAT?

PMAT (Pragmatic Multi-language Agent Toolkit) provides the tooling needed to analyze code quality and generate AI-ready context.

Part of the PAIML Stack, following Toyota Way quality principles (Jidoka, Genchi Genbutsu, Kaizen).

Getting Started

Add to your system:

Install from crates.io

cargo install pmat

Or from source (latest)

git clone https://github.com/paiml/paiml-mcp-agent-toolkit
cd paiml-mcp-agent-toolkit && cargo install --path server

Basic Usage

Generate AI-ready context

pmat context --output context.md --format llm-optimized

Analyze code complexity

pmat analyze complexity

Grade technical debt (A+ through F)

pmat analyze tdg

Score repository health

pmat repo-score .

Run mutation testing

pmat mutate --target src/

MCP Server Mode

Start MCP server for Claude Code, Cline, etc.

pmat mcp
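Most MCP clients register a server through a small JSON entry that names the command to launch. A minimal sketch in shell, assuming a client that uses the common mcpServers layout; the config file path and exact schema vary by client, so treat this as an assumption to adapt:

# Hypothetical sketch: point an MCP client at the pmat server.
# The file name and "mcpServers" layout are assumptions borrowed from
# common MCP client configs, not something this repository prescribes.
cat > mcp-config.json <<'EOF'
{
  "mcpServers": {
    "pmat": { "command": "pmat", "args": ["mcp"] }
  }
}
EOF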

Features

Context Generation

Generate comprehensive context for AI assistants:

pmat context                          # Basic analysis
pmat context --format llm-optimized   # AI-optimized output
pmat context --include-tests          # Include test files

Technical Debt Grading (TDG)

Six orthogonal metrics for accurate quality assessment:

pmat analyze tdg                        # Project-wide grade
pmat analyze tdg --include-components   # Per-component breakdown
pmat tdg baseline create                # Create quality baseline
pmat tdg check-regression               # Detect quality degradation

Grading Scale: grades run from A+ (best) through F (worst).

Mutation Testing

Validate test suite effectiveness:

pmat mutate --target src/lib.rs            # Single file
pmat mutate --target src/ --threshold 85   # Quality gate
pmat mutate --failures-only                # CI optimization

Supported Languages: Rust, Python, TypeScript, JavaScript, Go, C++

Repository Health Scoring

Evidence-based quality metrics (0-211 scale):

pmat rust-project-score          # Fast mode (3 min)
pmat rust-project-score --full   # Comprehensive (10-15 min)
pmat repo-score . --deep         # Full git history

Workflow Prompts

Pre-configured AI prompts enforcing EXTREME TDD:

pmat prompt --list                  # Available prompts
pmat prompt code-coverage           # 85%+ coverage enforcement
pmat prompt debug                   # Five Whys analysis
pmat prompt quality-enforcement     # All quality gates
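Assuming the prompt text is written to standard output, it can be captured and handed to an assistant as part of its context; a small usage sketch (the output filename is arbitrary):

# Capture a workflow prompt for reuse in an AI session
pmat prompt code-coverage > code-coverage-prompt.md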

Git Hooks

Automatic quality enforcement:

pmat hooks install                     # Install pre-commit hooks
pmat hooks install --tdg-enforcement   # With TDG quality gates
pmat hooks status                      # Check hook status
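Conceptually, the installed pre-commit hook runs PMAT's quality checks and rejects the commit when a gate fails. A rough, hypothetical sketch of an equivalent hand-written hook, reusing flags from the CI example further below (the hook that pmat actually installs may differ):

#!/bin/sh
# Hypothetical stand-in for .git/hooks/pre-commit.
# Blocks the commit if the TDG grade falls below B.
pmat analyze tdg --fail-on-violation --min-grade B || exit 1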

Examples

Generate Context for AI

For Claude Code

pmat context --output context.md --format llm-optimized

With semantic search

pmat embed sync ./src
pmat semantic search "error handling patterns"

CI/CD Integration

.github/workflows/quality.yml

name: Quality Gates
on: [push, pull_request]

jobs:
  quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cargo install pmat
      - run: pmat analyze tdg --fail-on-violation --min-grade B
      - run: pmat mutate --target src/ --threshold 80

Quality Baseline Workflow

1. Create baseline

pmat tdg baseline create --output .pmat/baseline.json

2. Check for regressions

pmat tdg check-regression \
  --baseline .pmat/baseline.json \
  --max-score-drop 5.0 \
  --fail-on-regression

Architecture

pmat/
├── server/           CLI and MCP server
│   ├── src/
│   │   ├── cli/      Command handlers
│   │   ├── services/ Analysis engines
│   │   ├── mcp/      MCP protocol
│   │   └── tdg/      Technical Debt Grading
├── crates/
│   └── pmat-dashboard/  Pure WASM dashboard
└── docs/
    └── specifications/  Technical specs

Quality

| Metric | Value |
|---|---|
| Tests | 2500+ passing |
| Coverage | >85% |
| Mutation Score | >80% |
| Languages | 17+ supported |
| MCP Tools | 19 available |

Falsifiable Quality Commitments

Per Popper's demarcation criterion, all claims are measurable and testable:

| Commitment | Threshold | Verification Method |
|---|---|---|
| Context Generation | < 5 seconds for a 10K LOC project | time pmat context on test corpus |
| Memory Usage | < 500 MB for 100K LOC analysis | Measured via heaptrack in CI |
| Test Coverage | ≥ 85% line coverage | cargo llvm-cov (CI enforced) |
| Mutation Score | ≥ 80% killed mutants | pmat mutate --threshold 80 |
| Build Time | < 3 minutes incremental | cargo build --timings |
| CI Pipeline | < 15 minutes total | GitHub Actions workflow timing |
| Binary Size | < 50 MB release binary | ls -lh target/release/pmat |
| Language Parsers | All 17 languages parse without panic | Fuzz testing in CI |

How to Verify:

Run self-assessment with Popper Falsifiability Score

pmat popper-score --verbose

Individual commitment verification

cargo llvm-cov --html          # Coverage ≥85%
pmat mutate --threshold 80     # Mutation ≥80%
cargo build --timings          # Build time <3min
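The latency and memory commitments can be spot-checked locally in the same spirit; a sketch, assuming a checkout of roughly the stated size and a local heaptrack installation:

time pmat context --output /tmp/context.md        # Context generation: target < 5 s on ~10K LOC
heaptrack pmat context --output /tmp/context.md   # Peak memory: target < 500 MB on ~100K LOC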

Failure = Regression: Any commitment violation blocks CI merge.

Benchmark Results (Statistical Rigor)

All benchmarks use Criterion.rs with proper statistical methodology:

| Operation | Mean | 95% CI | Std Dev | Sample Size |
|---|---|---|---|---|
| Context (1K LOC) | 127ms | [124, 130] ms | ±12.3ms | n=1000 runs |
| Context (10K LOC) | 1.84s | [1.79, 1.90] s | ±156ms | n=500 runs |
| TDG Scoring | 156ms | [148, 164] ms | ±18.2ms | n=500 runs |
| Complexity Analysis | 23ms | [22, 24] ms | ±3.1ms | n=1000 runs |

Comparison Baselines (vs. Alternatives):

| Metric | PMAT | ctags | tree-sitter | Effect Size |
|---|---|---|---|---|
| 10K LOC parsing | 1.84s | 0.3s | 0.8s | d=0.72 (medium) |
| Memory (10K LOC) | 287MB | 45MB | 120MB | - |
| Semantic depth | Full | Syntax only | AST only | - |

See docs/BENCHMARKS.md for complete statistical analysis.
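To reproduce these numbers locally, the Criterion benches can be run from a source checkout; a minimal sketch, assuming the benches ship with the repository (absolute timings will vary with hardware):

git clone https://github.com/paiml/paiml-mcp-agent-toolkit
cd paiml-mcp-agent-toolkit
cargo bench    # Criterion reports mean, confidence interval, and std dev per benchmark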

ML/AI Reproducibility

PMAT uses ML for semantic search and embeddings. All ML operations are reproducible:

Random Seed Management:

Model Artifacts:

Dataset Sources

PMAT does not train models but uses these data sources for evaluation:

| Dataset | Source | Purpose | Size |
|---|---|---|---|
| CodeSearchNet | GitHub/Microsoft | Semantic search benchmarks | 2M functions |
| PMAT-bench | Internal | Regression testing | 500 queries |

Data provenance and licensing documented in docs/ml/REPRODUCIBILITY.md.

PAIML Stack

| Library | Purpose | Version |
|---|---|---|
| trueno | SIMD tensor operations | 0.7.3 |
| entrenar | Training & optimization | 0.2.3 |
| aprender | ML algorithms | 0.14.0 |
| realizar | GGUF inference | 0.2.1 |
| pmat | Code analysis toolkit | 2.209.0 |

Documentation

License

MIT License - see LICENSE for details.


Built with Extreme TDD | Part of PAIML