Your inference isn't performing at its peak
Code says streaming enabled. Runtime shows 0% streams.
That's why your p95 latency is 2.4s instead of 400ms.
What PeakInfer reveals
This is from a real codebase. What's in yours?
Run it where you work
Same analysis. Your preferred environment.
Terminal
Free forever

1. Install
npm install -g @kalmantic/peakinfer

2. Set your API key
export ANTHROPIC_API_KEY=sk-ant-...

3. Run on your code
peakinfer analyze ./src

Results in 30 seconds.
Optional: Add runtime events for drift detection
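The `.jsonl` extension suggests the events file is newline-delimited JSON, one record per request. The field names below are purely illustrative — a sketch of what a runtime record might carry, not PeakInfer's actual schema:

```json
{"timestamp": "2025-01-01T12:00:00Z", "model": "claude-3-5-sonnet", "streamed": false, "latency_ms": 2400}
```

Fields like `streamed` and `latency_ms` are the kind of runtime signal that drift detection would compare against what the code claims to configure.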
peakinfer analyze ./src --events prod.jsonl

VS Code
50 free credits

1. Install extension
Search "PeakInfer" in VS Code Extensions
Or run:
code --install-extension kalmantic.peakinfer

2. Get your token
peakinfer.com/dashboard → Sign in with GitHub → Generate token
50 free credits. No credit card required.
3. Analyze
Open any file → Cmd+Shift+P → "PeakInfer: Analyze Current File"
Issues appear as inline diagnostics.
View on VS Code Marketplace →

Claude (MCP)
Free forever

1. Add to Claude config
Edit ~/.config/claude/claude_desktop_config.json
{
  "mcpServers": {
    "peakinfer": {
      "command": "npx",
      "args": ["@kalmantic/peakinfer-mcp"]
    }
  }
}

2. Restart Claude Desktop or Claude Code
3. Ask Claude
"Analyze this file for LLM inference performance issues"
Works with your existing Anthropic API key.
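If Claude Desktop doesn't inherit your shell environment, the key can be passed explicitly through the standard `env` block of an MCP server entry — a sketch that assumes the server reads `ANTHROPIC_API_KEY`, as the CLI does:

```json
{
  "mcpServers": {
    "peakinfer": {
      "command": "npx",
      "args": ["@kalmantic/peakinfer-mcp"],
      "env": { "ANTHROPIC_API_KEY": "sk-ant-..." }
    }
  }
}
```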
View MCP server documentation →

GitHub Action
50 free credits

1. Get your token
peakinfer.com/dashboard → Sign in with GitHub → Generate token
50 free credits. No credit card required.
2. Add secret to repo
Settings → Secrets and variables → Actions → New repository secret
Name: PEAKINFER_TOKEN
3. Add workflow file
Create .github/workflows/peakinfer.yml
name: PeakInfer
on: [pull_request]
jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: Kalmantic/peakinfer-action@v1
        with:
          token: ${{ secrets.PEAKINFER_TOKEN }}

Results posted as PR comments.
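The workflow above runs on every pull request. To analyze only when relevant code changes, standard GitHub Actions path filters apply — the `src/**` pattern below is an example; match it to the directory you analyze:

```yaml
on:
  pull_request:
    paths:
      - "src/**"
```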
View GitHub Action documentation →

What teams find
"We had streaming=true for 6 months but it wasn't actually streaming. Our p95 was 3x what it should have been."
— Engineering Lead, Series B
"Retry logic was there but never fired because the exception type changed in the new SDK version."
— Staff Engineer, Fintech
"We were paying for gpt-4 but the fallback to gpt-3.5 was triggering on 40% of requests. Config bug."
— AI Engineer, Healthcare