Your inference isn't performing at its peak

Code says streaming enabled. Runtime shows 0% streams.
That's why your p95 latency is 2.4s instead of 400ms.

See drift detection in 30 seconds. No signup.

What PeakInfer reveals

This is from a real codebase. What's in yours?

peakinfer — drift detection

See drift detection in action

$4,200
Monthly cost waste found
5x
Latency from broken streaming
0
Error handlers on critical paths
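The broken-streaming case above usually comes from code that requests a stream but then buffers it. A minimal, self-contained sketch of that pattern (all names are hypothetical, and the mock generator stands in for a real streaming SDK call): the caller's time-to-first-byte becomes equal to time-to-last-byte, so "streaming enabled" does nothing at runtime.

```javascript
// Hypothetical sketch of the broken-streaming drift pattern.
// mockStream() stands in for a real streaming SDK call.
async function* mockStream() {
  for (const chunk of ["Hel", "lo, ", "world"]) {
    await new Promise((resolve) => setTimeout(resolve, 50)); // simulated network delay
    yield chunk;
  }
}

// Streaming is "enabled" upstream, but buffering here defeats it:
// the caller waits for the full response before seeing a single byte.
async function getCompletion() {
  let text = "";
  for await (const chunk of mockStream()) text += chunk; // buffers the whole stream
  return text;
}

getCompletion().then((text) => console.log(text)); // prints "Hello, world"
```

Static analysis sees `stream: true`; only runtime events reveal that no caller ever consumed a partial response.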

Terminal

Free forever

1. Install

npm install -g @kalmantic/peakinfer

2. Set your API key

export ANTHROPIC_API_KEY=sk-ant-...

3. Run on your code

peakinfer analyze ./src

Results in 30 seconds.

Optional: Add runtime events for drift detection

peakinfer analyze ./src --events prod.jsonl
View full documentation →

VS Code

50 free credits

1. Install extension

Search "PeakInfer" in VS Code Extensions

Or run:

code --install-extension kalmantic.peakinfer

2. Get your token

peakinfer.com/dashboard → Sign in with GitHub → Generate token

50 free credits. No credit card required.

3. Analyze

Open any file → Cmd+Shift+P → "PeakInfer: Analyze Current File"

Issues appear as inline diagnostics.

View on VS Code Marketplace →

Claude (MCP)

Free forever

1. Add to Claude config

Edit ~/.config/claude/claude_desktop_config.json

{
  "mcpServers": {
    "peakinfer": {
      "command": "npx",
      "args": ["@kalmantic/peakinfer-mcp"]
    }
  }
}

2. Restart Claude Desktop or Claude Code

3. Ask Claude

"Analyze this file for LLM inference performance issues"

Works with your existing Anthropic API key.

View MCP server documentation →

GitHub Action

50 free credits

1. Get your token

peakinfer.com/dashboard → Sign in with GitHub → Generate token

50 free credits. No credit card required.

2. Add secret to repo

Settings → Secrets and variables → Actions → New repository secret

Name: PEAKINFER_TOKEN

3. Add workflow file

Create .github/workflows/peakinfer.yml

name: PeakInfer
on: [pull_request]
jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: Kalmantic/peakinfer-action@v1
        with:
          token: ${{ secrets.PEAKINFER_TOKEN }}

Results posted as PR comments.

View GitHub Action documentation →

What teams find

"We had streaming=true for 6 months but it wasn't actually streaming. Our p95 was 3x what it should have been."

— Engineering Lead, Series B

"Retry logic was there but never fired because the exception type changed in the new SDK version."

— Staff Engineer, Fintech

"We were paying for gpt-4 but the fallback to gpt-3.5 was triggering on 40% of requests. Config bug."

— AI Engineer, Healthcare