Your inference isn't performing at its peak
Code says streaming enabled. Runtime shows 0% streams.
That's why your p95 latency is 2.4s instead of 400ms.
What PeakInfer reveals
This is from a real codebase. What's in yours?
Run it where you work
Same analysis. Your preferred environment.
Terminal
Free forever

1. Install
npm install -g @kalmantic/peakinfer

2. Set your API key
export ANTHROPIC_API_KEY=sk-ant-...

3. Run on your code
peakinfer analyze ./src

Results in 30 seconds.
Optional: Add runtime events for drift detection
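The `.jsonl` extension suggests the events file is newline-delimited JSON, one record per request. The field names below are purely illustrative — a sketch of what a runtime record might carry, not PeakInfer's actual schema:

```json
{"timestamp": "2025-01-01T12:00:00Z", "model": "claude-3-5-sonnet", "streamed": false, "latency_ms": 2400}
```

Fields like `streamed` and `latency_ms` are the kind of runtime signal that drift detection would compare against what the code claims to configure.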
peakinfer analyze ./src --events prod.jsonl

VS Code
50 free credits

1. Install extension
Search "PeakInfer" in VS Code Extensions
Or run:
code --install-extension kalmantic.peakinfer

2. Get your token
peakinfer.com/dashboard → Sign in with GitHub → Generate token
50 free credits. No credit card required.
3. Analyze
Open any file → Cmd+Shift+P → "PeakInfer: Analyze Current File"
Issues appear as inline diagnostics.
View on VS Code Marketplace →

Claude (MCP)
Free forever

1. Add to Claude config
Edit ~/.config/claude/claude_desktop_config.json
{
  "mcpServers": {
    "peakinfer": {
      "command": "npx",
      "args": ["@kalmantic/peakinfer-mcp"]
    }
  }
}

2. Restart Claude Desktop or Claude Code
3. Ask Claude
"Analyze this file for LLM inference performance issues"
Works with your existing Anthropic API key.
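If Claude Desktop doesn't inherit your shell environment, the key can be passed explicitly through the standard `env` block of an MCP server entry — a sketch that assumes the server reads `ANTHROPIC_API_KEY`, as the CLI does:

```json
{
  "mcpServers": {
    "peakinfer": {
      "command": "npx",
      "args": ["@kalmantic/peakinfer-mcp"],
      "env": { "ANTHROPIC_API_KEY": "sk-ant-..." }
    }
  }
}
```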
View MCP server documentation →

GitHub Action
50 free credits

1. Get your token
peakinfer.com/dashboard → Sign in with GitHub → Generate token
50 free credits. No credit card required.
2. Add secret to repo
Settings → Secrets and variables → Actions → New repository secret
Name: PEAKINFER_TOKEN
3. Add workflow file
Create .github/workflows/peakinfer.yml
name: PeakInfer
on: [pull_request]
jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: Kalmantic/peakinfer-action@v1
        with:
          token: ${{ secrets.PEAKINFER_TOKEN }}

Results posted as PR comments.
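The workflow above runs on every pull request. To analyze only when relevant code changes, standard GitHub Actions path filters apply — the `src/**` pattern below is an example; match it to the directory you analyze:

```yaml
on:
  pull_request:
    paths:
      - "src/**"
```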
View GitHub Action documentation →

What teams find
"We had streaming=true for 6 months but it wasn't actually streaming. Our p95 was 3x what it should have been."
— Engineering Lead, Series B
"Retry logic was there but never fired because the exception type changed in the new SDK version."
— Staff Engineer, Fintech
"We were paying for gpt-4 but the fallback to gpt-3.5 was triggering on 40% of requests. Config bug."
— AI Engineer, Healthcare