v1.0 — open source & free

Compare AI models side-by-side
in your terminal

One prompt, multiple models, real-time streaming, performance stats, and an AI judge — all in a single command.

npx yardstiq "your prompt" -m claude-sonnet -m gpt-4o


Everything you need to compare models

Stop copying prompts between tabs. One command gives you streaming comparisons, hard numbers, and AI-powered evaluation.

⚡

Side-by-Side Streaming

Watch model outputs appear in parallel, in real time. No more tab-switching between chat windows.

🤖

40+ Models

Claude, GPT, Gemini, Llama, DeepSeek, Mistral, Grok — every major model in one tool.

📊

Performance Stats

Time to first token, throughput, token counts, and cost per model. Data, not vibes.

⚖️

AI Judge

Let an AI evaluate which response wins with scored verdicts and reasoning.

📁

Export Anywhere

JSON for pipelines, Markdown for docs, self-contained HTML for sharing.

🧪

Benchmark Suites

Define prompt suites in YAML and run them across models with aggregate scoring.

🏠

Local Models

Compare Ollama models with zero API cost. Your hardware, your data, your rules.

🔑

Flexible Auth

One Vercel AI Gateway key for everything, or individual provider keys. Mix and match.
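For readers curious how the performance stats above relate to each other, here is a minimal Python sketch. It is not yardstiq's implementation: the `StreamTiming` structure, the function names, and the per-million-token price parameters are all hypothetical, and throughput is assumed to be measured from the first token to the last.

```python
from dataclasses import dataclass


@dataclass
class StreamTiming:
    """Timestamps (in seconds) and token count for one streamed response.

    Hypothetical structure for illustration; not a yardstiq type.
    """
    request_sent: float
    first_token: float
    last_token: float
    output_tokens: int


def time_to_first_token(t: StreamTiming) -> float:
    """Seconds between sending the request and receiving the first token."""
    return t.first_token - t.request_sent


def throughput(t: StreamTiming) -> float:
    """Output tokens per second over the generation window (assumed definition)."""
    window = t.last_token - t.first_token
    if window <= 0:
        return 0.0  # avoid dividing by zero for single-chunk responses
    return t.output_tokens / window


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_million: float,
                 output_price_per_million: float) -> float:
    """Cost in dollars, given assumed per-million-token prices."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


timing = StreamTiming(request_sent=0.0, first_token=0.5,
                      last_token=2.5, output_tokens=100)
print(time_to_first_token(timing))  # 0.5
print(throughput(timing))           # 50.0
```

Some tools divide by total elapsed time instead, which gives a lower tokens-per-second figure for the same response, so check which definition a number uses before comparing across tools.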

Up and running in 60 seconds

No config files. No web UI. Just your terminal.

Install (or just use npx)

npm install -g yardstiq

# or skip install entirely
npx yardstiq "your prompt" -m claude-sonnet -m gpt-4o

Configure your API keys

# Interactive setup — walks you through it
yardstiq setup

# Or configure a single provider directly
yardstiq setup --provider gateway

# Prefer env vars? That works too
export AI_GATEWAY_API_KEY=your_key
export ANTHROPIC_API_KEY=sk-ant-...

Compare models

# Basic comparison
yardstiq "Explain monads" -m claude-sonnet -m gpt-4o

# With AI judge
yardstiq "Write a sort algorithm" -m claude-sonnet -m gpt-4o --judge

# Three models + export
yardstiq "Explain DNS" -m claude-sonnet -m gpt-4o -m gemini-flash --json > results.json
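If you want to drive these comparisons from a script, the commands above can be assembled programmatically. The sketch below uses only the flags shown on this page (`-m`, `--judge`, `--json`); the `build_command` helper is hypothetical, and whether `--judge` and `--json` can be combined in one run is an assumption this page does not confirm.

```python
import shlex


def build_command(prompt, models, judge=False, json_output=False):
    """Build an argv list for a yardstiq comparison (hypothetical helper).

    Only flags that appear in the examples above are used.
    """
    if not models:
        raise ValueError("at least one model is required")
    cmd = ["yardstiq", prompt]
    for model in models:
        cmd += ["-m", model]  # one -m flag per model, as in the examples
    if judge:
        cmd.append("--judge")
    if json_output:
        cmd.append("--json")
    return cmd


cmd = build_command("Explain DNS",
                    ["claude-sonnet", "gpt-4o", "gemini-flash"],
                    json_output=True)
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, capture_output=True, text=True)` avoids shell quoting problems with prompts that contain quotes or spaces.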

Go local (optional)

# No API key needed — just run Ollama
yardstiq "hello" -m local:llama3.2 -m local:mistral

Real benchmarks, not marketing

Run your own benchmark suites with YAML configs. Here's a sample across coding, creative writing, and reasoning tasks.

# benchmark.yaml
name: model-showdown
prompts:
  - "Write a Python fibonacci with memoization"
  - "Explain quantum entanglement to a 10-year-old"
  - "Debug this async race condition: ..."
models:
  - claude-sonnet
  - gpt-4o
  - gemini-flash
judge: true

yardstiq benchmark run benchmark.yaml --json

Model	Coding	Creative	Reasoning	Speed	Cost/req
Claude Sonnet	92	88	94	69 t/s	$0.0013
GPT-4o	89	85	90	48 t/s	$0.0010
Gemini Flash	84	82	86	112 t/s	$0.0004
Llama 3.1 70B	81	79	83	35 t/s	$0.0000

Sample results for illustration only. Run your own benchmarks to get real numbers for your use case.
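One way to turn per-category scores like the sample table's into a single ranking is an unweighted mean. The Python sketch below does that with the sample numbers; it is an assumption about how you might aggregate, not a description of how yardstiq computes its aggregate scores, and the `aggregate` helper is hypothetical.

```python
# Sample scores copied from the table above: (Coding, Creative, Reasoning).
SAMPLE_SCORES = {
    "Claude Sonnet": (92, 88, 94),
    "GPT-4o": (89, 85, 90),
    "Gemini Flash": (84, 82, 86),
    "Llama 3.1 70B": (81, 79, 83),
}


def aggregate(scores):
    """Return (model, mean score) pairs, highest mean first.

    Uses an unweighted mean across categories (an assumed scheme).
    """
    means = {model: round(sum(vals) / len(vals), 1)
             for model, vals in scores.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


for model, score in aggregate(SAMPLE_SCORES):
    print(f"{model}: {score}")
# Claude Sonnet: 91.3
# GPT-4o: 88.0
# Gemini Flash: 84.0
# Llama 3.1 70B: 81.0
```

If one category matters more for your workload, swap the mean for a weighted sum. The ranking can change once speed or cost enters the picture.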

Stop guessing. Start measuring.

Join developers who use yardstiq to make data-driven model decisions.

Star on GitHub | View on npm
