Engine #27 · Live · Real-Time

Token Arbitrage in real time.

We watch 12 LLM providers across 33 models, detect spot bursts within seconds, and route every token to the cheapest provider that still meets your quality bar.

Providers tracked: 12
Models priced live: 33
Spot bursts (24h): 43
Active right now: 33

Live routing decision

Pick a task class. We evaluate every model against the quality floor for that task and return the cheapest viable provider — recomputed on every price tick.

Winner
openai
gpt-4o-mini
cheap tier · quality 0.78 · 180ms p50
Blended cost / 1K
$0.00028
Savings vs baseline
+99.1%
Evaluated 27 models · recomputed live
Always-use-frontier baseline
anthropic
claude-opus-4-5
frontier tier · the obvious-but-expensive default
Blended cost / 1K
$0.03248
This is what you'd pay if you sent every request to the most expensive frontier model. Most teams default to this.
Projected savings vs always-use-baseline
$63/month
95.7% reduction · routing every summarization request through Engine #27
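As a quick check on the per-1K figures in this panel, the savings percentage is just one minus the ratio of the two blended costs. The sketch below reproduces the +99.1% figure from the two numbers shown; the function name is illustrative, and it makes no claim about how the monthly projection is derived.

```python
def savings_pct(routed_cost: float, baseline_cost: float) -> float:
    """Percentage saved by paying routed_cost instead of baseline_cost."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return (1.0 - routed_cost / baseline_cost) * 100.0


# Blended cost per 1K tokens, as displayed in the panel.
winner = 0.00028    # openai / gpt-4o-mini
baseline = 0.03248  # anthropic / claude-opus-4-5

print(f"{savings_pct(winner, baseline):.1f}%")  # 99.1%
```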
v2 · Auto-Compiler · LIVE

One line. Three savings layers. Zero code changes.

Point your existing OpenAI / Anthropic SDK at the CARTIE proxy. Every prompt is automatically compacted, cached, and cascade-routed. You get standard OpenAI responses back — plus 40-70% lower bills and full observability.

Compact
15-40% fewer input tokens
Whitespace, duplicates, and comments are stripped before forwarding. Deterministic heuristics. <5ms.
Cache
100% savings on dupes
Normalized-hash lookup (timestamps + UUIDs + emails masked). 24h TTL. Sub-ms.
Cascade
80% savings on simple tasks
Try cheap model first. Score confidence. Escalate to frontier only if needed.
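To make the Cache and Cascade layers concrete, here is a minimal sketch of how a normalized-hash lookup and a cheap-first cascade could fit together. Everything in it is an assumption for illustration: the masking regexes, the `TTLCache` class, the 0.8 confidence threshold, and the `cheap`, `frontier`, and `score` callables are hypothetical stand-ins, not the proxy's actual rules.

```python
import hashlib
import re
import time
from typing import Callable, Dict, Optional, Tuple

# Volatile values are masked so they do not defeat the cache.
# These patterns are illustrative assumptions only.
_MASKS = [
    (re.compile(r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
                r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"), "<UUID>"),
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "<EMAIL>"),
    (re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}"
                r"(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?"), "<TS>"),
]


def cache_key(prompt: str) -> str:
    """Mask volatile values, collapse whitespace, then hash."""
    text = prompt
    for pattern, token in _MASKS:
        text = pattern.sub(token, text)
    text = " ".join(text.split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds (24h default)."""

    def __init__(self, ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]
            return None
        return value

    def put(self, key: str, value: str) -> None:
        self._store[key] = (self._clock() + self._ttl, value)


def cascade(prompt: str, cheap: Callable[[str], str],
            frontier: Callable[[str], str],
            score: Callable[[str, str], float],
            threshold: float = 0.8) -> Tuple[str, bool]:
    """Try the cheap model first; escalate only if confidence is too low.

    Returns (reply, escalated).
    """
    draft = cheap(prompt)
    if score(prompt, draft) >= threshold:
        return draft, False
    return frontier(prompt), True


def answer(prompt: str, cache: TTLCache, cheap: Callable[[str], str],
           frontier: Callable[[str], str],
           score: Callable[[str, str], float]) -> str:
    """Serve from cache when possible, otherwise cascade and store."""
    key = cache_key(prompt)
    hit = cache.get(key)
    if hit is not None:
        return hit
    reply, _escalated = cascade(prompt, cheap, frontier, score)
    cache.put(key, reply)
    return reply
```

With this shape, two prompts that differ only in an email address, timestamp, or UUID map to the same key, so the second one is served from cache without calling either model.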
Try it · live in your browser
Original: 75 tokens
Compacted: 50 tokens
Saved: 33% (25 tokens)
Sample prompt (before compaction):
You are a helpful customer support assistant. The user's name is John. They have an enterprise account. The user's name is John. They have an enterprise account. Question: What is the refund policy?
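The sample prompt above repeats two sentences, which is the kind of redundancy a deterministic compaction pass can remove. The sketch below is a hypothetical `compact` function that collapses whitespace and drops repeated sentences; it is an assumption about how such a heuristic might look, not the proxy's actual implementation, and it does not attempt to reproduce the token counts shown.

```python
import re


def compact(prompt: str) -> str:
    """Collapse whitespace and drop sentences that already appeared."""
    collapsed = " ".join(prompt.split())
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", collapsed)
    seen = set()
    kept = []
    for sentence in sentences:
        key = sentence.lower()
        if key in seen:
            continue  # exact repeat (case-insensitive): skip it
        seen.add(key)
        kept.append(sentence)
    return " ".join(kept)


prompt = (
    "You are a helpful customer support assistant. "
    "The user's name is John. They have an enterprise account. "
    "The user's name is John. They have an enterprise account. "
    "Question: What is the refund policy?"
)
print(compact(prompt))
# You are a helpful customer support assistant. The user's name is John.
# They have an enterprise account. Question: What is the refund policy?
# (printed as a single line)
```

Dropping repeats is only safe when the repetition carries no meaning, so a production heuristic would likely be more conservative than this.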
Drop-in install · Python
Same OpenAI SDK. New base_url. Standard responses. 40-70% lower bill.
from openai import OpenAI

# ONE line change — point base_url at CARTIE.
# Get a free key at cartieai.com/llm-proxy
client = OpenAI(
    api_key="<YOUR-OPENAI-KEY>",
    base_url="https://cartieai.com/api/llm-proxy/v1",
    default_headers={"X-Cartie-Key": "<YOUR-CARTIE-PROXY-KEY>"},
)

# Every call now: compacted → cached → cascaded.
# Returns standard OpenAI response. Saves 40-70%.
resp = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
)
Every proxied call returns these headers so your observability stack can track savings in real time: X-Cartie-Saved-Usd · X-Cartie-Cache-Hit · X-Cartie-Tokens-Saved · X-Cartie-Cascade-Used
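One way to feed these headers into an observability stack is to parse them into a typed record. The sketch below assumes the values are plain strings (a decimal for USD, an integer for tokens, and "true"/"false" style flags); those formats are assumptions, since the page lists only the header names. `CartieSavings` and `parse_savings` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CartieSavings:
    saved_usd: float
    cache_hit: bool
    tokens_saved: int
    cascade_used: bool


def _flag(value: str) -> bool:
    # Assumed encoding: "1", "true", or "yes" means set.
    return value.strip().lower() in {"1", "true", "yes"}


def parse_savings(headers: Mapping[str, str]) -> CartieSavings:
    """Read the X-Cartie-* headers, tolerating missing or empty values."""
    lower = {k.lower(): v for k, v in headers.items()}

    def get(name: str, default: str) -> str:
        return lower.get(name, "").strip() or default

    return CartieSavings(
        saved_usd=float(get("x-cartie-saved-usd", "0")),
        cache_hit=_flag(get("x-cartie-cache-hit", "false")),
        tokens_saved=int(get("x-cartie-tokens-saved", "0")),
        cascade_used=_flag(get("x-cartie-cascade-used", "false")),
    )


# With the OpenAI Python SDK (1.x), response headers are reachable via the
# raw-response wrapper, for example:
#   raw = client.chat.completions.with_raw_response.create(...)
#   savings = parse_savings(raw.headers)
#   completion = raw.parse()
```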

Cheapest 3 right now

#1
groq / llama-3.3-8b
in $0.00005/1K · out $0.00008/1K · 75ms p50
#2
bedrock / titan-text-lite
in $0.00015/1K · out $0.00019/1K · 200ms p50
#3
perplexity / sonar-small
in $0.00020/1K · out $0.00020/1K · 250ms p50

Spot bursts · last 24h

perplexity / sonar-large
−40.2% · 120min window · ACTIVE
mistral / mistral-medium-3
−46.2% · 45min window · ACTIVE
google / gemini-3-flash-lite
−25.3% · 60min window · ACTIVE
bedrock / llama-4-70b-aws
−28% · 90min window · ACTIVE
bedrock / titan-text-lite
−36% · 120min window · ACTIVE
openai / gpt-5-mini
−33.2% · 90min window · ACTIVE
groq / mixtral-8x22b-v3
−39.8% · 90min window · ACTIVE
fireworks / llama-4-70b
−49.8% · 45min window · ACTIVE
google / gemini-3-flash
−39.2% · 120min window · ACTIVE
groq / llama-3.3-8b
−41.8% · 30min window · ACTIVE
perplexity / sonar-large
−42.8% · 120min window · ACTIVE
groq / mixtral-8x22b-v3
−33% · 90min window · ACTIVE
together / qwen-2.5-72b
−46.2% · 90min window · ACTIVE
deepinfra / llama-3.3-70b
−46% · 120min window · ACTIVE
anyscale / llama-3.3-70b
−42.6% · 45min window · ACTIVE
google / gemini-3-flash
−36% · 120min window · ACTIVE
anyscale / mixtral-8x7b
−32.9% · 90min window · ACTIVE
cohere / command-r-plus-v2
−49.1% · 30min window · ACTIVE
anyscale / mixtral-8x7b
−33.9% · 45min window · ACTIVE
cohere / command-r-v2
−44.9% · 60min window · ACTIVE
zhipu / glm-5.2
−35% · 90min window · ACTIVE
anthropic / claude-opus-4-5
−49.8% · 120min window · ACTIVE
groq / llama-4-70b
−38.2% · 30min window · ACTIVE
together / llama-4-70b-instruct
−27.9% · 30min window · ACTIVE
openai / gpt-5-mini
−42.5% · 120min window · ACTIVE
fireworks / llama-4-70b
−36.3% · 45min window · ACTIVE
together / qwen-2.5-72b
−48.4% · 90min window · ACTIVE
google / gemini-3-flash-lite
−49.8% · 30min window · ACTIVE
groq / mixtral-8x22b-v3
−33.4% · 30min window · ACTIVE
together / llama-4-70b-instruct
−41.3% · 60min window · ACTIVE
bedrock / titan-text-lite
−37% · 90min window · ACTIVE
perplexity / sonar-small
−33.6% · 120min window · ACTIVE
deepinfra / deepseek-v3
−38.2% · 120min window · ACTIVE
bedrock / claude-sonnet-4-5-aws
−37.7% · 45min window
zhipu / glm-5.1
−43.1% · 45min window
mistral / codestral-2
−31.6% · 60min window
groq / mixtral-8x22b-v3
−39.7% · 60min window
groq / mixtral-8x22b-v3
−37.2% · 60min window
openai / gpt-4o-mini
−37.9% · 45min window
cohere / command-r-v2
−42.9% · 45min window
google / gemini-3-flash
−29% · 30min window
mistral / mistral-medium-3
−43.9% · 60min window
perplexity / sonar-small
−31.1% · 30min window

How Engine #27 works

01
Watch
Every 30s we ingest pricing + TTFT from all 12 provider APIs.
02
Score
Each model gets a quality-adjusted blended cost per task class. Models below the floor are dropped.
03
Route
Your /route call hits the cheapest qualifying model. Spot bursts auto-trigger reroutes.
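The Score and Route steps can be sketched as a filter followed by a minimum. The example below is an illustration under stated assumptions: the 75/25 input/output token mix used for the blended cost, the `ModelQuote` type, and the quality scores are all hypothetical (only the per-1K prices come from the "Cheapest 3" list), and it uses a plain quality-floor filter because the page does not specify how the quality adjustment is computed.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ModelQuote:
    provider: str
    model: str
    in_per_1k: float   # USD per 1K input tokens
    out_per_1k: float  # USD per 1K output tokens
    quality: float     # 0..1 score for the task class (hypothetical values)


def blended_cost(q: ModelQuote, input_share: float = 0.75) -> float:
    """Weighted per-1K cost for an assumed input/output token mix."""
    return input_share * q.in_per_1k + (1.0 - input_share) * q.out_per_1k


def route(quotes: Iterable[ModelQuote], quality_floor: float,
          input_share: float = 0.75) -> Optional[ModelQuote]:
    """Drop models below the floor, then pick the cheapest survivor."""
    viable = [q for q in quotes if q.quality >= quality_floor]
    if not viable:
        return None
    return min(viable, key=lambda q: blended_cost(q, input_share))


quotes = [
    ModelQuote("groq", "llama-3.3-8b", 0.00005, 0.00008, quality=0.60),
    ModelQuote("bedrock", "titan-text-lite", 0.00015, 0.00019, quality=0.72),
    ModelQuote("perplexity", "sonar-small", 0.00020, 0.00020, quality=0.75),
]

winner = route(quotes, quality_floor=0.70)
print(winner.provider, winner.model)  # bedrock titan-text-lite
```

Calling `route` again with a fresh list of quotes on each price tick is enough to model rerouting: when a spot burst lowers a qualifying model's price, it becomes the new minimum.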
New in v34.99
Slack DMs the moment a ≥ 35% spot burst lands

Every burst over the alert threshold sends a formatted Slack block with the provider, model, drop %, duration window, and a deep link back here. Customize the threshold with ENGINE27_BURST_ALERT_THRESHOLD_PCT. If no webhook is configured, alerting is a graceful no-op, and failures never block a routing tick.
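A minimal sketch of that alert path follows. ENGINE27_BURST_ALERT_THRESHOLD_PCT is the variable named above and the 35% default comes from the heading; the `SLACK_WEBHOOK_URL` variable, the function names, the message wording, and the deep-link URL are assumptions for illustration. The payload uses a standard Slack Block Kit section with `mrkdwn` text.

```python
import json
import os
import urllib.request
from typing import Optional

DEFAULT_THRESHOLD_PCT = 35.0


def burst_threshold_pct() -> float:
    """Read the alert threshold, falling back to the 35% default."""
    raw = os.environ.get("ENGINE27_BURST_ALERT_THRESHOLD_PCT")
    try:
        return float(raw) if raw else DEFAULT_THRESHOLD_PCT
    except ValueError:
        return DEFAULT_THRESHOLD_PCT


def build_payload(provider: str, model: str, drop_pct: float,
                  window_min: int, link: str) -> dict:
    """Format one burst as a Slack section block."""
    text = (f"*{provider} / {model}* dropped {drop_pct:.1f}% "
            f"for a {window_min}min window. <{link}|Open Engine #27>")
    return {
        "text": text,  # plain-text fallback for notifications
        "blocks": [{"type": "section",
                    "text": {"type": "mrkdwn", "text": text}}],
    }


def notify_burst(provider: str, model: str, drop_pct: float, window_min: int,
                 link: str, webhook_url: Optional[str] = None) -> bool:
    """Send an alert if the drop meets the threshold. Never raises.

    Returns True only when Slack accepted the message.
    """
    if drop_pct < burst_threshold_pct():
        return False
    # SLACK_WEBHOOK_URL is a hypothetical variable name.
    url = webhook_url or os.environ.get("SLACK_WEBHOOK_URL")
    if not url:
        return False  # no webhook configured: graceful no-op
    body = json.dumps(
        build_payload(provider, model, drop_pct, window_min, link)
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(request, timeout=3) as response:
            return 200 <= response.status < 300
    except Exception:
        return False  # a failed alert must not block the routing tick
```

Swallowing every exception is deliberate here: alerting is a side channel, so the routing loop can call `notify_burst` and ignore the result.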

Stop paying frontier prices for non-frontier tasks

Connect your stack. See your savings.

CARTIE plugs into your existing OpenAI / Anthropic / Bedrock SDK calls and routes them automatically. No code changes — just a base URL swap.
