Engine #27 · Live · Real-Time

Token Arbitrage in real time.

Name: CARTIE AI
Author: CARTIE AI

We watch 12 LLM providers across 33 models, detect spot bursts within seconds, and route every token to the cheapest provider that still meets your quality bar.


Live routing decision

Pick a task class. We evaluate every model against the quality floor for that task and return the cheapest viable provider — recomputed on every price tick.
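The selection rule above takes only a few lines of code. The Python sketch below is illustrative and assumes each model already has two numbers: a blended cost per 1K tokens and a quality score for the chosen task class. Several pieces are hypothetical: `ModelQuote`, `pick_winner`, the 0.75 floor, the groq row, and every quality score except gpt-4o-mini's 0.78. The two blended costs are the winner and baseline figures quoted on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelQuote:
    provider: str
    model: str
    blended_cost_per_1k: float  # USD per 1K tokens, input and output blended
    quality: float  # quality score for the current task class, 0..1 (hypothetical scale)


def pick_winner(quotes: list[ModelQuote], quality_floor: float) -> Optional[ModelQuote]:
    """Return the cheapest quote at or above the quality floor, or None if nothing qualifies."""
    viable = [q for q in quotes if q.quality >= quality_floor]
    if not viable:
        return None
    return min(viable, key=lambda q: q.blended_cost_per_1k)


quotes = [
    ModelQuote("openai", "gpt-4o-mini", 0.00028, 0.78),
    ModelQuote("anthropic", "claude-opus-4-5", 0.03248, 0.97),  # quality is hypothetical
    ModelQuote("groq", "llama-3.3-8b", 0.00006, 0.62),  # illustrative row
]

winner = pick_winner(quotes, quality_floor=0.75)  # hypothetical floor
print(winner.provider, winner.model)  # openai gpt-4o-mini
```

Calling `pick_winner` again on each price tick keeps the decision live. This sketch does not carry any state from one tick to the next.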

Winner: openai/gpt-4o-mini

cheap tier · quality 0.78 · 180ms p50

Blended cost / 1K: $0.00028

Savings vs baseline: +99.1%

Evaluated 27 models · recomputed live

Always-use-frontier baseline: anthropic/claude-opus-4-5

frontier tier · the obvious-but-expensive default

Blended cost / 1K: $0.03248

This is what you'd pay if you sent every request to the most expensive frontier model, which is what most teams do by default.

Projected savings vs always-use-frontier baseline: $63/month

95.7% reduction · routing every summarization request through Engine #27
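Below is a minimal sketch of the projection arithmetic. It assumes cost scales linearly with token volume and that each model has a single blended price per 1K tokens. The per-1K prices are the winner and baseline figures above. The 1M-token volume and the `monthly_savings` helper are hypothetical.

```python
def monthly_savings(monthly_tokens: int, routed_per_1k: float, baseline_per_1k: float) -> tuple[float, float]:
    """Return (USD saved per month, percent reduction) versus the baseline model."""
    baseline_usd = monthly_tokens / 1000 * baseline_per_1k
    routed_usd = monthly_tokens / 1000 * routed_per_1k
    saved = baseline_usd - routed_usd
    pct = 100 * saved / baseline_usd if baseline_usd else 0.0
    return saved, pct


saved, pct = monthly_savings(1_000_000, routed_per_1k=0.00028, baseline_per_1k=0.03248)
print(f"${saved:.2f}/month saved, {pct:.1f}% reduction")  # $32.20/month saved, 99.1% reduction
```

At 1M tokens this reproduces the 99.1% per-token savings figure. The dollar amount scales with whatever monthly volume you plug in.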


v2 · Auto-Compiler · LIVE

One line. Three savings layers. Zero code changes.

Point your existing OpenAI / Anthropic SDK at the CARTIE proxy. Every prompt is automatically compacted, cached, and cascade-routed. You get standard OpenAI responses back — plus 40-70% lower bills and full observability.

Compact

15-40% fewer input tokens

Whitespace collapsed, duplicate text removed and comments stripped before forwarding. Deterministic heuristics, under 5 ms.
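Below is a minimal sketch of deterministic compaction in Python. The page does not publish its heuristics, so every rule here is an assumption:

- HTML-style `<!-- -->` blocks and whole-line `//` lines are treated as comments.
- All whitespace, including newlines, is collapsed.
- Exact repeated sentences are dropped.

`compact` is a hypothetical name. Applied to the duplicated sample prompt from the in-browser demo, it removes the two repeated sentences.

```python
import re

_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def compact(prompt: str) -> str:
    """Deterministically shrink a prompt before it is forwarded (illustrative heuristics)."""
    # 1. Strip comments: HTML-style blocks and whole-line // comments (assumed syntaxes).
    text = _HTML_COMMENT.sub("", prompt)
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith("//")]
    # 2. Collapse every run of whitespace, including newlines, to a single space.
    text = " ".join(" ".join(lines).split())
    # 3. Keep only the first occurrence of each exact sentence.
    seen: set[str] = set()
    kept: list[str] = []
    for sentence in _SENTENCE_END.split(text):
        if sentence and sentence not in seen:
            seen.add(sentence)
            kept.append(sentence)
    return " ".join(kept)


sample = (
    "You are a helpful customer support assistant. The user's name is John. "
    "They have an enterprise account. The user's name is John. "
    "They have an enterprise account. Question: What is the refund policy?"
)
print(compact(sample))  # prints the prompt with the repeated two sentences removed
```

Collapsing newlines is the bluntest choice in this sketch. A compactor meant for code or Markdown prompts would likely preserve line structure.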

Cache

100% savings on dupes

Lookup by normalized hash, with timestamps, UUIDs and emails masked before hashing. 24h TTL. Sub-ms.
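Below is one way such a cache could work, sketched in Python. Several choices are illustrative assumptions, not CARTIE's implementation:

- the masking regexes
- including the model name in the key
- SHA-256 as the hash
- an in-memory dict as the store

Two prompts that differ only in a UUID, timestamp or email address map to the same key, so the second one is served from cache.

```python
import hashlib
import re
import time
from typing import Callable, Optional

# Illustrative masks: the page says timestamps, UUIDs and emails are masked, not how.
_MASKS = [
    (re.compile(r"\b[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}\b"), "<UUID>"),
    (re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:?\d{2})?\b"), "<TS>"),
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "<EMAIL>"),
]


def normalized_key(prompt: str, model: str) -> str:
    """Mask volatile substrings, collapse whitespace, then hash together with the model name."""
    text = prompt
    for pattern, placeholder in _MASKS:
        text = pattern.sub(placeholder, text)
    text = " ".join(text.split())
    return hashlib.sha256(f"{model}\n{text}".encode("utf-8")).hexdigest()


class ResponseCache:
    """In-memory response cache keyed by normalized prompt hash, with a fixed TTL."""

    def __init__(self, ttl_seconds: float = 24 * 60 * 60, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, prompt: str, model: str) -> Optional[str]:
        key = normalized_key(prompt, model)
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, response = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[key]  # expired: evict and report a miss
            return None
        return response

    def put(self, prompt: str, model: str, response: str) -> None:
        self._store[normalized_key(prompt, model)] = (self.clock(), response)


cache = ResponseCache()
cache.put("Summarize ticket 9f1c2a3b-0000-4000-8000-000000000001 for a@x.io", "gpt-4o-mini", "summary")
hit = cache.get("Summarize ticket 1e2d3c4b-1111-4111-8111-111111111111 for b@y.io", "gpt-4o-mini")
print(hit)  # summary
```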

Cascade

80% savings on simple tasks

Try cheap model first. Score confidence. Escalate to frontier only if needed.
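Below is a sketch of the cascade logic that assumes a pluggable confidence scorer. The page does not say how confidence is measured, so `score_confidence`, the 0.7 threshold and the stub functions are hypothetical. Only the two model names come from this page.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CascadeResult:
    model: str
    text: str
    escalated: bool


def cascade(
    prompt: str,
    call_model: Callable[[str, str], str],
    score_confidence: Callable[[str, str], float],
    cheap_model: str,
    frontier_model: str,
    threshold: float = 0.7,  # hypothetical confidence bar
) -> CascadeResult:
    """Answer with the cheap model; escalate to the frontier model only when confidence is low."""
    draft = call_model(cheap_model, prompt)
    if score_confidence(prompt, draft) >= threshold:
        return CascadeResult(cheap_model, draft, escalated=False)
    return CascadeResult(frontier_model, call_model(frontier_model, prompt), escalated=True)


# Stubs standing in for real API calls and a real confidence scorer (both hypothetical).
def fake_call(model: str, prompt: str) -> str:
    return f"{model}: answer"


def fake_confidence(prompt: str, answer: str) -> float:
    return 0.9 if len(prompt) < 200 else 0.4  # pretend long prompts are hard


print(cascade("What is the refund policy?", fake_call, fake_confidence, "gpt-4o-mini", "claude-opus-4-5"))
# CascadeResult(model='gpt-4o-mini', text='gpt-4o-mini: answer', escalated=False)
```

Passing the model caller and the scorer in as functions keeps the control flow testable without network access. A real deployment would replace the stubs with actual API calls.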

Try it · live in your browser

Paste any prompt — we'll compact it client-side first.

Saved: 33% · 25 tokens

Sample prompt:

You are a helpful customer support assistant. The user's name is John. They have an enterprise account. The user's name is John. They have an enterprise account. Question: What is the refund policy?

Drop-in install · Python

Same OpenAI SDK. New base_url. Standard responses. 40-70% lower bill.

from openai import OpenAI

# Point base_url at the CARTIE proxy and send your CARTIE proxy key as a header.
# Get a free key at cartieai.com/llm-proxy
client = OpenAI(
    api_key="<YOUR-OPENAI-KEY>",
    base_url="https://cartieai.com/api/llm-proxy/v1",
    default_headers={"X-Cartie-Key": "<YOUR-CARTIE-PROXY-KEY>"},
)

# Every call now: compacted → cached → cascaded.
# Returns a standard OpenAI response. Saves 40-70%.
resp = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)


Every proxied call returns these headers so your observability stack can track savings in real time: X-Cartie-Saved-Usd · X-Cartie-Cache-Hit · X-Cartie-Tokens-Saved · X-Cartie-Cascade-Used
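Below is a small helper for pulling these headers into your own metrics. The page names the headers but not their value formats, so the sketch makes these assumptions:

- X-Cartie-Saved-Usd is a decimal string.
- X-Cartie-Tokens-Saved is an integer.
- The two flags are `true`/`false`.

`CartieSavings` and `parse_cartie_headers` are hypothetical names.

```python
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class CartieSavings:
    saved_usd: float
    cache_hit: bool
    tokens_saved: int
    cascade_used: bool


def _flag(value: str) -> bool:
    # Assumed encoding for boolean headers: "true", "1" or "yes" means set.
    return value.strip().lower() in {"true", "1", "yes"}


def parse_cartie_headers(headers: Mapping[str, str]) -> CartieSavings:
    """Read the X-Cartie-* headers case-insensitively; absent headers count as zero/False."""
    h = {k.lower(): v for k, v in headers.items()}
    return CartieSavings(
        saved_usd=float(h.get("x-cartie-saved-usd", "0")),
        cache_hit=_flag(h.get("x-cartie-cache-hit", "false")),
        tokens_saved=int(h.get("x-cartie-tokens-saved", "0")),
        cascade_used=_flag(h.get("x-cartie-cascade-used", "false")),
    )


print(parse_cartie_headers({"X-Cartie-Saved-Usd": "0.0123", "X-Cartie-Cache-Hit": "true",
                            "X-Cartie-Tokens-Saved": "25", "X-Cartie-Cascade-Used": "false"}))
```

With the OpenAI Python SDK, `client.chat.completions.with_raw_response.create(...)` returns a response object. Its `.headers` can be passed straight to `parse_cartie_headers`, and its `.parse()` yields the usual completion.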

Cheapest 3 right now

groq/llama-3.3-8b · in $0.00005/1K · out $0.00008/1K · 75ms p50

bedrock/titan-text-lite · in $0.00015/1K · out $0.00019/1K · 200ms p50

perplexity/sonar-small · in $0.00020/1K · out $0.00020/1K · 250ms p50
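The routing widgets quote a single blended cost per 1K tokens, while this list shows separate input and output prices. A common way to blend them is a weighted average by the share of input tokens, and the sketch below assumes that formula. The 0.75 input share is a hypothetical workload mix, not the weighting Engine #27 uses.

```python
def blended_cost_per_1k(in_per_1k: float, out_per_1k: float, input_share: float) -> float:
    """Weighted average of input and output prices; input_share is the fraction of tokens that are input."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * in_per_1k + (1.0 - input_share) * out_per_1k


cheapest = {  # input / output USD per 1K tokens, from the list above
    "groq/llama-3.3-8b": (0.00005, 0.00008),
    "bedrock/titan-text-lite": (0.00015, 0.00019),
    "perplexity/sonar-small": (0.00020, 0.00020),
}
INPUT_SHARE = 0.75  # hypothetical workload: 75% of tokens are input

for name, (inp, out) in sorted(cheapest.items(), key=lambda kv: blended_cost_per_1k(*kv[1], INPUT_SHARE)):
    print(f"{name}: ${blended_cost_per_1k(inp, out, INPUT_SHARE):.6f} / 1K blended")
```

Each row here is cheaper than the next on both input and output price, so this ordering holds for any input share.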

Spot bursts · last 24h

perplexity/sonar-large · −40.2% · 120-min window · ACTIVE
mistral/mistral-medium-3 · −46.2% · 45-min window · ACTIVE
google/gemini-3-flash-lite · −25.3% · 60-min window · ACTIVE
bedrock/llama-4-70b-aws · −28% · 90-min window · ACTIVE
bedrock/titan-text-lite · −36% · 120-min window · ACTIVE
openai/gpt-5-mini · −33.2% · 90-min window · ACTIVE
groq/mixtral-8x22b-v3 · −39.8% · 90-min window · ACTIVE
fireworks/llama-4-70b · −49.8% · 45-min window · ACTIVE
google/gemini-3-flash · −39.2% · 120-min window · ACTIVE
groq/llama-3.3-8b · −41.8% · 30-min window · ACTIVE
perplexity/sonar-large · −42.8% · 120-min window · ACTIVE
groq/mixtral-8x22b-v3 · −33% · 90-min window · ACTIVE
together/qwen-2.5-72b · −46.2% · 90-min window · ACTIVE
deepinfra/llama-3.3-70b · −46% · 120-min window · ACTIVE
anyscale/llama-3.3-70b · −42.6% · 45-min window · ACTIVE
google/gemini-3-flash · −36% · 120-min window · ACTIVE
anyscale/mixtral-8x7b · −32.9% · 90-min window · ACTIVE
cohere/command-r-plus-v2 · −49.1% · 30-min window · ACTIVE
anyscale/mixtral-8x7b · −33.9% · 45-min window · ACTIVE
cohere/command-r-v2 · −44.9% · 60-min window · ACTIVE
zhipu/glm-5.2 · −35% · 90-min window · ACTIVE
anthropic/claude-opus-4-5 · −49.8% · 120-min window · ACTIVE
groq/llama-4-70b · −38.2% · 30-min window · ACTIVE
together/llama-4-70b-instruct · −27.9% · 30-min window · ACTIVE
openai/gpt-5-mini · −42.5% · 120-min window · ACTIVE
fireworks/llama-4-70b · −36.3% · 45-min window · ACTIVE
together/qwen-2.5-72b · −48.4% · 90-min window · ACTIVE
google/gemini-3-flash-lite · −49.8% · 30-min window · ACTIVE
groq/mixtral-8x22b-v3 · −33.4% · 30-min window · ACTIVE
together/llama-4-70b-instruct · −41.3% · 60-min window · ACTIVE
bedrock/titan-text-lite · −37% · 90-min window · ACTIVE
perplexity/sonar-small · −33.6% · 120-min window · ACTIVE
deepinfra/deepseek-v3 · −38.2% · 120-min window · ACTIVE
bedrock/claude-sonnet-4-5-aws · −37.7% · 45-min window
zhipu/glm-5.1 · −43.1% · 45-min window
mistral/codestral-2 · −31.6% · 60-min window
groq/mixtral-8x22b-v3 · −39.7% · 60-min window
groq/mixtral-8x22b-v3 · −37.2% · 60-min window
openai/gpt-4o-mini · −37.9% · 45-min window
cohere/command-r-v2 · −42.9% · 45-min window
google/gemini-3-flash · −29% · 30-min window
mistral/mistral-medium-3 · −43.9% · 60-min window
perplexity/sonar-small · −31.1% · 30-min window

How Engine #27 works

Watch

Every 30 seconds we ingest pricing and time-to-first-token (TTFT) data from all 12 provider APIs.

Score

Each model gets a quality-adjusted blended cost per task class. Models below the floor are dropped.

Route

Each /route call goes to the cheapest qualifying model, and spot bursts trigger reroutes automatically.
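Below is a minimal sketch of how the Watch and Route steps could connect. It records each 30-second price sample and flags a spot burst when the price falls far enough below its recent baseline. That flag is the point where a router would rerun its cheapest-qualifying selection. The median baseline, the window length and the 25% trigger are assumptions, and `BurstDetector` is a hypothetical name.

```python
from collections import deque
from statistics import median
from typing import Optional


class BurstDetector:
    """Flags a spot burst when a model's price falls well below its recent baseline.

    Assumptions: one price sample per 30 s tick; the baseline is the median of the
    last `window` samples; a burst is a drop of at least `min_drop_pct` percent.
    """

    def __init__(self, window: int = 120, min_drop_pct: float = 25.0):
        self.window = window
        self.min_drop_pct = min_drop_pct
        self._history: dict[str, deque] = {}

    def on_tick(self, model_key: str, price_per_1k: float) -> Optional[float]:
        """Record one price sample; return the drop percentage if this tick is a burst, else None."""
        history = self._history.setdefault(model_key, deque(maxlen=self.window))
        drop_pct = None
        if history:
            baseline = median(history)
            if baseline > 0:
                pct = 100.0 * (baseline - price_per_1k) / baseline
                if pct >= self.min_drop_pct:
                    drop_pct = round(pct, 1)  # a router would re-run model selection here
        history.append(price_per_1k)
        return drop_pct


detector = BurstDetector(window=4, min_drop_pct=25.0)
for price in [0.0010, 0.0010, 0.0010, 0.0006]:
    drop = detector.on_tick("mistral/mistral-medium-3", price)
print(drop)  # 40.0
```

Burst prices are appended to the history. A cut that persists for more than half the window therefore becomes the new baseline and stops being flagged. Excluding burst samples would instead keep flagging a permanent price cut indefinitely.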

New in v34.99

Slack DMs the moment a ≥ 35% spot burst lands

Every burst at or above the alert threshold sends a formatted Slack block with the provider, model, drop %, duration window and a deep link back here. Set the threshold with ENGINE27_BURST_ALERT_THRESHOLD_PCT. If no webhook is configured, alerting is a silent no-op, and delivery failures never block a routing tick.
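Below is a sketch of the alert path in Python, using only the standard library. `ENGINE27_BURST_ALERT_THRESHOLD_PCT` comes from this page. The webhook variable name `ENGINE27_SLACK_WEBHOOK_URL`, the message layout and the helper names are hypothetical. The payload follows Slack's incoming-webhook format: a `text` fallback plus a `section` block in `mrkdwn`.

```python
import json
import os
import urllib.request
from typing import Optional

WEBHOOK_ENV = "ENGINE27_SLACK_WEBHOOK_URL"  # hypothetical variable name


def alert_threshold_pct() -> float:
    """Read ENGINE27_BURST_ALERT_THRESHOLD_PCT, falling back to the advertised 35%."""
    try:
        return float(os.environ.get("ENGINE27_BURST_ALERT_THRESHOLD_PCT", "35"))
    except ValueError:
        return 35.0


def build_burst_blocks(provider: str, model: str, drop_pct: float, window_min: int, link: str) -> dict:
    """Slack payload carrying provider, model, drop %, window and a deep link."""
    body = (f"*Spot burst:* {provider}/{model}\n"
            f"-{drop_pct:.1f}% · {window_min}-min window\n"
            f"<{link}|Open Engine #27>")
    return {
        "text": f"Spot burst: {provider}/{model} -{drop_pct:.1f}%",  # notification fallback
        "blocks": [{"type": "section", "text": {"type": "mrkdwn", "text": body}}],
    }


def maybe_alert(provider: str, model: str, drop_pct: float, window_min: int,
                link: str, webhook_url: Optional[str] = None) -> bool:
    """Post to Slack if the burst clears the threshold; never raise into the routing tick."""
    url = webhook_url or os.environ.get(WEBHOOK_ENV)
    if not url or drop_pct < alert_threshold_pct():
        return False  # no webhook configured, or burst below threshold: silent no-op
    data = json.dumps(build_burst_blocks(provider, model, drop_pct, window_min, link)).encode("utf-8")
    try:
        req = urllib.request.Request(url, data=data, method="POST",
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req, timeout=5):
            pass
        return True
    except Exception:  # deliberately broad: an alert must never break a routing tick
        return False
```

Catching every exception around delivery is deliberate here. A bad URL, a timeout or a Slack error returns `False` instead of propagating into the routing tick.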

Stop paying frontier prices for non-frontier tasks

Connect your stack. See your savings.

CARTIE plugs into your existing OpenAI / Anthropic / Bedrock SDK calls and routes them automatically. No code changes — just a base URL swap.

