We watch 12 LLM providers across 33 models, detect spot bursts within seconds, and route every token to the cheapest provider that still meets your quality bar.
Pick a task class. We evaluate every model against the quality floor for that task and return the cheapest viable provider — recomputed on every price tick.
Point your existing OpenAI / Anthropic SDK at the CARTIE proxy. Every prompt is automatically compacted, cached, and cascade-routed. You get standard OpenAI responses back — plus 40-70% lower bills and full observability.
You are a helpful customer support assistant. The user's name is John. They have an enterprise account. The user's name is John. They have an enterprise account. Question: What is the refund policy?
base_url. Standard responses. 40-70% lower bill.from openai import OpenAI
# ONE line change — point base_url at CARTIE.
# Get a free key at cartieai.com/llm-proxy
client = OpenAI(
api_key="<YOUR-OPENAI-KEY>",
base_url="https://cartieai.com/api/llm-proxy/v1",
default_headers={"X-Cartie-Key": "<YOUR-CARTIE-PROXY-KEY>"},
)
# Every call now: compacted → cached → cascaded.
# Returns standard OpenAI response. Saves 40-70%.
resp = client.chat.completions.create(
model="gpt-5.2",
messages=[{"role": "user", "content": "Hello"}],
)X-Cartie-Saved-Usd · X-Cartie-Cache-Hit · X-Cartie-Tokens-Saved · X-Cartie-Cascade-UsedEvery burst over the alert threshold sends a formatted Slack block — provider, model, drop %, duration window, and a deep-link back here. Customize the threshold with ENGINE27_BURST_ALERT_THRESHOLD_PCT. No-op gracefully if no webhook is configured — failures never block a routing tick.
CARTIE plugs into your existing OpenAI / Anthropic / Bedrock SDK calls and routes them automatically. No code changes — just a base URL swap.