Drop-in proxy. Same response shape. Three savings layers (compaction · cache · cascade) baked in. Headers tell you exactly what we saved on every call.
You are a helpful customer support assistant for ACME Corp. The user's name is John Doe. They have an enterprise account. The user's name is John Doe. They have an enterprise account. Question: What is the refund policy?
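The sample prompt above repeats two sentences verbatim, which is the kind of redundancy a compaction layer can strip before the request is billed. The proxy's actual compaction method is not described here, so the following is only a minimal sketch under one assumption: that compaction includes dropping exact repeated sentences. The function name `drop_repeated_sentences` is hypothetical.

```python
import re


def drop_repeated_sentences(text: str) -> str:
    """Remove verbatim repeated sentences, keeping the first occurrence of each.

    Illustrative only: a naive stand-in for a compaction step,
    not the proxy's real algorithm.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    seen = set()
    kept = []
    for sentence in sentences:
        if sentence not in seen:
            seen.add(sentence)
            kept.append(sentence)
    return " ".join(kept)


prompt = (
    "You are a helpful customer support assistant for ACME Corp. "
    "The user's name is John Doe. They have an enterprise account. "
    "The user's name is John Doe. They have an enterprise account. "
    "Question: What is the refund policy?"
)
compacted = drop_repeated_sentences(prompt)
```

Here `compacted` keeps four of the six sentences, so the model receives the same information in fewer tokens.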
from openai import OpenAI

client = OpenAI(
    api_key="<YOUR-OPENAI-KEY>",
    base_url="https://cartieai.com/api/llm-proxy/v1",
    default_headers={"X-Cartie-Key": "<YOUR-CARTIE-PROXY-KEY>"},
)
# Every call: compacted → cached → cascaded.
# Returns standard OpenAI response.
resp = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
)
# Response headers tell you exactly what we saved:
# X-Cartie-Tokens-Saved, X-Cartie-Cache-Hit, X-Cartie-Saved-Usd