Databricks bills explode quietly: Photon's 2x DBU markup, idle clusters at the 120-minute auto-termination default, and Serverless's convenience premium combine into a stack that can double a bill in under 12 months. This is the 10-pattern playbook to fix it.
The auto-termination default is 120 minutes. Drop it to 10 via cluster policy. This is the single biggest no-regret savings lever in Databricks.
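The size of this lever can be estimated with simple arithmetic. The sketch below is illustrative only: the DBU rate, the price per DBU, and the number of idle periods are hypothetical inputs, and it counts DBU charges only, not the cloud provider's VM bill.

```python
def idle_dbu_cost(idle_minutes: float, dbus_per_hour: float, usd_per_dbu: float) -> float:
    """DBU cost of one idle stretch before auto-termination shuts the cluster down."""
    return idle_minutes / 60 * dbus_per_hour * usd_per_dbu


def monthly_savings(
    idle_periods_per_month: int,
    dbus_per_hour: float,
    usd_per_dbu: float,
    old_timeout_min: float = 120,
    new_timeout_min: float = 10,
) -> float:
    """Savings from shortening the idle timeout, assuming every idle period runs to the timeout."""
    before = idle_dbu_cost(old_timeout_min, dbus_per_hour, usd_per_dbu)
    after = idle_dbu_cost(new_timeout_min, dbus_per_hour, usd_per_dbu)
    return idle_periods_per_month * (before - after)


# Hypothetical cluster: 8 DBU/hour at $0.50 per DBU, left idle 40 times a month.
savings = monthly_savings(40, 8.0, 0.50)
print(f"Estimated monthly DBU savings: ${savings:.2f}")
```

Real savings are lower if clusters are often picked up again before the old timeout would have fired, so treat the result as an upper bound.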
Photon = 2x DBUs. If your job is I/O-bound (>30% I/O wait), the 2x markup buys you no speedup. Diagnose with `system.billing.usage` joined to query metrics.
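The break-even logic is worth spelling out. Taking the 2x figure above as given, Photon only lowers DBU cost when it cuts runtime by more than half. A minimal sketch, with hypothetical runtimes and a hypothetical helper name:

```python
def photon_cost_ratio(
    runtime_without_s: float,
    runtime_with_s: float,
    dbu_multiplier: float = 2.0,  # the 2x markup quoted above; adjust to your SKU
) -> float:
    """Photon DBU cost divided by non-Photon DBU cost for the same job.

    DBU cost scales with (DBU rate x runtime), so the ratio is
    multiplier * runtime_with / runtime_without. Below 1.0 means Photon is cheaper.
    """
    if runtime_without_s <= 0:
        raise ValueError("runtime_without_s must be positive")
    return dbu_multiplier * runtime_with_s / runtime_without_s


# CPU-bound job: 60 min without Photon, 25 min with it (hypothetical numbers).
cpu_bound = photon_cost_ratio(3600, 1500)

# I/O-bound job: 60 min without Photon, 55 min with it (hypothetical numbers).
io_bound = photon_cost_ratio(3600, 3300)

print(f"CPU-bound ratio: {cpu_bound:.2f}, I/O-bound ratio: {io_bound:.2f}")
```

This ignores the VM cost component, which shrinks with runtime regardless of the DBU multiplier, so the true break-even is somewhat more favorable to Photon than the DBU-only ratio suggests.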
Use JSON cluster policies that cap `node_type_id` to mid-size instance types, force `autotermination_minutes` ≤ 30, and require tags. Users bound to the policy, seniors included, can't override it.
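A policy along these lines might look like the sketch below, built as a Python dict and serialized to the JSON a policy definition expects. The instance types and tag names are assumptions for illustration; the `allowlist`, `range`, and `unlimited` policy types are written from the Databricks policy definition format as commonly documented, so verify them against your workspace before use.

```python
import json

# Hypothetical policy definition; instance types and defaults are placeholders.
policy_definition = {
    # Only mid-size node types may be selected.
    "node_type_id": {
        "type": "allowlist",
        "values": ["m5.large", "m5.xlarge"],
        "defaultValue": "m5.large",
    },
    # Idle timeout must fall between 10 and 30 minutes.
    # A minimum of 10 also blocks the value 0, which would disable auto-termination.
    "autotermination_minutes": {
        "type": "range",
        "minValue": 10,
        "maxValue": 30,
        "defaultValue": 10,
    },
    # Listing a tag with an "unlimited" policy makes it required without limiting its value.
    "custom_tags.team": {"type": "unlimited"},
    "custom_tags.cost_center": {"type": "unlimited"},
    # Restrict env to a known set of values.
    "custom_tags.env": {"type": "allowlist", "values": ["dev", "staging", "prod"]},
}

policy_json = json.dumps(policy_definition, indent=2)
print(policy_json)
```

The printed JSON can be pasted into the policy editor or sent as the `definition` string when creating a policy through the API.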
Serverless wins for spiky analyst queries (Pattern 8 below). For 24/7 ELT, Classic is 25-40% cheaper. Audit `system.billing.usage WHERE sku_name LIKE '%SERVERLESS%'`.
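The audit can be sketched as a query plus a small helper that computes the Serverless share of total usage. The SKU column in the system table schema is `sku_name`; the 30-day window, the helper name, and the sample SKU strings are assumptions for illustration.

```python
# Query to run in a notebook or SQL editor; shown here as a string only.
SERVERLESS_AUDIT_SQL = """
SELECT sku_name, SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE sku_name LIKE '%SERVERLESS%'
  AND usage_date >= date_sub(current_date(), 30)
GROUP BY sku_name
ORDER BY dbus DESC
"""


def serverless_share(rows: list[dict]) -> float:
    """Fraction of total usage_quantity that comes from SKUs containing 'SERVERLESS'."""
    total = sum(r["usage_quantity"] for r in rows)
    if total == 0:
        return 0.0
    serverless = sum(
        r["usage_quantity"] for r in rows if "SERVERLESS" in r["sku_name"].upper()
    )
    return serverless / total


# Hypothetical rows shaped like system.billing.usage output (SKU names are made up).
sample_rows = [
    {"sku_name": "EXAMPLE_SERVERLESS_SQL_COMPUTE", "usage_quantity": 30.0},
    {"sku_name": "EXAMPLE_JOBS_COMPUTE", "usage_quantity": 70.0},
]
share = serverless_share(sample_rows)
print(f"Serverless share of usage: {share:.0%}")
```

A high share on workloads that run around the clock is the signal to compare against Classic pricing.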
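Dev/staging clusters: run on spot with fallback enabled (on Azure, `spot_bid_max_price: -1` with `availability: SPOT_WITH_FALLBACK_AZURE`; on AWS, `availability: SPOT_WITH_FALLBACK`). Job clusters in prod: spot workers with on-demand fallback, and the driver kept on-demand (`first_on_demand: 1` on AWS).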
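As a rough sketch of the two profiles on AWS, the helper below returns an `aws_attributes` block per environment. It assumes the Clusters API fields `availability` and `first_on_demand`; the helper name and environment labels are hypothetical, and Azure workspaces use an `azure_attributes` block with different field names.

```python
def aws_attributes_for(env: str) -> dict:
    """Return a hypothetical aws_attributes block for a cluster spec, by environment."""
    if env in ("dev", "staging"):
        # Spot for all nodes the API allows, falling back to on-demand
        # when spot capacity is unavailable.
        return {"availability": "SPOT_WITH_FALLBACK"}
    if env == "prod":
        # first_on_demand=1 places the first node (the driver) on on-demand;
        # the remaining workers use spot with on-demand fallback.
        return {"availability": "SPOT_WITH_FALLBACK", "first_on_demand": 1}
    raise ValueError(f"unknown environment: {env!r}")


dev_attrs = aws_attributes_for("dev")
prod_attrs = aws_attributes_for("prod")
print(dev_attrs)
print(prod_attrs)
```

Keeping the driver on-demand matters in production because losing the driver to a spot reclaim fails the whole job, while losing a worker usually only triggers task retries.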
`system.billing.usage` + `system.compute.clusters` reveal underutilized worker types. Most i3.xlarge clusters should be m5.large (no NVMe needed).
For each scheduled job, run a week with Photon and a week without. Compare DBU/job. Disable Photon where ROI is negative.
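The comparison itself is a few lines once the per-run DBU figures are exported. A minimal sketch, assuming each run is a dict with a `dbus` field (a hypothetical shape; in practice the numbers would come from `system.billing.usage` grouped by job run):

```python
from statistics import mean


def dbu_per_run(runs: list[dict]) -> float:
    """Average DBUs consumed per run over the test window."""
    return mean(r["dbus"] for r in runs)


def keep_photon(photon_runs: list[dict], classic_runs: list[dict]) -> bool:
    """True when the Photon week used fewer DBUs per run than the non-Photon week."""
    return dbu_per_run(photon_runs) < dbu_per_run(classic_runs)


# Hypothetical results for one nightly job.
photon_week = [{"dbus": 12.0}, {"dbus": 11.0}, {"dbus": 13.0}]
classic_week = [{"dbus": 9.0}, {"dbus": 10.0}, {"dbus": 8.0}]

decision = keep_photon(photon_week, classic_week)
print("Keep Photon" if decision else "Disable Photon")
```

DBUs per run already fold in both the rate and the runtime, so no separate speedup measurement is needed. Input volume should be comparable across the two weeks for the comparison to be fair.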
Counter to Pattern 4: spiky analyst queries belong on Serverless. Cold-start penalty pays for itself within the first 10s of analysis time.
Move historical data older than 90 days from DBFS to S3 Glacier IR or Azure Archive. Use Unity Catalog external locations to keep the query path stable.
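On S3, the 90-day move can be expressed as a bucket lifecycle rule. The sketch below only builds the configuration; the rule ID and the `historical/` prefix are hypothetical. Note that S3 counts `Days` from object creation, not from the dates inside the data, so this assumes old partitions live in old objects.

```python
import json

# Hypothetical lifecycle configuration for the bucket holding historical data.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-historical-after-90-days",  # placeholder rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "historical/"},  # placeholder prefix
            "Transitions": [
                # GLACIER_IR is the storage class name for S3 Glacier Instant Retrieval.
                {"Days": 90, "StorageClass": "GLACIER_IR"}
            ],
        }
    ]
}

# With boto3, this dict would be passed as LifecycleConfiguration to
# put_bucket_lifecycle_configuration; it is only printed here.
print(json.dumps(lifecycle_configuration, indent=2))
```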
Add a `custom_tags` block to the cluster policy that requires `team`, `cost_center`, and `env`. Without per-cluster tags, you can't do showback.
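Once tags are enforced, showback reduces to grouping usage by tag. A minimal sketch, assuming rows shaped like `system.billing.usage` output with a `custom_tags` map and a `usage_quantity` column; the helper name and sample values are hypothetical.

```python
from collections import defaultdict


def showback(rows: list[dict], tag: str = "team") -> dict[str, float]:
    """Sum usage_quantity per tag value; rows missing the tag land in 'UNTAGGED'."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        tags = row.get("custom_tags") or {}  # tolerate a missing or null tag map
        totals[tags.get(tag, "UNTAGGED")] += row["usage_quantity"]
    return dict(totals)


# Hypothetical usage rows.
usage_rows = [
    {"custom_tags": {"team": "data-eng", "env": "prod"}, "usage_quantity": 40.0},
    {"custom_tags": {"team": "analytics", "env": "prod"}, "usage_quantity": 25.0},
    {"custom_tags": {"team": "data-eng", "env": "dev"}, "usage_quantity": 10.0},
    {"custom_tags": None, "usage_quantity": 5.0},
]

by_team = showback(usage_rows)
print(by_team)
```

The size of the `UNTAGGED` bucket is a useful metric in its own right: it shows how much spend still escapes the tagging policy.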
CARTIE AI runs all 10 patterns against your workspace using `system.billing.usage`. Read-only PAT, no agent install.