May 2, 2026 10 min read

The Datadog Cost Optimization Guide: 8 Patterns That Cut Bills 45%

Datadog bills sneak up. Most teams pay 40-60% more than they need to because of three settings nobody changes. Here are the 8 patterns we use to cut Datadog bills nearly in half without losing observability.

Lakshmi Kiranmai Guduru

Founder, CARTIEAI

A note on this story: Numbers below are aggregated from 11 production Datadog accounts we've audited (8 SaaS, 3 fintech, monthly spend $8K–$95K). Same patterns, every time.

Datadog is the BMW of observability tools. Beautiful product. Excellent UX. Insanely expensive when nobody's watching.

We've audited 11 production Datadog accounts in the last year. Median overspend: 47%.

The pattern is always the same: a team adopts Datadog because something broke and they needed observability fast. Six months later, the monthly bill has quadrupled and nobody can explain why. By month 12 it's a six-figure line item nobody wants to question.

This guide is the line-by-line breakdown. By the end you'll know:

  • The 8 patterns driving 40–60% of Datadog overspend
  • The 3 settings nobody touches that cost the most
  • The 5-minute audit any engineer can run today
  • When to switch tools (rarely) and when to optimise (almost always)

Pattern 1: Custom metrics with infinite cardinality

Datadog charges per unique metric × tag combination. The default custom metric allowance is 100 per host. Going over: $0.05/metric/month. Sounds tiny. Watch what happens with bad tagging:

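from datadog import statsd  # DogStatsD client from the `datadog` Python package

# BAD: embedding `user_id` in a tag value creates a new metric series per user
statsd.increment('api.request', tags=[f'endpoint:/users/{user_id}'])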

100 endpoints × 50,000 users = 5,000,000 unique metrics. That's a $250K/month custom-metrics line item from ONE bad metric.
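
The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that uses the figures quoted in this guide (100 included metrics per host, $0.05 per extra metric per month); the function name and parameters are ours, and your contract rates may differ.

```python
def custom_metric_overage(unique_series: int, hosts: int,
                          allowance_per_host: int = 100,
                          price_per_metric: float = 0.05) -> float:
    """Monthly overage in dollars for custom metrics beyond the included allowance."""
    included = hosts * allowance_per_host
    billable = max(0, unique_series - included)
    return round(billable * price_per_metric, 2)


# The example from the text: one tag value per (endpoint, user) pair.
endpoints, users = 100, 50_000
unique_series = endpoints * users
print(unique_series)

# hosts=0 ignores the included allowance, which is how the $250K figure is derived.
print(custom_metric_overage(unique_series, hosts=0))

# With 50 hosts, the allowance only removes 5,000 series from the bill.
print(custom_metric_overage(unique_series, hosts=50))
```

The per-host allowance barely moves the result, which is why one unbounded tag can dominate the whole bill.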

The fix:

  • Move high-cardinality identifiers (user_id, request_id, transaction_id) to logs, not metrics
  • Use @dimension annotations to mark which tags are intentional
  • Set up Metrics without Limits controls per metric (Datadog UI → Metrics → Manage Tags)
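
One way to apply the first bullet is to tag metrics with the route template instead of the raw path. The sketch below is illustrative only: `route_template` and `metric_tags` are hypothetical helpers, and the rule "numeric or UUID segments are identifiers" is an assumption that may not fit every URL scheme.

```python
import re

# Path segments that look like identifiers: all digits, or a UUID.
_ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$"
)


def route_template(path: str) -> str:
    """Replace identifier-like path segments with a fixed placeholder."""
    return "/".join(
        "{id}" if _ID_SEGMENT.match(segment) else segment
        for segment in path.split("/")
    )


def metric_tags(path: str) -> list[str]:
    """Build bounded-cardinality tags: one value per route, not per user."""
    return [f"endpoint:{route_template(path)}"]


print(metric_tags("/users/48213"))
print(metric_tags("/users/48213/orders"))

# In the request handler, the metric gets the bounded tag and the
# identifier goes to the log line instead, for example:
#   statsd.increment('api.request', tags=metric_tags(path))
#   logger.info("api.request", extra={"user_id": user_id})
```

With this in place the number of series grows with the number of routes, not the number of users.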

Typical savings: $5K–$50K/month. Single biggest line item we find.


Pattern 2: Log retention default = 15 days. Most logs need 3.

Datadog logs are billed in two ways:

  • Indexed logs: $1.27/million events for 15 days, fully searchable
  • Archived logs: $0.10/million events, stored in your S3, slower to query

The fix:

  • 15 days indexed → too long for 90% of logs. Drop to 3 days indexed + archive to S3 for 30 days
  • Critical security/audit logs: 30 days indexed (compliance need)
  • Application debug logs: Drop them in production altogether — they're more noise than signal
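
A rough before/after estimate for these changes might look like the sketch below. The 15-day and archive rates are the figures quoted in this guide. The 3-day rate, the event volume and the debug-log share are made-up placeholders, so substitute the numbers from your own contract and usage page.

```python
INDEXED_RATE_15D = 1.27  # $/million events, figure quoted in this guide
ARCHIVE_RATE = 0.10      # $/million events, figure quoted in this guide
INDEXED_RATE_3D = 1.00   # HYPOTHETICAL placeholder, not a published price


def monthly_log_cost(events_millions: float, indexed_rate: float,
                     archive: bool = False) -> float:
    """Monthly log cost in dollars for a given volume and indexing rate."""
    cost = events_millions * indexed_rate
    if archive:
        cost += events_millions * ARCHIVE_RATE
    return round(cost, 2)


total_events = 2_000   # million events/month (made-up volume)
debug_share = 0.25     # share of debug logs dropped in production (made-up)

before = monthly_log_cost(total_events, INDEXED_RATE_15D)
after = monthly_log_cost(total_events * (1 - debug_share),
                         INDEXED_RATE_3D, archive=True)
print(f"before: ${before:,.2f}  after: ${after:,.2f}")
```

Even with the archive charge added, the shorter index window and the dropped debug logs pull the total down under these assumptions.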

Typical savings: $1K–$8K/month.


Pattern 3: APM trace ingestion at 100% sampling

Datadog APM bills per ingested trace. Most teams default to 100% sampling, so every single request gets a trace. For a service doing 1 billion requests/month, that's 1 billion traces ingested and billed every month.

The fix:

  • Production traffic > 1k req/sec: sample at 10%
  • Production traffic > 10k req/sec: sample at 1%
  • Always-on for errors: keep every error trace with a dedicated error-tracing rule, whatever rate applies to normal traffic

# datadog.yaml: keep all error traces, sample normal traffic at 10%
apm_config:
  sampling_rules:
    - service: my-service
      name: error
      sample_rate: 1.0      # always trace errors
    - service: my-service
      sample_rate: 0.1      # 10% of normal traffic
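
To see what those thresholds mean in trace volume, here is a small sketch. The thresholds are the ones listed above. The function names, the 1% error rate and the assumption that every error trace is kept are illustrative.

```python
def recommended_sample_rate(req_per_sec: float) -> float:
    """Map production traffic to the sample rates suggested in this guide."""
    if req_per_sec > 10_000:
        return 0.01
    if req_per_sec > 1_000:
        return 0.10
    return 1.0  # below 1k req/sec the guide gives no reduced rate


def ingested_traces(requests_per_month: int, sample_rate: float,
                    error_rate: float = 0.0) -> int:
    """Traces ingested per month when errors are always kept."""
    errors = requests_per_month * error_rate
    sampled = (requests_per_month - errors) * sample_rate
    return round(errors + sampled)


print(recommended_sample_rate(2_500))
print(recommended_sample_rate(40_000))

# 1 billion requests/month, 1% of them errors (illustrative), 10% sampling.
print(ingested_traces(1_000_000_000, 0.10, error_rate=0.01))
```

Keeping all errors adds little volume when the error rate is low, so the always-on error rule costs far less than blanket 100% sampling.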

Typical savings: 30–60% of APM cost.


Pattern 4: Synthetic tests running every 60 seconds

Datadog Synthetic tests are great. They're also priced per test run, and every location a test runs from counts as a separate run.

A "test the homepage from 5 locations every 1 minute" test = 5 × 60 × 24 × 30 = 216,000 runs/month per test. At ~$5 per 100K runs, that's about $10.80/test/month.

The fix:

  • 1-minute frequency → only for mission-critical login + checkout
  • Most pages → 5 minutes is fine
  • Internal tools / staging → 30 minutes or kill them entirely
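
The effect of each frequency on run count can be worked out directly. This sketch reuses the 30-day month and the ~$5 per 100K runs figure from the text; the function names are ours and the price is only as accurate as that figure.

```python
MINUTES_PER_MONTH = 60 * 24 * 30  # 30-day month, as in the text


def runs_per_month(locations: int, interval_minutes: int) -> int:
    """Test runs per month for one test across all locations."""
    return locations * (MINUTES_PER_MONTH // interval_minutes)


def monthly_cost(runs: int, price_per_100k: float = 5.0) -> float:
    """Dollar cost for a number of runs at the quoted rate."""
    return round(runs / 100_000 * price_per_100k, 2)


for interval in (1, 5, 30):
    runs = runs_per_month(locations=5, interval_minutes=interval)
    print(f"every {interval:>2} min: {runs:>7,} runs, ${monthly_cost(runs):.2f}/month")
```

Moving a test from 1 minute to 5 minutes cuts its runs by a factor of five, and the saving multiplies across every test and location.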

Typical savings: $200–$2K/month per team.


Pattern 5: Hosts you forgot

Datadog charges per host. Auto-scaling clusters that scale up to 200 nodes for a 5-minute load spike → you're paying for 200 hosts that month even if 195 of them only existed for 5 minutes.

The fix:

  • Switch to per-second host billing (newer Datadog plans)
  • Or pin agents to specific node pools that scale conservatively
  • Review the "Host Map" weekly and decommission any host with 0 incoming metrics
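
The weekly check can be scripted once you have a host listing exported. The sketch below assumes each host is a dict with `name` and `last_reported_time` (epoch seconds); those field names and the one-hour cutoff are assumptions for illustration, so adapt them to whatever your export actually contains.

```python
def stale_hosts(hosts: list[dict], now: float,
                max_silence_seconds: int = 3600) -> list[str]:
    """Return names of hosts whose last report is older than the cutoff."""
    return sorted(
        host["name"]
        for host in hosts
        if now - host["last_reported_time"] > max_silence_seconds
    )


# Made-up sample data; `now` is passed in so the check is reproducible.
now = 1_700_000_000
sample = [
    {"name": "web-1", "last_reported_time": now - 30},
    {"name": "batch-7", "last_reported_time": now - 86_400},
    {"name": "old-canary", "last_reported_time": now - 7 * 86_400},
]
print(stale_hosts(sample, now))
```

Anything this returns is a candidate for removal, though it is worth confirming with the owning team before deleting a host.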

Typical savings: $500–$5K/month.


Pattern 6: Integrations you turned on for "let me try" experiments

Datadog has 800+ integrations. They're free to enable. The metrics they feed into your billable counters are NOT free.

The Kubernetes integration alone can pump 500+ metrics per node per minute. The MongoDB integration: 200+ metrics per cluster. Most teams have 30+ integrations enabled, half of which they never look at.

The audit:

  • Datadog UI → Integrations → list installed
  • For each: "Do I have a dashboard, monitor, or SLO using this?" If no → uninstall
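
If you can export the queries behind your dashboards, monitors and SLOs, the same question can be asked in bulk. This is a crude sketch: it assumes integration metrics are prefixed with the integration name (as in `kubernetes.cpu.usage.total`), and the integration names and queries below are made-up examples.

```python
import re

# Captures the first dotted segment of a metric name, e.g. "kubernetes".
_METRIC = re.compile(r"\b([a-z][a-z0-9_]*)\.[a-z0-9_.]+")


def referenced_namespaces(queries: list[str]) -> set[str]:
    """Metric namespaces that appear in at least one query."""
    return {m.group(1) for q in queries for m in _METRIC.finditer(q)}


def unused_integrations(installed: set[str], queries: list[str]) -> list[str]:
    """Installed integrations whose namespace no query refers to."""
    return sorted(installed - referenced_namespaces(queries))


queries = [
    "avg:kubernetes.cpu.usage.total{*} by {pod_name}",
    "sum:postgresql.connections{env:prod}",
]
installed = {"kubernetes", "postgresql", "mongodb", "redis"}
print(unused_integrations(installed, queries))
```

The regex will also match dotted tag values such as hostnames, so treat the output as a shortlist to review by hand, not as a list to uninstall blindly.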

Typical savings: $1K–$3K/month.


Pattern 7: Watchdog Insights / RUM features turned on org-wide

Watchdog (Datadog's anomaly detection), RUM (Real User Monitoring) and CI Visibility are per-feature charges layered on top of base APM/Logs. Most teams turn them on for the demo and forget.

  • Watchdog: $0.30/host/month
  • RUM: $1.50 per 10K sessions
  • CI Visibility: $0.50/test-execution

The fix: if the team isn't actively reviewing the data weekly, disable the feature. You can turn it back on when you actually need it.

Typical savings: $1K–$4K/month.


Pattern 8: Dev/staging running the production agent config

Most teams deploy the Datadog agent with the same config across all environments. So your dev cluster ships every metric, every log, every trace to Datadog — and gets billed for it.

The fix:

# datadog-dev.yaml — heavy reduction in dev
logs_enabled: false
apm_config:
  enabled: false
process_config:
  enabled: false
# Just keep host-level metrics

Typical savings: 30–50% of total Datadog spend (dev+staging combined).


The 5-minute audit any engineer can run

  1. Custom metrics overage: Datadog UI → Plan & Usage → Custom Metrics. If >100 per host, you're paying overage. Find the high-cardinality metrics (Manage Tags page).
  2. Log retention check: Logs → Configuration. If retention >7 days for non-security logs, drop it.
  3. APM sample rate: Service Map → click any high-traffic service → check sample rate. >50% on a >1k req/sec service = paying 5–10x what you need.
  4. Unused integrations: Integrations → installed list. Kill anything without an active dashboard or monitor.
  5. Synthetic frequency: Synthetics → list tests by frequency. Anything running more often than every 5 minutes that isn't mission-critical → lengthen the interval.

Steps 1–3 alone usually cut a Datadog bill 30%.
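
The five checks can also be written down as code, which makes them easy to rerun each month. This is a minimal sketch: `AccountSnapshot` is a hypothetical structure you fill in by hand from the UI pages listed above, and the thresholds are the ones stated in the audit steps.

```python
from dataclasses import dataclass


@dataclass
class AccountSnapshot:
    """Numbers copied by hand from the Datadog UI (hypothetical structure)."""
    custom_metrics: int
    hosts: int
    log_retention_days: int
    apm_sample_rate: float           # 0.0 to 1.0 on the busiest service
    peak_req_per_sec: float
    unused_integrations: int
    fastest_noncritical_synthetic_minutes: int


def audit(s: AccountSnapshot) -> list[str]:
    """Return one finding per audit step that fails its threshold."""
    findings = []
    if s.custom_metrics > 100 * s.hosts:
        findings.append("custom metrics exceed 100 per host")
    if s.log_retention_days > 7:
        findings.append("log retention above 7 days")
    if s.apm_sample_rate > 0.5 and s.peak_req_per_sec > 1_000:
        findings.append("APM sample rate above 50% on a high-traffic service")
    if s.unused_integrations > 0:
        findings.append("integrations without a dashboard or monitor")
    if s.fastest_noncritical_synthetic_minutes < 5:
        findings.append("non-critical synthetic test under 5 minutes")
    return findings


snapshot = AccountSnapshot(
    custom_metrics=42_000, hosts=120, log_retention_days=15,
    apm_sample_rate=1.0, peak_req_per_sec=3_200,
    unused_integrations=0, fastest_noncritical_synthetic_minutes=5,
)
for finding in audit(snapshot):
    print("-", finding)
```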


When to switch away from Datadog (rarely the right answer)

The usual alternatives raised in the "Datadog is too expensive, let's go open-source" conversation are:

  • Grafana Cloud: ~50% cheaper but UX is rougher
  • Self-hosted Prometheus + Loki + Grafana: essentially free for tooling, massive SRE overhead (replication, retention, query optimisation)
  • Honeycomb / New Relic: comparable price, different UX

Math: 2 dedicated SREs to run self-hosted observability cost $400K/year. Datadog at $200K/year is cheaper for any company under ~30 engineers. The "let's self-host" conversation is almost always a false economy.

The right move is to optimise Datadog first, then revisit only if you're at $400K+/year and growing fast.
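
The comparison reduces to a simple break-even check. The sketch below uses the figures from the text (2 SREs at a combined $400K/year); the per-SRE cost split, the `infra_annual` parameter and the function names are assumptions added for illustration.

```python
def self_hosting_cost(sre_count: int = 2, cost_per_sre: int = 200_000,
                      infra_annual: int = 0) -> int:
    """Annual cost of running observability in-house (people plus infra)."""
    return sre_count * cost_per_sre + infra_annual


def recommendation(datadog_annual: int, **self_host_kwargs: int) -> str:
    """Compare the Datadog bill with the self-hosting estimate."""
    if datadog_annual < self_hosting_cost(**self_host_kwargs):
        return "optimise Datadog first"
    return "revisit alternatives"


print(recommendation(200_000))
print(recommendation(450_000))
```

Infrastructure cost defaults to zero here, which flatters self-hosting; adding a realistic figure pushes the break-even point higher.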


How CARTIE AI helps

CARTIE AI's Datadog cost optimizer ingests your Datadog API key, runs all 8 patterns automatically, and gives you a dollar number for each. Typical first-scan finds $4K–$15K/month of waste.

Even without a tool, the 5-minute audit will find $2K–$5K/month of savings in any company over $10K/month spend. Promise.

Now go check your custom-metrics page. 🥃


ABOUT THE AUTHOR

Lakshmi Kiranmai Guduru

Founder, CARTIEAI · Building in public

I'm building CARTIE AI to fix the cloud-cost problem I saw drain millions at companies I worked for, where engineering and finance kept talking past each other.
