Best Practices
May 3, 2026 11 min read

DynamoDB Hot Partitions: How a Bad Key Choice Turned $800/mo into $18K/mo (And How to Fix It)

Hot partitions are the #1 DynamoDB cost surprise nobody warns you about. A single bad key can 20× your bill overnight. Here's the diagnostic playbook, with real CloudWatch queries.

Lakshmi Kiranmai Guduru

Founder, CARTIEAI

A note on this story: The numbers below are a composite of 4 real DynamoDB audits we ran between 2024 and 2026, on tables ranging from 2M to 900M rows. The patterns are real; exact figures have been lightly altered to protect customer privacy.

A founder slid into our DMs one Tuesday morning:

"Our DynamoDB bill just went from $800/month to $18,000. Nothing obvious changed. What the hell?"

Six hours of audit later, we'd found it. One partition key. A single key value on the orders table, PK = "tenant#megacorp", was receiving 87% of the table's write load. Capacity quietly scaled up to keep that one partition alive, and the invoice caught up a week later.

By the end of the audit we'd cut the bill from $18K/month to $2.4K/month, an 87% reduction, without migrating off DynamoDB. This is the playbook.


Why DynamoDB cost blows up silently

DynamoDB pricing has three cost drivers people track, and one they don't:

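  1. Reads / Writes (on-demand, billed per request unit: $1.25/million write request units, $0.25/million read request units)
  2. Storage ($0.25/GB-month)
  3. Data transfer out ($0.09/GB to the internet; cross-region rates differ by Region pair)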
  4. 🟥 Hidden: per-partition capacity ceilings

On-demand looks magical: "pay for what you use." But DynamoDB internally splits your table into partitions, each capped at roughly 10 GB of data and 3,000 RCU / 1,000 WCU of throughput. A single "hot" partition key can blow past that ceiling, which triggers throttling, retries, and capacity scaling, and can multiply your effective per-unit cost 10-20× while latency degrades.

The billing symptom: a spike that looks random. The real cause: you're hammering one partition while the rest of the table sleeps.


Step 1 — Pull 7 days of CloudWatch Contributor Insights

DynamoDB has a free-tier-for-1-day, cheap-for-longer feature called Contributor Insights. Enable it on any table you suspect:

aws dynamodb update-contributor-insights \
  --table-name my-table \
  --contributor-insights-action ENABLE

Within 60 minutes you'll see the top-N most-accessed partition keys. If the top key accounts for >15% of traffic, you have a hot partition. If the top 3 account for >50%, it's catastrophic.

Our composite audit: the top partition key took 87% of all writes. A single tenant, "tenant#megacorp", accounted for nearly all of the table's write volume.
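As a rough sketch of how the two thresholds above (top key over 15%, top 3 over 50%) could be applied to an exported list of per-key request counts: the function name, the dictionary input shape, and the sample counts below are illustrative assumptions, not something Contributor Insights produces in this form.

```python
from collections import Counter


def hot_key_report(key_counts, top1_limit=0.15, top3_limit=0.50):
    """Classify a table from {partition_key: request_count} (hypothetical input shape)."""
    total = sum(key_counts.values())
    if total == 0:
        return {"top_key": None, "top1_share": 0.0, "top3_share": 0.0,
                "verdict": "no traffic"}

    ranked = Counter(key_counts).most_common(3)
    top1_share = ranked[0][1] / total
    top3_share = sum(count for _, count in ranked) / total

    # Thresholds follow the rule of thumb in the text:
    # top 3 keys > 50% is catastrophic, top key > 15% is a hot partition.
    if top3_share > top3_limit:
        verdict = "catastrophic"
    elif top1_share > top1_limit:
        verdict = "hot partition"
    else:
        verdict = "balanced"

    return {"top_key": ranked[0][0], "top1_share": top1_share,
            "top3_share": top3_share, "verdict": verdict}


# Made-up counts shaped like the composite audit (87% on one tenant).
sample = {"tenant#megacorp": 870, "tenant#acme": 50,
          "tenant#globex": 40, "tenant#initech": 40}
print(hot_key_report(sample))
```

The limits are parameters on purpose, so you can tighten or loosen them for your own traffic shape.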


Step 2 — Measure the cost delta vs a balanced table

Here's the math that matters:

A balanced on-demand table at 1M writes/day:
  1M writes × $1.25/M = $1.25/day
  $1.25/day × 30 days = $37.50/month

A table with an 87% hot key at 1M writes/day:
  Capacity scales to the hot partition's peak, not the table's average.
  Effective cost: $37.50 × (peak WCU / average WCU, ~12×) = ~$450/month
  Throttled requests get retried (an invisible ~3× multiplier): ~$450 × 3 = ~$1,350/month

This is why the bill balloons silently: the advertised "on-demand" pricing assumes load is uniform. It rarely is.
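The arithmetic above can be reproduced in a few lines. This is a minimal sketch using the article's own figures; the function names are made up for illustration, and the 12× peak-to-average ratio and 3× retry multiplier are the rough estimates from the text, not measured constants.

```python
PRICE_PER_MILLION_WRITES = 1.25  # USD, the on-demand figure used in this article


def balanced_monthly_cost(writes_per_day, days=30):
    """Monthly write cost if load is spread evenly across partitions."""
    return writes_per_day / 1_000_000 * PRICE_PER_MILLION_WRITES * days


def hot_key_monthly_cost(writes_per_day, peak_to_average=12.0, retry_multiplier=3.0):
    """Same workload, inflated by the peak/average ratio and by retries.

    Both multipliers are assumptions taken from the worked example above.
    """
    return balanced_monthly_cost(writes_per_day) * peak_to_average * retry_multiplier


balanced = balanced_monthly_cost(1_000_000)
skewed = hot_key_monthly_cost(1_000_000)
print(f"balanced: ${balanced:,.2f}/month")
print(f"hot key:  ${skewed:,.2f}/month ({skewed / balanced:.0f}x)")
```

Plug in your own daily write volume and a peak-to-average ratio read off your CloudWatch graphs to get a first-order estimate.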


Step 3 — Diagnose your partition key pattern

Ask yourself these 4 questions about the PK design:

1. Is the key tenant-level? (tenant#acme, org#123) If one tenant is 10× bigger than the others, that tenant is a hot partition.

2. Is the key time-based? (date#2026-05-03) Today's date is always hot; yesterday's is always cold. Terrible design for writes.

3. Is the key a user ID with a power-law distribution? 1% of users doing 50% of writes. Common on chat/social apps.

4. Is the key status-based? (status#active) All writes target the same handful of status values.

If you answered yes to any, you have a hot-key problem. Two fix patterns follow.


Step 4 — Fix Pattern A: Write-sharding

For tenant-level hot keys, shard the partition key with a synthetic suffix:

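import random

tenant_id = "megacorp"  # example tenant

# Before (1 partition key per tenant):
pk = f"tenant#{tenant_id}"

# After (N partition keys per tenant):
SHARD_COUNT = 20
# A random suffix spreads writes evenly across SHARD_COUNT key values.
pk = f"tenant#{tenant_id}#{random.randint(0, SHARD_COUNT - 1)}"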

Writes spread across 20 partitions instead of 1. Reads become a fan-out query — run 20 Query calls in parallel and merge.

The trade-off: reads are now N× more expensive (because you issue N queries). Good when writes dominate (which is why the partition is hot in the first place). Measure before you deploy.
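To make the read side concrete, here is a minimal sketch of the fan-out, assuming the same "tenant#<id>#<shard>" key layout. The `query_fn` parameter is a hypothetical callable that takes one partition key and returns its items; in a real service it would wrap a DynamoDB Query call (with pagination, which this sketch leaves out). The in-memory `FAKE_TABLE` is only there so the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor

SHARD_COUNT = 20


def shard_keys(tenant_id, shard_count=SHARD_COUNT):
    """Every partition key a sharded tenant's items may live under."""
    return [f"tenant#{tenant_id}#{shard}" for shard in range(shard_count)]


def fan_out_query(tenant_id, query_fn, shard_count=SHARD_COUNT):
    """Run one query per shard in parallel and merge the results.

    query_fn(pk) -> list of items is an assumed interface, not a library call.
    """
    keys = shard_keys(tenant_id, shard_count)
    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        pages = list(pool.map(query_fn, keys))  # map() keeps shard order
    return [item for page in pages for item in page]


# Stand-in for the table: only two shards happen to hold data.
FAKE_TABLE = {
    "tenant#megacorp#0": [{"order_id": 1}],
    "tenant#megacorp#7": [{"order_id": 2}, {"order_id": 3}],
}


def fake_query(pk):
    return FAKE_TABLE.get(pk, [])


orders = fan_out_query("megacorp", fake_query)
print(len(orders), "orders merged from", SHARD_COUNT, "shard queries")
```

The merged list comes back in shard order, not in sort-key order, so callers that need a global ordering have to sort after the merge.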

Our audit result: SHARD_COUNT = 16 → hot-key share dropped from 87% to 8%. Cost dropped from $18K → $2.4K/month.


Step 5 — Fix Pattern B: Provisioned + Adaptive Capacity

If sharding isn't feasible (legacy readers can't fan out), switch the table from on-demand to provisioned capacity:

aws dynamodb update-table \
  --table-name my-table \
  --billing-mode PROVISIONED \
  --provisioned-throughput ReadCapacityUnits=500,WriteCapacityUnits=500

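Then register the table with Application Auto Scaling and attach a target-tracking policy at 70% utilization:

aws application-autoscaling register-scalable-target \
  --service-namespace dynamodb \
  --resource-id "table/my-table" \
  --scalable-dimension "dynamodb:table:WriteCapacityUnits" \
  --min-capacity 500 --max-capacity 5000

aws application-autoscaling put-scaling-policy \
  --service-namespace dynamodb --resource-id "table/my-table" \
  --scalable-dimension "dynamodb:table:WriteCapacityUnits" \
  --policy-name my-table-write-target-70 --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{"TargetValue":70.0,"PredefinedMetricSpecification":{"PredefinedMetricType":"DynamoDBWriteCapacityUtilization"}}'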

Why this helps: Provisioned capacity is charged at ~$0.00065/WCU-hour. At 5,000 WCU sustained that's ~$2,340/month — a fixed ceiling, not a surprise bill. You pay more on cold hours, less on peak hours. Net-net often 40-60% cheaper on hot-partition workloads.
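The ceiling figure is simple to check. A small sketch, assuming the ~$0.00065/WCU-hour rate quoted above and a 720-hour month (30 days × 24 hours); the function name is illustrative.

```python
WCU_HOUR_PRICE = 0.00065  # USD per WCU-hour, the approximate rate quoted above
HOURS_PER_MONTH = 720     # assumption: 30 days x 24 hours


def provisioned_write_cost(wcu, price=WCU_HOUR_PRICE, hours=HOURS_PER_MONTH):
    """Monthly cost of holding `wcu` write capacity units for every hour."""
    return wcu * price * hours


ceiling = provisioned_write_cost(5000)  # auto scaling max-capacity
floor = provisioned_write_cost(500)     # auto scaling min-capacity
print(f"floor:   ${floor:,.2f}/month at 500 WCU")
print(f"ceiling: ${ceiling:,.2f}/month at 5,000 WCU")
```

With auto scaling bounded at 500 to 5,000 WCU, the write bill for the table itself lands somewhere between those two numbers, which is what makes it a ceiling rather than a surprise.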

But: this relies on adaptive capacity, which lets a hot partition use more than its even share of the table's provisioned WCUs. It has been on by default since 2019 and has no switch to flip. What you can verify is the throughput the table is actually provisioned with:

aws dynamodb describe-table --table-name my-table \
  --query 'Table.ProvisionedThroughput'

Step 6 — Add the 3 cost-protection alarms

Three CloudWatch alarms every production DynamoDB table needs:

1. ConsumedWriteCapacityUnits: anomaly alarm on a 50% spike. Catches hot-partition events within minutes, not at bill time.

2. ThrottledRequests > 0 for 5 minutes. Throttles mean requests are failing and being retried, so you pay for work that didn't land. Always worth investigating.

3. Monthly budget alarm (AWS Budgets), set at 1.3× the current monthly run-rate. If the table crosses it mid-month, Slack-ping the SRE channel. Cheap, and it saved us twice.

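The first alarm, written as an anomaly-detection alarm on the table's write metric (the band width of 2 is a starting point to tune):

aws cloudwatch put-metric-alarm \
  --alarm-name dynamodb-write-spike \
  --comparison-operator GreaterThanUpperThreshold \
  --evaluation-periods 2 --datapoints-to-alarm 2 \
  --threshold-metric-id e1 \
  --metrics '[
    {"Id":"m1","ReturnData":true,"MetricStat":{"Metric":{"Namespace":"AWS/DynamoDB","MetricName":"ConsumedWriteCapacityUnits","Dimensions":[{"Name":"TableName","Value":"my-table"}]},"Period":300,"Stat":"Sum"}},
    {"Id":"e1","ReturnData":true,"Expression":"ANOMALY_DETECTION_BAND(m1, 2)","Label":"expected write range"}
  ]'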

The hidden DynamoDB costs nobody warns you about

  • GSIs (Global Secondary Indexes) are billed separately at the same rate as the table. If every write touches all of them, 3 GSIs ≈ 4× the write cost.
  • DynamoDB Streams reads are billed per read request unit (about $0.02 per 100,000 in us-east-1). Looks free, until a high-volume stream costs $1,500/month.
  • Backup retention default is 35 days continuous + daily snapshots. Drop to 7 days on dev tables: saves $0.20/GB-month.
  • Export to S3 looks cheap ($0.10/GB) but is billed on the full table size. On a 2 TB table that's $200 per export.
  • PITR (Point-in-Time Recovery) is $0.20/GB-month on top of storage. Double-check which tables really need it.
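A quick way to size three of these line items for one table, as a sketch using only the rates quoted in the list above. The function names are illustrative, the GSI multiplier assumes every write updates every index, and the 2 TB figure is treated as 2,000 GB to match the list's own arithmetic.

```python
EXPORT_PRICE_PER_GB = 0.10  # USD per GB exported to S3, as quoted above
PITR_PRICE_PER_GB = 0.20    # USD per GB-month, as quoted above


def gsi_write_multiplier(gsi_count):
    """Write cost relative to a table with no GSIs.

    Assumption: every write updates every index (the worst case).
    """
    return 1 + gsi_count


def export_cost(table_gb):
    """Cost of one full export to S3."""
    return table_gb * EXPORT_PRICE_PER_GB


def pitr_monthly_cost(table_gb):
    """Monthly point-in-time-recovery charge on top of storage."""
    return table_gb * PITR_PRICE_PER_GB


table_gb = 2000  # the 2 TB table from the list above
print(f"write multiplier with 3 GSIs: {gsi_write_multiplier(3)}x")
print(f"one export to S3: ${export_cost(table_gb):,.2f}")
print(f"PITR: ${pitr_monthly_cost(table_gb):,.2f}/month")
```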

When to migrate OFF DynamoDB

DynamoDB is the wrong tool when:

  1. You need complex relational queries (joins, aggregations, GROUP BY). Use Aurora / Postgres.
  2. Your hot partition is structural (e.g., "the newest item is always hottest"). Consider a log-style store like DynamoDB Streams → Redshift.
  3. You're below 1M requests/day. At that scale the on-demand bill is small either way, and a small Postgres RDS instance (about $15/mo for db.t4g.micro) is often the cheaper home.

How CARTIE AI helps

CARTIE AI's DynamoDB audit reads your Contributor Insights (no data plane access), identifies the top hot partitions, models the cost impact of sharding vs provisioned, and gives you copy-paste refactor patterns. Typical first-scan finds $3K–$15K/month of hot-partition waste.

Even without a tool, the 6-step audit above — start with Contributor Insights, end with the 3 alarms — will catch the issue before the bill doubles.

Now go check your top partition key. 🔑


ABOUT THE AUTHOR

Lakshmi Kiranmai Guduru

Founder, CARTIEAI · Building in public

I'm building CARTIE AI to fix the cloud-cost problem I saw drain millions at companies I worked for, where engineering and finance kept talking past each other.
