The 9 Lambda Cost Traps That Quietly Triple Your AWS Bill (And the Fix Pattern for Each)

A note on this story: The numbers below are a composite of 6 Lambda cost audits we've run, on workloads ranging from 200K invocations/day (small SaaS backends) to 4.2B invocations/month (a streaming-data ingestion pipeline).

A founder DM'd us last month:

"Lambda is supposed to be the cheap option, right? Our serverless bill just hit $38K/month. What am I missing?"

What they were missing: 9 specific cost traps that compound.

We pulled their bill apart. Three functions accounted for 71% of the spend. All three were running at 3072 MB of memory for workloads that fit in 512 MB. One had been recursively triggering itself for 11 days because of a bad S3 event filter. By the end of the audit we'd cut their Lambda bill from $38K to $11K/month: a $324K annualized saving with no code refactor.

This is the playbook. Run it function-by-function on your top 10 highest-cost Lambdas.

How Lambda actually charges you

Two dimensions, with duration billed in 1 ms increments:

Cost = (GB-seconds × $0.0000166667) + (Invocations × $0.0000002)

The first term is GB-seconds — the product of memory allocated × wall-clock duration. A function with 1024 MB memory running for 200ms costs:

1.0 GB × 0.2 s × $0.0000166667 = $0.00000333

That's about $3.33 of compute per million invocations, plus $0.20 in request charges. So why is your bill $38K? Because the real-world Lambda cost equation looks more like this:

Real cost = (GB-s) + (invocations) + (provisioned concurrency) + 
            (CloudWatch logs ingestion) + (CloudWatch logs storage) +
            (data transfer out) + (NAT gateway egress for VPC functions)

The non-Lambda lines often dominate. CloudWatch Logs alone can match Lambda compute on a chatty function. Let's break down each trap.
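Before getting into the traps, here is a minimal Python sketch of the two-term pricing formula above, to make the arithmetic concrete. The prices are the ones quoted in this section; the function name `lambda_compute_cost` and its parameters are illustrative and not part of any AWS SDK.

```python
# Prices quoted in the formula above (per GB-second and per request).
GB_SECOND_PRICE = 0.0000166667
REQUEST_PRICE = 0.0000002


def lambda_compute_cost(memory_mb: int, duration_ms: float, invocations: int) -> float:
    """Return compute + request cost in USD for a batch of invocations."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000) * invocations
    return gb_seconds * GB_SECOND_PRICE + invocations * REQUEST_PRICE


# The 1024 MB / 200 ms example from above, scaled to one million invocations.
per_million = lambda_compute_cost(1024, 200, 1_000_000)
print(f"${per_million:.2f} per million invocations")  # prints $3.53 per million invocations
```

This covers only the first two terms of the real-world equation; logs, NAT, and Provisioned Concurrency are not included.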

Trap 1: Memory wildly over-provisioned (the #1 waste)

The default Lambda memory is 128 MB. New developers panic at the first cold start latency, bump to 1024, then to 3008, then forget about it. Six months later 40% of their monthly bill is unused RAM.

The fix — AWS Lambda Power Tuning:

AWS Lambda Power Tuning is a Step Functions state machine that runs your function across 5-7 memory sizes and plots cost vs. latency. Free, takes 4 minutes to set up.

# Deploy the tuner once
git clone https://github.com/alexcasalboni/aws-lambda-power-tuning
cd aws-lambda-power-tuning && sam deploy --guided

# Then for each function:
aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:us-east-1:...:lambda-power-tuning \
  --input '{"lambdaARN":"arn:aws:lambda:us-east-1:...:function:my-fn","powerValues":[128,256,512,1024,1536,2048,3008],"num":50,"strategy":"cost"}'

The output gives you the cheapest memory size for your real workload. Median saving across 6 audits: 47% of compute spend.

Counter-intuitive truth: bumping memory often lowers total cost because you also get proportionally more CPU. A function at 1024 MB might cost less than the same function at 512 MB if it finishes 3× faster.
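The trade-off can be checked by hand once you have measured durations. The sketch below picks the cheapest memory size from a set of measurements; the durations are hypothetical, chosen to mirror the "3× faster at 1024 MB" case, and the helper names are made up for illustration.

```python
GB_SECOND_PRICE = 0.0000166667  # per GB-second, from the pricing formula earlier


def cost_per_invocation(memory_mb: int, duration_ms: float) -> float:
    """Duration cost in USD of a single invocation."""
    return (memory_mb / 1024) * (duration_ms / 1000) * GB_SECOND_PRICE


def cheapest_memory(measurements: dict) -> int:
    """measurements maps memory size (MB) -> measured average duration (ms)."""
    return min(measurements, key=lambda mb: cost_per_invocation(mb, measurements[mb]))


# Hypothetical measurements: doubling memory from 512 MB makes the function 3x faster,
# but going to 2048 MB barely helps.
measured = {512: 600.0, 1024: 200.0, 2048: 180.0}
best = cheapest_memory(measured)
print(best)  # prints 1024
```

Power Tuning does the measuring for you; this only shows why the cheapest point is often not the smallest memory size.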

Trap 2: Provisioned concurrency you don't need

Provisioned Concurrency (PC) keeps Lambda environments warm to eliminate cold starts. It's billed at $0.0000041667/GB-s whether or not the function is invoked. That means 24/7 PC on a 1024 MB function costs ~$10.80/month per provisioned unit (1 GB × 2,592,000 s in a 30-day month), so ten units left running cost ~$108/month for doing nothing.

Three traps here:

Set-and-forget PC at peak capacity. Most teams provision for peak QPS and never scale it down. Use Application Auto Scaling scheduled actions to lower PC on a schedule (for example, drop to 1 unit during nights and weekends).
PC on async workloads. Cold starts don't matter for SQS-triggered or EventBridge-scheduled functions. PC there is pure waste.
PC + ARM mismatch. Running Provisioned Concurrency on x86_64 when the function could run on Graviton (arm64) forgoes the roughly 20% Graviton discount for nothing.

Quick audit:

aws lambda list-provisioned-concurrency-configs --function-name $FN
# Monthly cost ≈ allocated concurrency × seconds/month × $0.0000041667 × MemoryMB / 1024

If you can't justify the cold-start latency to a real user-facing path, kill the PC.
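As a rough sketch of that audit arithmetic, the Python below turns a PC configuration into a monthly figure and compares always-on against a schedule. The price is the one quoted above; the 50-hours-per-week schedule and the function name `pc_monthly_cost` are assumptions for illustration.

```python
PC_GB_SECOND_PRICE = 0.0000041667  # Provisioned Concurrency price quoted above
SECONDS_PER_HOUR = 3600


def pc_monthly_cost(memory_mb: int, units: int, hours: float = 720) -> float:
    """Cost in USD of keeping `units` environments provisioned for `hours`."""
    return (memory_mb / 1024) * units * hours * SECONDS_PER_HOUR * PC_GB_SECOND_PRICE


always_on = pc_monthly_cost(1024, units=10)

# Hypothetical schedule: 10 units for 50 business hours per week, 1 unit otherwise.
peak_hours = 50 * 720 / 168
scheduled = pc_monthly_cost(1024, 10, peak_hours) + pc_monthly_cost(1024, 1, 720 - peak_hours)

print(f"always-on ${always_on:.2f}, scheduled ${scheduled:.2f}")
```

The gap between the two numbers is what a scheduled scale-down would recover for this made-up workload.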

Trap 3: Recursive / event-loop triggers (the silent budget killer)

This one always looks shocking in the audit:

A function listens to S3 ObjectCreated events. The function processes the file and writes a transformed version back to the same bucket. The transformed file triggers the function again. Infinite loop.

We've seen 4 variants of this:

S3 → Lambda → S3 (same bucket)
DynamoDB stream → Lambda → DynamoDB write (same table)
EventBridge → Lambda → EventBridge put-events (same bus, same pattern)
SNS → Lambda → SNS publish (same topic)

A single recursive trigger can rack up millions of invocations a day before anyone notices. Because each invocation is cheap, the cost stays invisible until the bill arrives.

The fix:

Always scope event filters tightly — exclude the prefix or suffix the function writes to.
Set a circuit-breaker metric: CloudWatch alarm on Invocations > 1.5× last week's same-hour baseline.
Use Lambda recursion detection (now built-in for some triggers): https://docs.aws.amazon.com/lambda/latest/dg/invocation-recursion.html

Also: check your DLQ (Dead Letter Queue). One customer had a poison message replaying through their function 12,000 times/hour for 3 weeks before anyone noticed. Set maxReceiveCount to 5 in the redrive policy of every SQS source queue so poison messages land in the DLQ instead of replaying.
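A tight event filter is the primary defense, but a guard inside the handler is a cheap second layer for the S3 variant. The sketch below assumes the function writes its output under a `processed/` prefix (a hypothetical name) and skips any event whose key already lives there; `keys_to_process` is an illustrative helper, not an AWS API.

```python
from urllib.parse import unquote_plus

OUTPUT_PREFIX = "processed/"  # hypothetical prefix the function writes to


def keys_to_process(event: dict, output_prefix: str = OUTPUT_PREFIX) -> list:
    """Return object keys from an S3 event, skipping the function's own output."""
    keys = []
    for record in event.get("Records", []):
        # S3 event notifications URL-encode object keys.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(output_prefix):
            continue  # our own write came back as an event: drop it
        keys.append(key)
    return keys


def handler(event, context=None):
    for key in keys_to_process(event):
        # A real handler would transform the object here and write the
        # result under OUTPUT_PREFIX.
        print(f"processing {key}")


sample_event = {
    "Records": [
        {"s3": {"object": {"key": "uploads/report+q1.csv"}}},
        {"s3": {"object": {"key": "processed/uploads/report+q1.csv"}}},
    ]
}
handler(sample_event)
```

The guard still costs one invocation per self-generated event, so it limits the loop to a single extra hop rather than removing the trigger; the filter fix remains the real cure.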

Trap 4: CloudWatch Logs ingestion (the hidden 30%)

CloudWatch Logs charges:

$0.50 per GB ingested
$0.03 per GB-month stored

If your Lambda logs every request body at INFO level, you can spend more on logs than on compute. We had a customer ingesting 1.2 TB/month of debug logs from a single function. That's $600/month — almost 2× the function's compute cost.

The fix:

Set log retention to 14 days by default (it's "never expire" out of the box). Older = useless and expensive.
```
aws logs put-retention-policy --log-group-name /aws/lambda/my-fn --retention-in-days 14
```
Move debug logs out of production. Use environment-aware log levels.
Sample. If you're logging every request, log 1% of them with a sampling middleware.
Don't log entire request payloads. PII risk + cost balloon.
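For the sampling step, one possible approach in a Python function is a standard-library logging filter. This is a sketch rather than a drop-in middleware: the class name `SamplingFilter`, the logger name, and the choice to always keep WARNING and above are assumptions.

```python
import logging
import random
from typing import Optional


class SamplingFilter(logging.Filter):
    """Keep every WARNING+ record, but only a fraction of lower-level ones."""

    def __init__(self, rate: float, rng: Optional[random.Random] = None):
        super().__init__()
        self.rate = rate
        self._rng = rng or random.Random()

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True  # never drop warnings or errors
        return self._rng.random() < self.rate


logger = logging.getLogger("my-fn")  # hypothetical logger name
logger.addFilter(SamplingFilter(rate=0.01))  # keep roughly 1% of INFO/DEBUG records
```

Sampling per record is the simplest version; sampling per request (so that all lines for a kept request survive together) usually makes the remaining logs more useful.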

Trap 5: ARM64 (Graviton) migration left undone

AWS Lambda on Graviton is 20% cheaper for the same memory AND typically 5-15% faster on Python/Node/Java workloads. The migration is a single Architectures: arm64 line in your SAM/CDK/Terraform.

Why most teams haven't migrated:

Native binary deps need a rebuild (pip install --platform=manylinux2014_aarch64 ...)
Lambda layers need to be rebuilt for arm64
Some image-processing libs (e.g., older Pillow versions) had ARM bugs

For 90%+ of pure-Python or Node functions: drop in, ship. Test in dev for a week, monitor errors, promote.

Across 6 audits, the median Lambda fleet was 7% ARM-migrated. That's leaving 12-15% of total Lambda spend on the table for free.
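To see what the discount means for a single function, here is a small sketch using the 20% figure from this section and the x86 price from the pricing formula. The workload numbers (512 MB, 300 ms, 10M invocations) are hypothetical, and request charges are left out because they do not change with architecture.

```python
X86_GB_SECOND_PRICE = 0.0000166667  # from the pricing formula earlier
ARM_DISCOUNT = 0.20                 # the "20% cheaper" figure from this section


def duration_cost(memory_mb, duration_ms, invocations, arm64=False):
    """Duration cost in USD, excluding per-request charges."""
    price = X86_GB_SECOND_PRICE * ((1 - ARM_DISCOUNT) if arm64 else 1)
    return (memory_mb / 1024) * (duration_ms / 1000) * invocations * price


x86 = duration_cost(512, 300, 10_000_000)
arm_same_speed = duration_cost(512, 300, 10_000_000, arm64=True)
arm_10pct_faster = duration_cost(512, 270, 10_000_000, arm64=True)

print(f"x86 ${x86:.2f}, arm64 ${arm_same_speed:.2f}, arm64 and 10% faster ${arm_10pct_faster:.2f}")
# prints x86 $25.00, arm64 $20.00, arm64 and 10% faster $18.00
```

If the function also runs faster on arm64, the price discount and the shorter duration multiply, which is why the last figure is lower than a flat 20% cut.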

Trap 6: VPC NAT Gateway egress on Lambda

If your Lambda runs in a VPC (to talk to RDS, ElastiCache, an internal API) and needs internet access (for outbound API calls, S3 outside the VPC, etc.), it goes through a NAT Gateway.

NAT Gateway pricing:

$0.045 per hour ($32.40/month)
$0.045 per GB processed

Two traps:

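Many Lambdas, one NAT. If you have 12 Lambdas calling the Stripe API through a NAT Gateway, all of that traffic pays the $32/month base fee plus the per-GB charge. For traffic to AWS services (S3, DynamoDB, Secrets Manager), use VPC endpoints instead, which bypass the NAT. Gateway endpoints for S3 and DynamoDB carry no additional charge; interface endpoints such as the one for Secrets Manager are billed hourly and per GB.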
Lambda outside VPC when possible. If a function only calls public APIs (Stripe, Slack, OpenAI), it doesn't need to be in a VPC. Take it out, save the NAT cost.

We saved one customer $2,800/month just by moving 4 outbound-only Lambdas out of their VPC and into the default Lambda networking model.
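The NAT line item is easy to estimate from the two prices above. In this sketch the 5 TB/month traffic figure is hypothetical and `nat_monthly_cost` is an illustrative helper name.

```python
NAT_HOURLY = 0.045  # per gateway-hour, from the pricing above
NAT_PER_GB = 0.045  # per GB processed, from the pricing above


def nat_monthly_cost(gb_processed: float, gateways: int = 1, hours: float = 720) -> float:
    """Monthly NAT Gateway cost in USD: fixed hourly fee plus data processing."""
    return gateways * hours * NAT_HOURLY + gb_processed * NAT_PER_GB


# Hypothetical: 5 TB/month of Lambda traffic through one NAT Gateway.
monthly = nat_monthly_cost(5000)
print(f"${monthly:.2f}")  # prints $257.40
```

The per-GB term dominates quickly, which is why moving S3 and DynamoDB traffic onto endpoints tends to matter more than the fixed hourly fee.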

Trap 7: Synchronous invocations that should be async

# Bad: API Gateway → Lambda → SES (synchronous)
# The user waits ~800ms for the response
# You pay for Lambda time while the function waits on the SES API

# Good: API Gateway → Lambda (acks fast) → SQS → Lambda (sends email)
# First Lambda: 50ms
# Second Lambda runs async, can fail+retry

The async pattern cuts the user-facing duration (50ms instead of 800ms in the example above), frees up the user-facing thread, and lets you batch (5-10 messages per Lambda invocation means fewer invocations and less cost).
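The second Lambda in the "good" pattern might look roughly like the sketch below: an SQS-triggered consumer that handles a batch and reports only the failed messages. `send_email` is a hypothetical stand-in for the SES call, and the return shape assumes the event source mapping has `ReportBatchItemFailures` enabled.

```python
import json


def send_email(message: dict) -> None:
    """Hypothetical stand-in for the SES call; raises on failure."""
    if "to" not in message:
        raise ValueError("missing recipient")


def handler(event, context=None):
    """Process a batch of SQS messages and report per-message failures."""
    failures = []
    for record in event.get("Records", []):
        try:
            send_email(json.loads(record["body"]))
        except Exception:
            # Only this message is returned to the queue, not the whole batch.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


batch = {
    "Records": [
        {"messageId": "1", "body": '{"to": "a@example.com"}'},
        {"messageId": "2", "body": "{}"},
    ]
}
result = handler(batch)
print(result)  # prints {'batchItemFailures': [{'itemIdentifier': '2'}]}
```

Without partial batch responses, one bad message causes the whole batch to be retried, which quietly multiplies invocations.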

Trap 8: API Gateway in front of every Lambda

Default pattern: API Gateway REST API → Lambda. AGW REST costs $3.50 per million requests + per-GB transferred.

For internal services, microservice-to-microservice calls, or webhooks under 100K req/day:

Use Function URLs (free, built into Lambda)
Use API Gateway HTTP APIs instead of REST ($1.00/million instead of $3.50)
Use ALB + Lambda target for high-volume public APIs ($0.008 per LCU-hour + free tier)

For one customer, swapping from AGW REST to HTTP for their internal service mesh saved $8,400/month.
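A quick way to compare the front-door options is to price the same request volume against each. The sketch below uses the per-million prices listed above and ignores data transfer and volume tiers; the dictionary keys and `monthly_request_cost` are illustrative names.

```python
PRICE_PER_MILLION = {
    "rest_api": 3.50,      # API Gateway REST API
    "http_api": 1.00,      # API Gateway HTTP API
    "function_url": 0.00,  # no per-request charge beyond Lambda itself
}


def monthly_request_cost(requests_per_day: int, front_door: str, days: int = 30) -> float:
    """Monthly request charge in USD for the chosen front door."""
    return requests_per_day * days / 1_000_000 * PRICE_PER_MILLION[front_door]


for option in PRICE_PER_MILLION:
    print(option, f"${monthly_request_cost(100_000, option):.2f}")
```

At 100K requests/day the absolute numbers are small; the same ratio applied to a high-volume internal mesh is where savings like the one above come from.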

Trap 9: Concurrent execution limits hitting throttles

If your account hits the regional concurrency limit (default 1000), invocations throttle. Throttles look free but they're not — they cause:

Failed user requests (revenue impact)
Dead-letter queue fan-out (more invocations)
Retry storms (3× the original cost)

Audit: check ConcurrentExecutions and Throttles CloudWatch metrics weekly. If Throttles > 0, request a limit increase or refactor noisy functions to use SQS batching.
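Before requesting an increase, it helps to estimate how much concurrency a workload actually needs. A common rule of thumb is request rate multiplied by average duration; the sketch below applies it against the default limit cited above. The traffic numbers and helper names are hypothetical.

```python
import math

DEFAULT_REGIONAL_LIMIT = 1000  # default regional concurrency limit cited above


def required_concurrency(requests_per_second: float, avg_duration_s: float) -> int:
    """Steady-state concurrency is request rate times average duration."""
    return math.ceil(requests_per_second * avg_duration_s)


def will_throttle(requests_per_second: float, avg_duration_s: float,
                  limit: int = DEFAULT_REGIONAL_LIMIT) -> bool:
    return required_concurrency(requests_per_second, avg_duration_s) > limit


print(required_concurrency(5000, 0.25), will_throttle(5000, 0.25))  # prints 1250 True
```

Note that shortening duration (Trap 1) or batching through SQS (Trap 7) lowers the required concurrency without touching the limit.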

The 30-minute self-audit

Cost Explorer → group by Service → Lambda → Top 10 functions by cost
For each top-10 function, in Lambda console:
- Memory: do you need it? (Run Power Tuning)
- Provisioned Concurrency: justify it or kill it
- Architecture: is it arm64?
CloudWatch → Logs → biggest log groups by ingested volume → reduce / sample
Search Lambda triggers for recursion patterns (S3-writes-to-source-bucket)

Total time: 30 minutes. Typical first-pass savings: 30-50% of Lambda spend.

Real numbers from one audit (composite)

Customer: B2B fintech, ~80 Lambdas in prod, $38K/month bill before audit.

Trap	Annual savings
Memory over-provisioning (47 functions tuned)	$158K
Recursive S3 trigger (one function)	$61K
CloudWatch log retention + sampling	$44K
Removed unused PC on 8 async functions	$38K
ARM migration (52 of 80 functions)	$14K
Removed AGW REST in front of 12 internal services	$7K
Total annual	$324K (-71%; line items are rounded and sum to $322K)

How CARTIEAI helps

CARTIEAI's Lambda audit runs all 9 trap checks against your AWS account, ranks functions by potential savings, and gives you copy-paste fixes (the Power Tuning JSON, the retention-policy CLI, etc.). Typical first-scan: $3K–$25K/month projected savings.

Even without a tool, the 30-minute self-audit will find $500–$3K/month of waste in any Lambda fleet over $5K/month of spend.

Now go check your top function's memory setting. ⚡

Lakshmi Kiranmai Guduru
