May 2, 2026 13 min read

The 9 Lambda Cost Traps That Quietly Triple Your AWS Bill (And the Fix Pattern for Each)

Lambda is supposed to be the cheap option. But provisioned concurrency, oversized memory, recursive triggers, and a few other traps mean most teams overpay 2-3×. Here's the field guide.

Lakshmi Kiranmai Guduru

Founder, CARTIEAI

A note on this story: The numbers below are a composite of 6 Lambda cost audits we've run, on workloads ranging from 200K invocations/day (small SaaS backends) to 4.2B invocations/month (a streaming-data ingestion pipeline).

A founder DM'd us last month:

"Lambda is supposed to be the cheap option, right? Our serverless bill just hit $38K/month. What am I missing?"

What they were missing: 9 specific cost traps that compound.

We pulled their bill apart. Three functions accounted for 71% of the spend. All three were running 3072 MB of memory for workloads that fit in 512 MB. One had been recursively triggering itself for 11 days because of a bad S3 event filter. By the end of the audit we'd cut their Lambda bill from $38K to $11K/month — a $324K annualized save with no code refactor.

This is the playbook. Run it function-by-function on your top 10 highest-cost Lambdas.


How Lambda actually charges you

Two dimensions: duration (billed in 1ms increments) and request count:

Cost = (GB-seconds × $0.0000166667) + (Invocations × $0.0000002)

The first term is GB-seconds — the product of memory allocated × wall-clock duration. A function with 1024 MB memory running for 200ms costs:

1.0 GB × 0.2 s × $0.0000166667 = $0.00000333
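To sanity-check a bill against this formula, the arithmetic can be sketched in a few lines of Python. This is a minimal sketch that assumes the x86 on-demand rates quoted above; the function name `lambda_compute_cost` is ours, not an AWS API, and rates vary by region and architecture.

```python
# Rates copied from the formula above (x86, on-demand). Treat them as
# assumptions and check the pricing page for your region.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_REQUEST = 0.0000002


def lambda_compute_cost(memory_mb: int, duration_ms: float, invocations: int) -> float:
    """Return duration + request cost in USD for identical invocations."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000) * invocations
    return gb_seconds * PRICE_PER_GB_SECOND + invocations * PRICE_PER_REQUEST


if __name__ == "__main__":
    # The example above, scaled to one million invocations.
    print(f"${lambda_compute_cost(1024, 200, 1_000_000):.2f}")
```

Scaling the single-invocation figure to a million calls makes it easier to compare against the per-function numbers in Cost Explorer.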

At 1024 MB and 200ms, that works out to roughly $3.33 of compute per million invocations, plus $0.20 in request charges. So why is your bill $38K? Because the real-world Lambda cost equation looks more like this:

Real cost = (GB-s) + (invocations) + (provisioned concurrency) + 
            (CloudWatch logs ingestion) + (CloudWatch logs storage) +
            (data transfer out) + (NAT gateway egress for VPC functions)

The non-Lambda lines often dominate. CloudWatch Logs alone can match Lambda compute on a chatty function. Let's break down each trap.


Trap 1: Memory wildly over-provisioned (the #1 waste)

The default Lambda memory is 128 MB. New developers panic at the first cold start latency, bump to 1024, then to 3008, then forget about it. Six months later 40% of their monthly bill is unused RAM.

The fix — AWS Lambda Power Tuning:

AWS Lambda Power Tuning is a Step Functions state machine that runs your function across 5-7 memory sizes and plots cost vs. latency. It's open source and takes about 4 minutes to set up; you pay only for the test invocations it runs.

# Deploy the tuner once (build first, then deploy)
git clone https://github.com/alexcasalboni/aws-lambda-power-tuning
cd aws-lambda-power-tuning && sam build && sam deploy --guided

# Then for each function:
aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:us-east-1:...:lambda-power-tuning \
  --input '{"lambdaARN":"arn:aws:lambda:us-east-1:...:function:my-fn","powerValues":[128,256,512,1024,1536,2048,3008],"num":50,"strategy":"cost"}'

The output gives you the cheapest memory size for your real workload. Median saving across 6 audits: 47% of compute spend.

Counter-intuitive truth: bumping memory often lowers total cost because you also get proportionally more CPU. A function at 1024 MB might cost less than the same function at 512 MB if it finishes 3× faster.
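Here's that trade-off as arithmetic. This is a sketch with made-up durations (600ms at 512 MB, 200ms at 1024 MB) chosen to match the "3× faster" case; your function's real numbers come from Power Tuning, not from this snippet.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # x86 on-demand rate from the formula earlier


def cost_per_million(memory_mb: int, duration_ms: float) -> float:
    """Compute-only cost in USD for one million invocations."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000) * 1_000_000
    return gb_seconds * PRICE_PER_GB_SECOND


# Hypothetical CPU-bound function: doubling memory makes it 3x faster.
small = cost_per_million(512, 600)
large = cost_per_million(1024, 200)

if __name__ == "__main__":
    print(f"512 MB:  ${small:.2f} per million")
    print(f"1024 MB: ${large:.2f} per million")
```

Doubling memory doubles the per-millisecond rate, so the larger size only wins when duration drops by more than half. At exactly 2× faster the two settings cost the same.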


Trap 2: Provisioned concurrency you don't need

Provisioned Concurrency (PC) keeps Lambda environments warm to eliminate cold starts. It's billed at $0.0000041667/GB-s whether or not the function is invoked. That means 24/7 PC on a 1024 MB function costs ~$11/month per provisioned unit (1 GB × ~2.63M seconds × $0.0000041667), doing nothing. Provision 50 units and that's roughly $550/month of idle capacity.

Three traps here:

  1. Set-and-forget PC at peak capacity. Most teams provision for peak QPS and never scale it down. Use Application Auto Scaling scheduled actions to scale PC on a timetable (drop to 1 unit during nights/weekends).
  2. PC on async workloads. Cold starts don't matter for SQS-triggered or EventBridge-scheduled functions. PC there is pure waste.
  3. PC + ARM mismatch. Provisioned Concurrency on x86_64 when you could be on Graviton (arm64) forgoes the roughly 20% lower arm64 rate for nothing.

Quick audit:

aws lambda list-provisioned-concurrency-configs --function-name "$FN"
# Monthly cost ≈ allocated units × (MemoryMB / 1024) × seconds/month × $0.0000041667
# (the rate is per GB-second, so use ~2.63M seconds/month, not hours)

If you can't justify the cold-start latency to a real user-facing path, kill the PC.
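The audit arithmetic is easy to get wrong by a factor of 3,600 (hours vs. seconds), so here is a minimal sketch of it. It assumes the x86 PC rate quoted above and a 730-hour month; the function name is ours, and the `hours_per_month` parameter is there so you can model a schedule that only keeps PC on during business hours.

```python
PC_PRICE_PER_GB_SECOND = 0.0000041667  # x86 rate quoted above; arm64 is lower
SECONDS_PER_HOUR = 3600


def provisioned_concurrency_monthly_cost(
    memory_mb: int, units: int, hours_per_month: float = 730
) -> float:
    """Idle cost in USD of keeping `units` environments provisioned."""
    gb = memory_mb / 1024
    seconds = hours_per_month * SECONDS_PER_HOUR
    return gb * units * seconds * PC_PRICE_PER_GB_SECOND


if __name__ == "__main__":
    always_on = provisioned_concurrency_monthly_cost(1024, 50)
    # Assumed schedule: 12 hours/day on ~22 working days.
    business_hours = provisioned_concurrency_monthly_cost(1024, 50, hours_per_month=12 * 22)
    print(f"24/7: ${always_on:.2f}  business hours only: ${business_hours:.2f}")
```

This covers only the idle PC charge. Invocations served by provisioned environments are billed on top of it at their own duration rate.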


Trap 3: Recursive / event-loop triggers (the silent budget killer)

This one always looks shocking in the audit:

A function listens to S3 ObjectCreated events. The function processes the file and writes a transformed version back to the same bucket. The transformed file triggers the function again. Infinite loop.

We've seen 4 variants of this:

  • S3 → Lambda → S3 (same bucket)
  • DynamoDB stream → Lambda → DynamoDB write (same table)
  • EventBridge → Lambda → EventBridge put-events (same bus, same pattern)
  • SNS → Lambda → SNS publish (same topic)

A single recursive trigger can rack up millions of invocations a day before anyone notices. Because each invocation is cheap, the spend is easy to miss until the bill arrives.

The fix:

  1. Always scope event filters tightly — exclude the prefix or suffix the function writes to.
  2. Set a circuit-breaker metric: CloudWatch alarm on Invocations > 1.5× last week's same-hour baseline.
  3. Use Lambda recursion detection (now built-in for some triggers): https://docs.aws.amazon.com/lambda/latest/dg/invocation-recursion.html
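As a belt-and-braces measure on top of the event filter, the handler itself can refuse to process its own output. The sketch below assumes the function writes under a `processed/` prefix; that prefix and the helper name `should_process` are hypothetical, and the actual transform step is left as a comment.

```python
from urllib.parse import unquote_plus

OUTPUT_PREFIX = "processed/"  # hypothetical prefix this function writes to


def should_process(key: str, output_prefix: str = OUTPUT_PREFIX) -> bool:
    """Return False for objects this function produced itself."""
    return not key.startswith(output_prefix)


def handler(event, context):
    """S3-triggered entry point; returns the keys it chose to process."""
    processed = []
    for record in event.get("Records", []):
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if not should_process(key):
            continue  # our own output: skipping it breaks the loop
        # ... transform the object and write it to OUTPUT_PREFIX + key ...
        processed.append(key)
    return processed
```

The guard still costs one invocation per output object, so it limits the damage rather than removing it. The scoped event filter is what stops the trigger from firing at all.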

Also: check your SQS sources for a DLQ (Dead Letter Queue). One customer had a poison message replaying through their function 12,000 times/hour for 3 weeks before anyone noticed. Attach a DLQ to every SQS source with a redrive policy of maxReceiveCount = 5, so a failing message is parked after five attempts.


Trap 4: CloudWatch Logs ingestion (the hidden 30%)

CloudWatch Logs charges:

  • $0.50 per GB ingested
  • $0.03 per GB-month stored

If your Lambda logs every request body at INFO level, you can spend more on logs than on compute. We had a customer ingesting 1.2 TB/month of debug logs from a single function. That's $600/month — almost 2× the function's compute cost.

The fix:

  1. Set log retention to 14 days by default (it's "never expire" out of the box). Older = useless and expensive.
    aws logs put-retention-policy --log-group-name /aws/lambda/my-fn --retention-in-days 14
    
  2. Move debug logs out of production. Use environment-aware log levels.
  3. Sample. If you're logging every request, log 1% of them with a sampling middleware.
  4. Don't log entire request payloads. PII risk + cost balloon.
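Fixes 2 and 3 can be combined in a small logging filter. This is one possible sketch using Python's standard `logging` module; the `LOG_LEVEL` and `LOG_SAMPLE_RATE` environment variable names are our own convention, not something Lambda defines.

```python
import logging
import os
import random
from typing import Optional


class SamplingFilter(logging.Filter):
    """Pass every WARNING+ record, but only a fraction of lower-level ones."""

    def __init__(self, sample_rate: float, rng: Optional[random.Random] = None):
        super().__init__()
        self.sample_rate = sample_rate
        self.rng = rng or random.Random()

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True  # never drop warnings or errors
        return self.rng.random() < self.sample_rate


def build_logger(name: str = "app") -> logging.Logger:
    """Logger whose level and sample rate come from the environment."""
    logger = logging.getLogger(name)
    logger.setLevel(os.environ.get("LOG_LEVEL", "INFO").upper())
    rate = float(os.environ.get("LOG_SAMPLE_RATE", "0.01"))  # 1% by default
    logger.addFilter(SamplingFilter(rate))
    return logger
```

Sampling per record keeps the code simple but scatters a request's log lines. If you need complete traces for the sampled requests, make the keep-or-drop decision once per invocation instead.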

Trap 5: ARM64 (Graviton) migration left undone

AWS Lambda on Graviton is 20% cheaper for the same memory AND typically 5-15% faster on Python/Node/Java workloads. The migration is a one-line architecture setting in your SAM/CDK/Terraform (in SAM, Architectures: [arm64]).

Why most teams haven't migrated:

  • Native binary deps need a rebuild (pip install --platform=manylinux2014_aarch64 ...)
  • Lambda layers need to be rebuilt for arm64
  • Some image-processing libs (e.g., older Pillow versions) had ARM bugs

For 90%+ of pure-Python or Node functions: drop in, ship. Test in dev for a week, monitor errors, promote.

Across 6 audits, the median Lambda fleet was 7% ARM-migrated. That's leaving 12-15% of total Lambda spend on the table for free.


Trap 6: VPC NAT Gateway egress on Lambda

If your Lambda runs in a VPC (to talk to RDS, ElastiCache, an internal API) and needs internet access (for outbound API calls, S3 outside the VPC, etc.), it goes through a NAT Gateway.

NAT Gateway pricing:

  • $0.045 per hour ($32.40/month)
  • $0.045 per GB processed

Two traps:

  1. Many Lambdas, one NAT. If you have 12 Lambdas making outbound calls through a NAT Gateway, all of that traffic pays the per-GB processing charge on top of the $32/month base. For traffic bound for AWS services (S3, DynamoDB, Secrets Manager), use VPC endpoints so it bypasses the NAT; third-party calls such as Stripe still need it.
  2. Lambda outside VPC when possible. If a function only calls public APIs (Stripe, Slack, OpenAI), it doesn't need to be in a VPC. Take it out, save the NAT cost.

We saved one customer $2,800/month just by moving 4 outbound-only Lambdas out of their VPC and into the default Lambda networking model.
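To see how quickly the per-GB line overtakes the hourly one, here is the NAT pricing above as a small calculator. It's a sketch that assumes the two rates listed in this section and a 720-hour month (which is what the $32.40 figure implies); the function name is ours.

```python
NAT_HOURLY = 0.045  # USD per gateway-hour, as listed above
NAT_PER_GB = 0.045  # USD per GB processed, as listed above


def nat_gateway_monthly_cost(
    gb_processed: float, gateways: int = 1, hours_per_month: float = 720
) -> float:
    """Hourly charge for each gateway plus the data-processing charge."""
    return gateways * hours_per_month * NAT_HOURLY + gb_processed * NAT_PER_GB


if __name__ == "__main__":
    for gb in (0, 1_000, 50_000):
        print(f"{gb:>6} GB/month -> ${nat_gateway_monthly_cost(gb):,.2f}")
```

Data transfer out to the internet is billed separately from the NAT processing charge, so this is a floor on the networking cost rather than the full figure.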


Trap 7: Synchronous invocations that should be async

# Bad: API Gateway → Lambda → SES (synchronous)
# The user waits ~800ms for the full round trip
# You pay for Lambda time while it waits on the SES API

# Good: API Gateway → Lambda (acks fast) → SQS → Lambda (sends email)
# First Lambda: 50ms
# Second Lambda runs async, can fail+retry

The async pattern cuts the user-facing function's duration (from ~800ms to ~50ms in the example above), frees up the user-facing thread, and lets you batch (5-10 messages per Lambda invocation = fewer invocations, less cost).
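The second Lambda in that pattern might look something like the sketch below. It assumes the SQS event source mapping has ReportBatchItemFailures enabled, so that only the failed messages are retried; `send_email` is a hypothetical stand-in for the real SES call, and the `sender` parameter exists only to make the handler easy to test.

```python
import json


def send_email(message: dict) -> None:
    """Hypothetical stand-in for the real SES call."""
    if "to" not in message:
        raise ValueError("missing recipient")


def handler(event, context, sender=send_email):
    """Process one SQS batch and report which messages failed."""
    failures = []
    for record in event.get("Records", []):
        try:
            sender(json.loads(record["body"]))
        except Exception:
            # Only this message returns to the queue; the rest are deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Without partial batch responses, one bad message fails the whole batch and every message in it is retried, which is exactly the kind of replay cost Trap 3 warns about.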


Trap 8: API Gateway in front of every Lambda

Default pattern: API Gateway REST API → Lambda. AGW REST costs $3.50 per million requests + per-GB transferred.

For internal services, microservice-to-microservice calls, or webhooks under 100K req/day:

  • Use Function URLs (free, built into Lambda)
  • Use API Gateway HTTP APIs instead of REST ($1.00/million instead of $3.50)
  • Use ALB + Lambda target for high-volume public APIs ($0.008 per LCU-hour, plus the ALB's hourly charge)

For one customer, swapping from AGW REST to HTTP for their internal service mesh saved $8,400/month.


Trap 9: Concurrent execution limits hitting throttles

If your account hits the regional concurrency limit (default 1000), invocations throttle. Throttles look free but they're not — they cause:

  • Failed user requests (revenue impact)
  • Dead-letter queue fan-out (more invocations)
  • Retry storms (3× the original cost)

Audit: check ConcurrentExecutions and Throttles CloudWatch metrics weekly. If Throttles > 0, request a limit increase or refactor noisy functions to use SQS batching.


The 30-minute self-audit

  1. Cost Explorer → group by Service → Lambda → Top 10 functions by cost
  2. For each top-10 function, in Lambda console:
    • Memory: do you need it? (Run Power Tuning)
    • Provisioned Concurrency: justify it or kill it
    • Architecture: is it arm64?
  3. CloudWatch → Logs → biggest log groups by ingested volume → reduce / sample
  4. Search Lambda triggers for recursion patterns (S3-writes-to-source-bucket)

Total time: 30 minutes. Typical first-pass savings: 30-50% of Lambda spend.


Real numbers from one audit (composite)

Customer: B2B fintech, ~80 Lambdas in prod, $38K/month bill before audit.

Trap                                                 Annual savings
Memory over-provisioning (47 functions tuned)        $158K
Recursive S3 trigger (one function)                  $61K
CloudWatch log retention + sampling                  $44K
Removed unused PC on 8 async functions               $38K
ARM migration (52 of 80 functions)                   $14K
Removed AGW REST in front of 12 internal services    $7K
Total annual                                         $324K (-71%)

How CARTIE AI helps

CARTIE AI's Lambda audit runs all 9 trap checks against your AWS account, ranks functions by potential savings, and gives you copy-paste fixes (the Power Tuning JSON, the retention-policy CLI, etc.). Typical first-scan: $3K–$25K/month projected savings.

Even without a tool, the 30-minute self-audit will find $500–$3K/month of waste in any Lambda fleet over $5K/month of spend.

Now go check your top function's memory setting. ⚡

ABOUT THE AUTHOR

Lakshmi Kiranmai Guduru

Founder, CARTIEAI · Building in public

I'm building CARTIE AI to fix the cloud-cost problem I saw drain millions at companies I worked for, where engineering and finance kept talking past each other.
