SOC 2 evidence document

Capacity Planning Runbook

Triggers for scaling MongoDB, APScheduler, and the FastAPI/React surface — sized for honest growth.

Version 1.0 | Last reviewed: 2026-02-18 | Owner: Founder

Current capacity (Feb 2026)

  • FastAPI backend: single-pod (Kubernetes), 2 vCPU + 4GB RAM, supervisor-managed
  • React frontend: single-pod static-served via nginx, behind the same ingress
  • MongoDB Atlas: M10 dedicated cluster, ~300 GB storage, 3-node replica set
  • APScheduler: in-process AsyncIO scheduler (single instance) running ~12 cron jobs

Scaling triggers

Each trigger pairs a measurable threshold with a documented response. We do NOT speculatively over-provision; we DO move when a threshold trips.
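A trigger of the "sustained over N hours" kind can be sketched as a small data structure plus a check over timestamped samples. This is a minimal illustration, not the production monitoring code: the `Trigger` class, the `is_tripped` function, and the `(timestamp, value)` sample format are all hypothetical names chosen for the example. It assumes "sustained" means every sample in the window is above the threshold, and that the data must cover the whole window before the trigger can fire.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Trigger:
    """One row of a scaling-trigger table (hypothetical structure)."""
    metric: str
    threshold: float
    window: timedelta
    response: str


def is_tripped(trigger: Trigger,
               samples: list[tuple[datetime, float]],
               now: datetime) -> bool:
    """Return True if the metric stayed above the threshold for the whole window.

    Assumption: "sustained" means every sample inside the window exceeds
    the threshold. Missing or partial data never trips the trigger.
    """
    cutoff = now - trigger.window
    # The data must reach back at least to the start of the window.
    covers_window = any(ts <= cutoff for ts, _ in samples)
    recent = [value for ts, value in samples if ts >= cutoff]
    return (covers_window
            and bool(recent)
            and all(value > trigger.threshold for value in recent))
```

With hourly CPU samples, a trigger built as `Trigger("cpu_utilisation", 70.0, timedelta(hours=4), "Scale pod RAM/CPU up one tier")` trips only when every reading from the last four hours is above 70. Percentile metrics such as p95 latency would need an aggregate over the window instead of a per-sample check.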

FastAPI backend

| Metric | Trigger | Response |
| --- | --- | --- |
| p95 request latency | > 800 ms for 24h | Add horizontal replica (HPA target: 60% CPU) |
| CPU utilisation | > 70% sustained over 4h | Scale pod RAM/CPU up one tier |
| Error rate | > 0.5% over 1h | SEV-2 incident; do NOT auto-scale, investigate first |
| Concurrent tenants | > 50 active orgs | Switch from in-process scheduler to externalized (Redis-backed) |

MongoDB Atlas

| Metric | Trigger | Response |
| --- | --- | --- |
| Storage used | > 80% of cluster size | Upgrade tier (M10 → M20 → M30 → sharded) |
| Connections | > 70% of pool | Increase Motor pool size; if still hot, increase replica count |
| Per-tenant DB count | > 200 tenant DBs on one cluster | Move to multi-cluster sharding (router lookup by tenant prefix) |
| Replication lag | > 5 sec on secondaries | SEV-3 ticket; investigate slow queries; index audit |
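The "router lookup by tenant prefix" response could look roughly like the sketch below. Everything in it is an assumption made for illustration: the runbook does not define the prefix scheme, so the `eu-` / `us-` style tenant IDs, the `CLUSTER_BY_PREFIX` table, the placeholder connection strings, and the `cluster_for_tenant` function are hypothetical. The idea is only that a tenant ID maps to a cluster through its longest matching prefix, with a default cluster as fallback.

```python
# Hypothetical routing table: tenant-ID prefix -> cluster connection string.
# The hosts are placeholders, not real Atlas clusters.
CLUSTER_BY_PREFIX = {
    "eu-": "mongodb+srv://cluster-eu.example.invalid",
    "us-": "mongodb+srv://cluster-us.example.invalid",
    "us-gov-": "mongodb+srv://cluster-us-gov.example.invalid",
}
DEFAULT_CLUSTER = "mongodb+srv://cluster-default.example.invalid"


def cluster_for_tenant(tenant_id: str) -> str:
    """Pick a cluster URI by the longest prefix that matches the tenant ID."""
    matches = [prefix for prefix in CLUSTER_BY_PREFIX
               if tenant_id.startswith(prefix)]
    if not matches:
        # Tenants without a known prefix stay on the default cluster.
        return DEFAULT_CLUSTER
    return CLUSTER_BY_PREFIX[max(matches, key=len)]
```

Longest-prefix matching keeps a more specific rule (`us-gov-`) from being shadowed by a broader one (`us-`). The returned URI would then be handed to whatever client factory the backend already uses for Motor connections.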

APScheduler / cron

| Metric | Trigger | Response |
| --- | --- | --- |
| Job execution time | > 50% of interval | Optimize query; split job by tenant batch |
| Job count | > 20 concurrent jobs | Externalize scheduler to dedicated worker pod |
| Missed runs | ≥ 2/week | Increase misfire_grace_time + move to externalized scheduler |
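The job execution time trigger and its "split job by tenant batch" response can be illustrated with two small helpers. This is a sketch under stated assumptions, not the scheduler code itself: `exceeds_interval_budget` and `tenant_batches` are hypothetical names, durations and intervals are assumed to be in seconds, and the batch size is whatever the operator picks.

```python
from itertools import islice
from typing import Iterable, Iterator


def exceeds_interval_budget(duration_s: float,
                            interval_s: float,
                            budget: float = 0.5) -> bool:
    """True when a job run used more than `budget` of its schedule interval."""
    return duration_s > budget * interval_s


def tenant_batches(tenant_ids: Iterable[str],
                   batch_size: int) -> Iterator[list[str]]:
    """Yield tenant IDs in lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(tenant_ids)
    # islice pulls the next batch_size items; an empty list ends the loop.
    while batch := list(islice(iterator, batch_size)):
        yield batch
```

A job that runs every 600 seconds and takes 400 seconds is over the 50% budget, so its tenant list could be split with `tenant_batches(all_tenants, 25)` and each batch scheduled as a separate, shorter run.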

Headroom check cadence

Monthly headroom review (first Tuesday of each month): the Founder reads the Atlas and Sentry dashboards and files a "headroom report" in capacity_reports. If any metric is within 25% of its trigger, we move BEFORE the trigger fires.
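The "within 25% of its trigger" rule can be made concrete with a short check. The sketch assumes one particular reading of the rule: a metric needs early action once it reaches 75% of its trigger value (for example, 600 ms against an 800 ms latency trigger). The function names and the shape of the `readings` dictionary are hypothetical, and the check only applies to metrics where higher is worse, which is true of every trigger in this runbook.

```python
def needs_early_action(current: float,
                       trigger: float,
                       margin: float = 0.25) -> bool:
    """True when `current` has reached (1 - margin) of the trigger value.

    Assumption: "within 25% of its trigger" means current >= 75% of the
    trigger threshold, for metrics where a higher value is worse.
    """
    return current >= trigger * (1 - margin)


def headroom_report(readings: dict[str, tuple[float, float]]) -> list[str]:
    """Return the names of metrics that are close to their trigger.

    `readings` maps a metric name to (current value, trigger value).
    """
    return sorted(name for name, (current, trigger) in readings.items()
                  if needs_early_action(current, trigger))
```

For example, `headroom_report({"storage_pct": (65, 80), "pool_pct": (50, 70)})` flags only `storage_pct`, because 65 is above 75% of 80 while 50 is below 75% of 70.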

Forecasting

CARTIE's own Roadmap-Aware Forecast is used internally to predict cost (and therefore capacity) as customer count grows. The Jira automation that imports product-launch events also fires capacity-review tickets when projected MAU growth exceeds 50%.
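The growth condition behind the capacity-review ticket amounts to a relative-change comparison. The sketch below is illustrative only: the runbook does not state the forecast horizon or the baseline, so it assumes growth is measured as projected MAU relative to current MAU, and `growth_exceeds` is a hypothetical name.

```python
def growth_exceeds(current_mau: int,
                   projected_mau: int,
                   limit: float = 0.5) -> bool:
    """True when projected MAU growth over the current MAU exceeds `limit`."""
    if current_mau <= 0:
        raise ValueError("current_mau must be positive")
    growth = (projected_mau - current_mau) / current_mau
    return growth > limit
```

Under this reading, a projection from 1,000 to 1,600 MAU (60% growth) would open a ticket, while a projection to exactly 1,500 (50%) would not, since the rule says "exceeds".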

Sign-off

Runbook approved Feb 18, 2026. Next review: May 18, 2026.

Linked SOC 2 controls
A1.2
