AI Summary - 20-sec read - Reviewed by experts
- Stores go down during flash sales because capacity is fixed for normal traffic and cannot absorb a sudden spike. Auto scaling adds and removes capacity automatically so you neither crash at the peak nor pay for the peak all month.
- Use target tracking - hold a metric like CPU or requests-per-target at a set level and let AWS add instances or tasks to keep it there. It is simpler and more reliable than hand-tuned step rules for most stores.
- Scaling is only as good as its slowest part. New instances take minutes to boot, so use warm pools or fast-starting containers, and scale the actual bottleneck - often the database or a single service, not the web tier.
- For known peaks (a launch, a sale, Black Friday) do not wait for reactive scaling - schedule capacity ahead of time, and load-test before the event so you discover the limit in rehearsal, not live.
- Short on time? Book a free call.
You send the email, the influencer posts, and traffic goes from a trickle to a flood in ninety seconds. That is the moment you planned for - and it is the moment the site returns a 503. The campaign did its job; the infrastructure did not. A storefront that falls over during its own flash sale is not just lost orders that hour, it is the customers who tried, failed, and will not come back, and it almost always traces to capacity that was sized for an ordinary Tuesday.
Auto scaling on AWS is the fix, but the default, out-of-the-box version disappoints people for three predictable reasons: it reacts too slowly, it scales the wrong component, and it never scales back down. This is how to set it up so AWS adds capacity fast enough to absorb a spike, scales the part that is actually the bottleneck, and shrinks again afterwards so you are not paying flash-sale prices every day of the month.
Why fixed capacity fails on sale day
Most stores run a fixed number of servers or containers sized for typical load with a little headroom. That is efficient on a normal day and fatal on a spike. When traffic jumps 5x in minutes, the fixed fleet saturates - CPU pins, request queues back up, response times climb, and then requests start failing. The irony is that you could have afforded the capacity; you just had no mechanism to add it in time.
The two bad alternatives are over-provisioning (run peak capacity all the time and pay for idle 360 days a year) and crossing your fingers. Auto scaling is the third option: capacity that tracks demand, up during the sale and down after it. Done right, it is both the reliability fix and a cost control, because you stop paying for a peak you hit twice a quarter.
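To make the cost argument concrete, here is a rough back-of-envelope sketch. The fleet sizes and peak hours are made-up illustrative numbers, not measurements from any real store, and the model ignores ramp-up time and pricing details.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def instance_hours_fixed(peak_instances: int) -> int:
    """Instance-hours per month when the fleet runs at peak size all the time."""
    return peak_instances * HOURS_PER_MONTH


def instance_hours_scaled(baseline_instances: int, peak_instances: int, peak_hours: int) -> int:
    """Instance-hours per month when the fleet sits at baseline and only
    runs the extra instances during the hours it is actually at peak."""
    extra_instances = peak_instances - baseline_instances
    return baseline_instances * HOURS_PER_MONTH + extra_instances * peak_hours


# Hypothetical store: 4 instances on a normal day, 20 during a sale,
# and 6 hours of sale traffic in the month.
fixed = instance_hours_fixed(peak_instances=20)
scaled = instance_hours_scaled(baseline_instances=4, peak_instances=20, peak_hours=6)
print(f"fixed at peak: {fixed} instance-hours, auto scaled: {scaled} instance-hours")
```

Under these assumed numbers the always-at-peak fleet uses 14,600 instance-hours against 3,016 for the scaled one. Your own ratio depends entirely on how often and how long you actually peak.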
Use target tracking, not hand-tuned rules
AWS gives you a few scaling policy types, and the one most teams should default to is target tracking. You pick a metric and a target value - keep average CPU at 60 percent, or keep requests-per-target on the load balancer at a set number - and AWS adds or removes capacity automatically to hold that line. It behaves like a thermostat: you set the temperature, it manages the furnace.
- Target tracking is the right starting point for web and app tiers. It self-adjusts, needs little tuning, and avoids the brittleness of hand-written thresholds.
- Step scaling earns its place only when you need different reactions at different severity levels - add two instances at 70 percent, ten at 90. More control, more to maintain.
- Scheduled scaling sets capacity by the clock for events you can predict, which we will come back to for known peaks.
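As a rough illustration of the step-scaling example above (two instances at 70 percent, ten at 90), the sketch below builds the parameters for the EC2 Auto Scaling `put_scaling_policy` call. The group name and policy name are hypothetical, and the policy assumes a separate CloudWatch alarm on CPU at the 70 percent threshold, which is not shown. Step bounds are expressed as offsets from that alarm threshold.

```python
def step_scaling_policy(group_name: str, alarm_threshold: float = 70.0,
                        severe_threshold: float = 90.0) -> dict:
    """Build put_scaling_policy parameters: +2 instances between the alarm
    threshold and the severe threshold, +10 at or above the severe threshold."""
    severe_offset = severe_threshold - alarm_threshold
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": "cpu-step-scale-out",  # hypothetical name
        "PolicyType": "StepScaling",
        "AdjustmentType": "ChangeInCapacity",
        "MetricAggregationType": "Average",
        "StepAdjustments": [
            {"MetricIntervalLowerBound": 0.0,
             "MetricIntervalUpperBound": severe_offset,
             "ScalingAdjustment": 2},
            {"MetricIntervalLowerBound": severe_offset,
             "ScalingAdjustment": 10},
        ],
    }


def adjustment_for(policy: dict, alarm_threshold: float, metric_value: float) -> int:
    """Local helper for reasoning about the steps: which adjustment would
    apply for a given metric value (lower bound inclusive, upper exclusive)."""
    breach = metric_value - alarm_threshold
    for step in policy["StepAdjustments"]:
        lower = step.get("MetricIntervalLowerBound", float("-inf"))
        upper = step.get("MetricIntervalUpperBound", float("inf"))
        if lower <= breach < upper:
            return step["ScalingAdjustment"]
    return 0


policy = step_scaling_policy("storefront-web-asg")  # hypothetical group name
print(adjustment_for(policy, 70.0, 75.0), adjustment_for(policy, 70.0, 95.0))

# To apply it for real (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("autoscaling").put_scaling_policy(**policy)
```

The extra moving parts here (an alarm, two bounds, two adjustments) are the maintenance cost the bullet above refers to.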
Pick the metric that actually reflects user pain. CPU is fine for compute-bound apps; requests-per-target or p95 latency is often a truer signal of whether real users are waiting. The goal is to scale on the thing that goes bad first, and scaling on the right metric depends on knowing your real utilization - the same measurement discipline behind right-sizing EC2 so you stop overpaying.
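A minimal sketch of a target tracking policy on requests-per-target, assuming an EC2 Auto Scaling group behind an Application Load Balancer. The group name, resource label, and target value of 500 are placeholders to replace with your own. `ALBRequestCountPerTarget` is one of the predefined metric types. A p95 latency target would need a customized metric specification instead, which is not shown here.

```python
def target_tracking_policy(group_name: str, resource_label: str,
                           requests_per_target: float) -> dict:
    """Build put_scaling_policy parameters that hold the ALB request count
    per target at the given value."""
    if requests_per_target <= 0:
        raise ValueError("requests_per_target must be positive")
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": "requests-per-target-tracking",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ALBRequestCountPerTarget",
                # Format: app/<lb-name>/<lb-id>/targetgroup/<tg-name>/<tg-id>
                "ResourceLabel": resource_label,
            },
            "TargetValue": float(requests_per_target),
            "DisableScaleIn": False,  # let the same policy shrink the fleet
        },
    }


def apply_policy(policy: dict) -> None:
    """Send the policy to AWS. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so the builder above runs without it
    boto3.client("autoscaling").put_scaling_policy(**policy)


tracking = target_tracking_policy(
    group_name="storefront-web-asg",  # placeholder
    resource_label="app/my-alb/0123456789abcdef/targetgroup/my-targets/0123456789abcdef",  # placeholder
    requests_per_target=500,  # assumed value; derive yours from a load test
)
print(tracking["TargetTrackingConfiguration"]["TargetValue"])
```

The target value is the one number that matters here, and it should come from measuring what a single target can serve comfortably, not from a guess.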
Worried your store will not survive the next big sale?
Get a free audit. Send us your AWS setup and your traffic pattern and we will find the component that breaks first and design the scaling and pre-warming to survive your next peak. No pitch, reply in 2 hrs, no card needed, NDA on request.
Scaling is only as fast as its slowest part
The most common disappointment with auto scaling is that it triggers correctly but arrives late. The policy needs a short run of metric data before it fires, and even then a brand-new EC2 instance still has to boot the OS, start the app, pass health checks, and register with the load balancer. That can be several minutes, and a flash-sale spike can peak in less time than that. The capacity shows up after the wave has already broken.
Three things close that gap:
- Warm pools. Keep pre-initialised instances, typically stopped, ready to join the group far faster than a cold launch because the slow first-boot setup is already done. For EC2 Auto Scaling, a warm pool can be the difference between catching a spike and missing it.
- Fast-starting compute. Containers start far faster than full instances. A spiky tier is exactly the kind of bursty workload that suits a serverless container model such as Fargate, where new tasks launch without waiting for a fresh instance to boot. That trade-off is part of the Fargate versus EC2 decision for ECS.
- Lean startup. Bake the app into the image so a new node is ready on boot rather than installing dependencies on the way up. Every second of startup is a second the spike is unserved.
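For the warm pool option above, a minimal sketch of the parameters for the EC2 Auto Scaling `put_warm_pool` call. The group name and pool size are assumptions. Size the pool from how many extra instances your spike actually needs.

```python
VALID_POOL_STATES = ("Stopped", "Running", "Hibernated")


def warm_pool_request(group_name: str, min_size: int, pool_state: str = "Stopped") -> dict:
    """Build put_warm_pool parameters that keep at least `min_size`
    pre-initialised instances waiting alongside the Auto Scaling group."""
    if pool_state not in VALID_POOL_STATES:
        raise ValueError(f"pool_state must be one of {VALID_POOL_STATES}")
    if min_size < 0:
        raise ValueError("min_size cannot be negative")
    return {
        "AutoScalingGroupName": group_name,
        "MinSize": min_size,
        "PoolState": pool_state,
    }


request = warm_pool_request("storefront-web-asg", min_size=6)  # placeholder values
print(request)

# To apply it for real (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("autoscaling").put_warm_pool(**request)
```

Stopped instances are the cheapest pool state because you pay for their storage rather than their compute, at the price of a somewhat slower start than instances kept running.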
Equally important: scale the right thing. Adding ten web servers does nothing if the bottleneck is the database, a payment service, or a single overloaded microservice. Find the component that saturates first under load and scale or protect that - read replicas and connection pooling for the database, capacity for the specific service that pins. A CDN in front, so static and cacheable traffic never reaches your origin at all, removes a huge share of the load before scaling even has to engage, which is the same leverage as using CloudFront to cut page load and offload the origin.
For known peaks, pre-scale - do not react
Reactive scaling is for surprises. A product launch, a flash sale, or Black Friday is not a surprise - you know the date and roughly the size. For those, waiting for the metric to cross a threshold is the wrong move, because reactive scaling is always a step behind the curve. Use scheduled scaling to raise the floor before the doors open: set minimum capacity up an hour before, hold it through the event, and let it fall afterwards. Predictive scaling, which forecasts from historical patterns, can add to this for recurring cycles.
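The pre-scaling pattern described above can be sketched as two scheduled actions for the EC2 Auto Scaling `put_scheduled_update_group_action` call: one that raises the minimum an hour before the event and one that restores it afterwards. The group name, action names, sizes, and dates are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def event_schedule(group_name: str, event_start: datetime, event_end: datetime,
                   normal_min: int, event_min: int, max_size: int,
                   lead_time: timedelta = timedelta(hours=1)) -> list:
    """Build two scheduled actions: raise the floor before the event,
    then restore the normal floor once it ends."""
    if event_end <= event_start:
        raise ValueError("event_end must be after event_start")
    if not normal_min <= event_min <= max_size:
        raise ValueError("expected normal_min <= event_min <= max_size")
    raise_floor = {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": "flash-sale-pre-scale",  # hypothetical name
        "StartTime": event_start - lead_time,
        "MinSize": event_min,
        "MaxSize": max_size,
    }
    restore_floor = {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": "flash-sale-restore",  # hypothetical name
        "StartTime": event_end,
        "MinSize": normal_min,
        "MaxSize": max_size,
    }
    return [raise_floor, restore_floor]


# Hypothetical sale: 09:00 to 12:00 UTC on 28 November 2025.
actions = event_schedule(
    group_name="storefront-web-asg",
    event_start=datetime(2025, 11, 28, 9, 0, tzinfo=timezone.utc),
    event_end=datetime(2025, 11, 28, 12, 0, tzinfo=timezone.utc),
    normal_min=4, event_min=20, max_size=40,
)
for action in actions:
    print(action["ScheduledActionName"], action["StartTime"].isoformat(), action["MinSize"])

# To apply them for real (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("autoscaling")
#   for action in actions:
#       client.put_scheduled_update_group_action(**action)
```

Restoring the minimum does not terminate anything by itself. It only removes the floor, so the scale-in side of your scaling policy is what brings the fleet back down.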
And rehearse. The only way to know your stack survives 10x is to push 10x at it in a load test before the event, watch what breaks first, fix that, and repeat. Teams that load-test discover the limit in a rehearsal they control; teams that do not discover it live, in front of paying customers. The reliability engineering that makes peaks boring is the core of any serious managed cloud service, and scoping it properly is what a focused AWS consulting engagement exists to do.
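As a toy illustration of what a load test measures, the sketch below fires concurrent calls and reports the error rate and p95 latency. It uses only the standard library and a caller-supplied `send` function, which is a hypothetical stand-in for one request to your store. A real rehearsal would normally use a dedicated load-testing tool against a production-like environment, never an unannounced run against live infrastructure.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_load(send: Callable[[], bool], total_requests: int, concurrency: int) -> dict:
    """Call `send` total_requests times across `concurrency` threads and
    summarise the error rate and p95 latency (nearest-rank method)."""
    if total_requests <= 0 or concurrency <= 0:
        raise ValueError("total_requests and concurrency must be positive")

    def timed_call(_: int):
        started = time.perf_counter()
        try:
            ok = bool(send())
        except Exception:
            ok = False  # any exception counts as a failed request
        return time.perf_counter() - started, ok

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(latency for latency, _ in results)
    errors = sum(1 for _, ok in results if not ok)
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    return {
        "requests": total_requests,
        "error_rate": errors / total_requests,
        "p95_seconds": latencies[p95_index],
    }


def fake_request() -> bool:
    """Stand-in for a real HTTP call so the sketch runs anywhere."""
    time.sleep(0.001)
    return True


print(run_load(fake_request, total_requests=50, concurrency=10))
```

Repeating the run at rising concurrency and watching where p95 latency or the error rate bends is the rehearsal the paragraph above describes: the step where it bends points at the component to fix first.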
Takeaways
- Fixed capacity crashes on spikes; auto scaling tracks demand up and down, fixing reliability and cost at once.
- Default to target tracking on a user-pain metric (requests-per-target or p95 latency), not hand-tuned step rules.
- Close the speed gap with warm pools and fast-starting containers, and scale the real bottleneck - often the database, not the web tier.
- For known events, schedule capacity ahead and load-test first, so you find the limit in rehearsal, not live.
Do not forget to scale back down
The half of auto scaling everyone forgets is the scale-in. If capacity ramps up for the sale and never comes back down, you have quietly turned a two-hour peak into a month-long bill, and the cost saving that justified auto scaling evaporates. Set sensible scale-in policies with cooldowns so the fleet shrinks after the wave without flapping up and down on every minor dip. The whole point is to pay for the peak only while you are in it. Capacity that goes up and stays up is just over-provisioning with extra steps.
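A sketch of asymmetric cooldowns, assuming the spiky tier runs as an ECS service scaled through Application Auto Scaling. The cluster and service names, the CPU target, and the cooldown values are assumptions to tune, and the service must already be registered as a scalable target, which is not shown.

```python
def ecs_tracking_policy(cluster: str, service: str, cpu_target: float,
                        scale_out_cooldown: int = 60,
                        scale_in_cooldown: int = 300) -> dict:
    """Build Application Auto Scaling put_scaling_policy parameters with a
    short scale-out cooldown and a longer scale-in cooldown, so the service
    grows quickly but shrinks without flapping."""
    if scale_in_cooldown < scale_out_cooldown:
        raise ValueError("scale-in cooldown should not be shorter than scale-out")
    return {
        "PolicyName": "cpu-target-tracking",  # hypothetical name
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(cpu_target),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
            },
            "ScaleOutCooldown": scale_out_cooldown,  # seconds
            "ScaleInCooldown": scale_in_cooldown,    # seconds
            "DisableScaleIn": False,  # scale-in must stay enabled to save money
        },
    }


ecs_policy = ecs_tracking_policy("storefront", "checkout", cpu_target=60)  # placeholder names
print(ecs_policy["ResourceId"])

# To apply it for real (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("application-autoscaling").put_scaling_policy(**ecs_policy)
```

The design choice is the asymmetry: adding capacity late costs orders, while removing it a few minutes late costs only a few minutes of spend.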
Want a store that stays up when the campaign works?
Talk to a team that designs auto scaling, warm pools, and pre-scaling for UK and US ecommerce on AWS - and load-tests it before your peak so sale day is boring. No pitch, reply in 2 hrs.
FAQ
Why does my site still crash even though auto scaling is on?
Usually one of two reasons: capacity arrives too late, because new instances take minutes to boot and the spike can peak before they are in service, or scaling adds web servers while the real bottleneck is the database or a single service. Warm pools or fast-starting containers fix the speed problem, and finding the true bottleneck under a load test fixes the wrong-component problem.
What metric should I scale on?
Scale on the signal that reflects user pain first. For compute-bound apps, CPU works; for web tiers, requests-per-target on the load balancer or p95 latency is often a truer indicator that real users are waiting. The right metric is whichever one degrades first under your load - which you learn by testing, not by guessing.
How do I prepare for a known event like Black Friday?
Do not rely on reactive scaling alone. Use scheduled scaling to raise minimum capacity before the event starts, hold it through, and scale down after. Add predictive scaling for recurring patterns. Most importantly, load-test at the traffic you expect beforehand so you find and fix the first breaking point in rehearsal rather than live.
Does auto scaling save money or cost more?
Done correctly it saves money, because you pay for high capacity only during the peak instead of running it all the time. The trap is forgetting scale-in: if capacity ramps up and never comes back down, you turn a short peak into a permanent bill. Sensible scale-in policies with cooldowns are what make auto scaling a cost control rather than a cost.
The takeaway: a flash sale should be your best day, not your worst outage. Default to target tracking on a metric that reflects real user pain, close the speed gap with warm pools and fast-starting compute, scale the bottleneck rather than the web tier, pre-scale and load-test for events you can see coming - and always scale back down so the peak does not bill you all month.
Founder and CEO of Braincuber. Has scoped and shipped 500+ Odoo, AI, and cloud projects for US mid-market and global brands. Takes every founder call personally — no SDR layer between buyers and the people building the system.
