A $4M beauty brand's BFCM flash sale strategy for the last two years: one camera on a tripod for YouTube Live, one phone propped on a box at a slightly different angle for TikTok LIVE. The social manager watches both feeds from a laptop, jumps between the two platform dashboards, and manually repositions the phone whenever the presenter moves too far from center. Two hours of live content, three people actively involved in production, and two completely independent streams with different lighting, audio sync, and framing — none of it planned to match.
The YouTube stream looks professional. The TikTok stream looks like what it is: a phone on a box.
AWS Elemental MediaLive with Elemental Inference smart crop solves this with a single channel. One camera feeds MediaLive. Elemental Inference analyzes every frame and produces subject-tracking metadata: where the presenter is and which subject dominates the frame at any moment. MediaLive uses that metadata to generate two simultaneous outputs: the original 1280×720 landscape for YouTube, and a 720×1280 portrait for TikTok with the subject automatically centered and tracked. The presenter walks left and the vertical crop follows. They hold up a product and lean in, and the crop stays with them. No phone on a box, no social manager repositioning equipment mid-stream.
Running D2C live events and managing multiple platform feeds manually? Book a 30-min audit — Dhwani joins every call, we review your current live stream setup and map the MediaLive architecture to your specific event cadence. Written brief inside a week. No SDR layer.
What Smart Crop Actually Does — and Does Not Do
Smart crop is spatial reframing, not content-aware editing. Elemental Inference does not cut clips, add transitions, detect product names, or select highlight moments. What it does: analyze each video frame, identify the dominant subject using computer vision, and output frame-level bounding box metadata specifying where in the 16:9 frame that subject sits. MediaLive reads that metadata and applies a pan-and-crop transformation to produce the vertical output with the subject positioned in the upper-center of the 9:16 frame — where mobile viewers look first.
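The pan-and-crop step can be pictured with a little geometry. The sketch below is only an illustration of the idea, not MediaLive's actual algorithm: the `Box` shape and the `vertical_crop_window` function are hypothetical names, and it handles horizontal panning only (no zoom, no upper-center vertical placement).

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Subject bounding box in source pixels (hypothetical metadata shape)."""
    x: float        # left edge
    y: float        # top edge
    width: float
    height: float


def vertical_crop_window(box, src_w=1280, src_h=720, aspect_w=9, aspect_h=16):
    """Return (x, y, w, h) of a 9:16 window inside the source frame,
    centered horizontally on the subject and clamped to the frame edges."""
    crop_h = src_h                                      # use the full source height
    crop_w = int(round(crop_h * aspect_w / aspect_h))   # 720 * 9 / 16 = 405
    subject_cx = box.x + box.width / 2                  # horizontal center of the subject
    x = int(round(subject_cx - crop_w / 2))
    x = max(0, min(x, src_w - crop_w))                  # keep the window inside the frame
    return x, 0, crop_w, crop_h


# Presenter near the middle of a 1280x720 frame
print(vertical_crop_window(Box(x=600, y=100, width=81, height=300)))
# (438, 0, 405, 720)
```

The 405×720 window is what would then be scaled up to the 720×1280 portrait output. A presenter standing at the far left or right simply pins the window to that edge of the frame.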
The subject detection works on faces and human figures. For a D2C live stream where a presenter is demonstrating a product, the presenter is the tracked subject. When they hold the product up and lean in, the crop tracks toward them. When they step back to show the full packaging, the crop zooms out within the vertical frame to maintain the subject in view. This is the behavior that makes the vertical output usable rather than just technically valid: a center crop of a 16:9 stream that was shot for landscape viewing looks like a rectangle punched out of the middle. Smart crop looks like the stream was intentionally shot vertical.
What smart crop does not fix: a horizontal camera composition with wide establishing shots, heavy background elements, or a presenter who spends significant time at the frame edges. Smart crop works best when the horizontal stream was shot with at least some awareness that a vertical output would be generated — keeping the presenter in the central 70% of the frame rather than the full width.
The Live Event Architecture for a D2C Flash Sale
The full workflow for a dual-output D2C flash sale stream:
Camera and encoder. The camera (or cameras) feeds a hardware encoder or a software encoder like OBS. OBS or the hardware encoder sends an RTMP stream to a MediaLive input endpoint. MediaLive supports RTMP push inputs, so no additional ingest infrastructure is needed — the encoder pushes directly to the MediaLive channel URL and stream key generated during channel setup.
MediaLive channel configuration. The channel is configured with two output groups. Output group 1: RTMP output pointing at the YouTube Live RTMP ingest URL, video encoded at 1280×720, QVBR rate control at 4 Mbps, standard H.264. Output group 2: RTMP output pointing at TikTok LIVE's RTMP ingest URL, video encoded at 720×1280, QVBR rate control at 1.5 Mbps, with SMART_CROP set as the scaling behavior on the output encode. Elemental Inference is associated with the channel and runs frame analysis continuously for the duration of the live event.
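As a rough sketch of what the two video encodes look like in a channel definition, the fragment below builds the `VideoDescriptions` entries as plain Python dicts in the shape the MediaLive API uses. The description names are hypothetical, the `SMART_CROP` value is taken from the text above and should be checked against the current API reference, and output groups, audio, and the Elemental Inference association are left out.

```python
def video_description(name, width, height, max_bitrate, scaling_behavior="DEFAULT"):
    """Build one MediaLive VideoDescription dict with H.264 QVBR settings."""
    return {
        "Name": name,
        "Width": width,
        "Height": height,
        "ScalingBehavior": scaling_behavior,
        "CodecSettings": {
            "H264Settings": {
                "RateControlMode": "QVBR",
                "MaxBitrate": max_bitrate,  # bits per second
            }
        },
    }


# Landscape encode for the YouTube output group (name is hypothetical)
landscape = video_description("youtube_720p", 1280, 720, 4_000_000)

# Portrait encode for the TikTok output group; SMART_CROP is assumed
# from the article text, not verified against the API reference
portrait = video_description(
    "tiktok_vertical", 720, 1280, 1_500_000, scaling_behavior="SMART_CROP"
)

video_descriptions = [landscape, portrait]
```

Each output in an output group then refers to one of these descriptions by its `Name`.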
Archive to S3. A third output group — an HLS archive output to S3 — captures both the landscape and portrait streams as segmented files during the live event. Post-event, the S3 archive becomes the source for Reels clips, Shorts repurposing, and any highlight edits. The vertical archive is already in the correct format; the Reels editor pulls the relevant time segments without any reframing work.
Stream health monitoring. CloudWatch metrics from MediaLive — OutputVideoFrameRate, NetworkOut, and ActiveAlerts — feed a dashboard that the social manager monitors during the event. A CloudWatch Alarm on ActiveAlerts sends an SNS notification to Slack if the channel enters a degraded state, giving the team a heads-up before viewers notice stream quality drop. For a deeper look at how AWS monitoring integrates with D2C incident response, see our D2C incident investigation post.
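A minimal sketch of the ActiveAlerts alarm, assuming boto3 and an existing SNS topic. The alarm name, the one-minute period, and the single-pipeline dimension value are illustrative choices, and how the SNS topic reaches Slack is not shown.

```python
def active_alerts_alarm(channel_id, sns_topic_arn, pipeline="0"):
    """Build put_metric_alarm arguments that fire when a MediaLive
    pipeline reports at least one active alert."""
    return {
        "AlarmName": f"medialive-{channel_id}-active-alerts",  # hypothetical naming
        "Namespace": "AWS/MediaLive",
        "MetricName": "ActiveAlerts",
        "Dimensions": [
            {"Name": "ChannelId", "Value": channel_id},
            {"Name": "Pipeline", "Value": pipeline},
        ],
        "Statistic": "Maximum",
        "Period": 60,                      # evaluate once per minute
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # a stopped channel should not alarm
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(channel_id, sns_topic_arn):
    """Create the alarm in CloudWatch (requires AWS credentials)."""
    import boto3  # imported lazily so the builder above has no AWS dependency
    boto3.client("cloudwatch").put_metric_alarm(
        **active_alerts_alarm(channel_id, sns_topic_arn)
    )
```

Keeping the argument builder separate from the API call makes the alarm definition easy to review before the event without touching an AWS account.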
Cost Math for a Flash Sale Stream
AWS Elemental MediaLive pricing is per-hour for the duration the channel is running, based on the input and output configuration:
For an HD channel (1280×720 or higher) with a single RTMP input and two RTMP outputs plus one HLS archive output, expect approximately $2.00–$2.40 per hour for a single-pipeline channel (single-pipeline is appropriate for live events where a brief failover gap is acceptable; dual-pipeline adds redundancy and roughly doubles the cost).
Elemental Inference adds additional per-minute charges for the AI frame analysis running alongside the channel. AWS does not publish a flat per-minute rate separately from MediaLive in all regions, but in practice the inference cost adds 15–25% to the base MediaLive hourly rate for a channel with smart crop enabled.
For a 2-hour BFCM flash sale event: approximately $5–7 total AWS cost, all in. Compare that to:
- A freelance video editor reframing 2 hours of footage post-event: $120–200
- A second camera operator to run the TikTok LIVE setup: $150–300
- Lost TikTok and Reels reach from posting a center-cropped horizontal video days late: uncalculated but real
There is no MediaLive free tier. Every test run costs money. Budget a 30-minute test event ($1.50–2) before the first live flash sale to confirm the encoder settings, RTMP destinations, and smart crop behavior are working as expected.
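The arithmetic behind these estimates can be sketched in a few lines. The rates are the approximate figures quoted above, not published AWS prices, and the function covers channel runtime plus the inference uplift only. Data transfer and S3 storage are not modeled, so rounded totals will sit a little higher.

```python
def event_cost(hours, base_rate_per_hour, inference_uplift):
    """Channel runtime cost with Elemental Inference expressed as a
    fractional uplift on the base MediaLive hourly rate."""
    return hours * base_rate_per_hour * (1 + inference_uplift)


# 2-hour flash sale: low end ($2.00/h, +15%) and high end ($2.40/h, +25%)
low = event_cost(2, 2.00, 0.15)
high = event_cost(2, 2.40, 0.25)
print(f"2-hour event: ${low:.2f} to ${high:.2f}")
# 2-hour event: $4.60 to $6.00
```

The same function can be reused for a test run or a monthly schedule by changing `hours`.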
When MediaLive Is Overkill: The Pre-Recorded Product Video Pipeline
For pre-recorded product videos — the 60-second demo shot in the brand's studio, the unboxing video, the ingredient walkthrough — MediaLive is the wrong tool. MediaLive is a live video processing service; running it on a file that was recorded yesterday adds unnecessary complexity and cost.
The right pipeline for VOD product video reframing is a Lambda + Rekognition + FFmpeg pattern:
- Product video uploaded to S3 triggers a Lambda function
- Lambda calls Amazon Rekognition DetectFaces or DetectLabels on sampled frames to get subject bounding boxes
- Lambda calculates the optimal 9:16 crop window across the video timeline based on where the subject appears
- Lambda starts an ECS task running FFmpeg to apply the crop and encode the vertical output
- Vertical output lands in a separate S3 prefix, ready for Reels/TikTok upload via the platform API or manual post
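The crop-window step in the list above can be sketched as follows. This is a simplified illustration, not a production pipeline: it assumes Rekognition-style bounding boxes (ratio values for `Left`, `Top`, `Width`, `Height`), picks one static window for the whole clip from the median subject position rather than a moving window across the timeline, and `crop_filter` is a hypothetical helper name.

```python
from statistics import median


def crop_filter(boxes, src_w, src_h, out_w=720, out_h=1280):
    """Build an FFmpeg video filter string that crops a 9:16 window
    around the median subject position and scales it to the output size.

    boxes: BoundingBox dicts with ratio values in 0..1, one per sampled frame.
    """
    crop_h = src_h                              # use the full source height
    crop_w = int(src_h * out_w / out_h)         # 9:16 window width in pixels
    if boxes:
        centers = [(b["Left"] + b["Width"] / 2) * src_w for b in boxes]
        cx = median(centers)                    # robust to a few stray detections
    else:
        cx = src_w / 2                          # no detections: plain center crop
    x = int(cx - crop_w / 2)
    x = max(0, min(x, src_w - crop_w))          # keep the window inside the frame
    return f"crop={crop_w}:{crop_h}:{x}:0,scale={out_w}:{out_h}"


boxes = [{"Left": 0.25, "Top": 0.1, "Width": 0.5, "Height": 0.6}]
print(crop_filter(boxes, 1920, 1080))
# crop=607:1080:656:0,scale=720:1280
```

The resulting string would be passed to FFmpeg's `-vf` option inside the encode task.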
Cost per video processed: approximately $0.05–$0.20 depending on video length and the number of Rekognition API calls. This pipeline sits within our AI solutions for D2C offering — the same computer vision approach applies to product image processing, catalogue enrichment, and UGC moderation.
The decision rule: live events → MediaLive + Elemental Inference. Pre-recorded → Lambda + Rekognition + FFmpeg. Mixing them — using MediaLive to process pre-recorded files, or trying to build a live smart crop on Lambda — adds cost and complexity without benefit.
What Breaks in Practice
TikTok stream key rotation. TikTok LIVE requires a fresh RTMP stream key for each session. The stream key is not persistent. Before each flash sale event, the MediaLive output group's TikTok RTMP destination must be updated with the new stream key — a 2-minute configuration update in the MediaLive console or via the AWS CLI. Build this into the pre-event checklist; missing it means the TikTok output starts streaming to an expired endpoint and goes nowhere.
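One way to script the key rotation with boto3 is sketched below, assuming the channel is idle when it is updated and that the TikTok destination is identified by a known destination `Id`. The function names are hypothetical. The helper copies the channel's existing destination list and changes only the stream key, so the YouTube and archive destinations are passed back unchanged.

```python
import copy


def with_new_stream_key(destinations, destination_id, new_stream_key):
    """Return a copy of a channel's Destinations list with one RTMP
    destination's StreamName replaced; other destinations are untouched."""
    updated = copy.deepcopy(destinations)
    for dest in updated:
        if dest["Id"] == destination_id:
            for settings in dest.get("Settings", []):
                settings["StreamName"] = new_stream_key
            return updated
    raise KeyError(f"destination {destination_id!r} not found")


def rotate_tiktok_key(channel_id, destination_id, new_stream_key):
    """Fetch the current destinations and push the updated list back
    (requires AWS credentials)."""
    import boto3  # imported lazily so the helper above has no AWS dependency
    client = boto3.client("medialive")
    current = client.describe_channel(ChannelId=channel_id)["Destinations"]
    client.update_channel(
        ChannelId=channel_id,
        Destinations=with_new_stream_key(current, destination_id, new_stream_key),
    )
```

Running this as the first item on the pre-event checklist, then starting the channel, avoids streaming to an expired key.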
Presenter frame discipline. Smart crop tracks the dominant subject but cannot compensate for wide horizontal compositions where the presenter routinely stands near the far edges of the frame. If the 16:9 shot is composed to show the full studio — props on the left, backdrop on the right, presenter in the center — the vertical output works well. If the presenter moves to demonstrate different products across the full horizontal width of the set, the vertical output will pan significantly and can feel disorienting. A 30-minute session with the presenter to show them the vertical preview output before the event produces noticeably better results.
Inference latency during subject transitions. Elemental Inference introduces a short analysis lag, typically 2 to 4 frames (roughly 67 to 133 ms at 30 fps), between the subject moving and the vertical crop following. For standard flash sale content, this is imperceptible. For live commerce integrations where viewers are expected to tap a product in real time as it appears on screen, confirm that the latency is within the platform's interactive commerce timing requirements before relying on smart crop for product reveal moments.
Flash sale coming up? Book 30 minutes with Dhwani — we size the MediaLive channel for your stream volume, review the OBS settings and RTMP destinations, and set up the S3 archive for post-event Reels repurposing. Written brief inside a week.
Frequently Asked Questions
Can the MediaLive smart crop output go directly to Instagram Reels?
Not directly from MediaLive during the live event. Instagram does not support third-party RTMP ingest for Reels the way TikTok and YouTube do. The workflow for Reels: archive the smart-cropped vertical stream to S3 during the live event, then post-event, publish the S3 clip to Reels via the Instagram Graph API or manually. The vertical output is already in the correct 9:16 format — no re-encoding needed before upload. For Instagram Live specifically (not Reels), third-party RTMP is supported for Creator and Business accounts, so a MediaLive output group pointing at the Instagram Live RTMP endpoint works during the event itself with smart crop applied.
What happens when Elemental Inference can't detect a subject in the frame?
When Elemental Inference has no subject to track — the presenter steps out of frame, the shot is a static product close-up with no face, or low lighting reduces detection confidence — smart crop defaults to a center crop of the 16:9 frame. Center crop is what most D2C brands use today for their vertical social content, so the fallback is no worse than the current baseline. The main place this matters: during product close-up shots where the camera pans away from the presenter to show product detail, the vertical output will center-crop the horizontal shot rather than tracking to the product label or texture being shown. Composing product close-ups to keep the subject centered in the horizontal frame avoids this limitation.
Should every D2C live stream use this, or only high-stakes events?
The MediaLive cost structure favors episodic events over continuous daily streams. At approximately $2.00–$2.40 per hour plus Elemental Inference, a weekly 30-minute product demo stream comes to roughly $5–7 per month in channel runtime (about 2.2 channel-hours), and more once pre-show warm-up and test runs are counted. That is reasonable if both TikTok and YouTube audiences are meaningful for the brand. For daily streams, the cost compounds, and dedicated stream management platforms with flat monthly multi-destination fees are worth comparing. The build makes the strongest case for quarterly or monthly high-stakes events such as BFCM flash sales, collection launches, and influencer reveal events, where the production investment has a clear revenue return and the S3 archive becomes a content asset for weeks of post-event Reels and Shorts repurposing.
Founder and CEO of Braincuber. Has scoped and shipped 500+ Odoo, AI, and cloud projects for US mid-market and global brands. Takes every founder call personally — no SDR layer between buyers and the people building the system.
