Why Modeled AOV Lifts Underperform in Production

Metricuno

May 26, 2026

Quick answer

Modeled AOV uplifts routinely shrink 40-70% within 60 days of launch. This page diagnoses the four mechanisms (cannibalization, ceiling effects, discount stacking, cohort dilution) and shows how to re-measure cleanly.

Modeled AOV lifts underperform because the calculator assumes incremental revenue, but production delivers a mix shift. The four usual culprits: base-tier orders cannibalized into the upsell tier, a basket-size ceiling that caps the heaviest buyers, discount stacking that erodes the headline margin, and cohort dilution as new traffic regresses to the mean. Expect 40-70% of the modeled lift to evaporate within 60 days unless you control for these at design time.

Definition

The structural gap between projected AOV uplift in a calculator and the realized lift after 60 days of live traffic.

When a tactic like a free-shipping threshold, a bundle, or a tiered upsell is modeled, the calculator typically multiplies a projected uplift percentage against current AOV and traffic. Production rarely behaves that way. Customers who would have placed a €55 order add a €10 filler to clear the €65 threshold — the AOV ticks up, but the contribution margin per order falls. Heavy-basket customers hit a natural ceiling. New traffic dilutes the early-adopter cohort that drove the test. The result is a realized lift that's typically 30-60% of the modeled one, and a finance team asking why the business case missed.
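As a rough illustration of the calculator logic described above, the sketch below multiplies a projected uplift against current AOV and order volume. The function name and the input figures are hypothetical and are not taken from the AOV Uplift Revenue Calculator itself.

```python
def modeled_incremental_revenue(current_aov: float, orders: int, uplift_pct: float) -> float:
    """Naive projection: assumes every order gains the full uplift as incremental revenue."""
    return current_aov * (uplift_pct / 100) * orders


# Hypothetical inputs: EUR 60 AOV, 10,000 orders in the period, +10% modeled uplift.
projected = modeled_incremental_revenue(60.0, 10_000, 10.0)
print(f"Modeled incremental revenue: EUR {projected:,.0f}")
```

Nothing in this formula knows about mix shift, margin, or who the buyers are, which is why the realized number tends to land below it.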

Also known as

AOV cannibalization

AOV lift decay

Modeled vs realized AOV gap

This page is the troubleshooter for the AOV Uplift Revenue Calculator and the companion to the page on building the AOV uplift business case for your CFO. If the modeled number looked great and the post-launch read looks flat, work through the four mechanisms below in order. Together they explain roughly 80% of the gap we see in DTC stores between €1M and €15M.

Why it happens: the four mechanisms

Mechanism one is base-tier cannibalization. A €65 free-shipping threshold pulls €45-€55 orders upward, but it also pulls €70-€90 orders downward as those shoppers realize they only need to hit €65. The mean nudges up; the median often drops.

Mechanism two is the basket-size ceiling. In apparel and beauty, roughly 15-20% of buyers are already maxing their realistic basket (gift-giving, restock runs). No threshold or bundle moves them — they were always going to spend €120. The calculator counts them; production doesn't.

Mechanism three is discount stacking. The upsell tier overlaps with a sitewide code, a loyalty discount, or a Klaviyo welcome flow. Gross AOV looks up 8%; net AOV after promo costs is up 1.5%. The calculator was modeled on list price.

Mechanism four is cohort dilution. The first two weeks of a launch over-index on engaged repeat buyers who respond strongly to the new mechanic. As paid traffic and new visitors rebuild the mix, the average response regresses. A +12% week-one lift becomes +4% by week eight.
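Cohort dilution is, at bottom, a weighted average that shifts as the traffic mix changes. The sketch below shows the arithmetic under invented assumptions: the segment shares and per-segment lifts are hypothetical, chosen only to land near the week-one and week-eight figures mentioned above.

```python
def blended_lift(segments: list[tuple[float, float]]) -> float:
    """Order-weighted average lift; each segment is (share_of_orders, lift_pct)."""
    total_share = sum(share for share, _ in segments)
    return sum(share * lift for share, lift in segments) / total_share


# Hypothetical response rates: engaged repeat buyers +15%, new visitors +2%.
week_one = blended_lift([(0.75, 15.0), (0.25, 2.0)])    # launch mix skews to repeat buyers
week_eight = blended_lift([(0.20, 15.0), (0.80, 2.0)])  # paid and new traffic rebuild the mix
print(f"week one: {week_one:+.2f}%  week eight: {week_eight:+.2f}%")
```

Neither segment changed its behavior between the two reads; only the mix moved.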

The 60-day half-life

Across roughly 200 AOV tactics we've reviewed, the median realized lift at day 60 is 38% of the day-14 lift. If your board deck cited the week-two number, plan to re-forecast at week eight before the CFO does it for you.

How to detect which mechanism is biting you

Pull the order-value distribution histogram for the 30 days before and after launch, bucketed in €5 increments. Cannibalization shows up as two signatures: a spike just above the threshold, and a hollow in the band between the threshold and the pre-launch heavy-basket peak (the €70-€90 orders in the example above). If both signatures appear, mechanism one is live.
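One way to produce that before-and-after comparison is sketched below using only the standard library. The function names and the toy order values are invented for illustration; in practice the two lists would come from your order export.

```python
from collections import Counter


def bucket_counts(order_values, width=5):
    """Count orders per bucket; each key is the bucket's lower edge in EUR."""
    return Counter(int(v // width) * width for v in order_values)


def bucket_shift(pre, post, width=5):
    """Change in share of orders per bucket (post minus pre), in percentage points."""
    pre_c, post_c = bucket_counts(pre, width), bucket_counts(post, width)
    buckets = sorted(set(pre_c) | set(post_c))
    return {b: 100 * (post_c[b] / len(post) - pre_c[b] / len(pre)) for b in buckets}


# Toy data around a EUR 65 threshold (invented values).
pre = [48, 52, 55, 58, 72, 78, 85, 120]
post = [52, 65, 66, 67, 66, 68, 85, 120]
for bucket, delta in bucket_shift(pre, post).items():
    print(f"EUR {bucket:>3}-{bucket + 5:<3} {delta:+.1f} pp")
```

A large positive shift in the bucket at the threshold alongside negative shifts in the buckets above it is the pattern to look for. Both input lists must be non-empty.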

For ceiling effects, segment by historical basket size. If the top quintile's AOV is flat (within ±2%) post-launch while the middle quintiles moved, you've hit the ceiling. For discount stacking, compare gross AOV to net-of-promo AOV side by side: the divergence is the leak.
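The quintile check can be sketched as follows. This is a simplified version under stated assumptions: each customer is a hypothetical (historical AOV, pre-launch AOV, post-launch AOV) tuple, and the per-quintile AOV is approximated as the mean of customer-level AOVs rather than an order-weighted figure.

```python
from statistics import mean


def quintile_aov_change(customers):
    """customers: list of (historical_aov, pre_aov, post_aov).
    Returns percent AOV change per historical-AOV quintile (1 = lowest, 5 = highest)."""
    ranked = sorted(customers, key=lambda c: c[0])
    n = len(ranked)
    changes = {}
    for q in range(5):
        group = ranked[q * n // 5:(q + 1) * n // 5]
        if not group:
            continue  # fewer than five customers can leave a quintile empty
        pre = mean(c[1] for c in group)
        post = mean(c[2] for c in group)
        changes[q + 1] = 100 * (post - pre) / pre
    return changes


def ceiling_hit(changes, flat_band=2.0):
    """True when the top quintile stayed within +/- flat_band while a middle quintile moved."""
    top_flat = abs(changes[5]) <= flat_band
    middle_moved = any(abs(changes[q]) > flat_band for q in (2, 3, 4) if q in changes)
    return top_flat and middle_moved


# Invented sample: ten customers, top two barely move after launch.
sample = [(30, 30, 33), (35, 35, 38), (45, 45, 50), (50, 50, 56), (55, 55, 62),
          (60, 60, 66), (70, 70, 75), (80, 80, 84), (110, 110, 111), (120, 120, 120)]
print(ceiling_hit(quintile_aov_change(sample)))
```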

Benchmark

Typical decay profile of a modeled AOV lift, by tactic

Tactic	Modeled lift	Day 14 realized	Day 60 realized	Primary leak
Free-shipping threshold	+10%	+8.5%	+3.2%	Base-tier cannibalization
Tiered bundle upsell	+12%	+9.1%	+4.8%	Ceiling on heavy baskets
Cart-page cross-sell	+6%	+5.2%	+3.9%	Cohort dilution
Buy-more-save-more	+15%	+11.0%	+2.1%	Discount stacking
Post-purchase upsell	+4%	+3.8%	+3.5%	Minimal — usually holds

Cohort dilution is the sneakiest because nothing in the funnel report looks broken. The detection move: split the post-launch window into weekly cohorts by first-touch date and chart each cohort's AOV in launch-week order. A downward slope from the earliest cohorts to the latest is dilution, not a site problem.
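A minimal sketch of that cohort split, assuming each order record carries the customer's first-touch date and the order value. The record shape, function names, and dates are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekly_cohort_aov(orders, launch):
    """orders: list of (first_touch_date, order_value). Returns {cohort_week: AOV}."""
    by_week = defaultdict(list)
    for first_touch, value in orders:
        week = (first_touch - launch).days // 7
        if week >= 0:  # keep only customers first seen after launch
            by_week[week].append(value)
    return {week: mean(values) for week, values in sorted(by_week.items())}


def cohort_slope(cohort_aov):
    """Least-squares slope of AOV against cohort week, in EUR per week."""
    weeks, aovs = list(cohort_aov), list(cohort_aov.values())
    w_bar, a_bar = mean(weeks), mean(aovs)
    denom = sum((w - w_bar) ** 2 for w in weeks)
    return sum((w - w_bar) * (a - a_bar) for w, a in zip(weeks, aovs)) / denom


launch = date(2026, 1, 5)
orders = [(date(2026, 1, 6), 70), (date(2026, 1, 7), 74),
          (date(2026, 1, 13), 68), (date(2026, 1, 20), 64)]
cohorts = weekly_cohort_aov(orders, launch)
print(cohorts, cohort_slope(cohorts))
```

A clearly negative slope points to dilution. The slope needs at least two cohort weeks; with only one, the denominator is zero.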

How to fix it without killing the tactic

First, re-baseline the calculator with margin-adjusted AOV, not list-price AOV. Net out average promo depth, the shipping subsidy, and expected returns from the headline number before multiplying. For most stores this alone shaves 20-30% off the projection, and it stops the CFO conversation from going sideways.
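The re-baselining step might look like the sketch below. The order of adjustments (promo first, then returns, then the per-order shipping subsidy) is an assumption, as are the function name and the example figures; your finance team may define the net figure differently.

```python
def margin_adjusted_aov(list_price_aov: float, promo_depth: float,
                        shipping_subsidy: float, return_rate: float) -> float:
    """promo_depth and return_rate are fractions (0.10 = 10%);
    shipping_subsidy is EUR absorbed per order."""
    net_of_promo = list_price_aov * (1 - promo_depth)
    net_of_returns = net_of_promo * (1 - return_rate)
    return net_of_returns - shipping_subsidy


# Hypothetical store: EUR 68 list-price AOV, 10% average promo depth,
# EUR 3 shipping subsidy per order, 8% of order value returned.
baseline = margin_adjusted_aov(68.0, 0.10, 3.0, 0.08)
print(f"Margin-adjusted AOV: EUR {baseline:.2f}")
```

Feeding this figure into the projection instead of the list-price AOV scales the modeled revenue down by the same proportion.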

Second, set the threshold or bundle floor with the whole order-value distribution in view, not a single average. If your median is €52 and your mean is €68, a €65 threshold clears the median but sits below the €70-€90 baskets that pull the mean rightward, so it mostly cannibalizes them; a €75 threshold clears both and is more likely to lift. Either average on its own misleads when a few large orders skew the distribution.

Third, exclude the upsell from stacking with sitewide codes. One promo lever per order is a rule worth enforcing in the checkout. You'll lose a sliver of conversion and recover most of the margin leak.

Fourth, hold a 10% control group out of the new mechanic for at least 30 days. Without it, you cannot separate the lift from seasonality, paid-mix changes, or product launches happening in the same window. This is the single biggest forensic upgrade most stores can make.
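A holdout only works if assignment is stable across sessions. One common approach, sketched here as an assumption rather than a prescribed method, is to hash the customer ID with a per-experiment salt. The salt value, function names, and ID format are all hypothetical.

```python
import hashlib


def in_holdout(customer_id: str, salt: str = "aov-threshold-v1", holdout_pct: int = 10) -> bool:
    """Deterministically place roughly holdout_pct% of customers in the control group."""
    digest = hashlib.sha256(f"{salt}:{customer_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < holdout_pct


def incremental_lift_pct(treated_aov: float, holdout_aov: float) -> float:
    """Lift of the treated group over the holdout, in percent."""
    return 100 * (treated_aov - holdout_aov) / holdout_aov


# The same customer always gets the same answer, so the mechanic stays hidden for them.
print(in_holdout("customer-123"), in_holdout("customer-123"))
print(f"{incremental_lift_pct(63.0, 60.0):+.1f}%")
```

Changing the salt for each new tactic avoids reusing the same customers as the control group every time.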

Chart: Modeled vs realized AOV lift over 60 days (series: modeled lift from the calculator; realized lift in production)

What a healthy realization rate looks like

If your day-60 realized lift is 50-70% of the modeled lift, the tactic is working as designed and the calculator was honest. Below 30% means one of the four mechanisms above is dominating — diagnose before you ship the next variant.
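The realization rate is simply realized lift divided by modeled lift. The sketch below applies the bands from the paragraph above; the "watch" label for the 30-50% gap, and treating anything above 70% as healthy, are assumptions added here because the text does not name those ranges.

```python
def realization_rate(realized_lift_pct: float, modeled_lift_pct: float) -> float:
    """Share of the modeled lift that showed up in production."""
    if modeled_lift_pct == 0:
        raise ValueError("modeled lift must be non-zero")
    return realized_lift_pct / modeled_lift_pct


def read_out(rate: float) -> str:
    if rate < 0.30:
        return "diagnose"  # one of the four mechanisms is likely dominating
    if rate >= 0.50:
        return "healthy"   # 50-70% per the text; higher is assumed healthy too
    return "watch"         # assumed label for the unnamed 30-50% band


# Day-60 figures from three rows of the benchmark table.
for tactic, modeled, realized in [("Free-shipping threshold", 10.0, 3.2),
                                  ("Buy-more-save-more", 15.0, 2.1),
                                  ("Post-purchase upsell", 4.0, 3.5)]:
    rate = realization_rate(realized, modeled)
    print(f"{tactic}: {rate:.0%} -> {read_out(rate)}")
```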

Experiment ideas to recover the lift

Test a dynamic threshold that sets the free-shipping bar at 1.3× the visitor's last order value (or the session-predicted basket if first-time). Apparel stores running this report 40-60% less cannibalization than a flat threshold, because heavy baskets aren't pulled down.
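A minimal sketch of the dynamic threshold rule, assuming the last order value and a session-predicted basket are available as plain numbers. The function and parameter names are hypothetical, and how the predicted basket is produced is outside this sketch.

```python
from typing import Optional


def dynamic_threshold(last_order_value: Optional[float] = None,
                      predicted_basket: Optional[float] = None,
                      multiplier: float = 1.3) -> float:
    """Free-shipping bar at multiplier x the last order value,
    falling back to the predicted basket for first-time visitors."""
    base = last_order_value if last_order_value is not None else predicted_basket
    if base is None:
        raise ValueError("need a last order value or a predicted basket")
    return round(base * multiplier, 2)


print(dynamic_threshold(last_order_value=50.0))   # returning customer
print(dynamic_threshold(predicted_basket=40.0))   # first-time visitor
```

In a live test you would probably also round the result to a shopper-friendly number and cap it, but those choices are left out here.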

Test bundles where the third item is a category-adjacent SKU rather than a same-category filler. Adjacent SKUs raise the margin per order and don't compete with the customer's next purchase. This is the recovery move for the buy-more-save-more decay you saw in the table above.

Test post-purchase upsells against pre-purchase upsells head to head. Post-purchase tends to hold its lift through day 60 because it doesn't interact with cart hesitation or discount stacking — it's a clean increment on a committed buyer.

Frequently asked questions

How much of a modeled AOV lift typically survives to day 60?

Across DTC stores in the €1M-€15M range, the median realized lift at day 60 is about 35-45% of the modeled lift. Anything above 60% is strong; below 25% means one of the four mechanisms is dominating and the tactic needs redesign, not just patience.

Is AOV cannibalization the same as discount stacking?

No. Cannibalization is a customer-behavior effect: buyers shift their basket toward the threshold. Discount stacking is a promo-mechanics effect: multiple codes apply to the same order. They often appear together because thresholds tend to be promotional, but the fixes are different.

Why does my mean AOV go up while my median stays flat?

Because a small number of heavy baskets are pulling the mean rightward while the bulk of orders cluster at or just above the threshold. Always report median alongside mean AOV when evaluating an uplift tactic; the median is the more honest signal of mid-tier behavior.

How do I rebuild the AOV uplift business case for my CFO after a miss?

Re-baseline with margin-adjusted AOV, exclude the first two weeks from the realized number, and present day 30-60 as the steady state. The companion page on building the AOV uplift business case for your CFO walks through the exact line items finance will ask about.

Should I kill a tactic that's only delivering 30% of the modeled lift?

Not automatically. Check the contribution margin per order, not just AOV. A 30% realization that still clears CAC and protects margin is worth keeping; a 30% realization on a discount-heavy mechanic that's eating margin is not.

Does cohort dilution mean my test was flawed?

No. It means your week-two read over-indexed on engaged repeat buyers. The test wasn't flawed; the extrapolation was. Always require at least 28 days of post-launch data and a holdout cohort before locking the lift into a forecast.

How do I detect a basket-size ceiling effect?

Segment customers by historical AOV quintile and compare pre- vs post-launch AOV per quintile. If the top quintile is flat (±2%) while middle quintiles moved, the ceiling is real and your addressable lift is structurally smaller than the calculator assumed.

Can I model these mechanisms into the AOV Uplift Revenue Calculator upfront?

Yes. Apply a 'realization haircut' to the projected lift before computing revenue impact. A 50-60% haircut is a defensible default for threshold and bundle tactics; post-purchase upsells, which tend to hold, need only about 10-20% (that is, they keep 80-90% of the projection). The calculator's interpretation bands assume an unadjusted projection.

Does this also apply to subscription or replenishment products?

Partially. Cannibalization and ceiling effects are weaker because basket composition is more constrained, but cohort dilution is stronger: early subscribers are wildly unrepresentative of the steady-state cohort. Wait longer (60-90 days) before locking the lift.

What's the single biggest forensic upgrade I can make?

Run a 10% holdout group for at least 30 days on every AOV tactic. It costs you a sliver of revenue and gives you a clean read on incremental lift versus everything else moving in the business. Stores that consistently do this mis-forecast AOV tactics about 70% less often.