Holdout vs Pre/Post for Attributing Checkout-CR Lift to CAC
A practical comparison of the two control designs for attributing a checkout-CR lift to a CAC change in 30 days — geo/audience holdouts vs pre/post — and which one holds up under CFO scrutiny.
When a checkout-page experiment wins, finance does not pay you in conversion-rate points. They pay you in CAC reduction. To make that claim defensible inside a 30-day window, you need a control: either a slice of traffic that never saw the variant (a holdout) or a comparable prior period (pre/post).
Holdouts isolate the lift from spend and seasonality but cost you exposure and require enough volume to power the read. Pre/post is cheap and fast but inherits every confound the calendar throws at it — promo cadence, channel mix, paid-media auction shifts. Choosing between them is the difference between a CAC number that survives the CFO meeting and one that gets waved away.
The reason this choice matters is that checkout-CR sits one layer removed from CAC. A 6% relative lift on checkout doesn't mechanically reduce CAC by 6% — it reduces it only to the extent that paid traffic mix, AOV, and refund rate stay constant during the read. Your control design is what holds those constant on paper.
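The arithmetic behind that "one layer removed" point can be sketched in a few lines. This is a minimal illustration, assuming CAC is simply paid spend divided by new-customer orders and that orders scale one-to-one with checkout CR; the function name and the 4% spend drift are hypothetical.

```python
def cac_change(relative_cr_lift: float, relative_spend_change: float = 0.0) -> float:
    """Relative CAC change, assuming CAC = paid spend / orders.

    Assumption: sessions reaching checkout are unchanged, so orders scale
    by (1 + relative_cr_lift). Spend scales by (1 + relative_spend_change).
    """
    return (1.0 + relative_spend_change) / (1.0 + relative_cr_lift) - 1.0


# A 6% relative checkout-CR lift with spend held flat.
print(f"spend flat:      {cac_change(0.06):.2%}")
# The same lift while paid spend drifts up 4% during the read (hypothetical).
print(f"spend up 4%:     {cac_change(0.06, 0.04):.2%}")
```

Even with everything held constant, a 6% CR lift is about a 5.7% CAC reduction rather than 6%, and a modest spend drift during the read erodes most of it. That gap is what the control design has to account for.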
Most Shopify stores in the €1M–€15M range default to pre/post because they don't have the daily order volume to split a geo holdout cleanly. That's defensible — until paid spend changes mid-test or a Klaviyo flow goes out the same week. Then the CFO asks what would have happened anyway, and pre/post has no answer.
Holdout vs pre/post: what each control design actually controls for
| Dimension | Geo / audience holdout | Pre/post comparison |
|---|---|---|
| Controls for seasonality | Yes — concurrent | No — calendar-confounded |
| Controls for paid-spend shifts | Partial (if spend is geo-balanced) | No |
| Controls for promo / email cadence | Yes if randomised | No |
| Min daily orders to power 30-day read | ~120/day | ~60/day |
| Time to defensible CAC read | 21–30 days | 14–21 days |
| Revenue cost of running the control | 3–8% of variant-eligible sessions | Zero |
| CFO-defensibility score (1–5) | 4–5 | 2–3 |
The table makes the trade explicit: pre/post is cheaper and faster, but every row that matters for CAC attribution flips against it. If your finance team treats marketing claims as guilty until proven innocent, the cost of the holdout buys you the proof.
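Whether a holdout can be powered inside 30 days depends on volume, and a rough check is easy to run before committing. The sketch below uses a standard two-proportion z-test approximation with unequal arm sizes; it is not how the table's thresholds were derived, and the baseline CR, lift, holdout share, and daily session count in the example are all made-up inputs.

```python
import math
from statistics import NormalDist


def sessions_needed(base_cr: float, relative_lift: float, holdout_share: float,
                    alpha: float = 0.05, power: float = 0.8) -> int:
    """Total checkout sessions for a two-sided two-proportion z-test.

    Uses the unpooled-variance approximation; holdout_share is the fraction
    of eligible sessions kept on the control checkout.
    """
    p_control = base_cr
    p_variant = base_cr * (1.0 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    # Variance of the CR difference, expressed per total session.
    variance = (p_control * (1.0 - p_control) / holdout_share
                + p_variant * (1.0 - p_variant) / (1.0 - holdout_share))
    delta = p_variant - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


def days_to_power(total_sessions: int, daily_sessions: int) -> int:
    """Days of traffic needed to collect total_sessions."""
    return math.ceil(total_sessions / daily_sessions)


# Hypothetical store: 50% checkout completion, 6% relative lift to detect,
# 10% holdout, 400 checkout sessions per day.
total = sessions_needed(base_cr=0.50, relative_lift=0.06, holdout_share=0.10)
print(total, "sessions,", days_to_power(total, daily_sessions=400), "days")
```

With these illustrative inputs the read runs well past 30 days, which is the situation where a larger holdout share or a longer window has to be weighed against the revenue cost.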
When each design survives CFO scrutiny
A geo holdout survives when you can hold one or two markets (say, Belgium and Austria for a DACH-focused apparel brand) at the control variant while the rest of the EU sees the winning checkout. Paid spend in the holdout geos stays proportional, and you compare CAC in test geos vs holdout geos over the 30-day window. The read is concurrent, so a TikTok auction spike lands in both arms over the same days instead of on only one side of the comparison.
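The test-vs-holdout comparison described above reduces to two CAC figures over the same window. A minimal sketch, assuming per-arm totals for paid spend and new-customer orders are already available; the class, field names, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ArmTotals:
    """30-day totals for one arm (field names are hypothetical)."""
    paid_spend: float   # paid spend attributed to the arm's geos
    new_orders: int     # new-customer orders from the same geos

    @property
    def cac(self) -> float:
        if self.new_orders <= 0:
            raise ValueError("need at least one order to compute CAC")
        return self.paid_spend / self.new_orders


def cac_delta(test: ArmTotals, holdout: ArmTotals) -> tuple[float, float]:
    """Absolute and relative CAC difference, test arm versus holdout arm."""
    absolute = test.cac - holdout.cac
    return absolute, absolute / holdout.cac


# Made-up numbers purely for illustration.
test_geos = ArmTotals(paid_spend=94_000.0, new_orders=2_000)
holdout_geos = ArmTotals(paid_spend=10_000.0, new_orders=200)
absolute, relative = cac_delta(test_geos, holdout_geos)
print(f"CAC delta: {absolute:+.2f} ({relative:+.1%})")
```

The comparison baseline here is the holdout arm's CAC over the same 30 days, not a prior month's blended figure.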
Pre/post survives only when nothing else moved. That means a frozen ad budget, no new creative rotation, no promo, no email-list growth spike, and no major channel-mix shift. On a Shopify store running performance marketing, those conditions almost never co-occur for 30 consecutive days. Pre/post is then a directional sanity check, not a CAC claim.
The pre/post trap most teams fall into
If your variant shipped the same week as a Black Friday teaser email, a Meta budget increase, or a creative refresh, pre/post will attribute all of that lift to the checkout change. Finance will eventually notice, usually after you've raised the forecast on the back of it. A holdout, even a small one, prevents this.
What each design fails to control for
Holdouts don't control for cross-geo contamination — a customer who sees an ad in Belgium and buys on a German IP shows up in the test arm with a control-arm impression. They also leak when paid spend isn't actually geo-balanced; if Meta optimises bids toward the higher-CR test geos, the spend ratio drifts and your CAC comparison is no longer apples-to-apples. Lock spend per geo before you start.
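Spend drift of this kind is easy to monitor once daily spend per arm is exported. The sketch below flags days where the holdout's share of total paid spend moves away from the planned share; the input shape, the 10% planned share, and the 2-point tolerance are assumptions for illustration, not a recommended threshold.

```python
def spend_share_drift(daily_spend: dict[int, tuple[float, float]],
                      planned_holdout_share: float,
                      tolerance: float = 0.02) -> list[tuple[int, float]]:
    """Return (day, holdout_share) for days drifting beyond the tolerance.

    daily_spend maps a day index to (test_arm_spend, holdout_arm_spend).
    """
    flagged = []
    for day, (test_spend, holdout_spend) in sorted(daily_spend.items()):
        total = test_spend + holdout_spend
        if total == 0:
            continue  # no spend that day, nothing to compare
        share = holdout_spend / total
        if abs(share - planned_holdout_share) > tolerance:
            flagged.append((day, round(share, 3)))
    return flagged


# Hypothetical three days of spend: day 2 shifts budget toward the test geos.
spend = {1: (900.0, 100.0), 2: (940.0, 60.0), 3: (910.0, 90.0)}
print(spend_share_drift(spend, planned_holdout_share=0.10))
```

In this made-up example only day 2 is flagged, since the holdout share falls to 6% against a planned 10%.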
Pre/post fails on confounds you can't see in the data: a competitor's stockout, a press mention, an algorithm refresh. Even with covariates (spend, channel mix, weekday) regressed out, you're modelling a counterfactual rather than observing one. That is the gap a sceptical CFO walks through. The companion read on the 30-day attribution window covers why shorter calendars make pre/post even less defensible.
Figure: Defensibility of CAC attribution by control design and store volume (series: geo / audience holdout, pre/post comparison).
Frequently asked questions
Default to a geo or audience holdout if you have 100+ orders/day. Below that, run pre/post but caveat the CAC number heavily and pair it with a sample-size calculation showing you couldn't have powered a holdout in 30 days anyway.
Large enough to hit your minimum detectable effect — typically 10–20% of eligible sessions. Smaller holdouts save revenue but lengthen the read past the 30-day window, which defeats the point. Run the sample-size math first, then size the holdout to fit.
Yes. A concurrent holdout with a pre-period covariate adjustment is the gold standard. The holdout gives you the causal read; the pre-period baseline lets you check that the two arms were comparable before exposure. Most finance teams find this combination credible.
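One simple way to combine the two reads is a difference-in-differences on CAC. This is a lighter stand-in for a full covariate adjustment, not necessarily the adjustment meant above, and the function name and figures are hypothetical.

```python
def did_cac(pre_test: float, post_test: float,
            pre_holdout: float, post_holdout: float) -> float:
    """Difference-in-differences: test-arm CAC change minus holdout-arm change."""
    return (post_test - pre_test) - (post_holdout - pre_holdout)


# Made-up CAC values per arm, before and after the variant shipped.
pre_gap = 50.0 - 51.0   # pre-period comparability check between arms
effect = did_cac(pre_test=50.0, post_test=46.0,
                 pre_holdout=51.0, post_holdout=50.0)
print(f"pre-period gap: {pre_gap:+.2f}, DiD CAC effect: {effect:+.2f}")
```

Here the holdout arm's CAC also fell by 1.00 over the window, so only 3.00 of the test arm's 4.00 drop is credited to the checkout change. A small pre-period gap supports the claim that the arms were comparable before exposure.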
Shopify reports CR shifts but doesn't isolate them from paid-spend changes or promo cadence. The CR delta it shows is descriptive, not causal — fine for a weekly dashboard, not fine for a CAC claim that flows into a forecast.
Often, yes — small stores can't power a holdout in 30 days. But the window is chosen because that's how long it takes paid-channel CAC to stabilise after a CR change. If you extend to 60 days for a holdout, you trade speed for defensibility; that's a reasonable trade for a major checkout change.
Lock geo-level spend caps for the duration of the holdout, or pick geos where natural spend ratios already match the population split. Otherwise the auction will quietly reallocate budget toward the higher-CR arm and bias your CAC comparison.
Hold paid spend constant in the test arm, measure the order-count lift over 30 days, and divide spend by the new order count. The CAC delta is the difference vs the control arm's CAC over the same window — not vs last month's blended number.
For on-site CR experiments, audience holdouts are usually cleaner — no cross-geo leakage, randomisation at the user level. Geo holdouts win when the change affects channels you can't cookie (paid social impressions, offline). Match the holdout type to where the confound lives.
A budget change on Meta or Google, a new promo, a creative refresh, a press hit, or a competitor stockout. Any one of these inside the 30-day window means the pre/post CAC delta is no longer attributable to the checkout change.
Yes — the snippet handles audience and geo holdouts client-side, with the historical GA4 import giving you a pre-period covariate baseline on day one. That means you can ship the holdout on Monday and have a defensible 30-day CAC read without engineering involvement.