How to use Incrementality Testing

Metricuno
May 21, 2026
Quick answer

Geo holdouts, ghost bids, and conversion lift studies isolate the true incremental contribution of a paid channel — the only honest test of whether your spend is actually working.

Definition
Measurement

Incrementality Testing

A causal measurement method that isolates the incremental revenue a channel drives by comparing exposed audiences against a holdout group.

Incrementality testing answers a single question: if you turned this channel off, how much revenue would actually disappear? Instead of crediting conversions based on click paths or view-through windows, it compares a group exposed to ads against a statistically matched holdout that wasn't, and attributes the difference to the channel.
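To make the comparison concrete, here is a minimal Python sketch of the calculation. The function name and all figures are hypothetical, and it assumes the holdout was randomly assigned, so its revenue per user is a fair baseline for the exposed group.

```python
def incremental_revenue(exposed_revenue, exposed_users, holdout_revenue, holdout_users):
    """Return (incremental revenue, relative lift) for an exposed group.

    The holdout's revenue per user stands in for what the exposed group
    would have spent with the channel switched off.
    """
    baseline_per_user = holdout_revenue / holdout_users
    expected_without_ads = baseline_per_user * exposed_users
    incremental = exposed_revenue - expected_without_ads
    return incremental, incremental / expected_without_ads


# Hypothetical test: 90,000 exposed users, 10,000 held out.
incremental, lift = incremental_revenue(
    exposed_revenue=54_000, exposed_users=90_000,
    holdout_revenue=5_000, holdout_users=10_000,
)
print(f"Incremental revenue: {incremental:,.0f}, lift: {lift:.0%}")
```

With these made-up numbers the holdout spends 0.50 per user, so the exposed group would have produced 45,000 without ads. The channel's incremental revenue is therefore 9,000, a 20% lift.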

The three workhorse designs are geo holdouts (pause a channel in matched regions), ghost bids and PSA tests (the platform withholds ads from a randomised slice of eligible users), and conversion lift studies (Meta, Google, and TikTok all run them natively). Each gives you a causal lift number that platform-reported ROAS cannot.

Also known as
lift testing
causal measurement
holdout testing

Most paid-channel reporting is correlational. Meta sees a purchase happen within seven days of an impression and claims it. Google does the same. Klaviyo claims the email. Add it all up and attributed revenue exceeds actual revenue — sometimes by 40%.

Incrementality testing breaks that loop. By holding out a controlled slice of your audience and measuring the revenue gap, you get a number that survives scrutiny from a CFO. It's slower than last-click dashboards, but it's the only way to know which line of your media plan is actually earning its budget.

Why platform-reported ROAS overstates reality

Platforms optimise for users likely to convert anyway. A shopper who already added a moisturiser to cart yesterday is exactly who Meta's algorithm will retarget today — and exactly who would have come back without the ad. That converted purchase gets stamped as paid social revenue.

This is selection bias dressed up as performance. The platform is finding existing demand, not creating it, and the reported ROAS reflects both effects mixed together. For a Shopify store running brand-keyword Google search alongside prospecting Meta, the gap between reported and incremental ROAS can be 2-3x.

Attribution and incrementality solve different problems. Attribution allocates credit across touchpoints in a customer journey that already converted. Incrementality asks whether the journey would have ended in conversion at all. You need both, but only one of them tells you what to cut.

The brand-keyword trap

Google brand search is the most overstated line item in most paid budgets. People typing your brand name into Google would visit your site anyway. A geo holdout on brand keywords routinely shows 80-95% of that revenue is non-incremental — meaning you're paying Google to intercept traffic you already earned.

The three core test designs

Geo holdout tests pause a channel in a set of matched regions (say, the Netherlands and Belgium for a European apparel brand) while continuing spend in matched control regions. You compare revenue per region over 4-6 weeks and back out the lift. It works for any channel, including offline ones like out-of-home and TV.
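One common way to "back out the lift" from a geo holdout is a difference-in-differences estimate: use the control regions to measure the underlying trend, then apply that trend to the paused regions to estimate what they would have earned with the channel on. The sketch below is one simple version under stated assumptions, not the only valid method. The function name and revenue figures are hypothetical, and it assumes the paused and control regions would have moved in parallel without the test.

```python
from statistics import mean


def geo_holdout_lift(control_pre, control_test, holdout_pre, holdout_test):
    """Difference-in-differences estimate of a channel's lift.

    control_*: weekly revenue summed over regions where spend continued.
    holdout_*: weekly revenue summed over regions where the channel was paused.
    *_pre covers the weeks before the test, *_test the test window.
    """
    # How much the control regions grew or shrank between the two periods.
    trend = mean(control_test) / mean(control_pre)
    # What the paused regions would have earned had the channel stayed on.
    expected_with_ads = mean(holdout_pre) * trend
    observed_without_ads = mean(holdout_test)
    incremental_per_week = expected_with_ads - observed_without_ads
    lift = incremental_per_week / observed_without_ads
    return incremental_per_week, lift


# Hypothetical weekly revenue in EUR thousands: four pre weeks, four test weeks.
incremental, lift = geo_holdout_lift(
    control_pre=[98, 102, 100, 100],
    control_test=[108, 112, 110, 110],
    holdout_pre=[79, 81, 80, 80],
    holdout_test=[80, 79, 81, 80],
)
print(f"Incremental revenue per week: {incremental:.1f}k, lift: {lift:.1%}")
```

In this example the control regions grew 10% while the paused regions stayed flat, so the channel is credited with about 8k per week, a 10% lift over revenue without ads. Dedicated tools use more robust models, but the logic is the same.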

Ghost bids and PSA tests run inside the ad platform: a randomised slice of users in the target audience is withheld from your ads (or shown a public service announcement instead). Meta, Google, and TikTok all support a version of this natively as conversion lift studies, and they're the fastest way to get a causal answer — usually 2-4 weeks.
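The platforms do the significance testing for you, but it helps to see roughly what is happening underneath. The sketch below compares conversion rates between the exposed and withheld groups using a standard two-proportion z-test. It is an illustration only: the function name and figures are hypothetical, and the platforms' actual methodology may differ.

```python
import math


def conversion_lift(conv_exposed, n_exposed, conv_holdout, n_holdout):
    """Return (relative lift, z statistic, two-sided p-value)."""
    rate_exposed = conv_exposed / n_exposed
    rate_holdout = conv_holdout / n_holdout
    lift = (rate_exposed - rate_holdout) / rate_holdout

    # Pooled standard error under the null hypothesis of no difference.
    pooled = (conv_exposed + conv_holdout) / (n_exposed + n_holdout)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_exposed + 1 / n_holdout))
    z = (rate_exposed - rate_holdout) / se

    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return lift, z, p_value


# Hypothetical study: 100,000 users per group, 2.3% vs 2.0% conversion.
lift, z, p_value = conversion_lift(2_300, 100_000, 2_000, 100_000)
print(f"Lift: {lift:.0%}, z = {z:.2f}, p = {p_value:.2g}")
```

Here the exposed group converts 15% better than the holdout, with a z statistic of about 4.6, so the difference is very unlikely to be noise.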

[Chart: Minimum detectable lift by test design (store with €100k weekly revenue). Vertical axis: minimum detectable lift, 0% to 25%. Horizontal axis: test design, covering platform conversion lift (Meta/Google), geo holdout (10 matched pairs), geo holdout (4 matched pairs), PSA / ghost-bid test, and switchback (national on/off).]

Conversion lift studies are the cleanest design but only work inside one platform at a time and require minimum spend thresholds (typically €30-50k for Meta over the test window). Geo holdouts are more flexible and work across channels, but you need enough matched pairs of regions to hit statistical power.
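To see why the number of matched pairs matters, here is a rough power calculation for a paired geo design. Treat it as a sketch under assumptions: it uses a normal approximation (optimistic when there are only a handful of pairs), and the 8% standard deviation of the revenue difference between paired regions is a hypothetical input you would estimate from your own pre-test data.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_lift(n_pairs, sd_pair_diff, alpha=0.05, power=0.80):
    """Smallest lift a paired geo test can reliably detect.

    sd_pair_diff: standard deviation of the relative revenue difference
    between the two regions in a pair (0.08 means 8%).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * sd_pair_diff / sqrt(n_pairs)


for pairs in (4, 10):
    mde = min_detectable_lift(pairs, sd_pair_diff=0.08)
    print(f"{pairs} matched pairs: minimum detectable lift {mde:.1%}")
```

Under these assumptions, 4 pairs can only detect a lift of roughly 11%, while 10 pairs bring that down to about 7%. Noisier regions or fewer pairs push the detectable lift higher.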

Typical incremental rates by channel

Once you start running these tests, a pattern emerges. Top-of-funnel prospecting channels usually show high incrementality — they really are reaching people who wouldn't have converted otherwise. Retargeting and brand search show low incrementality because they harvest demand that already exists.

The numbers below are ballparks from published lift studies and our own client work across apparel, beauty, and home goods stores in the €1M-€15M revenue range. Your mix will vary — but the ordering rarely does.

Benchmark

Incremental share of platform-reported revenue, by channel

Channel | Reported ROAS | Incremental ROAS | Incremental share
Meta prospecting | 2.4x | 1.8x | 70-80%
Meta retargeting | 6.0x | 1.5x | 20-30%
Google non-brand search | 3.2x | 2.4x | 65-80%
Google brand search | 12.0x | 1.8x | 10-20%
Google Shopping (non-brand) | 4.0x | 3.0x | 70-80%
TikTok prospecting | 1.8x | 1.5x | 75-90%
Klaviyo flows (post-purchase) | 30.0x | 6.0x | 15-25%
Affiliate / coupon sites | 8.0x | 1.2x | 10-20%

The two rows that surprise teams the most are brand search and post-purchase email flows. Both look like top performers in dashboards because they're attached to high-intent moments — but the incremental share shows most of that revenue was coming anyway. Reallocating even 30% of that spend toward prospecting usually pays for itself within a quarter.
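The incremental share in the benchmark table is simply incremental ROAS divided by reported ROAS. The short sketch below recomputes it from the table's own figures; the function and variable names are ours, chosen for illustration.

```python
# (reported ROAS, incremental ROAS) copied from the benchmark table.
benchmarks = {
    "Meta prospecting": (2.4, 1.8),
    "Meta retargeting": (6.0, 1.5),
    "Google non-brand search": (3.2, 2.4),
    "Google brand search": (12.0, 1.8),
    "Google Shopping (non-brand)": (4.0, 3.0),
    "TikTok prospecting": (1.8, 1.5),
    "Klaviyo flows (post-purchase)": (30.0, 6.0),
    "Affiliate / coupon sites": (8.0, 1.2),
}


def incremental_share(reported_roas, incremental_roas):
    """Fraction of platform-reported revenue that is truly incremental."""
    return incremental_roas / reported_roas


for channel, (reported, incremental) in benchmarks.items():
    share = incremental_share(reported, incremental)
    print(f"{channel:<32} {share:.0%}")
```

Brand search works out to 15% and post-purchase flows to 20%, both inside the ranges shown in the table. Swap in your own test results to see where your channels land.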

Running your first incrementality test

Start with the channel where you suspect the biggest gap between reported and real performance — usually brand search, retargeting, or a long-running affiliate program. Pick a test window of 4-6 weeks, long enough to cover purchase-cycle latency for your category but short enough that seasonality doesn't swamp the signal.

Pre-register the hypothesis and the success threshold before you start. Decide upfront what incremental ROAS would make you cut, hold, or scale the channel. Post-hoc rationalisation is the most common way these tests fail to change decisions — the number comes back ambiguous and everyone defaults to the dashboard they already trust.
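One lightweight way to pre-register is to write the decision rule down as code before the test starts, so the result maps to an action mechanically. This is a sketch only: the class name, fields, and thresholds are hypothetical and should reflect your own margins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: thresholds cannot be edited after the fact
class PreRegistration:
    channel: str
    hypothesis: str
    cut_below: float    # incremental ROAS below this means cut
    scale_above: float  # incremental ROAS above this means scale

    def decide(self, incremental_roas: float) -> str:
        if incremental_roas < self.cut_below:
            return "cut"
        if incremental_roas > self.scale_above:
            return "scale"
        return "hold"


# Hypothetical plan, written before the test begins.
plan = PreRegistration(
    channel="Google brand search",
    hypothesis="Most brand-search revenue would arrive without the ads",
    cut_below=1.0,
    scale_above=2.5,
)
print(plan.decide(0.8), plan.decide(1.8), plan.decide(3.0))
```

Because the thresholds are fixed in advance, a middling result such as 1.8x maps to "hold" rather than to whichever reading the team prefers afterwards.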

What good looks like

A mature measurement stack runs a rolling calendar of one always-on geo holdout (for a brand-spend channel) plus one or two platform conversion lift studies per quarter. The cadence keeps incrementality numbers fresh enough to inform budget rebalances every 90 days, not once a year.

Frequently asked questions

What is the difference between attribution and incrementality?

Attribution divides credit across touchpoints among customers who already converted. Incrementality asks whether those conversions would have happened at all without the channel. Attribution tells you the customer's path; incrementality tells you which steps on that path actually caused the purchase.

How long should an incrementality test run?

Four to six weeks is the typical window. You need at least one full purchase cycle for your category: longer for considered purchases like furniture, shorter for impulse categories like beauty. Going past six weeks introduces seasonality and competitor noise that erode the signal.

Can I run a conversion lift study myself?

Yes. Meta, Google Ads, and TikTok all support self-serve conversion lift studies inside their platforms, though minimum spend thresholds apply (usually €30-50k across the test window). You configure the holdout audience; the platform handles randomisation and significance testing.

How much budget or volume do I need?

For platform-native lift studies, roughly €1k/day during the test for a 2-4 week window. For geo holdouts, the constraint is matched-region revenue rather than spend: you need 8-10 paired regions, each doing meaningful weekly volume, to hit statistical power on a 10-15% lift.

Why is brand search so rarely incremental?

People searching your brand name are already convinced; they would have found you organically. The paid ad just intercepts a click that would have been free. Geo holdouts on brand keywords consistently show 80-95% of revenue is non-incremental, meaning Google is charging you for traffic you already earned.

Can I trust lift studies run by the platforms themselves?

Mostly yes for the lift number itself: the randomisation is real and the methodology is sound. Be sceptical of the audience definitions, though. Platforms test on their best-performing audiences, which biases lift upward. Cross-check with a geo holdout once a year to validate.

How do I choose matched regions for a geo holdout?

Match on baseline revenue trend, demographic similarity, and channel mix exposure over the prior 12 weeks. Tools like GeoLift (Meta's open-source library) automate the matching. Avoid pairing regions with very different climates, urban/rural splits, or competitor presence.

What is a switchback test?

A switchback alternates a channel on and off nationally, for example two weeks on, two weeks off, repeated four times. It's a poor substitute for geo holdouts because seasonality contaminates results, but it's the only option for brands without enough geographic spread to run matched pairs.

How often should I re-test a channel?

Quarterly for high-spend channels, annually for everything else. Creative fatigue, audience saturation, and competitor activity shift incrementality over time. A channel that was 75% incremental in Q1 can drop to 50% by Q4 as your retargeting pool overlaps more with prospecting.

Can I run incrementality tests on email?

Yes, and you should. Randomise a 10-20% holdout from each campaign send or flow trigger and compare conversion rates. Post-purchase flows and win-back campaigns often show much lower incrementality than Klaviyo reports, because those customers were coming back anyway.
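A simple way to randomise an email holdout is to hash each customer ID together with the campaign ID and withhold anyone whose hash falls in the bottom slice. The sketch below shows the idea. The function name and ID formats are hypothetical, and this is independent of any built-in holdout feature your email tool may offer.

```python
import hashlib


def in_holdout(customer_id: str, campaign_id: str, holdout_pct: float = 0.10) -> bool:
    """Deterministically assign a customer to the holdout for one campaign.

    Hashing the campaign ID with the customer ID gives a stable assignment
    within a campaign and a fresh random split for each new campaign.
    """
    digest = hashlib.sha256(f"{campaign_id}:{customer_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x1_0000_0000  # uniform value in [0, 1)
    return bucket < holdout_pct


# Hypothetical customer list and campaign name.
customers = [f"cust-{i}" for i in range(10_000)]
held_out = [c for c in customers if in_holdout(c, "winback-flow")]
print(f"Held out {len(held_out) / len(customers):.1%} of customers")
```

Skip the send for held-out customers, then compare their conversion rate with everyone else's over the same window. Because the assignment is deterministic, you can recompute who was in the holdout at analysis time without storing a separate list.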
