How to use Cohort Analysis

Metricuno
May 19, 2026
Quick answer

Cohort analysis groups users by a shared starting point — acquisition week, first purchase, exposure to a variant — and tracks them forward to reveal whether an experiment actually changes long-term behavior.

Definition
Experiment Analysis

Cohort Analysis

Grouping users by a shared starting event and tracking their behavior over time to measure long-term impact.

Cohort analysis groups users by a temporal marker they share — the week they first landed, the day they made their first purchase, or the moment they were exposed to a test variant — and then follows that group forward through subsequent sessions, weeks, or months. Instead of asking "did this variant convert more visitors today?", it asks "do users who saw this variant come back, buy again, and spend more 30, 60, 90 days later?"

In experimentation, cohort analysis is the layer that catches novelty effects, sleeper wins, and tests that boost first-session conversion at the cost of repeat behavior. It turns a single point estimate into a curve.

Also known as
Cohort retention analysis
Time-cohort segmentation

Most A/B tests report a single number: conversion rate during the experiment window. That number tells you what happened in the session, but it says nothing about whether the visitor came back, how much they spent over six months, or whether the variant trained them to expect a discount.

Cohort analysis fixes that blind spot. It builds on standard experiment analysis, the parent discipline that interprets a test's lift, and extends the observation window from days to quarters. You stop measuring sessions and start measuring relationships.

Why cohorts matter for experimentation

A variant can win on Day 0 and lose on Day 60. The classic example is an aggressive discount banner: it lifts first-session conversion by 12%, but the cohort it acquires has a 30% lower repeat-purchase rate because the price anchor is broken. Session-level reporting calls this a win. Cohort analysis calls it a leak.

The reverse also happens. A new product-page layout shows no conversion lift in week one, then the test-exposed cohort outpaces control by 8% in repeat purchases by week six. Without a cohort view you make the wrong call and kill a sleeper winner.

For Shopify and WooCommerce stores in the €1M–€15M band, where a single retained customer is worth three to five times the acquisition spend, cohort signals usually matter more than the headline lift. The test that protects retention beats the test that pumps the funnel.

Rule of thumb

If your average customer makes a second purchase within 45 days, your minimum cohort observation window is 60 days post-exposure. Anything shorter and you're guessing at retention.

How to build cohorts from an A/B test

The cleanest cohort definition for experimentation is exposure-based: every visitor assigned to variant A on the day they first see it forms the A cohort; variant B forms its mirror. From that anchor you track forward — sessions in week 2, orders in week 4, revenue in week 8 — using the original assignment, not subsequent behavior.
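A minimal sketch of exposure-based cohorting, assuming you can export two things from your testing tool and store: a map of user to original assignment, and a list of orders. The function name `weekly_orders_by_variant` and the data shapes are illustrative assumptions, not part of any specific tool.

```python
from collections import defaultdict
from datetime import date


def weekly_orders_by_variant(assignments, orders):
    """Count orders per variant by week since first exposure.

    assignments: {user_id: (variant, first_exposure_date)}  (hypothetical shape)
    orders: iterable of (user_id, order_date)               (hypothetical shape)
    Returns {variant: {week_index: order_count}}, where week 0 covers
    days 0-6 after exposure.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for user_id, order_date in orders:
        if user_id not in assignments:
            continue  # never exposed to the test, so not in any cohort
        # Always use the original assignment, never later behavior.
        variant, exposed_on = assignments[user_id]
        days_since = (order_date - exposed_on).days
        if days_since < 0:
            continue  # order predates exposure
        counts[variant][days_since // 7] += 1
    return {variant: dict(weeks) for variant, weeks in counts.items()}


# Made-up example data.
assignments = {
    "u1": ("A", date(2026, 1, 5)),
    "u2": ("B", date(2026, 1, 5)),
    "u3": ("B", date(2026, 1, 6)),
}
orders = [
    ("u1", date(2026, 1, 5)),   # day 0  -> week 0
    ("u1", date(2026, 1, 20)),  # day 15 -> week 2
    ("u2", date(2026, 1, 6)),   # day 1  -> week 0
    ("u3", date(2026, 2, 4)),   # day 29 -> week 4
    ("u9", date(2026, 1, 7)),   # unassigned user, ignored
]
result = weekly_orders_by_variant(assignments, orders)
print(result)
```

Keying everything off the assignment map is the design choice that matters here: a user stays in the cohort they were assigned to, whatever they do afterwards.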

Other useful cohort axes layer on top: acquisition channel (paid social cohorts behave differently from organic), device, first-order AOV tier, and country. A test that wins overall but loses among mobile paid-social visitors calls for a different decision than a win across every segment, and you can only see that with cohort slicing.

Chart: Repeat-purchase rate by weeks since first order, variant (free-shipping threshold) vs control. X-axis: weeks since first order (Week 2 to Week 16). Y-axis: repeat-purchase rate (0% to 30%).

The shape above is the typical pattern: the variant pulls ahead early (a free-shipping threshold lifts AOV and confidence) but the curves converge by week 12 and cross by week 16. The session-level report would have shipped this variant on a Week-2 readout. The cohort view says hold.
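To make the "pulls ahead, converges, then crosses" pattern checkable rather than eyeballed, here is a small sketch that finds the first readout where a variant that once led falls behind control. The two series are invented for illustration and are not the chart's data; the function name is an assumption too.

```python
def first_week_variant_falls_behind(weeks, control, variant):
    """Return the first week where the variant drops below control
    after having led at some earlier readout, or None if that never happens."""
    has_led = False
    for week, control_rate, variant_rate in zip(weeks, control, variant):
        if variant_rate > control_rate:
            has_led = True
        elif has_led and variant_rate < control_rate:
            return week
    return None


# Hypothetical repeat-purchase rates per readout week (not real data).
weeks = [2, 4, 6, 8, 12, 16]
control = [0.05, 0.10, 0.14, 0.18, 0.23, 0.27]
variant = [0.07, 0.12, 0.16, 0.19, 0.23, 0.25]

crossing_week = first_week_variant_falls_behind(weeks, control, variant)
print(crossing_week)
```

With these made-up numbers the variant leads through week 8, ties at week 12, and falls behind at week 16, so a Week-2 readout would have missed the reversal.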

What to measure inside each cohort

Four metrics carry most of the signal: retention rate (share of cohort active in week N), repeat-purchase rate, cumulative revenue per cohort member, and contribution margin per cohort member. Revenue alone hides margin-destroying behavior — a cohort that buys more but returns 18% of orders is not the cohort you want.
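The four metrics can be computed from a per-member record along these lines. This is a sketch under stated assumptions: the `Member` fields are hypothetical, repeat-purchase rate is taken as buyers with two or more orders divided by buyers with at least one (your definition may differ), and margins are assumed to be already net of returns.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """One cohort member (hypothetical record layout)."""
    active_weeks: set = field(default_factory=set)      # week indexes with any activity
    order_revenues: list = field(default_factory=list)  # revenue per order
    order_margins: list = field(default_factory=list)   # contribution margin per order


def cohort_metrics(members, week):
    """Return the four cohort metrics for one cohort at readout week `week`."""
    n = len(members)
    if n == 0:
        raise ValueError("cohort is empty")
    retained = sum(1 for m in members if week in m.active_weeks)
    buyers = sum(1 for m in members if len(m.order_revenues) >= 1)
    repeaters = sum(1 for m in members if len(m.order_revenues) >= 2)
    return {
        "retention_rate": retained / n,
        # Assumption: denominator is buyers, not all cohort members.
        "repeat_purchase_rate": repeaters / buyers if buyers else 0.0,
        "revenue_per_member": sum(sum(m.order_revenues) for m in members) / n,
        "margin_per_member": sum(sum(m.order_margins) for m in members) / n,
    }


# Made-up cohort of four members.
cohort = [
    Member({0, 4}, [50.0, 30.0], [20.0, 10.0]),
    Member({0}, [40.0], [15.0]),
    Member({0, 4}, [], []),
    Member({0, 1}, [80.0], [-5.0]),  # returned order: revenue booked, margin negative
]
metrics = cohort_metrics(cohort, week=4)
print(metrics)
```

The fourth member illustrates the point about revenue hiding margin: they add the most revenue per order to the cohort and the least margin.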

For apparel and beauty brands, returns are the silent killer. A variant that lifts AOV by surfacing larger sizes or bundle SKUs can simultaneously lift return rate by 4 points, which is net-negative once you factor in reverse logistics. Cohort the return rate too.

Benchmark

Typical cohort metric ranges by vertical (Shopify stores, €1M–€15M revenue)

Vertical | Week-4 repeat rate | Week-12 repeat rate | Day-90 LTV index | Return rate
Apparel | 12–18% | 24–32% | 1.6× | 14–22%
Beauty / skincare | 18–26% | 38–48% | 2.1× | 4–7%
Home & lifestyle | 8–14% | 18–26% | 1.3× | 6–10%
Consumer electronics | 5–9% | 11–17% | 1.1× | 9–14%
Food & supplements | 28–40% | 52–65% | 2.6× | 2–4%

Use the table as a sanity check, not a target. If your beauty store shows a week-12 repeat rate of 19%, the gap to the 38–48% band is the real opportunity — and the right place to point your next experiment, not another homepage hero test.

Common pitfalls and how to avoid them

Mistake one: cohorting on outcome instead of exposure. If you define your cohort as "everyone who purchased during the test", you've selected on the dependent variable and your retention comparison is meaningless. Always anchor cohorts on assignment, not behavior.

Mistake two: mixing cohort sizes. If variant B got 40% of traffic for the first week and 50% thereafter, your week-1 cohort is structurally smaller and noisier. Either hold allocation constant or weight comparisons by cohort size.

Mistake three: reading cohorts before they've matured. A 21-day-old cohort cannot tell you anything about 60-day retention.
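For mistake two, one reading of "weight comparisons by cohort size" is to pool counts across sub-cohorts instead of averaging their rates. The sketch below contrasts the two; the function names and numbers are illustrative assumptions.

```python
def pooled_rate(sub_cohorts):
    """Size-weighted rate: total repeat buyers over total cohort size.

    sub_cohorts: list of (cohort_size, repeat_buyers) pairs, e.g. one per week.
    """
    total_size = sum(size for size, _ in sub_cohorts)
    if total_size == 0:
        return 0.0
    return sum(repeaters for _, repeaters in sub_cohorts) / total_size


def unweighted_mean_rate(sub_cohorts):
    """Plain average of per-cohort rates; lets a small noisy cohort count
    as much as a large one."""
    rates = [repeaters / size for size, repeaters in sub_cohorts if size > 0]
    return sum(rates) / len(rates) if rates else 0.0


# Hypothetical variant B: a smaller week-1 cohort, then a larger one.
variant_b = [(400, 40), (1000, 150)]

weighted = pooled_rate(variant_b)             # 190 / 1400
unweighted = unweighted_mean_rate(variant_b)  # (0.10 + 0.15) / 2
print(round(weighted, 4), round(unweighted, 4))
```

The two numbers differ because the plain average gives the 400-user cohort the same vote as the 1,000-user cohort. Pooling does not remove time effects from an allocation change, so holding allocation constant remains the cleaner option.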

Don't ship on Week 2 readouts for retention-sensitive tests

Pricing, shipping thresholds, subscription prompts, and account-creation flows all have delayed effects. For these test families, require at least one full repeat-purchase cycle of cohort data before calling the result. Otherwise you're optimising for the session and paying for it in LTV.

Frequently asked questions

Segmentation slices users by an attribute (device, country, channel) at a moment in time. Cohort analysis slices by a shared starting event and tracks the group forward. Segments are static snapshots; cohorts are trajectories.

Anchor the cohort observation window to your repeat-purchase cycle. If most second orders happen within 45 days, observe for at least 60. For subscription or replenishment categories, 90–120 days is closer to the right window.

Standard experiment analysis answers "did the variant lift the primary metric during the test?" Cohort analysis extends that to "did the variant change behavior of the exposed cohort over time?" It's the long-tail layer on top of the headline result.

You can run cohort analysis retroactively on a past test, as long as you preserved the original assignment IDs. Pull every user assigned to each variant during the test window, then query their behavior in the months after. Historical GA4 data, if imported, makes this retroactive analysis straightforward.

For sample size, you need at minimum enough purchasers per variant that you'd expect 100+ repeat orders in the observation window. For most €1M–€15M stores that means 5,000–15,000 exposed visitors per variant, depending on baseline repeat rate.
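The arithmetic behind that visitor range can be sketched as follows. It is a back-of-the-envelope model with assumed inputs: the 3% purchase rate and 25% repeat rate are illustrative, and each repeat buyer is counted as exactly one repeat order.

```python
import math


def visitors_needed(target_repeat_orders, purchase_rate, repeat_rate):
    """Exposed visitors per variant needed to expect `target_repeat_orders`.

    Simplified model (an assumption, not a formal power calculation):
    expected repeat orders = visitors * purchase_rate * repeat_rate.
    """
    if not (0 < purchase_rate <= 1 and 0 < repeat_rate <= 1):
        raise ValueError("rates must be in (0, 1]")
    return math.ceil(target_repeat_orders / (purchase_rate * repeat_rate))


# Hypothetical store: 3% of exposed visitors buy, 25% of buyers reorder in-window.
needed = visitors_needed(100, purchase_rate=0.03, repeat_rate=0.25)
print(needed)  # 13334
```

A lower baseline repeat rate pushes the requirement up quickly, which is why the visitor count depends so heavily on it.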

Acquisition-date and first-purchase-date cohorts are both useful, for different questions. Acquisition-date cohorts answer "is this traffic source getting better or worse?" First-purchase cohorts answer "are buyers behaving differently than they used to?" For experimentation, exposure-date cohorts are the primary axis.

Decide upfront whether the cohort is new buyers only or all exposed visitors. Mixing them dilutes the signal because returning customers have very different baseline behavior. Most experimentation teams cohort new and returning customers separately and compare each within its own stratum.

A sleeper effect is when a variant looks flat or slightly negative in the test window but pulls ahead in cohort behavior weeks later — typically because the change reduces friction in a step that only matters at repeat purchase. Account-creation flow changes are a frequent source.

Cohort analysis does not replace statistical significance testing; it complements it. You still need significance on the primary metric. Cohort analysis adds directional evidence on retention and LTV, where strict significance is often impractical due to long windows and small cohort sizes.

Cohort reporting can be automated, and it should be. Once your tracking captures variant assignment plus user_id, a standing cohort report can run automatically for every shipped test, posting 30/60/90-day readouts back to the experiment record. That's how retention regressions get caught before they compound.
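A standing 30/60/90-day report could look roughly like the sketch below. The data shapes and the `readouts` function are assumptions for illustration; posting results back to the experiment record is left out because it depends on your tooling. Only users whose exposure is at least as old as the horizon are counted, so immature cohorts report `None` instead of a misleading number.

```python
from datetime import date

HORIZONS = (30, 60, 90)  # days since exposure


def readouts(assignments, orders, today):
    """Revenue per mature exposed user, by variant and horizon.

    assignments: {user_id: (variant, exposure_date)}     (hypothetical shape)
    orders: list of (user_id, order_date, revenue)       (hypothetical shape)
    Returns {variant: {horizon: float or None}}.
    """
    orders_by_user = {}
    for user_id, order_date, revenue in orders:
        orders_by_user.setdefault(user_id, []).append((order_date, revenue))

    report = {}
    for variant in sorted({v for v, _ in assignments.values()}):
        report[variant] = {}
        for horizon in HORIZONS:
            # Keep only users who have had the full horizon to act.
            mature = [
                (user_id, exposed)
                for user_id, (v, exposed) in assignments.items()
                if v == variant and (today - exposed).days >= horizon
            ]
            if not mature:
                report[variant][horizon] = None  # cohort not mature yet
                continue
            total = 0.0
            for user_id, exposed in mature:
                for order_date, revenue in orders_by_user.get(user_id, []):
                    if 0 <= (order_date - exposed).days <= horizon:
                        total += revenue
            report[variant][horizon] = total / len(mature)
    return report


# Made-up data: two users exposed 73 days ago, one exposed 14 days ago.
assignments = {
    "u1": ("control", date(2026, 1, 1)),
    "u2": ("variant", date(2026, 1, 1)),
    "u3": ("variant", date(2026, 3, 1)),
}
orders = [
    ("u1", date(2026, 1, 10), 40.0),
    ("u2", date(2026, 1, 5), 30.0),
    ("u2", date(2026, 2, 20), 50.0),
    ("u3", date(2026, 3, 2), 20.0),
]
report = readouts(assignments, orders, today=date(2026, 3, 15))
print(report)
```

In this example the 90-day readout is `None` for both variants, and the 14-day-old user is excluded from the 30-day figure, which is the maturity rule applied mechanically.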
