Product Experiments

Metricuno

May 18, 2026

Quick answer

Product experiments test functional changes — new features, new flows, or removed features — rather than visual tweaks. They run longer, carry higher stakes, and need careful sizing.

Definition

A/B tests on functional product changes — new features, new flows, or removals — rather than purely visual edits.

A product experiment is an A/B test where the variant changes how the product actually works, not just how it looks. Adding a wishlist, replacing a multi-step checkout with a single page, removing a filter from a collection grid, or launching a subscribe-and-save option all count. Visual or copy tweaks do not.

Because the behavioural change is deeper, product experiments take longer to read, need more traffic to reach significance, and carry a larger downside if the rollout goes wrong. They sit inside the broader practice of feature experimentation and are the highest-stakes test type most online stores will run.

Also known as

functional A/B tests

feature tests

product A/B tests

The line between a product experiment and a UX experiment is the layer of the stack being changed. Swapping a hero image or rewriting a button label is UX. Changing what the button does (what flow it triggers, what data it submits, what the customer can now do that they could not before) is product.

On Shopify and WooCommerce stores, the most common product experiments are checkout-flow changes, cross-sell mechanics, account/guest logic, and PDP feature additions like size finders or bundle builders. Each one moves a behavioural metric (add-to-cart rate, checkout completion, repeat purchase) rather than a click-through rate.

Formula

n = 16 * p * (1 - p) / MDE^2

Variables

n

Sample size per variant

Visitors needed per arm to detect the effect at 80% power, 95% confidence.

p

Baseline conversion rate

Current conversion rate on the metric you are testing (as a decimal).

MDE

Minimum detectable effect

Smallest absolute lift you want to be able to detect (as a decimal).
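The constant 16 in the formula is a rounded form of 2 × (z for 95% two-sided confidence + z for 80% power)², the usual normal-approximation multiplier when both arms are assumed to share roughly the same variance p(1 - p). A minimal Python sketch that recomputes it; the function name is illustrative and not part of the original formula:

```python
from statistics import NormalDist


def sample_size_multiplier(confidence: float = 0.95, power: float = 0.80) -> float:
    """Return 2 * (z_alpha/2 + z_beta)^2 for a two-sided, two-arm test."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - (1 - confidence) / 2)  # ~1.96 at 95% confidence
    z_beta = z.inv_cdf(power)                      # ~0.84 at 80% power
    return 2 * (z_alpha + z_beta) ** 2


multiplier = sample_size_multiplier()
print(round(multiplier, 1))  # 15.7, which the rule of thumb rounds up to 16
```

Changing `confidence` or `power` shows how the multiplier, and with it the required sample, grows for stricter settings.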

Worked example

An apparel Shopify store wants to test a one-page checkout against the default multi-step flow. Baseline checkout completion is 62%, and the team wants to detect a 2 percentage-point lift.

Baseline conversion (p): 0.62

Minimum detectable effect (MDE): 0.02

→ ≈ 9,400 visitors per variant

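At ~30k weekly checkout starts split 50/50, each arm collects about 15k visitors a week, so the sample fills in roughly four to five days; the test should still run at least one full week to cover a whole weekly cycle. Smaller MDEs balloon the requirement quickly: dropping to a 1pt MDE pushes the same test past 37k per arm.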
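The worked example's sample sizes can be reproduced with a short Python sketch. The function name `sample_size_per_variant` is illustrative (not from the original), and the result is rounded to the nearest visitor:

```python
def sample_size_per_variant(p: float, mde: float) -> int:
    """Rule-of-thumb visitors per arm: n = 16 * p * (1 - p) / MDE^2."""
    if not 0 < p < 1:
        raise ValueError("p must be a rate strictly between 0 and 1")
    if mde <= 0:
        raise ValueError("mde must be a positive absolute lift")
    return round(16 * p * (1 - p) / mde ** 2)


baseline = 0.62  # checkout completion from the worked example
for mde in (0.03, 0.02, 0.01):
    n = sample_size_per_variant(baseline, mde)
    print(f"MDE {mde:.2f}: {n:,} visitors per variant")
```

Because MDE is squared in the denominator, halving it quadruples the requirement: 9,424 per variant at 2 points becomes 37,696 at 1 point.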

Runtime is where product experiments diverge most sharply from visual tests. Behaviour like repeat purchase or subscription retention takes weeks of post-exposure observation, so even high-traffic stores commonly run product experiments for 3-6 weeks rather than the 10-14 days that suffice for PDP visual tests.

Benchmark

Typical runtimes for product experiments by test scope (mid-traffic Shopify store, ~50k weekly sessions)

Test scope	Primary metric	Typical MDE	Runtime
One-page vs multi-step checkout	Checkout completion	1.5-2 pts	2-3 weeks
New PDP feature (size finder, bundle)	Add-to-cart rate	2-3 pts	2-4 weeks
Guest vs forced-account checkout	Order completion	1-2 pts	3-4 weeks
Subscribe-and-save launch	Repeat purchase 60d	1-1.5 pts	6-10 weeks
Removing a filter or sort option	Collection → PDP CTR	3-4 pts	1-2 weeks
New post-purchase upsell	AOV	€2-4	3-5 weeks

Use the table as a sanity check before launching. If your traffic cannot support the runtime a test requires, either widen the MDE (accept that only bigger wins will register), narrow the audience to a higher-converting segment, or pick a different test. Underpowered product experiments are worse than no test at all — they ship false negatives and bury real wins.
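One way to run that sanity check is to turn the sample-size formula into a runtime estimate, and to invert it to see which MDE a given traffic level can support. The sketch below is only that: the function names and the example store (4,000 weekly visitors reaching the tested feature, 8% baseline add-to-cart rate) are hypothetical, not figures from the table.

```python
import math


def weeks_to_read(p: float, mde: float, weekly_visitors: int, arms: int = 2) -> int:
    """Whole weeks until every arm reaches n = 16 * p * (1 - p) / MDE^2."""
    n_per_arm = 16 * p * (1 - p) / mde ** 2
    per_arm_per_week = weekly_visitors / arms
    return math.ceil(n_per_arm / per_arm_per_week)


def detectable_mde(p: float, weekly_visitors: int, weeks: int, arms: int = 2) -> float:
    """Smallest absolute lift detectable in `weeks` (same formula, solved for MDE)."""
    n_per_arm = weekly_visitors / arms * weeks
    return math.sqrt(16 * p * (1 - p) / n_per_arm)


# Hypothetical store: 4,000 eligible visitors a week, 8% baseline add-to-cart rate.
print(weeks_to_read(p=0.08, mde=0.02, weekly_visitors=4000))
print(round(detectable_mde(p=0.08, weekly_visitors=4000, weeks=2), 4))
```

The estimate covers only the time needed to fill the sample. Metrics such as 60-day repeat purchase add their post-exposure observation window on top.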

Frequently asked

Product experiments FAQ

How are product experiments different from UX experiments?

UX experiments change how something looks or reads: colours, copy, layout. Product experiments change what the product does: new features, new flows, removed functionality. The behavioural change is deeper, so runtimes and traffic requirements are larger.

How do product experiments relate to feature experimentation?

Feature experimentation is the umbrella discipline: shipping product changes behind a flag and measuring impact. Product experiments are the customer-facing tests inside that practice, specifically the ones that compare a functional variant against a control on a conversion or revenue metric.

How long should a product experiment run?

Three to six weeks is typical, depending on the metric. Checkout-completion tests can read in 2-3 weeks; repeat-purchase or subscription tests often need 6-10 weeks to capture enough post-exposure behaviour. Always run in whole-week cycles.

Do I need a developer to run product experiments?

Usually yes. Product experiments touch the cart, checkout, or account logic, which lives in your theme or app code. Some platforms ship feature-flag SDKs and Shopify plugins that let non-developers toggle variants, but the variant itself still has to be built.

What's a realistic minimum detectable effect for a checkout test?

1.5-2 percentage points on checkout completion is realistic for most mid-traffic stores. Anything tighter than 1pt pushes sample-size requirements past what a typical €1M-€15M Shopify store can deliver in a reasonable window.

Should I test removing a feature?

Yes. Removals are some of the highest-leverage product experiments. If a filter, upsell, or step in the flow isn't pulling its weight, an A/B test that hides it for 50% of traffic tells you whether it's adding value or just noise.

How do I avoid breaking analytics during a product experiment?

Tag the variant assignment as a user property in GA4 and your data warehouse before launch, then sanity-check that conversion events fire identically across arms in the first 48 hours. Many product experiments fail QA, not statistics.
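A first-48-hours parity check can be as simple as comparing per-session event rates between arms. The Python sketch below is one possible shape for it, not a prescribed method: the function name, the 25% tolerance, and every count are hypothetical, and the event names are borrowed from GA4's ecommerce events for illustration.

```python
def event_parity_report(counts, sessions, tolerance=0.25):
    """Flag events that are missing in an arm or whose per-session rate diverges."""
    arms = list(sessions)
    events = sorted({name for arm in arms for name in counts.get(arm, {})})
    flags = {}
    for name in events:
        rates = [counts.get(arm, {}).get(name, 0) / sessions[arm] for arm in arms]
        if min(rates) == 0:
            flags[name] = "missing in at least one arm"
        elif max(rates) / min(rates) - 1 > tolerance:
            flags[name] = "rate differs by more than tolerance"
    return flags


# Hypothetical export of the first 48 hours; all numbers are made up.
sessions = {"control": 5000, "variant": 5050}
counts = {
    "control": {"page_view": 14800, "add_to_cart": 610, "begin_checkout": 400, "purchase": 250},
    "variant": {"page_view": 14950, "add_to_cart": 640, "begin_checkout": 415},
}
print(event_parity_report(counts, sessions))
# {'purchase': 'missing in at least one arm'}
```

A flag is a prompt to inspect the tracking, not proof that it is broken: a working variant is expected to shift downstream rates somewhat, while an event that never fires in one arm almost always points to an instrumentation gap.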

Can I run multiple product experiments at once?

Yes, if they touch different parts of the funnel and you accept some interaction risk. A checkout test and a PDP feature test can run in parallel. Two checkout tests should not; assign one to a holdout instead.

What sample size do I need for a subscription product experiment?

Subscription metrics convert at low base rates (often 2-5%), so per-variant sample sizes of 15k-30k visitors are common. Most stores read these tests over 6-10 weeks rather than days.

What happens if my product experiment shows no significant result?

A flat read is real information: the change you tested did not move the metric enough to matter. Ship the simpler variant, document the learning, and move on. The cost of leaving a no-impact feature live is operational complexity, not just opportunity cost.
