Product Experiments
Product experiments test functional changes — new features, new flows, or removed features — rather than visual tweaks. They run longer, carry higher stakes, and need careful sizing.
A product experiment is an A/B test where the variant changes how the product actually works, not just how it looks. Adding a wishlist, replacing a multi-step checkout with a single page, removing a filter from a collection grid, or launching a subscribe-and-save option all count. Visual or copy tweaks do not.
Because the behavioural change is deeper, product experiments take longer to read, need more traffic to reach significance, and carry larger downside if rolled out wrong. They sit inside the broader practice of feature experimentation and are the highest-stakes test type most online stores will run.
The line between a product experiment and a UX experiment is the layer of the stack being changed. Swapping a hero image or rewriting button copy is UX. Changing what the button does (what flow it triggers, what data it submits, what the customer can now do that they could not before) is product.
On Shopify and WooCommerce stores, the most common product experiments are checkout-flow changes, cross-sell mechanics, account/guest logic, and PDP feature additions like size finders or bundle builders. Each one moves a behavioural metric (add-to-cart rate, checkout completion, repeat purchase) rather than a click-through rate.
n = 16 * p * (1 - p) / MDE^2
n: sample size per variant. Visitors needed per arm to detect the effect at 80% power, 95% confidence.
p: baseline conversion rate. Current conversion rate on the metric you are testing (as a decimal).
MDE: minimum detectable effect. Smallest absolute lift you want to be able to detect (as a decimal).
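The constant 16 is a rounded rule-of-thumb value rather than something specific to this formula. As a rough sketch, assuming a two-sided comparison of two proportions with equal-sized arms and both arms' variance approximated by p(1 - p), it falls out of the standard normal quantiles for 95% confidence (about 1.96) and 80% power (about 0.84):

```latex
\[
n \approx \frac{2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}\, p\,(1-p)}{\mathrm{MDE}^{2}},
\qquad
2\,(1.96 + 0.84)^{2} = 2 \times 7.84 = 15.68 \approx 16
\]
```

Other power or confidence targets change the constant, so the shortcut only applies at the 80% / 95% setting stated above.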
An apparel Shopify store wants to test a one-page checkout against the default multi-step flow. Baseline checkout completion is 62%, and the team wants to detect a 2 percentage-point lift.
Baseline conversion (p): 0.62
Minimum detectable effect (MDE): 0.02
→ ≈ 9,400 visitors per variant
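At ~30k weekly checkout starts split 50/50 (about 15k per arm per week), the sample target is reached in roughly four to five days, though the test should still run for at least one full week so every day of the weekly cycle is covered. Smaller MDEs balloon the requirement quickly: dropping to a 1pt MDE pushes the same test past 37k per arm.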
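The arithmetic in this example can be reproduced with a few lines of Python. This is a minimal sketch of the rule-of-thumb formula only; the function name `sample_size_per_variant` is illustrative and not part of any testing tool.

```python
import math


def sample_size_per_variant(baseline: float, mde: float) -> int:
    """Visitors per arm from n = 16 * p * (1 - p) / MDE^2 (80% power, 95% confidence)."""
    if not 0 < baseline < 1:
        raise ValueError("baseline must be a rate strictly between 0 and 1")
    if mde <= 0:
        raise ValueError("mde must be a positive absolute lift, as a decimal")
    raw = 16 * baseline * (1 - baseline) / mde ** 2
    # Round away floating-point noise before taking the ceiling, so a value
    # like 9424.0000000001 is not bumped up to 9425.
    return math.ceil(round(raw, 6))


# Checkout example: 62% baseline completion, 2-point and 1-point MDE.
print(sample_size_per_variant(0.62, 0.02))  # 9424
print(sample_size_per_variant(0.62, 0.01))  # 37696
```

Halving the MDE quadruples the requirement because the MDE is squared in the denominator.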
Runtime is where product experiments diverge most sharply from visual tests. Behaviour like repeat purchase or subscription retention takes weeks of post-exposure observation, so even high-traffic stores commonly run product experiments for 3-6 weeks rather than the 10-14 days that suffice for PDP visual tests.
Typical runtimes for product experiments by test scope (mid-traffic Shopify store, ~50k weekly sessions)
| Test scope | Primary metric | Typical MDE | Runtime |
|---|---|---|---|
| One-page vs multi-step checkout | Checkout completion | 1.5-2 pts | 2-3 weeks |
| New PDP feature (size finder, bundle) | Add-to-cart rate | 2-3 pts | 2-4 weeks |
| Guest vs forced-account checkout | Order completion | 1-2 pts | 3-4 weeks |
| Subscribe-and-save launch | Repeat purchase 60d | 1-1.5 pts | 6-10 weeks |
| Removing a filter or sort option | Collection → PDP CTR | 3-4 pts | 1-2 weeks |
| New post-purchase upsell | AOV | €2-4 | 3-5 weeks |
Use the table as a sanity check before launching. If your traffic cannot support the runtime a test requires, either widen the MDE (accept that only bigger wins will register), narrow the audience to a higher-converting segment, or pick a different test. Underpowered product experiments are worse than no test at all — they ship false negatives and bury real wins.
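One way to run that sanity check is to invert the sizing formula and ask what MDE a given amount of traffic can support. The sketch below assumes an even split across arms and the same 80% power, 95% confidence rule of thumb; `detectable_mde` and its parameters are hypothetical names chosen for illustration.

```python
import math


def detectable_mde(baseline: float, weekly_visitors: int, weeks: int, arms: int = 2) -> float:
    """Smallest absolute lift detectable, from inverting n = 16 * p * (1 - p) / MDE^2."""
    if not 0 < baseline < 1:
        raise ValueError("baseline must be a rate strictly between 0 and 1")
    if weekly_visitors <= 0 or weeks <= 0 or arms < 2:
        raise ValueError("need positive traffic, a positive runtime, and at least two arms")
    # Assumption: traffic is split evenly, so each arm sees 1/arms of the total.
    per_arm = weekly_visitors * weeks / arms
    return math.sqrt(16 * baseline * (1 - baseline) / per_arm)


# Checkout example from earlier: 62% baseline, ~30k weekly checkout starts, 2 weeks.
mde = detectable_mde(baseline=0.62, weekly_visitors=30_000, weeks=2)
print(f"Detectable lift: {mde * 100:.1f} points")
```

With these inputs the detectable lift comes out at roughly 1.1 points. If the lift you realistically expect is smaller than the number this returns, the test is underpowered at that runtime.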
Product experiments FAQ
How is a product experiment different from a UX experiment?
UX experiments change how something looks or reads: colours, copy, layout. Product experiments change what the product does: new features, new flows, removed functionality. The behavioural change is deeper, so runtimes and traffic requirements are larger.
How do product experiments relate to feature experimentation?
Feature experimentation is the umbrella discipline: shipping product changes behind a flag and measuring impact. Product experiments are the customer-facing tests inside that practice, specifically the ones that compare a functional variant against a control on a conversion or revenue metric.
How long should a product experiment run?
Three to six weeks is typical, depending on the metric. Checkout-completion tests can read in 2-3 weeks; repeat-purchase or subscription tests often need 6-10 weeks to capture enough post-exposure behaviour. Always run for whole business-week cycles.
Do I need a developer to run product experiments?
Usually yes. Product experiments touch the cart, checkout, or account logic, which lives in your theme or app code. Some platforms ship feature-flag SDKs and Shopify plugins that let non-developers toggle variants, but the variant itself still has to be built.
What minimum detectable effect is realistic?
1.5-2 percentage points on checkout completion is realistic for most mid-traffic stores. Anything tighter than 1pt pushes sample-size requirements past what a typical €1M-€15M Shopify store can deliver in a reasonable window.
Can I test removing a feature?
Yes. Removals are some of the highest-leverage product experiments. If a filter, upsell, or step in the flow isn't pulling its weight, an A/B test that hides it for 50% of traffic shows whether it's adding value or just noise.
How should I set up tracking for a product experiment?
Tag the variant assignment as a user property in GA4 and your data warehouse before launch, then sanity-check that conversion events fire identically across arms in the first 48 hours. Many product experiments fail QA, not statistics.
Can I run several product experiments at the same time?
Yes, if they touch different parts of the funnel and you accept some interaction risk. A checkout test and a PDP feature test can run in parallel. Two checkout tests should not; assign one to a holdout instead.
How much traffic does a subscription test need?
Subscription metrics convert at low base rates (often 2-5%), so per-variant sample sizes of 15k-30k visitors are common. Most stores read these tests over 6-10 weeks rather than days.
What should I do when a test comes back flat?
A flat read is real information: the change you tested did not move the metric enough to matter. Ship the simpler variant, document the learning, and move on. The cost of leaving a no-impact feature live is operational complexity, not just opportunity cost.