Feature Flags

Metricuno
May 18, 2026
Quick answer

Feature flags are boolean toggles that release code to a subset of users — the infrastructure behind canary releases, progressive rollouts, and product experiments.

Definition
Experimentation infrastructure

Feature Flags

Boolean toggles in code that gate features so they can be released, tested, or rolled back for specific users without redeploying.

A feature flag (also called a feature toggle) is a conditional in your codebase that decides — at request time — whether a given user sees the new code path or the old one. The decision is made by a flag service based on the user's ID, segment, geography, or a random bucket, which means you can ship code to production but only expose it to 1%, 10%, or 100% of traffic as confidence grows.
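The "random bucket" decision can be sketched as a deterministic hash, so the same user always lands in the same bucket and stays exposed as the percentage grows. This is a minimal illustration rather than how any particular flag service works; the names `bucket` and `is_enabled` and the flag key format are assumptions made for the example.

```python
import hashlib


def bucket(flag_key: str, user_id: str) -> float:
    """Map a (flag, user) pair to a stable number in [0, 100)."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).hexdigest()
    # First 8 hex characters give an integer in [0, 2**32); scale it to [0, 100).
    return int(digest[:8], 16) / 2**32 * 100


def is_enabled(flag_key: str, user_id: str, rollout_percent: float) -> bool:
    """True when the user's bucket falls under the current rollout percentage."""
    return bucket(flag_key, user_id) < rollout_percent
```

Because the bucket depends only on the flag key and the user ID, raising the rollout from 10% to 50% keeps every user who was already exposed at 10%.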

Flags are the plumbing underneath three different jobs: progressive rollouts (de-risking releases), canary deployments (catching regressions on a small slice), and product A/B testing (measuring the impact of a variant on revenue, conversion, or average order value, AOV). On a Shopify or headless storefront, that often means toggling a new checkout flow, a redesigned product detail page (PDP), or a recommendation widget without a full redeploy.

Also known as
Feature toggles
Release flags
Release toggles

The core appeal is decoupling deploy from release. Code can sit in production behind an "off" flag for weeks while QA, product, and growth teams decide when and to whom it goes live. If something breaks, you flip the flag off in seconds — no rollback, no hotfix branch, no Friday-night deploy.

For online stores, flags also unlock the cleanest path to running real experiments. Instead of pushing variant B to every visitor and praying, you assign 50% of sessions to each variant via the flag SDK, log the assignment, and let the analytics layer attribute revenue and conversion back to the bucket. This is the substrate underneath every serious feature experimentation programme.
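A 50/50 split with assignment logging might look like the sketch below. `AssignmentLog` is a hypothetical in-memory stand-in for an analytics sink, and `assign_variant` is an illustrative name, not a call from any real flag SDK.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class AssignmentLog:
    """In-memory stand-in for an analytics sink (assumption for this sketch)."""
    events: list = field(default_factory=list)

    def record(self, session_id: str, flag_key: str, variant: str) -> None:
        self.events.append(
            {"session_id": session_id, "flag": flag_key, "variant": variant}
        )


def assign_variant(flag_key: str, session_id: str, log: AssignmentLog) -> str:
    """Deterministically place a session in 'A' or 'B' (50/50) and log it."""
    digest = hashlib.sha256(f"{flag_key}:{session_id}".encode("utf-8")).digest()
    # The first byte of the hash is even or odd with equal probability.
    variant = "A" if digest[0] % 2 == 0 else "B"
    log.record(session_id, flag_key, variant)
    return variant
```

Logging at the moment of assignment is what later lets revenue and conversion be attributed to a bucket.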

Formula

Exposed Users = Eligible Users × Rollout %

Variables

Eligible Users (eligible audience): Visitors or sessions that match the flag's targeting rules (e.g. logged-in shoppers in the EU on mobile).

Rollout % (rollout percentage): The share of the eligible audience the flag is currently turned on for, between 0% and 100%.

Exposed Users: The number of users who will actually see the gated feature in a given window.

Worked example

A Shopify apparel store is rolling out a redesigned product page to mobile shoppers in France. Eligible mobile FR traffic is 80,000 sessions per week. The team starts the flag at 10% to canary the redesign before scaling.

Eligible Users (weekly mobile FR sessions): 80,000

Rollout %: 10%

Exposed Users = 80,000 × 10% = 8,000 exposed sessions per week

8,000 sessions is enough to surface obvious regressions (checkout errors, layout breaks, latency spikes) within 24-48 hours without risking the other 72,000 sessions if the variant underperforms.
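The same arithmetic as a small helper, using the numbers from the worked example. The function name `exposed_users` is illustrative.

```python
def exposed_users(eligible_users: int, rollout_percent: float) -> int:
    """Exposed Users = Eligible Users x Rollout %, rounded to whole users."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    return round(eligible_users * rollout_percent / 100)


weekly_eligible = 80_000                                    # mobile FR sessions per week
weekly_exposed = exposed_users(weekly_eligible, 10)         # 8000
weekly_unexposed = weekly_eligible - weekly_exposed         # 72000
```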

Most teams ramp rollouts in stages — 1%, 5%, 25%, 50%, 100% — with a holdback or kill criterion at each step. The pattern you pick depends on what you're protecting: regression risk, statistical power, or both.
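A staged ramp with a kill criterion can be sketched as a function that either advances to the next stage or drops to 0%. The stages are the ones listed above; using error rate as the kill criterion, and the threshold passed in, are assumptions for illustration, since teams pick their own guardrail metrics.

```python
RAMP_STAGES = [1, 5, 25, 50, 100]


def next_rollout(current_percent: int, error_rate: float, max_error_rate: float) -> int:
    """Return the next rollout percentage, or 0 if the kill criterion trips."""
    if error_rate > max_error_rate:
        return 0  # kill criterion met: turn the flag off for everyone
    for stage in RAMP_STAGES:
        if stage > current_percent:
            return stage
    return 100  # already fully rolled out
```

For example, `next_rollout(5, 0.001, 0.02)` advances to 25, while `next_rollout(25, 0.05, 0.02)` returns 0.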

Benchmark

Common feature flag rollout patterns and when to use each

| Pattern | Typical ramp | Primary goal | Best for |
| --- | --- | --- | --- |
| Canary release | 1% → 5% → 25% → 100% over 1-3 days | Catch regressions early | Backend changes, checkout refactors, payment integrations |
| Progressive rollout | 10% → 50% → 100% over 1-2 weeks | De-risk while monitoring KPIs | Theme updates, new PDP layouts, navigation changes |
| A/B test (50/50) | Hold at 50/50 for 2-4 weeks | Measure causal impact on conversion | Pricing display, CTA copy, recommendation widgets |
| Holdback experiment | 95% on, 5% permanent holdout | Measure long-term lift | Loyalty programmes, post-purchase upsells |
| Targeted release | 100% to one segment, 0% to others | Test with a safe cohort first | Beta features for VIP customers, geo-specific launches |
| Kill switch | Always on, flip to 0% on incident | Instant rollback without redeploy | Third-party scripts, recommendation engines, search providers |

The thing flags don't give you for free is measurement. A flag tells you who saw what; it doesn't tell you whether the variant moved revenue, AOV, or checkout completion. That's where the flag SDK has to log assignments into your analytics layer so conversions can be attributed back to the bucket — otherwise you're shipping safely but learning nothing.

Frequently asked

Feature flags FAQ

A feature flag is the mechanism: the conditional that decides which code path a user hits. An A/B test is one job that mechanism can do: hold the flag at 50/50, log assignments, and measure conversion. A/B tests on application code run on a flag, but not every flag is a test (rollouts and kill switches use the same plumbing without measuring lift).

A well-implemented flag SDK adds 5-30ms to the first decision and caches afterwards, so the user-facing impact is usually negligible. The risk is loading a heavy third-party SDK on every page when you only need flags on the PDP and checkout — scope the script to the pages that actually evaluate flags.
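The "pay once, then cache" behaviour can be illustrated with a thin wrapper around whatever call fetches the decision. `CachedFlagClient` and the `fetch_decision` callback are hypothetical names for this sketch; real SDKs handle caching and invalidation in their own ways.

```python
from typing import Callable


class CachedFlagClient:
    """Caches each (flag, user) decision after the first lookup."""

    def __init__(self, fetch_decision: Callable[[str, str], bool]) -> None:
        self._fetch_decision = fetch_decision  # the slow call, e.g. a network request
        self._cache: dict = {}

    def is_enabled(self, flag_key: str, user_id: str) -> bool:
        key = (flag_key, user_id)
        if key not in self._cache:
            # Only the first evaluation pays the lookup cost.
            self._cache[key] = self._fetch_decision(flag_key, user_id)
        return self._cache[key]
```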

Feature experimentation is the broader practice of testing product changes to measure impact; feature flags are the infrastructure that makes it possible. Without flags you can still A/B test cosmetic changes via a tag manager, but anything that touches the cart, checkout, or backend logic needs a flag to be controlled safely.

Surface-level changes (copy, colours, banners) can be made without a developer, via a visual editor on top of a flag service. For anything touching cart logic, checkout flows, or backend integrations, you'll need a developer to wrap the relevant code paths in flag checks the first time. After that, toggling and ramping is a non-dev action.

A canary release exposes a new feature to a tiny slice (1-5%) of traffic first to catch regressions before broader rollout. It's named after the canary in a coal mine — if the small group's metrics or error rates degrade, you flip the flag off before the rest of your traffic is affected.

Rollout and canary flags should be removed within 2-4 weeks of hitting 100% — leaving them creates technical debt and dead code paths. A/B test flags stay for the test duration plus a holdback period if you want long-term lift data. Permanent flags (kill switches, segment toggles) are fine to keep indefinitely if they're documented.
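A cleanup check along these lines could run in CI or a weekly report. The `FlagRecord` shape and the 28-day default (the upper end of the 2-4 week window) are assumptions for the sketch; a real flag service would expose its own metadata.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class FlagRecord:
    key: str
    kind: str  # "rollout", "canary", "experiment", or "permanent"
    full_rollout_date: Optional[date]  # date the flag reached 100%, if it has


def stale_flags(flags: list, today: date, max_age_days: int = 28) -> list:
    """Keys of rollout/canary flags that hit 100% more than max_age_days ago."""
    cutoff = timedelta(days=max_age_days)
    return [
        f.key
        for f in flags
        if f.kind in {"rollout", "canary"}
        and f.full_rollout_date is not None
        and today - f.full_rollout_date > cutoff
    ]
```

Permanent flags and running experiments are deliberately ignored, since they have their own lifecycles.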

A dark launch ships code to production but doesn't expose the new UI to anyone — it runs server-side to test performance, capacity, or backend correctness. Feature flags are the mechanism that makes dark launches possible: the flag is on for execution but off for the visible interface.
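One way to picture a dark launch: the new code path executes in the shadow of the old one, its result is compared and recorded, and the user still gets the old result. The function and parameter names here are illustrative, and the sketch runs the shadow call inline; a production setup would more likely run it asynchronously so it cannot slow the request.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def dark_launch(
    old_path: Callable[[], T],
    new_path: Callable[[], T],
    execute_new: bool,
    mismatches: list,
) -> T:
    """Always return the old result; optionally run the new path in shadow."""
    result = old_path()
    if execute_new:
        try:
            shadow = new_path()
            if shadow != result:
                mismatches.append((result, shadow))
        except Exception as exc:  # a failing shadow path must never reach the user
            mismatches.append((result, exc))
    return result
```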

Feature flags work with server-side rendering, but you need to evaluate the flag on the server before rendering and pass the assignment into the client so the SDK doesn't re-bucket the user. Mismatches between server and client decisions cause flicker (the wrong variant briefly shows) and break experiment integrity. Most flag SDKs ship a server adapter to handle this cleanly.
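The hand-off can be modelled as: evaluate once on the server, serialise the decision into the page payload, and have the client read that payload instead of bucketing again. This is a simplified model written in Python for consistency; `evaluate_on_server`, `render_bootstrap`, `ClientFlags`, and the `flagAssignments` key are all hypothetical, and in practice the client half would be browser code provided by the SDK.

```python
import hashlib
import json


def evaluate_on_server(flag_key: str, user_id: str, rollout_percent: float) -> bool:
    """Server-side decision, made once before rendering."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 2**32 * 100 < rollout_percent


def render_bootstrap(assignments: dict) -> str:
    """Serialise the server's decisions so they can be embedded in the page."""
    return json.dumps({"flagAssignments": assignments}, sort_keys=True)


class ClientFlags:
    """Client-side view that trusts the server's decisions and never re-buckets."""

    def __init__(self, bootstrap_json: str) -> None:
        self._assignments = json.loads(bootstrap_json)["flagAssignments"]

    def is_enabled(self, flag_key: str) -> bool:
        # Unknown flags default to off rather than being evaluated again.
        return self._assignments.get(flag_key, False)
```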

The flag SDK should fire an assignment event (user X saw variant B) into your analytics layer at the moment of exposure. When that user converts, the conversion is joined to the assignment on user or session ID, giving you variant-level revenue, AOV, and conversion rate. Without the assignment event you'll never close the loop.
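The join described here can be sketched with plain tuples. The input shapes, `(user_id, variant)` for assignments and `(user_id, revenue)` for orders, are assumptions for the example; a real pipeline would do the same join in the analytics warehouse.

```python
from collections import defaultdict


def variant_metrics(assignments: list, orders: list) -> dict:
    """Join orders to assignments on user ID; return per-variant metrics."""
    variant_of = {}
    for user_id, variant in assignments:
        variant_of.setdefault(user_id, variant)  # keep the first exposure only

    exposed = defaultdict(set)
    for user_id, variant in variant_of.items():
        exposed[variant].add(user_id)

    revenue = defaultdict(float)
    order_count = defaultdict(int)
    converters = defaultdict(set)
    for user_id, amount in orders:
        variant = variant_of.get(user_id)
        if variant is None:
            continue  # order from a user with no assignment event: unattributable
        revenue[variant] += amount
        order_count[variant] += 1
        converters[variant].add(user_id)

    return {
        v: {
            "conversion_rate": len(converters[v]) / len(users),
            "revenue": revenue[v],
            "aov": revenue[v] / order_count[v] if order_count[v] else 0.0,
        }
        for v, users in exposed.items()
    }
```

Orders from users with no assignment event are dropped, which is the "never close the loop" failure in miniature: without the event, that revenue cannot be credited to either variant.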

Flags can gate checkout changes, but with more guardrails than on PDPs. Always keep a kill switch ready, ramp slowly (1% → 5% → 25%), and monitor checkout completion rate and payment errors in real time. Anything that touches tax, shipping, or payment logic should be reviewed and ideally rolled out by geography first so a regression doesn't take down global revenue.
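A geography-first checkout rollout with a kill switch might be expressed like this. The flag key `new-checkout`, the per-country percentage map, and the function name are illustrative assumptions, not a real SDK's configuration format.

```python
import hashlib


def checkout_flag_enabled(
    user_id: str,
    country: str,
    rollout_by_country: dict,
    kill_switch_on: bool,
) -> bool:
    """Gate a new checkout by country and percentage; the kill switch wins."""
    if kill_switch_on:
        return False  # incident response: everyone gets the old checkout
    percent = rollout_by_country.get(country, 0.0)  # unlisted countries stay off
    digest = hashlib.sha256(f"new-checkout:{user_id}".encode("utf-8")).hexdigest()
    user_bucket = int(digest[:8], 16) / 2**32 * 100
    return user_bucket < percent
```

With `rollout_by_country = {"FR": 5.0}`, only about 5% of French shoppers reach the new checkout, and every other country stays on the old flow until it is added to the map.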
