Learning Systems

Metricuno

May 18, 2026

4 min read

Quick answer

A learning system is the org muscle that captures what each experiment taught, so your CRO program compounds knowledge instead of re-running the same test.

Definition

Experimentation

Learning System

The process and tooling that captures what each experiment taught — including losing tests — so the team compounds insight instead of repeating it.

A learning system is the operational discipline that turns individual A/B tests into durable, searchable knowledge for the whole team. It covers how hypotheses are written, how results are documented, where insights live, and how past learnings feed the next test backlog.

Most programs treat tests as one-off projects: ship a variant, declare a winner, move on. A learning system treats each test as an evidence event — winners, losers, and inconclusive runs all contribute. Done well, it cuts repeat tests, sharpens hypothesis quality, and prevents the same lesson being relearned every time a new optimiser joins.

Also known as

experiment knowledge base

insight repository

test learning loop

Learning compounds only when it's written down in a format the next person can search. A Slack message, a closed Notion doc, or a screenshot in someone's Drive does not count. The artefact has to survive team turnover and be discoverable in under a minute.

A working learning system has four parts: a hypothesis template that forces a falsifiable claim; a result record that captures lift, confidence, segment splits, and qualitative observations; a tagging schema (page, audience, principle); and a review cadence where past results inform the next backlog. Miss any one and the system leaks.
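As a rough illustration of the result record and tagging schema described above, here is a minimal Python sketch. The class name, field names, and the `find_by_tag` helper are all hypothetical, chosen for this example rather than taken from any particular tool; the completeness rule simply mirrors the parts listed above.

```python
from dataclasses import dataclass, field
from typing import Optional

# The three tag dimensions named in the text; the key spellings are assumptions.
TAG_KEYS = ("page", "audience", "principle")


@dataclass
class ExperimentRecord:
    """One test's result record (illustrative field names)."""
    hypothesis: str                      # falsifiable claim, written before launch
    outcome: str                         # "win", "loss", or "inconclusive"
    lift: Optional[float] = None         # relative change in the primary metric
    confidence: Optional[float] = None   # e.g. 0.95
    segment_notes: str = ""              # segment splits worth recording
    observations: str = ""               # qualitative observations
    tags: dict = field(default_factory=dict)

    def is_complete(self) -> bool:
        """Complete means hypothesis, outcome, segment notes, and all three tags."""
        has_tags = all(self.tags.get(key) for key in TAG_KEYS)
        return bool(self.hypothesis and self.outcome and self.segment_notes and has_tags)


def find_by_tag(records, key, value):
    """Return the records whose tag `key` equals `value`."""
    return [r for r in records if r.tags.get(key) == value]
```

A record that fails `is_complete()` is a natural candidate to block at review time, which is one way to keep the log from leaking.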

Formula

Knowledge Yield = (Documented Insights / Tests Run) × Reuse Rate

Variables

Documented Insights

Number of tests in the period with a complete result record (hypothesis, outcome, segment notes, tags).

Tests Run

Total experiments completed in the period, including inconclusive and losing tests.

Reuse Rate

Share of new hypotheses in the next quarter that explicitly cite a prior documented insight.

Worked example

A Shopify apparel brand ran 24 tests last quarter on its PDP and checkout. 18 had complete result records; the other 6 were declared and forgotten. Of the 20 hypotheses written for the next quarter, 9 cited a past learning.

Documented Insights: 18

Tests Run: 24

Reuse Rate: 9 / 20 = 0.45

→ (18 / 24) × 0.45 = 0.75 × 0.45 ≈ 0.34

A knowledge yield of 0.34 is mid-range. Above 0.5 indicates a program where past learning actively shapes future tests; below 0.2 means the team is effectively starting from scratch each quarter.
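The calculation above can be sketched in a few lines of Python. The function and parameter names are hypothetical, and the handling of scores exactly at 0.2 or 0.5 is an assumption, since the text only describes what lies above and below those thresholds.

```python
def knowledge_yield(documented: int, tests_run: int,
                    citing_hypotheses: int, new_hypotheses: int) -> float:
    """Knowledge Yield = (Documented Insights / Tests Run) x Reuse Rate."""
    if tests_run <= 0 or new_hypotheses <= 0:
        raise ValueError("tests_run and new_hypotheses must be positive")
    documentation_rate = documented / tests_run
    reuse_rate = citing_hypotheses / new_hypotheses
    return documentation_rate * reuse_rate


def yield_band(score: float) -> str:
    """Map a score to the bands described in the text (boundary handling assumed)."""
    if score > 0.5:
        return "past learning actively shapes future tests"
    if score < 0.2:
        return "effectively starting from scratch"
    return "mid-range"


# Worked example: 18 of 24 tests documented, 9 of 20 new hypotheses cite a past result.
score = knowledge_yield(18, 24, 9, 20)
print(round(score, 2))    # 0.34
print(yield_band(score))  # mid-range
```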

The metric only matters because it forces two behaviours: documenting losers (which raises the numerator) and citing prior work in new hypotheses (which raises reuse). Teams that track it for a quarter usually find they were over-counting wins and under-counting institutional memory.

Benchmark

Learning-system maturity by program stage

Maturity stage	Tests documented	Avg. time to find a past result	Repeat-test rate
Ad-hoc (no system)	20-40%	20+ minutes or never found	30-45%
Spreadsheet log	50-70%	5-10 minutes	15-25%
Tagged repository	80-90%	Under 2 minutes	5-10%
Insight-driven backlog	95%+	Under 1 minute	Under 5%

The jump from spreadsheet to tagged repository is where most teams stall. The fix is rarely better software. It is assigning one person to own the schema and enforcing a 'no merge without a result record' rule as part of the experimentation strategy ritual itself.

Frequently asked

Learning systems FAQ

Why document losing tests at all?

Losers tell you which levers don't move the needle on a given page or audience, and that's the more valuable half of the data. A documented loss prevents a teammate from re-running essentially the same idea six months later and burning two weeks of traffic on a known dead end.

Where does a learning system fit in our experimentation strategy?

It's the feedback loop inside the strategy. Your experimentation strategy sets what to test and why; the learning system records what you found and feeds it back into prioritisation. Without it, the strategy degrades into a backlog of guesses within a quarter or two.

What's the minimum viable format for documenting a test?

Hypothesis statement, primary metric and result with confidence level, segment splits where relevant, two or three sentences on what you think happened, and tags for page type, audience, and design principle. Anything less and the record won't be searchable; anything more rarely gets filled in.

Spreadsheet, Notion, or a dedicated tool?

All three work if you enforce structure. A spreadsheet with required columns beats a beautiful Notion page that nobody fills in. Dedicated tools earn their cost once you cross roughly 30 tests a year and need filtering by tag, audience, or page.
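To make "required columns" concrete, here is a small Python sketch that checks a CSV export of a test log for rows with blank required fields. The column names and the sample rows are made up for illustration; adapt them to whatever schema your team agrees on.

```python
import csv
import io

# Hypothetical required columns; rename to match your own log.
REQUIRED_COLUMNS = ("hypothesis", "primary_metric", "result",
                    "confidence", "interpretation", "tags")


def rows_missing_fields(csv_text: str) -> list:
    """Return (row_number, missing_columns) for every incomplete row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        missing = [c for c in REQUIRED_COLUMNS if not (row.get(c) or "").strip()]
        if missing:
            problems.append((row_number, missing))
    return problems


# Made-up sample log: the second test was declared and never written up.
sample_log = (
    "hypothesis,primary_metric,result,confidence,interpretation,tags\n"
    "Shorter form lifts checkout CVR,checkout CVR,win,0.96,Fewer fields cut friction,checkout;all;friction\n"
    "Badge on PDP lifts add-to-cart,add-to-cart rate,,,,pdp\n"
)
print(rows_missing_fields(sample_log))
# [(3, ['result', 'confidence', 'interpretation'])]
```

Running a check like this before each review is a cheap way to enforce structure without buying a dedicated tool.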

Who owns the learning system?

One named person, usually the CRO lead or senior optimiser. Shared ownership becomes no ownership. Their job isn't to write every record but to enforce the schema and run the monthly review where past learnings inform next month's backlog.

How do you stop records from going stale?

Tag every insight with the date and the page version it was tested on. When the page is redesigned, mark linked insights as 'context changed'. A learning from a 2022 checkout flow may not apply to your 2024 Shop Pay-enabled flow, and the system should make that obvious.
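One way to implement the 'context changed' flag is sketched below in Python. The `Insight` class, its fields, and the version labels are assumptions made for this example; the only behaviour taken from the text is that insights tested on an older page version get marked when the page is redesigned.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Insight:
    """A documented learning (illustrative field names)."""
    summary: str
    page: str
    page_version: str         # version of the page the test ran on
    tested_on: date
    status: str = "current"   # becomes "context changed" after a redesign


def mark_context_changed(insights, page, new_version):
    """Flag insights for `page` that were tested on a version other than `new_version`."""
    changed = []
    for insight in insights:
        if insight.page == page and insight.page_version != new_version:
            insight.status = "context changed"
            changed.append(insight)
    return changed


insights = [
    Insight("Trust badges did not lift checkout CVR", "checkout", "2022-flow", date(2022, 6, 1)),
    Insight("Size guide link raised add-to-cart", "pdp", "v3", date(2024, 2, 10)),
]
stale = mark_context_changed(insights, page="checkout", new_version="2024-shop-pay")
print([i.summary for i in stale])
# ['Trust badges did not lift checkout CVR']
```

Flagged insights stay searchable; the status just tells the reader to re-check the finding before citing it in a new hypothesis.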

What's a good cadence for reviewing past learnings?

Monthly for active programs (10+ tests per quarter), quarterly for slower ones. The review should produce three to five candidate hypotheses for the next sprint, each citing a specific prior result. If a review generates zero new hypotheses, your tagging schema is too coarse.

How does qualitative data fit in?

Session recordings, heatmaps, and survey quotes belong in the same record as the quantitative result. They explain why a variant won or lost and travel with the insight when someone searches it later. A test record with only a lift number is half a learning.

How do you handle inconclusive tests?

Document them with the same rigour as winners and losers. Note the observed effect size, why you couldn't reach significance (low traffic, short run, noisy metric), and what conditions would make the test worth repeating. Inconclusive is information, not failure.

How long before a learning system pays off?

Roughly one quarter for behavioural change (people start citing past tests in standups) and two to three quarters for measurable impact on win rate and velocity. Programs that stick with it typically see hypothesis quality improve before raw win rate does.


