How to use Return Reason Analysis

Metricuno
May 25, 2026
Quick answer

A practical framework for coding return-reason data, joining it to PDP and traffic-source behavior, and converting the output into a prioritized experimentation backlog.

Definition
Post-purchase analytics

Return Reason Analysis

A structured method for coding, joining, and prioritizing return-reason data so it drives PDP and merchandising fixes.

Return reason analysis is the process of turning raw refund data — dropdown selections, free-text comments, RMA notes, support tickets — into a coded dataset that ties each return to a specific SKU, product detail page, and acquisition source. The output is a ranked list of problems worth solving, not a spreadsheet of complaints.

Done well, it sits between returns operations and CRO. Ops cares about the cost per return. CRO cares about the upstream cause — a misleading hero image, a missing size chart, a paid social audience converting on the wrong expectation. A good framework gives both teams the same source of truth and turns refund volume into a prioritized research backlog.

Also known as
Refund reason coding
Return categorization
Post-purchase failure analysis

Most stores already collect return reasons. Few do anything with them. The dropdown gets exported to a CSV once a quarter, the merchandiser scans it, and the data dies there — disconnected from the PDP, the ad creative, and the traffic source that drove the purchase in the first place.

The framework below is what we recommend for stores doing €1M–€15M and processing 200+ returns per month. It assumes you have a Shopify, WooCommerce, or Magento backend, a returns platform (Loop, Returnly, AfterShip) writing back reason codes, and at least 90 days of order history to work with.

Step 1: Code free-text reasons into a clean taxonomy

The dropdown your returns portal offers is rarely good enough. "Didn't fit" is four different problems: cut runs small, photo misrepresented the silhouette, the size chart was wrong, or the customer ordered between sizes. Each one points to a different fix.

Build a two-level taxonomy. The top level is the operational bucket your CX team already uses (Fit, Quality, Expectation Mismatch, Damaged in Transit, Changed Mind, Wrong Item Shipped). The second level is the CRO-actionable cause — "runs small", "color off vs PDP image", "fabric thinner than described", "size chart inaccurate".

Code the free-text field into the second level. For 200 returns/month an analyst can do this manually in an hour; above 1,000/month, use an LLM with a fixed label set and spot-check 10% of assignments. Keep the taxonomy stable for at least two quarters — recoding mid-stream destroys trend data.
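As a rough illustration of the coding step, the sketch below maps free-text comments onto a fixed second-level label set. The bucket and cause names come from the examples above; which bucket each cause sits under, the keyword rules, and the "uncoded" fallback are assumptions made for this example. A keyword matcher is only a stand-in for the analyst or LLM doing the real coding.

```python
# Two-level taxonomy: operational bucket -> CRO-actionable causes.
# Which bucket each cause sits under is an assumption made for this sketch.
TAXONOMY = {
    "Fit": ["runs small", "size chart inaccurate"],
    "Expectation Mismatch": ["color off vs PDP image", "fabric thinner than described"],
}

# Reverse lookup so every second-level cause resolves to exactly one bucket.
CAUSE_TO_BUCKET = {
    cause: bucket for bucket, causes in TAXONOMY.items() for cause in causes
}

# Hypothetical keyword rules, checked in order; more specific phrases come first.
KEYWORD_RULES = [
    ("size chart", "size chart inaccurate"),
    ("too small", "runs small"),
    ("tight", "runs small"),
    ("color", "color off vs PDP image"),
    ("colour", "color off vs PDP image"),
    ("thinner", "fabric thinner than described"),
]


def code_reason(comment: str) -> tuple[str, str]:
    """Return (bucket, cause) for a free-text comment, or ("Other", "uncoded")."""
    text = comment.lower()
    for keyword, cause in KEYWORD_RULES:
        if keyword in text:
            return CAUSE_TO_BUCKET[cause], cause
    return "Other", "uncoded"


print(code_reason("The size chart said M but it was too small"))
print(code_reason("Fabric is much thinner than the photos suggest"))
print(code_reason("Changed my mind"))
```

Because the label set is a fixed dictionary, a cause that is not in the taxonomy cannot be emitted by accident, which is what keeps trend data comparable between quarters.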

Don't let the returns portal define your taxonomy

Loop, Returnly and the native Shopify returns flow ship with generic reason lists optimized for warehouse routing, not CRO. Treat their codes as raw input and code your own second level on top. If you build dashboards on the portal's labels directly, you'll never see the difference between "photo lied" and "size chart lied".

Step 2: Join returns to SKU, PDP, and traffic source

Coded reasons on their own tell you what's wrong. Joining them to the rest of your data tells you where to fix it. The minimum join keys are SKU (or variant ID), the PDP URL the customer landed on, the device, and the acquisition source of the original session.

This is where behavioral analytics earns its keep. If you can reconstruct the session that led to the purchase — which PDP images were viewed, whether the size chart was opened, scroll depth on the description — you can separate "customer didn't read" from "information wasn't there to read". Without that layer, every return looks like a customer problem.
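The join can be sketched with plain dictionaries. All field names, IDs, and sample records below are hypothetical; in practice the rows would come from your returns platform, your order export, and your analytics tool.

```python
# Hypothetical coded returns, keyed to the order they came from.
returns = [
    {"order_id": "1001", "sku": "LT-01", "cause": "runs small"},
    {"order_id": "1002", "sku": "SK-02", "cause": "color off vs PDP image"},
    {"order_id": "1003", "sku": "LT-01", "cause": "runs small"},
]

# Hypothetical order table: each order points at the session that produced it.
orders = {
    "1001": {"session_id": "s1"},
    "1002": {"session_id": "s2"},
    "1003": {"session_id": None},  # attribution lost for this order
}

# Hypothetical session table with the behavioral fields discussed above.
sessions = {
    "s1": {"landing_pdp": "/products/linen-trousers", "device": "mobile",
           "source": "Meta prospecting", "size_chart_opened": False},
    "s2": {"landing_pdp": "/products/sage-knit", "device": "desktop",
           "source": "Organic search", "size_chart_opened": True},
}

SESSION_FIELDS = ("landing_pdp", "device", "source", "size_chart_opened")


def join_returns(returns, orders, sessions):
    """Attach session fields to each return; missing links become None."""
    joined = []
    for ret in returns:
        order = orders.get(ret["order_id"], {})
        session = sessions.get(order.get("session_id"), {})
        joined.append({**ret, **{f: session.get(f) for f in SESSION_FIELDS}})
    return joined


joined = join_returns(returns, orders, sessions)
for row in joined:
    print(row["sku"], row["cause"], row["source"], row["size_chart_opened"])
```

Keeping unmatched returns in the output with empty session fields, rather than dropping them, makes it visible how much attribution you are losing.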

Chart

Return rate by acquisition source — apparel store, 90 days

[Chart not reproduced. Axes: return rate (0% to 40%) and acquisition source (Direct, Organic search, Email, Google Shopping, Meta retargeting, Meta prospecting, TikTok prospecting).]

The chart above is the pattern we see repeatedly: top-of-funnel paid social drives returns at 2-3× the rate of intent-led channels. The reason codes underneath that volume are usually "didn't look like the video" or "expected different quality" — symptoms of creative-led demand the PDP isn't reinforcing.
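Return rate by source is a simple grouped ratio once each order carries its acquisition source. A minimal sketch, assuming each order record has a "source" field and a "returned" flag; the sample counts are invented to show the shape of the calculation and are not the figures behind the chart.

```python
from collections import Counter


def return_rate_by_source(orders):
    """Map each acquisition source to returned orders / total orders."""
    totals, returned = Counter(), Counter()
    for order in orders:
        totals[order["source"]] += 1
        if order["returned"]:
            returned[order["source"]] += 1
    return {source: returned[source] / totals[source] for source in totals}


# Invented sample: 10 orders per source, 1 and 3 of them returned.
orders = (
    [{"source": "Email", "returned": i < 1} for i in range(10)]
    + [{"source": "TikTok prospecting", "returned": i < 3} for i in range(10)]
)

rates = return_rate_by_source(orders)
for source, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{source}: {rate:.0%}")
```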

Step 3: Prioritize by refund value, not return count

Once the data is coded and joined, the temptation is to fix whatever has the most returns. That's the wrong sort. Rank reason × SKU combinations by total refund value (returns × AOV × refund rate), then by addressability — "size chart inaccurate" is a one-hour fix; "fabric quality" needs a supplier conversation.

A useful cut: filter to reasons that account for at least 5% of refunded revenue on a SKU doing at least 50 units/month. Below those thresholds you're optimizing noise. The table below shows a typical 90-day prioritization for an apparel brand.
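The two thresholds can be expressed as a small filter. This is a sketch: the function and parameter names are hypothetical, and the defaults simply restate the 5% and 50 units/month cut-offs above.

```python
def worth_prioritizing(reason_refund_value, sku_refunded_revenue,
                       sku_units_per_month, min_share=0.05, min_units=50):
    """True if a reason x SKU pair clears both the volume and share thresholds."""
    if sku_units_per_month < min_units or sku_refunded_revenue <= 0:
        return False
    return reason_refund_value / sku_refunded_revenue >= min_share


# Hypothetical SKU with 8,000 EUR refunded revenue, selling 60 units/month.
print(worth_prioritizing(500, 8000, 60))  # share is 6.25%
print(worth_prioritizing(300, 8000, 60))  # share is 3.75%
print(worth_prioritizing(500, 8000, 40))  # SKU volume too low
```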

Benchmark

Prioritized return-reason backlog — apparel store, 90 days

Reason × SKU | Returns | Refund value (€) | Likely fix | Effort
Runs small — Linen Trousers | 142 | 11,360 | Update size chart + add fit note | Low
Color off vs PDP — Sage Knit | 98 | 7,840 | Reshoot product photography | Medium
Thinner than expected — Tee 3-pack | 76 | 4,560 | Add GSM + drape video | Low
Didn't match TikTok video — Cargo Pant | 61 | 5,490 | Add UGC to PDP, brief paid team | Medium
Size chart inaccurate — Wrap Dress | 44 | 3,960 | Remeasure + republish chart | Low
Quality below expectation — Cashmere Scarf | 29 | 5,800 | Supplier review | High

Notice the cashmere scarf: highest refund per unit, lowest count, hardest fix. It belongs on a separate roadmap (sourcing) rather than the CRO backlog. Mixing operational fixes with experiment ideas is the most common reason these projects stall — keep the two lists separate from day one.
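To make the sort order concrete, the sketch below ranks the six table rows by refund value, uses effort as a tie-breaker, and splits the result into two lists. Treating "High" effort as the marker for the sourcing roadmap is an assumption that happens to fit this sample; in practice you would tag each row with its owner explicitly.

```python
from typing import NamedTuple


class Row(NamedTuple):
    reason: str
    sku: str
    returns: int
    refund_value: int  # EUR over 90 days, from the table above
    effort: str


ROWS = [
    Row("Runs small", "Linen Trousers", 142, 11360, "Low"),
    Row("Color off vs PDP", "Sage Knit", 98, 7840, "Medium"),
    Row("Thinner than expected", "Tee 3-pack", 76, 4560, "Low"),
    Row("Didn't match TikTok video", "Cargo Pant", 61, 5490, "Medium"),
    Row("Size chart inaccurate", "Wrap Dress", 44, 3960, "Low"),
    Row("Quality below expectation", "Cashmere Scarf", 29, 5800, "High"),
]

EFFORT_RANK = {"Low": 0, "Medium": 1, "High": 2}

# Largest refund value first; lower effort wins if two rows tie on value.
ranked = sorted(ROWS, key=lambda r: (-r.refund_value, EFFORT_RANK[r.effort]))

# Assumption for this sketch: "High" effort rows go to the sourcing roadmap.
cro_backlog = [r for r in ranked if r.effort != "High"]
sourcing_roadmap = [r for r in ranked if r.effort == "High"]

for r in ranked:
    per_return = r.refund_value / r.returns
    print(f"{r.sku}: {r.refund_value} EUR total, {per_return:.0f} EUR per return")
```

Sorted this way, the Cashmere Scarf moves up to third place despite having the fewest returns, which is exactly the kind of row a count-based sort would bury.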

Step 4: Convert the backlog into research questions

The final step is the bridge to experimentation. Each prioritized row becomes a research question, not a solution. "Runs small — Linen Trousers" becomes: does adding a fit note above the size selector reduce the return rate without suppressing conversion? Now it's testable.
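One common way to read such a test is a pooled two-proportion z-test run on each metric separately. The sketch below is an illustration rather than a recommended analysis plan: the counts are invented, the function name is hypothetical, and return rate is computed over orders while conversion is computed over sessions.

```python
from math import erf, sqrt


def two_proportion_z(x_a, n_a, x_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = x_a / n_a, x_b / n_b
    pooled = (x_a + x_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - normal_cdf)


# Invented counts for a fit-note test on one PDP.
control = {"sessions": 10_000, "orders": 300, "returns": 75}
variant = {"sessions": 10_000, "orders": 305, "returns": 50}

# Conversion: orders out of sessions.
z_conv, p_conv = two_proportion_z(
    control["orders"], control["sessions"], variant["orders"], variant["sessions"]
)
# Return rate: returns out of orders.
z_ret, p_ret = two_proportion_z(
    control["returns"], control["orders"], variant["returns"], variant["orders"]
)

print(f"conversion: z={z_conv:.2f}, p={p_conv:.3f}")
print(f"return rate: z={z_ret:.2f}, p={p_ret:.3f}")
```

A non-significant conversion difference is not proof that conversion was unaffected, and returns arrive weeks after orders, so the return-rate read needs a longer window than the conversion read.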

From there it feeds directly into your refund reduction levers — the menu of PDP, merchandising, and post-purchase interventions you can A/B test. Return reason analysis is the diagnostic; the levers are the treatment. Run them in that order or you'll be testing solutions to problems you haven't confirmed.

What a healthy cadence looks like

Code returns weekly (or stream into your warehouse), review the prioritized backlog monthly with merchandising and CRO in the same room, and ship two PDP-level fixes per month against the top reasons. Revisit the taxonomy quarterly — only revise if a new reason is consistently landing in "Other" above 10% of volume.
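The revision trigger can be checked mechanically. In this sketch, "consistently" is interpreted as the last two review periods in a row, which is an assumption; the function name and sample shares are hypothetical.

```python
def taxonomy_needs_review(other_shares, threshold=0.10, min_consecutive=2):
    """True if the share of returns coded "Other" exceeded the threshold
    in each of the most recent `min_consecutive` periods."""
    recent = other_shares[-min_consecutive:]
    return len(recent) == min_consecutive and all(s > threshold for s in recent)


# Share of returns landing in "Other" per period, oldest first (invented).
print(taxonomy_needs_review([0.04, 0.12, 0.13]))  # two periods in a row above 10%
print(taxonomy_needs_review([0.12, 0.04, 0.13]))  # a single spike
```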

Frequently asked questions

Roughly 150 coded returns give you enough volume to spot patterns at the reason × SKU level. Below that, stick to weekly qualitative reads: read every return comment yourself and look for repeats. The structured framework starts paying off at around 200-300 returns per month.

Use the portal's dropdown for the warehouse — it's optimized for routing. Build your own second-level taxonomy in your data warehouse or analytics tool for CRO. The two coexist: the customer picks "didn't fit" in Loop, your coding layer translates it into "runs small", "runs large", or "between sizes" based on the free-text comment.

You can use an LLM to code return reasons, and you should once volume passes ~1,000 returns/month. Give the model a fixed label set, a few examples per label, and a confidence threshold below which the case is flagged for human review. Audit 10% of assignments weekly for the first month, then 5% thereafter.
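The review loop around the model can be sketched without committing to any particular LLM API. Everything here is hypothetical: the label set, the 0.8 threshold, and the assumption that the model returns a label together with a confidence score.

```python
import random

# Fixed label set the model is allowed to emit (illustrative subset).
ALLOWED_LABELS = {"runs small", "runs large", "between sizes", "size chart inaccurate"}


def route_assignment(label, confidence, threshold=0.8):
    """Accept a model assignment only if the label is known and confident enough."""
    if label not in ALLOWED_LABELS or confidence < threshold:
        return "human_review"
    return "accepted"


def audit_sample(assignments, rate, seed=0):
    """Draw a reproducible random sample of assignments for manual audit."""
    if not assignments:
        return []
    k = max(1, round(len(assignments) * rate))
    return random.Random(seed).sample(assignments, k)


print(route_assignment("runs small", 0.93))
print(route_assignment("runs small", 0.55))
print(route_assignment("fabric felt odd", 0.99))  # label outside the fixed set

weekly_assignments = list(range(200))  # stand-in for 200 coded returns
print(len(audit_sample(weekly_assignments, 0.10)))
print(len(audit_sample(weekly_assignments, 0.05)))
```

Rejecting labels outside the fixed set matters as much as the confidence threshold, since a confident but invented label would otherwise fragment the taxonomy.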

Return reason analysis is the methodology — how you code and join the data. Refund drivers are the categories of root cause it reveals (fit, expectation mismatch, quality, logistics). One is the process, the other is the output. Most teams jump to driver-level conclusions without doing the coding work, which is why their fixes don't stick.

Shopify stores the landing-page session attribution on the order. Join the return record to the order ID, then to the original session in your analytics tool (GA4, Metricuno, or a warehouse table). If you're losing attribution to iOS 17+ or cookie expiry, server-side tagging and a 90-day attribution window cover most of the gap.

Review the taxonomy quarterly, but only revise it when a new reason consistently lands in "Other" above 10% of volume or when you launch a category that genuinely needs new codes (e.g. adding footwear when you only sold apparel before). Frequent recoding destroys trend comparability.

Return reasons often point to a conversion problem on the same page. "Color off vs PDP" almost always coincides with a lower add-to-cart rate from organic and direct sessions on the same PDP: the photo problem is suppressing buyers, not just disappointing them. Joining return reasons to PDP behavioral analytics is how you catch this.

To tell a clarity problem from a placement problem, check whether the relevant on-page element was viewed in the original session: size chart opened, GSM tab clicked, scroll depth past the spec block. If the customer saw it and still returned for that reason, it's a clarity problem (rewrite, reposition). If they never saw it, it's a placement problem.
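Applied across all returns for one reason, that check becomes a share. A minimal sketch, assuming each purchase session records the set of elements that were viewed; the field name, element name, and the 50% cut-off are all assumptions.

```python
def diagnose_reason(sessions, element, cutoff=0.5):
    """Classify a return reason as a clarity or placement problem based on
    the share of purchase sessions in which `element` was viewed."""
    if not sessions:
        raise ValueError("need at least one session to diagnose")
    viewed = sum(1 for s in sessions if element in s["elements_viewed"])
    share = viewed / len(sessions)
    return ("clarity" if share >= cutoff else "placement"), share


# Invented sessions behind four "runs small" returns.
mostly_seen = [
    {"elements_viewed": {"size_chart", "gallery"}},
    {"elements_viewed": {"size_chart"}},
    {"elements_viewed": {"size_chart", "description"}},
    {"elements_viewed": {"gallery"}},
]
rarely_seen = [
    {"elements_viewed": {"gallery"}},
    {"elements_viewed": {"size_chart"}},
    {"elements_viewed": set()},
    {"elements_viewed": {"description"}},
]

print(diagnose_reason(mostly_seen, "size_chart"))
print(diagnose_reason(rarely_seen, "size_chart"))
```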

What counts as a normal return rate depends entirely on category. Beauty and consumables sit at 3-8%, electronics 8-12%, general apparel 15-25%, fashion footwear 25-40%. The number to watch isn't the absolute rate; it's the trend within each reason category and the gap between your best and worst SKUs.

CX tickets surface pre-purchase friction and use-case confusion; return reasons surface post-purchase disappointment. They overlap maybe 30%. Run both, and look for reasons that show up in tickets before the purchase AND in returns after — those are your highest-confidence PDP fixes.
