Finding what drives repeat purchases at a regional retailer

Illustrative case

This is an illustrative composite, not a real client engagement. The company, the numbers, and the findings are invented to show how a typical data-science project unfolds and what it produces.

A regional retailer with a few dozen stores wanted to grow repeat business but wasn't sure what actually drove it. The marketing team had theories (loyalty emails, discounts, customers' home region) and a spreadsheet of sales and customer data that nobody had analysed beyond monthly totals. The question they brought was simple to state and surprisingly hard to answer well: what should we change to get more customers coming back?

The question behind the question

The first job was to turn a vague goal into a measurable one. "Repeat purchases" became a specific outcome in the data: how many times a customer bought again within a year. That mattered because the rest of the work, finding which factors relate to that outcome, only makes sense once the outcome itself is defined precisely. We also separated the factors the business could actually change (price, delivery speed, promotions) from those it couldn't (a customer's home region), since only the first kind translates directly into a decision about what to do.
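
As a rough sketch of what pinning down that outcome can look like, the Python below counts, for each customer, the purchases made after their first one and within 365 days of it. Measuring the year from the first purchase is our assumption for the example (a calendar year would be an equally valid reading), and the function name, field layout, and sample rows are all invented.

```python
from collections import defaultdict
from datetime import date, timedelta


def repeat_purchases_within_year(orders):
    """Count repeat purchases per customer within 365 days of their first purchase.

    `orders` is an iterable of (customer_id, purchase_date) pairs.
    Assumption: the "year" starts at each customer's first purchase.
    """
    dates_by_customer = defaultdict(list)
    for customer_id, purchase_date in orders:
        dates_by_customer[customer_id].append(purchase_date)

    counts = {}
    for customer_id, dates in dates_by_customer.items():
        dates.sort()
        cutoff = dates[0] + timedelta(days=365)
        # Everything after the first purchase, up to the cutoff, is a repeat.
        counts[customer_id] = sum(1 for d in dates[1:] if d <= cutoff)
    return counts


# Invented rows, for illustration only.
orders = [
    ("c1", date(2023, 1, 10)),
    ("c1", date(2023, 3, 2)),
    ("c1", date(2024, 6, 1)),  # more than a year after c1's first purchase
    ("c2", date(2023, 5, 5)),
]
print(repeat_purchases_within_year(orders))  # {'c1': 1, 'c2': 0}
```

Small choices like this one (which window, whether same-day orders count) change the numbers downstream, which is why they are worth settling before any modelling starts.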

How we approached it

We started by exploring the data plainly — distributions, missing values, obvious errors — before modelling anything, because a model built on messy or misunderstood data produces confident nonsense. With the data cleaned, we built a straightforward model that estimates how strongly each factor is associated with repeat purchases, and — just as importantly — how uncertain each of those estimates is. The aim was never a single magic number. It was a ranked, honest picture: strong signals, weak signals, and signals too uncertain to act on.
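
To make "an estimate plus its uncertainty" concrete, here is a minimal Python sketch. It is not the model from this case: it swaps in a simpler stand-in, a Pearson correlation with a bootstrap percentile range, and every name and data point in it is invented for illustration.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in one variable: treat as no measurable association
    return sxy / sqrt(sxx * syy)


def bootstrap_interval(xs, ys, n_resamples=2000, level=0.95, seed=0):
    """Return (estimate, low, high) using a bootstrap percentile range."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(xs)
    estimates = []
    for _ in range(n_resamples):
        # Resample customers with replacement and re-estimate the association.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(pearson([xs[i] for i in idx], [ys[i] for i in idx]))
    estimates.sort()
    low = estimates[int((1 - level) / 2 * n_resamples)]
    high = estimates[int((1 + level) / 2 * n_resamples) - 1]
    return pearson(xs, ys), low, high


# Invented data: delivery time in days and repeat purchases per customer.
delivery_days = [1, 2, 2, 3, 4, 5, 5, 6, 7, 8]
repeat_counts = [5, 4, 5, 3, 3, 2, 1, 2, 0, 1]

estimate, low, high = bootstrap_interval(delivery_days, repeat_counts)
print(f"association: {estimate:.2f}, 95% range: {low:.2f} to {high:.2f}")
```

A negative value here means longer delivery times go with fewer repeat purchases. The width of the range, not just the central estimate, is what tells you how much weight the number can bear.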

What emerged

In this illustrative case, delivery speed turned out to be the factor most strongly associated with repeat purchases: customers who received faster delivery bought again more often. The range around that estimate was fairly tight, meaning the estimate was reasonably precise. Price was moderately associated, but with a wider range; the relationship looked real, but its size was less certain. Home region, one of the team's favourite theories, turned out to be too uncertain to call: the data simply couldn't separate its effect from noise. That last finding was as valuable as the first, because it stopped the team from building a regional strategy on what amounted to a coin flip.
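
The reading of those ranges can be sketched as a simple rule, shown below in Python. The width threshold, the labels, and the three example ranges are hypothetical, chosen only to mirror the pattern described above; in practice the cut-offs are a judgment call made with the business.

```python
def verdict(low, high, max_width=0.3):
    """Label an estimate from its range (low, high).

    Assumption: a range that includes zero cannot be told apart from noise,
    and `max_width` is an arbitrary illustrative cut-off for a "tight" range.
    """
    if low <= 0 <= high:
        return "too uncertain to call"
    if high - low <= max_width:
        return "well supported"
    return "real but less certain"


# Hypothetical ranges, invented to match the narrative.
ranges = {
    "delivery speed": (0.35, 0.55),
    "price": (0.05, 0.60),
    "home region": (-0.20, 0.25),
}
for factor, (low, high) in ranges.items():
    print(f"{factor}: {verdict(low, high)}")
```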

The honest-uncertainty part

A weaker analysis would have reported three tidy percentages and let the team treat them as equally solid. We reported ranges instead, and said plainly which findings were trustworthy and which weren't. We were also explicit about the limit every analysis of this kind shares: these are associations, not proven causes. Faster delivery being linked to repeat purchases doesn't guarantee that speeding up delivery will cause more of them — but it is exactly the kind of well-supported lead worth testing with a small controlled trial before committing budget.
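
For a sense of how such a trial might be read, the Python sketch below compares repeat-purchase rates between a control group and a group offered faster delivery, and puts an approximate 95% range on the difference. It assumes customers were assigned to the groups at random, uses a normal approximation that is only reasonable for fairly large groups, and runs on invented counts.

```python
from math import sqrt


def rate_difference_interval(repeats_control, n_control,
                             repeats_trial, n_trial, z=1.96):
    """Difference in repeat-purchase rates (trial minus control) with a range.

    Uses the normal (Wald) approximation; z=1.96 gives roughly a 95% range.
    """
    p_control = repeats_control / n_control
    p_trial = repeats_trial / n_trial
    diff = p_trial - p_control
    std_err = sqrt(p_control * (1 - p_control) / n_control
                   + p_trial * (1 - p_trial) / n_trial)
    return diff, diff - z * std_err, diff + z * std_err


# Invented counts: 120 of 1,000 control customers bought again,
# versus 165 of 1,000 customers offered faster delivery.
diff, low, high = rate_difference_interval(120, 1000, 165, 1000)
print(f"difference: {diff:.3f}, approx. 95% range: {low:.3f} to {high:.3f}")
```

If the whole range sits above zero, the trial supports the idea that faster delivery itself is doing the work; if it straddles zero, the lead remains unproven and the budget decision can wait.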

What the retailer received

The retailer received three things: a short written findings memo a non-specialist could act on, a forecast they could update as new data arrived, and a simple dashboard for tracking repeat-purchase rates over time. The practical upshot: invest first in the delivery-speed improvements (strongest evidence), test the pricing change rather than assume it (real but less certain), and shelve the regional campaign idea (the data could not separate any regional effect from noise). Clear, honest, and useful, which is the whole point.

This case is an illustrative composite for explanation only and does not describe a real client, engagement, or result. Specific findings depend entirely on your own data and context. To discuss your situation, get in touch.