What is a product recommendation engine?

A product recommendation engine is a system that predicts which products a specific customer is most likely to purchase or find relevant, based on signals such as their purchase history, browsing behaviour, product attributes, and the behaviour of similar customers.

What is the difference between collaborative filtering and content-based filtering?

Collaborative filtering recommends products based on the preferences of similar users - it finds customers whose purchase history resembles yours and surfaces what they bought that you have not. Content-based filtering recommends products based on the attributes of items you have already purchased - if you bought a Sony WH-1000XM5, it recommends other wireless headphones with similar specifications.

What is the cold start problem in recommendation systems?

The cold start problem occurs when a recommendation system lacks sufficient data to make accurate recommendations. It manifests in two forms: new user cold start (no purchase history to base recommendations on) and new item cold start (a product has no purchase history so collaborative filtering cannot surface it). Solutions include popularity-based fallbacks, onboarding questionnaires, content-based filtering for new items, and hybrid approaches.

How do you evaluate a recommendation engine?

Recommendation engines are evaluated using offline metrics computed from held-out test data: Precision@K measures what fraction of the top-K recommendations were actually purchased; Recall@K measures what fraction of purchases were captured in the top-K; NDCG@K accounts for ranking position. Online metrics measured through A/B tests include click-through rate, add-to-cart rate, and revenue per session.

Ecommerce AI Series | Built on TechHeaven | Intermediate | Live

Product Recommendation Engine

Match products to customers using purchase history, product similarity, and buying patterns

The Problem

Most customers never find the products they would have bought

TechHeaven carries 291 products across 40 categories. A customer visiting the site sees the same homepage, the same featured products, and the same promotions as every other customer - regardless of whether they are a first-time buyer looking at budget laptops or a returning customer who has purchased six audio products.

The discovery gap is measurable. In TechHeaven’s order data, the top 20 products account for a disproportionate share of revenue. The remaining 271 products sell primarily to customers who searched for them specifically. Customers who would have bought those products but did not know they existed represent recoverable revenue.

35% of Amazon revenue - attributed to its recommendation engine

75% of Netflix views - driven by personalised recommendations

5-15% revenue uplift - typical range for well-tuned ecommerce recommendations

A recommendation engine does not need to be complex to generate value. TechHeaven already has the signals: 5,000 orders, 15,000 line items, and 5,000 reviews. The data to build a useful recommendation system is already there.

The Data

What TechHeaven already has

Every recommendation approach in this playbook is trained on data that a Bagisto store produces as a normal side-effect of operating. No additional data collection is needed.

Order items (15,000 records)

data/transactions/order_items.json

order_id, product_id, product_name, qty, price, category

Association rules and collaborative filtering - the primary signal for what customers bought together

Orders (5,000 records)

data/transactions/orders.json

order_id, customer_id, status, grand_total, order_date

Links order items to specific customers - essential for user-based collaborative filtering

Product catalog (291 records)

data/catalog/products.json

name, brand, category, price, description, specs

Content-based filtering - product attributes define what makes two products similar

Reviews (5,000 records)

data/reviews/reviews.json

product_id, customer_id, rating (1-5)

Explicit feedback signal - ratings provide a cleaner preference signal than purchase alone

The Approaches

Four ways to recommend products

No single algorithm dominates across all situations. Each approach makes different assumptions about the data, produces different recommendation types, and handles cold start differently. Production systems combine multiple approaches.

Association Rules (Low complexity)

Signal: What was bought together

Strength: Highly interpretable, no user history needed

Weakness: Misses personalisation - same rules for every user

Best for: Checkout "frequently bought together" widgets

Collaborative Filtering (Medium complexity)

Signal: What similar users bought

Strength: Captures latent preferences, highly personalised

Weakness: Cold start - fails for new users and new products

Best for: Personalised homepage, "you may also like" sections

Content-Based Filtering (Medium complexity)

Signal: What is similar to what you bought

Strength: Works for new products immediately - no item cold start, although a new user still needs at least one purchase

Weakness: Filter bubble - rarely recommends outside known categories

Best for: Product detail page "similar products" section

Hybrid (High complexity)

Signal: Multiple signals combined

Strength: Best accuracy, mitigates each approach's weaknesses

Weakness: More complex to build, tune, and explain

Best for: Production systems with sufficient data

Core Concepts

How each piece works

Transaction Matrix

The fundamental data structure: a matrix where rows are users, columns are products, and cells contain a signal - purchase count, rating, or binary bought/not-bought. Collaborative filtering operates directly on this matrix. Most cells are empty (sparse), because no customer has bought every product.

TechHeaven: A 1,000 x 271 matrix of 1,000 TechHeaven customers against 271 products. Most cells are 0. Customers who purchased a product have a 1 (or their rating if reviews are used). The matrix is 99.3% sparse.
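As a rough illustration of the structure described above, the sketch below builds a sparse user-item mapping from joined order rows and measures its sparsity. The row tuples, customer IDs, and product IDs are made up for the example; the real fields would come from joining orders.json and order_items.json on order_id.

```python
from collections import defaultdict

# Hypothetical rows after joining orders and order items:
# (customer_id, product_id, qty). Values are illustrative only.
rows = [
    ("c1", "p_laptop", 1),
    ("c1", "p_mouse", 2),
    ("c2", "p_mouse", 1),
    ("c3", "p_headphones", 1),
    ("c1", "p_mouse", 1),  # repeat purchase in a later order
]


def build_interactions(rows, binary=True):
    """Return {customer_id: {product_id: value}}, storing only non-empty cells."""
    matrix = defaultdict(dict)
    for customer_id, product_id, qty in rows:
        if binary:
            # Binary signal: bought / not bought
            matrix[customer_id][product_id] = 1
        else:
            # Count signal: total quantity purchased
            previous = matrix[customer_id].get(product_id, 0)
            matrix[customer_id][product_id] = previous + qty
    return dict(matrix)


def sparsity(matrix, n_products):
    """Fraction of user-product cells that are empty."""
    filled = sum(len(products) for products in matrix.values())
    return 1 - filled / (len(matrix) * n_products)


interactions = build_interactions(rows)
counts = build_interactions(rows, binary=False)
print(interactions["c1"], counts["c1"], round(sparsity(interactions, 3), 3))
```

Storing only the non-empty cells is what keeps a mostly empty matrix cheap to hold in memory; a library sparse-matrix type would serve the same purpose at larger scale.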

Association Rules

Derived from market basket analysis: given that a customer bought product A, how likely are they to buy product B? Measured by Support (how often A and B appear together), Confidence (P(B|A)), and Lift (whether the co-occurrence is higher than chance). High lift means a real relationship, not coincidence.

TechHeaven: Laptop + Wireless Mouse: support 0.08, confidence 0.67, lift 3.2. This means 8% of orders contain both, and 67% of orders with a laptop also contain a mouse - 3.2x the rate you would expect if the two were bought independently.
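The three measures can be computed directly from order baskets. The sketch below uses the standard definitions (support = P(A and B), confidence = P(B|A), lift = confidence / P(B)); the baskets are invented for illustration and are not TechHeaven data.

```python
def rule_metrics(baskets, a, b):
    """Support, confidence and lift for the rule a -> b over a list of baskets."""
    n = len(baskets)
    n_a = sum(1 for basket in baskets if a in basket)
    n_b = sum(1 for basket in baskets if b in basket)
    n_ab = sum(1 for basket in baskets if a in basket and b in basket)

    support = n_ab / n                              # P(A and B)
    confidence = n_ab / n_a if n_a else 0.0         # P(B | A)
    lift = confidence / (n_b / n) if n_b else 0.0   # P(B | A) / P(B)
    return support, confidence, lift


# Illustrative baskets: one set of product names per order
baskets = [
    {"laptop", "mouse"},
    {"laptop", "mouse", "sleeve"},
    {"laptop"},
    {"mouse"},
    {"headphones"},
    {"headphones", "case"},
    {"keyboard"},
    {"monitor"},
    {"ssd"},
    {"hub"},
]

support, confidence, lift = rule_metrics(baskets, "laptop", "mouse")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 indicates the pair co-occurs more often than independent purchases would predict; a lift near 1 suggests the rule carries little information even when confidence looks high.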

Collaborative Filtering

Finds customers whose purchase history is similar to the target customer, then recommends products those similar customers bought. User-based CF computes similarity between users (cosine similarity on their purchase vectors). Item-based CF computes similarity between items instead and is more scalable. Matrix Factorization (SVD, ALS) decomposes the interaction matrix into latent factors that capture hidden preference dimensions.

TechHeaven: 1,000 customers, 271 products, 50 latent factors via SVD. The factorisation discovers hidden dimensions like "audio enthusiast", "mobile professional", and "gaming" without being explicitly told about them.
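A minimal sketch of the item-based variant, assuming binary purchase data: items are similar when the same customers bought them, and a customer is scored against items similar to what they already own. The customer and product names are hypothetical, and this is not the SVD factorisation mentioned above, only the simpler neighbourhood approach.

```python
import math
from collections import defaultdict


def item_similarities(interactions):
    """Cosine similarity between items, from {customer_id: set(product_id)}."""
    buyers = defaultdict(set)
    for customer, products in interactions.items():
        for product in products:
            buyers[product].add(customer)

    sims = defaultdict(dict)
    items = list(buyers)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = len(buyers[a] & buyers[b])
            if overlap:
                # For binary vectors, cosine = shared buyers / sqrt(|A| * |B|)
                score = overlap / math.sqrt(len(buyers[a]) * len(buyers[b]))
                sims[a][b] = score
                sims[b][a] = score
    return sims


def recommend(customer, interactions, sims, k=5):
    """Rank unowned items by summed similarity to the customer's purchases."""
    owned = interactions[customer]
    scores = defaultdict(float)
    for item in owned:
        for other, score in sims.get(item, {}).items():
            if other not in owned:
                scores[other] += score
    # Highest score first; ties broken alphabetically for stable output
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


# Illustrative purchase histories
interactions = {
    "c1": {"headphones", "amp"},
    "c2": {"headphones", "amp", "dac"},
    "c3": {"headphones", "case"},
    "c4": {"laptop", "mouse"},
    "c5": {"headphones"},
}
sims = item_similarities(interactions)
recs = recommend("c5", interactions, sims)
print(recs)
```

Item similarities change slowly and can be computed once per training run, which is one reason the item-based form tends to scale better than comparing every pair of users.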

Content-Based Filtering

Represents each product as a feature vector - brand, category, price range, technical specifications, description text. Recommends products whose feature vectors are most similar (cosine similarity) to products the customer has already purchased. Does not require any purchase data from other customers.

TechHeaven: A customer who bought a Sony WH-1000XM5 gets recommended the Bose QuietComfort Ultra because both are over-ear wireless headphones with ANC. The recommendation uses product attributes, not other customers' behaviour.
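The headphone example can be sketched with one-hot attribute features and cosine similarity. The attribute keys and the two "Generic" products are assumptions made for illustration; the actual catalogue fields (name, brand, category, price, description, specs) would need their own feature encoding, for instance TF-IDF over the description text.

```python
import math

# Hypothetical attribute rows; keys and values are illustrative only
catalog = {
    "Sony WH-1000XM5": {
        "category": "headphones", "form": "over-ear",
        "wireless": "yes", "anc": "yes", "brand": "Sony",
    },
    "Bose QuietComfort Ultra": {
        "category": "headphones", "form": "over-ear",
        "wireless": "yes", "anc": "yes", "brand": "Bose",
    },
    "Generic Wired Headphones": {
        "category": "headphones", "form": "over-ear",
        "wireless": "no", "anc": "no", "brand": "Generic",
    },
    "Generic USB-C Hub": {
        "category": "hub", "form": "dongle",
        "wireless": "no", "anc": "no", "brand": "Generic",
    },
}


def features(attrs):
    """One-hot encode attributes as a set of 'key=value' tokens."""
    return {f"{key}={value}" for key, value in attrs.items()}


def cosine(a, b):
    """Cosine similarity between two binary feature sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def similar_products(name, catalog, k=3):
    """Return the k most similar products as (score, product_name) pairs."""
    target = features(catalog[name])
    scored = [
        (cosine(target, features(attrs)), other)
        for other, attrs in catalog.items()
        if other != name
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


top = similar_products("Sony WH-1000XM5", catalog)
print(top)
```

Because the score depends only on the catalogue, a product added today can be recommended today, with no purchase history required.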

Cold Start Problem

New users have no purchase history, so collaborative filtering cannot compute their similarity to other users. New products have no purchase history, so they never appear in association rules. This is the primary failure mode of pure collaborative filtering and affects every new customer and every new product launch.

TechHeaven: A new TechHeaven customer sees popularity-based recommendations on their first visit (top-selling products per category). After their first purchase, content-based kicks in. After their third order, collaborative filtering starts producing meaningful personalisation.
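The progression in the TechHeaven example amounts to a small routing rule plus a popularity fallback. The sketch below assumes the thresholds stated above (zero orders, one or two orders, three or more); the function names and the sample order items are hypothetical.

```python
from collections import Counter, defaultdict


def choose_strategy(order_count):
    """Pick a recommender based on how much history the customer has."""
    if order_count == 0:
        return "popularity"      # first visit: no history at all
    if order_count < 3:
        return "content_based"   # some purchases, too few for CF
    return "collaborative"       # third order onwards


def top_sellers_by_category(order_items, n=3):
    """Popularity fallback: best-selling product IDs per category by quantity."""
    totals = defaultdict(Counter)
    for item in order_items:
        totals[item["category"]][item["product_id"]] += item["qty"]
    return {
        category: [product_id for product_id, _ in counter.most_common(n)]
        for category, counter in totals.items()
    }


# Illustrative order items using the fields listed for order_items.json
order_items = [
    {"product_id": "p1", "category": "audio", "qty": 1},
    {"product_id": "p2", "category": "audio", "qty": 3},
    {"product_id": "p3", "category": "laptops", "qty": 1},
    {"product_id": "p1", "category": "audio", "qty": 1},
]
fallback = top_sellers_by_category(order_items, n=1)
print(choose_strategy(0), choose_strategy(2), choose_strategy(3), fallback)
```

The thresholds are worth tuning against offline metrics rather than treating as fixed; the right switch point depends on how quickly collaborative filtering overtakes content-based filtering for a given catalogue.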

Evaluation Metrics

Offline evaluation uses held-out data: Precision@K (of the top-K recommendations, what fraction did the customer actually buy?), Recall@K (of all the things the customer actually bought, what fraction appeared in the top-K?), NDCG@K (did the right products appear near the top of the list?). Online evaluation uses A/B tests measuring click-through rate, add-to-cart rate, and revenue per session.

TechHeaven: A model achieving Precision@5 = 0.22 means that, on average, 1.1 of the top 5 recommendations (0.22 x 5) were actually purchased by the held-out test customers. Whether that is good or bad depends on the baseline (random: ~0.02).
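The three offline metrics are short enough to write out. This sketch uses binary relevance (a held-out purchase counts as 1, anything else as 0) and the common log2 discount for NDCG; the recommendation list and purchase set are invented for the example.

```python
import math


def precision_at_k(recommended, purchased, k):
    """Fraction of the top-k recommendations that were actually purchased."""
    hits = sum(1 for product in recommended[:k] if product in purchased)
    return hits / k


def recall_at_k(recommended, purchased, k):
    """Fraction of all purchases that appear in the top-k recommendations."""
    if not purchased:
        return 0.0
    hits = sum(1 for product in recommended[:k] if product in purchased)
    return hits / len(purchased)


def ndcg_at_k(recommended, purchased, k):
    """Discounted gain of hits by rank, normalised by the best possible ordering."""
    dcg = sum(
        1 / math.log2(rank + 2)
        for rank, product in enumerate(recommended[:k])
        if product in purchased
    )
    ideal_hits = min(len(purchased), k)
    idcg = sum(1 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


# Illustrative: ranked recommendations vs. held-out purchases for one customer
recommended = ["mouse", "hub", "sleeve", "ssd", "keyboard"]
purchased = {"hub", "keyboard", "monitor"}

p5 = precision_at_k(recommended, purchased, 5)
r5 = recall_at_k(recommended, purchased, 5)
n5 = ndcg_at_k(recommended, purchased, 5)
print(round(p5, 3), round(r5, 3), round(n5, 3))
```

In practice each metric is averaged over all test customers, and the averaged figure is compared against a popularity or random baseline computed the same way.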

Hybrid Systems

Production recommendation systems combine multiple approaches to offset each approach's weaknesses. Weighted hybrids blend scores from multiple models. Switching hybrids select the most appropriate model based on available data (use popularity when cold, content-based when catalogue is new, collaborative when history is rich). Feature augmentation feeds the output of one model as input to another.

TechHeaven: Score = 0.4 x CF_score + 0.3 x CBF_score + 0.3 x AR_score. When CF is unavailable (new user), fall back to 0.5 x CBF_score + 0.5 x AR_score. This is a weighted hybrid with a switching fallback, so it degrades gracefully.
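The blending formula translates almost directly into code. The sketch below uses the weights from the TechHeaven example and assumes each model's score has already been normalised to a comparable 0-1 range, which the formula implies but does not state; the function and constant names are hypothetical.

```python
# Weights taken from the TechHeaven example
WEIGHTS_FULL = {"cf": 0.4, "cbf": 0.3, "ar": 0.3}
WEIGHTS_NO_CF = {"cbf": 0.5, "ar": 0.5}


def hybrid_score(cbf_score, ar_score, cf_score=None):
    """Blend model scores; fall back to CBF + AR when CF has no score.

    Assumes all scores are on the same 0-1 scale.
    """
    if cf_score is None:
        # New user: collaborative filtering cannot score this product
        return (WEIGHTS_NO_CF["cbf"] * cbf_score
                + WEIGHTS_NO_CF["ar"] * ar_score)
    return (WEIGHTS_FULL["cf"] * cf_score
            + WEIGHTS_FULL["cbf"] * cbf_score
            + WEIGHTS_FULL["ar"] * ar_score)


returning = hybrid_score(cbf_score=0.6, ar_score=0.2, cf_score=0.9)
new_user = hybrid_score(cbf_score=0.6, ar_score=0.2)
print(round(returning, 2), round(new_user, 2))
```

Both weight sets sum to 1, so scores from the full blend and the fallback stay on the same scale and can be ranked together.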

Architecture

Offline training and online serving

Recommendation systems split cleanly into two pipelines. The training pipeline runs on a schedule (hourly to daily) and is computationally expensive. The serving pipeline runs in real time and must return recommendations in under 50ms, so it reads pre-computed results rather than recomputing them on demand.

Offline - Training

Runs nightly or on new data

1. Pull order items, orders, reviews from data warehouse
2. Build user-item interaction matrix
3. Train association rules (Apriori / FP-Growth)
4. Factorize matrix (SVD / ALS) for collaborative filtering
5. Build content feature vectors (TF-IDF on product attributes)
6. Compute top-N recommendations per user
7. Write results to recommendation store (Redis / Postgres)
8. Log model version and evaluation metrics

Online - Serving

Runs on every page load, target <50ms

1. Identify customer_id from session or cookie
2. Look up pre-computed recommendations from store
3. Apply real-time filters (out of stock, already purchased)
4. Apply business rules (promoted items, margin targets)
5. Blend with contextual signals (current page, cart)
6. Return ranked product list with recommendation reason
7. Log impression for later evaluation and retraining
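The lookup-and-filter core of the serving steps can be sketched as below. A plain dict stands in for the recommendation store (Redis or Postgres in a real deployment), and the function name, parameters, and sample data are all hypothetical. Business rules, contextual blending, and impression logging are left out.

```python
def serve_recommendations(customer_id, store, in_stock, purchased,
                          fallback, limit=5):
    """Read pre-computed recommendations and apply real-time filters.

    store:     {customer_id: [(product_id, reason), ...]} from the nightly job
    in_stock:  set of product IDs currently available
    purchased: {customer_id: set(product_id)} already bought
    fallback:  list used when the customer has no pre-computed entry
    """
    candidates = store.get(customer_id) or fallback
    already_bought = purchased.get(customer_id, set())

    results = []
    for product_id, reason in candidates:
        if product_id not in in_stock or product_id in already_bought:
            continue  # filtered out at request time
        results.append({"product_id": product_id, "reason": reason})
        if len(results) == limit:
            break
    return results


# Illustrative data
store = {
    "c1": [
        ("p1", "bought together"),
        ("p2", "similar to"),
        ("p3", "customers like you"),
    ],
}
in_stock = {"p1", "p3", "p9"}
purchased = {"c1": {"p1"}}
fallback = [("p9", "popular")]

known = serve_recommendations("c1", store, in_stock, purchased, fallback)
unknown = serve_recommendations("new", store, in_stock, purchased, fallback)
print(known, unknown)
```

Nothing here touches a model, which is what makes a sub-50ms target realistic: the request path is a key lookup and a few set-membership checks.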

Technology choices per layer

Training: Python, scikit-learn, implicit, LightFM

Storage: Redis, Postgres, DynamoDB, BigQuery

Serving: FastAPI, Next.js API routes, GraphQL

Monitoring: Evidently, custom Grafana dashboards

Business Insights

What the data reveals about buying patterns

Association rules extracted from TechHeaven’s order data reveal consistent patterns across product categories. These are not hypotheses - they are signals measured directly from purchase co-occurrence, with lift values that confirm the relationship is not random.

Laptop buyers frequently also buy

Wireless Mouse: Trackpad fatigue - extended sessions demand precision input

Mechanical Keyboard: Docking station workflow requires a full keyboard

USB-C Hub / Dock: Modern laptops ship with 2 ports; workflows need more

Laptop Sleeve / Case: Purchased in same session to protect a new device

External SSD: Storage expansion, often immediately after purchase

Headphone / Audio buyers frequently also buy

Headphone Amplifier: High-impedance headphones (250-600 ohm) require amplification

Balanced Cable / DAC: Audiophile workflow - separating D/A conversion from the amp

Carrying Case: Protective transport for premium devices

Ear Cushion Replacements: Consumable accessory, ordered 3-6 months after headphone purchase

Gaming peripheral buyers frequently also buy

Gaming Headset: Mouse + headset is the core gaming setup - high co-occurrence

Mechanical Keyboard: Switch feel is a primary differentiation from membrane keyboards

High Refresh Monitor: A competitive mouse is wasted on a 60Hz display

Mouse Pad (XL): Low DPI players need surface area; frequently bought together

These patterns are the output of association rule mining on TechHeaven’s order data. The notebook reproduces them with exact support, confidence, and lift values from the actual dataset.

Interactive

Explore the system

These sections will become interactive once the recommendation models are served via API. Select a customer to see their purchase history alongside model-generated recommendations and the reason each product was surfaced.

Recommendation Playground

Select a TechHeaven customer, see their purchase history, and explore recommendations from each model side by side

Coming soon

Association Rules Explorer

Browse all mined rules - filter by lift, confidence, or product category to see which products drive each other

Coming soon

Product Similarity Matrix

Pick any product and see its nearest neighbours under content-based filtering, with similarity scores and shared features

Coming soon

Customer Purchase Timeline

Visualise a customer's order history over time and see how recommendations evolve as their history grows

Coming soon

Frequently Bought Together

Product-level view of the top co-purchased items with support, confidence, and lift metrics displayed

Coming soon

Recommendation Journey

Trace how a recommendation is generated end to end: customer profile, matched similar users, filtered results, final ranking

Coming soon

Business Impact

What a recommendation engine delivers

Average Order Value: +15-30%

Cross-sell and upsell surfaces higher-value configurations and complementary products at the point of decision

Conversion Rate: +12-25%

Personalised product surfaces reduce friction - customers find relevant products faster with less browsing

Long-tail Discovery: 3-5x

Recommendations distribute demand across the full catalogue, not just top-20 products

Attribution: Traceable

Every recommendation carries a reason: bought together, similar to, customers like you - measurable and auditable

Business Applications

Shopify / WooCommerce

Association rules surface "frequently bought together" widgets at checkout. Collaborative filtering powers personalised homepage sections. Both run as nightly batch jobs writing recommendations to a cache layer the storefront reads.

B2B / Manufacturing

Content-based filtering on product attributes (voltage, connector type, compatibility) reduces misconfigured orders. Buyers who purchased a component are shown compatible accessories and replacement parts based on attribute overlap.

Insurance / Financial Services

The same collaborative filtering logic applied to product purchases works for policy and product bundles. Customers who hold auto and home policies are shown life insurance - surfaced because similar customers hold all three.

SaaS / Marketplaces

Feature adoption follows the same logic as product purchase. Users who enable feature A frequently enable feature B. Recommendation engines applied to usage events surface integrations, add-ons, and plan upgrades at the right moment.

Run the notebook

The Jupyter notebook implements all four approaches on real TechHeaven data: association rules, collaborative filtering, content-based filtering, hybrid blending, and evaluation metrics in a single reproducible pipeline.

Open in Colab

Ecommerce AI Series

AI Customer Support with RAG

Product Recommendation Engine (Current)

Semantic Product Search

Voice Shopping Assistant

Inventory Forecasting

Customer Churn Prediction

+6 more - view full roadmap