
Product Recommendation Engine

Match products to customers using purchase history, product similarity, and buying patterns

The Problem

Most customers never find the products they would have bought

TechHeaven carries 291 products across 40 categories. A customer visiting the site sees the same homepage, the same featured products, and the same promotions as every other customer - regardless of whether they are a first-time buyer looking at budget laptops or a returning customer who has purchased six audio products.

The discovery gap is measurable. In TechHeaven’s order data, the top 20 products account for a disproportionate share of revenue. The remaining 271 products sell primarily to customers who searched for them specifically. Customers who would have bought those products but did not know they existed represent recoverable revenue.

35% of Amazon revenue attributed to its recommendation engine
75% of Netflix views driven by personalised recommendations
5-15% revenue uplift, the typical range for well-tuned ecommerce recommendations

A recommendation engine does not need to be complex to generate value. TechHeaven already has the signals: 5,000 orders, 15,000 line items, and 5,000 reviews. The data to build a useful recommendation system is already there.

The Data

What TechHeaven already has

Every recommendation approach in this playbook is trained on data that a Bagisto store produces as a normal side-effect of operating. No additional data collection is needed.

Order items: 15,000 records
data/transactions/order_items.json

order_id, product_id, product_name, qty, price, category

Association rules and collaborative filtering - the primary signal for what customers bought together

Orders: 5,000 records
data/transactions/orders.json

order_id, customer_id, status, grand_total, order_date

Links order items to specific customers - essential for user-based collaborative filtering

Product catalog: 291 records
data/catalog/products.json

name, brand, category, price, description, specs

Content-based filtering - product attributes define what makes two products similar

Reviews: 5,000 records
data/reviews/reviews.json

product_id, customer_id, rating (1-5)

Explicit feedback signal - ratings provide a cleaner preference signal than purchase alone
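
As a minimal sketch of how these files fit together, the snippet below joins order items to customers through order_id and produces a purchase history per customer. The sample rows are hypothetical and only reuse the field names listed above; the helper name purchases_by_customer is an assumption for illustration, not something taken from the notebook.

```python
from collections import defaultdict

# Hypothetical sample rows that reuse the field names listed above.
orders = [
    {"order_id": 1, "customer_id": "c1", "status": "completed",
     "grand_total": 1249.0, "order_date": "2025-03-01"},
    {"order_id": 2, "customer_id": "c1", "status": "completed",
     "grand_total": 25.0, "order_date": "2025-04-12"},
    {"order_id": 3, "customer_id": "c2", "status": "completed",
     "grand_total": 49.0, "order_date": "2025-04-15"},
]
order_items = [
    {"order_id": 1, "product_id": "p-laptop", "product_name": "Laptop",
     "qty": 1, "price": 1199.0, "category": "Laptops"},
    {"order_id": 1, "product_id": "p-mouse", "product_name": "Wireless Mouse",
     "qty": 2, "price": 25.0, "category": "Mice"},
    {"order_id": 2, "product_id": "p-mouse", "product_name": "Wireless Mouse",
     "qty": 1, "price": 25.0, "category": "Mice"},
    {"order_id": 3, "product_id": "p-hub", "product_name": "USB-C Hub",
     "qty": 1, "price": 49.0, "category": "Hubs"},
    # Line item whose order is missing from `orders`; it is skipped below.
    {"order_id": 99, "product_id": "p-ssd", "product_name": "External SSD",
     "qty": 1, "price": 89.0, "category": "Storage"},
]


def purchases_by_customer(orders, order_items):
    """Map customer_id -> {product_id: total quantity} by joining on order_id."""
    order_to_customer = {o["order_id"]: o["customer_id"] for o in orders}
    history = defaultdict(lambda: defaultdict(int))
    for item in order_items:
        customer = order_to_customer.get(item["order_id"])
        if customer is None:
            continue  # orphaned line item: no matching order
        history[customer][item["product_id"]] += item["qty"]
    return {customer: dict(items) for customer, items in history.items()}


history = purchases_by_customer(orders, order_items)
print(history)
```

With the real files, the two lists would presumably come from json.load on the paths above, assuming each file holds an array of records with these fields.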

The Approaches

Four ways to recommend products

No single algorithm dominates across all situations. Each approach makes different assumptions about the data, produces different recommendation types, and handles cold start differently. Production systems combine multiple approaches.

Association Rules (low complexity)
Signal: What was bought together
Strength: Highly interpretable, no user history needed
Weakness: Misses personalisation - same rules for every user
Best for: Checkout "frequently bought together" widgets

Collaborative Filtering (medium complexity)
Signal: What similar users bought
Strength: Captures latent preferences, highly personalised
Weakness: Cold start: fails for new users and new products
Best for: Personalised homepage, "you may also like" sections

Content-Based Filtering (medium complexity)
Signal: What is similar to what you bought
Strength: Works for new products immediately, no cold start
Weakness: Filter bubble: rarely recommends outside known categories
Best for: Product detail page "similar products" section

Hybrid (high complexity)
Signal: Multiple signals combined
Strength: Best accuracy, mitigates each approach's weaknesses
Weakness: More complex to build, tune, and explain
Best for: Production systems with sufficient data

Core Concepts

How each piece works

Transaction Matrix

The fundamental data structure: a matrix where rows are users, columns are products, and cells contain a signal - purchase count, rating, or binary bought/not-bought. Collaborative filtering operates directly on this matrix. Most cells are empty (sparse), because no customer has bought every product.

TechHeaven: A 1,000 x 271 matrix of 1,000 TechHeaven customers against 271 products. Most cells are 0. Customers who purchased a product have a 1 (or their rating if reviews are used). The matrix is 99.3% sparse.
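
To make the data structure concrete, here is a small sketch that builds a binary bought/not-bought matrix from per-customer purchase sets and measures its sparsity. The customers and product ids are hypothetical, and a dense list of lists is used only for readability; at TechHeaven scale a sparse representation would be the more likely choice.

```python
# Hypothetical purchase sets: customer_id -> products bought at least once.
history = {
    "c1": {"p-laptop", "p-mouse"},
    "c2": {"p-laptop", "p-hub"},
    "c3": {"p-headphones"},
}


def build_interaction_matrix(history):
    """Return (users, products, matrix) with matrix[i][j] = 1 if user i bought product j."""
    users = sorted(history)
    products = sorted({p for items in history.values() for p in items})
    column = {product: j for j, product in enumerate(products)}
    matrix = [[0] * len(products) for _ in users]
    for i, user in enumerate(users):
        for product in history[user]:
            matrix[i][column[product]] = 1
    return users, products, matrix


def sparsity(matrix):
    """Fraction of cells that are empty (zero)."""
    total = sum(len(row) for row in matrix)
    nonzero = sum(1 for row in matrix for value in row if value)
    return 1 - nonzero / total


users, products, matrix = build_interaction_matrix(history)
for user, row in zip(users, matrix):
    print(user, row)
print(f"sparsity: {sparsity(matrix):.1%}")
```

Swapping the 1 for a purchase count or a review rating gives the other cell signals described above without changing the shape of the matrix.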

Association Rules

Derived from market basket analysis: given that a customer bought product A, how likely are they to buy product B? Measured by Support (the fraction of orders in which A and B appear together), Confidence (P(B|A)), and Lift (confidence divided by the baseline rate of B, so values above 1 mean A and B co-occur more often than chance would predict). A lift well above 1 suggests a real relationship rather than coincidence.

TechHeaven: Laptop + Wireless Mouse: support 0.08, confidence 0.67, lift 3.2. This means 8% of orders contain both, and 67% of orders with a laptop also contain a mouse - 3.2x more than you'd expect by chance.
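
The three measures can be computed directly from order baskets. The sketch below does so for a single rule A -> B on ten hypothetical baskets; the numbers are illustrative and are not the TechHeaven figures quoted above. Mining all rules at once is usually left to an Apriori or FP-Growth implementation rather than a hand-written loop like this.

```python
# Ten hypothetical order baskets (sets of product names).
baskets = [
    {"laptop", "mouse"},
    {"laptop", "mouse", "hub"},
    {"laptop"},
    {"mouse"},
    {"mouse", "keyboard"},
    {"headphones"},
    {"headphones", "amp"},
    {"keyboard"},
    {"hub"},
    {"ssd"},
]


def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    n = len(baskets)
    count_a = sum(1 for b in baskets if antecedent in b)
    count_b = sum(1 for b in baskets if consequent in b)
    count_ab = sum(1 for b in baskets if antecedent in b and consequent in b)
    if n == 0 or count_a == 0 or count_b == 0:
        return None  # the rule is undefined if either item never occurs
    support = count_ab / n              # P(A and B)
    confidence = count_ab / count_a     # P(B | A)
    lift = confidence / (count_b / n)   # P(B | A) / P(B)
    return {"support": support, "confidence": confidence, "lift": lift}


metrics = rule_metrics(baskets, "laptop", "mouse")
print({name: round(value, 3) for name, value in metrics.items()})
```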

Collaborative Filtering

Finds customers whose purchase history is similar to the target customer, then recommends products those similar customers bought. User-based CF computes similarity between users (cosine similarity on their purchase vectors). Item-based CF computes similarity between items instead and is more scalable. Matrix Factorization (SVD, ALS) decomposes the interaction matrix into latent factors that capture hidden preference dimensions.

TechHeaven: 1,000 customers, 271 products, 50 latent factors via SVD. The factorisation discovers hidden dimensions like "audio enthusiast", "mobile professional", and "gaming" without being explicitly told about them.
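
As a rough illustration of the item-based variant, the sketch below computes cosine similarity between products from binary purchase vectors and scores unseen products for one customer. The purchase data is hypothetical, and this is plain neighbourhood-based CF rather than the SVD factorisation mentioned above, which would normally come from a library.

```python
import math
from collections import defaultdict

# Hypothetical purchase sets: customer_id -> products bought.
history = {
    "u1": {"laptop", "mouse"},
    "u2": {"laptop", "mouse", "hub"},
    "u3": {"headphones", "amp"},
    "u4": {"laptop"},
}


def item_similarities(history):
    """Cosine similarity between items over binary purchase vectors."""
    buyers = defaultdict(set)
    for user, items in history.items():
        for item in items:
            buyers[item].add(user)
    sims = defaultdict(dict)
    items = sorted(buyers)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = len(buyers[a] & buyers[b])
            if overlap:
                # For 0/1 vectors, cosine = |A and B| / sqrt(|A| * |B|).
                score = overlap / math.sqrt(len(buyers[a]) * len(buyers[b]))
                sims[a][b] = score
                sims[b][a] = score
    return sims


def recommend(user, history, sims, k=3):
    """Rank products the user has not bought by summed similarity to owned items."""
    owned = history[user]
    scores = defaultdict(float)
    for item in owned:
        for other, score in sims.get(item, {}).items():
            if other not in owned:
                scores[other] += score
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


sims = item_similarities(history)
print(recommend("u4", history, sims))
```

Here u4 has only bought a laptop, so the ranking follows how strongly each other product co-occurs with laptops across the remaining customers.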

Content-Based Filtering

Represents each product as a feature vector - brand, category, price range, technical specifications, description text. Recommends products whose feature vectors are most similar (cosine similarity) to products the customer has already purchased. Does not require any purchase data from other customers.

TechHeaven: A customer who bought a Sony WH-1000XM5 gets recommended the Bose QuietComfort Ultra because both are over-ear wireless headphones with ANC. The recommendation uses product attributes, not other customers' behaviour.
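
A minimal sketch of the idea, assuming each product is reduced to a set of attribute tags (a simplification of the brand, category, price and text features described above). The two headphone models come from the example; the tag vocabulary and the two "Hypothetical" products are invented for illustration.

```python
import math

# Attribute tags per product. The tag names are assumptions for illustration.
catalog = {
    "Sony WH-1000XM5": {"category:headphones", "over-ear", "wireless", "anc", "brand:sony"},
    "Bose QuietComfort Ultra": {"category:headphones", "over-ear", "wireless", "anc", "brand:bose"},
    "Hypothetical Wired Earbuds": {"category:headphones", "in-ear", "wired", "brand:acme"},
    "Hypothetical Gaming Mouse": {"category:mice", "wired", "brand:acme"},
}


def cosine(a, b):
    """Cosine similarity between two tag sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def similar_products(name, catalog, k=2):
    """Return the k products whose tags are most similar to the given product."""
    target = catalog[name]
    scored = [
        (cosine(target, features), other)
        for other, features in catalog.items()
        if other != name
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(other, round(score, 3)) for score, other in scored[:k]]


print(similar_products("Sony WH-1000XM5", catalog))
```

No purchase data is involved at any point, which is why this approach keeps working for a product that was added to the catalogue yesterday.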

Cold Start Problem

New users have no purchase history, so collaborative filtering cannot compute their similarity to other users. New products have no purchase history, so they never appear in association rules. This is the primary failure mode of pure collaborative filtering and affects every new customer and every new product launch.

TechHeaven: A new TechHeaven customer sees popularity-based recommendations on their first visit (top-selling products per category). After their first purchase, content-based kicks in. After their third order, collaborative filtering starts producing meaningful personalisation.
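
The progression described for TechHeaven amounts to a small switching rule keyed on order count. A sketch, with the thresholds read from the example above and the function and strategy names chosen here for illustration:

```python
def choose_strategy(order_count):
    """Pick a recommendation strategy from how many orders the customer has placed."""
    if order_count <= 0:
        return "popularity"      # first visit: top sellers per category
    if order_count < 3:
        return "content_based"   # some history, too little for CF
    return "collaborative"       # third order onwards


for count in (0, 1, 2, 3, 10):
    print(count, choose_strategy(count))
```

The cut-off of three orders is the figure from the example; a real deployment would likely tune it against offline metrics rather than fix it by hand.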

Evaluation Metrics

Offline evaluation uses held-out data: Precision@K (of the top-K recommendations, what fraction did the customer actually buy?), Recall@K (of all the things the customer actually bought, what fraction appeared in the top-K?), NDCG@K (did the right products appear near the top of the list?). Online evaluation uses A/B tests measuring click-through rate, add-to-cart rate, and revenue per session.

TechHeaven: A model achieving Precision@5 = 0.22 means that on average 1.1 of the top 5 recommendations were actually purchased by the held-out test customers. Whether that is good or bad depends on the baseline (random: ~0.02).
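
The three offline metrics are short enough to write out. The sketch below uses binary relevance (a product either was or was not bought in the held-out data), which is one common convention and an assumption here; the product ids are placeholders.

```python
import math


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that were actually bought."""
    hits = sum(1 for product in recommended[:k] if product in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Fraction of the bought products that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for product in recommended[:k] if product in relevant)
    return hits / len(relevant)


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: hits near the top of the list count for more."""
    dcg = sum(
        1 / math.log2(rank + 2)
        for rank, product in enumerate(recommended[:k])
        if product in relevant
    )
    ideal = sum(1 / math.log2(rank + 2) for rank in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0


recommended = ["a", "b", "c", "d", "e"]   # model output, best first
relevant = {"b", "e", "z"}                # held-out purchases

print("precision@5", precision_at_k(recommended, relevant, 5))
print("recall@5", round(recall_at_k(recommended, relevant, 5), 3))
print("ndcg@5", round(ndcg_at_k(recommended, relevant, 5), 3))
```

Averaging each function over all held-out customers gives the single figures quoted for a model, such as the Precision@5 in the example above.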

Hybrid Systems

Production recommendation systems combine multiple approaches to offset each approach's weaknesses. Weighted hybrids blend scores from multiple models. Switching hybrids select the most appropriate model based on available data (use popularity when cold, content-based when catalogue is new, collaborative when history is rich). Feature augmentation feeds the output of one model as input to another.

TechHeaven: Score = 0.4 x CF_score + 0.3 x CBF_score + 0.3 x AR_score. When CF is unavailable (new user), fall back to 0.5 x CBF_score + 0.5 x AR_score. This is a weighted hybrid with a switching fallback, so it degrades gracefully.
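
The blend in the TechHeaven example translates almost directly into code. This sketch assumes each model's score has already been normalised to the same 0-1 range, which the weights implicitly require; the function name and the use of None to mean "no CF score" are illustrative choices.

```python
def hybrid_score(cbf_score, ar_score, cf_score=None):
    """Blend model scores, falling back to CBF + AR when no CF score exists."""
    if cf_score is None:
        # New user: collaborative filtering has nothing to say yet.
        return 0.5 * cbf_score + 0.5 * ar_score
    return 0.4 * cf_score + 0.3 * cbf_score + 0.3 * ar_score


print(round(hybrid_score(cbf_score=0.6, ar_score=0.2, cf_score=0.9), 2))
print(round(hybrid_score(cbf_score=0.6, ar_score=0.2), 2))
```

Both weight sets sum to 1, so scores stay comparable whether or not the CF term is present.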

Architecture

Offline training and online serving

Recommendation systems split cleanly into two pipelines. The training pipeline runs on a schedule (hourly to daily) and is computationally expensive. The serving pipeline runs in real time and must return recommendations in under 50ms - it reads pre-computed results, it does not recompute them on demand.

Offline - Training
Runs nightly or on new data
  1. Pull order items, orders, reviews from data warehouse
  2. Build user-item interaction matrix
  3. Train association rules (Apriori / FP-Growth)
  4. Factorize matrix (SVD / ALS) for collaborative filtering
  5. Build content feature vectors (TF-IDF on product attributes)
  6. Compute top-N recommendations per user
  7. Write results to recommendation store (Redis / Postgres)
  8. Log model version and evaluation metrics
Online - Serving
Runs on every page load, target <50ms
  1. Identify customer_id from session or cookie
  2. Look up pre-computed recommendations from store
  3. Apply real-time filters (out of stock, already purchased)
  4. Apply business rules (promoted items, margin targets)
  5. Blend with contextual signals (current page, cart)
  6. Return ranked product list with recommendation reason
  7. Log impression for later evaluation and retraining
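
The core of the serving steps above (look up, filter, return with a reason) can be sketched in a few lines. Plain dictionaries stand in for the recommendation store and the stock and purchase lookups; the "__popular__" fallback key, the function name and the sample data are all assumptions for illustration, and business rules, contextual blending and impression logging are left out.

```python
# Stand-in for the recommendation store written by the nightly job:
# customer_id -> ranked list of (product_id, reason).
store = {
    "c1": [
        ("p-mouse", "bought together"),
        ("p-hub", "similar to"),
        ("p-ssd", "customers like you"),
    ],
    "__popular__": [("p-laptop", "popular"), ("p-mouse", "popular")],
}
in_stock = {"p-mouse", "p-ssd", "p-laptop"}
purchased = {"c1": {"p-mouse"}}


def serve_recommendations(customer_id, store, in_stock, purchased, limit=5):
    """Read pre-computed recommendations and apply real-time filters."""
    candidates = store.get(customer_id) or store.get("__popular__", [])
    already_bought = purchased.get(customer_id, set())
    results = []
    for product_id, reason in candidates:
        if product_id not in in_stock or product_id in already_bought:
            continue  # drop out-of-stock and already-purchased products
        results.append({"product_id": product_id, "reason": reason})
        if len(results) == limit:
            break
    return results


print(serve_recommendations("c1", store, in_stock, purchased))
print(serve_recommendations("new-visitor", store, in_stock, purchased))
```

Nothing is recomputed at request time, which is what keeps the path compatible with a latency target in the tens of milliseconds.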
Technology choices per layer
Training: Python, scikit-learn, implicit, LightFM
Storage: Redis, Postgres, DynamoDB, BigQuery
Serving: FastAPI, Next.js API routes, GraphQL
Monitoring: Evidently, custom Grafana dashboards

Business Insights

What the data reveals about buying patterns

Association rules extracted from TechHeaven’s order data reveal consistent patterns across product categories. These are not hypotheses - they are signals measured directly from purchase co-occurrence, with lift values that confirm the relationship is not random.

Laptop buyers frequently also buy
Wireless Mouse: Trackpad fatigue - extended sessions demand precision input
Mechanical Keyboard: Docking station workflow requires a full keyboard
USB-C Hub / Dock: Modern laptops ship with 2 ports; workflows need more
Laptop Sleeve / Case: Purchased in same session to protect a new device
External SSD: Storage expansion, often immediately after purchase

Headphone / Audio buyers frequently also buy
Headphone Amplifier: High-impedance headphones (250-600 ohm) require amplification
Balanced Cable / DAC: Audiophile workflow - separating D/A conversion from the amp
Carrying Case: Protective transport for premium devices
Ear Cushion Replacements: Consumable accessory, ordered 3-6 months after headphone purchase

Gaming peripheral buyers frequently also buy
Gaming Headset: Mouse + headset is the core gaming setup - high co-occurrence
Mechanical Keyboard: Switch feel is a primary differentiation from membrane keyboards
High Refresh Monitor: A competitive mouse is wasted on a 60Hz display
Mouse Pad (XL): Low DPI players need surface area; frequently bought together

These patterns are the output of association rule mining on TechHeaven’s order data. The notebook reproduces them with exact support, confidence, and lift values from the actual dataset.

Interactive

Explore the system

These sections will become interactive once the recommendation models are served via API. Select a customer to see their purchase history alongside model-generated recommendations and the reason each product was surfaced.

Recommendation Playground (coming soon)
Select a TechHeaven customer, see their purchase history, and explore recommendations from each model side by side

Association Rules Explorer (coming soon)
Browse all mined rules - filter by lift, confidence, or product category to see which products drive each other

Product Similarity Matrix (coming soon)
Pick any product and see its nearest neighbours under content-based filtering, with similarity scores and shared features

Customer Purchase Timeline (coming soon)
Visualise a customer's order history over time and see how recommendations evolve as their history grows

Frequently Bought Together (coming soon)
Product-level view of the top co-purchased items with support, confidence, and lift metrics displayed

Recommendation Journey (coming soon)
Trace how a recommendation is generated end to end: customer profile, matched similar users, filtered results, final ranking

Business Impact

What a recommendation engine delivers

Average Order Value: +15-30%
Cross-sell and upsell surfaces higher-value configurations and complementary products at the point of decision

Conversion Rate: +12-25%
Personalised product surfaces reduce friction - customers find relevant products faster with less browsing

Long-tail Discovery: 3-5x
Recommendations distribute demand across the full catalogue, not just top-20 products

Attribution: Traceable
Every recommendation carries a reason: bought together, similar to, customers like you - measurable and auditable

Business Applications

Shopify / WooCommerce
Association rules surface "frequently bought together" widgets at checkout. Collaborative filtering powers personalised homepage sections. Both run as nightly batch jobs writing recommendations to a cache layer the storefront reads.
B2B / Manufacturing
Content-based filtering on product attributes (voltage, connector type, compatibility) reduces misconfigured orders. Buyers who purchased a component are shown compatible accessories and replacement parts based on attribute overlap.
Insurance / Financial Services
The same collaborative filtering logic applied to product purchases works for policy and product bundles. Customers who hold auto and home policies are shown life insurance - surfaced because similar customers hold all three.
SaaS / Marketplaces
Feature adoption follows the same logic as product purchase. Users who enable feature A frequently enable feature B. Recommendation engines applied to usage events surface integrations, add-ons, and plan upgrades at the right moment.

Run the notebook

The Jupyter notebook implements all four approaches on real TechHeaven data. It covers association rules, collaborative filtering, content-based filtering, hybrid blending, and evaluation metrics in a single reproducible pipeline.

Open in Colab

Ecommerce AI Series

AI Customer Support with RAG
Product Recommendation Engine (current)
Semantic Product Search
Voice Shopping Assistant
Inventory Forecasting
Customer Churn Prediction

This playbook

Difficulty: Intermediate
Read time: 45 min
Category: Machine Learning
Published: July 2026

Tech stack

Python, scikit-learn, pandas, mlxtend, scipy, pgvector

Ecommerce AI Roadmap

Upcoming

Semantic Product Search
Voice Shopping Assistant
Inventory Forecasting
Customer Churn Prediction
Customer Lifetime Value
Dynamic Pricing
Review Sentiment Analysis
Demand Forecasting
Knowledge Graph
Agentic Commerce