Product Recommendation Engine
Match products to customers using purchase history, product similarity, and buying patterns
The Problem
Most customers never find the products they would have bought
TechHeaven carries 291 products across 40 categories. A customer visiting the site sees the same homepage, the same featured products, and the same promotions as every other customer - regardless of whether they are a first-time buyer looking at budget laptops or a returning customer who has purchased six audio products.
The discovery gap is measurable. In TechHeaven’s order data, the top 20 products account for a disproportionate share of revenue. The remaining 271 products sell primarily to customers who searched for them specifically. Customers who would have bought those products but did not know they existed represent recoverable revenue.
A recommendation engine does not need to be complex to generate value, and TechHeaven already has the signals it needs: 5,000 orders, 15,000 line items, and 5,000 reviews.
The Data
What TechHeaven already has
Every recommendation approach in this playbook is trained on data that a Bagisto store produces as a normal side-effect of operating. No additional data collection is needed.
data/transactions/order_items.json (order_id, product_id, product_name, qty, price, category)
Association rules and collaborative filtering - the primary signal for what customers bought together
data/transactions/orders.json (order_id, customer_id, status, grand_total, order_date)
Links order items to specific customers - essential for user-based collaborative filtering
data/catalog/products.json (name, brand, category, price, description, specs)
Content-based filtering - product attributes define what makes two products similar
data/reviews/reviews.json (product_id, customer_id, rating 1-5)
Explicit feedback signal - ratings provide a cleaner preference signal than purchase alone
The Approaches
Four ways to recommend products
No single algorithm dominates across all situations. Each approach makes different assumptions about the data, produces different recommendation types, and handles cold start differently. Production systems combine multiple approaches.
Core Concepts
How each piece works
Transaction Matrix
The fundamental data structure: a matrix where rows are users, columns are products, and cells contain a signal - purchase count, rating, or binary bought/not-bought. Collaborative filtering operates directly on this matrix. Most cells are empty (the matrix is sparse), because each customer buys only a handful of TechHeaven’s 291 products.
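A minimal sketch of building this matrix in plain Python follows. It assumes the orders.json and order_items.json records have been loaded as lists of dicts with the field names listed above. Only filled cells are stored (a dict of dicts), which is the natural representation for a sparse matrix. A production pipeline would more likely use a sparse-matrix library. The toy records are invented for illustration.

```python
from collections import defaultdict


def build_interaction_matrix(orders, order_items, binary=False):
    """Return a sparse user-item matrix as {customer_id: {product_id: signal}}.

    The signal is total quantity purchased, or 1 when binary=True.
    """
    # orders.json is what links an order (and its line items) to a customer.
    order_to_customer = {o["order_id"]: o["customer_id"] for o in orders}
    matrix = defaultdict(lambda: defaultdict(int))
    for item in order_items:
        customer = order_to_customer.get(item["order_id"])
        if customer is None:
            continue  # line item whose order is missing: skip it
        if binary:
            matrix[customer][item["product_id"]] = 1
        else:
            matrix[customer][item["product_id"]] += item["qty"]
    # Cells that are never written stay absent: those are the empty cells.
    return {customer: dict(row) for customer, row in matrix.items()}


def density(matrix, n_products):
    """Fraction of user-item cells that are filled."""
    filled = sum(len(row) for row in matrix.values())
    return filled / (len(matrix) * n_products)


# Invented toy records, shaped like orders.json and order_items.json.
orders = [
    {"order_id": 1, "customer_id": "c1"},
    {"order_id": 2, "customer_id": "c1"},
    {"order_id": 3, "customer_id": "c2"},
]
order_items = [
    {"order_id": 1, "product_id": "laptop", "qty": 1},
    {"order_id": 1, "product_id": "mouse", "qty": 2},
    {"order_id": 2, "product_id": "mouse", "qty": 1},
    {"order_id": 3, "product_id": "headphones", "qty": 1},
]

matrix = build_interaction_matrix(orders, order_items)
print(matrix)  # {'c1': {'laptop': 1, 'mouse': 3}, 'c2': {'headphones': 1}}
print(density(matrix, n_products=291))  # 3 filled cells out of 2 * 291
```

Even in this tiny example only 3 of 582 cells (about 0.5%) are filled. The same sparsity, at larger scale, is what collaborative filtering has to work with.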
Association Rules
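Derived from market basket analysis: given that a customer bought product A, how likely are they to buy product B? A rule A -> B is measured by Support (the share of orders containing both A and B), Confidence (P(B|A), the share of orders containing A that also contain B), and Lift (confidence divided by P(B), i.e. how much more often A and B co-occur than they would if bought independently). Lift above 1 indicates a real positive relationship rather than coincidence, provided the rule also has enough support that it does not rest on a handful of chance co-occurrences.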
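The three measures can be computed directly by counting. The sketch below is a deliberate simplification, not Apriori or FP-Growth. It enumerates every product pair in every order, which amounts to mining only two-item rules and is feasible for a catalogue of a few hundred products. Each basket is assumed to be the list of product IDs in one order. The example data is invented.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.01):
    """Return {(A, B): (support, confidence, lift)} for every rule A -> B."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # count each product once per order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = {}
    for (a, b), together in pair_counts.items():
        support = together / n  # P(A and B): share of orders containing both
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):  # emit the rule in both directions
            confidence = together / item_counts[x]  # P(y | x)
            lift = confidence / (item_counts[y] / n)  # P(y | x) / P(y)
            rules[(x, y)] = (support, confidence, lift)
    return rules


baskets = [
    ["laptop", "laptop_bag"],
    ["laptop", "laptop_bag", "mouse"],
    ["laptop", "mouse"],
    ["headphones"],
    ["headphones", "phone_case"],
]

for (a, b), (s, c, l) in sorted(pair_rules(baskets, min_support=0.2).items()):
    print(f"{a} -> {b}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

In this toy data, phone_case -> headphones has the highest lift (2.5) but rests on a single order. That is why a minimum support threshold is applied before a high lift is trusted.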
Collaborative Filtering
Finds customers whose purchase history is similar to the target customer, then recommends products those similar customers bought. User-based CF computes similarity between users (cosine similarity on their purchase vectors). Item-based CF computes similarity between items instead. It usually scales better, because a store has far fewer products than customers and item-item similarities change slowly enough to be precomputed. Matrix Factorization (SVD, ALS) decomposes the interaction matrix into latent factors that capture hidden preference dimensions.
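Here is a minimal sketch of item-based collaborative filtering on a binary purchase matrix, assumed to be stored as {customer_id: {product_id: 1}}. Each item becomes a vector over customers. Unseen items are scored by their summed cosine similarity to the items the customer already bought. For clarity, similarities are recomputed on every call; a training job would precompute the item-item table once. Matrix factorization is not shown.

```python
import math
from collections import defaultdict


def item_vectors(matrix):
    """Transpose {user: {item: value}} into {item: {user: value}}."""
    items = defaultdict(dict)
    for user, row in matrix.items():
        for item, value in row.items():
            items[item][user] = value
    return items


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * v[key] for key, value in u.items() if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend_item_based(matrix, user, k=5):
    """Rank items the user has not bought by similarity to items they have."""
    items = item_vectors(matrix)
    owned = matrix.get(user, {})
    scores = {}
    for candidate, candidate_vec in items.items():
        if candidate in owned:
            continue  # never recommend something already purchased
        scores[candidate] = sum(cosine(items[bought], candidate_vec) for bought in owned)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, score in ranked[:k] if score > 0]


purchases = {
    "alice": {"laptop": 1, "mouse": 1},
    "bob": {"laptop": 1, "mouse": 1, "laptop_bag": 1},
    "carol": {"headphones": 1, "phone_case": 1},
    "dave": {"laptop": 1},
}
print(recommend_item_based(purchases, "dave"))  # ['mouse', 'laptop_bag']
print(recommend_item_based(purchases, "new_customer"))  # [] - cold start
```

The second call shows the cold-start failure directly. A customer with no history owns nothing to compare against, so every score is zero.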
Content-Based Filtering
Represents each product as a feature vector - brand, category, price range, technical specifications, description text. Recommends products whose feature vectors are most similar (cosine similarity) to products the customer has already purchased. Does not require any purchase data from other customers.
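A hand-rolled sketch of content-based filtering with TF-IDF, assuming product records shaped like products.json. Brand, category and a price band become prefixed tokens so they are not confused with description words. The price thresholds are hypothetical, and the specs field is left out for brevity. The IDF is smoothed as ln((1 + n) / (1 + df)) + 1, the smoothing scikit-learn's TfidfVectorizer applies by default, so terms found in every product still carry some weight. The catalog entries are invented.

```python
import math
import re
from collections import Counter


def price_band(price):
    # Hypothetical price bands; real thresholds would come from the catalog.
    if price < 100:
        return "budget"
    if price < 500:
        return "mid"
    return "premium"


def tokenize(product):
    """Turn product attributes into tokens; prefixes keep fields distinct."""
    tokens = [
        "brand:" + product["brand"].lower(),
        "category:" + product["category"].lower(),
        "price:" + price_band(product["price"]),
    ]
    tokens += re.findall(r"[a-z0-9]+", product["description"].lower())
    return tokens


def tfidf_vectors(products):
    """TF-IDF weight per token for every product, with smoothed IDF."""
    docs = {pid: Counter(tokenize(p)) for pid, p in products.items()}
    n = len(docs)
    df = Counter(token for counts in docs.values() for token in counts)
    vectors = {}
    for pid, counts in docs.items():
        total = sum(counts.values())
        vectors[pid] = {
            t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in counts.items()
        }
    return vectors


def cosine(u, v):
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similar_products(products, purchased, k=5):
    """Rank unpurchased products by summed similarity to purchased ones."""
    vectors = tfidf_vectors(products)
    scores = {
        pid: sum(cosine(vectors[p], vec) for p in purchased)
        for pid, vec in vectors.items()
        if pid not in purchased
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [pid for pid, score in ranked[:k] if score > 0]


catalog = {
    "p1": {"brand": "Sony", "category": "Headphones", "price": 299,
           "description": "Wireless noise cancelling headphones"},
    "p2": {"brand": "Sony", "category": "Headphones", "price": 199,
           "description": "Wireless over-ear headphones"},
    "p3": {"brand": "Dell", "category": "Laptops", "price": 899,
           "description": "14 inch laptop with SSD"},
}
print(similar_products(catalog, purchased={"p1"}))  # ['p2']
```

No other customer's data is touched. A product added to the catalog yesterday can be recommended as soon as it has attributes.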
Cold Start Problem
New users have no purchase history, so collaborative filtering cannot compute their similarity to other users. New products have no purchase history, so they never appear in association rules. This is the primary failure mode of pure collaborative filtering and affects every new customer and every new product launch.
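One common mitigation for new customers is to fall back to popularity. This is sketched here as an assumption, not as TechHeaven's documented approach. Products are ranked by units sold, optionally restricted to the category the visitor is currently browsing. Records are shaped like order_items.json, and the data is invented. This does nothing for new products, which have no sales to count.

```python
from collections import Counter


def popular_products(order_items, category=None, k=5):
    """Best sellers by units sold, optionally within a single category."""
    units = Counter()
    for item in order_items:
        if category is None or item["category"] == category:
            units[item["product_id"]] += item["qty"]
    return [product_id for product_id, _ in units.most_common(k)]


order_items = [
    {"product_id": "mouse", "category": "Accessories", "qty": 3},
    {"product_id": "laptop", "category": "Laptops", "qty": 1},
    {"product_id": "keyboard", "category": "Accessories", "qty": 2},
    {"product_id": "mouse", "category": "Accessories", "qty": 1},
]

# A first-time visitor with no history gets the overall best sellers...
print(popular_products(order_items, k=2))  # ['mouse', 'keyboard']
# ...or, if they are browsing laptops, the best sellers in that category.
print(popular_products(order_items, category="Laptops"))  # ['laptop']
```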
Evaluation Metrics
Offline evaluation uses held-out data: Precision@K (of the top-K recommendations, what fraction did the customer actually buy?), Recall@K (of all the things the customer actually bought, what fraction appeared in the top-K?), NDCG@K (did the right products appear near the top of the list?). Online evaluation uses A/B tests measuring click-through rate, add-to-cart rate, and revenue per session.
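The three offline metrics are short enough to write out directly. This sketch assumes binary relevance: a product is relevant if the customer bought it in the held-out period. It divides Precision@K by K even when fewer than K items were recommended, which is one of several conventions in use.

```python
import math


def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommendations the customer actually bought."""
    hits = sum(1 for p in recommended[:k] if p in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Share of the customer's held-out purchases that made the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for p in recommended[:k] if p in relevant)
    return hits / len(relevant)


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: hits near the top of the list count for more."""
    dcg = sum(1 / math.log2(rank + 2)
              for rank, p in enumerate(recommended[:k]) if p in relevant)
    ideal = sum(1 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0


recommended = ["a", "b", "c", "d"]  # model output, best first
held_out = {"b", "d", "e"}          # what the customer bought later
print(precision_at_k(recommended, held_out, 3))  # 1 hit in 3
print(recall_at_k(recommended, held_out, 3))     # 1 of 3 purchases
print(round(ndcg_at_k(recommended, held_out, 3), 3))  # 0.296
```

The single hit sits at rank 2 rather than rank 1, so NDCG@3 is well below what a perfect ordering would score (1.0).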
Hybrid Systems
Production recommendation systems combine multiple approaches to offset each approach's weaknesses. Weighted hybrids blend scores from multiple models. Switching hybrids select the most appropriate model based on the data available: popularity for customers with no history, content-based filtering for new products, and collaborative filtering when history is rich. Feature augmentation feeds the output of one model as input to another.
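A minimal sketch combining a weighted hybrid with a switching rule. Each model's scores are min-max normalised so they are comparable. A rule based on the customer's purchase count then decides which models get weight. The model names, weights and the min_history threshold are hypothetical placeholders, not tuned values, and the scores are invented.

```python
def normalise(scores):
    """Min-max scale one model's scores to [0, 1] so models are comparable."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {p: 1.0 for p in scores}
    return {p: (s - low) / (high - low) for p, s in scores.items()}


def weighted_hybrid(model_scores, weights, k=5):
    """Blend {model: {product: score}} into one ranking using {model: weight}."""
    blended = {}
    for model, scores in model_scores.items():
        weight = weights.get(model, 0.0)
        if weight == 0.0:
            continue  # model switched off for this customer
        for product, score in normalise(scores).items():
            blended[product] = blended.get(product, 0.0) + weight * score
    return sorted(blended, key=blended.get, reverse=True)[:k]


def choose_weights(n_purchases, min_history=3):
    """Switching rule: trust collaborative signals only once history exists.

    The thresholds and weights are hypothetical placeholders.
    """
    if n_purchases == 0:
        return {"popularity": 1.0}
    if n_purchases < min_history:
        return {"content": 0.7, "popularity": 0.3}
    return {"collaborative": 0.6, "content": 0.3, "popularity": 0.1}


model_scores = {
    "popularity": {"mouse": 100, "laptop": 40, "laptop_bag": 10},
    "content": {"laptop_bag": 0.9, "mouse": 0.1},
    "collaborative": {"laptop_bag": 5, "laptop": 4},
}
print(weighted_hybrid(model_scores, choose_weights(0)))   # ['mouse', 'laptop', 'laptop_bag']
print(weighted_hybrid(model_scores, choose_weights(10)))  # ['laptop_bag', 'mouse', 'laptop']
```

A new visitor sees best sellers. A customer with ten purchases sees the item that both the content and collaborative models favour.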
Architecture
Offline training and online serving
Recommendation systems split cleanly into two pipelines. The training pipeline runs on a schedule (hourly to daily) and is computationally expensive. The serving pipeline runs in real time and must return recommendations in under 50ms, so it reads pre-computed results rather than recomputing them on demand.
Training pipeline (offline):
1. Pull order items, orders, and reviews from the data warehouse
2. Build the user-item interaction matrix
3. Train association rules (Apriori / FP-Growth)
4. Factorize the matrix (SVD / ALS) for collaborative filtering
5. Build content feature vectors (TF-IDF on product attributes)
6. Compute top-N recommendations per user
7. Write results to the recommendation store (Redis / Postgres)
8. Log the model version and evaluation metrics
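Steps 6 to 8 of the training pipeline can be sketched as follows. Each customer's scores are reduced to a top-N list and written to a key-value store, together with the model version that produced it. A plain dict stands in for Redis here; with the redis-py client the assignment would become a `set` call. The `recs:{customer_id}` key format and the JSON payload shape are hypothetical.

```python
import json
from datetime import datetime, timezone


def top_n(scores, n=10):
    """Highest-scoring product IDs from one customer's {product_id: score}."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [product_id for product_id, _ in ranked[:n]]


def publish(user_scores, store, model_version, n=10):
    """Write one pre-computed list per customer to a key-value store.

    `store` is any dict-like object; a plain dict stands in for Redis here.
    """
    generated_at = datetime.now(timezone.utc).isoformat()
    for customer_id, scores in user_scores.items():
        payload = {
            "model_version": model_version,  # record which model produced it
            "generated_at": generated_at,
            "products": top_n(scores, n),
        }
        store[f"recs:{customer_id}"] = json.dumps(payload)  # hypothetical key format
    return len(user_scores)


store = {}
scores = {"c1": {"mouse": 0.2, "laptop_bag": 0.9, "usb_hub": 0.5}}
publish(scores, store, model_version="v1", n=2)
print(json.loads(store["recs:c1"])["products"])  # ['laptop_bag', 'usb_hub']
```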
Serving pipeline (online):
1. Identify the customer_id from the session or cookie
2. Look up pre-computed recommendations from the store
3. Apply real-time filters (out of stock, already purchased)
4. Apply business rules (promoted items, margin targets)
5. Blend with contextual signals (current page, cart)
6. Return a ranked product list with a recommendation reason
7. Log the impression for later evaluation and retraining
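The serving steps can be sketched as a lookup plus cheap in-memory filtering. Nothing is recomputed, which is what keeps latency low. This sketch covers lookup, the real-time filters, one business rule (promoted items move to the front), excluding items already in the cart as a minimal contextual signal, and returning a reason. Impression logging and margin rules are omitted. The key format, reason strings and fallback list are hypothetical.

```python
import json


def serve(customer_id, store, in_stock, purchased, cart=(), promoted=(),
          fallback=(), k=5):
    """Read pre-computed recommendations and apply real-time filters and rules."""
    raw = store.get(f"recs:{customer_id}")  # hypothetical key format
    if raw is None:
        # No pre-computed list (e.g. a brand-new customer): use a fallback list.
        candidates, reason = list(fallback), "popular"
    else:
        candidates, reason = json.loads(raw)["products"], "recommended for you"
    owned = purchased.get(customer_id, set())
    # Real-time filters: drop out-of-stock, already purchased, and in-cart items.
    kept = [p for p in candidates
            if p in in_stock and p not in owned and p not in cart]
    # Business rule: promoted items move to the front; the stable sort keeps
    # the model's order among the rest.
    kept.sort(key=lambda p: p not in promoted)
    return [{"product_id": p, "reason": "promoted" if p in promoted else reason}
            for p in kept[:k]]


store = {"recs:c1": json.dumps({"model_version": "v1",
                                "products": ["a", "b", "c", "d"]})}
result = serve("c1", store, in_stock={"a", "c", "d"},
               purchased={"c1": {"a"}}, promoted={"d"})
print(result)
# [{'product_id': 'd', 'reason': 'promoted'}, {'product_id': 'c', 'reason': 'recommended for you'}]
```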
Business Insights
What the data reveals about buying patterns
Association rules extracted from TechHeaven’s order data reveal consistent patterns across product categories. These are not hypotheses: they are measured directly from purchase co-occurrence, and their lift values show that the products are bought together more often than independent purchasing would predict.
These patterns are the output of association rule mining on TechHeaven’s order data. The notebook reproduces them with exact support, confidence, and lift values from the actual dataset.
Interactive
Explore the system
These sections will become interactive once the recommendation models are served via API. Select a customer to see their purchase history alongside model-generated recommendations and the reason each product was surfaced.
Run the notebook
The Jupyter notebook implements all four approaches on real TechHeaven data: association rules, collaborative filtering, content-based filtering, and hybrid blending, plus evaluation metrics, in a single reproducible pipeline. It can be opened in Colab.
Ecommerce AI Series