What is semantic product search?

Semantic product search uses dense vector embeddings to match a shopper's query to products based on meaning, not keyword overlap. A query like "laptop for video editing" matches products described as "ProRes acceleration" or "creator workflows" even if those exact words never appear in the query. Traditional keyword search would return zero results for this query if no product title contains the phrase "video editing".

What is the difference between semantic search and keyword search in ecommerce?

Keyword search (TF-IDF) matches words literally - it scores products by how often the query terms appear in the product text, weighted by how rare each term is across the catalog. Semantic search encodes both the query and products as dense vectors in a shared embedding space, then finds the products whose vectors are closest to the query vector. Keyword search fails when shoppers use different words than your product descriptions use. Semantic search handles vocabulary gaps, synonyms, and natural-language intent.

What is TF-IDF and why does it fail for natural-language queries?

TF-IDF (Term Frequency-Inverse Document Frequency) represents documents as sparse vectors where each dimension is a vocabulary word. The score is higher when a word appears frequently in a document but rarely across the corpus. It fails for natural-language queries because it can only match exact words. A query for "headphones for a noisy open office" matches the word "open" in "open-back headphones" and confidently returns a product that is actively the wrong choice for a noisy environment: open-back headphones leak sound in both directions, making them worse than useless in a noisy office.

How do sentence embeddings work for product search?

A sentence embedding model (such as all-MiniLM-L6-v2) encodes any text as a fixed-length dense vector (384 dimensions in this case). Semantically similar texts end up close together in the vector space even if they share no words. At index time, you encode all product texts and store the vectors. At query time, you encode the search query and compute cosine similarity against all product vectors. The products with the highest similarity scores are returned as results.

How do you deploy semantic search in production for an ecommerce store?

Production semantic search splits into offline and online pipelines. Offline: encode all product texts nightly (or on catalog change), store vectors in a vector database (pgvector, Qdrant, Pinecone, Weaviate). Online: encode the search query, run a nearest-neighbor search against the vector store, apply filters (in stock, category), and return ranked results. For a catalog of 10,000 products or fewer, even a simple numpy cosine similarity search returns results in under 50ms. Vector databases add approximate nearest-neighbor search for larger catalogs.

Ecommerce AI/ML Series | Built on TechHeaven | Live

Semantic Product Search

Match shopper intent to your product catalog without keyword matching

The Problem

Your search box and your shoppers speak different languages

TechHeaven carries 291 products across 15 categories. A shopper searching “laptop for video editing” should find three relevant products - all with descriptions mentioning creator workflows, ProRes acceleration, and high-core-count chips. Traditional keyword search returns nothing. No product title contains the phrase “video editing.”

This vocabulary gap between how shoppers describe what they want and how product teams write descriptions is permanent. It is not a content problem that better copy solves - the same shopper who types “earbuds for the gym” is looking for products described as “IPX5 water resistant sport earphones.” No human writes product descriptions in the language shoppers use to search for them.

10-15% of searches return zero results (Bloomreach / Algolia ecommerce benchmarks)

30-60% exit rate after zero results: shoppers who see nothing relevant leave immediately

Category accuracy on natural-language queries: semantic search 10/10 vs TF-IDF 5/10 (this notebook)

Semantic search closes the vocabulary gap by understanding intent rather than matching words. The same query - “laptop for video editing” - maps to the same region of an embedding space as “ProRes acceleration” and “creator workflows,” even without a single shared word. This playbook builds both systems and shows exactly where keyword search fails and why.

The Data

What the notebook indexes

Both search systems in this playbook index the same dataset: the TechHeaven product catalog loaded directly from GitHub. No database, no local setup, no API key required - the notebook fetches the data at runtime and builds the indexes from scratch.

Products indexed: 291

Consumer electronics across 15 categories: Laptops, Audio, Gaming, Smartphones, Cameras, Smart Home, and more

Search text per product: 4 fields

name + category + short_description + description concatenated into a single string for indexing

TF-IDF vocabulary: ~4,800 terms

Sparse matrix: most query terms score zero against most products

Embedding dimensions: 384

all-MiniLM-L6-v2 output: a dense vector for every product, from a 22M-parameter model with no API key required

data/catalog/products.json

Every product has: name, category, brand, price, short_description, and a longer description with technical specifications. The description field is where the vocabulary gap lives - technical terms that shoppers never use in a search box.

The Approaches

Keyword search vs. semantic search

Both systems use cosine similarity as the final ranking step. The difference is entirely in the representation: how the query and products are encoded before comparison.

TF-IDF (Baseline)

scikit-learn TfidfVectorizer | Low complexity

How it works

Encodes products and queries as sparse word-frequency vectors. A product scores higher when the query words appear often in it and rarely across the corpus.

Strength

Fast, interpretable, no model download; a product scores only when it shares words with the query, so every match can be traced to specific terms

Weakness

Zero score for any query word not in a product description. Cannot match synonyms, intent, or natural-language phrasing.
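A minimal sketch of the TF-IDF baseline using scikit-learn's `TfidfVectorizer` and `cosine_similarity`. The three product texts are hypothetical toy strings rather than the TechHeaven catalog, and default vectorizer settings are assumed because the notebook's exact settings are not shown here. It reproduces the open-back failure in miniature: "noisy" and "office" are absent from the vocabulary and contribute nothing, so the rare word "open" decides the ranking.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy product texts (not the real catalog).
products = [
    "open-back headphones with an open soundstage",
    "noise cancelling wireless headphones with active noise cancellation",
    "creator laptop with ProRes acceleration and creator workflows",
]

# Default settings: lowercase, 2+ character tokens, smoothed IDF, L2-normalised rows.
vectorizer = TfidfVectorizer()
product_matrix = vectorizer.fit_transform(products)  # sparse, shape (n_products, vocab_size)


def tfidf_search(query, top_k=2):
    """Return (product_index, score) pairs, best first."""
    # Query words missing from the fitted vocabulary are silently dropped.
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, product_matrix).ravel()
    ranked = scores.argsort()[::-1][:top_k]
    return [(int(i), float(scores[i])) for i in ranked]


print(tfidf_search("headphones for a noisy open office"))
```

The open-back product ranks first; the noise-cancelling product shares only the word "headphones" and ranks below it.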

Semantic Search (Dense Embeddings)

sentence-transformers / all-MiniLM-L6-v2 | Medium complexity

How it works

Encodes products and queries as 384-dimension dense vectors using a pre-trained sentence transformer. Nearby vectors in this space share meaning, not just words.

Strength

Handles vocabulary gaps, synonyms, and natural-language intent. Works even when query and product share zero words.

Weakness

Slower to index (model inference per product) and requires more memory per product than sparse vectors.
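The semantic side can be sketched the same way. This assumes a `model` object exposing the sentence-transformers `encode` method; `semantic_search` is a hypothetical helper name. Passing `normalize_embeddings=True` returns unit-length vectors, so a plain dot product equals cosine similarity. For brevity the sketch re-encodes the products on every call, a step the architecture section below moves offline.

```python
import numpy as np


def semantic_search(query, product_texts, model, top_k=5):
    """Rank product texts by cosine similarity to the query in embedding space.

    `model` is assumed to expose a sentence-transformers style
    `encode(texts, normalize_embeddings=...)` method.
    """
    # Unit-length vectors: the dot product below is cosine similarity.
    product_vecs = np.asarray(model.encode(product_texts, normalize_embeddings=True))
    query_vec = np.asarray(model.encode([query], normalize_embeddings=True))[0]
    scores = product_vecs @ query_vec
    ranked = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in ranked]
```

To run it against the real model, install `sentence-transformers` and pass `SentenceTransformer("all-MiniLM-L6-v2")` as `model`; the weights download on first use.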

Results

Doubling accuracy on natural-language queries

The notebook runs 10 natural-language test queries - each designed to use shopper language that does not match product description vocabulary. Both systems return the top 5 results. The correct answer for each query is the expected product category.

TF-IDF: 5/10 queries with the correct category in the top-5 results

Semantic: 10/10 queries with the correct category in the top-5 results

Three queries illustrate the failure modes most clearly. These are not edge cases - they represent the everyday gap between how shoppers search and how products are described.

Query: “headphones for a noisy open office”
Expected: Closed-back, noise-isolating headphones

TF-IDF result

Sennheiser HD 660S2 Open-Back

Matched "open" in query to "open-back" in title - confidently returns the opposite of what was needed

Semantic result

Sony WH-1000XM5 (Noise Cancelling)

Understood "noisy office" = need isolation, not open sound staging

Query: “laptop for video editing”
Expected: High-performance laptop, GPU/CPU priority

TF-IDF result

No strong match - no products contain "video editing"

Zero vocabulary overlap with product descriptions

Semantic result

Apple MacBook Pro M4 Max

Understood "video editing" = creator workflows = high-performance laptop with strong GPU

Query: “something to carry my gear”
Expected: Bags / cases

TF-IDF result

Random weak matches across categories

No product contains "gear" - no meaningful score produced

Semantic result

Incase DSLR Pro Pack

Understood "carry my gear" = carrying case category

The open-back headphones case

The most striking failure is not just a wrong result - it is the right category but an actively harmful product. TF-IDF scores the Sennheiser HD 660S2 Open-Back highest for “headphones for a noisy open office” because the word “open” appears in both the query and the product title. But open-back headphones leak sound in both directions - they are the worst possible choice for a noisy environment. TF-IDF has no way to know this; it only sees the shared word. Semantic search returns Sony WH-1000XM5 with noise cancellation because the embedding space places “noisy office” near “noise cancelling” and far from “open soundstage.”

Core Concepts

How each piece works

TF-IDF (Term Frequency-Inverse Document Frequency)

Represents each document as a sparse vector where each dimension is a vocabulary word. The weight for a word is its frequency in this document (TF) multiplied by its inverse document frequency (IDF), a logarithmic measure of how rare the word is across all documents. Words that appear in every product get low weight; rare words that identify specific products get high weight. Cosine similarity then compares the query vector to each product vector.

TechHeaven example: The word "laptop" appears in 40 of 291 products - moderate IDF weight. The word "Thunderbolt" appears in 12 products - higher IDF weight. A query containing "Thunderbolt" gets a stronger signal toward those 12 products than a query containing "laptop" would.
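The IDF weights in this example can be checked by hand. The sketch below assumes scikit-learn's default smoothed formula, idf = ln((1 + n) / (1 + df)) + 1, because the notebook's exact vectorizer settings are not shown. It computes only the IDF factor; the full TF-IDF weight also multiplies by the term count and L2-normalises each product vector.

```python
import math


def smoothed_idf(n_docs, doc_freq):
    """scikit-learn's default IDF (smooth_idf=True): ln((1 + n) / (1 + df)) + 1."""
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1


N_PRODUCTS = 291
idf_laptop = smoothed_idf(N_PRODUCTS, 40)       # "laptop" appears in 40 products
idf_thunderbolt = smoothed_idf(N_PRODUCTS, 12)  # "Thunderbolt" appears in 12 products
print(f"laptop: {idf_laptop:.2f}, Thunderbolt: {idf_thunderbolt:.2f}")
# prints: laptop: 2.96, Thunderbolt: 4.11
```

The rarer word ends up with roughly 1.4x the weight of "laptop", which is why it pulls harder toward its 12 products.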

Sentence Embeddings

A neural network encodes any text (a product description, a search query, a sentence) as a fixed-length dense vector. The model is trained so that texts with similar meaning produce similar vectors - regardless of word overlap. all-MiniLM-L6-v2 produces 384-dimension vectors from 22M parameters. It runs locally, requires no API key, and encodes 291 products in under 10 seconds.

TechHeaven example: "Laptop for video editing" and "Apple MacBook Pro M4 Max - ProRes hardware acceleration, 14-core GPU for creator workflows" produce vectors that are close in the 384-dimension space even though they share no meaningful words. The model learned this relationship from millions of sentence pairs during pre-training.

Cosine Similarity

Measures the angle between two vectors in a high-dimensional space. Returns a value from -1 (opposite directions) to 1 (identical direction); TF-IDF weights are never negative, so TF-IDF scores fall between 0 and 1. In practice, product search scores typically range from about 0.1 (unrelated) to 0.85 (very close match). Both TF-IDF and semantic search use cosine similarity as the final ranking step - the difference is in the vectors being compared, not the comparison method.

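TechHeaven example: Query "headphones for a noisy open office" vs. Sennheiser HD 660S2 (TF-IDF): score 0.31 (the word "open" matches). Same query vs. Sony WH-1000XM5 (semantic): score 0.67 (the embeddings capture the noise-cancelling intent). The semantic score for the correct product is more than twice the TF-IDF score for the wrong one, but the two come from different representations and are not directly comparable; the meaningful difference is which product each system ranks first.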
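A from-scratch NumPy version of the formula cos(θ) = (a · b) / (‖a‖ ‖b‖), with toy 2- and 3-dimensional vectors standing in for real product vectors. The zero-vector guard is an assumption of this sketch; it roughly mirrors how a TF-IDF query containing no known words scores 0 against every product.

```python
import numpy as np


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (||a|| * ||b||), in the range [-1, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # e.g. a query vector with no in-vocabulary words
    return float(a @ b / denom)


print(cosine_similarity([1, 0], [1, 0]))        # 1.0: identical direction
print(cosine_similarity([1, 0], [0, 1]))        # 0.0: orthogonal, nothing shared
print(cosine_similarity([1, 0], [-1, 0]))       # -1.0: opposite directions
print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # ~1.0: scaling does not change direction
```

Because only direction matters, a long product description is not favoured over a short one simply for containing more words.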

Embedding Space Visualization (PCA)

The notebook reduces 384 dimensions to 2 using PCA (Principal Component Analysis) to visualize how products cluster. Products in the same category cluster together in the embedding space - headphones near headphones, laptops near laptops. A query for "headphones for a noisy office" falls into the audio cluster, near noise-cancelling products and far from open-back products.

TechHeaven example: The 2D PCA chart shows 15 distinct product clusters. Gaming peripherals, audio equipment, and mobile accessories each occupy a distinct region. Overlaps appear at meaningful boundaries - wireless earbuds bridge the audio and mobile clusters.
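A NumPy-only sketch of the 2D projection, computing PCA through a singular value decomposition of the mean-centred embeddings; the notebook may well use scikit-learn's `PCA` instead. Random vectors stand in for the 291 real 384-dimension embeddings, and `fit_pca_2d` / `project` are hypothetical helper names. Projecting a query with the same mean and components is what lets it be plotted among the product clusters.

```python
import numpy as np


def fit_pca_2d(embeddings):
    """Return the mean and the first two principal directions of the rows."""
    X = np.asarray(embeddings, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:2]


def project(vectors, mean, components):
    """Map vectors into the 2D PCA plane fitted on the products."""
    return (np.asarray(vectors, dtype=float) - mean) @ components.T


# Random stand-ins for 291 product embeddings and one query embedding.
rng = np.random.default_rng(0)
product_vecs = rng.normal(size=(291, 384))
mean, components = fit_pca_2d(product_vecs)
product_xy = project(product_vecs, mean, components)               # shape (291, 2)
query_xy = project(rng.normal(size=(1, 384)), mean, components)    # shape (1, 2)
```

The two columns of `product_xy` can be passed straight to a scatter plot, coloured by product category.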

Evaluation: Category Accuracy

For each test query, we define an expected product category (e.g., "headphones for a noisy office" expects Audio / headphones). We check whether the top-1 result falls in the correct category. This is a proxy for relevance - not a perfect metric, but one that is easy to audit and reproduce without click data.

TechHeaven example: 10 queries, each with a defined expected category. TF-IDF: 5 correct. Semantic: 10 correct. The notebook prints the result table and reproduces the finding in any environment, with no manual data download required.
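A sketch of the category-accuracy check. `category_accuracy` is a hypothetical helper, and the rankings and the "Bags" category label below are made-up illustrations, not notebook output. The `k` parameter covers both readings of the metric: `k=1` checks only the top result, while `k=5` checks whether the expected category appears anywhere in the top five.

```python
def category_accuracy(ranked_categories, expected, k=1):
    """Fraction of queries whose expected category appears in the top-k results.

    ranked_categories: dict mapping query -> list of result categories, best first.
    expected: dict mapping query -> expected category.
    """
    hits = 0
    for query, expected_category in expected.items():
        if expected_category in ranked_categories[query][:k]:
            hits += 1
    return hits / len(expected)


# Hypothetical rankings for illustration only.
expected = {
    "headphones for a noisy open office": "Audio",
    "laptop for video editing": "Laptops",
    "something to carry my gear": "Bags",
}
ranked = {
    "headphones for a noisy open office": ["Audio", "Audio", "Gaming"],
    "laptop for video editing": ["Cameras", "Laptops", "Laptops"],
    "something to carry my gear": ["Cameras", "Audio", "Gaming"],
}
print(category_accuracy(ranked, expected, k=1))  # 1 of 3 queries hit
print(category_accuracy(ranked, expected, k=2))  # 2 of 3 queries hit
```

Category accuracy says nothing about which product within the category ranks first, so it is worth reading alongside the per-query results.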

Architecture

Offline indexing and online serving

Semantic search splits cleanly into two pipelines. Encoding products is expensive (model inference for each product) so it runs offline on a schedule. Serving queries is fast (one model inference + cosine similarity lookup) so it runs in real time at query time.

Offline - Indexing

Runs nightly or on catalog change

1. Fetch the product catalog from the data source
2. Concatenate name + category + short_description + description into search text
3. Encode all product texts with the sentence transformer model
4. Store embedding vectors alongside product IDs
5. Repeat whenever products are added, updated, or removed
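The offline steps can be sketched with NumPy alone. The encoder is passed in as any callable that maps a list of strings to a 2-D array (for example the `encode` method of a `SentenceTransformer`), and saving to a local `.npz` file stands in for a real vector store. The `id` key is an assumption: the catalog fields listed in the data section do not name a product ID.

```python
import numpy as np

# Fields concatenated into the search text, as described in the data section.
SEARCH_FIELDS = ("name", "category", "short_description", "description")


def build_search_text(product):
    """Join the indexed fields of one product into a single string."""
    return " ".join(str(product.get(field, "")) for field in SEARCH_FIELDS)


def build_index(products, encode, id_key="id", path=None):
    """Encode every product and keep the vectors aligned with product IDs.

    `encode` maps a list of strings to a 2-D array of embeddings.
    `id_key` is an assumed field name for the product identifier.
    """
    ids = np.array([p[id_key] for p in products])
    texts = [build_search_text(p) for p in products]
    vectors = np.array(encode(texts), dtype=np.float32)  # copy, so the caller's array is untouched
    # Normalise to unit length so the online step can use a plain dot product.
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
    if path is not None:
        np.savez(path, ids=ids, vectors=vectors)  # stand-in for a vector store
    return ids, vectors
```

Normalising once at index time means every query costs a single matrix-vector product.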

Online - Serving

Runs on every search, target <100ms

1. Receive search query from the storefront
2. Encode query with the same sentence transformer model
3. Compute cosine similarity against all product vectors
4. Apply filters (in stock, category, price range)
5. Return top-N results ranked by similarity score
6. Log query and results for evaluation and retraining
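The matching online path can be sketched under the same assumptions: product vectors come from the offline step and are unit length, and `in_stock` is a hypothetical boolean array aligned with the product IDs (stock status is not among the catalog fields listed in the data section). Filtering before ranking keeps out-of-stock products from occupying top-N slots. Logging (step 6) is left out.

```python
import numpy as np


def serve_query(query, encode, ids, vectors, in_stock, top_n=5):
    """Return (product_id, score) pairs for the best in-stock matches.

    `vectors` are unit-length product embeddings from the offline step;
    `in_stock` is a boolean array aligned with `ids` (hypothetical filter).
    """
    q = np.array(encode([query]), dtype=np.float32)[0]
    q = q / np.linalg.norm(q)              # same normalisation as the index
    scores = vectors @ q                   # cosine similarity for unit vectors
    candidates = np.flatnonzero(in_stock)  # apply filters before ranking
    best = candidates[np.argsort(-scores[candidates])[:top_n]]
    return [(ids[i], float(scores[i])) for i in best]
```

Category and price-range filters would be further boolean masks combined with `in_stock` before the `flatnonzero` call.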

Technology choices per layer

Encoding

sentence-transformers, OpenAI embeddings, Cohere embed

Vector store

pgvector, Qdrant, Pinecone, Weaviate, numpy (small catalogs)

Serving

FastAPI, Next.js API routes, Shopify App Bridge

Monitoring

Zero-result rate, click-through rate, query logs

For a catalog under 10,000 products, numpy cosine similarity across all product vectors returns results in under 50ms with no vector database required. The notebook demonstrates this exact approach on 291 products.
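A sketch of brute-force search at the 10,000-product scale, using random unit vectors as a synthetic stand-in for real embeddings. `np.argpartition` selects the top k in linear time, so only those k scores are fully sorted. The elapsed time depends entirely on the hardware, so it is printed rather than asserted.

```python
import time

import numpy as np


def top_k_indices(scores, k):
    """Indices of the k highest scores, best first."""
    k = min(k, scores.size)
    candidates = np.argpartition(-scores, k - 1)[:k]  # k largest, unordered
    return candidates[np.argsort(-scores[candidates])]


# Synthetic stand-in for a 10,000-product catalog of 384-dimension embeddings.
rng = np.random.default_rng(42)
vectors = rng.normal(size=(10_000, 384)).astype(np.float32)
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
query = vectors[123]  # reuse a catalog vector as the query

start = time.perf_counter()
scores = vectors @ query
best = top_k_indices(scores, 5)
elapsed_ms = (time.perf_counter() - start) * 1000

print(best, f"{elapsed_ms:.2f} ms")  # best[0] is 123: the query matches itself
```

Past the point where this exact scan becomes too slow, approximate nearest-neighbor indexes in a vector database trade a little recall for speed.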

Business Applications

Shopify / WooCommerce

Replace the default storefront search with semantic embeddings. A shopper searching "wireless earbuds for running" finds products described as "sport earphones" and "IPX5 water resistant" even with no word overlap. Products are encoded at publish time, and for a small catalog, cosine similarity at query time returns results in under 50ms.

B2B / Manufacturing

Part number search fails for buyers who describe what a part does rather than what it is called. Semantic search matches "connector that fits into a 3mm slot" to technical specifications written in engineering terms. Reduces "no results" dead ends that send buyers to a competitor.

Insurance / Professional Services

Policy and product search applies the same logic. A customer searching "I need cover if I can't work" should surface disability and income protection products - even if none of those exact words appear in the product name. Dense retrieval closes the vocabulary gap between customer language and actuarial language.

SaaS / Marketplaces

Feature discovery, documentation search, and help content retrieval all benefit from semantic search. Users searching "how do I undo a change" find results about version history and revision tracking even if neither phrase appears in the documentation title.

Run the notebook

The Jupyter notebook builds both search systems from scratch on 291 TechHeaven products. TF-IDF baseline, sentence-transformer semantic search, a side-by-side comparison on 10 natural-language queries, and a 2D PCA visualization of the product embedding space. No API key required.

Open in Colab

Ecommerce AI/ML Series

AI Customer Support with RAG

Product Recommendation Engine

Semantic Product Search (current)

Inventory Forecasting

Review Sentiment Analysis

Resources

semantic_product_search.ipynb