Review Sentiment Analysis
Turn 5,000 customer reviews into actionable product quality signals
The Problem
What star ratings don't tell you
TechHeaven has 5,000 customer reviews. Most retailers track the average star rating per product. But a 3-star review that says “the product works but the app keeps disconnecting” and a 3-star review that says “decent but overpriced” are completely different signals. The first is an engineering problem; the second is a pricing perception problem. The average rating cannot distinguish them, and no one can realistically read all 5,000 reviews by hand to sort them.
Sentiment analysis reads the text, scores it, and answers three questions a retailer actually needs answered: which products have hidden quality problems that generous raters are glossing over, which product attributes drive the most complaints, and whether customer satisfaction in each category is trending up or down month to month.
This playbook uses VADER - a rule-based sentiment model that requires no training, no API key, and no model deployment. It scores all 5,000 reviews in seconds, produces a category sentiment dashboard, extracts the top complaint aspects per category, and plots monthly satisfaction trends. All outputs are reproducible in Google Colab with no setup.
The Data
5,000 reviews across 15 categories
Two JSON files loaded directly from the TechHeaven public repository. No database, no local setup, no API key required.
Fields: review_id, product_id, customer_id, rating, title, comment, review_date. All reviews are in approved status. Date range: Jan 2025 - Jun 2026.
Ratings: 2,250 five-star, 1,500 four-star, 750 three-star, 350 two-star, 150 one-star. This is a typical ecommerce skew: satisfied customers review more.
Text: short consumer reviews, the format VADER was designed for. Scoring the concatenated title + comment gives VADER more signal than either field alone.
Categories: Laptops (774 reviews), Audio (340), Memory (272), Smart Home (255), Networking (255), and 10 more. Category-level aggregation is the primary unit of analysis.
The Approach
VADER: rule-based sentiment, no training required
VADER ships with a pre-built lexicon of ~7,500 words, emoticons, and slang terms annotated with valence scores by human raters. It handles capitalisation, punctuation, intensifiers, and negation. Scoring a review is a dictionary lookup and arithmetic - it takes microseconds per review and requires no GPU, no model weights, and no API call.
Validation: Pearson r = 0.733
On TechHeaven reviews, VADER compound scores correlate with star ratings at r = 0.733 (p < 0.001). One-star reviews average -0.635 compound; five-star reviews average +0.772. This supports using VADER as a reasonable proxy for customer satisfaction on this dataset without any fine-tuning, though the correlation is strong rather than perfect: text and rating still diverge for some reviews.
Results
Four outputs from one scoring pass
VADER runs once on all 5,000 reviews. Every analysis below is a different aggregation of the same compound scores.
The VADER sentiment labels match the star-rating skew. The 587 negative reviews contain the most actionable signal - these are the reviews the product team needs to read, and VADER surfaces them without manual triage.
All 15 categories have an average compound score above the neutral threshold. Memory and Mobile Accessories have the highest share of negative reviews (14-15%). Ranking categories this way helps prioritise sourcing and quality control.
Keyword-based aspect extraction on the 587 negative reviews shows value as the dominant complaint, ahead of performance and app quality. That points to a pricing perception issue rather than a product defect.
150 of the 500 low-rating reviews (1-2 stars) score positive on VADER: customers who used mixed or understated complaint language. 225 of the 750 three-star reviews score negative: customers who were lenient with the rating but expressed clear dissatisfaction in the text.
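A minimal sketch of the category dashboard aggregation, assuming each review has already been scored and is available as a `(category, compound)` pair. The ±0.05 cut-offs follow the convention suggested in the VADER documentation for labelling compound scores; reading that as the "neutral threshold" used here is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict

POS_THRESHOLD = 0.05   # compound >= 0.05 -> positive (common VADER convention)
NEG_THRESHOLD = -0.05  # compound <= -0.05 -> negative; anything between is neutral


def label(compound):
    """Map a compound score to a positive / neutral / negative label."""
    if compound >= POS_THRESHOLD:
        return "positive"
    if compound <= NEG_THRESHOLD:
        return "negative"
    return "neutral"


def category_dashboard(scored_reviews):
    """scored_reviews: iterable of (category, compound) pairs.
    Returns {category: (mean_compound, pct_negative, n_reviews)},
    ordered from the highest to the lowest share of negative reviews."""
    groups = defaultdict(list)
    for category, compound in scored_reviews:
        groups[category].append(compound)
    rows = {}
    for category, scores in groups.items():
        n = len(scores)
        n_negative = sum(1 for s in scores if label(s) == "negative")
        rows[category] = (sum(scores) / n, 100.0 * n_negative / n, n)
    return dict(sorted(rows.items(), key=lambda item: item[1][1], reverse=True))


dashboard = category_dashboard([
    ("Memory", 0.6), ("Memory", -0.4),  # toy scores, not real data
    ("Audio", 0.8), ("Audio", 0.02),
])
for category, (avg, pct_neg, n) in dashboard.items():
    print(f"{category:<8} mean={avg:+.2f} negative={pct_neg:.0f}% n={n}")
```

Sorting by negative share rather than by mean compound keeps a category with a few very unhappy customers from hiding behind a high average.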
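The two misalignment buckets can be pulled out with a simple filter. This sketch assumes each review is a dict holding the `review_id` and `rating` fields listed under The Data plus a computed `compound` score, and it uses the ±0.05 convention from the VADER documentation as the positive and negative cut-offs; whether the playbook uses exactly these thresholds is an assumption.

```python
def find_misaligned(reviews, pos_threshold=0.05, neg_threshold=-0.05):
    """reviews: iterable of dicts with 'review_id', 'rating' and 'compound' keys.

    Returns two lists of review_ids:
      - 1-2 star reviews whose text scores positive
      - 3-star reviews whose text scores negative
    """
    low_rating_positive_text = []
    three_star_negative_text = []
    for review in reviews:
        rating, compound = review["rating"], review["compound"]
        if rating <= 2 and compound >= pos_threshold:
            low_rating_positive_text.append(review["review_id"])
        elif rating == 3 and compound <= neg_threshold:
            three_star_negative_text.append(review["review_id"])
    return low_rating_positive_text, three_star_negative_text


sample = [  # toy rows; ids and scores are made up
    {"review_id": "r1", "rating": 1, "compound": 0.42},
    {"review_id": "r2", "rating": 2, "compound": -0.61},
    {"review_id": "r3", "rating": 3, "compound": -0.33},
    {"review_id": "r4", "rating": 5, "compound": 0.91},
]
positive_low, negative_mid = find_misaligned(sample)
print(positive_low, negative_mid)
```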
Misalignment examples: where text and rating diverge
“The product has some strong points and the build feels solid. My issue is the companion app which makes the whole experience frustrating.”
VADER scores the positive framing ("strong points", "solid") and misses the net negative intent. The customer is complaining, but the language is mixed.
“I was skeptical at first but it has proven itself. Fantastic product - the battery drains faster than expected on standby, but performance is rock solid.”
The VADER compound falls below the five-star average (0.772) because "skeptical", "drains", and the comparison language pull the score down. The review hides a real complaint (battery drain on standby) inside overall satisfaction.
Core Concepts
How each piece works
VADER Lexicon and Valence Scoring
Each word in the VADER lexicon has a human-rated valence score from -4 (most negative) to +4 (most positive). To score a sentence: look up each word's valence, apply context modifiers (capitalisation adds intensity, "very" amplifies, "not" negates), sum the scores, and normalise the sum to the -1 to +1 compound range. No machine learning is involved. The valences were rated by crowd-sourced human raters, and the model was designed for and validated on social media and review text.
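A toy version of that scoring loop, to make the arithmetic concrete. It is a simplified sketch, not the vaderSentiment implementation: the lexicon holds a handful of illustrative valences, and capitalisation, punctuation emphasis, and contrast words such as "but" are ignored. The booster increment (0.293), negation multiplier (-0.74), three-word negation window, and normalisation formula x / sqrt(x² + 15) are taken to mirror the reference VADER source; treat them as assumptions.

```python
import math

# Illustrative valences on VADER's -4..+4 scale (placeholders, not the real lexicon)
TOY_LEXICON = {"good": 1.9, "great": 3.1, "solid": 1.0, "frustrating": -2.0, "drains": -1.0}
BOOSTERS = {"very", "really", "extremely"}  # intensifiers
BOOSTER_INCREMENT = 0.293   # assumed fixed amplification per booster
NEGATIONS = {"not", "never", "no", "isn't", "doesn't"}
NEGATION_SCALAR = -0.74     # assumed: flips and slightly dampens a negated valence
ALPHA = 15                  # assumed normalisation constant


def normalize(score, alpha=ALPHA):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def toy_compound(text):
    """Sum lexicon valences with booster and negation modifiers, then normalise."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    total = 0.0
    for i, word in enumerate(words):
        valence = TOY_LEXICON.get(word)
        if valence is None:
            continue  # words outside the lexicon contribute nothing
        if i > 0 and words[i - 1] in BOOSTERS:
            # push the valence further from zero in its own direction
            valence += math.copysign(BOOSTER_INCREMENT, valence)
        if any(w in NEGATIONS for w in words[max(0, i - 3):i]):
            valence *= NEGATION_SCALAR  # negator within the 3 preceding words
        total += valence
    return normalize(total)


for sentence in ["Good", "Very good", "Not good", "Solid build but the app is frustrating"]:
    print(f"{sentence!r}: {toy_compound(sentence):+.3f}")
```

The normalisation step is why compound scores never quite reach ±1: a single mildly positive word lands well short of 1, and only many strong words push the sum towards the ends of the range.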
Aspect-Based Sentiment Extraction
Aspect extraction finds which product attributes are mentioned in reviews and whether those mentions are positive or negative. The simplest implementation: define keyword lists per aspect, match keywords in negative review text, count matches per category. The output is a ranked list of complaint aspects. The count of negative mentions per aspect is more actionable than overall sentiment - it points to specific features the product team can address.
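A minimal sketch of that keyword-matching approach. The aspect names echo the alert routes in the Architecture section plus "performance", but the keyword lists are hypothetical placeholders; the playbook's actual lists are not shown. Matching is on whole lowercase tokens, so inflected forms have to be listed explicitly, and each aspect is counted at most once per review.

```python
import re
from collections import Counter, defaultdict

# Hypothetical keyword lists; tune these against real review text
ASPECT_KEYWORDS = {
    "value": {"price", "overpriced", "expensive", "value", "worth"},
    "connectivity": {"disconnect", "disconnects", "disconnecting", "bluetooth", "wifi", "pairing"},
    "app / software": {"app", "software", "firmware", "update"},
    "build quality": {"flimsy", "broke", "broken", "build", "plastic"},
    "performance": {"slow", "lag", "laggy", "performance"},
}


def aspect_counts(negative_reviews):
    """negative_reviews: iterable of (category, text) for reviews already labelled negative.
    Returns {category: Counter(aspect -> number of reviews mentioning it)}."""
    counts = defaultdict(Counter)
    for category, text in negative_reviews:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        for aspect, keywords in ASPECT_KEYWORDS.items():
            if tokens & keywords:  # at least one keyword present
                counts[category][aspect] += 1
    return counts


counts = aspect_counts([
    ("Audio", "Overpriced, and the app keeps disconnecting"),
    ("Audio", "Too expensive for what you get"),
    ("Memory", "Very slow transfer speeds"),
])
for category, counter in counts.items():
    print(category, counter.most_common(3))
```

`Counter.most_common` gives the ranked complaint list per category directly; the known weakness is that a keyword like "build" also matches praise, which is why the input is restricted to reviews already labelled negative.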
Sentiment Trend Monitoring
Monthly average VADER compound per category shows whether customer satisfaction is improving or declining over time. The sentiment trend can act as a leading indicator: each month's compound average reflects only that month's reviews, including lenient raters whose text is negative, while a product's overall star rating is a cumulative average that moves slowly. A drop in a category's average compound score in a specific month, before the overall star rating drops, can give 2-4 weeks of lead time to investigate the cause.
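A minimal sketch of the monthly trend calculation, assuming `review_date` is an ISO `YYYY-MM-DD` string (the actual storage format is not stated) and that reviews are already scored. Each month is compared with the previous month that has data; the 0.1 drop threshold is a hypothetical starting point, not a value from the playbook.

```python
from collections import defaultdict


def monthly_trend(scored_reviews):
    """scored_reviews: iterable of (category, review_date, compound),
    with review_date as 'YYYY-MM-DD'. Returns {category: [(month, mean), ...]} in month order."""
    totals = defaultdict(lambda: [0.0, 0])  # (category, 'YYYY-MM') -> [sum, count]
    for category, review_date, compound in scored_reviews:
        bucket = totals[(category, review_date[:7])]
        bucket[0] += compound
        bucket[1] += 1
    trend = defaultdict(list)
    for (category, month), (total, n) in sorted(totals.items()):
        trend[category].append((month, total / n))
    return dict(trend)


def month_over_month_drops(trend, threshold=0.1):
    """Return (category, month, drop) wherever the monthly mean compound fell
    by more than `threshold` relative to the previous month with data."""
    drops = []
    for category, points in trend.items():
        for (_, previous), (month, current) in zip(points, points[1:]):
            if previous - current > threshold:
                drops.append((category, month, round(previous - current, 3)))
    return drops


trend = monthly_trend([
    ("Audio", "2025-01-05", 0.8), ("Audio", "2025-01-20", 0.6),  # toy scores
    ("Audio", "2025-02-03", 0.3),
    ("Memory", "2025-01-11", 0.5), ("Memory", "2025-02-14", 0.55),
])
print(month_over_month_drops(trend))
```

Months with very few reviews produce noisy means, so in practice a minimum review count per month is worth adding before trusting a flagged drop.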
When to use VADER vs a Transformer Model
VADER is appropriate when: you need results in seconds with no setup, your text is short consumer reviews or social media, and you want full interpretability (you can see exactly why a review was scored positive or negative by looking at the lexicon). Use a fine-tuned transformer (DistilBERT, RoBERTa, or a sentiment-specific model from Hugging Face) when: your domain has specialist vocabulary that VADER does not handle well, you need aspect-opinion pair extraction (not just keyword counting), or you need to handle sarcasm and nuanced negative sentiment more accurately.
Architecture
Production sentiment monitoring pipeline
The notebook runs sentiment analysis in batch. In production, the pipeline runs weekly on new reviews and pushes to a dashboard or alert system.
1. Fetch new reviews since the last run from the ecommerce database
2. Join to the product catalog to get the category for each review
3. Score each review with VADER (< 1 second for 1,000 reviews)
4. Aggregate compound scores by product and category
5. Compare to prior-week averages and flag significant drops
6. Push updated scores to the dashboard and write the weekly aspect report
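Steps 4 and 5 can be sketched as below, assuming each scored review arrives as a `(product_id, compound)` pair for the current and previous week. The 0.2 threshold comes from the product-manager alert rule in this pipeline; the function names and input shape are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean

DROP_THRESHOLD = 0.2  # alert when a product's mean compound falls by more than this


def mean_by_product(scored):
    """scored: iterable of (product_id, compound). Returns {product_id: mean compound}."""
    groups = defaultdict(list)
    for product_id, compound in scored:
        groups[product_id].append(compound)
    return {product_id: mean(scores) for product_id, scores in groups.items()}


def week_over_week_alerts(this_week, last_week, threshold=DROP_THRESHOLD):
    """Return (product_id, last_mean, this_mean) for products whose mean compound
    dropped by more than `threshold`, largest drop first. Products without
    reviews in both weeks are skipped."""
    current = mean_by_product(this_week)
    previous = mean_by_product(last_week)
    alerts = [
        (product_id, previous[product_id], score)
        for product_id, score in current.items()
        if product_id in previous and previous[product_id] - score > threshold
    ]
    return sorted(alerts, key=lambda a: a[1] - a[2], reverse=True)


last_week = [("P1", 0.7), ("P1", 0.5), ("P2", 0.4)]  # toy data
this_week = [("P1", 0.1), ("P2", 0.35), ("P3", -0.2)]
for product_id, before, after in week_over_week_alerts(this_week, last_week):
    print(f"{product_id}: {before:+.2f} -> {after:+.2f}")
```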
Alert routing:
1. "Connectivity" aspect spike → engineering team
2. "App / software" aspect spike → mobile team
3. "Value" aspect spike → pricing and marketing team
4. "Build quality" aspect spike → sourcing and supplier team
5. Product compound drops >0.2 week-over-week → product manager
6. Worst 10 reviews by compound → customer success for manual follow-up
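The routing rules can be expressed as a lookup table plus a few small helpers. The team names are copied from the routing list; the spike rule (at least `min_count` mentions and more than `ratio` times last week's count) and the `"unassigned"` fallback are hypothetical choices, since the playbook does not define what counts as a spike.

```python
import heapq
from collections import Counter

# Aspect -> owning team, as in the routing rules
ASPECT_OWNERS = {
    "connectivity": "engineering team",
    "app / software": "mobile team",
    "value": "pricing and marketing team",
    "build quality": "sourcing and supplier team",
}


def spiking_aspects(this_week, last_week, min_count=5, ratio=1.5):
    """this_week / last_week: Counter of negative-review mentions per aspect.
    Hypothetical spike rule: enough mentions and a clear rise over last week."""
    return [
        aspect for aspect, count in this_week.items()
        if count >= min_count and count > ratio * last_week.get(aspect, 0)
    ]


def route(aspects):
    """Map each spiking aspect to the team that should investigate it."""
    return {aspect: ASPECT_OWNERS.get(aspect, "unassigned") for aspect in aspects}


def worst_reviews(scored_reviews, n=10):
    """scored_reviews: iterable of (review_id, compound). Lowest compound first."""
    return heapq.nsmallest(n, scored_reviews, key=lambda r: r[1])


this_week = Counter({"value": 12, "connectivity": 6, "build quality": 2})  # toy counts
last_week = Counter({"value": 5, "connectivity": 5, "build quality": 1})
print(route(spiking_aspects(this_week, last_week)))
print(worst_reviews([("r1", 0.4), ("r2", -0.9), ("r3", -0.3)], n=2))
```

`heapq.nsmallest` avoids sorting the whole week's reviews just to pull the worst few for customer success.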
Business Applications
Run the notebook
VADER scoring on 5,000 TechHeaven reviews. Correlation validation, category sentiment dashboard, aspect extraction, sentiment trend chart, misalignment analysis, and the 5 most negative reviews by compound score. No API key required - runs in under 30 seconds on Colab.
Ecommerce AI/ML Series