
Review Sentiment Analysis

Turn 5,000 customer reviews into actionable product quality signals

The Problem

What star ratings don't tell you

TechHeaven has 5,000 customer reviews. Most retailers track the average star rating per product. But a 3-star review that says “the product works but the app keeps disconnecting” and a 3-star review that says “decent but overpriced” are completely different signals. The first is an engineering problem. The second is a pricing perception problem. The average rating cannot distinguish them, and no one can realistically read all 5,000 reviews by hand.

Sentiment analysis reads the text, scores it, and answers three questions a retailer actually needs answered: which products have hidden quality problems that generous raters are glossing over, which product attributes are driving the most complaints, and whether customer satisfaction is trending up or down this month in each category.

0.733 - Pearson r, VADER vs star rating. Strong correlation validates the approach on TechHeaven reviews.
16.4% - reviews misaligned with rating. 822 reviews where text sentiment and star rating disagree.
370 - negative reviews mention value. Top complaint aspect - more than performance, app, or build quality.

This playbook uses VADER - a rule-based sentiment model that requires no training, no API key, and no model deployment. It scores all 5,000 reviews in seconds, produces a category sentiment dashboard, extracts the top complaint aspects per category, and plots monthly satisfaction trends. All outputs are reproducible in Google Colab with no setup.

The Data

5,000 reviews across 15 categories

Two JSON files loaded directly from the TechHeaven public repository. No database, no local setup, no API key required.

Reviews: 5,000

Fields: review_id, product_id, customer_id, rating, title, comment, review_date. All in approved status. Date range: Jan 2025 - Jun 2026.

Rating skew: 45% five-star

2,250 five-star, 1,500 four-star, 750 three-star, 350 two-star, 150 one-star. Typical ecommerce distribution - satisfied customers review more.

Comment length: ~194 chars avg

Short consumer text - ideal for VADER, which was designed for this format. Concatenated title + comment gives VADER more signal.

Categories: 15

Laptops (774 reviews), Memory (272), Smart Home (255), Networking (255), Audio (340), and 10 more. Category-level aggregation is the primary analysis unit.

The Approach

VADER: rule-based sentiment, no training required

VADER ships with a pre-built lexicon of ~7,500 words annotated with valence scores by human raters. It handles capitalisation, punctuation, intensifiers, and negation. Scoring a review is a dictionary lookup and arithmetic - it takes microseconds per review and requires no GPU, no model weights, and no API call.

VADER output fields
compound (-1 to +1): Normalised aggregate score. The primary metric. >= 0.05 = positive, <= -0.05 = negative.
pos (0 to 1): Proportion of review text with positive valence.
neg (0 to 1): Proportion of review text with negative valence.
neu (0 to 1): Proportion of review text that is neutral.
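
As a minimal sketch of the scoring pass, the snippet below assumes the `vaderSentiment` package is installed and that each review is a dict with `title` and `comment` keys, as in the field list above. The helper names `label_from_compound` and `score_reviews` are illustrative and not taken from the notebook.

```python
def label_from_compound(compound: float) -> str:
    """Map a compound score to a label using the +/-0.05 thresholds."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def score_reviews(reviews: list[dict]) -> list[dict]:
    """Return a copy of each review with VADER scores and a label attached."""
    # Imported lazily so label_from_compound stays usable without the package.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()  # build once, reuse for every review
    scored = []
    for review in reviews:
        # Concatenating title and comment gives VADER more text to work with.
        text = f"{review['title']}. {review['comment']}"
        scores = analyzer.polarity_scores(text)  # keys: neg, neu, pos, compound
        scored.append({**review, **scores,
                       "label": label_from_compound(scores["compound"])})
    return scored
```

Calling `score_reviews` on the loaded review list would yield one record per review with `compound`, `pos`, `neg`, `neu`, and `label` fields, which is the shape the later aggregations in this playbook rely on.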

Validation: Pearson r = 0.733

On TechHeaven reviews, VADER compound scores correlate with star ratings at r = 0.733 (p < 0.001). One-star reviews average -0.635 compound; five-star reviews average +0.772. This validates VADER as a reliable proxy for customer satisfaction on this dataset without any fine-tuning.
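
The correlation check can be reproduced with a few lines of standard-library Python. This is a sketch of the Pearson formula only; the function name `pearson_r` is illustrative, and it does not compute the p-value (a statistics library such as SciPy's `scipy.stats.pearsonr` reports both).

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, without the 1/n factor (it cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

Here `xs` would be the star ratings and `ys` the compound scores for the same reviews, in the same order.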

Results

Four outputs from one scoring pass

VADER runs once on all 5,000 reviews. Every analysis below is a different aggregation of the same compound scores.

1. Sentiment distribution: 85.8% positive / 11.7% negative / 2.4% neutral

Matches the star rating skew. 587 negative reviews contain the most actionable signal - these are the reviews the product team needs to read. VADER surfaces them without manual triage.

2. Category sentiment ranking: Memory lowest (0.514), Accessories highest (0.594)

All 15 categories are above the neutral threshold. Memory and Mobile Accessories show the highest percentage of negative reviews (14-15%). The category ranking guides sourcing and quality control priorities.

3. Top complaint aspects: Value (370), Performance (309), App/Software (284), Build Quality (276)

Keyword extraction on the 587 negative reviews. Value is the dominant complaint - more than performance or app quality. This signals a pricing perception issue rather than a product defect issue.

4. Misalignment: 16.4% of reviews disagree with their star rating (822 reviews where VADER and star rating diverge)

150 low-rating reviews (1-2 stars) score positive on VADER - customers who used mixed or understated complaint language. 225 three-star reviews score negative - customers who were lenient with the rating but expressed clear dissatisfaction in text.
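
The aggregations above can be sketched from one list of scored reviews. The snippet assumes each record carries `category`, `rating`, and `compound`. The misalignment rule used here (1-2 stars with positive text, or 3+ stars with negative text) is an assumption for illustration; the exact rule behind the 822 figure is not stated in this playbook.

```python
from collections import Counter, defaultdict


def label(compound: float) -> str:
    """Same +/-0.05 thresholds as the VADER output field description."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def summarise(reviews: list[dict]) -> dict:
    """Derive distribution, category ranking, and misaligned reviews."""
    labels = Counter()
    by_category = defaultdict(list)
    misaligned = []
    for review in reviews:
        sentiment = label(review["compound"])
        labels[sentiment] += 1
        by_category[review["category"]].append(review["compound"])
        # Assumed rule: low rating with positive text, or 3+ stars with negative text.
        low_but_positive = review["rating"] <= 2 and sentiment == "positive"
        lenient_but_negative = review["rating"] >= 3 and sentiment == "negative"
        if low_but_positive or lenient_but_negative:
            misaligned.append(review)
    total = len(reviews)
    distribution = {name: count / total for name, count in labels.items()}
    # Lowest mean compound first, so problem categories lead the list.
    ranking = sorted(
        ((cat, sum(vals) / len(vals)) for cat, vals in by_category.items()),
        key=lambda pair: pair[1],
    )
    return {"distribution": distribution, "ranking": ranking,
            "misaligned": misaligned}
```

Because everything is computed from the stored compound scores, changing the misalignment rule or the ranking metric does not require rescoring the reviews.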

Misalignment examples: where text and rating diverge

2-star review - VADER reads positive
2 stars, VADER 0.34

The product has some strong points and the build feels solid. My issue is the companion app which makes the whole experience frustrating.

VADER scores the positive framing ("strong points", "solid") and misses the net negative intent. The customer is complaining, but the language is mixed.

5-star review with complaint language
5 stars, VADER 0.43

I was skeptical at first but it has proven itself. Fantastic product - the battery drains faster than expected on standby, but performance is rock solid.

The VADER compound falls below the five-star average (0.77) because "skeptical", "drains", and the comparison language lower the score. The review contains a real complaint masked by overall satisfaction.

Core Concepts

How each piece works

VADER Lexicon and Valence Scoring

Each word in the VADER lexicon has a human-rated valence score from -4 (most negative) to +4 (most positive). Scoring a sentence: for each word, look up its valence score, apply context modifiers (capitalisation adds intensity, "very" amplifies, "not" negates), sum the scores, and normalise to the -1 to +1 compound range. No machine learning is involved. The lexicon was validated by crowd-sourced human raters specifically on social media and review text.
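
The lookup-modify-sum-normalise sequence can be illustrated with a toy scorer. This is a deliberately simplified sketch, not VADER itself: the lexicon entries are placeholder valences, and the booster increment (0.293), negation scalar (-0.74), and normalisation constant (15) are assumed to mirror the VADER reference implementation and should be checked against the installed version. Real VADER also handles capitalisation, punctuation, "but" clauses, and longer negation windows.

```python
from math import sqrt

# Placeholder valences on the -4..+4 scale, for illustration only.
TOY_LEXICON = {"fantastic": 2.6, "decent": 1.0, "bad": -2.5,
               "terrible": -2.1, "overpriced": -1.5}
BOOSTERS = {"very": 0.293, "absolutely": 0.293}  # assumed increment
NEGATIONS = {"not", "never", "no"}
NEGATION_SCALAR = -0.74  # assumed: flips and dampens the valence
ALPHA = 15               # assumed normalisation constant


def normalise(total: float, alpha: float = ALPHA) -> float:
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return total / sqrt(total * total + alpha)


def toy_compound(text: str) -> float:
    """Score text using only the toy lexicon and the preceding word."""
    tokens = [tok.strip(".,!?").lower() for tok in text.split()]
    total = 0.0
    for i, word in enumerate(tokens):
        valence = TOY_LEXICON.get(word)
        if valence is None:
            continue  # words outside the lexicon contribute nothing
        previous = tokens[i - 1] if i > 0 else ""
        if previous in BOOSTERS:
            # Intensifiers push the valence further from zero.
            boost = BOOSTERS[previous]
            valence += boost if valence > 0 else -boost
        elif previous in NEGATIONS:
            valence *= NEGATION_SCALAR
        total += valence
    return normalise(total)
```

With this toy lexicon, `toy_compound("not bad")` comes out positive because the negative valence of "bad" is flipped, and `toy_compound("absolutely terrible")` is more negative than `toy_compound("terrible")`.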

TechHeaven: "Fantastic product" scores +0.6. "Not bad" scores +0.3 (negation flips the negative valence of "bad" into a mildly positive score). "Absolutely terrible" scores -0.8 (intensifier amplifies negative valence). "Decent but overpriced" scores around +0.1 - VADER sees "decent" as slightly positive and "overpriced" as moderately negative, producing a near-neutral overall score.

Aspect-Based Sentiment Extraction

Aspect extraction finds which product attributes are mentioned in reviews and whether those mentions are positive or negative. The simplest implementation: define keyword lists per aspect, match keywords in negative review text, count matches per category. The output is a ranked list of complaint aspects. The count of negative mentions per aspect is more actionable than overall sentiment - it points to specific features the product team can address.

TechHeaven: "Value" appears in 370 of 587 negative TechHeaven reviews. The keyword list includes "price", "overpriced", "expensive", "worth", "value", "cost". This concentration suggests pricing perception is the primary driver of negative reviews, which points to a pricing or positioning decision rather than a product engineering issue.
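
A keyword-matching pass of this kind might look like the sketch below. Only the "value" keyword list comes from the text above; the other aspect lists are hypothetical placeholders, and the function names are illustrative.

```python
import re
from collections import Counter

ASPECT_KEYWORDS = {
    # Taken from the keyword list quoted above.
    "value": ["price", "overpriced", "expensive", "worth", "value", "cost"],
    # Hypothetical lists, for illustration only.
    "app_software": ["app", "software", "firmware"],
    "build_quality": ["build", "flimsy", "broke"],
}


def compile_patterns(aspect_keywords: dict) -> dict:
    """One case-insensitive whole-word regex per aspect."""
    return {
        aspect: re.compile(
            r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
            re.IGNORECASE,
        )
        for aspect, words in aspect_keywords.items()
    }


def count_aspects(negative_texts: list[str], patterns: dict) -> Counter:
    """Count how many negative reviews mention each aspect at least once."""
    counts = Counter()
    for text in negative_texts:
        for aspect, pattern in patterns.items():
            if pattern.search(text):
                counts[aspect] += 1  # one count per review, not per keyword hit
    return counts
```

Whole-word matching keeps "app" from matching inside "happy", at the cost of missing inflected forms such as "costs" unless they are listed explicitly. A single review can count toward several aspects, which is why aspect totals can sum to more than the number of negative reviews.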

Sentiment Trend Monitoring

Monthly average VADER compound per category shows whether customer satisfaction is improving or declining over time. The sentiment trend is a leading indicator: negative text is written immediately after a bad experience, while star ratings accumulate slowly. A drop in average compound score for a category in a specific month - before the overall star rating drops - gives 2-4 weeks of lead time to investigate the cause.

TechHeaven: If the Smart Home category compound drops from 0.55 in February to 0.35 in March, that is a 0.2-point decline. Before checking the overall star rating (which will lag by 4-6 weeks as low ratings accumulate), the operations team can look at the March reviews: are they all mentioning the same firmware version? A specific product batch? The answer determines whether it is a product issue or a supplier issue.
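
The monthly trend and drop check can be sketched as follows. The snippet assumes `review_date` is an ISO-style string beginning with `YYYY-MM`, which is not confirmed by the field list, and the 0.2 default threshold simply echoes the example above. Function names are illustrative.

```python
from collections import defaultdict


def monthly_means(reviews: list[dict]) -> dict:
    """Mean compound per (category, 'YYYY-MM') pair."""
    buckets = defaultdict(list)
    for review in reviews:
        month = review["review_date"][:7]  # assumes an ISO date string
        buckets[(review["category"], month)].append(review["compound"])
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


def flag_drops(means: dict, threshold: float = 0.2) -> list[tuple]:
    """Return (category, month, drop) where the mean fell by >= threshold."""
    by_category = defaultdict(list)
    for (category, month), mean in means.items():
        by_category[category].append((month, mean))
    flags = []
    for category, series in by_category.items():
        series.sort()  # 'YYYY-MM' strings sort chronologically
        # Compares adjacent months that have data, which may skip empty months.
        for (_, prev_mean), (month, mean) in zip(series, series[1:]):
            drop = prev_mean - mean
            if drop >= threshold:
                flags.append((category, month, round(drop, 3)))
    return flags
```

Categories with few reviews per month will produce noisy means, so a minimum review count per bucket is worth adding before alerting on the flags.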

When to use VADER vs a Transformer Model

VADER is appropriate when: you need results in seconds with no setup, your text is short consumer reviews or social media, and you want full interpretability (you can see exactly why a review was scored positive or negative by looking at the lexicon). Use a fine-tuned transformer (DistilBERT, RoBERTa, or a sentiment-specific model from Hugging Face) when: your domain has specialist vocabulary that VADER does not handle well, you need aspect-opinion pair extraction (not just keyword counting), or you need to handle sarcasm and nuanced negative sentiment more accurately.

TechHeaven: VADER correctly scores "This product is terrible" as negative (-0.76) and "This is genuinely excellent" as positive (+0.73). It struggles with "I wanted to like this product" (reads as positive because of "like") and domain-specific terms ("the latency spikes" - VADER sees "spikes" as neutral). A transformer fine-tuned on electronics reviews would handle both cases.

Architecture

Production sentiment monitoring pipeline

The notebook runs sentiment analysis in batch. In production, the pipeline runs weekly on new reviews and pushes to a dashboard or alert system.

Weekly batch job
Runs Monday morning, covers reviews from the prior week
  1. Fetch new reviews since last run from the ecommerce database
  2. Join to product catalog to get category per review
  3. Score each review with VADER (< 1 second for 1,000 reviews)
  4. Aggregate compound scores by product and category
  5. Compare to prior week averages - flag significant drops
  6. Push updated scores to dashboard, write weekly aspect report
Alert routing
Routes complaints to the right team automatically
  1. "Connectivity" aspect spike → engineering team
  2. "App / software" aspect spike → mobile team
  3. "Value" aspect spike → pricing and marketing team
  4. "Build quality" aspect spike → sourcing and supplier team
  5. Product compound drops >0.2 week-over-week → product manager
  6. Worst 10 reviews by compound → customer success for manual follow-up
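
The routing rules above could be expressed as a small lookup plus two checks. This is a sketch under stated assumptions: the team identifiers are illustrative, and the definition of a "spike" (at least 1.5 times the prior week's count) is hypothetical, since the playbook does not define one. Only the 0.2 compound-drop threshold and the worst-10 rule come from the list.

```python
import heapq

ASPECT_ROUTES = {
    "connectivity": "engineering",
    "app_software": "mobile",
    "value": "pricing_and_marketing",
    "build_quality": "sourcing_and_supplier",
}
COMPOUND_DROP_THRESHOLD = 0.2  # from the routing list above
ASPECT_SPIKE_RATIO = 1.5       # hypothetical definition of a spike


def route_alerts(aspect_counts: dict, prior_aspect_counts: dict,
                 product_compound: dict, prior_product_compound: dict) -> list:
    """Return (team, message) pairs for this week's alerts."""
    alerts = []
    for aspect, team in ASPECT_ROUTES.items():
        current = aspect_counts.get(aspect, 0)
        previous = prior_aspect_counts.get(aspect, 0)
        # max(previous, 1) avoids flagging a jump from zero to one mention.
        if current >= ASPECT_SPIKE_RATIO * max(previous, 1):
            alerts.append((team, f"{aspect} spike: {previous} -> {current}"))
    for product_id, compound in product_compound.items():
        previous = prior_product_compound.get(product_id)
        if previous is not None and previous - compound > COMPOUND_DROP_THRESHOLD:
            alerts.append(("product_manager",
                           f"{product_id} compound fell {previous:.2f} -> {compound:.2f}"))
    return alerts


def worst_reviews(reviews: list[dict], n: int = 10) -> list[dict]:
    """The n lowest-compound reviews, for manual follow-up."""
    return heapq.nsmallest(n, reviews, key=lambda r: r["compound"])
```

Keeping the routes in a plain dictionary means a new aspect or team can be added without touching the alert logic.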
Upgrade path
Start with VADER
Rule-based, no training, zero cost, full interpretability. Production-ready for most ecommerce review volumes.
Add transformer model
Fine-tune DistilBERT or use a Hugging Face sentiment model for better accuracy on nuanced or domain-specific language.
Add LLM summarisation
Run GPT-4o or Claude on the worst 1% of reviews to generate structured complaint summaries and suggested resolutions for the support team.

Business Applications

Ecommerce / Retail
Product quality monitoring at scale. Weekly VADER scoring on new reviews surfaces the products and categories with declining sentiment before the average star rating reflects it. Route connectivity complaints to engineering, value complaints to pricing, app complaints to the mobile team - automatically.
Manufacturing / D2C Brands
Warranty and defect signal detection. A spike in negative sentiment mentioning "battery" or "build quality" in a specific month often predicts an incoming returns wave. Catching it 2-4 weeks early via sentiment trend monitoring gives operations time to prepare the returns process and contact the supplier.
SaaS / B2B
Support ticket and NPS comment analysis. VADER runs on any short text: Zendesk ticket summaries, G2 or Capterra review comments, NPS verbatims. A weekly sentiment dashboard across your review sources shows product sentiment trends faster than waiting for quarterly NPS surveys.
Insurance / Financial Services
Claims and complaint analysis. Regulatory bodies require monitoring of customer complaints. VADER provides a first-pass severity triage: flagging highly negative complaint text for priority human review, and surfacing the most common complaint aspects for root cause analysis without reading every document.

Run the notebook

VADER scoring on 5,000 TechHeaven reviews. Correlation validation, category sentiment dashboard, aspect extraction, sentiment trend chart, misalignment analysis, and the 5 most negative reviews by compound score. No API key required - runs in under 30 seconds on Colab.
