How do you analyze customer reviews with Python?

The standard pipeline: (1) load review text from your data source; (2) join to the product catalog to get the category for each review; (3) run vaderSentiment's SentimentIntensityAnalyzer().polarity_scores() on each review's text; (4) extract the compound score; (5) aggregate by product, category, or time period. The full pipeline on 5,000 reviews runs in under 10 seconds on a laptop, with no GPU required. For more accurate results on domain-specific text, fine-tuned transformer models (DistilBERT, RoBERTa) typically improve on VADER, at a higher compute cost.
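A minimal sketch of steps 2 to 5, assuming the reviews are already loaded as a list of dicts with the fields described under The Data (product_id, title, comment, rating, review_date) and that the catalog is a dict keyed by product_id with a "category" field (an assumed shape). Loading is omitted because the source URL is not given here. The scorer is injectable so the logic can be exercised without VADER installed; by default it lazily imports SentimentIntensityAnalyzer from the vaderSentiment package.

```python
from collections import defaultdict
from statistics import mean


def score_reviews(reviews, catalog, score_fn=None):
    """Join each review to its catalog entry and attach a VADER compound score."""
    if score_fn is None:
        # Lazy import: only needed when no stub scorer is supplied.
        from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
        score_fn = SentimentIntensityAnalyzer().polarity_scores
    rows = []
    for review in reviews:
        product = catalog.get(review["product_id"])
        if product is None:
            continue  # skip reviews whose product is missing from the catalog
        # Title + comment gives VADER more text to score than the comment alone.
        text = f"{review.get('title', '')} {review.get('comment', '')}".strip()
        rows.append({**review,
                     "category": product["category"],
                     "compound": score_fn(text)["compound"]})
    return rows


def mean_compound_by(rows, key):
    """Average compound score grouped by any field, e.g. "category" or "product_id"."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["compound"])
    return {group: mean(values) for group, values in groups.items()}
```

Typical use is `rows = score_reviews(reviews, catalog)` followed by `mean_compound_by(rows, "category")`. For a time-period view, add a month field (for ISO dates, `row["review_date"][:7]`) and group by that.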

What is aspect-based sentiment analysis?

Aspect-based sentiment analysis (ABSA) identifies which specific product attributes customers mention and whether they express positive or negative sentiment about each. Instead of "this review is negative," ABSA outputs "this review is negative about battery life but positive about build quality." The simplest implementation uses keyword lists per aspect (battery: ["battery", "charge", "drain", "runtime"]) and counts how often each aspect appears in negative reviews. More sophisticated approaches use fine-tuned models that can identify aspect-opinion pairs without predefined keyword lists.
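A minimal sketch of the keyword approach extended to per-aspect polarity: split a review into sentences, score each sentence, and attribute that score to every aspect whose keywords appear in it. Only the battery keyword list comes from the text above; the build-quality list is a hypothetical example. This is a heuristic, not a fine-tuned ABSA model: a sentence that mentions two aspects gives both the same score.

```python
import re
from statistics import mean

# The battery list is from the text; the build-quality list is hypothetical.
ASPECTS = {
    "battery": ["battery", "charge", "drain", "runtime"],
    "build quality": ["build", "solid", "flimsy", "plastic"],
}


def label(compound):
    """Standard VADER thresholds."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def aspect_sentiment(review, score_fn=None, aspects=ASPECTS):
    """Return {aspect: label} using sentence-level compound scores."""
    if score_fn is None:
        from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
        score_fn = SentimentIntensityAnalyzer().polarity_scores
    per_aspect = {}
    for sentence in re.split(r"(?<=[.!?])\s+", review.strip()):
        lowered = sentence.lower()
        compound = score_fn(sentence)["compound"]
        for aspect, keywords in aspects.items():
            # Prefix match so "drain" also catches "drains" and "draining".
            if any(re.search(r"\b" + re.escape(k), lowered) for k in keywords):
                per_aspect.setdefault(aspect, []).append(compound)
    return {aspect: label(mean(scores)) for aspect, scores in per_aspect.items()}
```

On a review like "The battery drains overnight. The build feels solid." this yields a negative label for battery and a positive one for build quality, which is the shape of output ABSA aims for.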

How do you identify products with hidden quality issues from reviews?

Two approaches: (1) sentiment-rating misalignment - find reviews where the customer gave a high star rating but the text scores negatively on VADER. These are cases where the customer was lenient with the rating but expressed reservations in text. (2) Below-average category sentiment - compute average VADER compound per product and flag products scoring significantly below their category average. A product with a 4.2-star average that scores 0.3 on VADER (vs a 0.75 category average) has more negative language in its reviews than the rating suggests, which often predicts increasing complaints or returns.
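A minimal sketch of the second approach, assuming scored rows (dicts with product_id, category and compound). "Significantly below" is not defined in the text, so the `margin` parameter (0.2 by default) is an assumption to tune. The category average here is taken over all reviews in the category.

```python
from collections import defaultdict
from statistics import mean


def flag_below_category(rows, margin=0.2):
    """Flag products whose mean compound sits at least `margin` below their category mean."""
    product_scores = defaultdict(list)
    category_scores = defaultdict(list)
    product_category = {}
    for row in rows:
        product_scores[row["product_id"]].append(row["compound"])
        category_scores[row["category"]].append(row["compound"])
        product_category[row["product_id"]] = row["category"]

    category_avg = {cat: mean(s) for cat, s in category_scores.items()}
    flagged = []
    for product_id, scores in product_scores.items():
        product_avg = mean(scores)
        cat_avg = category_avg[product_category[product_id]]
        if cat_avg - product_avg >= margin:
            flagged.append({"product_id": product_id,
                            "product_avg": round(product_avg, 3),
                            "category_avg": round(cat_avg, 3)})
    # Worst offenders first.
    return sorted(flagged, key=lambda f: f["product_avg"])
```

With the numbers from the example (a product averaging 0.3 in a category averaging 0.75), the gap of 0.45 clears the assumed margin and the product is flagged regardless of its star rating.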

What data do you need for review sentiment analysis?

At minimum: review text (title and/or comment body), star rating, product identifier, and review date. The product identifier lets you join to a product catalog to get category, brand, and price - enabling category-level analysis. The review date enables trend detection. Star rating is used as ground truth to validate VADER scores and identify misaligned reviews. This playbook uses TechHeaven review data fetched directly from GitHub at runtime - no database or local setup required.

Ecommerce AI/ML Series: Built on TechHeaven (Live)

Review Sentiment Analysis

Q: What is VADER sentiment analysis?

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a rule-based sentiment model designed for short consumer text such as product reviews, social media posts, and forum comments. It requires no training data and no model deployment - it ships with a pre-built lexicon of ~7,500 words annotated with valence scores by human raters. VADER produces a compound score from -1 (most negative) to +1 (most positive) in milliseconds per review. Standard classification thresholds: compound >= 0.05 = positive, compound <= -0.05 = negative, between those = neutral.
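The thresholds translate directly into a small labelling function. This sketch takes compound scores as input; with the vaderSentiment package installed they would come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`. Applied to every review, the same function gives the positive/negative/neutral shares of a dataset.

```python
from collections import Counter


def classify(compound):
    """Map a VADER compound score to a label using the standard +/-0.05 thresholds."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def label_shares(compounds):
    """Fraction of reviews in each label."""
    labels = ("positive", "negative", "neutral")
    if not compounds:
        return {label: 0.0 for label in labels}
    counts = Counter(classify(c) for c in compounds)
    return {label: counts[label] / len(compounds) for label in labels}
```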

Turn 5,000 customer reviews into actionable product quality signals

The Problem

What star ratings don't tell you

TechHeaven has 5,000 customer reviews. Average star rating per product is what most retailers track. But a 3-star review that says “the product works but the app keeps disconnecting” and a 3-star review that says “decent but overpriced” are completely different signals. The first is an engineering problem. The second is a pricing perception problem. The average rating cannot distinguish them, and reading all 5,000 reviews by hand does not scale.

Sentiment analysis reads the text, scores it, and answers three questions a retailer actually needs answered: which products have hidden quality problems that generous raters are glossing over, what product attributes are driving the most complaints, and is customer satisfaction trending up or down this month across each category.

0.733 - Pearson r: VADER vs star rating. Strong correlation validates the approach on TechHeaven reviews.

16.4% - reviews misaligned with rating. 822 reviews where text sentiment and star rating disagree.

370 - negative reviews mention value. Top complaint aspect, more than performance, app, or build quality.

This playbook uses VADER - a rule-based sentiment model that requires no training, no API key, and no model deployment. It scores all 5,000 reviews in seconds, produces a category sentiment dashboard, extracts the top complaint aspects per category, and plots monthly satisfaction trends. All outputs are reproducible in Google Colab with no setup.

The Data

5,000 reviews across 15 categories

Two JSON files loaded directly from the TechHeaven public repository. No database, no local setup, no API key required.

Reviews: 5,000

Fields: review_id, product_id, customer_id, rating, title, comment, review_date. All reviews are in approved status. Date range: Jan 2025 - Jun 2026.

Rating skew: 45% five-star

2,250 five-star, 1,500 four-star, 750 three-star, 350 two-star, 150 one-star. Typical ecommerce distribution - satisfied customers review more.

Comment length: ~194 characters on average

Short consumer text - ideal for VADER, which was designed for this format. Concatenated title + comment gives VADER more signal.

Categories: 15

Laptops (774 reviews), Memory (272), Smart Home (255), Networking (255), Audio (340), and 10 more. Category-level aggregation is the primary analysis unit.

The Approach

VADER: rule-based sentiment, no training required

VADER ships with a pre-built lexicon of ~7,500 words annotated with valence scores by human raters. It handles capitalisation, punctuation, intensifiers, and negation. Scoring a review is a dictionary lookup plus arithmetic: it takes roughly a millisecond or less per review and requires no GPU, no model weights, and no API call.

VADER output fields

compound (-1 to +1): normalised aggregate score and the primary metric. >= 0.05 = positive, <= -0.05 = negative.

pos (0 to 1): proportion of review text with positive valence.

neg (0 to 1): proportion of review text with negative valence.

neu (0 to 1): proportion of review text that is neutral.

Validation: Pearson r = 0.733

On TechHeaven reviews, VADER compound scores correlate with star ratings at r = 0.733 (p < 0.001). One-star reviews average -0.635 compound; five-star reviews average +0.772. This validates VADER as a reliable proxy for customer satisfaction on this dataset without any fine-tuning.
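A minimal sketch of the validation step, assuming scored rows with rating and compound fields. It computes Pearson r by hand (Python 3.10+ also offers `statistics.correlation`) and the mean compound per star level. The p-value requires a significance test; if SciPy is available, `scipy.stats.pearsonr` returns both r and a p-value.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def validate_against_ratings(rows):
    """Return (r, {star: mean compound}) for scored review rows."""
    r = pearson_r([row["rating"] for row in rows], [row["compound"] for row in rows])
    by_star = defaultdict(list)
    for row in rows:
        by_star[row["rating"]].append(row["compound"])
    return r, {star: mean(scores) for star, scores in sorted(by_star.items())}
```

The per-star means are the numbers behind statements like "one-star reviews average -0.635"; a clean monotonic rise from 1 to 5 stars is a useful sanity check alongside r.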

Results

Four outputs from one scoring pass

VADER runs once on all 5,000 reviews. Every analysis below is a different aggregation of the same compound scores.

1. Sentiment distribution: 85.8% positive / 11.7% negative / 2.4% neutral

The split matches the star rating skew. The 587 negative reviews carry the most actionable signal - these are the reviews the product team needs to read, and VADER surfaces them without manual triage.

2. Category sentiment ranking: Memory lowest (0.514), Accessories highest (0.594)

All 15 categories are above the neutral threshold. Memory and Mobile Accessories show the highest percentage of negative reviews (14-15%). The category ranking guides sourcing and quality control priorities.

3. Top complaint aspects: Value (370), Performance (309), App/Software (284), Build Quality (276)

Keyword extraction on the 587 negative reviews. A review can mention several aspects, so the counts overlap and sum to more than 587. Value is the dominant complaint - more than performance or app quality. This signals a pricing perception issue rather than a product defect issue.

4. Misalignment: 16.4% of reviews disagree with their star rating (822 reviews where VADER and star rating diverge)

150 low-rating reviews (1-2 stars) score positive on VADER - customers who used mixed or understated complaint language. 225 three-star reviews score negative - customers who were lenient with the rating but expressed clear dissatisfaction in text.
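A minimal sketch of the rating/text comparison on scored rows (review_id, rating, compound). The first two rules mirror the two groups described above; the third follows the FAQ's definition of a high star rating paired with negatively scored text. The notebook's exact rules are not spelled out here, so counts from this sketch need not reproduce the 822 figure.

```python
def misalignment_reason(rating, compound):
    """Return why a review's text and star rating disagree, or None if they agree."""
    if rating <= 2 and compound >= 0.05:
        return "low rating, positive text"
    if rating == 3 and compound <= -0.05:
        return "three stars, negative text"
    if rating >= 4 and compound <= -0.05:
        # Assumed rule, from the FAQ: high star rating but negative text.
        return "high rating, negative text"
    return None


def find_misaligned(rows):
    """List (review_id, reason) for every misaligned review."""
    flagged = []
    for row in rows:
        reason = misalignment_reason(row["rating"], row["compound"])
        if reason:
            flagged.append((row["review_id"], reason))
    return flagged
```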

Misalignment examples: where text and rating diverge

2-star review - VADER reads positive

2 stars, VADER 0.34

“The product has some strong points and the build feels solid. My issue is the companion app which makes the whole experience frustrating.”

VADER scores the positive framing ("strong points", "solid") and misses the net negative intent. The customer is complaining, but the language is mixed.

5-star review with complaint language

5 stars, VADER 0.43

“I was skeptical at first but it has proven itself. Fantastic product - the battery drains faster than expected on standby, but performance is rock solid.”

A compound of 0.43 sits well below the 5-star average (0.77) because "skeptical", "drains", and the comparison "faster than expected" lower the score. The review contains a real complaint masked by overall satisfaction.

Core Concepts

How each piece works

VADER Lexicon and Valence Scoring

Each word in the VADER lexicon has a human-rated valence score from -4 (most negative) to +4 (most positive). Scoring a sentence: for each word, look up its valence score, apply context modifiers (capitalisation adds intensity, "very" amplifies, "not" negates), sum the scores, and normalise to the -1 to +1 compound range. No machine learning is involved. The lexicon was validated by crowd-sourced human raters specifically on social media and review text.

TechHeaven example: "Fantastic product" scores +0.6. "Not bad" scores +0.3 (negation flips the negative valence of "bad" and dampens it). "Absolutely terrible" scores -0.8 (the intensifier amplifies negative valence). "Decent but overpriced" scores around +0.1 - VADER sees "decent" as slightly positive and "overpriced" as moderately negative, producing a near-neutral overall score.
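A simplified sketch of the scoring loop described above, meant to show the mechanics rather than reproduce VADER. The four-word lexicon holds illustrative values, not VADER's actual lexicon entries. The constants (0.293 booster increment, -0.74 negation scalar, 0.733 all-caps increment, and alpha = 15 in the normalisation x / sqrt(x^2 + alpha)) are the ones used in the vaderSentiment source, but real VADER also looks up to three words back for modifiers, handles "but" clauses, punctuation emphasis and emoticons, and only boosts ALL-CAPS words when the text mixes case; none of that is modelled here.

```python
import math
import re

# Illustrative valences on VADER's -4..+4 scale (NOT copied from the real lexicon).
LEXICON = {"fantastic": 2.6, "good": 1.9, "bad": -2.5, "terrible": -2.1}
BOOSTERS = {"very", "absolutely", "extremely"}
NEGATIONS = {"not", "never", "no"}

B_INCR = 0.293     # booster increment
N_SCALAR = -0.74   # negation flips and dampens valence
C_INCR = 0.733     # ALL-CAPS emphasis increment
ALPHA = 15         # normalisation constant


def normalize(score, alpha=ALPHA):
    """Squash an unbounded valence sum into the -1..+1 compound range."""
    return max(-1.0, min(1.0, score / math.sqrt(score * score + alpha)))


def toy_compound(text):
    tokens = re.findall(r"[A-Za-z']+", text)
    total = 0.0
    for i, token in enumerate(tokens):
        valence = LEXICON.get(token.lower())
        if valence is None:
            continue  # words outside the lexicon contribute nothing
        sign = 1 if valence > 0 else -1
        if token.isupper() and len(token) > 1:
            valence += sign * C_INCR       # capitalisation adds intensity
        prev = tokens[i - 1].lower() if i > 0 else ""
        if prev in BOOSTERS:
            valence += sign * B_INCR       # "very" / "absolutely" amplify
        if prev in NEGATIONS:
            valence *= N_SCALAR            # "not" negates
        total += valence
    return normalize(total)
```

With these illustrative values, `toy_compound("Not bad")` comes out at about 0.43: -2.5 * -0.74 = 1.85, and 1.85 / sqrt(1.85^2 + 15) is roughly 0.431.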

Aspect-Based Sentiment Extraction

Aspect extraction finds which product attributes are mentioned in reviews and whether those mentions are positive or negative. The simplest implementation: define keyword lists per aspect, match keywords in negative review text, count matches per category. The output is a ranked list of complaint aspects. The count of negative mentions per aspect is more actionable than overall sentiment - it points to specific features the product team can address.

TechHeaven example: "Value" appears in 370 of 587 negative TechHeaven reviews. The keyword list includes "price", "overpriced", "expensive", "worth", "value", "cost". This concentration suggests pricing perception is the primary driver of negative reviews, which points to a pricing or positioning decision rather than a product engineering issue.
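A minimal sketch of the counting step, assuming scored rows with category, compound, title and comment. Only the value keyword list comes from the text; the other two lists are hypothetical placeholders. Each negative review counts at most once per aspect, matching the "370 of 587 negative reviews" phrasing.

```python
import re
from collections import Counter, defaultdict

ASPECT_KEYWORDS = {
    # From the text above.
    "value": ["price", "overpriced", "expensive", "worth", "value", "cost"],
    # Hypothetical lists for illustration.
    "app/software": ["app", "software", "firmware", "update"],
    "connectivity": ["wifi", "bluetooth", "disconnect", "pairing"],
}


def _compile(words):
    # Whole keyword plus a few common suffixes: "costs", "disconnecting", "updated".
    alternation = "|".join(map(re.escape, words))
    return re.compile(rf"\b(?:{alternation})(?:s|es|d|ed|ing|ly)?\b", re.IGNORECASE)


PATTERNS = {aspect: _compile(words) for aspect, words in ASPECT_KEYWORDS.items()}


def complaint_aspects(rows, negative_threshold=-0.05):
    """Count negative reviews mentioning each aspect, per category."""
    counts = defaultdict(Counter)
    for row in rows:
        if row["compound"] > negative_threshold:
            continue  # keep only negative reviews (compound <= -0.05)
        text = f"{row.get('title', '')} {row.get('comment', '')}"
        for aspect, pattern in PATTERNS.items():
            if pattern.search(text):
                counts[row["category"]][aspect] += 1  # once per review per aspect
    return counts
```

The suffix list is a compromise: a bare prefix match on "app" would also hit "apple", while exact matching would miss "disconnecting".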

Sentiment Trend Monitoring

Monthly average VADER compound per category shows whether customer satisfaction is improving or declining over time. The sentiment trend is a leading indicator: negative text is written immediately after a bad experience, while star ratings accumulate slowly. A drop in average compound score for a category in a specific month - before the overall star rating drops - gives 2-4 weeks of lead time to investigate the cause.

TechHeaven example: If the Smart Home category's average compound drops from 0.55 in February to 0.35 in March, that is a 0.2-point decline. Before checking the overall star rating (which will lag by 4-6 weeks as low ratings accumulate), the operations team can look at the March reviews: are they all mentioning the same firmware version? A specific product batch? The answer determines whether it is a product issue or a supplier issue.
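A minimal sketch of the monthly roll-up and drop check, assuming review_date is an ISO-8601 string such as "2026-03-14" (the format is not stated above). The 0.2 threshold mirrors the example. Months with no reviews are simply absent, so a comparison can span a gap.

```python
from collections import defaultdict
from statistics import mean


def monthly_sentiment(rows):
    """Average compound per (category, "YYYY-MM")."""
    buckets = defaultdict(list)
    for row in rows:
        month = row["review_date"][:7]  # assumes ISO dates like "2026-03-14"
        buckets[(row["category"], month)].append(row["compound"])
    return {key: mean(scores) for key, scores in buckets.items()}


def flag_drops(monthly, threshold=0.2):
    """Return (category, previous_month, month, drop) where the average fell by >= threshold."""
    series = defaultdict(list)
    for (category, month), avg in monthly.items():
        series[category].append((month, avg))
    alerts = []
    for category, points in series.items():
        points.sort()  # ISO month strings sort chronologically
        for (prev_month, prev_avg), (month, avg) in zip(points, points[1:]):
            drop = round(prev_avg - avg, 6)  # rounding avoids float noise at the threshold
            if drop >= threshold:
                alerts.append((category, prev_month, month, drop))
    return alerts
```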

When to use VADER vs a Transformer Model

VADER is appropriate when: you need results in seconds with no setup, your text is short consumer reviews or social media, and you want full interpretability (you can see exactly why a review was scored positive or negative by looking at the lexicon). Use a fine-tuned transformer (DistilBERT, RoBERTa, or a sentiment-specific model from Hugging Face) when: your domain has specialist vocabulary that VADER does not handle well, you need aspect-opinion pair extraction (not just keyword counting), or you need to handle sarcasm and nuanced negative sentiment more accurately.

TechHeaven example: VADER correctly scores "This product is terrible" as negative (-0.76) and "This is genuinely excellent" as positive (+0.73). It struggles with "I wanted to like this product" (reads as positive because of "like") and domain-specific terms ("the latency spikes" - VADER sees "spikes" as neutral). A transformer fine-tuned on electronics reviews would handle both cases.
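A minimal sketch of running a Hugging Face model alongside VADER, assuming the transformers package is installed. The model named here, distilbert-base-uncased-finetuned-sst-2-english, is a general-purpose English sentiment model trained on movie-review sentences, not one fine-tuned on electronics reviews, so treat it as a starting point. Its POSITIVE/NEGATIVE label and confidence are folded into a signed score so it can be compared with VADER's compound; reviews where the two disagree in direction are candidates for a closer look.

```python
def transformer_scores(texts, classifier=None):
    """Signed scores in [-1, 1]: +confidence for POSITIVE, -confidence for NEGATIVE."""
    if classifier is None:
        # Lazy import; downloads the model on first use.
        from transformers import pipeline
        classifier = pipeline("sentiment-analysis",
                              model="distilbert-base-uncased-finetuned-sst-2-english")
    results = classifier(list(texts))
    return [r["score"] if r["label"] == "POSITIVE" else -r["score"] for r in results]


def sign_disagreements(texts, vader_compounds, model_scores, neutral=0.05):
    """Texts where VADER and the transformer point in opposite directions."""
    flagged = []
    for text, v, m in zip(texts, vader_compounds, model_scores):
        if (v >= neutral and m < 0) or (v <= -neutral and m > 0):
            flagged.append(text)
    return flagged
```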

Architecture

Production sentiment monitoring pipeline

The notebook runs sentiment analysis in batch. In production, the pipeline runs weekly on new reviews and pushes to a dashboard or alert system.

Weekly batch job

Runs Monday morning, covers reviews from the prior week

1. Fetch new reviews since the last run from the ecommerce database
2. Join to the product catalog to get the category for each review
3. Score each review with VADER (< 1 second for 1,000 reviews)
4. Aggregate compound scores by product and category
5. Compare to prior-week averages and flag significant drops
6. Push updated scores to the dashboard and write the weekly aspect report
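A minimal sketch of the date window behind steps 1 and 4 for a Monday run, assuming "the prior week" means the previous Monday-to-Sunday calendar week and that review_date is an ISO date string. The database query is replaced by an in-memory filter, since the schema and connection details are not given.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def prior_week(run_date):
    """Monday..Sunday of the calendar week before run_date."""
    this_monday = run_date - timedelta(days=run_date.weekday())
    return this_monday - timedelta(days=7), this_monday - timedelta(days=1)


def weekly_product_averages(rows, run_date):
    """Mean compound per product for reviews dated in the prior week."""
    start, end = prior_week(run_date)
    scores = defaultdict(list)
    for row in rows:
        if start <= date.fromisoformat(row["review_date"][:10]) <= end:
            scores[row["product_id"]].append(row["compound"])
    return {pid: mean(s) for pid, s in scores.items()}
```

Computing the window from the run date, rather than "last 7 days", keeps the job idempotent: a rerun later on Monday covers exactly the same reviews.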

Alert routing

Routes complaints to the right team automatically

1. "Connectivity" aspect spike → engineering team
2. "App / software" aspect spike → mobile team
3. "Value" aspect spike → pricing and marketing team
4. "Build quality" aspect spike → sourcing and supplier team
5. Product compound drops >0.2 week-over-week → product manager
6. Worst 10 reviews by compound → customer success for manual follow-up
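A minimal sketch of the routing table, taking weekly aspect counts, weekly product averages and this week's scored reviews as inputs. The text does not define a "spike", so the rule here (at least 5 negative mentions and at least 1.5x the prior week) and those parameter defaults are assumptions; the 0.2 product drop and the worst-10 list follow the steps above.

```python
ASPECT_ROUTES = {
    "connectivity": "engineering team",
    "app/software": "mobile team",
    "value": "pricing and marketing team",
    "build quality": "sourcing and supplier team",
}


def route_alerts(aspects_now, aspects_prev, products_now, products_prev, reviews_now,
                 spike_ratio=1.5, min_mentions=5, drop_threshold=0.2, worst_n=10):
    """Return (team, message) pairs for this week's alerts."""
    alerts = []
    # Aspect spikes: assumed rule; tune spike_ratio and min_mentions to your volumes.
    for aspect, team in ASPECT_ROUTES.items():
        now, prev = aspects_now.get(aspect, 0), aspects_prev.get(aspect, 0)
        if now >= min_mentions and now >= spike_ratio * max(prev, 1):
            alerts.append((team, f"{aspect} spike: {prev} -> {now} negative mentions"))
    # Product compound drop of more than drop_threshold week-over-week.
    for product_id, avg in products_now.items():
        prev = products_prev.get(product_id)
        if prev is not None and prev - avg > drop_threshold:
            alerts.append(("product manager", f"{product_id}: {prev:.2f} -> {avg:.2f}"))
    # Most negative reviews go to customer success for manual follow-up.
    for review in sorted(reviews_now, key=lambda r: r["compound"])[:worst_n]:
        alerts.append(("customer success", f"review {review['review_id']} ({review['compound']:.2f})"))
    return alerts
```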

Upgrade path

Start with VADER

Rule-based, no training, zero cost, full interpretability. Production-ready for most ecommerce review volumes.

Add transformer model

Fine-tune DistilBERT or use a Hugging Face sentiment model for better accuracy on nuanced or domain-specific language.

Add LLM summarisation

Run GPT-4o or Claude on the worst 1% of reviews to generate structured complaint summaries and suggested resolutions for the support team.

Business Applications

Ecommerce / Retail

Product quality monitoring at scale. Weekly VADER scoring on new reviews surfaces the products and categories with declining sentiment before the average star rating reflects it. Route connectivity complaints to engineering, value complaints to pricing, app complaints to the mobile team - automatically.

Manufacturing / D2C Brands

Warranty and defect signal detection. A spike in negative sentiment mentioning "battery" or "build quality" in a specific month often predicts an incoming returns wave. Catching it 2-4 weeks early via sentiment trend monitoring gives operations time to prepare the returns process and contact the supplier.

SaaS / B2B

Support ticket and NPS comment analysis. VADER runs on any short text: Zendesk ticket summaries, G2 or Capterra review comments, NPS verbatims. A weekly sentiment dashboard across your review sources shows product sentiment trends faster than waiting for quarterly NPS surveys.

Insurance / Financial Services

Claims and complaint analysis. Financial regulators in many jurisdictions require firms to monitor customer complaints. VADER provides a first-pass severity triage: flagging highly negative complaint text for priority human review, and surfacing the most common complaint aspects for root cause analysis without reading every document.

Run the notebook

VADER scoring on 5,000 TechHeaven reviews. Correlation validation, category sentiment dashboard, aspect extraction, sentiment trend chart, misalignment analysis, and the 5 most negative reviews by compound score. No API key required - runs in under 30 seconds on Colab.



Resources

review_sentiment.ipynb