Harness market psychology and news sentiment to enhance trading strategies with NLP and machine learning
Market sentiment drives price movements beyond what fundamental and technical analysis can explain. Fear, greed, optimism, and pessimism create trading opportunities for those who can quantify and exploit these emotions. This lesson teaches you to build sentiment-driven trading strategies using natural language processing and machine learning.
Behavioral Finance Reality: Traditional finance assumes rational investors, but behavioral finance proves otherwise. Investors systematically overreact to bad news, underreact to complex information, and follow herds during bubbles and crashes. Sentiment analysis quantifies these irrational behaviors, turning human psychology into profitable trading signals.
Information Edge: News moves markets, but not all news is created equal. A negative earnings report might cause a 5% drop, but the language used by management during the earnings call can predict whether the stock continues falling or recovers. Sentiment analysis extracts signal from the noise of financial communication.
Speed Advantage: Professional trading firms use sentiment analysis to react to news within milliseconds. By the time human analysts finish reading a press release, algorithms have already processed the sentiment and placed trades. This speed advantage can mean the difference between profit and loss in fast-moving markets.
Alternative Data Gold Rush: Hedge funds spend millions on alternative data sources - satellite imagery, credit card transactions, social media feeds - because traditional data is commoditized. Sentiment analysis of non-traditional sources provides unique insights that aren't reflected in conventional financial metrics.
Modern sentiment analysis draws from multiple data sources to capture market psychology.
News Articles: Professional journalism provides high-quality, fact-checked information that institutional investors rely on. Breaking news from Reuters or Bloomberg can move markets within seconds because every major trading desk monitors these feeds. The key is speed and credibility.
Social Media: Retail investor sentiment on Twitter and Reddit can drive massive price movements, especially in meme stocks and cryptocurrencies. The challenge is filtering noise from signal - viral misinformation can cause temporary spikes, while sustained positive sentiment can create lasting trends.
Earnings Calls: Management tone and word choice often reveal more than the numbers themselves. Confident CEOs use different language than worried ones. Analyzing transcripts can predict post-earnings stock performance better than the actual earnings figures.
Professional vs. Retail Sentiment: Analyst reports represent institutional sentiment, while social media captures retail sentiment. These often diverge, creating arbitrage opportunities. When professionals are bearish but retail is bullish, or vice versa, significant price movements often follow (a minimal divergence check appears after the source list below).
News Media
Sources: Reuters, Bloomberg, Financial Times, WSJ
Signal: Breaking news sentiment and event analysis

Social Media
Sources: Twitter, Reddit, StockTwits, Discord
Signal: Retail investor sentiment and viral trends

Earnings Calls
Sources: Transcripts, management tone, Q&A sessions
Signal: Management confidence and future outlook

Analyst Reports
Sources: Research reports, upgrades, downgrades
Signal: Professional sentiment and price targets

Market Indicators
Sources: VIX, Put/Call ratios, Insider trading
Signal: Fear/greed indicators and positioning

Search Trends
Sources: Google Trends, Wikipedia views
Signal: Public interest and attention levels
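To make the divergence idea concrete, here is a minimal sketch. The analyst_sentiment and retail_sentiment inputs are hypothetical scores in [-1, 1] that would, in practice, come from the analyst-report and social-media pipelines above.

# Hypothetical sketch: flag professional-vs-retail sentiment divergence
def sentiment_divergence(analyst_sentiment, retail_sentiment, threshold=0.5):
    """Return a divergence label when the two camps disagree strongly."""
    gap = analyst_sentiment - retail_sentiment
    if abs(gap) < threshold:
        return None  # Both camps roughly agree; no divergence signal
    return "pros bullish / retail bearish" if gap > 0 else "pros bearish / retail bullish"

# Example: analysts bearish on a meme stock while retail is euphoric
print(sentiment_divergence(analyst_sentiment=-0.4, retail_sentiment=0.7))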
Let's build a comprehensive NLP system for processing and analyzing financial text data.
VADER vs. TextBlob: VADER (Valence Aware Dictionary and sEntiment Reasoner) is specifically tuned for social media text and handles negations, intensifiers, and punctuation better than general-purpose tools. TextBlob provides polarity and subjectivity scores, useful for distinguishing factual reporting from opinion pieces.
Financial Language Challenges: Financial text has unique characteristics - "beat earnings by 2 cents" is positive, but "missed by 2 cents" is negative. Standard NLP models struggle with financial jargon, numbers, and context. We need domain-specific preprocessing and potentially fine-tuned models.
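As a quick illustration of the problem, the sketch below (assuming NLTK is installed; the framework code later in this lesson downloads the VADER lexicon) scores two headlines that any trader reads very differently. Because a general-purpose lexicon has no concept of beating or missing earnings, the compound scores may land surprisingly close together.

# Sketch: general-purpose VADER lacks financial context
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
for phrase in ["Acme beat earnings by 2 cents",
               "Acme missed earnings by 2 cents"]:
    # Compare the compound scores; VADER has no earnings-specific knowledge
    print(phrase, "->", sia.polarity_scores(phrase)["compound"])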
Preprocessing Importance: Removing financial stop words ("stock", "price", "market") helps focus on sentiment-bearing words. However, some financial terms carry sentiment - "bankruptcy" is clearly negative, "acquisition" might be positive or negative depending on context.
Real-Time Processing: Professional systems process thousands of documents per second. Efficiency matters as much as accuracy - a slightly less accurate model that processes news 10x faster can be more profitable by capturing time-sensitive opportunities.
# Comprehensive sentiment analysis framework
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
from plotly.subplots import make_subplots
# NLP Libraries
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
import re
from textblob import TextBlob
# Machine Learning
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler
# Deep Learning (optional)
# import tensorflow as tf
# from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification
# Web scraping (for demo purposes)
import requests
from bs4 import BeautifulSoup
import time
import warnings
warnings.filterwarnings('ignore')
# Download required NLTK data
try:
nltk.data.find('tokenizers/punkt')
nltk.data.find('corpora/stopwords')
nltk.data.find('corpora/wordnet')
nltk.data.find('vader_lexicon')
except LookupError:
print("Downloading required NLTK data...")
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('vader_lexicon')
print("Sentiment Analysis Framework Initialized!")
print("Ready to analyze market psychology through text data")
class SentimentAnalyzer:
"""
Comprehensive sentiment analysis system for financial text
"""
def __init__(self):
self.vader_analyzer = SentimentIntensityAnalyzer()
self.lemmatizer = WordNetLemmatizer()
self.stop_words = set(stopwords.words('english'))
# Financial-specific stop words
self.financial_stop_words = {
'stock', 'price', 'market', 'trading', 'shares', 'company',
'financial', 'investment', 'investors', 'analyst', 'analysts'
}
# Sentiment models
self.tfidf_vectorizer = None
self.ml_model = None
self.scaler = StandardScaler()
def preprocess_text(self, text):
"""Clean and preprocess text for analysis"""
if not isinstance(text, str):
return ""
# Convert to lowercase
text = text.lower()
# Remove special characters and digits
text = re.sub(r'[^a-zA-Z\s]', '', text)
# Tokenize
tokens = word_tokenize(text)
# Remove stop words and lemmatize
tokens = [
self.lemmatizer.lemmatize(token)
for token in tokens
if token not in self.stop_words and token not in self.financial_stop_words
and len(token) > 2
]
return ' '.join(tokens)
def analyze_sentiment_vader(self, text):
"""Analyze sentiment using VADER (Valence Aware Dictionary and sEntiment Reasoner)"""
scores = self.vader_analyzer.polarity_scores(text)
return {
'compound': scores['compound'], # Overall sentiment (-1 to 1)
'positive': scores['pos'],
'neutral': scores['neu'],
'negative': scores['neg'],
'sentiment_label': self.classify_sentiment(scores['compound'])
}
def analyze_sentiment_textblob(self, text):
"""Analyze sentiment using TextBlob"""
blob = TextBlob(text)
return {
'polarity': blob.sentiment.polarity, # -1 to 1
'subjectivity': blob.sentiment.subjectivity, # 0 to 1
'sentiment_label': self.classify_sentiment(blob.sentiment.polarity)
}
def classify_sentiment(self, score, threshold=0.1):
"""Classify sentiment score into labels"""
if score > threshold:
return 'positive'
elif score < -threshold:
return 'negative'
else:
return 'neutral'
def extract_financial_entities(self, text):
"""Extract financial entities and keywords"""
# Simple keyword extraction (in practice, use NER models)
financial_keywords = [
'earnings', 'revenue', 'profit', 'loss', 'growth', 'decline',
'bullish', 'bearish', 'buy', 'sell', 'hold', 'upgrade', 'downgrade',
'merger', 'acquisition', 'partnership', 'launch', 'announce',
'beat', 'miss', 'exceed', 'disappoint', 'strong', 'weak'
]
found_keywords = []
text_lower = text.lower()
for keyword in financial_keywords:
if keyword in text_lower:
found_keywords.append(keyword)
return found_keywords
def calculate_sentiment_score(self, text):
"""Calculate comprehensive sentiment score"""
# Get scores from different methods
vader_scores = self.analyze_sentiment_vader(text)
textblob_scores = self.analyze_sentiment_textblob(text)
# Extract keywords
keywords = self.extract_financial_entities(text)
# Combine scores (weighted average)
combined_score = (
vader_scores['compound'] * 0.6 +
textblob_scores['polarity'] * 0.4
)
return {
'combined_score': combined_score,
'vader_score': vader_scores['compound'],
'textblob_score': textblob_scores['polarity'],
'sentiment_label': self.classify_sentiment(combined_score),
'keywords': keywords,
'keyword_count': len(keywords)
}
# Initialize sentiment analyzer
sentiment_analyzer = SentimentAnalyzer()
# Sample financial news headlines for demonstration
sample_news = [
"Apple reports record quarterly earnings, beats analyst expectations",
"Tesla stock plunges after disappointing delivery numbers",
"Amazon announces major expansion into healthcare sector",
"Google faces regulatory scrutiny over antitrust concerns",
"Microsoft shows strong growth in cloud computing division",
"Netflix loses subscribers for first time, shares tumble",
"Meta announces layoffs amid declining revenue",
"Nvidia benefits from AI boom, stock reaches new highs",
"JPMorgan warns of potential recession risks ahead",
"Goldman Sachs upgrades tech sector outlook"
]
print(f"\n=== Sample Sentiment Analysis ===")
print(f"Analyzing {len(sample_news)} financial headlines...")
# Analyze sample news
news_analysis = []
for i, headline in enumerate(sample_news):
analysis = sentiment_analyzer.calculate_sentiment_score(headline)
news_analysis.append({
'headline': headline,
'sentiment_score': analysis['combined_score'],
'sentiment_label': analysis['sentiment_label'],
'keywords': analysis['keywords']
})
print(f"\n{i+1}. {headline}")
print(f" Sentiment: {analysis['sentiment_label'].title()} ({analysis['combined_score']:.3f})")
print(f" Keywords: {', '.join(analysis['keywords']) if analysis['keywords'] else 'None'}")
# Convert to DataFrame for analysis
news_df = pd.DataFrame(news_analysis)
print(f"\nSentiment Distribution:")
print(news_df['sentiment_label'].value_counts())
Understanding Combined Scores: We weight VADER at 60% and TextBlob at 40% because VADER handles financial text better - it understands intensifiers like "absolutely crushing earnings" and negations like "not disappointing." TextBlob provides a useful sanity check and catches some nuances VADER misses.
Keyword Extraction Significance: Our keyword detection isn't just pattern matching - it's signal extraction. When we see "beat," "exceed," and "strong" together, it's a much stronger positive signal than any single word. Professional systems use Named Entity Recognition to identify companies, people, and financial metrics automatically.
Threshold Selection: Our 0.1 threshold for positive/negative classification seems small, but it's intentional. Financial markets are sensitive - even mildly positive news can move stocks. Professional traders often use multiple thresholds: 0.1 for weak signals, 0.3 for moderate, 0.5+ for strong signals.
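A tiered version of the classifier is a small extension; this sketch uses the threshold values from the paragraph above (0.1 weak, 0.3 moderate, 0.5+ strong).

# Sketch: tiered signal classification using the thresholds discussed above
def classify_signal_strength(score):
    """Map a compound sentiment score to (direction, tier)."""
    magnitude = abs(score)
    if magnitude >= 0.5:
        tier = 'strong'
    elif magnitude >= 0.3:
        tier = 'moderate'
    elif magnitude >= 0.1:
        tier = 'weak'
    else:
        return ('neutral', None)
    return ('positive' if score > 0 else 'negative', tier)

print(classify_signal_strength(0.42))   # ('positive', 'moderate')
print(classify_signal_strength(-0.07))  # ('neutral', None)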
Preprocessing Decisions: We remove financial stop words like "stock" and "price" because they appear in every financial article but carry no sentiment. However, we keep domain-specific terms like "earnings" and "revenue" because their context (beat earnings vs missed earnings) determines sentiment.
Real-Time Applications: In live trading, this analysis happens in milliseconds. News feeds are processed instantly, sentiment scores are calculated, and trading signals are generated before human traders even see the headlines. Speed is everything in sentiment-driven trading.
Let's create a specialized sentiment model trained on financial text data.
Financial text requires specialized models because general-purpose sentiment tools misread domain language: "beat earnings by 2 cents" is positive while "missed by 2 cents" is negative, numbers carry context that standard models ignore, and terms like "acquisition" can be positive or negative depending on the deal.
# Financial sentiment modeling
class FinancialSentimentModel:
"""
Specialized sentiment model for financial text
"""
def __init__(self):
self.vectorizer = TfidfVectorizer(
max_features=5000,
ngram_range=(1, 2), # Include bigrams
min_df=2,
max_df=0.8
)
self.model = None
self.is_trained = False
# Financial sentiment lexicon
self.positive_words = {
'beat', 'exceed', 'strong', 'growth', 'profit', 'gain', 'up', 'rise',
'bullish', 'buy', 'upgrade', 'outperform', 'positive', 'excellent',
'robust', 'solid', 'impressive', 'stellar', 'soar', 'surge'
}
self.negative_words = {
'miss', 'disappoint', 'weak', 'decline', 'loss', 'drop', 'down', 'fall',
'bearish', 'sell', 'downgrade', 'underperform', 'negative', 'poor',
'struggle', 'plunge', 'tumble', 'crash', 'collapse', 'dive'
}
def create_training_data(self, n_samples=1000):
"""
Create synthetic training data for demonstration
In practice, you'd use labeled financial news datasets
"""
print(f"Creating synthetic training dataset with {n_samples} samples...")
# Positive news templates
positive_templates = [
"{company} reports strong quarterly earnings",
"{company} beats analyst expectations",
"{company} announces robust growth in {sector}",
"{company} stock surges after positive news",
"{company} shows excellent performance",
"Analysts upgrade {company} rating",
"{company} launches innovative new product",
"{company} exceeds revenue forecasts"
]
# Negative news templates
negative_templates = [
"{company} disappoints with weak earnings",
"{company} misses analyst expectations",
"{company} announces declining revenue",
"{company} stock plunges on bad news",
"{company} shows poor performance",
"Analysts downgrade {company} rating",
"{company} faces regulatory challenges",
"{company} falls short of forecasts"
]
# Neutral news templates
neutral_templates = [
"{company} releases quarterly report",
"{company} announces management changes",
"{company} schedules earnings call",
"{company} stock remains stable",
"{company} provides business update",
"{company} holds investor meeting",
"{company} publishes annual report",
"{company} maintains current guidance"
]
companies = ['Apple', 'Google', 'Microsoft', 'Amazon', 'Tesla', 'Netflix', 'Meta']
sectors = ['cloud services', 'artificial intelligence', 'mobile devices', 'streaming']
training_data = []
labels = []
# Generate positive samples
for _ in range(n_samples // 3):
template = np.random.choice(positive_templates)
company = np.random.choice(companies)
sector = np.random.choice(sectors)
text = template.format(company=company, sector=sector)
training_data.append(text)
labels.append(1) # Positive
# Generate negative samples
for _ in range(n_samples // 3):
template = np.random.choice(negative_templates)
company = np.random.choice(companies)
sector = np.random.choice(sectors)
text = template.format(company=company, sector=sector)
training_data.append(text)
labels.append(-1) # Negative
# Generate neutral samples
for _ in range(n_samples - 2 * (n_samples // 3)):
template = np.random.choice(neutral_templates)
company = np.random.choice(companies)
sector = np.random.choice(sectors)
text = template.format(company=company, sector=sector)
training_data.append(text)
labels.append(0) # Neutral
return training_data, labels
def train_model(self, texts=None, labels=None):
"""Train the financial sentiment model"""
if texts is None or labels is None:
# Use synthetic data
texts, labels = self.create_training_data(1000)
print("Training financial sentiment model...")
# Vectorize texts
X = self.vectorizer.fit_transform(texts)
y = np.array(labels)
# Split data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y
)
# Train model
self.model = RandomForestClassifier(n_estimators=100, random_state=42)
self.model.fit(X_train, y_train)
# Evaluate
train_score = self.model.score(X_train, y_train)
test_score = self.model.score(X_test, y_test)
print(f"✅ Model trained successfully!")
print(f" Training accuracy: {train_score:.3f}")
print(f" Test accuracy: {test_score:.3f}")
# Make predictions on test set for detailed analysis
y_pred = self.model.predict(X_test)
print(f"\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=['Negative', 'Neutral', 'Positive']))
self.is_trained = True
return self.model
def predict_sentiment(self, text):
"""Predict sentiment for new text"""
if not self.is_trained:
print("Model not trained yet. Training with default data...")
self.train_model()
# Vectorize text
X = self.vectorizer.transform([text])
# Get prediction and probability
prediction = self.model.predict(X)[0]
probabilities = self.model.predict_proba(X)[0]
# Map prediction to label
label_map = {-1: 'negative', 0: 'neutral', 1: 'positive'}
sentiment_label = label_map[prediction]
# Get confidence (probability of predicted class)
confidence = np.max(probabilities)
return {
'sentiment': sentiment_label,
'confidence': confidence,
'score': prediction,
'probabilities': {
                # self.model.classes_ is sorted ([-1, 0, 1]), so index 0 is negative
                'negative': probabilities[0],
'neutral': probabilities[1],
'positive': probabilities[2]
}
}
def get_feature_importance(self, top_n=20):
"""Get most important features for sentiment classification"""
if not self.is_trained:
return None
# Get feature names and importance
feature_names = self.vectorizer.get_feature_names_out()
importances = self.model.feature_importances_
# Create feature importance dataframe
feature_importance = pd.DataFrame({
'feature': feature_names,
'importance': importances
}).sort_values('importance', ascending=False)
return feature_importance.head(top_n)
# Initialize and train financial sentiment model
financial_model = FinancialSentimentModel()
trained_model = financial_model.train_model()
# Test the model on sample headlines
print(f"\n=== Testing Financial Sentiment Model ===")
test_headlines = [
"Apple stock soars after beating earnings expectations",
"Tesla disappoints investors with weak delivery numbers",
"Microsoft announces quarterly dividend",
"Amazon faces antitrust investigation",
"Google shows strong growth in cloud business"
]
for headline in test_headlines:
result = financial_model.predict_sentiment(headline)
print(f"\nHeadline: {headline}")
print(f"Sentiment: {result['sentiment'].title()} (confidence: {result['confidence']:.3f})")
print(f"Probabilities: Pos:{result['probabilities']['positive']:.2f} "
f"Neu:{result['probabilities']['neutral']:.2f} "
f"Neg:{result['probabilities']['negative']:.2f}")
# Show most important features
print(f"\n=== Most Important Features for Sentiment Classification ===")
feature_importance = financial_model.get_feature_importance(15)
print(feature_importance)
Why Synthetic Training Data Works: Our template-based approach creates thousands of realistic financial headlines quickly. While real labeled data is better, synthetic data helps us understand model behavior and provides a baseline. Professional firms often start with synthetic data before investing in expensive human labeling.
TF-IDF Feature Engineering: We use TF-IDF (Term Frequency-Inverse Document Frequency) because it captures word importance better than simple word counts. Words like "beats" and "disappoints" get high TF-IDF scores because they're frequent in financial news but rare in general text. The bigram feature (ngram_range=(1,2)) captures phrases like "beats expectations."
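To see the bigram effect in isolation, here is a tiny standalone demo on a toy two-headline corpus (separate from the trained model above):

# Sketch: ngram_range=(1, 2) produces phrase features, not just words
from sklearn.feature_extraction.text import TfidfVectorizer

toy_corpus = [
    "Acme beats analyst expectations",
    "Acme misses analyst expectations",
]
vec = TfidfVectorizer(ngram_range=(1, 2))
vec.fit(toy_corpus)
# Output includes bigrams like 'beats analyst' and 'analyst expectations'
print(sorted(vec.get_feature_names_out()))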
Random Forest Choice: Random Forest works well for text classification because it handles sparse features (most words don't appear in most documents) and provides feature importance rankings. It's also interpretable - we can see which words/phrases drive sentiment predictions, crucial for regulatory compliance in finance.
Model Performance Interpretation: An 85%+ test accuracy on financial sentiment is excellent because financial text can be ambiguous. "Apple announces product recall" could be negative (safety issues) or neutral (routine maintenance). Professional models often achieve 75-85% accuracy on real financial news.
Feature Importance Insights: The most important features will likely be words like "beats," "exceeds," "disappoints," and "plunges." These are pure sentiment words in financial contexts. Bigrams like "analyst expectations" and "quarterly earnings" provide context that single words miss.
Confidence Scores Matter: Our confidence scores help traders filter signals. High-confidence positive sentiment (>0.8) might trigger immediate buying, while low-confidence signals (0.4-0.6) might just be monitored. Professional systems often require confidence >0.7 for automated trading.
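A minimal filter along those lines might look like the sketch below; the 0.7 and 0.4 cutoffs are the illustrative values from this discussion, and predict_sentiment is the method defined earlier.

# Sketch: route headlines into trade / watch / ignore buckets by confidence
def filter_by_confidence(model, headlines, auto_trade_cutoff=0.7, watch_cutoff=0.4):
    actionable, watchlist = [], []
    for headline in headlines:
        result = model.predict_sentiment(headline)
        if result['confidence'] >= auto_trade_cutoff:
            actionable.append((headline, result['sentiment']))
        elif result['confidence'] >= watch_cutoff:
            watchlist.append((headline, result['sentiment']))
        # Anything below watch_cutoff is treated as noise and dropped
    return actionable, watchlist

actionable, watchlist = filter_by_confidence(financial_model, test_headlines)
print(f"{len(actionable)} actionable signals, {len(watchlist)} to monitor")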
Let's build a system to collect and process real-time sentiment data from various sources.
Note: This lesson demonstrates data collection concepts with simulated feeds. In production, you'd need licensed news API access, authentication and rate limiting, error handling for feed outages, and duplicate detection before trading on the output.
# Real-time sentiment data collection system
import yfinance as yf
from datetime import datetime, timedelta
import json
class SentimentDataCollector:
"""
Real-time sentiment data collector and processor
"""
def __init__(self, sentiment_model):
self.sentiment_model = sentiment_model
self.data_cache = {}
self.sentiment_history = {}
def simulate_news_feed(self, symbol, n_articles=5):
"""
Simulate news feed for a given symbol
In production, this would connect to real news APIs
"""
# Get recent stock performance for context
stock = yf.Ticker(symbol)
recent_data = stock.history(period="5d")
        recent_change = (recent_data['Close'].iloc[-1] / recent_data['Close'].iloc[0] - 1) * 100
# Generate realistic news based on recent performance
if recent_change > 2:
news_type = 'positive'
elif recent_change < -2:
news_type = 'negative'
else:
news_type = 'neutral'
# Simulated news templates based on performance
news_templates = {
'positive': [
f"{symbol} stock rallies on strong quarterly results",
f"Analysts upgrade {symbol} following impressive growth",
f"{symbol} announces breakthrough in key market segment",
f"Institutional investors increase {symbol} positions",
f"{symbol} beats expectations, raises guidance"
],
'negative': [
f"{symbol} shares decline on disappointing earnings",
f"Concerns grow over {symbol}'s competitive position",
f"{symbol} faces headwinds in key market",
f"Analysts express caution on {symbol} outlook",
f"{symbol} warns of potential challenges ahead"
],
'neutral': [
f"{symbol} schedules quarterly earnings call",
f"{symbol} announces management changes",
f"{symbol} provides business update to investors",
f"{symbol} maintains current market guidance",
f"{symbol} releases routine regulatory filing"
]
}
# Select articles based on sentiment bias
articles = []
templates = news_templates[news_type]
for i in range(n_articles):
# Add some randomness
if np.random.random() < 0.7:
article_type = news_type
else:
article_type = np.random.choice(['positive', 'negative', 'neutral'])
template = np.random.choice(news_templates[article_type])
articles.append({
'title': template,
'timestamp': datetime.now() - timedelta(hours=np.random.randint(0, 24)),
'source': np.random.choice(['Reuters', 'Bloomberg', 'WSJ', 'CNBC', 'MarketWatch'])
})
return articles
def process_news_sentiment(self, symbol, articles):
"""Process news articles and calculate sentiment scores"""
print(f"Processing {len(articles)} news articles for {symbol}...")
sentiments = []
for article in articles:
# Analyze sentiment
sentiment_result = self.sentiment_model.predict_sentiment(article['title'])
sentiments.append({
'title': article['title'],
'timestamp': article['timestamp'],
'source': article['source'],
'sentiment': sentiment_result['sentiment'],
'confidence': sentiment_result['confidence'],
'score': sentiment_result['score']
})
# Calculate aggregate sentiment
scores = [s['score'] for s in sentiments]
confidences = [s['confidence'] for s in sentiments]
# Weighted average by confidence
weighted_sentiment = np.average(scores, weights=confidences) if scores else 0
avg_confidence = np.mean(confidences) if confidences else 0
return {
'symbol': symbol,
'timestamp': datetime.now(),
'articles': sentiments,
'aggregate_sentiment': weighted_sentiment,
'average_confidence': avg_confidence,
'article_count': len(articles),
'positive_count': sum(1 for s in sentiments if s['sentiment'] == 'positive'),
'negative_count': sum(1 for s in sentiments if s['sentiment'] == 'negative'),
'neutral_count': sum(1 for s in sentiments if s['sentiment'] == 'neutral')
}
def calculate_sentiment_momentum(self, symbol, lookback_hours=24):
"""Calculate sentiment momentum over time"""
if symbol not in self.sentiment_history:
return None
cutoff_time = datetime.now() - timedelta(hours=lookback_hours)
recent_sentiments = [
entry for entry in self.sentiment_history[symbol]
if entry['timestamp'] > cutoff_time
]
if len(recent_sentiments) < 2:
return None
# Calculate momentum (change in sentiment over time)
sentiments = [entry['aggregate_sentiment'] for entry in recent_sentiments]
timestamps = [entry['timestamp'] for entry in recent_sentiments]
# Simple momentum: recent sentiment - older sentiment
momentum = sentiments[-1] - sentiments[0]
return {
'momentum': momentum,
'current_sentiment': sentiments[-1],
'sentiment_trend': 'improving' if momentum > 0.1 else 'declining' if momentum < -0.1 else 'stable',
'data_points': len(recent_sentiments)
}
def update_sentiment_data(self, symbol):
"""Update sentiment data for a symbol"""
# Collect news
articles = self.simulate_news_feed(symbol, n_articles=5)
# Process sentiment
sentiment_data = self.process_news_sentiment(symbol, articles)
# Store in history
if symbol not in self.sentiment_history:
self.sentiment_history[symbol] = []
self.sentiment_history[symbol].append(sentiment_data)
# Keep only recent data (last 7 days)
cutoff_time = datetime.now() - timedelta(days=7)
self.sentiment_history[symbol] = [
entry for entry in self.sentiment_history[symbol]
if entry['timestamp'] > cutoff_time
]
return sentiment_data
# Initialize sentiment data collector
collector = SentimentDataCollector(financial_model)
# Collect sentiment data for multiple symbols
symbols = ['AAPL', 'TSLA', 'GOOGL', 'MSFT', 'AMZN']
print(f"\n=== Real-Time Sentiment Analysis ===")
sentiment_results = {}
for symbol in symbols:
sentiment_data = collector.update_sentiment_data(symbol)
sentiment_results[symbol] = sentiment_data
print(f"\n{symbol} Sentiment Analysis:")
print(f" Aggregate Sentiment: {sentiment_data['aggregate_sentiment']:.3f}")
print(f" Confidence: {sentiment_data['average_confidence']:.3f}")
print(f" Articles: {sentiment_data['positive_count']} pos, "
f"{sentiment_data['negative_count']} neg, {sentiment_data['neutral_count']} neu")
# Show sample articles
print(f" Sample headlines:")
for article in sentiment_data['articles'][:3]:
print(f" • {article['title']} ({article['sentiment']})")
# Visualize sentiment comparison
def plot_sentiment_comparison(sentiment_results):
"""Plot sentiment comparison across symbols"""
symbols = list(sentiment_results.keys())
sentiments = [sentiment_results[symbol]['aggregate_sentiment'] for symbol in symbols]
confidences = [sentiment_results[symbol]['average_confidence'] for symbol in symbols]
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
# Sentiment scores
colors = ['green' if s > 0.1 else 'red' if s < -0.1 else 'gray' for s in sentiments]
bars1 = ax1.bar(symbols, sentiments, color=colors, alpha=0.7)
ax1.set_title('Current Sentiment Scores by Symbol')
ax1.set_ylabel('Sentiment Score')
ax1.axhline(y=0, color='black', linestyle='-', alpha=0.3)
ax1.axhline(y=0.1, color='green', linestyle='--', alpha=0.5, label='Positive threshold')
ax1.axhline(y=-0.1, color='red', linestyle='--', alpha=0.5, label='Negative threshold')
ax1.legend()
ax1.grid(True, alpha=0.3)
# Add value labels on bars
for bar, sentiment in zip(bars1, sentiments):
height = bar.get_height()
ax1.annotate(f'{sentiment:.3f}',
xy=(bar.get_x() + bar.get_width() / 2, height),
xytext=(0, 3 if height > 0 else -15),
textcoords="offset points",
ha='center', va='bottom' if height > 0 else 'top')
# Confidence levels
bars2 = ax2.bar(symbols, confidences, color='blue', alpha=0.7)
ax2.set_title('Sentiment Analysis Confidence')
ax2.set_ylabel('Average Confidence')
ax2.set_ylim(0, 1)
ax2.grid(True, alpha=0.3)
# Add value labels
for bar, confidence in zip(bars2, confidences):
height = bar.get_height()
ax2.annotate(f'{confidence:.3f}',
xy=(bar.get_x() + bar.get_width() / 2, height),
xytext=(0, 3),
textcoords="offset points",
ha='center', va='bottom')
plt.tight_layout()
plt.show()
# Create sentiment comparison plot
print("\nCreating sentiment analysis visualization...")
plot_sentiment_comparison(sentiment_results)
Simulated News Feed Logic: Our news simulation is smarter than random - it uses recent stock performance to bias news sentiment. If a stock is up 5%, we're more likely to generate positive news. This mimics reality where news often follows price movements, creating feedback loops that sentiment traders exploit.
Confidence-Weighted Aggregation: Instead of simple averaging, we weight sentiment scores by confidence. A high-confidence negative article (-0.8 with 0.9 confidence) has more impact than a low-confidence positive article (0.3 with 0.5 confidence). This approach filters noise and focuses on clear signals.
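Working through the numbers in that example makes the effect visible:

# Worked example: confidence-weighted aggregation of two articles
import numpy as np

scores = [-0.8, 0.3]        # high-confidence negative, low-confidence positive
confidences = [0.9, 0.5]
# (-0.8 * 0.9 + 0.3 * 0.5) / (0.9 + 0.5) ≈ -0.407: the negative article dominates
print(np.average(scores, weights=confidences))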
Sentiment Momentum Calculation: Our momentum metric measures how sentiment is changing over time, not just current sentiment level. A stock moving from neutral to positive sentiment might be a better buy signal than one that's been positive for weeks. Professional traders watch sentiment velocity, not just sentiment level.
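The last-minus-first momentum in the collector is deliberately simple; a sketch that estimates sentiment velocity from every observation, by fitting a least-squares slope over time, could look like this:

# Sketch: sentiment velocity as a least-squares slope (units: sentiment per hour)
import numpy as np

def sentiment_velocity(timestamps, sentiments):
    """Fit sentiment = a * hours + b over all observations and return slope a."""
    hours = np.array([(t - timestamps[0]).total_seconds() / 3600.0
                      for t in timestamps])
    slope, _ = np.polyfit(hours, np.array(sentiments), deg=1)
    return slope  # positive slope = improving sentiment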
Data Retention Strategy: We only keep 7 days of sentiment history because older sentiment becomes less relevant for trading decisions. This also manages memory usage in production systems that process thousands of articles daily. Professional systems often use exponential decay to weight recent sentiment more heavily.
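A sketch of the exponential-decay alternative, where half_life_hours is an assumed tuning parameter and entries match the structure stored in sentiment_history:

# Sketch: exponentially decayed sentiment aggregation
import numpy as np
from datetime import datetime

def decayed_sentiment(entries, half_life_hours=12.0):
    """Weight each entry by 0.5 ** (age / half_life), so a 12-hour-old
    score counts half as much as a fresh one."""
    now = datetime.now()
    ages = np.array([(now - e['timestamp']).total_seconds() / 3600.0
                     for e in entries])
    scores = np.array([e['aggregate_sentiment'] for e in entries])
    weights = 0.5 ** (ages / half_life_hours)
    return float(np.average(scores, weights=weights))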
Visualization Insights: Our sentiment comparison chart helps identify relative opportunities. If most stocks show neutral sentiment but one shows strong positive sentiment with high confidence, that's a potential trade signal. Professional dashboards update these charts in real-time for hundreds of stocks simultaneously.
Production Considerations: Real systems would need error handling for API failures, rate limiting to avoid being blocked, duplicate detection to avoid processing the same article twice, and sophisticated entity recognition to link news to specific stocks (especially important for merger announcements affecting multiple companies).
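As one concrete piece of that plumbing, here is a minimal duplicate-detection sketch that hashes normalized headlines; production systems typically use fuzzier matching such as MinHash or embedding similarity.

# Sketch: drop articles whose normalized headline was already processed
import hashlib
import re

class DuplicateFilter:
    def __init__(self):
        self.seen = set()

    def is_new(self, headline):
        # Lowercase and strip punctuation so trivial rewording still matches
        normalized = ' '.join(re.sub(r'[^a-z0-9 ]', '', headline.lower()).split())
        digest = hashlib.sha256(normalized.encode()).hexdigest()
        if digest in self.seen:
            return False
        self.seen.add(digest)
        return True

dedup = DuplicateFilter()
print(dedup.is_new("Apple beats earnings!"))  # True
print(dedup.is_new("apple beats earnings"))   # False: duplicate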
Now let's integrate sentiment analysis into actionable trading strategies.
# Sentiment-driven trading strategy
class SentimentTradingStrategy:
"""
Trading strategy that incorporates sentiment analysis
"""
def __init__(self, sentiment_collector, initial_capital=100000):
self.sentiment_collector = sentiment_collector
self.initial_capital = initial_capital
self.current_capital = initial_capital
self.positions = {}
self.trading_history = []
# Strategy parameters
self.sentiment_threshold_buy = 0.3
self.sentiment_threshold_sell = -0.3
self.confidence_threshold = 0.6
self.max_position_size = 0.1 # 10% max per position
def generate_trading_signals(self, symbol, price_data, sentiment_data):
"""Generate trading signals based on sentiment and price"""
current_price = price_data['Close'].iloc[-1]
recent_returns = price_data['Close'].pct_change(5).iloc[-1] # 5-day return
sentiment_score = sentiment_data['aggregate_sentiment']
sentiment_confidence = sentiment_data['average_confidence']
# Technical momentum (simple)
price_momentum = 1 if recent_returns > 0.02 else -1 if recent_returns < -0.02 else 0
# Sentiment signal
if (sentiment_score > self.sentiment_threshold_buy and
sentiment_confidence > self.confidence_threshold):
sentiment_signal = 1 # Buy
elif (sentiment_score < self.sentiment_threshold_sell and
sentiment_confidence > self.confidence_threshold):
sentiment_signal = -1 # Sell
else:
sentiment_signal = 0 # Hold
# Combined signal (sentiment + momentum confirmation)
if sentiment_signal == 1 and price_momentum >= 0:
final_signal = 1
signal_strength = min(abs(sentiment_score), 1.0) * sentiment_confidence
elif sentiment_signal == -1 and price_momentum <= 0:
final_signal = -1
signal_strength = min(abs(sentiment_score), 1.0) * sentiment_confidence
else:
final_signal = 0
signal_strength = 0
return {
'signal': final_signal,
'strength': signal_strength,
'sentiment_score': sentiment_score,
'sentiment_confidence': sentiment_confidence,
'price_momentum': price_momentum,
'current_price': current_price,
'reasoning': self._explain_signal(final_signal, sentiment_score, price_momentum)
}
def _explain_signal(self, signal, sentiment, momentum):
"""Explain the reasoning behind a trading signal"""
if signal == 1:
return f"BUY: Positive sentiment ({sentiment:.3f}) with supportive price action"
elif signal == -1:
return f"SELL: Negative sentiment ({sentiment:.3f}) with weak price action"
else:
return "HOLD: Insufficient sentiment conviction or conflicting signals"
def calculate_position_size(self, signal_strength, current_price, available_capital):
"""Calculate position size based on signal strength and risk management"""
# Base position size as percentage of capital
base_size = self.max_position_size * signal_strength
# Convert to dollar amount
position_value = available_capital * base_size
# Convert to shares
shares = int(position_value / current_price)
return max(shares, 0)
def execute_trade(self, symbol, signal_data):
"""Execute trading decision"""
signal = signal_data['signal']
signal_strength = signal_data['strength']
current_price = signal_data['current_price']
if signal == 0:
return None # No trade
# Calculate position size
available_capital = self.current_capital * 0.95 # Keep 5% cash buffer
shares = self.calculate_position_size(signal_strength, current_price, available_capital)
if shares == 0:
return None
        # Execute trade
        trade_value = shares * current_price
        trade_record = None  # Stays None if no trade is actually executed
if signal == 1: # Buy
if trade_value <= available_capital:
self.current_capital -= trade_value
self.positions[symbol] = self.positions.get(symbol, 0) + shares
trade_record = {
'timestamp': datetime.now(),
'symbol': symbol,
'action': 'BUY',
'shares': shares,
'price': current_price,
'value': trade_value,
'sentiment_score': signal_data['sentiment_score'],
'reasoning': signal_data['reasoning']
}
elif signal == -1: # Sell
current_position = self.positions.get(symbol, 0)
shares_to_sell = min(shares, current_position)
if shares_to_sell > 0:
self.current_capital += shares_to_sell * current_price
self.positions[symbol] = current_position - shares_to_sell
trade_record = {
'timestamp': datetime.now(),
'symbol': symbol,
'action': 'SELL',
'shares': shares_to_sell,
'price': current_price,
'value': shares_to_sell * current_price,
'sentiment_score': signal_data['sentiment_score'],
'reasoning': signal_data['reasoning']
}
        if trade_record is None:
            return None  # Buy lacked capital or there were no shares to sell
        self.trading_history.append(trade_record)
        return trade_record
def run_strategy(self, symbols, lookback_days=30):
"""Run the sentiment trading strategy"""
print(f"Running sentiment trading strategy on {len(symbols)} symbols...")
strategy_results = {}
for symbol in symbols:
print(f"\nAnalyzing {symbol}...")
# Get price data
stock = yf.Ticker(symbol)
            # yfinance periods accept only fixed values like "1mo", so use an explicit start date
            price_data = stock.history(start=datetime.now() - timedelta(days=lookback_days))
# Update sentiment data
sentiment_data = self.sentiment_collector.update_sentiment_data(symbol)
# Generate signals
signal_data = self.generate_trading_signals(symbol, price_data, sentiment_data)
# Execute trade if signal is strong enough
trade_result = self.execute_trade(symbol, signal_data)
strategy_results[symbol] = {
'signal_data': signal_data,
'trade_result': trade_result,
'current_price': signal_data['current_price']
}
print(f" Signal: {signal_data['reasoning']}")
if trade_result:
print(f" Trade: {trade_result['action']} {trade_result['shares']} shares at ${trade_result['price']:.2f}")
else:
print(f" Trade: No action taken")
return strategy_results
def get_portfolio_summary(self):
"""Get current portfolio summary"""
        total_value = self.current_capital
        position_values = {}
        for symbol, shares in self.positions.items():
            if shares > 0:
                stock = yf.Ticker(symbol)
                current_price = stock.history(period="1d")['Close'].iloc[-1]
                position_value = shares * current_price
                total_value += position_value
                position_values[symbol] = {
                    'shares': shares,
                    'price': current_price,
                    'value': position_value
                }
        # Compute weights only after the full portfolio value is known; dividing
        # inside the loop would weight early positions against a partial total
        for position in position_values.values():
            position['weight'] = position['value'] / total_value
return {
'total_value': total_value,
'cash': self.current_capital,
'positions': position_values,
'total_return': (total_value / self.initial_capital) - 1,
'number_of_trades': len(self.trading_history)
}
# Initialize and run sentiment trading strategy
strategy = SentimentTradingStrategy(collector, initial_capital=100000)
print(f"\n" + "="*60)
print("SENTIMENT TRADING STRATEGY EXECUTION")
print("="*60)
# Run strategy on selected symbols
strategy_symbols = ['AAPL', 'TSLA', 'GOOGL', 'MSFT']
strategy_results = strategy.run_strategy(strategy_symbols)
# Get portfolio summary
portfolio_summary = strategy.get_portfolio_summary()
print(f"\n=== Portfolio Summary ===")
print(f"Total Portfolio Value: ${portfolio_summary['total_value']:,.2f}")
print(f"Cash: ${portfolio_summary['cash']:,.2f}")
print(f"Total Return: {portfolio_summary['total_return']:.2%}")
print(f"Number of Trades: {portfolio_summary['number_of_trades']}")
print(f"\nCurrent Positions:")
for symbol, position in portfolio_summary['positions'].items():
print(f" {symbol}: {position['shares']} shares @ ${position['price']:.2f} "
f"(${position['value']:,.2f}, {position['weight']:.1%})")
# Show recent trades
if strategy.trading_history:
print(f"\nRecent Trades:")
for trade in strategy.trading_history[-5:]: # Last 5 trades
print(f" {trade['timestamp'].strftime('%Y-%m-%d %H:%M')} - "
f"{trade['action']} {trade['shares']} {trade['symbol']} @ ${trade['price']:.2f} "
f"(Sentiment: {trade['sentiment_score']:.3f})")
# Sentiment vs Price Movement Analysis
def analyze_sentiment_price_relationship(strategy_results):
"""Analyze relationship between sentiment and price movements"""
print(f"\n=== Sentiment vs Price Analysis ===")
for symbol, results in strategy_results.items():
signal_data = results['signal_data']
# Get recent price movement
stock = yf.Ticker(symbol)
recent_data = stock.history(period="5d")
price_change = (recent_data['Close'].iloc[-1] / recent_data['Close'].iloc[0] - 1) * 100
sentiment_score = signal_data['sentiment_score']
print(f"\n{symbol}:")
print(f" Sentiment Score: {sentiment_score:.3f}")
print(f" 5-Day Price Change: {price_change:+.2f}%")
# Simple correlation analysis
if sentiment_score > 0.1 and price_change > 0:
relationship = "Positive sentiment, positive price movement ✅"
elif sentiment_score < -0.1 and price_change < 0:
relationship = "Negative sentiment, negative price movement ✅"
elif abs(sentiment_score) < 0.1:
relationship = "Neutral sentiment, mixed signals"
else:
relationship = "Sentiment-price divergence ⚠️"
print(f" Relationship: {relationship}")
# Analyze sentiment-price relationships
analyze_sentiment_price_relationship(strategy_results)
Multi-Factor Signal Generation: Our strategy doesn't rely on sentiment alone - it requires both positive sentiment AND positive price momentum for a buy signal. This reduces false positives because sentiment can be wrong or priced in. Professional sentiment strategies always combine multiple factors to improve signal quality.
Confidence Thresholds: We require 60% confidence before acting on sentiment signals. This filters out ambiguous news that could be interpreted either way. In volatile markets, firms often raise confidence thresholds to 70-80% to reduce noise trading, while in trending markets they might lower thresholds to capture more opportunities.
Position Sizing Logic: Our position size combines sentiment strength and confidence - stronger, more confident signals get larger positions. A weakly positive sentiment (0.2) with high confidence (0.9) gets a smaller position than strongly positive sentiment (0.8) with high confidence. This dynamic sizing optimizes risk-adjusted returns.
Signal Explanation System: Every trade includes a text explanation of why the signal was generated. This is crucial for regulatory compliance, strategy debugging, and investor communication. Professional systems log not just what trades were made, but why the algorithm made those decisions.
Real-Time Execution: Our strategy runs in real-time, processing news as it arrives and immediately generating trading signals. In practice, milliseconds matter - the first trader to act on breaking news gets the best prices. Professional systems often co-locate servers near exchanges to minimize latency.
Portfolio Risk Management: The 10% maximum position size prevents over-concentration, even if sentiment signals are very strong. Professional firms often use even lower limits (2-5%) and monitor portfolio-level exposure to sentiment factors, ensuring no single news event can cause catastrophic losses.
Performance Analysis: Our sentiment-price relationship analysis helps validate strategy effectiveness. If sentiment consistently disagrees with price movements, the sentiment model needs recalibration. Professional teams continuously monitor signal quality and adjust parameters based on live performance data.
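One sketch of that kind of ongoing validation, assuming you maintain aligned pandas Series of daily sentiment scores and next-day returns for a symbol:

# Sketch: monitor signal quality as a rolling sentiment/return correlation
import pandas as pd

def signal_quality(daily_sentiment: pd.Series, next_day_returns: pd.Series, window=60):
    """Rolling correlation between sentiment and next-day returns; a
    persistently near-zero or negative value suggests recalibration."""
    aligned = pd.concat([daily_sentiment, next_day_returns], axis=1,
                        keys=['sentiment', 'ret']).dropna()
    return aligned['sentiment'].rolling(window).corr(aligned['ret'])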
Build advanced sentiment analysis trading systems!
Create a system that combines sentiment from multiple sources:
# Multi-source sentiment aggregator
class MultiSourceSentimentAnalyzer:
"""
Advanced sentiment analyzer combining multiple data sources
"""
def __init__(self):
self.sources = {}
self.source_weights = {}
self.sentiment_history = {}
def add_sentiment_source(self, source_name, weight, reliability_score):
"""Add a new sentiment data source"""
# Your implementation here:
# 1. Register new sentiment source
# 2. Set source weight and reliability
# 3. Initialize source-specific processing
pass
def aggregate_multi_source_sentiment(self, symbol, time_window='1h'):
"""Aggregate sentiment from multiple sources"""
# Your aggregation logic:
# 1. Collect sentiment from all sources
# 2. Apply source weights and reliability scores
# 3. Handle conflicting signals
# 4. Calculate confidence intervals
pass
def detect_sentiment_anomalies(self, symbol):
"""Detect unusual sentiment patterns"""
# Your anomaly detection logic
pass
# Implement your multi-source analyzer
# multi_analyzer = MultiSourceSentimentAnalyzer()
# aggregated_sentiment = multi_analyzer.aggregate_multi_source_sentiment('AAPL')
Build a strategy that reacts to specific events and news:
# Event-driven sentiment strategy
class EventDrivenSentimentStrategy:
"""
Trading strategy focused on news events and sentiment spikes
"""
def __init__(self):
self.event_types = {}
self.event_history = {}
self.strategy_rules = {}
def detect_market_events(self, symbol, sentiment_data, price_data):
"""Detect significant market events"""
# Your event detection logic:
# 1. Earnings announcements
# 2. News sentiment spikes
# 3. Unusual trading volume
# 4. Price gap events
pass
def create_event_response_rules(self):
"""Define how to respond to different events"""
# Your event response rules:
# 1. Immediate reaction strategies
# 2. Delayed reaction strategies
# 3. Contrarian vs momentum approaches
# 4. Risk management for event trading
pass
def execute_event_strategy(self, detected_events):
"""Execute trades based on detected events"""
# Your execution logic
pass
# Implement your event-driven strategy
# event_strategy = EventDrivenSentimentStrategy()
# events = event_strategy.detect_market_events('AAPL', sentiment_data, price_data)
You've mastered advanced sentiment analysis techniques for quantitative trading: preprocessing financial text, scoring it with VADER and TextBlob, training a domain-specific TF-IDF and Random Forest classifier, aggregating multi-article sentiment with confidence weighting and momentum, and converting those scores into risk-managed trading signals.
Finally, we'll explore comprehensive backtesting and strategy evaluation to ensure our trading systems are robust and profitable before deploying real capital!