Understanding markets, instruments, and data sources for algorithmic trading
Financial markets are complex ecosystems where various instruments are traded. As quantitative traders, we need to understand these markets, the instruments available, and how to access and analyze market data programmatically.
Stocks
What: Shares of public companies
Examples: AAPL, GOOGL, TSLA
Trading Hours: 9:30 AM - 4:00 PM EST
Volatility: Medium to High

Forex
What: Currency pairs and exchange rates
Examples: EUR/USD, GBP/JPY, USD/CAD
Trading Hours: 24/5 (Sunday 5 PM - Friday 5 PM EST)
Volatility: Low to Medium

Commodities
What: Raw materials and agricultural products
Examples: Gold, Oil, Wheat, Coffee
Trading Hours: Varies by commodity
Volatility: Medium to High

Fixed Income
What: Bonds and government securities
Examples: US Treasury Bonds, Corporate Bonds
Trading Hours: 8:00 AM - 5:00 PM EST
Volatility: Low

Derivatives
What: Contracts whose value derives from underlying assets
Examples: Options, Futures, Swaps
Trading Hours: Varies by contract
Volatility: High

Cryptocurrencies
What: Digital currencies and tokens
Examples: BTC, ETH, ADA
Trading Hours: 24/7
Volatility: Very High
Understanding different types of market data is crucial for building trading algorithms.
Let's set up Python libraries and data sources for quantitative trading.
# Install required packages
# pip install yfinance pandas numpy matplotlib seaborn plotly
# pip install scipy scikit-learn ta-lib
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')
# Set up plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
print("Trading environment setup complete!")
print("Python libraries loaded successfully")
print("Ready to fetch and analyze market data")
yfinance is a popular Python library that provides easy access to Yahoo Finance data.
Time Period Selection: The period="1y" parameter gives us enough historical data to identify trends and patterns, without reaching so far back that we include irrelevant old market regimes. For day-trading strategies you might use shorter periods like "1mo", while long-term investors might prefer "5y" or "max".

Interval Choice: The interval="1d" parameter gives us daily data, which suits swing trading and longer-term strategies. Intraday traders would use "1m" or "5m" for minute-level data, while position traders might use "1wk" for weekly data. Remember: higher frequency means more data points, but also more noise.
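The frequency trade-off is easy to quantify. This sketch is illustrative arithmetic of my own, not part of yfinance: it estimates how many bars each (period, interval) combination returns, assuming roughly 21 trading days per month and a 6.5-hour US equity session.

```python
# Rough bar counts for common (period, interval) combinations.
# Assumptions: ~21 trading days/month, ~6.5-hour session; actual
# yfinance results vary with holidays and halts.
TRADING_DAYS = {"1mo": 21, "1y": 252}
BARS_PER_DAY = {"1d": 1, "1h": 7, "5m": 78, "1m": 390}

def approx_bar_count(period: str, interval: str) -> int:
    """Estimate how many bars a (period, interval) request returns."""
    return TRADING_DAYS[period] * BARS_PER_DAY[interval]

for period in ("1mo", "1y"):
    for interval in ("1d", "5m"):
        print(f"{period:>3} @ {interval:>2}: ~{approx_bar_count(period, interval):,} bars")
```

A year of daily bars is only ~252 points, while a month of 5-minute bars is already over 1,600 - which is why interval choice dominates both storage and noise.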
Market Data Quality: We're not just grabbing data randomly - we need to understand what we're getting. Each data point represents real market activity where millions of dollars changed hands. The quality and completeness of this data directly impacts our strategy's success.
# Fetch stock data for Apple (AAPL)
def get_stock_data(symbol, period="1y", interval="1d"):
    """
    Fetch stock data using yfinance.

    Parameters:
        symbol: Stock ticker (e.g., 'AAPL', 'GOOGL')
        period: Data period (1d, 5d, 1mo, 3mo, 6mo, 1y, 2y, 5y, 10y, ytd, max)
        interval: Data interval (1m, 2m, 5m, 15m, 30m, 60m, 90m, 1h, 1d, 5d, 1wk, 1mo, 3mo)
    """
    stock = yf.Ticker(symbol)
    data = stock.history(period=period, interval=interval)

    # Get additional company metadata
    info = stock.info
    return data, info

# Example: Get Apple stock data
print("Fetching Apple (AAPL) stock data...")
aapl_data, aapl_info = get_stock_data("AAPL", period="1y")

print(f"Data shape: {aapl_data.shape}")
print(f"Date range: {aapl_data.index[0].date()} to {aapl_data.index[-1].date()}")
print(f"Columns: {list(aapl_data.columns)}")

# Display basic company info
print(f"\nCompany: {aapl_info.get('longName', 'N/A')}")
print(f"Sector: {aapl_info.get('sector', 'N/A')}")
print(f"Market Cap: ${aapl_info.get('marketCap', 0):,}")
print(f"52 Week High: ${aapl_info.get('fiftyTwoWeekHigh', 0):.2f}")
print(f"52 Week Low: ${aapl_info.get('fiftyTwoWeekLow', 0):.2f}")

# Show first few rows
print("\nFirst 5 rows of data:")
print(aapl_data.head())
# Analyze OHLCV data structure
def analyze_ohlcv_data(data, symbol):
    """Analyze Open, High, Low, Close, Volume data."""
    print(f"=== {symbol} OHLCV Analysis ===")
    print(f"Total trading days: {len(data)}")
    print(f"Date range: {data.index[0].date()} to {data.index[-1].date()}")

    # Price statistics
    print("\n=== Price Statistics ===")
    print(f"Highest Close: ${data['Close'].max():.2f}")
    print(f"Lowest Close: ${data['Close'].min():.2f}")
    print(f"Average Close: ${data['Close'].mean():.2f}")
    print(f"Current Price: ${data['Close'].iloc[-1]:.2f}")

    # Volume statistics
    print("\n=== Volume Statistics ===")
    print(f"Average Daily Volume: {data['Volume'].mean():,.0f}")
    print(f"Highest Volume Day: {data['Volume'].max():,.0f}")
    print(f"Most recent volume: {data['Volume'].iloc[-1]:,.0f}")

    # Calculate basic derived metrics
    data['Daily_Return'] = data['Close'].pct_change()
    data['Price_Range'] = data['High'] - data['Low']
    data['Body_Size'] = abs(data['Close'] - data['Open'])

    print("\n=== Volatility Metrics ===")
    print(f"Average Daily Return: {data['Daily_Return'].mean()*100:.2f}%")
    print(f"Daily Return Std Dev: {data['Daily_Return'].std()*100:.2f}%")
    print(f"Average Price Range: ${data['Price_Range'].mean():.2f}")

    return data

# Analyze Apple data
aapl_analyzed = analyze_ohlcv_data(aapl_data.copy(), "AAPL")
Why These Calculations Matter: each metric tells us something crucial about market behavior. The average daily return measures drift, its standard deviation measures risk, the price range captures intraday volatility, and the candle body size hints at how decisively buyers or sellers controlled the session.
Trading Insight: These aren't just statistics - they're the building blocks for every trading strategy. Momentum strategies look for consistent returns, mean reversion strategies look for high volatility, and volume strategies confirm price movements with institutional activity.
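The daily statistics above are usually annualized before strategies are compared. This sketch applies the standard 252-trading-day convention to a simulated return series (the series, and the zero risk-free rate in the Sharpe ratio, are assumptions for illustration).

```python
import numpy as np
import pandas as pd

# Annualize daily return statistics using the 252-trading-day convention.
# The return series is simulated purely for illustration.
rng = np.random.default_rng(42)
daily_returns = pd.Series(rng.normal(0.0005, 0.02, 252))  # ~0.05% mean, 2% stdev

ann_return = daily_returns.mean() * 252          # annualized drift
ann_vol = daily_returns.std() * np.sqrt(252)     # annualized volatility
sharpe = ann_return / ann_vol                    # risk-free rate assumed 0

print(f"Annualized return:     {ann_return:.1%}")
print(f"Annualized volatility: {ann_vol:.1%}")
print(f"Sharpe ratio:          {sharpe:.2f}")
```

With real data you would feed in the `Daily_Return` column computed above instead of the simulated series.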
Visualization is crucial for understanding price movements and patterns. Charts reveal information that raw numbers can't convey - support and resistance levels, trend channels, and breakout patterns that form the basis of technical analysis.
# Create comprehensive price charts
def plot_stock_analysis(data, symbol, days=90):
    """Create a comprehensive stock analysis chart."""
    # Get recent data
    recent_data = data.tail(days).copy()

    # Create subplots: price on top, volume and returns below
    fig = make_subplots(
        rows=3, cols=1,
        subplot_titles=(f'{symbol} Price Chart', 'Volume', 'Daily Returns'),
        vertical_spacing=0.08,
        row_heights=[0.6, 0.2, 0.2]  # top-to-bottom, so the price panel is largest
    )

    # Candlestick chart
    fig.add_trace(
        go.Candlestick(
            x=recent_data.index,
            open=recent_data['Open'],
            high=recent_data['High'],
            low=recent_data['Low'],
            close=recent_data['Close'],
            name='Price'
        ),
        row=1, col=1
    )

    # Volume chart, colored by up/down day
    colors = ['red' if close < open_ else 'green'
              for close, open_ in zip(recent_data['Close'], recent_data['Open'])]
    fig.add_trace(
        go.Bar(
            x=recent_data.index,
            y=recent_data['Volume'],
            name='Volume',
            marker_color=colors
        ),
        row=2, col=1
    )

    # Daily returns
    daily_returns = recent_data['Close'].pct_change() * 100
    colors_returns = ['red' if ret < 0 else 'green' for ret in daily_returns]
    fig.add_trace(
        go.Bar(
            x=recent_data.index,
            y=daily_returns,
            name='Daily Return %',
            marker_color=colors_returns
        ),
        row=3, col=1
    )

    # Update layout
    fig.update_layout(
        title=f'{symbol} Market Analysis - Last {days} Days',
        xaxis_rangeslider_visible=False,
        height=800,
        showlegend=False
    )
    fig.show()
    return fig

# Plot Apple analysis
print("Creating comprehensive AAPL chart...")
aapl_chart = plot_stock_analysis(aapl_analyzed, "AAPL", days=90)
Pattern Recognition: Our brains are wired to recognize visual patterns. A chart can instantly reveal trends, support/resistance levels, and breakout patterns that would take hours to identify in raw data. Professional traders rely on charts because they compress thousands of data points into actionable insights.
Candlestick Charts: We use candlesticks (not simple line charts) because they show four critical pieces of information: opening price (market sentiment at open), closing price (final sentiment), high (maximum optimism), and low (maximum pessimism). The "body" and "wicks" tell stories about buyer vs seller control.
Volume Confirmation: The volume subplot is crucial - price movements mean more when accompanied by high volume. A breakout on low volume might be a false signal, while high-volume moves often indicate institutional involvement and are more likely to continue.
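That confirmation idea can be turned into a simple filter: flag only the days whose absolute return is large and whose volume runs well above its recent average. The data here is simulated, and both thresholds (2% move, 1.5x the 20-day average volume) are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Volume-confirmation filter on simulated OHLCV-style data.
# Thresholds (2% return, 1.5x 20-day average volume) are assumptions.
rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "Close": 100 * np.exp(np.cumsum(rng.normal(0, 0.01, n))),
    "Volume": rng.integers(1_000_000, 2_000_000, n).astype(float),
})
df.loc[100, "Close"] = df.loc[99, "Close"] * 1.04  # plant a 4% move...
df.loc[100, "Volume"] *= 4                         # ...on quadruple volume

returns = df["Close"].pct_change()
vol_ratio = df["Volume"] / df["Volume"].rolling(20).mean()
confirmed = (returns.abs() > 0.02) & (vol_ratio > 1.5)
print(f"Volume-confirmed moves: {int(confirmed.sum())} of {n} days")
```

Big moves on ordinary volume fail the filter; only the planted high-volume move survives, which is exactly the behavior you want before trusting a breakout.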
Quantitative strategies often involve multiple assets. Let's learn to collect and compare data across different instruments.
Single-stock strategies are risky and inefficient. Professional quantitative traders think in portfolios, because diversification across instruments smooths returns and cross-asset comparisons reveal relative strength that no single chart can show.
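The portfolio argument fits in one table: a pairwise correlation matrix shows how much diversification a basket actually buys. The returns below are simulated with a shared "market" factor for the two tech names; with real data you would use `prices.pct_change()` on the downloaded closes.

```python
import numpy as np
import pandas as pd

# Correlation matrix on simulated returns: two tech names share a common
# "market" factor, while GLD is generated independently of it.
rng = np.random.default_rng(1)
market = rng.normal(0, 0.01, 250)                 # shared market factor
returns = pd.DataFrame({
    "AAPL": market + rng.normal(0, 0.008, 250),
    "MSFT": market + rng.normal(0, 0.008, 250),
    "GLD":  rng.normal(0, 0.006, 250),            # no factor exposure
})

corr = returns.corr()
print(corr.round(2))
```

The two factor-driven names come out highly correlated while GLD sits near zero - holding only AAPL and MSFT diversifies far less than the two-ticker count suggests.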
# Collect data for multiple symbols
def get_multiple_stocks(symbols, period="6mo"):
    """
    Fetch data for multiple stocks.

    Parameters:
        symbols: List of stock tickers
        period: Data period
    """
    all_data = {}
    print(f"Fetching data for {len(symbols)} symbols...")

    for symbol in symbols:
        try:
            print(f"  Downloading {symbol}...")
            stock = yf.Ticker(symbol)
            data = stock.history(period=period)
            if not data.empty:
                all_data[symbol] = data
                print(f"  ✓ {symbol}: {len(data)} days")
            else:
                print(f"  ✗ {symbol}: No data available")
        except Exception as e:
            print(f"  ✗ {symbol}: Error - {str(e)}")

    return all_data
# Example: Tech stocks
tech_symbols = ['AAPL', 'GOOGL', 'MSFT', 'AMZN', 'TSLA', 'META', 'NFLX', 'NVDA']
tech_data = get_multiple_stocks(tech_symbols, period="6mo")
print(f"\nSuccessfully collected data for {len(tech_data)} symbols")
# Create a price comparison
def compare_stock_performance(stock_data, normalize=True):
    """Compare performance of multiple stocks."""
    # Extract closing prices into one DataFrame
    prices = pd.DataFrame()
    for symbol, data in stock_data.items():
        prices[symbol] = data['Close']

    # Drop dates where any symbol is missing
    prices = prices.dropna()

    if normalize:
        # Rebase every series to 100 at the start for a fair comparison
        plot_data = prices / prices.iloc[0] * 100
        title = 'Normalized Stock Performance Comparison (Starting at 100)'
        ylabel = 'Normalized Price'
    else:
        plot_data = prices
        title = 'Stock Price Comparison'
        ylabel = 'Price ($)'

    plt.figure(figsize=(12, 8))
    for symbol in plot_data.columns:
        plt.plot(plot_data.index, plot_data[symbol], label=symbol, linewidth=2)
    plt.title(title)
    plt.ylabel(ylabel)
    plt.xlabel('Date')
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

    return plot_data
# Compare tech stock performance
print("Comparing tech stock performance...")
tech_performance = compare_stock_performance(tech_data, normalize=True)
Real market data often contains gaps, errors, and anomalies. Let's learn to identify and handle these issues.
Garbage In, Garbage Out: Poor data quality can lead to false signals, incorrect backtests, and real trading losses. A single bad data point can trigger a stop-loss or cause a momentum algorithm to buy at the wrong time. Professional trading firms spend millions on data quality because clean data is the foundation of profitable strategies.
Common Data Issues: Corporate actions (stock splits, dividends), exchange holidays, market halts, and data vendor errors can create gaps or spikes in price data. We need to identify these systematically rather than discovering them during live trading when it's too late.
Financial Impact: A trading algorithm that doesn't handle data quality issues might mistake a stock split for a 50% price drop, triggering massive buy orders. Or it might ignore a trading halt and assume liquidity exists when it doesn't. These aren't theoretical problems - they cause real losses.
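That split scenario is easy to screen for. A raw, unadjusted feed reports a 2:1 split as a roughly -50% overnight "return", so this sketch flags close-to-close ratios that land near common split factors for investigation rather than trading; the factor list and 5% tolerance are assumptions, not a standard.

```python
import pandas as pd

# Flag overnight price ratios near common split factors. The factor list
# and tolerance are illustrative assumptions.
SPLIT_FACTORS = (2.0, 3.0, 4.0, 5.0, 10.0)

def flag_suspected_splits(close: pd.Series, tol: float = 0.05) -> pd.Series:
    """True where yesterday's close / today's close sits near a split factor."""
    ratio = close.shift(1) / close          # > 1 after a forward split
    suspect = pd.Series(False, index=close.index)
    for factor in SPLIT_FACTORS:
        suspect |= (ratio - factor).abs() < factor * tol
    return suspect

close = pd.Series([100.0, 101.0, 102.0, 25.5, 25.8])  # 4:1 split on day 3
flags = flag_suspected_splits(close)
print(flags)
```

In practice yfinance's `history()` returns split-adjusted prices by default, but checks like this protect you when switching to a vendor feed that doesn't adjust.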
# Data quality analysis
def analyze_data_quality(data, symbol):
    """Analyze data quality and identify potential issues."""
    print(f"=== Data Quality Analysis for {symbol} ===")

    # Basic statistics
    print(f"Total rows: {len(data)}")
    print(f"Date range: {data.index[0].date()} to {data.index[-1].date()}")

    # Check for missing values
    missing_data = data.isnull().sum()
    print("\nMissing values:")
    for col, missing in missing_data.items():
        if missing > 0:
            print(f"  {col}: {missing} ({missing/len(data)*100:.1f}%)")
        else:
            print(f"  {col}: None")

    # Check for zero/negative prices
    price_cols = ['Open', 'High', 'Low', 'Close']
    for col in price_cols:
        zero_prices = (data[col] <= 0).sum()
        if zero_prices > 0:
            print(f"\nWarning: {col} has {zero_prices} zero/negative values")

    # Check for zero volume
    zero_volume = (data['Volume'] == 0).sum()
    if zero_volume > 0:
        print(f"\nWarning: {zero_volume} days with zero volume")

    # Check for OHLC consistency
    inconsistent_ohlc = 0
    if 'High' in data.columns and 'Low' in data.columns:
        inconsistent_ohlc = (data['High'] < data['Low']).sum()
        if inconsistent_ohlc > 0:
            print(f"\nWarning: {inconsistent_ohlc} days where High < Low")

    # Check for extreme price movements (>20% in one day)
    daily_returns = data['Close'].pct_change()
    extreme_moves = (abs(daily_returns) > 0.20).sum()
    if extreme_moves > 0:
        print(f"\nNote: {extreme_moves} days with >20% price movement")
        extreme_dates = data[abs(daily_returns) > 0.20].index
        for date in extreme_dates[:5]:  # Show first 5
            ret = daily_returns[date] * 100
            print(f"  {date.date()}: {ret:+.1f}%")

    # Gap analysis (missing trading days)
    expected_trading_days = pd.bdate_range(start=data.index[0], end=data.index[-1])
    actual_days = set(data.index.date)
    expected_days = set(expected_trading_days.date)
    missing_days = expected_days - actual_days
    if missing_days:
        print(f"\nMissing trading days: {len(missing_days)}")
        if len(missing_days) <= 10:
            for day in sorted(missing_days)[:5]:
                print(f"  {day}")

    return {
        'missing_values': missing_data,
        'zero_volume_days': zero_volume,
        'inconsistent_ohlc': inconsistent_ohlc,
        'extreme_moves': extreme_moves,
        'missing_trading_days': len(missing_days)
    }
# Analyze Apple data quality
aapl_quality = analyze_data_quality(aapl_data, "AAPL")
# Clean data function
def clean_market_data(data):
    """Clean and preprocess market data."""
    cleaned_data = data.copy()

    # Remove rows with any missing values
    cleaned_data = cleaned_data.dropna()

    # Remove rows with zero/negative prices
    price_cols = ['Open', 'High', 'Low', 'Close']
    for col in price_cols:
        cleaned_data = cleaned_data[cleaned_data[col] > 0]

    # Fix OHLC inconsistencies (set Low to the min of OHLC, High to the max)
    cleaned_data['Low'] = cleaned_data[['Open', 'High', 'Low', 'Close']].min(axis=1)
    cleaned_data['High'] = cleaned_data[['Open', 'High', 'Low', 'Close']].max(axis=1)

    print("Data cleaning complete:")
    print(f"  Original rows: {len(data)}")
    print(f"  Cleaned rows: {len(cleaned_data)}")
    print(f"  Removed: {len(data) - len(cleaned_data)} rows")

    return cleaned_data

# Clean Apple data
aapl_clean = clean_market_data(aapl_data)
Why We Clean Data This Way: each cleaning step serves a specific purpose in quantitative trading. Dropping missing rows keeps NaNs from propagating through indicator calculations, removing zero or negative prices prevents division errors in return math, and repairing High/Low inconsistencies keeps range-based and candlestick logic valid.
Professional Approach: In institutional trading, data quality is monitored in real-time with alerts for anomalies. We're building good habits by implementing these checks in our educational framework - the same principles apply whether you're managing $1,000 or $100 million.
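A real-time flavor of the same checks is a per-bar sanity gate that rejects a bar before the strategy sees it. This is a minimal sketch; the field names and the 20% jump threshold are assumptions, not an industry standard.

```python
# Per-bar quality gate: return a list of alerts for one incoming OHLCV bar.
# Field names and the 20% jump threshold are illustrative assumptions.
def bar_alerts(bar: dict, prev_close: float, max_jump: float = 0.20) -> list:
    """Return a list of quality alerts for a single OHLCV bar."""
    alerts = []
    if any(bar[k] <= 0 for k in ("open", "high", "low", "close")):
        alerts.append("non-positive price")
    if bar["high"] < bar["low"]:
        alerts.append("high < low")
    if bar["volume"] == 0:
        alerts.append("zero volume")
    if prev_close > 0 and abs(bar["close"] / prev_close - 1) > max_jump:
        alerts.append(f"close jumped more than {max_jump:.0%}")
    return alerts

bad_bar = {"open": 100.0, "high": 99.0, "low": 101.0, "close": 100.0, "volume": 0}
print(bar_alerts(bad_bar, prev_close=150.0))
```

An empty alert list means the bar can flow through to the strategy; anything else gets logged and held for review, which is the habit the batch checks above are building toward.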
Practice working with real market data!
Collect and analyze data for different asset classes:
# Your solution here
def multi_asset_analysis():
    """Analyze different asset classes."""
    # Define symbols for different asset classes
    assets = {
        'stocks': ['AAPL', 'JPM', 'JNJ'],  # Your choices here
        'etfs': ['SPY', 'QQQ', 'IWM'],
        'commodities': ['GLD', 'USO'],
        'crypto': ['BTC-USD', 'ETH-USD']
    }

    # Your code to:
    # 1. Fetch data for all symbols
    # 2. Calculate basic statistics
    # 3. Compare volatility across asset classes
    # 4. Create visualization
    pass

# Run your analysis
multi_asset_analysis()
Create a comprehensive data quality dashboard:
# Create a data quality dashboard
def create_data_quality_dashboard(symbols, period="1y"):
    """Create a dashboard showing data quality metrics for multiple symbols."""
    quality_summary = []

    for symbol in symbols:
        # Your code to:
        # 1. Fetch data
        # 2. Analyze quality
        # 3. Store results
        pass

    # Your code to:
    # 1. Create summary DataFrame
    # 2. Generate visualizations
    # 3. Identify best/worst data quality
    return quality_summary

# Test your dashboard
test_symbols = ['AAPL', 'GOOGL', 'TSLA', 'SPY', 'BTC-USD']
dashboard = create_data_quality_dashboard(test_symbols)
In this lesson, you've learned the foundations of quantitative trading: the major asset classes and their trading characteristics, fetching and inspecting market data with yfinance, visualizing OHLCV data with candlestick charts, comparing performance across a portfolio of symbols, and auditing and cleaning data quality issues.
Next, we'll dive into technical analysis and learn to calculate trading indicators that form the basis of many quantitative strategies!