Learn how to measure performance and improve your models
25-30 minutes · Advanced Level · 8 Quiz Questions
Why Evaluation Matters
Building a model is only half the battle. You need to know: Is it actually good? How can you make it better? Model evaluation provides the answers through systematic measurement of performance.
Think of it like being a coach - you need metrics to track player performance, identify weaknesses, and develop training strategies. In ML, evaluation metrics serve the same purpose for your models.
Classification Metrics
For problems where you predict categories (spam/not spam, cat/dog, disease/healthy):
Essential Classification Metrics
📊 Accuracy
Formula: (TP + TN) / (TP + TN + FP + FN), where TP/TN are true positives/negatives and FP/FN are false positives/negatives
Percentage of correct predictions. Simple but can be misleading with imbalanced data: if 99% of examples are negative, a model that always predicts "negative" scores 99% accuracy while catching nothing.
🎯 Precision
Formula: TP / (TP + FP)
Of all positive predictions, how many were actually correct? Important when false positives are costly.
🔍 Recall (Sensitivity)
Formula: TP / (TP + FN)
Of all actual positives, how many did we correctly identify? Critical when missing positives is dangerous.
⚖️ F1-Score
Formula: 2 × (Precision × Recall) / (Precision + Recall)
Harmonic mean of precision and recall. Balances both metrics.
🔢 MNIST Digit "8" Classification Matrix
The worked example below shows how the classification results translate into evaluation metrics for digit recognition:
True Positives: Model said "8", actually was "8"
False Positives: Model said "8", actually other digit
False Negatives: Model missed "8", said other digit
True Negatives: Model correctly identified other digits
Actual \ Predicted      Other (0,1,2,3,4,5,6,7,9)           Digit "8"
Other digits            942 (TN: correctly identified)      8 (FP: mistakenly called "8")
Digit "8"               3 (FN: missed actual "8"s)          47 (TP: correctly found "8"s)
📊 Overall Accuracy: 98.9%
Correct predictions / all predictions = (942 + 47) / 1,000
🎯 "8" Precision: 85.5%
When the model says "8", how often is it correct? 47 / (47 + 8)
🔍 "8" Recall: 94.0%
Of all actual "8"s, how many were found? 47 / (47 + 3)
⚖️ F1-Score: 89.5%
Harmonic mean of precision and recall: 2 × (0.855 × 0.940) / (0.855 + 0.940)
💡 MNIST Insight: In digit recognition, high precision means when the model says "8", it's usually right.
High recall means the model catches most of the actual "8"s in the dataset.
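These numbers are easy to verify by hand. Below is a minimal Python sketch using the four counts from the matrix above; for real label arrays, scikit-learn's accuracy_score, precision_score, recall_score, and f1_score compute the same quantities.

```python
# Counts from the digit-"8" confusion matrix above.
tp, fp, fn, tn = 47, 8, 3, 942

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"Accuracy:  {accuracy:.1%}")   # 98.9%
print(f"Precision: {precision:.1%}")  # 85.5%
print(f"Recall:    {recall:.1%}")     # 94.0%
print(f"F1-score:  {f1:.1%}")         # 89.5%
```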
Regression Metrics
For predicting continuous values (house prices, temperature, stock prices):
Mean Absolute Error (MAE)
Formula: (1/n) × Σ |actual - predicted|
Average absolute difference between predictions and actual values
Advantage: Easy to interpret, same units as the target
Mean Squared Error (MSE)
Formula: (1/n) × Σ (actual - predicted)²
Average squared difference - heavily penalizes large errors
Advantage: Smooth gradient for optimization
R² Score (Coefficient of Determination)
Proportion of variance explained by the model. A score of 1 is a perfect fit, 0 means no better than always predicting the mean, and very poor models can even score below 0.
Advantage: Scale-independent, easy to interpret
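All three metrics are one-liners in scikit-learn. A short sketch with made-up house-price numbers (in $1,000s), purely for illustration:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical house prices in $1,000s, purely for illustration.
actual    = np.array([250, 310, 180, 420, 295])
predicted = np.array([240, 330, 190, 400, 300])

print("MAE:", mean_absolute_error(actual, predicted))  # 13.0, same units as the target
print("MSE:", mean_squared_error(actual, predicted))   # 205.0, squared units
print("R^2:", r2_score(actual, predicted))             # ~0.967, variance explained
```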
Cross-Validation: Robust Evaluation
Never trust a model evaluated on just one split of data. Cross-validation provides more reliable performance estimates:
K-Fold Cross-Validation Process:
Split data into k equal parts (folds)
Train on k-1 folds, test on remaining fold
Repeat k times, using each fold as test set once
Average the k performance scores
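scikit-learn's cross_val_score runs this entire loop in one call. A minimal sketch with k = 5, using the built-in iris dataset and a logistic regression model as illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold CV: train on 4 folds, test on the held-out fold, 5 times over.
scores = cross_val_score(model, X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))
```

The mean gives the performance estimate; the standard deviation shows how much it varies across folds.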
Hyperparameter Optimization
Fine-Tuning Your Model
Hyperparameters are settings you choose before training (learning rate, number of layers, etc.). Finding optimal values requires systematic search:
Common Hyperparameters:
Learning Rate: How large a step the optimizer takes at each update
Batch Size: Number of examples processed together
Number of Epochs: How many passes to make over the full training dataset
Architecture: Number of layers, neurons per layer
Regularization: Techniques to prevent overfitting
Search Strategies:
Grid Search: Test all combinations of predefined values
Random Search: Randomly sample hyperparameter combinations
Bayesian Optimization: Use previous results to guide search
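A minimal grid-search sketch with scikit-learn's GridSearchCV; the SVM classifier and the parameter values here are illustrative assumptions, and RandomizedSearchCV is the drop-in counterpart for random search:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Grid search evaluates every combination (3 x 3 = 9 here) with 5-fold CV.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("Best hyperparameters:", search.best_params_)
print("Best cross-validated accuracy: %.3f" % search.best_score_)
```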
Preventing Overfitting
Overfitting occurs when models memorize training data but fail on new data. Prevention strategies:
🛡️ Regularization
L1/L2 Regularization: Add penalty for large weights
Dropout: Randomly ignore neurons during training
⏰ Early Stopping
Monitor validation performance and stop training when it stops improving
📊 Data Augmentation
Create variations of training examples to increase diversity
🔄 Cross-Validation
Use multiple train/validation splits to get robust estimates
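As one concrete sketch, dropout and early stopping can be combined in a few lines, assuming TensorFlow/Keras is available; the architecture and the synthetic data are illustrative only:

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in data: 500 samples, 20 features, binary labels.
rng = np.random.default_rng(0)
X = rng.random((500, 20)).astype("float32")
y = (X.sum(axis=1) > 10).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.3),  # randomly zero 30% of activations per step
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stop when validation loss hasn't improved for 5 epochs; keep the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

model.fit(X, y, validation_split=0.2, epochs=100,
          callbacks=[early_stop], verbose=0)
```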
Bias-Variance Tradeoff
Understanding this fundamental tradeoff helps make better modeling decisions:
High Bias: Model is too simple, misses patterns (underfitting)
High Variance: Model is too complex, memorizes noise (overfitting)
Sweet Spot: Balance complexity to minimize total error
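One way to see the tradeoff directly is to sweep model complexity and compare training vs. validation scores. Here is a sketch using polynomial regression on a synthetic noisy sine wave (all setup choices are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noisy sine wave: the true pattern is smooth, the noise is not.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=100)

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):  # too simple, about right, too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree={degree:2d}  "
          f"train R^2={model.score(X_train, y_train):.2f}  "
          f"val R^2={model.score(X_val, y_val):.2f}")
```

Typically the degree-1 model scores poorly on both sets (high bias), while the degree-15 model scores noticeably better on training data than on validation data (high variance).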
Model Selection Best Practices
🎯 Systematic Approach
Start Simple: Begin with basic models to establish baselines
Add Complexity Gradually: Increase sophistication step by step
Use Validation Sets: Never optimize on test data
Consider Domain Knowledge: Let expertise guide feature selection
Monitor Multiple Metrics: Don't optimize for just one number
Document Everything: Track experiments for reproducibility
Knowledge Check
Test your understanding of model evaluation and optimization
1. When is precision more important than recall?
A) When false negatives are costly
B) When false positives are costly
C) When the dataset is balanced
D) Never - recall is always more important
2. What does the F1-score represent?
A) The arithmetic mean of precision and recall
B) The harmonic mean of precision and recall
C) The maximum of precision and recall
D) The difference between precision and recall
3. What is the main advantage of k-fold cross-validation?
A) It trains models faster
B) It provides more robust performance estimates
C) It requires less data
D) It automatically optimizes hyperparameters
4. Which metric heavily penalizes large errors?
A) Mean Absolute Error (MAE)
B) Mean Squared Error (MSE)
C) Accuracy
D) R² score
5. What is overfitting?
A) Model performs poorly on both training and test data
B) Model performs well on training data but poorly on test data
C) Model is too simple to capture patterns
D) Model trains too quickly
6. Which technique randomly ignores neurons during training to prevent overfitting?
A) L2 regularization
B) Early stopping
C) Dropout
D) Data augmentation
7. In a confusion matrix, what does a False Positive represent?
A) Correctly predicted positive
B) Correctly predicted negative
C) Incorrectly predicted as positive
D) Incorrectly predicted as negative
8. What does an R² score of 0.85 mean?
A) 85% accuracy
B) 85% of variance is explained by the model
C) 15% error rate
D) 85% precision
Quiz Complete!
Great job! You understand model evaluation and optimization.