Supervised & Unsupervised Learning
Overview
Two of the main categories of machine learning are supervised and unsupervised learning. They differ in whether the training data carries labels and in the kinds of problems they are best suited for.
Supervised Learning
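Supervised learning is a type of machine learning where the algorithm learns from labeled training data, that is, examples paired with the correct output. The goal is to learn a mapping from inputs to outputs that also holds for inputs the model has not seen.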
Key Characteristics
- Uses labeled training data
- Learns to predict outputs from inputs
- Can be used for classification and regression
- Requires labeled examples, which are often produced by human annotators
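The list above mentions regression alongside classification. As a minimal sketch, assuming scikit-learn and NumPy are installed, the following fits a linear regression to synthetic labeled data. The data and the underlying coefficients (slope 3, intercept 2) are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic labeled data (an assumption for illustration): y = 3x + 2 plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=200)

# Learn the mapping from the input to a continuous output
reg = LinearRegression()
reg.fit(X, y)

slope = reg.coef_[0]
intercept = reg.intercept_
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The fitted slope and intercept should land close to the values used to generate the data, which is what "learning a mapping from labeled examples" means in the regression setting.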
Common Applications
- Image classification
- Spam detection
- Price prediction
- Medical diagnosis
Example: Email Spam Classification
In spam detection, the algorithm learns from emails that are labeled as "spam" or "not spam" to predict the category of new emails.
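The spam example can be sketched in a few lines. This is an illustrative toy, not a production filter: the six emails and their labels are invented, and the bag-of-words plus naive Bayes pipeline is one common choice among many.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hand-written dataset, invented for illustration only
emails = [
    "Win a free prize now",
    "Claim your free money today",
    "Cheap pills, limited offer, buy now",
    "Meeting agenda for Monday",
    "Can you review my pull request?",
    "Lunch tomorrow with the team",
]
labels = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]

# Word counts as features, followed by a naive Bayes classifier
spam_model = make_pipeline(CountVectorizer(), MultinomialNB())
spam_model.fit(emails, labels)

# Predict the category of new, unseen emails
new_emails = ["Free prize money, claim now", "Agenda for the team meeting"]
predictions = spam_model.predict(new_emails)
print(list(predictions))
```

A real filter would need far more training data and a held-out test set to estimate how well it generalizes.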
Unsupervised Learning
Unsupervised learning is a type of machine learning where the algorithm learns patterns from unlabeled data without explicit supervision.
Key Characteristics
- Uses unlabeled data
- Discovers hidden patterns and structures
- Used for clustering and dimensionality reduction
- Requires no labels, although the discovered patterns still need human interpretation
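Dimensionality reduction, mentioned in the list above, can be illustrated with principal component analysis (PCA). The sketch below assumes scikit-learn is installed and uses its bundled iris dataset. PCA is only one of several possible techniques, and the choice of two components is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Only the features are used; the labels in iris.target are ignored
X = load_iris().data  # 150 samples, 4 features each

# Project the 4-dimensional measurements onto 2 principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Fraction of the original variance retained by the 2 components
retained = pca.explained_variance_ratio_.sum()
print(X_reduced.shape, round(retained, 3))
```

The retained-variance figure gives a rough sense of how much information survives the projection.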
Common Applications
- Customer segmentation
- Anomaly detection
- Topic modeling
- Image compression
Example: Customer Segmentation
In customer segmentation, the algorithm groups customers based on their behavior patterns without any predefined categories.
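A minimal sketch of this idea, assuming scikit-learn and NumPy are installed. The customer data is synthetic, and the two features (annual spend and visits per month) and two behavior profiles are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic customers (invented for illustration): columns are
# annual spend and visits per month, drawn around two behavior profiles
rng = np.random.default_rng(42)
occasional = rng.normal(loc=[200.0, 2.0], scale=[30.0, 0.5], size=(50, 2))
frequent = rng.normal(loc=[1500.0, 12.0], scale=[100.0, 1.5], size=(50, 2))
customers = np.vstack([occasional, frequent])

# Scale features so that spend does not dominate the distance computation
scaled = StandardScaler().fit_transform(customers)

# Group customers into two segments; no labels are provided
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
segments = kmeans.fit_predict(scaled)

print(np.bincount(segments))
```

Note that k-means still needs the number of segments chosen up front. With real data that number is usually unknown and has to be explored, for example by comparing results across several values.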
Comparison
| Aspect | Supervised Learning | Unsupervised Learning |
|---|---|---|
| Training Data | Labeled | Unlabeled |
| Learning Process | Learning from labeled examples | Finding patterns in the data |
| Output | Predictions | Patterns/Structures |
| Typical Tasks | Classification, Regression | Clustering, Dimensionality Reduction |
Code Example
# Supervised Learning Example (Classification)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
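# Load data and hold out 20% of it for testing
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)
# Train model on the labeled training split
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
# Evaluate on the held-out test split
print("Test accuracy:", model.score(X_test, y_test))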
# Unsupervised Learning Example (Clustering)
from sklearn.cluster import KMeans
# Cluster the same measurements without using the labels in iris.target
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
clusters = kmeans.fit_predict(iris.data)