Gradient Descent - ML Resources Hub

Core Concepts

Gradient descent minimizes a loss function by repeatedly adjusting the parameters in the direction that reduces the loss most steeply.

Basic Algorithm
The fundamental steps of gradient descent are:
- Start from an initial point in the parameter space, often chosen at random
- Calculate the gradient of the loss function at the current point
- Move a small step in the opposite direction of the gradient
- Repeat until a convergence criterion is met
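
The steps above can be sketched on a one-dimensional problem. This is a minimal illustration, not a general implementation: the function `minimize_quadratic`, the loss f(w) = (w - 3)^2, and the fixed starting point are all assumptions chosen so the result is easy to check by hand.

```python
def minimize_quadratic(start=0.0, learning_rate=0.1, num_steps=100):
    """Minimize the hypothetical loss f(w) = (w - 3)^2 with plain gradient descent."""
    w = start  # fixed start for reproducibility; a random start works the same way
    for _ in range(num_steps):
        grad = 2.0 * (w - 3.0)        # derivative of (w - 3)^2 at the current w
        w = w - learning_rate * grad  # step in the opposite direction of the gradient
    return w

w_min = minimize_quadratic()
print(w_min)  # approaches the minimizer w = 3
```

Each step shrinks the distance to the minimum by a constant factor here, which is why a fixed number of iterations is enough for this toy loss.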

Key Components
The main components include:
- Learning rate for step size control
- Gradient calculation
- Parameter updates
- Convergence criteria
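
One way to see how these components fit together is a loop that names each of them explicitly. The sketch below is illustrative only: `descend`, `grad_fn`, `tol`, and `max_iters` are hypothetical names, and stopping on a small gradient norm is just one of several possible convergence criteria.

```python
import numpy as np

def descend(grad_fn, theta0, learning_rate=0.1, tol=1e-6, max_iters=10_000):
    """Run gradient descent until the gradient norm drops below tol."""
    theta = np.asarray(theta0, dtype=float)
    for step in range(max_iters):
        grad = grad_fn(theta)                 # gradient calculation
        if np.linalg.norm(grad) < tol:        # convergence criterion
            return theta, step
        theta = theta - learning_rate * grad  # parameter update scaled by the learning rate
    return theta, max_iters

# Hypothetical loss f(theta) = sum(theta^2), whose gradient is 2 * theta
theta, steps = descend(lambda t: 2.0 * t, [1.0, -2.0])
```

A larger learning rate takes bigger steps and usually needs fewer iterations, but past a problem-dependent threshold the updates overshoot and diverge.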

Types of Gradient Descent

Batch Gradient Descent
Uses the entire training dataset to compute the gradient at each step.

Advantages: Stable convergence, exact gradient of the training loss
Disadvantages: Computationally expensive per step, memory intensive on large datasets

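Stochastic Gradient Descent (SGD)
Uses a single training example to compute the gradient at each step.

Advantages: Fast updates, noise can help escape shallow local minima
Disadvantages: Noisy updates, may oscillate around a minimum instead of settling unless the learning rate is decayed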

Mini-batch Gradient Descent
Uses a small batch of training examples to compute the gradient at each step.

Advantages: Balances the speed of SGD with the stability of batch gradient descent
Disadvantages: Requires tuning the batch size
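
The three variants differ only in how many examples feed each gradient estimate, so a single function with a batch-size parameter can cover all of them. The following is a sketch under assumptions: `minibatch_gd` and its arguments are hypothetical names, the model is linear regression with a squared-error loss, and the toy data is made up so that the exact solution is theta = [2, -1].

```python
import numpy as np

def minibatch_gd(X, y, batch_size, learning_rate=0.1, num_epochs=1000, seed=0):
    """Linear-regression gradient descent; batch_size selects the variant."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_epochs):
        order = rng.permutation(m)  # reshuffle the examples every epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            error = X[idx].dot(theta) - y[idx]
            gradient = X[idx].T.dot(error) / len(idx)  # average over the batch
            theta = theta - learning_rate * gradient
    return theta

# Hypothetical toy data generated from theta = [2, -1] with no noise
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
y = np.array([2.0, -1.0, 1.0, 3.0])

theta_batch = minibatch_gd(X, y, batch_size=len(y))  # batch gradient descent
theta_sgd = minibatch_gd(X, y, batch_size=1)         # stochastic gradient descent
theta_mini = minibatch_gd(X, y, batch_size=2)        # mini-batch gradient descent
```

Because this toy data has no noise, all three variants reach the same solution. On noisy data the single-example and mini-batch runs would typically keep fluctuating around the minimum at a fixed learning rate.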

External Resources

- Gradient Descent Overview: comprehensive guide to gradient descent optimization
- Deep Learning Book: chapter on optimization algorithms
- TensorFlow Optimizers: documentation on optimization implementations

Implementation Examples

import numpy as np

def gradient_descent(X, y, learning_rate=0.01, num_iterations=1000):
    """Fit linear-regression weights theta with batch gradient descent."""
    # Initialize parameters: one weight per feature, all zeros
    m = len(y)
    theta = np.zeros(X.shape[1])

    # Gradient descent loop
    for _ in range(num_iterations):
        # Calculate predictions
        predictions = X.dot(theta)

        # Calculate error
        error = predictions - y

        # Calculate gradient of the loss (1 / (2m)) * sum(error ** 2)
        gradient = X.T.dot(error) / m

        # Update parameters by stepping against the gradient
        theta = theta - learning_rate * gradient

    return theta

# Example usage
X = np.array([[1, 2], [2, 3], [3, 4]])
y = np.array([2, 4, 6])
theta = gradient_descent(X, y)

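# The same linear model trained with PyTorch's built-in SGD optimizer
import torch
import torch.nn as nn
import torch.optim as optim

# Training data as float tensors; targets have shape (3, 1) to match the model output
X = torch.tensor([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
y = torch.tensor([[2.0], [4.0], [6.0]])

# Define model
model = nn.Linear(2, 1)
criterion = nn.MSELoss()
# The full dataset is passed at every step below, so this is batch gradient descent
optimizer = optim.SGD(model.parameters(), lr=0.01)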

# Training loop
for epoch in range(1000):
    # Forward pass
    outputs = model(X)
    loss = criterion(outputs, y)
    
    # Backward pass and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Related Topics

Deep Neural Networks

Loss Functions

Optimization
