Gradient descent is an optimization algorithm that minimizes a loss function by iteratively moving in the direction of steepest descent. It is a fundamental algorithm in machine learning, particularly in training neural networks. The algorithm works by calculating the gradient of the loss function with respect to the parameters and updating them in the opposite direction of the gradient.
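
As a minimal illustration of the update rule (a hypothetical one-dimensional example, not tied to any particular model), each step moves the parameter against the gradient, scaled by the learning rate:

# Gradient descent on f(x) = x^2, whose gradient is 2x
x = 3.0              # initial parameter value
learning_rate = 0.1  # step size

for step in range(5):
    gradient = 2 * x                  # gradient of the loss at the current x
    x = x - learning_rate * gradient  # move opposite to the gradient
    print(f"step {step + 1}: x = {x:.4f}, f(x) = {x * x:.4f}")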

Core Concepts

Gradient descent is built on several key concepts that enable effective optimization.

  • Basic Algorithm

    The fundamental steps of gradient descent are (see the code sketch after these lists):

    • Start at a random point in the parameter space
    • Calculate the gradient of the loss function
    • Move in the opposite direction of the gradient
    • Repeat until convergence

  • Key Components

    The main components include:

    • Learning rate for step size control
    • Gradient calculation
    • Parameter updates
    • Convergence criteria
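
A minimal sketch tying these pieces together, assuming a generic grad_fn callback (a hypothetical name used only for illustration) and a small gradient norm as the convergence criterion:

import numpy as np

def minimize(grad_fn, theta0, learning_rate=0.1, tolerance=1e-6, max_iterations=1000):
    # Start at the given point in parameter space
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iterations):
        # Calculate the gradient of the loss at the current parameters
        gradient = grad_fn(theta)
        # Convergence criterion: stop once the gradient is small enough
        if np.linalg.norm(gradient) < tolerance:
            break
        # Move opposite to the gradient, scaled by the learning rate
        theta = theta - learning_rate * gradient
    return theta

# Example: minimize f(theta) = sum(theta**2), whose gradient is 2 * theta
theta_min = minimize(lambda t: 2 * t, theta0=[2.0, -3.0])
print(theta_min)  # close to [0, 0]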

Types of Gradient Descent

Gradient descent comes in three main variants, which differ in how much data is used to compute each update (all three are sketched in code after this list).

  • Batch Gradient Descent

    Uses the entire training dataset to compute the gradient at each step.

    • Advantages: Stable convergence, accurate gradient
    • Disadvantages: Computationally expensive, memory intensive

  • Stochastic Gradient Descent (SGD)

    Uses a single training example to compute the gradient at each step.

    • Advantages: Fast, can escape local minima
    • Disadvantages: Noisy updates, may not converge

  • Mini-Batch Gradient Descent

    Uses a small batch of training examples to compute the gradient at each step.

    • Advantages: Balance between speed and stability
    • Disadvantages: Requires tuning the batch size
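
A rough sketch contrasting the three update rules on a linear-regression loss; compute_gradient and the step functions below are hypothetical names used only for illustration:

import numpy as np

def compute_gradient(theta, X_batch, y_batch):
    # Mean-squared-error gradient for linear regression on the given batch
    error = X_batch.dot(theta) - y_batch
    return X_batch.T.dot(error) / len(y_batch)

def batch_step(theta, X, y, lr):
    # Batch gradient descent: the entire dataset contributes to one update
    return theta - lr * compute_gradient(theta, X, y)

def sgd_step(theta, X, y, lr):
    # Stochastic gradient descent: a single randomly chosen example
    i = np.random.randint(len(y))
    return theta - lr * compute_gradient(theta, X[i:i + 1], y[i:i + 1])

def minibatch_step(theta, X, y, lr, batch_size=2):
    # Mini-batch gradient descent: a small random subset of the data
    idx = np.random.choice(len(y), size=batch_size, replace=False)
    return theta - lr * compute_gradient(theta, X[idx], y[idx])

# One update of each variant on a tiny dataset
X = np.array([[1., 2.], [2., 3.], [3., 4.], [4., 5.]])
y = np.array([2., 4., 6., 8.])
theta = np.zeros(2)
print(batch_step(theta, X, y, lr=0.01))
print(sgd_step(theta, X, y, lr=0.01))
print(minibatch_step(theta, X, y, lr=0.01))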

Interactive Visualization

Explore how gradient descent navigates a 3D surface to find the minimum point. Click and drag to rotate the view, scroll to zoom.

[Visualization readout — learning rate: 0.01, position: (2.00, 2.00), value: 8.00, gradient: (4.00, 4.00), status: running]
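
The readout above is consistent with the bowl-shaped surface f(x, y) = x² + y², since f(2, 2) = 8 and its gradient at (2, 2) is (4, 4). Assuming that surface, the short sketch below traces the same descent with a learning rate of 0.01:

import numpy as np

def f(p):
    # Assumed surface: f(x, y) = x^2 + y^2
    return p[0] ** 2 + p[1] ** 2

def grad_f(p):
    # Gradient of the assumed surface: (2x, 2y)
    return np.array([2 * p[0], 2 * p[1]])

position = np.array([2.0, 2.0])  # starting point shown in the visualization
learning_rate = 0.01

for step in range(500):
    position = position - learning_rate * grad_f(position)

print(position, f(position))  # approaches the minimum at (0, 0)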

Implementation Examples

A plain NumPy implementation of batch gradient descent for linear regression:

import numpy as np

def gradient_descent(X, y, learning_rate=0.01, num_iterations=1000):
    # Initialize parameters
    m = len(y)
    theta = np.zeros(X.shape[1])
    
    # Gradient descent loop
    for i in range(num_iterations):
        # Calculate predictions
        predictions = X.dot(theta)
        
        # Calculate error
        error = predictions - y
        
        # Calculate gradient
        gradient = X.T.dot(error) / m
        
        # Update parameters
        theta = theta - learning_rate * gradient
        
    return theta

# Example usage
X = np.array([[1, 2], [2, 3], [3, 4]])
y = np.array([2, 4, 6])
theta = gradient_descent(X, y)

The same optimization using PyTorch's built-in SGD optimizer:

import torch
import torch.nn as nn
import torch.optim as optim

# Training data as float tensors (same values as the NumPy example)
X = torch.tensor([[1., 2.], [2., 3.], [3., 4.]])
y = torch.tensor([[2.], [4.], [6.]])

# Define model, loss, and optimizer
model = nn.Linear(2, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training loop
for epoch in range(1000):
    # Forward pass: compute predictions and loss
    outputs = model(X)
    loss = criterion(outputs, y)

    # Backward pass: reset gradients, backpropagate, and update parameters
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()