Gradient descent is built on a few key ideas that together make the optimization effective. The fundamental steps are:
1. Initialize the parameters, for example to zeros or small random values.
2. Compute the gradient of the cost function with respect to the parameters on the training data.
3. Update the parameters by taking a small step in the direction opposite the gradient, scaled by the learning rate.
4. Repeat for a fixed number of iterations or until the updates become negligibly small.
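In the usual notation, with parameters θ, learning rate α, and cost function J, a single update step is:

\theta \leftarrow \theta - \alpha \, \nabla_{\theta} J(\theta)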
Gradient descent comes in three main variants, which differ in how much of the training data is used to compute each gradient (a short code sketch follows this list):
Batch gradient descent: uses the entire training dataset to compute the gradient at each step.
Stochastic gradient descent (SGD): uses a single training example to compute the gradient at each step.
Mini-batch gradient descent: uses a small batch of training examples to compute the gradient at each step.
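The gradient_step helper below is a minimal sketch of that difference, assuming the same squared-error setup as the NumPy example later in this section; the function name and its batch_size parameter are illustrative, not taken from any library.

import numpy as np

def gradient_step(X, y, theta, learning_rate, batch_size):
    # Sample the rows this step's gradient will see:
    #   batch_size == len(y)      -> batch gradient descent
    #   batch_size == 1           -> stochastic gradient descent
    #   1 < batch_size < len(y)   -> mini-batch gradient descent
    idx = np.random.choice(len(y), size=batch_size, replace=False)
    error = X[idx].dot(theta) - y[idx]
    gradient = X[idx].T.dot(error) / batch_size
    return theta - learning_rate * gradient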
[Interactive visualization: gradient descent navigating a 3D loss surface to find the minimum point.]
A from-scratch implementation in NumPy, training a simple linear model with batch gradient descent:

import numpy as np

def gradient_descent(X, y, learning_rate=0.01, num_iterations=1000):
    # Initialize parameters at zero
    m = len(y)
    theta = np.zeros(X.shape[1])

    # Gradient descent loop
    for i in range(num_iterations):
        # Calculate predictions for the current parameters
        predictions = X.dot(theta)
        # Calculate the prediction error
        error = predictions - y
        # Gradient of the cost (1 / (2m)) * ||X @ theta - y||^2
        gradient = X.T.dot(error) / m
        # Step against the gradient
        theta = theta - learning_rate * gradient

    return theta

# Example usage
X = np.array([[1, 2], [2, 3], [3, 4]])
y = np.array([2, 4, 6])
theta = gradient_descent(X, y)
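For this particular X and y, theta = [2, 0] solves X·theta = y exactly, so a quick check is to compare the learned parameters and predictions against that; how close they get depends on the learning rate and the number of iterations:

print(theta)         # learned parameters, approaching [2, 0]
print(X.dot(theta))  # predictions, ideally close to y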
The same training loop expressed with PyTorch, using autograd and the built-in SGD optimizer:

import torch
import torch.nn as nn
import torch.optim as optim

# Training data as float tensors (the same toy example as above)
X = torch.tensor([[1., 2.], [2., 3.], [3., 4.]])
y = torch.tensor([[2.], [4.], [6.]])

# Define model, loss, and optimizer
model = nn.Linear(2, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training loop
for epoch in range(1000):
    # Forward pass: predictions and loss
    outputs = model(X)
    loss = criterion(outputs, y)

    # Backward pass and parameter update
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
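As a quick check, the trained model can then be evaluated with gradient tracking disabled:

with torch.no_grad():
    print(model(X))     # predictions for the training inputs, ideally close to y
    print(loss.item())  # loss from the final training step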