A Complete Guide to Linear Regression (Theory + PyTorch Practice)

Introduction: What is Linear Regression?

Linear regression is one of the most fundamental supervised learning algorithms in machine learning.
It is mainly used for solving regression problems, i.e. predicting continuous values such as housing prices, stock trends, and marketing metrics.

In this tutorial, we will cover linear regression theory step by step, followed by a PyTorch implementation with training code and examples.


Core Idea of Linear Regression

We aim to fit a linear model that captures the relationship between input features and the output value. Mathematically:

ŷ = wᵀx + b

where x is the feature vector, w is the weight vector, and b is the bias (intercept).
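As a minimal sketch of this equation in PyTorch (the feature and weight values below are made up for illustration):

```python
import torch

# One sample with d = 2 features, e.g., area and age (illustrative values)
x = torch.tensor([120.0, 5.0])   # feature vector x
w = torch.tensor([0.8, -0.1])    # weight vector w
b = torch.tensor(10.0)           # bias b

y_hat = torch.dot(w, x) + b      # ŷ = wᵀx + b
print(y_hat)                     # tensor(105.5000)
```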


Key Concepts in Linear Regression

Here are some basic terms you'll encounter when first studying linear regression:

Sample size n: the number of data rows (e.g., 100 houses with area and price info)
Feature dimension d: the number of input variables (e.g., area and age)
Training set: the data used to fit the model
Feature x: the input values of one sample (e.g., area, age)
Label y: the ground-truth output (e.g., price)

Mathematical Formulation

With d = 2 features (area and age): price = w₁ · area + w₂ · age + b

For d-dimensional features, the linear regression equation can be written as:
ŷ = w₁x₁ + w₂x₂ + ⋯ + w_d x_d + b = wᵀx + b

If we have n samples, we can stack the feature vectors row by row into a design matrix X ∈ ℝⁿˣᵈ and collect the labels into a vector y ∈ ℝⁿ. Then:

ŷ = Xw + b

where the bias b is broadcast, i.e. added to every row.
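A quick sketch of this matrix form in PyTorch (the shapes are the point here; the data is random):

```python
import torch

n, d = 100, 2
X = torch.randn(n, d)       # design matrix X: one sample per row
w = torch.randn(d)          # weight vector w
b = torch.tensor(4.2)       # bias b, broadcast to every sample

y_hat = X @ w + b           # ŷ = Xw + b
print(y_hat.shape)          # torch.Size([100])
```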


Loss Function: Mean Squared Error (MSE)

We use the Mean Squared Error (MSE) as the loss function:

L(w, b) = (1 / 2n) · ∑ᵢ₌₁ⁿ (ŷ⁽ⁱ⁾ − y⁽ⁱ⁾)²

The factor 1/2 is a convention that cancels the exponent when differentiating, leaving a cleaner gradient.
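As a sketch, this loss is a one-liner in PyTorch (note that the built-in torch.nn.MSELoss averages with 1/n, without the extra 1/2):

```python
import torch

y_hat = torch.tensor([2.5, 0.0, 2.0])       # predictions ŷ⁽ⁱ⁾
y = torch.tensor([3.0, -0.5, 2.0])          # labels y⁽ⁱ⁾

n = y.numel()
loss = ((y_hat - y) ** 2).sum() / (2 * n)   # (1/2n) · Σ (ŷ − y)²
print(loss)                                  # tensor(0.0833)
```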


Training a Linear Regression Model

Method 1: Analytical Solution (Closed-form)

If the dataset is small and the problem is linear, you can directly solve for the optimal weights using the normal equations: w* = (XᵀX)⁻¹ Xᵀy, with the bias absorbed into w by appending a constant-one feature to each sample.

This method is fast and accurate. However, it becomes impractical if:

- the feature dimension d is large (inverting XᵀX costs roughly O(d³));
- XᵀX is singular or ill-conditioned (e.g., strongly correlated features);
- you move to models or loss functions that have no closed-form solution.
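Here is a sketch of the closed-form solution on synthetic data, with the bias absorbed via a constant-one column. The true weights and noise level are made up for this example; torch.linalg.lstsq is used instead of an explicit inverse because it is more numerically stable:

```python
import torch

# Synthetic data: y = 2·x1 − 3.4·x2 + 4.2 + noise
n, d = 1000, 2
X = torch.randn(n, d)
y = X @ torch.tensor([2.0, -3.4]) + 4.2 + 0.01 * torch.randn(n)

# Append a column of ones so the bias b becomes the last entry of w
X_aug = torch.cat([X, torch.ones(n, 1)], dim=1)

# Solves the least-squares problem behind w* = (XᵀX)⁻¹ Xᵀy
w_star = torch.linalg.lstsq(X_aug, y.unsqueeze(1)).solution.squeeze()
print(w_star)   # ≈ tensor([ 2.0000, -3.4000,  4.2000])
```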

Method 2: Gradient Descent Optimization

We iteratively update the model parameters by minimizing the loss function. Each step moves the weights and bias in the negative gradient direction, scaled by a learning rate η:

w ← w − η · ∇w L(w, b)

b ← b − η · ∇b L(w, b)

For the MSE loss above, these gradients are ∇w L = (1/n) · Xᵀ(ŷ − y) and ∇b L = (1/n) · ∑ᵢ (ŷ⁽ⁱ⁾ − y⁽ⁱ⁾).
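A minimal sketch of these updates using PyTorch autograd on the same kind of synthetic data as above (the learning rate and epoch count are arbitrary choices):

```python
import torch

# Synthetic data: y = 2·x1 − 3.4·x2 + 4.2 + noise
X = torch.randn(1000, 2)
y = X @ torch.tensor([2.0, -3.4]) + 4.2 + 0.01 * torch.randn(1000)

w = torch.zeros(2, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.1                                  # learning rate η

for epoch in range(200):
    y_hat = X @ w + b
    loss = ((y_hat - y) ** 2).mean() / 2  # L(w, b) = (1/2n) · Σ (ŷ − y)²
    loss.backward()                       # autograd fills w.grad and b.grad
    with torch.no_grad():
        w -= lr * w.grad                  # w ← w − η · ∇w L
        b -= lr * b.grad                  # b ← b − η · ∇b L
        w.grad.zero_()
        b.grad.zero_()

print(w.detach(), b.detach())             # ≈ tensor([2.0, -3.4]), tensor([4.2])
```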

Common Practice: Mini-batch Stochastic Gradient Descent (SGD)

Instead of using all the data at once, we randomly sample a small batch B of examples at each step and average the gradient over that batch:

w ← w − (η / |B|) · ∑_{i ∈ B} ∇w ℓ⁽ⁱ⁾(w, b)

where ℓ⁽ⁱ⁾ is the loss on sample i, and the bias b is updated analogously.

This is the standard approach in PyTorch tutorials on linear regression, because it scales to large datasets and carries over directly to deep learning; a minimal training loop is sketched below.
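Putting it together, a typical PyTorch mini-batch SGD loop looks like the following sketch (the batch size, learning rate, and epoch count are illustrative, not tuned):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic data: y = 2·x1 − 3.4·x2 + 4.2 + noise
X = torch.randn(1000, 2)
y = X @ torch.tensor([2.0, -3.4]) + 4.2 + 0.01 * torch.randn(1000)

loader = DataLoader(TensorDataset(X, y.unsqueeze(1)),
                    batch_size=32, shuffle=True)   # random mini-batches

model = nn.Linear(2, 1)                    # computes ŷ = wᵀx + b
loss_fn = nn.MSELoss()                     # MSE (1/n version, no 1/2 factor)
optimizer = torch.optim.SGD(model.parameters(), lr=0.03)

for epoch in range(10):
    for X_batch, y_batch in loader:
        loss = loss_fn(model(X_batch), y_batch)
        optimizer.zero_grad()
        loss.backward()                    # gradients from this batch only
        optimizer.step()                   # one SGD parameter update

print(model.weight.data, model.bias.data)  # ≈ [[2.0, -3.4]], [4.2]
```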


Why Use Vectorization in Linear Regression?

Vectorization makes training faster by replacing Python loops with optimized matrix operations.

In training, we repeatedly perform large matrix operations. Vectorization improves efficiency by:

- dispatching the work to highly optimized linear algebra kernels (BLAS, GPU);
- avoiding per-element Python interpreter overhead;
- keeping the code shorter and closer to the mathematical notation.
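A small sketch that contrasts the two styles on the same data (absolute timings depend on your machine; the gap is what matters):

```python
import time
import torch

X = torch.randn(10_000, 2)
w = torch.tensor([2.0, -3.4])
b = 4.2

# Python loop: one dot product per sample
start = time.time()
y_loop = torch.stack([torch.dot(w, x_i) + b for x_i in X])
t_loop = time.time() - start

# Vectorized: one matrix-vector product for all samples
start = time.time()
y_vec = X @ w + b
t_vec = time.time() - start

print(torch.allclose(y_loop, y_vec))             # True
print(f"loop: {t_loop:.4f}s  vectorized: {t_vec:.6f}s")
```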


Why Squared Loss? Connection to Normal Distribution

We can model prediction error as Gaussian noise:

y = wᵀx + b + ε, where ε ~ 𝒩(0, σ²)

Using Maximum Likelihood Estimation (MLE) to fit the model is mathematically equivalent to minimizing the mean squared error (MSE):

L(w, b) = (1 / 2n) · ∑(ŷ⁽ⁱ⁾ − y⁽ⁱ⁾)²
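To see the equivalence, write out the likelihood. Under the noise model above, each label is Gaussian around the prediction:

P(y | x) = (1 / √(2πσ²)) · exp(−(y − wᵀx − b)² / (2σ²))

Taking the negative log-likelihood over n independent samples gives

−log P = (n/2) · log(2πσ²) + (1 / 2σ²) · ∑ᵢ (y⁽ⁱ⁾ − wᵀx⁽ⁱ⁾ − b)²

The first term and the factor 1/σ² do not depend on w or b, so maximizing the likelihood is exactly minimizing the sum of squared errors.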

This is the mathematical motivation behind using MSE.


From Linear Regression to Neural Networks

Linear regression can be seen as the simplest neural network: a single-layer perceptron in which every input feature connects directly to a single output neuron computing ŷ = wᵀx + b. This makes it a natural bridge from classical regression to deep learning basics.

Just like a biological neuron, this output unit receives multiple input signals, aggregates them as a weighted sum, and produces a single output.

Analogy with Biological Neurons

A biological neuron includes dendrites, which receive input signals; a cell body, which aggregates them; and an axon, which transmits the output to other neurons.

This is where the term neural network originates. Still, modern deep learning is shaped far more by mathematics and engineering than by biology.


Summary Table: Linear Regression at a Glance

Goal: learn a linear function to predict continuous values
Model equation: ŷ = wᵀx + b
Loss function: Mean Squared Error (MSE)
Training methods: closed-form solution or gradient descent
Vectorization: improves performance and readability
Relation to neural networks: the simplest single-layer neural network

FAQ

Q: What is linear regression used for in real life?
A: Linear regression is widely used for housing price prediction, stock forecasting, marketing analytics, and risk modeling.

Q: Why is Mean Squared Error (MSE) commonly used?
A: Because minimizing MSE is equivalent to Maximum Likelihood Estimation under a Gaussian noise assumption, which makes it mathematically well grounded.

Q: Can I implement linear regression in PyTorch?
A: Yes. PyTorch provides autograd and optimization tools, making it simple to implement linear regression with gradient descent.

