Linear Regression

Last updated: Jan 28, 2026

Key Concept: Linear Regression is a supervised learning algorithm used to model the relationship between a dependent variable and one or more independent variables by fitting a linear equation.

Introduction

Linear regression is one of the most fundamental algorithms in machine learning. It's used to predict a continuous output variable based on one or more input features. The goal is to find the best-fitting line (or hyperplane in higher dimensions) that describes the relationship between variables.

Mathematical Formulation

The linear regression model can be expressed as:

y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε

Where:

  • y is the dependent variable (target)
  • x₁, x₂, ..., xₙ are the independent variables (features)
  • β₀ is the intercept term
  • β₁, β₂, ..., βₙ are the coefficients
  • ε is the error term
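
To make the notation concrete, here is a minimal NumPy sketch that evaluates this equation for a single sample. The coefficient and feature values are made up purely for illustration:

import numpy as np

# Illustrative values only: an intercept and two coefficients
beta_0 = 1.5                    # intercept β₀
beta = np.array([2.0, -0.5])    # coefficients β₁, β₂
x = np.array([3.0, 4.0])        # one sample with features x₁, x₂

# y = β₀ + β₁x₁ + β₂x₂ (ε is unobserved noise, so it is omitted here)
y_hat = beta_0 + np.dot(beta, x)
print(y_hat)  # 1.5 + 2.0*3.0 + (-0.5)*4.0 = 5.5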

Cost Function

Linear regression uses Mean Squared Error (MSE) as its cost function:

MSE = (1/n) Σ(yᵢ - ŷᵢ)²

where n is the number of samples, yᵢ is the observed target, and ŷᵢ is the model's prediction. Training finds the coefficients that minimize this cost, either analytically via the normal equation or iteratively via gradient descent.
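
As a quick sanity check, the MSE formula can be computed directly with NumPy. The arrays below are made-up values, used only to show the calculation:

import numpy as np

# Made-up observed targets and corresponding model predictions
y_true = np.array([2.0, 4.0, 5.0])
y_pred = np.array([2.5, 3.5, 5.0])

# MSE = (1/n) Σ(yᵢ - ŷᵢ)²
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # (0.25 + 0.25 + 0.0) / 3 ≈ 0.167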

Implementation in Python

Here's a simple implementation using scikit-learn:

import numpy as np
from sklearn.linear_model import LinearRegression

# Create sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

# Create and train the model
model = LinearRegression()
model.fit(X, y)

# Make predictions
predictions = model.predict([[6]])
print(f"Prediction for X=6: {predictions[0]}")

Key Assumptions

Linear regression makes several important assumptions; a quick multicollinearity check is sketched after this list:

  1. Linearity: The relationship between X and Y is linear
  2. Independence: Observations are independent of each other
  3. Homoscedasticity: Constant variance of errors
  4. Normality: Errors are normally distributed
  5. No multicollinearity: Independent variables are not highly correlated
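
As an example of checking the last assumption, the sketch below computes the pairwise correlation matrix of a hypothetical feature matrix with NumPy; off-diagonal values close to ±1 suggest multicollinearity. The data is invented for illustration (the third feature is roughly the sum of the first two):

import numpy as np

# Hypothetical feature matrix: the third column is nearly the sum of the first two
X = np.array([
    [1.0, 2.0, 3.1],
    [2.0, 1.0, 2.9],
    [3.0, 4.0, 7.1],
    [4.0, 3.0, 6.9],
    [5.0, 5.0, 10.0],
])

# Pairwise correlations between feature columns
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))  # off-diagonal entries near ±1 indicate highly correlated features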

Advantages and Disadvantages

Advantages

  • Simple to implement and interpret
  • Computationally efficient
  • Works well when the relationship is linear
  • Less prone to overfitting when the number of features is small relative to the number of samples

Disadvantages

  • Assumes a linear relationship
  • Sensitive to outliers
  • Cannot capture complex patterns
  • Requires assumptions to be met for optimal performance

Pro Tip: Always visualize your data before applying linear regression to check if the relationship is indeed linear. Use residual plots to validate your model's assumptions.
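
As a minimal sketch of that advice, the snippet below reuses the toy data from the implementation section and plots residuals against fitted values with matplotlib. Points scattered randomly around zero are consistent with the linearity and homoscedasticity assumptions; a curve or funnel shape suggests a violation:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Same toy data as the implementation example above
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

model = LinearRegression().fit(X, y)

# Residuals = observed - predicted
fitted = model.predict(X)
residuals = y - fitted

# Residual plot: look for a random scatter around the zero line
plt.scatter(fitted, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()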
