Building Linear Regression From Scratch: Mastering the Fundamentals 🚀
Linear Regression is the bedrock of machine learning models — simple yet powerful. To truly master it, I decided to build it from scratch, implementing every core step myself.
Theory: Cost Function and Gradient Descent
1. Cost Function (Mean Squared Error - MSE)
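We define the cost function as:

$$J(w, b) = \frac{1}{2n} \sum_{i=1}^{n} \left(\hat{y}^{(i)} - y^{(i)}\right)^2$$

Where:
$n$ is the number of training examples
$\hat{y}^{(i)}$ is the predicted value
$y^{(i)}$ is the true value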
2. Gradient Descent Update Rules
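To minimize the cost function, we update the parameters $w$ and $b$ using:

$$w := w - \alpha \frac{\partial J}{\partial w}, \qquad b := b - \alpha \frac{\partial J}{\partial b}$$

Where:
$\alpha$ is the learning rate, which controls the step size of each update
Gradients are calculated as:

$$\frac{\partial J}{\partial w} = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}^{(i)} - y^{(i)}\right) x^{(i)}, \qquad \frac{\partial J}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}^{(i)} - y^{(i)}\right)$$

Here $x^{(i)}$ is the feature vector of example $i$, $\hat{y}^{(i)}$ its prediction, $y^{(i)}$ its true value, and $n$ the number of training examples.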
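Since debugging the gradient formulas is where most mistakes hide, one way to check them is to compare the analytic gradients against finite differences. The sketch below is an illustration only: the function names (`cost`, `gradients`, `numerical_gradients`) and the synthetic data are assumptions, not part of the original project.

```python
import numpy as np

def cost(X, y, w, b):
    """Half mean squared error: (1 / (2n)) * sum((y_hat - y)^2)."""
    n = X.shape[0]
    residual = X @ w + b - y
    return np.sum(residual ** 2) / (2 * n)

def gradients(X, y, w, b):
    """Analytic gradients of the cost with respect to w and b."""
    n = X.shape[0]
    residual = X @ w + b - y
    dw = X.T @ residual / n
    db = np.sum(residual) / n
    return dw, db

def numerical_gradients(X, y, w, b, eps=1e-6):
    """Central finite differences, used only to sanity-check the formulas."""
    dw = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        dw[j] = (cost(X, y, w + step, b) - cost(X, y, w - step, b)) / (2 * eps)
    db = (cost(X, y, w, b + eps) - cost(X, y, w, b - eps)) / (2 * eps)
    return dw, db

# Synthetic data, made up purely for the check
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=50)
w, b = rng.normal(size=3), 0.3

dw, db = gradients(X, y, w, b)
dw_num, db_num = numerical_gradients(X, y, w, b)
print(np.allclose(dw, dw_num, atol=1e-5), np.isclose(db, db_num, atol=1e-5))
```

If the two sets of gradients disagree, the usual suspects are a missing 1/n factor or a sign flip in the residual.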
💻 Implementation
1. Data Preprocessing
To ensure faster convergence and stable optimization, the input features were standardized using Scikit-learn’s StandardScaler.
This was crucial: without feature scaling, the gradient descent updates would oscillate or diverge.
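For readers who want to see what standardization does numerically, here is a NumPy-only sketch of the same transformation (subtract the column mean, divide by the column standard deviation). The `standardize` helper, the train/test split, and the sample values are assumptions for illustration; the project itself used StandardScaler.

```python
import numpy as np

def standardize(X_train, X_test):
    """Scale both sets with the training set's mean and standard deviation."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)   # population std (ddof=0)
    std[std == 0] = 1.0         # leave constant columns unscaled
    return (X_train - mean) / std, (X_test - mean) / std

# Made-up values on very different scales (think horsepower vs. weight)
X_train = np.array([[130.0, 3504.0], [165.0, 3693.0], [95.0, 2372.0], [88.0, 2130.0]])
X_test = np.array([[110.0, 2800.0]])

X_train_s, X_test_s = standardize(X_train, X_test)
print(X_train_s.mean(axis=0))  # approximately 0 for each column
print(X_train_s.std(axis=0))   # approximately 1 for each column
```

Fitting the statistics on the training set only, then reusing them for the test set, keeps information from the test data out of training.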
📊 Data: Auto MPG Dataset
🎯 Goal: Predict a car’s fuel efficiency (mpg) using engine and car specs.
🔗 Source: Available in seaborn or directly via UCI ML repo.
2. Code
The core class LinearRegressionScratch contains:
fit(): for model training using gradient descent
predict(): for making predictions
update_params(): for applying gradients
Weights and bias are initialized to zeros and iteratively updated
over 10,000 iterations.
Here is the implementation of linear regression from scratch, broken down by method:
2.1 Initialization
import numpy as np

class LinearRegressionScratch:
    def __init__(self, learning_rate=0.01, n_iterations=10000):
        self.learning_rate = learning_rate  # step size for each gradient update
        self.n_iterations = n_iterations    # number of gradient descent steps
This initializes the model with a specified learning rate and number of iterations.
2.2 Training the Model (fit)
    def fit(self, X, y):
        self.n, self.m = X.shape  # n samples, m features
        self.w = np.zeros(self.m)
        self.b = 0
        self.X = X
        self.y = y
        self.losses = []
        for i in range(self.n_iterations):
            self.update_params()
            loss = self.compute_loss()
            self.losses.append(loss)
        return self
This method runs gradient descent and tracks the loss at every iteration.
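fit() relies on update_params(), which applies one gradient descent step. A minimal sketch of what such a step could look like, following the update rules from the theory section, is given below. It is written as a standalone function on a stand-in object so it runs on its own; the body and the toy data are assumptions, not the author's code.

```python
import numpy as np
from types import SimpleNamespace

def update_params(model):
    """One gradient descent step on model.w and model.b (hypothetical sketch)."""
    y_pred = model.X @ model.w + model.b
    residual = y_pred - model.y
    dw = model.X.T @ residual / model.n   # gradient with respect to the weights
    db = np.sum(residual) / model.n       # gradient with respect to the bias
    model.w = model.w - model.learning_rate * dw
    model.b = model.b - model.learning_rate * db

# Stand-in object carrying the attributes that fit() sets; the data is made up
X = np.array([[-1.0], [0.0], [1.0]])
y = np.array([1.0, 3.0, 5.0])             # exactly y = 2x + 3
model = SimpleNamespace(X=X, y=y, n=3, w=np.zeros(1), b=0.0, learning_rate=0.1)

for _ in range(500):
    update_params(model)
print(model.w, model.b)  # approaches [2.] and 3.0
```

Inside the class, the same body would use self in place of model.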
2.3 Compute Loss (compute_loss)
    def compute_loss(self):
        y_pred = self.predict(self.X)
        loss = (1/(2*self.n)) * np.sum((y_pred - self.y) ** 2)
        return loss
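Computes the loss over the training set at each iteration. Note the 1/2 factor: this is half the MSE, which simplifies the gradient without changing where the minimum lies.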
2.4 Prediction (predict)
    def predict(self, X):
        return np.dot(X, self.w) + self.b
Used for generating predictions on new data.
📉 Visualizing the Loss Curve
Monitoring loss during training was critical to detect divergence and confirm correct implementation of gradient descent.
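Alongside the plot, a small numeric check on the recorded losses can flag problems automatically. The helper below is a hypothetical sketch (`diagnose_losses` and its `tol` threshold are assumptions, and it expects at least two recorded losses); it would be called on the losses list that fit() builds.

```python
import math

def diagnose_losses(losses, tol=1e-8):
    """Classify a loss history as 'diverged', 'converged', or 'still decreasing'."""
    if any(not math.isfinite(l) for l in losses) or losses[-1] > losses[0]:
        return "diverged"
    if abs(losses[-2] - losses[-1]) < tol:
        return "converged"
    return "still decreasing"

# Made-up histories for illustration
print(diagnose_losses([10.0, 4.0, 2.0, 1.5]))               # still decreasing
print(diagnose_losses([10.0, 1.0, 0.5, 0.5]))               # converged
print(diagnose_losses([10.0, 40.0, 160.0, float("inf")]))   # diverged
```

A "diverged" result usually points to a learning rate that is too large or to unscaled features.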
⚡ Benchmark Comparison
Model | RMSE |
---|---|
Custom Scratch Model | 24.4751 |
Scikit-learn LinearRegression | 22.1532 |
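The custom implementation comes within about 10% of the RMSE of Scikit-learn's LinearRegression, a production-level tool. Because LinearRegression solves the least-squares problem directly, the remaining gap may reflect incomplete convergence or a difference in preprocessing rather than a flaw in the approach.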
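A useful reference point for this kind of benchmark: on a convex problem like this one, fully converged gradient descent should match the closed-form least-squares fit almost exactly. The sketch below illustrates that on synthetic, already-standardized data (not the Auto MPG dataset), using `np.linalg.lstsq` as a stand-in for the closed-form solver. The data, the `rmse` helper, and the setup are all assumptions for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Synthetic data with roughly unit-variance features
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = X @ np.array([3.0, -1.0, 0.5, 2.0]) + 23.0 + rng.normal(scale=2.0, size=200)

# Closed-form least squares: append a column of ones for the intercept
A = np.column_stack([X, np.ones(len(X))])
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
rmse_closed = rmse(y, A @ theta)

# Plain batch gradient descent with the article's defaults
w, b, lr = np.zeros(4), 0.0, 0.01
for _ in range(10_000):
    residual = X @ w + b - y
    w -= lr * (X.T @ residual) / len(X)
    b -= lr * residual.mean()
rmse_gd = rmse(y, X @ w + b)

print(rmse_closed, rmse_gd)
```

When the two numbers differ noticeably on real data, it is worth checking that both models see identically preprocessed inputs and the same train/test split before tuning the learning rate or iteration count.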
🧠 Key Takeaways
Gradient descent is extremely sensitive to feature scale.
Debugging gradient formulas built true understanding.
Visualizing loss is essential — it provides a heartbeat of the learning process.
Understanding convergence from first principles gives a deeper grasp than black-box usage.
What’s Next?
In the coming weeks:
Polynomial Regression (non-linearity from scratch)
Ridge and Lasso Regression (regularization from scratch)
Deriving and visualizing bias-variance trade-off
🏁 Conclusion
This project helped me internalize the fundamentals of optimization, modeling, and numerical learning — lessons that will compound as I scale deeper into advanced models.