Namita-AM/regularization-approaches-comprehensive-analysis

Regularized Statistical Machine Learning: Comparative Analysis

Overview

This project compares regularized statistical learning methods (Ridge, Lasso, and Elastic Net) across multiple machine learning paradigms. Using two distinct datasets, Boston Housing and WHO Life Expectancy, we examine how regularization techniques affect model performance, overfitting, and generalization across different algorithmic approaches.

The work showcases expertise in:

  • Multi-paradigm Analysis: Comparative study across OLS, SVM, and Neural Networks
  • Regularization Techniques: Ridge (L2), Lasso (L1), and Elastic Net (L1+L2) implementations
  • Real-world Applications: Crime prediction and health analytics use cases
  • Rigorous Evaluation: Statistical inference, cross-validation, and performance metrics

Dataset

1. Boston Housing Dataset

  • Target: Predict per capita crime rates in Boston suburbs
  • Features: 13 neighborhood characteristics (506 observations)
  • Challenges: Heavy right-skew, multicollinearity, outliers
  • Solution: Log-transformation and regularization techniques
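The log-transformation named above can be illustrated with a minimal sketch. It runs on synthetic right-skewed data rather than the actual crime-rate column, and the `skewness` helper and the lognormal sample are assumptions made for this example, not the project's code.

```python
import numpy as np

def skewness(x):
    """Sample skewness: the third standardized moment."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float((centered ** 3).mean() / (centered ** 2).mean() ** 1.5)

# Hypothetical stand-in for a strictly positive, heavily right-skewed target.
rng = np.random.default_rng(0)
crime_rate = rng.lognormal(mean=0.0, sigma=1.5, size=506)

log_crime_rate = np.log(crime_rate)   # fit models on this scale
recovered = np.exp(log_crime_rate)    # back-transform predictions the same way

print(f"skewness before log: {skewness(crime_rate):.2f}")
print(f"skewness after log:  {skewness(log_crime_rate):.2f}")
```

Models fitted on the log scale predict log rates, so errors should be reported either on that scale or after applying `np.exp` to the predictions.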

2. Life Expectancy Dataset

Source: WHO Life Expectancy Dataset - Kaggle

  • 2900+ rows, 20 predictors across health, demographic, and economic domains
  • Target: Life expectancy at birth (years)
  • Predictors include:
    • Health: Adult Mortality, Infant deaths, Immunisation coverage, HIV/AIDS, BMI
    • Demographics: Country, Year, Status (Developed/Developing)
    • Economics: GDP per capita, Schooling, Income composition of resources, Health expenditure
  • Data challenges addressed: Missing values handled with multi-level imputation (moving averages, country means, global medians)
  • Solution: Advanced imputation strategies and nonlinear modeling
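The multi-level imputation described above (moving averages, then country means, then global medians) could look roughly like the following dependency-free sketch. The `impute` function, the one-year window, and the toy records are assumptions for illustration; the project's actual window size and column handling are not stated in this README.

```python
from statistics import mean, median

def impute(rows, column, window=1):
    """Fill missing values (None) in `column` using three fallbacks:
    1. mean of the same country's observed values within `window` years,
    2. mean of all observed values for that country,
    3. median of all observed values in the dataset.
    Returns new records; the input list is left unchanged.
    """
    observed = [r for r in rows if r[column] is not None]
    global_median = median(r[column] for r in observed)
    by_country = {}
    for r in observed:
        by_country.setdefault(r["Country"], []).append(r)

    filled = []
    for r in rows:
        value = r[column]
        if value is None:
            same_country = by_country.get(r["Country"], [])
            nearby = [o[column] for o in same_country
                      if abs(o["Year"] - r["Year"]) <= window]
            if nearby:
                value = mean(nearby)
            elif same_country:
                value = mean(o[column] for o in same_country)
            else:
                value = global_median
        filled.append({**r, column: value})
    return filled

# Toy records (hypothetical values) exercising each fallback level.
rows = [
    {"Country": "A", "Year": 2000, "GDP": 10.0},
    {"Country": "A", "Year": 2001, "GDP": None},  # neighbours in 2000 and 2002
    {"Country": "A", "Year": 2002, "GDP": 14.0},
    {"Country": "A", "Year": 2005, "GDP": None},  # no neighbour: country mean
    {"Country": "B", "Year": 2000, "GDP": None},  # no country data: global median
    {"Country": "C", "Year": 2000, "GDP": 30.0},
]
filled = impute(rows, "GDP")
print([r["GDP"] for r in filled])  # [10.0, 12.0, 14.0, 12.0, 14.0, 30.0]
```

Ordering the fallbacks from most local to most global keeps each filled value as close as possible to the country and period it belongs to.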

Methods

Machine Learning Pipeline

Raw Data → Preprocessing → Feature Engineering → Model Training → Evaluation → Comparison

Models Implemented

1. Ordinary Least Squares (OLS)

  • Baseline unregularized linear regression
  • Statistical hypothesis testing
  • Cross-validation assessment
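As a rough illustration of the cross-validation assessment for the OLS baseline, the sketch below fits least squares with NumPy and reports out-of-fold RMSE. The helper names (`fit_ols`, `predict`, `kfold_rmse`), the five folds, and the synthetic data are assumptions for this example; the hypothesis-testing step is not shown.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients; the intercept is the first entry."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def kfold_rmse(X, y, k=5, seed=0):
    """Out-of-fold RMSE for each of k shuffled folds."""
    idx = np.random.default_rng(seed).permutation(len(X))
    scores = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)          # every index not in this fold
        coef = fit_ols(X[train], y[train])
        resid = y[fold] - predict(coef, X[fold])
        scores.append(float(np.sqrt(np.mean(resid ** 2))))
    return scores

# Synthetic data (hypothetical): three features, the last one irrelevant.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 2.0 + X @ np.array([1.5, -0.5, 0.0]) + rng.normal(scale=0.3, size=200)

coef = fit_ols(X, y)
scores = kfold_rmse(X, y)
print("coefficients:", np.round(coef, 2))
print("fold RMSE:   ", np.round(scores, 3))
```

The same `kfold_rmse` loop can score any model that exposes a fit and a predict step, which is what makes the unregularized baseline directly comparable with the regularized variants.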

2. Regularized Linear Regression

  • Ridge Regression: L2 penalty for coefficient shrinkage
  • Lasso Regression: L1 penalty for feature selection
  • Elastic Net: Combined L1+L2 approach
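The three penalties can be seen as one family. The sketch below uses coordinate descent on an Elastic Net objective, where `l1_ratio=0` reduces to Ridge and `l1_ratio=1` to Lasso. The parameterization with `alpha` and `l1_ratio`, the function names, and the synthetic data are assumptions for illustration, not necessarily the solver or settings used in the project.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; anything within [-t, t] becomes 0."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net(X, y, alpha=1.0, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for the objective
        (1 / 2n) * ||y - Xw||^2
        + alpha * l1_ratio * ||w||_1
        + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2
    Assumes X and y are centred, so no intercept is fitted.
    """
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = y.astype(float).copy()      # equals y - X @ w while w is zero
    for _ in range(n_iter):
        for j in range(p):
            residual += X[:, j] * w[j]     # remove feature j's contribution
            rho = X[:, j] @ residual / n
            w[j] = soft_threshold(rho, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1.0 - l1_ratio))
            residual -= X[:, j] * w[j]     # add the updated contribution back
    return w

# Synthetic data (hypothetical): only the first two features matter.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=n)
X = X - X.mean(axis=0)
y = y - y.mean()

ridge = elastic_net(X, y, alpha=0.5, l1_ratio=0.0)
lasso = elastic_net(X, y, alpha=0.5, l1_ratio=1.0)
enet = elastic_net(X, y, alpha=0.5, l1_ratio=0.5)

# Ridge also has a closed form, which serves as a sanity check here.
ridge_closed_form = np.linalg.solve(X.T @ X / n + 0.5 * np.eye(p), X.T @ y / n)

print("ridge:", np.round(ridge, 3))
print("lasso:", np.round(lasso, 3))
print("enet: ", np.round(enet, 3))
```

On this data Ridge shrinks every coefficient but keeps all of them non-zero, while Lasso sets the three irrelevant coefficients exactly to zero, which is the feature-selection behaviour noted above.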

3. Support Vector Machines

  • Support Vector Regression (SVR) with L2 regularization
  • Support Vector Classification (SVC) with multiple penalty types
  • Linear and RBF kernel comparisons
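To make the role of the L2 penalty in a linear SVM concrete, here is a minimal sketch that minimizes the regularized hinge loss by full-batch subgradient descent. It covers only the linear classification case; the RBF kernel and SVR variants are not shown. The function name, learning rate, and synthetic two-cluster data are assumptions, and this is not the solver a library SVM would use.

```python
import numpy as np

def train_linear_svc(X, y, lam=0.01, lr=0.1, n_epochs=200):
    """Minimise  lam / 2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b)))
    by full-batch subgradient descent. Labels in y must be -1 or +1.
    """
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_epochs):
        active = y * (X @ w + b) < 1.0     # points inside or beyond the margin
        grad_w = lam * w - X[active].T @ y[active] / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic data (hypothetical): two Gaussian clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=2.0, size=(100, 2)),
               rng.normal(loc=-2.0, size=(100, 2))])
y = np.concatenate([np.ones(100), -np.ones(100)])

w_weak, b_weak = train_linear_svc(X, y, lam=0.01)    # weak regularization
w_strong, b_strong = train_linear_svc(X, y, lam=1.0) # strong regularization

acc_weak = float(np.mean(np.sign(X @ w_weak + b_weak) == y))
print("training accuracy (lam=0.01):", acc_weak)
print("||w|| with lam=0.01:", round(float(np.linalg.norm(w_weak)), 3))
print("||w|| with lam=1.0: ", round(float(np.linalg.norm(w_strong)), 3))
```

A larger `lam` pulls the weight vector toward zero, which corresponds to a wider margin and a smoother decision function; many SVM libraries expose the inverse of this strength as a parameter usually called C.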

4. Multi-Layer Perceptron (MLP)

  • Deep neural networks with regularization
  • Comprehensive hyperparameter tuning
  • Performance optimization across regularization schemes
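As an illustration of how an L2 penalty enters neural network training, the sketch below trains a single-hidden-layer network with weight decay using plain NumPy. It is deliberately much smaller than the networks described above; the architecture, the `alpha` values, the learning rate, and the synthetic data are all assumptions for this example.

```python
import numpy as np

def train_mlp(X, y, hidden=16, alpha=1e-3, lr=0.02, n_epochs=1000, seed=0):
    """One-hidden-layer tanh network for regression, trained by full-batch
    gradient descent on
        0.5 * mean((prediction - y)^2) + 0.5 * alpha * (sum of squared weights)
    Biases are not penalized. Returns the loss history and final weight norm.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W1 = rng.normal(scale=0.5, size=(p, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    losses = []
    for _ in range(n_epochs):
        H = np.tanh(X @ W1 + b1)                    # hidden activations
        err = H @ W2 + b2 - y                       # prediction error
        penalty = 0.5 * alpha * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
        losses.append(float(0.5 * np.mean(err ** 2) + penalty))

        # Backpropagation; all gradients are computed before any update.
        g_out = err / n
        g_hidden = np.outer(g_out, W2) * (1.0 - H ** 2)
        grad_W2 = H.T @ g_out + alpha * W2
        grad_W1 = X.T @ g_hidden + alpha * W1

        W2 -= lr * grad_W2
        b2 -= lr * g_out.sum()
        W1 -= lr * grad_W1
        b1 -= lr * g_hidden.sum(axis=0)
    weight_norm = float(np.sqrt(np.sum(W1 ** 2) + np.sum(W2 ** 2)))
    return losses, weight_norm

# Synthetic data (hypothetical): a noisy sine curve.
rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(200, 1))
y = np.sin(2.0 * X[:, 0]) + rng.normal(scale=0.1, size=200)

losses_plain, norm_plain = train_mlp(X, y, alpha=0.0)
losses_decay, norm_decay = train_mlp(X, y, alpha=0.1)

print("alpha=0.0: final loss", round(losses_plain[-1], 4), "weight norm", round(norm_plain, 3))
print("alpha=0.1: final loss", round(losses_decay[-1], 4), "weight norm", round(norm_decay, 3))
```

The penalty strength `alpha` is one of the hyperparameters a tuning procedure would search over, typically alongside hidden-layer size and learning rate.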

Key Insights & Impact

  • Education & Income Matter: Schooling and income composition consistently emerged as strong positive predictors, reinforcing the importance of investing in education and economic equity to extend life expectancy.
  • Mortality Reduction is Critical: High infant, child, and adult mortality rates remain the most significant negative drivers. Interventions targeting child survival and disease prevention (e.g., HIV/AIDS) can have outsized effects.
  • Non-linear Interactions Exist: Life expectancy is not simply additive; factors like GDP and BMI show diminishing returns, which only the non-linear models could capture.
  • Policy Relevance: Results highlight that health outcomes are multi-dimensional, requiring holistic strategies across healthcare, education, and economic policy.
  • Analytics Impact: Regularization improved model reliability on noisy, real-world data, showing its practical role in building robust health prediction systems.
