Regularized Statistical Machine Learning: Comparative Analysis

Overview

This project compares regularized statistical learning methods (Ridge, Lasso, and Elastic Net) across several model families: ordinary least squares, support vector machines, and neural networks. Using two distinct datasets, we examine how regularization affects model performance, overfitting, and generalization in each family.

The work showcases expertise in:

Multi-paradigm Analysis: Comparative study across OLS, SVM, and Neural Networks
Regularization Techniques: Ridge (L2), Lasso (L1), and Elastic Net (L1+L2) implementations
Real-world Applications: Crime prediction and health analytics use cases
Rigorous Evaluation: Statistical inference, cross-validation, and performance metrics

Datasets

1. Boston Housing Dataset

Target: Per capita crime rate in Boston suburbs
Features: 13 neighborhood characteristics (506 observations)
Challenges: Heavy right-skew, multicollinearity, outliers
Solution: Log-transformation and regularization techniques
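
The log-transform-plus-regularization idea can be sketched as follows. This is a minimal illustration, not the notebooks' actual code: the data is a synthetic stand-in with the same shape as the Boston data (506 rows, 13 features), and the choice of `np.log`, Ridge, and `alpha=1.0` are assumptions made for the example.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in (assumption): 506 rows, 13 features, and a strictly
# positive, right-skewed (log-normal) target playing the role of crime rate.
X = rng.normal(size=(506, 13))
true_coef = rng.normal(scale=0.5, size=13)
y = np.exp(X @ true_coef + rng.normal(scale=0.3, size=506))

# Fit Ridge on log(y); predictions are mapped back to the original scale
# with exp, so they stay positive.
model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    func=np.log,
    inverse_func=np.exp,
)
model.fit(X, y)
pred = model.predict(X)
```

Wrapping the transform in `TransformedTargetRegressor` keeps the log/exp round trip inside the estimator, so cross-validation scores are computed on the original scale of the target.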

2. Life Expectancy Dataset

Source: WHO Life Expectancy Dataset (Kaggle)

2900+ rows, 20 predictors across health, demographic, and economic domains
Target: Life expectancy at birth (years)
Predictors include:
- Health: Adult Mortality, Infant deaths, Immunisation coverage, HIV/AIDS, BMI
- Demographics: Country, Year, Status (Developed/Developing)
- Economics: GDP per capita, Schooling, Income composition of resources, Health expenditure
Challenges: Missing values, handled with multi-level imputation (moving averages, country means, global medians)
Solution: Multi-level imputation and nonlinear modeling
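
One way to read "multi-level imputation" is as a chain of fallbacks, from most local to most global. The sketch below assumes that ordering (moving average within a country, then the country mean, then the global median). The function name `impute_multilevel`, the window size, and the column name `GDP` are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

def impute_multilevel(df, column, window=3):
    """Fill gaps in `column` in three passes, from most local to most global."""
    out = df.sort_values(["Country", "Year"]).copy()

    # 1) Centered moving average over neighbouring years within each country.
    rolling = out.groupby("Country")[column].transform(
        lambda s: s.rolling(window, min_periods=1, center=True).mean()
    )
    out[column] = out[column].fillna(rolling)

    # 2) Country mean for values that are still missing.
    country_mean = out.groupby("Country")[column].transform("mean")
    out[column] = out[column].fillna(country_mean)

    # 3) Global median as the last resort (e.g. a country with no data at all).
    out[column] = out[column].fillna(out[column].median())
    return out

# Toy data: country A has one gap, country B has no observations.
df = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "C"],
    "Year": [2000, 2001, 2002, 2000, 2001, 2000],
    "GDP": [1.0, np.nan, 3.0, np.nan, np.nan, 10.0],
})
filled = impute_multilevel(df, "GDP")
```

In the toy data, the gap in country A is filled from its neighbouring years, while country B falls through to the global median because it has nothing of its own to average.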

Methods

Machine Learning Pipeline

Raw Data → Preprocessing → Feature Engineering → Model Training → Evaluation → Comparison
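
The stages above map naturally onto a scikit-learn `Pipeline`. The following is a rough sketch on synthetic data. The specific steps (median imputation, degree-2 polynomial features, Elastic Net with `alpha=0.01`) are assumptions for illustration and may differ from what the notebooks use.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Raw data (synthetic, assumption): linear signal plus 5% missing entries.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=300)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),                 # preprocessing
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),   # feature engineering
    ("scale", StandardScaler()),                                  # preprocessing
    ("model", ElasticNet(alpha=0.01, l1_ratio=0.5)),              # model training
])
pipeline.fit(X_train, y_train)

# Evaluation on held-out data; repeating this per model gives the comparison.
test_r2 = r2_score(y_test, pipeline.predict(X_test))
```

Keeping every step inside the pipeline means the imputer and scaler are fitted on training data only, which avoids leaking test-set statistics into the model.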

Models Implemented

1. Ordinary Least Squares (OLS)

Baseline unregularized linear regression
Statistical hypothesis testing
Cross-validation assessment
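
As an illustration of the baseline, the sketch below fits OLS on synthetic data, computes classical two-sided t-tests for each coefficient, and scores the model with 5-fold cross-validation. The data, the fold count, and the manual t-test computation are assumptions for the example. The notebooks may rely on a dedicated statistics package instead.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] + rng.normal(size=n)  # only the first feature carries signal

# OLS with an explicit intercept column.
Xd = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)

# Standard errors from the residual variance, then t-statistics and p-values
# for H0: coefficient = 0.
resid = y - Xd @ beta
dof = n - Xd.shape[1]
sigma2 = resid @ resid / dof
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
t_stats = beta / se
p_values = 2 * stats.t.sf(np.abs(t_stats), dof)

# Out-of-sample R^2 across 5 shuffled folds.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_r2 = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")
```

The p-values describe in-sample evidence for each coefficient, while the cross-validated scores estimate predictive performance. The two answer different questions, which is why the baseline reports both.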

2. Regularized Linear Regression

Ridge Regression: L2 penalty for coefficient shrinkage
Lasso Regression: L1 penalty for feature selection
Elastic Net: Combined L1+L2 approach
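
The practical difference between the three penalties shows up in the fitted coefficients: the L1 term can set coefficients exactly to zero, while the L2 term only shrinks them. A small sketch on synthetic data follows. The penalty strengths are arbitrary values chosen for the example, not tuned settings from this project.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(200, 10)))
# Only the first three of ten features carry signal (assumption).
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(scale=0.5, size=200)

models = {
    "ridge": Ridge(alpha=10.0),                          # L2 penalty
    "lasso": Lasso(alpha=0.2),                           # L1 penalty
    "elastic_net": ElasticNet(alpha=0.2, l1_ratio=0.5),  # mix of L1 and L2
}

# Count coefficients that each model sets exactly to zero.
n_zero = {}
for name, model in models.items():
    model.fit(X, y)
    n_zero[name] = int(np.sum(model.coef_ == 0.0))
```

Comparing `n_zero` across the three models makes the feature-selection effect of the L1 term visible. In scikit-learn's Elastic Net, `l1_ratio` sets the balance between the two penalties.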

3. Support Vector Machines

Support Vector Regression (SVR) with L2 regularization
Support Vector Classification (SVC) with multiple penalty types
Linear and RBF kernel comparisons
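
A kernel comparison for SVR might look like the sketch below. The sine-shaped synthetic target and the values of `C` and `epsilon` are assumptions for illustration. In scikit-learn's `SVR`, regularization strength is inversely proportional to `C`.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic non-linear relationship (assumption): y = sin(x) + noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

# Mean 5-fold R^2 for each kernel, with identical C and epsilon.
scores = {}
for kernel in ("linear", "rbf"):
    model = make_pipeline(StandardScaler(), SVR(kernel=kernel, C=1.0, epsilon=0.1))
    scores[kernel] = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
```

On a curved target like this one, the RBF kernel has room to follow the shape that a linear kernel cannot, which is the kind of gap the kernel comparison is meant to expose.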

4. Multi-Layer Perceptron (MLP)

Deep neural networks with regularization
Comprehensive hyperparameter tuning
Performance optimization across regularization schemes
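
Tuning the regularization strength of an MLP can be sketched with a grid search. This example covers only the L2 penalty that scikit-learn's `MLPRegressor` exposes through `alpha`. The architecture, the grid values, and the synthetic data are assumptions, and other regularization schemes would need a different library or custom code.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic non-linear target (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = np.tanh(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("mlp", MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)),
])

# Search over the L2 penalty strength only; a fuller search would also
# vary layer sizes and the learning rate.
param_grid = {"mlp__alpha": [1e-4, 1e-2, 1.0]}
search = GridSearchCV(pipe, param_grid, cv=3, scoring="neg_mean_squared_error")
search.fit(X, y)
best_alpha = search.best_params_["mlp__alpha"]
```

After fitting, `search.cv_results_` holds the per-`alpha` scores, which is the table one would inspect to compare regularization settings.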

Key Insights & Impact

Education & Income Matter: Schooling and income composition consistently emerged as strong positive predictors, reinforcing the importance of investing in education and economic equity to extend life expectancy.
Mortality Reduction is Critical: High infant, child, and adult mortality rates remain the most significant negative drivers. Interventions targeting child survival and disease prevention (e.g., HIV/AIDS) can have outsized effects.
Non-linear Interactions Exist: Life expectancy is not simply additive. Factors such as GDP and BMI show diminishing returns, which only the non-linear models could capture.
Policy Relevance: Results highlight that health outcomes are multi-dimensional, requiring holistic strategies across healthcare, education, and economic policy.
Analytics Impact: Regularization improved model reliability on noisy, real-world data, showing its practical role in building robust health prediction systems.
