ngwanelegacie/README.md



About Me

I turn messy datasets into decisions — and I explain them so anyone can follow.

I'm Vusumuzi Nkosi, a data scientist with 9+ years as a mathematics lecturer who pivoted into ML and analytics. That teaching background isn't a footnote — it's the thing that makes me different. I don't just build models; I communicate what they mean.

I am currently completing a Postgraduate Diploma in Data Science (NQF 8) at STADIO and have graduated from the ALX Data Science Programme. I work at the intersection of data science and finance, building end-to-end pipelines on real-world data.

Quick Facts

  • Education: PGDip Data Science @ STADIO (NQF 8) | ALX DS Graduate | BSc Maths & CS
  • Location: Newcastle, KZN, South Africa
  • Focus: Credit Risk | Financial Analytics | Machine Learning
  • Building: Portfolio projects with real data, real scale, real business impact
  • Open to: Data Scientist, Data Analyst, or ML roles (South Africa / remote)

Tech Stack

Languages & Core

Python · SQL · R · Excel

ML & Data Science

Scikit-learn · XGBoost · Pandas · NumPy · SMOTE

Visualisation & BI

Power BI · Tableau · Matplotlib · Seaborn · Plotly

Tools

Jupyter · Git · GitHub · VS Code · Google Colab · AWS SageMaker


Featured Projects

1. Lending Club Loan Default Analysis

2.26M real loans | AUC 0.72 | ~$95M annual impact

End-to-end credit risk pipeline on real Lending Club data (2007–2018). I designed the SQL schema, reduced 151 columns to 33 features, ran a full EDA, trained Logistic Regression, Random Forest, and Gradient Boosting models, and quantified ~$95M/year in preventable losses. This is the project that shows how I work at scale.

| Metric | Value |
| --- | --- |
| Dataset | 2,260,701 loans |
| Best model | Gradient Boosting (AUC 0.72) |
| Top feature | sub_grade (23.3% importance) |
| Business impact | ~$95M annual loss prevention |

Tech: Python · SQL · scikit-learn · Streamlit · Excel

View Project · Live Demo
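For readers unfamiliar with the headline metric, here is a minimal sketch of what an AUC value measures: the probability that a randomly chosen defaulted loan receives a higher risk score than a randomly chosen repaid loan. The `roc_auc` helper and the toy labels and scores are illustrative assumptions, not code or data from the project.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC via pairwise comparison.

    AUC equals the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = defaulted, 0 = repaid.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.4, 0.5, 0.1, 0.7]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"Toy AUC: {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so 0.72 means the model ranks a defaulter above a non-defaulter in roughly 72% of such pairs. The pairwise loop is quadratic and only suitable for small illustrations.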


2. US Household Income SQL Analysis

Advanced SQL | Window functions, CTEs, aggregations

Cleaned and analysed household income data across US states and counties, uncovering regional income trends and demographic patterns using joins, window functions, CTEs, and multi-level aggregations.

Tech: SQL · MySQL

View Project
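As a rough illustration of the CTE plus window-function pattern mentioned above, the sketch below averages income per state in a CTE and then ranks the states with `RANK()`. It uses Python's built-in `sqlite3` as a stand-in for MySQL (window functions need SQLite 3.25 or newer), and the `household_income` table, its columns, and every value in it are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical table and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE household_income (state TEXT, county TEXT, median_income INTEGER)"
)
conn.executemany(
    "INSERT INTO household_income VALUES (?, ?, ?)",
    [
        ("Alpha", "A1", 50000),
        ("Alpha", "A2", 70000),
        ("Beta", "B1", 40000),
        ("Beta", "B2", 44000),
        ("Gamma", "G1", 90000),
    ],
)

# The CTE aggregates county rows to one row per state;
# the window function then ranks states by that average.
query = """
WITH state_avg AS (
    SELECT state, AVG(median_income) AS avg_income
    FROM household_income
    GROUP BY state
)
SELECT state,
       avg_income,
       RANK() OVER (ORDER BY avg_income DESC) AS income_rank
FROM state_avg
ORDER BY income_rank
"""
ranked = conn.execute(query).fetchall()
conn.close()

for state, avg_income, income_rank in ranked:
    print(income_rank, state, avg_income)
```

Splitting the aggregation into a CTE keeps the ranking step readable and makes it easy to add further window columns later, such as a running total or a comparison with the previous state.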


3. World Life Expectancy SQL Analysis

190+ countries | Health trends | Economic correlations

Investigated correlations between life expectancy and health variables across 190+ countries. SQL data cleaning and exploratory analysis examining trends over time and links to economic indicators.

Tech: SQL · MySQL

View Project
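To make the idea of a correlation between life expectancy and an economic indicator concrete, here is a small sketch of the Pearson correlation coefficient. The project itself was done in SQL; the `pearson` helper and the numbers below are invented for illustration and are not results from the analysis.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: an economic index and life expectancy in years.
toy_gdp_index = [1, 2, 3, 4, 5]
toy_life_expectancy = [50, 60, 65, 72, 78]
toy_r = pearson(toy_gdp_index, toy_life_expectancy)
print(f"Toy correlation: {toy_r:.3f}")
```

A value near +1 indicates that the two variables rise together, a value near -1 that one falls as the other rises, and a value near 0 that there is no linear relationship. Correlation alone does not establish that one variable causes the other.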


4. Sales Performance Dashboard & Analysis

Excel | Interactive dashboards | Business KPIs

Sales performance analysis with interactive dashboards, pivot tables, conditional formatting, and visual reporting. Covers data cleaning, trend identification, and stakeholder-ready business metrics.

Tech: Excel · Pivot Tables

View Project


Education & Certifications

| Qualification | Institution | Status |
| --- | --- | --- |
| Postgraduate Diploma in Data Science (NQF 8) | STADIO Higher Education | In Progress (2026) |
| Data Science Programme | ALX Africa | Completed |
| BSc Mathematics & Computer Science | University of South Africa | Completed |
| BEd Mathematics & Computer Science | University of South Africa | Completed |

| Certification | Issuer |
| --- | --- |
| Data Science Certificate | ALX Africa |
| Data Analyst Certificate (SQL, Power BI, A/B Testing) | ALX Africa |
| AI Engineering Bootcamp | Zero To Mastery Academy |
| Prompt Engineering Bootcamp | Zero To Mastery Academy |


My Journey

2026          PGDip Data Science (NQF 8) — STADIO Higher Education
               └─ Intro to Data Science · Working with Data · Statistical Modelling · Applied ML

2025-2026     ALX Data Science Programme — Graduated
               └─ Professional Foundations · Data Analytics · Python · Machine Learning

2025          Lending Club Analysis — 2.26M loans, AUC 0.72, ~$95M annual impact
               └─ The project that proved I can work at scale with real data

2025          Certifications — ALX Data Analyst · ZTM AI Engineering · ZTM Prompt Engineering

2022-2025     BSc Mathematics & Computer Science — UNISA

2017-Present  Senior Lecturer — Mathematics & IT, Amajuba TVET College
               └─ 9+ years teaching 100+ students/year → communication superpower

2012-2017     BEd Mathematics & Computer Science — UNISA

2012-2016     Special Educator — Mathematics, Mdumiseni High School
               └─ Where it all started

Connect

Portfolio · Email · LinkedIn



"Data is the new oil, but insight is the engine that powers progress."
