Stephen Marn · Completed August 2023 · MIT IDSS / Great Learning
A curated portfolio of 31 data-science projects spanning exploratory analysis, statistical inference, supervised & unsupervised machine learning, deep learning (CNNs), recommender systems, and network analysis. Each project includes a written walk-through, an accurate methodology diagram, and the visualizations produced from the analysis.
A themed, browsable version of every project is published with GitHub Pages from the `docs/` folder:
`https://<your-username>.github.io/<repo-name>/` (enable it under Settings → Pages → Source: `/docs`).
| Project | Summary |
|---|---|
| CardioGood Fitness: Treadmill Customer Profiling | Descriptive analytics to build a buyer profile for each of three treadmill product lines |
| FIFA World Cup Analysis | Mining 80+ years of World Cup history to guide a new football club's strategy |
| Uber NYC Trip Demand Analysis | Exploring six months of NYC Uber pickups to understand when and where ride demand peaks |
| Project | Summary |
|---|---|
| CAVIAR Criminal Network Analysis | Tracking how a Montreal drug-trafficking network reorganized under repeated police seizures |
| Clustering Countries by Socio-Economic Profile | Grouping 167 nations by health, trade, and income indicators to guide aid and development decisions |
| Enron Email Network Analysis | Using social network analysis to surface the key actors in Enron's senior leadership |
| Genomic Data Clustering: Decoding the Genetic Code | Using unsupervised learning to rediscover that DNA reads in three-letter codons |
| Unsupervised Pattern Discovery with PCA and t-SNE | Compressing high-dimensional education and air-pollution data to reveal hidden structure |
| Project | Summary |
|---|---|
| BigMart Sales Prediction | Predicting item-level outlet sales with interpretable linear regression |
| Employee Attrition Prediction | Why employees leave — and predicting who is at risk |
| Predicting Hospital Length of Stay | Forecasting patient length-of-stay at admission to help HealthPlus plan beds, staff, and resources |
| SuperKart Retail Sales Forecasting | Predicting per-product store sales for the upcoming quarter with linear regression |
| Project | Summary |
|---|---|
| Bitcoin Price Prediction | Forecasting monthly Bitcoin closing prices with classical time-series models |
| Celestial Object Detection | Classifying stars, galaxies, and quasars from Sloan Digital Sky Survey photometry |
| Predicting Employee Attrition at McCurr Health Consultancy | An end-to-end classification pipeline to flag at-risk employees before they leave |
| Predicting Hospital Length of Stay for HealthPlus | A deployable regression model that forecasts patient length of stay at admission to plan beds, staff, and resources |
| Predicting Hotel Booking Cancellations for INN Hotels | Using tree-based classifiers to flag at-risk bookings before they cancel |
| Project | Summary |
|---|---|
| Audio MNIST: Spoken-Digit Recognition with a Neural Network | Classifying spoken digits 0-9 from raw .wav audio using MFCC features and a Keras ANN |
| CIFAR-10 Image Classification with CNNs | Classifying 32x32 color images into 10 object classes with convolutional neural networks and transfer learning |
| COVID-19 Chest X-Ray Classification | A CNN decision-aid that triages chest X-rays into COVID, Normal, and Viral Pneumonia |
| Citation Network Classification with Graph Neural Networks | Predicting a paper's research topic from how it cites other papers, using a GCN on the Cora dataset |
| Food Image Classification with CNNs | Teaching a convolutional neural network to tell Bread, Soup, and Vegetable-Fruit apart |
| Movie Recommendation with Graph Neural Networks | Learning movie embeddings from co-viewing patterns on MovieLens to suggest the next film to watch |
| Predicting Employee Attrition with Deep Learning | An artificial neural network that flags which data scientists are likely to switch jobs |
| Predicting Graduate Admission Chances | A neural network that flags which applicants are likely to be admitted to UCLA |
| Rice Type Classification with CNNs | Sorting five rice varieties from magnified grain images using deep learning |
| Project | Summary |
|---|---|
| Book Recommendation System | Comparing rank-based, collaborative filtering, and matrix factorization approaches to recommend books |
| Building a Product Recommender at Scale | Comparing rank-based and collaborative-filtering recommenders on millions of user ratings |
| MovieLens Movie Recommendation System | Recommending relevant movies from user rating history with popularity, collaborative filtering, and SVD |
| Yelp Restaurant Recommendation System | Recommending restaurants from Yelp reviews using both collaborative filtering and content-based NLP |
| Project | Summary |
|---|---|
| Used Cars Price Prediction (Capstone) | Pricing 7,253 used cars for Cars4U with regression on log price |
- Each `<module>/<project>/` folder holds the notebook(s), a `README.md`, a `requirements.txt`, and a `figures/` folder.
- `docs/` is the GitHub Pages site: one themed page per project plus an index landing page.
- Datasets and course materials are intentionally not committed (see Attribution).
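The layout described above can be checked mechanically. The following is a minimal sketch, not a script that ships with this repository. The names `missing_items` and `check_layout` are hypothetical, and it assumes that notebooks are `.ipynb` files at the top level of each project folder and that every top-level folder other than `docs/` and hidden folders is a module.

```python
from pathlib import Path

# Items each project folder is described as holding, besides its notebook(s).
EXPECTED_FILES = ("README.md", "requirements.txt")
EXPECTED_DIRS = ("figures",)


def missing_items(project_dir: Path) -> list[str]:
    """Return the expected items that are absent from one project folder."""
    missing = [name for name in EXPECTED_FILES if not (project_dir / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (project_dir / name).is_dir()]
    # Assumption: notebooks are Jupyter .ipynb files at the top of the folder.
    if not any(project_dir.glob("*.ipynb")):
        missing.append("*.ipynb")
    return missing


def check_layout(repo_root: Path) -> dict[str, list[str]]:
    """Map each incomplete <module>/<project> folder to its missing items."""
    problems: dict[str, list[str]] = {}
    for project_dir in sorted(p for p in repo_root.glob("*/*") if p.is_dir()):
        module = project_dir.parent.name
        # Skip the GitHub Pages site and hidden folders such as .git.
        if module == "docs" or module.startswith(".") or project_dir.name.startswith("."):
            continue
        missing = missing_items(project_dir)
        if missing:
            problems[project_dir.relative_to(repo_root).as_posix()] = missing
    return problems


if __name__ == "__main__":
    # Run from the repository root; prints nothing when every folder is complete.
    for folder, missing in check_layout(Path(".")).items():
        print(f"{folder}: missing {', '.join(missing)}")
```

Run from the repository root, the script lists only the project folders that lack one of the expected items, so an empty result suggests the layout matches the description.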
These projects were completed as part of the MIT Applied Data Science Program (MIT IDSS / Great Learning). The program provided the case-study scaffolding and data; the analysis, code, and results are my own. Course materials, provided solution notebooks, and program datasets are not redistributed here. This work is published with permission, for portfolio use only.
© Stephen Marn — all rights reserved. This repository is shared as a portfolio showcase, not as an open-source template: it carries no software license, so the code is not granted for reuse, copying, or redistribution.
If you are currently enrolled in this or a similar program, do not copy this work or submit it as your own — doing so violates your program's honor code. Learn from the approach, then write your own.