
gdonald/ML


Machine Learning

This repo contains my machine learning projects.

Projects

Probabilistic graphical model using discrete Bayesian networks to predict weather conditions across 36 US cities. Infers temperature, humidity, pressure, and wind categories from time of day and seasonal patterns. Implements both hand-specified and learned network structures with pgmpy.
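To illustrate the kind of query such a network answers, here is a small example of exact inference by enumeration on a hypothetical three-node chain (Season → Temperature → Humidity). The node names follow the README, but the structure and every probability are made-up placeholders, not values from the project. On a real model, pgmpy's `VariableElimination(model).query(variables=[...], evidence={...})` performs this kind of query.

```python
# Hypothetical chain: Season -> Temperature -> Humidity.
# All probabilities are illustrative placeholders, not learned from the project's data.
P_TEMP_GIVEN_SEASON = {
    "summer": {"hot": 0.70, "mild": 0.25, "cold": 0.05},
    "winter": {"hot": 0.05, "mild": 0.35, "cold": 0.60},
}
P_HUMID_GIVEN_TEMP = {
    "hot": {"high": 0.6, "low": 0.4},
    "mild": {"high": 0.5, "low": 0.5},
    "cold": {"high": 0.3, "low": 0.7},
}


def humidity_given_season(season):
    """P(Humidity | Season), summing out the hidden Temperature node."""
    result = {"high": 0.0, "low": 0.0}
    for temp, p_temp in P_TEMP_GIVEN_SEASON[season].items():
        for humid, p_humid in P_HUMID_GIVEN_TEMP[temp].items():
            result[humid] += p_temp * p_humid
    return result


def season_given_humidity(humid, prior):
    """Reverse query P(Season | Humidity) via Bayes' rule and normalization."""
    joint = {s: prior[s] * humidity_given_season(s)[humid] for s in prior}
    total = sum(joint.values())
    return {s: p / total for s, p in joint.items()}


print(humidity_given_season("summer"))
print(season_given_humidity("high", {"summer": 0.5, "winter": 0.5}))
```

With a uniform prior, observing high humidity shifts the posterior toward summer (about 0.59), because the placeholder tables make summers hotter and hot days more humid.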

CIFAR-10

  • Corrupted Dataset Analysis

    Initial exploration of the CIFAR-10-C dataset, analyzing 19 corruption types across 5 severity levels. Extracts and organizes 950k corrupted images (19 corruptions × 5 severities × 10,000 test images) with detailed data summaries and class distribution analysis.

  • Model Training

    Compact CNN with batch normalization and data augmentation, trained for 60 epochs with learning rate scheduling to 63.8% test accuracy. Visualizes training curves and includes a full model summary.

  • Advanced Training

    Improved 3-layer CNN with dropout regularization, cosine annealing learning rate schedule, and early stopping. Achieves 84.2% test accuracy with confusion matrix analysis showing strong performance across all 10 classes.
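The cosine annealing schedule and early stopping used for the advanced model can be sketched in plain Python. This sketch assumes a single cosine cycle without warm restarts, and early stopping on validation loss with a patience counter. The defaults (`lr_max=0.1`, `patience=5`) are placeholders, not the project's settings. The learning rate formula matches the closed form documented for PyTorch's `CosineAnnealingLR`.

```python
import math


def cosine_annealing_lr(epoch, total_epochs, lr_max=0.1, lr_min=0.0):
    """Learning rate at `epoch` (0-based) for one cosine cycle from lr_max down to lr_min."""
    progress = epoch / total_epochs
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))


class EarlyStopping:
    """Signals a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, set the optimizer's learning rate from `cosine_annealing_lr(epoch, total_epochs)` at the start of each epoch. Break out of the loop once `stopper.step(val_loss)` returns `True`, typically restoring the weights saved at the best epoch.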

Fine-tuned DistilBERT model for binary sentiment classification of movie reviews. Implements early stopping, sequence length optimization (320 tokens), and decision-threshold tuning. Achieves 92.2% accuracy and an F1 score of 0.9219 on 25k test reviews.
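Threshold tuning means choosing the probability cutoff for the positive class that maximizes a metric on held-out data, rather than defaulting to 0.5. The sketch below assumes the target metric is F1 and that the model's positive-class probabilities are already available. The candidate grid is an arbitrary choice. The threshold should be selected on a validation split, not on the test reviews, so that the reported test score stays unbiased.

```python
def f1_at_threshold(probs, labels, threshold):
    """F1 score when predicting positive for every probability >= threshold."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(probs, labels, candidates=None):
    """Return the candidate cutoff with the highest F1 (the first one wins ties)."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    return max(candidates, key=lambda t: f1_at_threshold(probs, labels, t))
```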

Ensemble of three CNN architectures for Japanese Hiragana character recognition. Uses stacking with a logistic regression meta-learner and k-fold cross-validation. The final ensemble achieves 96.05% accuracy with a log loss of 0.1942 on the 10-class task.
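In stacking, the meta-learner is trained on out-of-fold predictions. Each row's base-model probabilities come from models that never saw that row during fitting. The sketch below shows one way to build those meta-features together with the multi-class log loss metric. It uses contiguous, unshuffled folds for brevity. `class_prior_model` is a hypothetical stand-in for the three CNNs, not part of the project. In scikit-learn, `cross_val_predict(..., method="predict_proba")` builds a similar out-of-fold matrix, and `LogisticRegression` can then be fit on it as the meta-learner.

```python
import math


def kfold_indices(n, k):
    """Split range(n) into k contiguous folds (no shuffling, to keep the sketch short)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_features(X, y, base_models, k=5):
    """Stacking meta-features: each row's predictions come from base models
    fitted on the other k-1 folds, so no prediction is made on training data."""
    meta = [[] for _ in X]
    for held_out in kfold_indices(len(X), k):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        for fit in base_models:
            predict = fit(X_tr, y_tr)  # fit returns predict(x) -> class probabilities
            for i in held_out:
                meta[i].extend(predict(X[i]))
    return meta


def multiclass_log_loss(probs, labels, eps=1e-15):
    """Mean negative log-probability assigned to the true class (clipped for stability)."""
    total = 0.0
    for p, label in zip(probs, labels):
        total -= math.log(min(max(p[label], eps), 1 - eps))
    return total / len(labels)


def class_prior_model(n_classes):
    """Hypothetical stand-in for a CNN: ignores its input and predicts
    the class frequencies of its training folds."""
    def fit(X_tr, y_tr):
        counts = [0] * n_classes
        for label in y_tr:
            counts[label] += 1
        probs = [c / len(y_tr) for c in counts]
        return lambda x: probs
    return fit
```

Each row of the resulting matrix concatenates the class probabilities from every base model, so three 10-class CNNs would give the meta-learner 30 input features per image.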

Real-time sentiment analysis pipeline for Mastodon posts using Twitter-RoBERTa. Fetches 500 English posts from the mastodon.social public timeline, preprocesses them with spaCy (including feature extraction), and classifies sentiment with confidence scores.
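Statuses returned by the Mastodon REST API store HTML in their `content` field and include an optional `language` code. Collecting English posts therefore means paging through `GET /api/v1/timelines/public` with `max_id` (at most 40 statuses per request via `limit`) and stripping tags before the text reaches spaCy. The sketch below shows one way a fetcher could work; it is not the project's code. The fetch function is injectable so the pagination logic can be tested offline. Some instances require an access token for this endpoint.

```python
import json
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes; entities like &amp; are decoded by HTMLParser itself."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join text nodes with spaces (so <br> and </p> separate words), then collapse whitespace.
    return " ".join(" ".join(parser.parts).split())


def fetch_public_page(instance, max_id=None, limit=40):
    """One page of the public timeline, newest first."""
    params = {"limit": limit}
    if max_id is not None:
        params["max_id"] = max_id
    url = f"https://{instance}/api/v1/timelines/public?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def collect_english_posts(target=500, instance="mastodon.social", fetch_page=fetch_public_page):
    """Page backwards through the timeline until `target` English posts are collected."""
    posts, max_id = [], None
    while len(posts) < target:
        page = fetch_page(instance, max_id=max_id)
        if not page:
            break
        for status in page:
            if status.get("language") == "en":
                text = html_to_text(status.get("content") or "")
                if text:
                    posts.append(text)
        max_id = page[-1]["id"]  # continue from the oldest status seen so far
    return posts[:target]
```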

Regression model predicting Connecticut school district attendance rates using HistGradientBoostingRegressor. Analyzes attendance patterns across student demographics and identifies at-risk districts through clustering and visualization.
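The README does not say which clustering algorithm or features are used to find at-risk districts. As a hedged illustration only, the sketch below assumes plain k-means run on attendance rate alone and flags the districts in the lowest-attendance cluster. The district names and rates in any call are hypothetical.

```python
def _nearest(value, centroids):
    """Index of the centroid closest to `value`."""
    return min(range(len(centroids)), key=lambda j: abs(value - centroids[j]))


def kmeans_1d(values, k=3, iters=100):
    """Lloyd's k-means on scalar values."""
    if k == 1:
        return [sum(values) / len(values)]
    ordered = sorted(values)
    # Initialize centroids at evenly spaced order statistics (min, ..., max).
    centroids = [ordered[(len(ordered) - 1) * j // (k - 1)] for j in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[_nearest(v, centroids)].append(v)
        # Keep the old centroid if a cluster ends up empty.
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def flag_at_risk(rates, k=3):
    """Names of districts assigned to the cluster with the lowest mean attendance."""
    centroids = kmeans_1d(list(rates.values()), k)
    lowest = min(range(k), key=lambda j: centroids[j])
    return sorted(name for name, r in rates.items() if _nearest(r, centroids) == lowest)
```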

Binary classification predicting passenger survival on the Titanic using RandomForestClassifier. Engineered features include family size and passenger title, with categorical variables one-hot encoded. Achieves 81.7% accuracy after hyperparameter tuning with GridSearchCV.
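Family size and title are usually derived from the Kaggle Titanic columns `SibSp`, `Parch`, and `Name` (for example, "Braund, Mr. Owen Harris" yields the title "Mr"). The sketch below assumes that column layout. Grouping uncommon titles into a single `Rare` bucket is a common convention, not necessarily what this project does. In practice, `pandas.get_dummies` or scikit-learn's `OneHotEncoder` handles the encoding step.

```python
import re

# The title sits between the comma after the surname and the next period.
TITLE_PATTERN = re.compile(r",\s*([^.]+)\.")
COMMON_TITLES = {"Mr", "Mrs", "Miss", "Master"}


def extract_title(name):
    """Return a common title, or 'Rare' for anything else (e.g. 'the Countess')."""
    match = TITLE_PATTERN.search(name)
    title = match.group(1).strip() if match else "Rare"
    return title if title in COMMON_TITLES else "Rare"


def engineer(row):
    """Derive family size (siblings/spouses + parents/children + self) and title."""
    return {
        "FamilySize": row["SibSp"] + row["Parch"] + 1,
        "Title": extract_title(row["Name"]),
    }


def one_hot(values):
    """Return sorted categories and one 0/1 indicator row per value."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]
```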

Reinforcement learning experiments using Proximal Policy Optimization (PPO) to train agents for Reversi/Othello. Runs 32 parallel environments with mixed-opponent sparring against heuristic and random bots. The agents are not very strong yet.
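At the core of PPO is the clipped surrogate objective. It limits how far a single update can move the policy by clipping the probability ratio r = π_new(a|s) / π_old(a|s) to [1 − ε, 1 + ε] and taking the more pessimistic of the clipped and unclipped terms. The sketch below computes the loss for one batch. It assumes advantages are already computed (for example with generalized advantage estimation) and uses ε = 0.2, the value used in the original PPO paper. This is not the project's code.

```python
def ppo_clip_loss(ratios, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective, averaged over a batch.

    ratios[i] is pi_new(a_i | s_i) / pi_old(a_i | s_i); advantages[i] is A_i.
    """
    total = 0.0
    for r, adv in zip(ratios, advantages):
        clipped = min(max(r, 1 - clip_eps), 1 + clip_eps)
        # The min takes the pessimistic bound, so the update gains nothing from
        # pushing the ratio outside the clip range in the advantage's direction.
        total += min(r * adv, clipped * adv)
    return -total / len(ratios)  # negated because optimizers minimize
```

For example, a ratio of 1.5 on an action with advantage 1.0 contributes only 1.2 to the objective, so the policy gains nothing by moving further in that direction within one update.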
