An interactive Jupyter Book that teaches machine learning concepts through hands-on implementation with real-world data collection and analysis.
This comprehensive book provides an in-depth introduction to 22 essential machine learning and AI algorithms, combining theoretical understanding with practical implementation. What sets this book apart is its emphasis on:
- Real-world data collection through ethical web scraping and APIs
- Step-by-step mathematical explanations with proper citations
- Interactive code examples that you can run and modify
- Comprehensive visualizations to understand algorithm behavior
- Best practices for reproducible data science workflows
- Complete coverage from basic regression to deep learning
- Linear & Logistic Regression
- Decision Trees & Random Forest
- Support Vector Machines
- Neural Networks & Deep Learning
- Naive Bayes & K-Nearest Neighbors
- Boosting Methods (AdaBoost, Gradient Boosting)
- Clustering (K-Means, Hierarchical, DBSCAN, GMM)
- Dimensionality Reduction (PCA, LDA)
- Association Rule Mining
- Reinforcement Learning (Q-Learning)
- Deep Learning (Autoencoders, CNNs)
```
FrontierML/
├── notebooks/                               # Interactive Jupyter Book chapters (22 total)
│   ├── 00_index.ipynb                       # Course overview
│   ├── 01_data_collection.ipynb             # Data collection and scraping
│   ├── 02_linear_regression.ipynb           # Linear regression
│   ├── 03_logistic_regression.ipynb         # Logistic regression
│   ├── 04_decision_trees.ipynb              # Decision trees
│   ├── 05_random_forest.ipynb               # Random forest
│   ├── 06_support_vector_machines.ipynb     # Support vector machines
│   ├── 07_neural_networks.ipynb             # Neural networks
│   ├── 08_k_means_clustering.ipynb          # K-means clustering
│   ├── 09_hierarchical_clustering.ipynb     # Hierarchical clustering
│   ├── 10_principal_component_analysis.ipynb # PCA
│   ├── 11_naive_bayes.ipynb                 # Naive Bayes
│   ├── 12_k_nearest_neighbors.ipynb         # KNN
│   ├── 13_gradient_boosting.ipynb           # Gradient boosting
│   ├── 14_association_rule_mining.ipynb     # Association rules
│   ├── 15_dbscan_clustering.ipynb           # DBSCAN clustering
│   ├── 16_linear_discriminant_analysis.ipynb # LDA
│   ├── 17_gaussian_mixture_models.ipynb     # GMM
│   ├── 18_adaboost.ipynb                    # AdaBoost
│   ├── 19_q_learning.ipynb                  # Q-learning
│   ├── 20_autoencoders.ipynb                # Autoencoders
│   └── 21_convolutional_neural_networks.ipynb # CNNs
├── data/                                    # Datasets used in notebooks
│   ├── raw/                                 # Raw scraped data
│   ├── processed/                           # Cleaned and processed data
│   ├── features/                            # Feature engineered datasets
│   └── samples/                             # Sample datasets for examples
├── utils/                                   # Utility functions for notebooks
│   ├── data_utils.py                        # Data processing utilities
│   ├── evaluation_utils.py                  # Model evaluation utilities
│   ├── plot_utils.py                        # Visualization utilities
│   └── scraping_utils.py                    # Web scraping utilities
├── tests/                                   # Basic functionality tests
├── docs/                                    # Additional documentation
├── _config.yml                              # Jupyter Book configuration
├── _toc.yml                                 # Table of contents
├── intro.md                                 # Book introduction
├── references.bib                           # Bibliography with proper citations
└── requirements.txt                         # Dependencies including deep learning libraries
```
- Python Programming: Intermediate Python skills (functions, classes, NumPy basics)
- Mathematics: Linear algebra, calculus fundamentals, basic probability
- Statistics: Descriptive statistics, hypothesis testing concepts
- Python 3.8+ with pip package manager
- Git for version control
- Jupyter Lab or Jupyter Notebook
- 8GB+ RAM recommended for deep learning chapters
- GPU support optional but recommended for Chapters 20-21
1. Clone the repository:

   ```bash
   git clone https://github.com/Beaker12/FrontierML.git
   cd FrontierML
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Build and serve the book:

   ```bash
   make book   # Build the interactive book
   make serve  # Serve locally at http://localhost:8000
   ```

4. Open in browser: navigate to `http://localhost:8000` or open `_build/html/index.html`.

To work through the notebooks directly:

1. Start Jupyter Lab:

   ```bash
   make jupyter
   ```

2. Navigate to `notebooks/`: open individual chapters for hands-on learning.

For development:

1. Install development dependencies:

   ```bash
   pip install -r requirements.txt
   pip install pre-commit black flake8 mypy
   ```

2. Set up pre-commit hooks:

   ```bash
   pre-commit install
   ```
- Start with Chapter 1 (Data Collection) to understand data gathering
- Progress through Chapter 2 (Linear Regression) for mathematical foundations
- Continue sequentially through supervised learning chapters
- Review Chapter 0 (Course Overview) for structure
- Jump to specific algorithms of interest
- Focus on implementation details and mathematical derivations
- Examine the mathematical foundations in each chapter
- Review citations and references in `references.bib`
- Use implementations as starting points for custom algorithms
This comprehensive course covers 22 essential machine learning and AI techniques organized into logical progressions:

**Data Collection & Web Scraping**
- Ethical web scraping principles and legal considerations
- API interactions for real-world data collection
- Data quality assessment and cleaning pipelines
- Feature engineering for machine learning applications
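To make the robots.txt principle concrete, here is a minimal sketch using Python's standard `urllib.robotparser` (the domain and rules here are hypothetical; in practice you would fetch the site's real rules with `set_url()` and `read()` before scraping):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules -- in practice, fetch the live file with
# rp.set_url("https://example.com/robots.txt"); rp.read()
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Crawl-delay: 5",
    "Disallow: /private/",
])

# Always check permission before requesting a page.
print(rp.can_fetch("*", "https://example.com/listings/page1"))   # True
print(rp.can_fetch("*", "https://example.com/private/records"))  # False
print(rp.crawl_delay("*"))                                       # 5
```

Respecting `Crawl-delay` (sleeping between requests) and `Disallow` rules is the baseline for the ethical scraping practiced throughout the data-collection chapter.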
**Linear Regression**
- Mathematical foundations with proper derivations
- Implementation from scratch using NumPy
- Real estate price prediction with actual scraped data
- Model evaluation, interpretation, and diagnostics
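A from-scratch fit in the spirit described above might look like the following sketch (synthetic data standing in for the scraped real-estate features; the chapter's own derivation and dataset differ):

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for scraped housing features (e.g. area, rooms).
X = rng.normal(size=(200, 2))
true_w, true_b = np.array([3.0, -1.5]), 0.5
y = X @ true_w + true_b + rng.normal(scale=0.1, size=200)

# Ordinary least squares with a bias column: w = argmin ||Xb w - y||^2.
Xb = np.column_stack([np.ones(len(X)), X])
w_hat = np.linalg.lstsq(Xb, y, rcond=None)[0]

print(w_hat)  # approximately [0.5, 3.0, -1.5]: bias, then the two weights

# Coefficient of determination (R^2) on the training fit.
r2 = 1 - np.sum((y - Xb @ w_hat) ** 2) / np.sum((y - y.mean()) ** 2)
```

`np.linalg.lstsq` solves the normal equations numerically; the chapter also derives the closed form by hand.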
- **Logistic Regression**: Probabilistic classification with sigmoid functions
- **Decision Trees**: Information theory and tree-based learning
- **Random Forest**: Ensemble methods and bootstrap aggregating
- **Support Vector Machines**: Margin maximization and kernel methods
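As a taste of these classifiers, here is a minimal logistic-regression sketch trained by gradient descent on the cross-entropy loss (synthetic two-class data; the learning rate and iteration count are illustrative, not the chapter's settings):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)

# Two well-separated synthetic classes.
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.r_[np.zeros(100), np.ones(100)]

# Gradient descent on the logistic (cross-entropy) loss.
w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    p = sigmoid(X @ w + b)           # predicted probabilities
    w -= lr * X.T @ (p - y) / len(y) # gradient w.r.t. weights
    b -= lr * np.mean(p - y)         # gradient w.r.t. bias

acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)  # near 1.0 on this easy data
```

The gradient `X.T @ (p - y)` drops out of differentiating the cross-entropy loss through the sigmoid, a derivation worked in full in the chapter.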
Introduction to neural networks and deep learning fundamentals:
- Mathematical foundations: Perceptron algorithm and convergence theory
- Multi-layer perceptrons: Architecture design and backpropagation implementation
- Activation functions: ReLU, sigmoid, tanh with practical comparisons
- Framework comparison: Implementation across scikit-learn, TensorFlow, and PyTorch
- Sports analytics application: NFL player performance prediction and season classification
- Real-world insights: Pattern recognition in professional sports statistics
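To give a flavour of the from-scratch implementations, here is a minimal NumPy forward pass for a two-layer network using the activations mentioned above (the layer sizes are illustrative, not the chapter's actual architecture):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Illustrative sizes: 8 input features, 16 hidden units, 1 output.
W1 = rng.normal(scale=0.1, size=(8, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, 1)); b2 = np.zeros(1)

def forward(X):
    """Hidden ReLU layer followed by a sigmoid output (binary classification)."""
    h = relu(X @ W1 + b1)
    return sigmoid(h @ W2 + b2)

X = rng.normal(size=(4, 8))  # batch of 4 samples
probs = forward(X)
print(probs.shape)           # (4, 1), each entry strictly in (0, 1)
```

Backpropagation then applies the chain rule through these same two matrix multiplications, which the chapter derives and implements step by step.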
- **Naive Bayes**: Bayesian inference and probabilistic models
- **K-Nearest Neighbors**: Instance-based learning and distance metrics
- **Gradient Boosting**: XGBoost, LightGBM, and advanced boosting
- **AdaBoost**: Adaptive boosting with weak learner combinations
- **K-Means**: Centroid-based clustering and Lloyd's algorithm
- **Hierarchical Clustering**: Agglomerative and divisive clustering methods
- **DBSCAN**: Density-based clustering with noise detection
- **Gaussian Mixture Models**: Probabilistic clustering using the EM algorithm
- **Principal Component Analysis**: Eigenvalue decomposition and variance preservation
- **Linear Discriminant Analysis**: Supervised dimensionality reduction
- **Association Rule Mining**: Market basket analysis and frequent itemset discovery
- **Q-Learning**: Markov Decision Processes and value iteration methods
- **Autoencoders**: Representation learning and variational autoencoders
- **Convolutional Neural Networks**: Computer vision and image processing
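As one example of these methods, Lloyd's algorithm behind k-means reduces to two alternating steps, sketched here in NumPy (synthetic blobs; the cluster count and iteration cap are illustrative):

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: move each centroid to the mean of its cluster
        # (keeping the old centroid if a cluster happens to be empty).
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids

# Two well-separated blobs: k-means recovers them exactly.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels, centroids = kmeans(X, k=2)
```

The full chapter adds convergence analysis, initialization strategies, and the choice of k; the other algorithms above receive the same from-scratch treatment.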
- **Mathematical Rigor**: Every algorithm includes step-by-step mathematical derivations with proper citations
- **Real-World Data**: Actual data collection from websites and APIs, not toy datasets
- **Implementation Focus**: Build algorithms from scratch to understand core concepts
- **Production Ready**: Scikit-learn and TensorFlow implementations for practical use
- **Comprehensive Coverage**: 22 algorithms spanning the full ML spectrum
- **Reproducible Science**: Version-controlled code, documented methodology, proper citations
We welcome contributions that enhance the educational value of this course! Please follow these guidelines:
- Fork the repository and create a feature branch
- Follow the project's coding standards (Black formatting, flake8 linting, mypy type checks)
- Add comprehensive tests for any new functionality
- Update documentation and citations as needed
- Submit a pull request with detailed description
- Additional datasets for algorithm demonstrations
- Exercise solutions and coding challenges
- Mathematical clarifications and improved derivations
- Performance optimizations and computational efficiency
- Translation to other programming languages
All contributions are reviewed for:
- Mathematical accuracy and proper citations
- Code quality and adherence to standards
- Educational clarity and learning progression
- Reproducibility and documentation completeness
If you use this course in your research or teaching, please cite:
```bibtex
@misc{frontierml2025,
  title={FrontierML: A Comprehensive Course in Machine Learning and AI},
  author={Beaker12},
  year={2025},
  url={https://github.com/Beaker12/FrontierML}
}
```

This educational resource is licensed under the MIT License; see the LICENSE file for details.
- Mathematical foundations based on established academic literature
- Implementation guidance from scikit-learn and TensorFlow documentation
- Educational approach inspired by best practices in machine learning pedagogy
- Community contributions from students and practitioners worldwide
Start your machine learning journey today!
Whether you're a beginner exploring your first algorithm or an expert deepening your understanding, FrontierML provides the mathematical rigor and practical implementation skills needed to excel in modern AI and machine learning.