Welcome to my R Programming & Statistical Analysis Portfolio! This repository showcases my expertise in R programming, R Markdown, and statistical modeling through a comprehensive collection of academic and professional projects. Each project demonstrates my proficiency in data manipulation, statistical analysis, data visualization, and reproducible research practices using R.
As a Business Analytics graduate student at Roosevelt University, these projects reflect my strong foundation in statistical methods, predictive analytics, and data-driven decision-making, all essential skills for Data Analyst and Business Intelligence roles.
- 6 Comprehensive R Projects covering statistical analysis and predictive analytics
- R Markdown Documents for reproducible, professional analysis reports
- Curated R Cheatsheets (data.table, xts) for quick reference and learning
- Real-World Datasets including Houston flights data and statistical modeling scenarios
- Academic Excellence demonstrating strong statistical theory and practical application
- Clean, Documented Code following R best practices and coding standards
Folder: Houston_Flights_Dataset_Analysis_Assignment
Description:
Comprehensive analysis of Houston airport flight data examining patterns, delays, and operational insights. This project demonstrates data wrangling, exploratory analysis, and visualization skills using real aviation datasets.
Key Learnings:
- Working with large-scale time series flight data
- Analyzing flight delay patterns and causes
- Identifying peak travel times and seasonal trends
- Data cleaning and preprocessing for aviation datasets
- Creating insightful visualizations for operational insights
- Statistical testing for flight performance metrics
Technologies: R, R Markdown, dplyr, ggplot2, lubridate, tidyr
Skills Demonstrated: Data Wrangling, Time Series Analysis, Exploratory Data Analysis, Data Visualization, Statistical Testing
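As a rough illustration of the cleaning and delay-summary steps listed above, here is a minimal sketch. It is not the code from the project folder: the `flights` data frame, its column names, and its values are hypothetical toy data, and it uses base R only so it runs without installing the packages the project lists.

```r
# Toy data: hypothetical flights, NOT the Houston dataset used in the project.
flights <- data.frame(
  date      = as.Date(c("2011-01-03", "2011-01-03", "2011-01-04",
                        "2011-02-10", "2011-02-11", "2011-02-11")),
  carrier   = c("AA", "WN", "AA", "WN", "AA", "WN"),
  dep_delay = c(5, NA, 30, -2, 12, 45)   # minutes; NA = delay not recorded
)

# Cleaning: drop rows with a missing delay.
flights_clean <- flights[!is.na(flights$dep_delay), ]

# Derive a year-month label from the date for a simple seasonal view.
flights_clean$month <- format(flights_clean$date, "%Y-%m")

# Mean departure delay per carrier and per month.
delay_by_carrier <- aggregate(dep_delay ~ carrier, data = flights_clean, FUN = mean)
delay_by_month   <- aggregate(dep_delay ~ month,   data = flights_clean, FUN = mean)

print(delay_by_carrier)
print(delay_by_month)
```

The same grouping could be written with `dplyr::group_by()` and `summarise()`; base `aggregate()` is used here only to keep the sketch dependency-free.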
Folder: Sampling_and_Group_Analysis_Assignment
Description:
Statistical analysis project focusing on sampling techniques, group comparisons, and hypothesis testing. This assignment demonstrates fundamental statistical concepts and their practical applications.
Key Learnings:
- Understanding various sampling methods (random, stratified, systematic)
- Conducting group comparisons using t-tests and ANOVA
- Hypothesis testing and p-value interpretation
- Confidence interval construction and interpretation
- Sample size determination and power analysis
- Statistical significance vs. practical significance
Technologies: R, R Markdown, Statistical Testing Packages
Skills Demonstrated: Statistical Inference, Hypothesis Testing, Sampling Theory, Group Analysis, Research Methodology
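The sampling methods and group comparisons listed above can be sketched in base R as follows. This is an illustrative example on the built-in `iris` data, not the assignment's own data or code; the sample sizes and the effect size passed to `power.t.test()` are arbitrary assumptions.

```r
set.seed(42)  # fixed seed so the samples are reproducible

# Simple random sample: 30 rows drawn without replacement.
srs <- iris[sample(nrow(iris), 30), ]

# Stratified sample: 10 rows from each species.
strata <- split(iris, iris$Species)
stratified <- do.call(rbind, lapply(strata, function(d) d[sample(nrow(d), 10), ]))

# Systematic sample: every 5th row after a random start between 1 and 5.
start <- sample(5, 1)
systematic <- iris[seq(start, nrow(iris), by = 5), ]

# Two-group comparison: Welch t-test on sepal length for two species.
two_species <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))
tt <- t.test(Sepal.Length ~ Species, data = two_species)
print(tt$p.value)
print(tt$conf.int)   # 95% confidence interval for the difference in means

# Three-group comparison: one-way ANOVA across all species.
fit_aov <- aov(Sepal.Length ~ Species, data = iris)
print(summary(fit_aov))

# Sample size for a two-sample t-test: assumed effect of 0.5 SD, 80% power.
pw <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.8)
print(ceiling(pw$n))  # required observations per group, rounded up
```

A small p-value here only says the group means differ; whether the difference matters in practice still depends on the size of the interval in `tt$conf.int`.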
Folder: Predictive_Analytics_Assignment_1
Description:
Introduction to predictive modeling focusing on linear regression analysis, model evaluation, and interpretation. This project builds the foundation for advanced predictive analytics techniques.
Key Learnings:
- Building and interpreting linear regression models
- Model assumptions testing (normality, homoscedasticity, linearity)
- Feature selection and variable importance
- Model evaluation using R-squared, RMSE, and MAE
- Residual analysis and diagnostics
- Making predictions and constructing confidence intervals
Technologies: R, R Markdown, Statistical Modeling, caret
Skills Demonstrated: Linear Regression, Model Diagnostics, Predictive Modeling, Statistical Inference
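To make the evaluation steps above concrete, here is a minimal sketch using the built-in `mtcars` data. The choice of predictors and the `new_car` values are assumptions for illustration only and do not reflect the assignment's dataset.

```r
# Fit a linear model on the built-in mtcars data (illustrative only).
fit <- lm(mpg ~ wt + hp, data = mtcars)
print(summary(fit)$r.squared)

# In-sample error metrics computed from the residuals.
res  <- residuals(fit)
rmse <- sqrt(mean(res^2))   # root mean squared error
mae  <- mean(abs(res))      # mean absolute error

# One assumption check: Shapiro-Wilk test for normality of the residuals.
sw <- shapiro.test(res)
print(sw$p.value)

# Prediction with a 95% confidence interval for the mean response
# of a hypothetical car (3000 lb, 120 hp).
new_car <- data.frame(wt = 3.0, hp = 120)
pred <- predict(fit, newdata = new_car, interval = "confidence")
print(pred)
```

Calling `plot(fit)` would add the standard residual diagnostic plots; it is left out here so the sketch produces text output only.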
Folder: Predictive_Analytics_Assignment_2
Description:
Advanced regression techniques including multiple regression, polynomial regression, and regularization methods. This project expands predictive modeling skills with more complex scenarios.
Key Learnings:
- Multiple linear regression with several predictors
- Handling multicollinearity (VIF analysis)
- Polynomial and interaction terms
- Ridge and Lasso regularization
- Cross-validation techniques
- Feature engineering and transformation
Technologies: R, R Markdown, glmnet, caret, Statistical Modeling
Skills Demonstrated: Multiple Regression, Regularization, Feature Engineering, Model Selection, Cross-Validation
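The VIF and cross-validation ideas above can be sketched in base R. This is an illustration on `mtcars`, not the assignment's code: `cv_rmse` is a hypothetical helper written for this example, and Ridge/Lasso fits are omitted because they need the `glmnet` package.

```r
predictors <- c("wt", "hp", "disp")

# VIF for each predictor: 1 / (1 - R^2), where R^2 comes from regressing
# that predictor on the remaining predictors.
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = mtcars))$r.squared
  1 / (1 - r2)
})
print(vif)

# Hypothetical helper: k-fold cross-validated RMSE for an lm() formula.
cv_rmse <- function(formula, data, k = 5) {
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))  # random fold labels
  response <- all.vars(formula)[1]
  sq_err <- numeric(0)
  for (i in seq_len(k)) {
    train <- data[folds != i, ]
    test  <- data[folds == i, ]
    fit   <- lm(formula, data = train)
    sq_err <- c(sq_err, (test[[response]] - predict(fit, newdata = test))^2)
  }
  sqrt(mean(sq_err))
}

# Compare a linear, a polynomial, and an interaction model on the same folds.
set.seed(1); rmse_linear <- cv_rmse(mpg ~ wt + hp, mtcars)
set.seed(1); rmse_poly   <- cv_rmse(mpg ~ poly(wt, 2) + hp, mtcars)
set.seed(1); rmse_inter  <- cv_rmse(mpg ~ wt * hp, mtcars)
print(c(linear = rmse_linear, poly = rmse_poly, interaction = rmse_inter))
```

Resetting the seed before each call gives every model the same fold assignment, so the RMSE values are directly comparable.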
Folder: Predictive_Analytics_Assignment_3
Description:
Classification modeling project focusing on logistic regression and model evaluation for categorical outcomes. This assignment demonstrates skills in binary and multinomial classification.
Key Learnings:
- Logistic regression for binary classification
- Odds ratios and probability interpretation
- Classification metrics (accuracy, precision, recall, F1-score)
- ROC curves and AUC analysis
- Confusion matrix interpretation
- Threshold optimization for classification
Technologies: R, R Markdown, Logistic Regression, pROC, caret
Skills Demonstrated: Classification Modeling, Logistic Regression, Model Evaluation, ROC Analysis, Probability Modeling
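A compact sketch of the classification workflow above, again on `mtcars` rather than the assignment's data. The 0.5 threshold is an arbitrary assumption, and the AUC is computed with the rank-based (Mann-Whitney) formula in base R instead of `pROC` so the example has no package dependencies.

```r
# Logistic regression: probability of a manual gearbox (am = 1) given weight.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Odds ratio: multiplicative change in the odds per additional 1000 lb.
odds_ratio <- exp(coef(fit))["wt"]
print(odds_ratio)

# Predicted probabilities and class labels at an assumed 0.5 threshold.
prob   <- predict(fit, type = "response")
pred   <- factor(ifelse(prob >= 0.5, 1, 0), levels = c(0, 1))
actual <- factor(mtcars$am, levels = c(0, 1))

# Confusion matrix: rows are actual classes, columns are predictions.
cm <- table(actual = actual, predicted = pred)
print(cm)

tp <- cm["1", "1"]; fp <- cm["0", "1"]; fn <- cm["1", "0"]
accuracy  <- sum(diag(cm)) / sum(cm)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1        <- 2 * precision * recall / (precision + recall)

# AUC as the probability that a random positive outranks a random negative.
n1  <- sum(mtcars$am == 1)
n0  <- sum(mtcars$am == 0)
auc <- (sum(rank(prob)[mtcars$am == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
print(c(accuracy = accuracy, precision = precision, recall = recall, f1 = f1, auc = auc))
```

Threshold optimization amounts to repeating the `pred` and `cm` steps over a grid of thresholds and picking the one that best balances precision and recall for the problem at hand.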
Folder: Predictive_Analytics_Assignment_4
Description:
Advanced machine learning techniques including decision trees, ensemble methods, and model comparison. This project showcases expertise in modern predictive analytics approaches.
Key Learnings:
- Decision trees and tree-based models
- Random forests and ensemble methods
- Model comparison and selection strategies
- Handling imbalanced datasets
- Feature importance from tree-based models
- Advanced model evaluation techniques
Technologies: R, R Markdown, randomForest, rpart, caret, Machine Learning
Skills Demonstrated: Tree-Based Models, Ensemble Learning, Random Forests, Model Comparison, Advanced ML Techniques
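The tree and ensemble ideas above can be sketched as follows. This is an assumption-laden illustration on `iris`, not the assignment's code: it uses `rpart` (a recommended package bundled with standard R installations) and a hand-rolled bagging loop in place of `randomForest`, which adds random feature selection at each split on top of bagging.

```r
library(rpart)

set.seed(123)
idx   <- sample(nrow(iris), 100)   # 100 rows for training, 50 held out
train <- iris[idx, ]
test  <- iris[-idx, ]

# Single classification tree.
tree      <- rpart(Species ~ ., data = train, method = "class")
tree_pred <- predict(tree, newdata = test, type = "class")
tree_acc  <- mean(tree_pred == test$Species)

# Feature importance as reported by rpart.
print(tree$variable.importance)

# Bagging sketch: fit 25 trees on bootstrap resamples of the training set
# and combine their test-set predictions by majority vote.
n_trees <- 25
votes <- sapply(seq_len(n_trees), function(i) {
  boot <- train[sample(nrow(train), replace = TRUE), ]
  fit  <- rpart(Species ~ ., data = boot, method = "class")
  as.character(predict(fit, newdata = test, type = "class"))
})
bag_pred <- apply(votes, 1, function(v) names(which.max(table(v))))
bag_acc  <- mean(bag_pred == test$Species)

# Model comparison on the same held-out rows.
print(c(single_tree = tree_acc, bagged_trees = bag_acc))
```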
This repository includes carefully curated reference materials to support an R programming workflow:
Comprehensive guide to the data.table package for high-performance data manipulation
- Fast data aggregation and summarization
- Efficient joins and reshaping operations
- Memory-efficient data processing
- Advanced data.table syntax and operations
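For readers new to the package, the operations listed above look roughly like this. The `sales` and `regions` tables and all their values are hypothetical; the sketch assumes `data.table` is installed.

```r
library(data.table)

# Hypothetical toy tables.
sales <- data.table(
  region  = c("N", "N", "S", "S", "S"),
  product = c("a", "b", "a", "a", "b"),
  units   = c(10, 5, 7, 3, 8)
)
regions <- data.table(region = c("N", "S"), manager = c("Kim", "Lee"))

# Aggregation with DT[i, j, by]: total units and row count per region.
totals <- sales[, .(total_units = sum(units), n = .N), by = region]
print(totals)

# Filter rows in i, then aggregate in j.
a_totals <- sales[product == "a", .(units = sum(units)), by = region]

# Add a column by reference with := (no copy of the table is made).
sales[, share := units / sum(units), by = region]

# Join: look up each sales row's manager by region.
joined <- regions[sales, on = "region"]

# Reshape long to wide: one column per product, summing duplicates.
wide <- dcast(sales, region ~ product, value.var = "units", fun.aggregate = sum)
print(wide)
```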
Extensible Time Series (xts) package reference for time series data manipulation
- Time series object creation and manipulation
- Date/time indexing and subsetting
- Time-based aggregations
- Time series plotting and analysis
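A brief sketch of the xts operations listed above, assuming the `xts` package is installed. The series is hypothetical: 60 daily values starting on 2024-01-01, chosen only so the subsetting and aggregation steps are easy to follow.

```r
library(xts)

# Hypothetical daily series: values 1..60 indexed by consecutive dates.
dates <- seq(as.Date("2024-01-01"), by = "day", length.out = 60)
x <- xts(seq_along(dates), order.by = dates)

# Date-based subsetting with ISO 8601 strings.
jan <- x["2024-01"]                  # all of January 2024
rng <- x["2024-01-15/2024-01-20"]    # an inclusive date range

# Time-based aggregation: row positions of month ends, then monthly means.
ep <- endpoints(x, on = "months")
monthly_mean <- apply.monthly(x, mean)
print(monthly_mean)
```

`plot(x)` would draw the series with a date axis; it is omitted so the sketch produces text output only.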
Purpose: These cheatsheets serve as quick references to enhance workflow efficiency and support continuous learning in R programming, benefiting both learners and professionals.
- R Programming: Advanced R syntax, functions, packages, and best practices
- R Markdown: Reproducible research, literate programming, professional reports
- RStudio: Integrated development environment proficiency
- Version Control: Git/GitHub for code management and collaboration
- Descriptive Statistics: Summary statistics, distributions, data exploration
- Inferential Statistics: Hypothesis testing, confidence intervals, p-values
- Regression Analysis: Linear, multiple, polynomial, logistic regression
- Predictive Modeling: Machine learning algorithms, model evaluation, validation
- Time Series Analysis: Temporal patterns, seasonality, trend analysis
- Statistical Testing: t-tests, ANOVA, chi-square, correlation tests
- Data Wrangling: dplyr, tidyr, data.table for efficient data manipulation
- Data Visualization: ggplot2 for professional, publication-quality graphics
- Data Cleaning: Handling missing values, outliers, data quality issues
- Feature Engineering: Creating derived variables, transformations, encoding
- Supervised Learning: Regression and classification models
- Model Evaluation: Cross-validation, performance metrics, ROC analysis
- Regularization: Ridge and Lasso penalties to reduce overfitting
- Ensemble Methods: Random forests, boosting, bagging
- Feature Selection: Variable importance, stepwise selection
- Reproducible Research: R Markdown for transparent, repeatable analysis
- Technical Writing: Clear documentation, code comments, professional reports
- Statistical Reporting: Communicating findings to technical and non-technical audiences
- Data Storytelling: Creating narratives from statistical insights
These R programming projects demonstrate my ability to:
- Conduct Rigorous Statistical Analysis: Apply appropriate statistical methods to answer business questions
- Build Predictive Models: Create accurate models to forecast outcomes and support decision-making
- Extract Insights from Data: Transform raw data into actionable intelligence through statistical analysis
- Ensure Reproducibility: Document analysis workflows for transparency and repeatability
- Communicate Complex Results: Present statistical findings clearly to diverse stakeholders
- Apply Academic Excellence: Demonstrate strong theoretical foundation combined with practical skills
R is one of the most widely used languages for statistical computing and data science, offering:
- Comprehensive Statistical Capabilities: Industry-leading statistical packages and methods
- Data Visualization Excellence: ggplot2 and other libraries for stunning, informative graphics
- Reproducible Research: R Markdown enables transparent, repeatable analysis workflows
- Active Community: Extensive package ecosystem (CRAN) with more than 20,000 packages
- Academic & Industry Adoption: Widely used in research, healthcare, finance, and tech
- Open Source: Free, community-driven, continuously evolving language
Explore my other professional work:
- Python Projects - Data analysis and visualization with Python
- Power BI Projects - Interactive dashboards and BI solutions
- SQL Data Analysis - Database querying and analysis
- Certificates - Professional certifications
- Excel Portfolio - Advanced Excel analysis
- Programming: R (Advanced), Python, SQL
- Statistical Analysis: Regression, Hypothesis Testing, Predictive Modeling, Time Series
- Data Science: Machine Learning, Statistical Modeling, Data Mining
- Business Intelligence: Power BI, Data Visualization, Dashboard Development
- Tools: RStudio, Jupyter Notebooks, Git/GitHub, Excel
- Business Skills: Analytical Thinking, Problem-solving, Communication, Research
I'm always open to connecting with fellow data professionals, discussing opportunities, or collaborating on interesting projects!
This repository is for educational and portfolio purposes. The code and analysis are available for learning and reference. Please feel free to explore and reach out with any questions!
Interested in learning R? Here are some valuable resources:
- R for Data Science by Hadley Wickham & Garrett Grolemund
- CRAN: The Comprehensive R Archive Network
- RStudio Cheatsheets: Quick references for popular R packages
- R-bloggers: Community-driven R news and tutorials
⭐ If you find these R projects helpful or interesting, please consider starring this repository!
Last Updated: December 2025