Overview: The Medical Cost Analysis Project explores the relationship between medical costs and demographic factors. The analysis was carried out in R to quantify these relationships and identify trends.
Methods used in statistical analysis:
- Data cleaning: Detected and removed outliers, then assessed the normality of medical costs in relation to each variable.
- Exploratory Data Analysis (EDA): Built and interpreted plots showing the relationship between medical costs and each individual variable.
- Hypothesis testing: Applied appropriate hypothesis tests to follow up on the EDA findings, assessing whether differences in mean and median medical costs across demographic groups were statistically significant.
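The data cleaning step above can be sketched in R roughly as follows. This is a minimal illustration, not the project's actual code: the data are synthetic, the column names `charges` and `smoker` and the helper `flag_outliers` are hypothetical, and the 1.5 * IQR rule and the Shapiro-Wilk test are assumed choices, since the methods list does not name the specific techniques used.

```r
# Synthetic data for illustration only; not the project's dataset.
set.seed(42)
costs <- data.frame(
  charges = rlnorm(200, meanlog = 9, sdlog = 0.8),  # right-skewed values
  smoker  = factor(sample(c("no", "yes"), 200, replace = TRUE))
)

# Hypothetical helper: flag values outside the k * IQR fences
# (k = 1.5 is a common convention, assumed here).
flag_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  fence <- k * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

is_outlier <- flag_outliers(costs$charges)
cleaned <- costs[!is_outlier, ]

# Shapiro-Wilk normality check of charges within each smoker group.
# A small p-value suggests the group is not normally distributed.
normality <- tapply(cleaned$charges, cleaned$smoker,
                    function(x) shapiro.test(x)$p.value)
print(normality)
```

If a group fails the normality check, a rank-based test on medians is usually a safer follow-up than a test on means.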
Key Findings:
- Gender: No significant difference in median medical costs between the male and female subgroups.
- Region: The differences in median medical costs across regions that appeared in the EDA were not statistically significant on further testing.
- Smoking Status: Smokers had significantly higher mean medical costs than non-smokers.
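Tests of the kind behind these findings might look like the sketch below. The choice of tests is an assumption, since the write-up does not name them: a Wilcoxon rank-sum test for the two gender groups, a Kruskal-Wallis test for the regions, and a one-sided Welch t-test for smoking status. The data are synthetic and carry no information about the project's results, and the column names and region labels are hypothetical.

```r
# Synthetic data for illustration only; not the project's dataset.
set.seed(1)
n <- 300
costs <- data.frame(
  sex     = factor(sample(c("female", "male"), n, replace = TRUE)),
  region  = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE)),
  smoker  = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.8, 0.2))),
  charges = rlnorm(n, meanlog = 9, sdlog = 0.6)
)

# Two groups, rank-based comparison: Wilcoxon rank-sum test.
gender_test <- wilcox.test(charges ~ sex, data = costs)

# More than two groups, rank-based comparison: Kruskal-Wallis test.
region_test <- kruskal.test(charges ~ region, data = costs)

# Comparison of means: Welch two-sample t-test. Factor levels are ordered
# "no" then "yes", so alternative = "less" tests whether non-smokers have
# a lower mean than smokers.
smoker_test <- t.test(charges ~ smoker, data = costs, alternative = "less")

p_values <- c(gender = gender_test$p.value,
              region = region_test$p.value,
              smoker = smoker_test$p.value)
print(round(p_values, 4))
```

Each p-value would then be compared against a significance level chosen in advance, commonly 0.05.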
Recommendations: Investigate the anomalies observed in gender and region further, and expand the dataset.
Conclusion: This project analyzed how demographic variables affect medical costs. I found that region and gender had minimal impact on medical costs, while smoking status had a greater impact.