This project analyzes Call of Duty player behavior to understand which factors are most associated with player progression and in-game performance. The analysis separates raw engagement volume, such as time played and total kills, from efficiency metrics, such as kills per hour, kills per game, and score per minute.
Data source: https://www.kaggle.com/datasets/aishahakami/call-of-duty-players
What behavioral factors are most strongly associated with player progression and performance?
The project compares three related outcomes:
- Raw progression: how far a player has advanced, measured with level.
- Progression efficiency: how quickly a player earns progress, measured with xp_per_hour.
- Performance: how effectively a player performs, measured with scorePerMinute.
- Raw level progression is highly predictable from engagement volume. Players with more time played, kills, headshots, and wins tend to have higher levels.
- Cumulative stats can be misleading on their own because they often reflect playtime more than skill.
- Efficiency metrics help separate "played more" from "played better."
- For progression efficiency, the strongest nonlinear signals were kdRatio, kills_per_game, and kills_per_hour.
- The project uses both Ridge regression and Random Forest models to balance interpretability with nonlinear feature importance.
| Analysis | Target | Model | R² | MAE |
|---|---|---|---|---|
| Raw Progression Level | level | RidgeCV | 0.950 | 8.24 |
| Raw Progression Level | level | Random Forest | 0.964 | 5.39 |
| Progression Efficiency | xp_per_hour | RidgeCV | 0.786 | 305.89 |
| Progression Efficiency | xp_per_hour | Random Forest | 0.829 | 269.65 |
| Performance | scorePerMinute | RidgeCV | 0.793 | 35.51 |
| Performance | scorePerMinute | Random Forest | 0.845 | 28.59 |
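For readers unfamiliar with the two metric columns, here is a minimal, dependency-free sketch of how R² and MAE are conventionally computed. It assumes the table uses the standard definitions. The function names and the sample values are illustrative and are not taken from the project's code or data.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between true and predicted values (same units as the target)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the target's variance explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up example values, chosen only to show the arithmetic.
actual_levels = [10, 20, 30, 40]
predicted_levels = [12, 18, 33, 39]

mae = mean_absolute_error(actual_levels, predicted_levels)
r2 = r_squared(actual_levels, predicted_levels)
```

Because MAE is expressed in the units of each target, the MAE values are comparable between the two models for one target but not across targets.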
From the project root:

    pip install -r requirements.txt
    python src/run_analysis.py

The pipeline loads the raw dataset, engineers behavioral features, trains the models, and writes updated tables and figures to outputs/.
Generated summary:
- outputs/analysis_summary.md
Generated tables:
- outputs/tables/model_metrics.csv
- outputs/tables/behavioral_factor_rankings.csv
- outputs/tables/data_quality_summary.csv
- outputs/tables/numeric_feature_summary.csv
- outputs/tables/target_correlations.csv
- model-specific Ridge coefficient tables
- model-specific Random Forest permutation-importance tables
Generated figures:
- outputs/figures/behavioral_correlation_heatmap.png
- outputs/figures/level_vs_time_played.png
- model-specific coefficient plots
- model-specific permutation-importance plots
The analysis creates rate-based features to separate raw activity from efficiency:
- win_rate = wins / (wins + losses)
- accuracy = hits / shots
- headshot_rate = headshots / kills
- kills_per_game = kills / gamesPlayed
- assists_per_game = assists / gamesPlayed
- kills_per_hour = kills / timePlayed
- xp_per_hour = xp / timePlayed
- level_per_hour = level / timePlayed
Division-by-zero cases are handled safely, and extreme rate outliers are capped at the 99th percentile to keep the models interpretable.
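The two safeguards described above can be sketched with pandas as follows. This is an illustration under stated assumptions, not the project's actual implementation: the helper names `safe_ratio` and `cap_at_quantile` are hypothetical, the sample rows are made up, and mapping zero denominators to NaN is only one plausible reading of "handled safely".

```python
import pandas as pd


def safe_ratio(numerator: pd.Series, denominator: pd.Series) -> pd.Series:
    """Divide two columns, yielding NaN wherever the denominator is zero."""
    # where() keeps non-zero denominators and replaces the rest with NaN.
    return numerator / denominator.where(denominator != 0)


def cap_at_quantile(values: pd.Series, q: float = 0.99) -> pd.Series:
    """Clip values above the q-th quantile down to that quantile."""
    return values.clip(upper=values.quantile(q))


# Tiny made-up frame; the real dataset has many more rows and columns.
players = pd.DataFrame(
    {
        "kills": [120, 0, 45, 900],
        "gamesPlayed": [10, 0, 5, 3],
        "wins": [6, 0, 1, 3],
        "losses": [4, 0, 4, 0],
    }
)

players["kills_per_game"] = cap_at_quantile(
    safe_ratio(players["kills"], players["gamesPlayed"])
)
players["win_rate"] = safe_ratio(players["wins"], players["wins"] + players["losses"])
```

In this sketch the player with zero games gets NaN rather than infinity, and the 300 kills-per-game outlier is pulled down to the 99th-percentile value. How missing rates are treated afterwards (dropped, imputed, or filled with zero) is a separate choice that the sketch leaves open.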
The analysis trains three model groups:
| Analysis | Target | Purpose |
|---|---|---|
| Raw Progression Level | level | Identifies factors associated with total progression. |
| Progression Efficiency | xp_per_hour | Identifies behaviors associated with earning progress faster. |
| Performance | scorePerMinute | Identifies behaviors associated with stronger in-game performance. |
Each analysis uses:
- RidgeCV: standardized linear model with cross-validated regularization.
- Random Forest: nonlinear model with permutation importance.
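The two model types listed above can be sketched with scikit-learn as shown below. The data is synthetic, and the alpha grid, tree count, split size, and repeat count are assumptions chosen for illustration. The project's real settings live in its source and may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: three features, target driven mostly by the first.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Standardize features, then fit Ridge with the penalty chosen by cross-validation.
ridge = make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 13)))
ridge.fit(X_train, y_train)
ridge_pred = ridge.predict(X_test)
ridge_r2 = r2_score(y_test, ridge_pred)
ridge_mae = mean_absolute_error(y_test, ridge_pred)

# Fit a Random Forest, then measure how much shuffling each feature hurts the score.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
importance = permutation_importance(
    forest, X_test, y_test, n_repeats=5, random_state=0
)
top_feature = int(np.argmax(importance.importances_mean))
```

Standardizing before Ridge puts the coefficients on a common scale so they can be compared across features. Permutation importance is computed on held-out rows here so that it reflects predictive value rather than training fit.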
xp is intentionally excluded from the raw level model because XP is directly tied to level progression and would create leakage.
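One simple way to enforce such an exclusion is to build each model's predictor list from a per-target blocklist. The sketch below is hypothetical: `LEAKY_COLUMNS` and `select_features` are illustrative names, and only the xp exclusion for the level target comes from the text above.

```python
# Hypothetical blocklist: columns that encode a target too directly to be predictors.
# Only the "xp" entry for "level" is stated in the README; the structure is illustrative.
LEAKY_COLUMNS = {
    "level": {"xp"},
}


def select_features(columns, target):
    """Return predictor columns for a target, dropping the target and any leaky columns."""
    blocked = {target} | LEAKY_COLUMNS.get(target, set())
    return [name for name in columns if name not in blocked]


columns = ["level", "xp", "timePlayed", "kills", "wins"]
level_features = select_features(columns, "level")
```

Keeping the blocklist in one place makes the leakage decision explicit and easy to review when new targets or derived features are added.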

    COD-Analysis/
    ├── data/
    │   └── raw/
    │       └── cod.csv
    ├── src/
    │   ├── cod_analysis/
    │   │   ├── config.py
    │   │   ├── data.py
    │   │   ├── features.py
    │   │   ├── models.py
    │   │   ├── pipeline.py
    │   │   ├── plots.py
    │   │   └── reports.py
    │   └── run_analysis.py
    ├── outputs/
    │   ├── analysis_summary.md
    │   ├── figures/
    │   └── tables/
    ├── README.md
    └── requirements.txt

data/raw/ stores the original dataset. outputs/ contains generated artifacts and can be recreated by rerunning the pipeline.
This framing supports two types of engagement strategy:
- Retention strategy: encourage consistent play through session goals, streaks, and time-limited progression events.
- Skill strategy: reward efficient play through headshot challenges, accuracy goals, assist bonuses, win streaks, and score-per-minute objectives.
The main distinction is that encouraging more play is different from encouraging better, more satisfying play.