Data Scientist | Marketing Analytics | Machine Learning
Industrial Engineer from Universidad de Chile (graduated with highest distinction, "distinción máxima"), currently working at ClaroVTR as an Efficiencies Engineer, building end-to-end forecasting systems with XGBoost / LightGBM / Prophet ensembles for enterprise clients (~$30M CLP/month in identified savings).
End-to-end quantile ML pipeline for monthly data-quota forecasting and per-subscriber plan optimization — a portfolio reproduction of a production system I built at work. Combines XGBoost Quantile Regression (P90 direct), LightGBM with a custom asymmetric loss (penalizes under-prediction 1.5×), DTW shape clustering and Prophet in a tier-based ensemble, then translates each forecast into a concrete topup-bag recommendation via an integer-pricing optimizer.
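As a rough illustration of the asymmetric loss mentioned above, here is a minimal sketch that weights squared error 1.5× whenever the forecast falls below actual usage. The squared-error form and the function names are assumptions for illustration, not the production objective; a LightGBM custom objective would return the same gradient and Hessian values as NumPy arrays.

```python
from typing import List, Sequence, Tuple

UNDER_PENALTY = 1.5  # under-prediction weight, taken from the description above


def asymmetric_loss(y_true: float, y_pred: float) -> float:
    """Squared error, weighted 1.5x when the forecast is below actual usage."""
    residual = y_true - y_pred
    weight = UNDER_PENALTY if residual > 0 else 1.0
    return weight * residual ** 2


def asymmetric_grad_hess(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """First and second derivatives of the loss with respect to the prediction."""
    grads: List[float] = []
    hessians: List[float] = []
    for actual, predicted in zip(y_true, y_pred):
        residual = actual - predicted
        weight = UNDER_PENALTY if residual > 0 else 1.0
        grads.append(-2.0 * weight * residual)  # d/d_pred of weight * residual^2
        hessians.append(2.0 * weight)
    return grads, hessians


# Missing by 2 GB costs more when the forecast is too low than when it is too high.
print(asymmetric_loss(10.0, 8.0))   # 6.0 (under-prediction)
print(asymmetric_loss(10.0, 12.0))  # 4.0 (over-prediction)
```

The steeper gradient on the under-prediction side is what pushes the fitted forecasts upward, which is the intended bias when running out of quota is costlier than buying slightly too much.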
- Quantile XGBoost with `reg:quantileerror` targeting P90, calibrated for the asymmetric cost of under-prediction
- DTW clustering (`tslearn`) on min-max-normalized series: groups subscribers by shape, not level
- Behavior features: `abrupt_change`, `acceleration`, `consecutive_overage`, `pct_plan_last`, partial-cycle detection
- Tier-based ensemble: different model blends for low / mid / high / very_high volume subscribers
- Pricing optimizer with property-based test suite — solves the small/large bag breakpoint
- End-to-end demo runs on a synthetic 120-subscriber dataset; ~93% P90 coverage on the validation fold
- Engineering: `pyproject.toml` with optional extras, OAuth2 client via env vars (no hard-coded secrets), 14 passing tests
Python · XGBoost · LightGBM · tslearn · Prophet · scikit-learn · xlsxwriter
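For readers unfamiliar with the P90 metrics above, the sketch below shows the pinball loss (the quantity XGBoost's `reg:quantileerror` objective minimizes) and the empirical coverage of a P90 forecast. The function names and the four sample values are made up for illustration and do not come from the project data.

```python
from typing import Sequence


def pinball_loss(
    y_true: Sequence[float], y_pred: Sequence[float], quantile: float = 0.9
) -> float:
    """Mean pinball loss: under-prediction is weighted by q, over-prediction by 1 - q."""
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        diff = actual - predicted
        total += quantile * diff if diff >= 0 else (quantile - 1.0) * diff
    return total / len(y_true)


def coverage(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Share of observations at or below the forecast quantile."""
    covered = sum(1 for actual, predicted in zip(y_true, y_pred) if actual <= predicted)
    return covered / len(y_true)


# Hypothetical monthly usage (GB) and P90 forecasts for four subscribers.
actual_gb = [10.0, 20.0, 30.0, 40.0]
p90_forecast_gb = [12.0, 25.0, 28.0, 45.0]

print(coverage(actual_gb, p90_forecast_gb))  # 0.75 (3 of 4 subscribers covered)
print(pinball_loss(actual_gb, p90_forecast_gb))
```

A well-calibrated P90 forecast should show coverage close to 0.90 on held-out data, which is the yardstick behind the ~93% figure quoted above.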
End-to-end analysis on the UCI Bank Marketing dataset (45k calls from a Portuguese bank) to predict term-deposit subscription. Includes a v2 iteration that diagnoses and fixes a SMOTE-in-CV data leakage bug — the kind of methodology issue real production pipelines must catch.
- EDA with PySpark — conversion rates by segment, temporal patterns, imbalance analysis
- Modeling — Decision Tree, Random Forest, XGBoost with GridSearchCV
- Best model: Random Forest, ROC-AUC 0.7959 on hold-out (with `duration` excluded to avoid leakage)
- Key business insight: previously contacted clients convert at 63.8% vs 9.3% for new ones, roughly 7× more likely to subscribe
PySpark · scikit-learn · XGBoost · imbalanced-learn · Developed in Databricks
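The resampling-in-CV leakage can be illustrated without any ML library. The sketch below uses plain random oversampling as a stand-in for SMOTE (a deliberate simplification) and counts how many original rows end up on both sides of a train/validation split. All names and the 100-row toy label vector are hypothetical.

```python
import random
from typing import List


def oversample(rows: List[int], labels: List[int], rng: random.Random) -> List[int]:
    """Duplicate minority-class rows until both classes have the same count."""
    minority = [r for r in rows if labels[r] == 1]
    majority = [r for r in rows if labels[r] == 0]
    if not minority:
        return list(rows)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(rows) + extra


def count_overlap(folds: List[List[int]], labels: List[int],
                  rng: random.Random, resample_train: bool) -> int:
    """Sum, over folds, of original rows present in both train and validation."""
    leaks = 0
    for i, valid in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        if resample_train:
            train = oversample(train, labels, rng)
        leaks += len(set(train) & set(valid))
    return leaks


def leaky_cv(labels: List[int], k: int, seed: int) -> int:
    """Wrong order: resample the whole dataset first, then split into folds."""
    rng = random.Random(seed)
    rows = oversample(list(range(len(labels))), labels, rng)
    rng.shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    return count_overlap(folds, labels, rng, resample_train=False)


def clean_cv(labels: List[int], k: int, seed: int) -> int:
    """Right order: split first, then resample only the training portion."""
    rng = random.Random(seed)
    rows = list(range(len(labels)))
    rng.shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    return count_overlap(folds, labels, rng, resample_train=True)


labels = [1] * 10 + [0] * 90  # toy data: 10% positives
print("leaky overlap:", leaky_cv(labels, k=5, seed=0))
print("clean overlap:", clean_cv(labels, k=5, seed=0))
```

With scikit-learn, the usual way to get the second behavior is to place the sampler inside an imbalanced-learn `Pipeline`, so that resampling runs only when the pipeline is fitted on each training fold.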
Quantifying the gender income gap among ~5,000 small merchants in Latin America (~245k weekly observations) using transactional data from a digital payments platform.
- Fixed-effects regression with `fixest::feols`: progressive controls for hours, business category, zone and age brackets
- Regularized models: Ridge / LASSO via `glmnet`, Backward / Forward selection
- Machine learning: CART, MARS, KNN, Random Forest (caret-tuned)
- Key finding: raw gap ≈ 20.7%, of which a substantial part is mediated by hours worked and business category — but a meaningful hourly-productivity gap persists after controls
R · fixest · glmnet · caret · earth · randomForest
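The project itself is written in R, but the idea behind absorbing a categorical fixed effect can be sketched in a few lines of Python. The example below applies the within transformation (demeaning by group) with a single regressor. Using zone as the group and the six data points are assumptions made up for illustration, not project data or the actual specification.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def within_slope(
    groups: Sequence[Hashable], x: Sequence[float], y: Sequence[float]
) -> float:
    """OLS slope of y on x after subtracting group means from both variables."""
    sums: Dict[Hashable, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])
    for g, xi, yi in zip(groups, x, y):
        sums[g][0] += xi
        sums[g][1] += yi
        sums[g][2] += 1.0
    numerator = 0.0
    denominator = 0.0
    for g, xi, yi in zip(groups, x, y):
        sum_x, sum_y, n = sums[g]
        dx = xi - sum_x / n  # deviation from the group mean of x
        dy = yi - sum_y / n  # deviation from the group mean of y
        numerator += dx * dy
        denominator += dx * dx
    return numerator / denominator


# Toy data: income = zone intercept + 2 * hours, with zone B richer and busier.
zones = ["A", "A", "A", "B", "B", "B"]
hours = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
income = [7.0, 9.0, 11.0, 70.0, 72.0, 74.0]

pooled = within_slope(["all"] * len(zones), hours, income)  # ignores zones
within = within_slope(zones, hours, income)                 # zone fixed effects
print(pooled, within)
```

The pooled slope mixes the true effect of hours with the difference between zones, while the within estimate recovers the slope of 2 built into the toy data. Adding controls progressively, as in the analysis above, follows the same logic with more regressors.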
Discrete-choice analysis of how the visual salience of credit terms in digital advertising affects consumer choices. Built on a randomized experiment with 4 ad-design conditions (control + 3 treatments emphasizing financial information at increasing levels) and 6 binary choices per participant.
- Conditional logit and mixed logit with `mlogit`, including unobserved heterogeneity via random coefficients
- ML comparison: CART, SVM, KNN, Random Forest via `caret`
- Key finding: simple logits show no treatment effect, but the mixed logit reveals a significant T3 effect once unobserved heterogeneity is allowed, a reminder that model choice can change a policy answer
R · mlogit · caret · randomForest · Discrete choice modeling
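To make the difference between the two logit models concrete, here is a small Python sketch (the analysis itself uses `mlogit` in R). A binary logit uses one fixed coefficient, while a mixed logit averages the choice probability over draws of a random coefficient. The normal distribution, the parameter values and the function names are illustrative assumptions, not estimates from the experiment.

```python
import math
import random


def logit_prob(utility_diff: float) -> float:
    """Probability of choosing option 1 given its utility advantage over option 0."""
    return 1.0 / (1.0 + math.exp(-utility_diff))


def mixed_logit_prob(x_diff: float, mean: float, sd: float,
                     draws: int = 2000, seed: int = 0) -> float:
    """Simulated mixed-logit probability with a normally distributed coefficient."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        beta = rng.gauss(mean, sd)  # one simulated respondent's taste
        total += logit_prob(beta * x_diff)
    return total / draws


# With no heterogeneity (sd = 0) the mixed logit collapses to the plain logit.
print(logit_prob(0.4 * 1.0))
print(mixed_logit_prob(1.0, mean=0.4, sd=0.0))
# With heterogeneity, the average probability differs from the fixed-coefficient one.
print(mixed_logit_prob(1.0, mean=0.4, sd=2.0))
```

Because respondents with opposite tastes can cancel out in a fixed-coefficient model, allowing the coefficient to vary is one way an effect can appear in the mixed logit that a simple logit misses.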
Languages: Python · R · SQL
ML / Stats: scikit-learn · XGBoost · LightGBM · Statsmodels · mlogit · fixest
Data: Pandas · NumPy · PySpark · Databricks
Visualization: Matplotlib · Seaborn · ggplot2 · Plotly
Tools: Git · Jupyter · RMarkdown · VS Code
- 🏛️ Universidad de Chile — Industrial Engineering
- 📊 Experienced in discrete choice modeling (Logit, Mixed Logit) and Machine Learning
- 📍 Based in Santiago, Chile
- 🌱 Currently exploring: MLOps, production ML pipelines, causal inference