
Hi, I'm Jonathan Sánchez 👋

Data Scientist | Marketing Analytics | Machine Learning

Industrial Engineer from Universidad de Chile (distinción máxima, i.e. highest distinction), currently working at ClaroVTR as an Efficiencies Engineer — building end-to-end forecasting systems with XGBoost / LightGBM / Prophet ensembles for enterprise clients (~CLP $30M/month in identified savings).

CV


🚀 Featured Projects

End-to-end quantile ML pipeline for monthly data-quota forecasting and per-subscriber plan optimization — a portfolio reproduction of a production system I built at work. Combines XGBoost Quantile Regression (direct P90), LightGBM with a custom asymmetric loss (penalizes under-prediction 1.5×), DTW shape clustering, and Prophet in a tier-based ensemble, then translates each forecast into a concrete top-up-bag recommendation via an integer-pricing optimizer.

  • Quantile XGBoost with reg:quantileerror targeting P90 — calibrated for the asymmetric cost of under-prediction
  • DTW clustering (tslearn) on min-max-normalized series — groups subscribers by shape, not level
  • Behavior features — abrupt_change, acceleration, consecutive_overage, pct_plan_last, partial-cycle detection
  • Tier-based ensemble — different model blends for low / mid / high / very_high volume subscribers
  • Pricing optimizer with property-based test suite — solves the small/large bag breakpoint
  • End-to-end demo runs on a synthetic 120-subscriber dataset; ~93% P90 coverage on the validation fold
  • Engineering — pyproject.toml with optional extras, OAuth2 client via env vars (no hard-coded secrets), 14 passing tests

Python · XGBoost · LightGBM · tslearn · Prophet · scikit-learn · xlsxwriter


End-to-end analysis on the UCI Bank Marketing dataset (45k calls from a Portuguese bank) to predict term-deposit subscription. Includes a v2 iteration that diagnoses and fixes a SMOTE-in-CV data leakage bug — the kind of methodology issue real production pipelines must catch.

  • EDA with PySpark — conversion rates by segment, temporal patterns, imbalance analysis
  • Modeling — Decision Tree, Random Forest, XGBoost with GridSearchCV
  • Best model: Random Forest, ROC-AUC 0.7959 on hold-out (with duration excluded to avoid leakage)
  • Key business insight: previously-contacted clients convert at 63.8% vs 9.3% for new ones — 7× more likely to subscribe

PySpark · scikit-learn · XGBoost · imbalanced-learn · Developed in Databricks


Quantifying the gender income gap among ~5,000 small merchants in Latin America (~245k weekly observations) using transactional data from a digital payments platform.

  • Fixed-effects regression with fixest::feols — progressive controls for hours, business category, zone and age brackets
  • Regularized models — Ridge / LASSO via glmnet, Backward / Forward selection
  • Machine learning — CART, MARS, KNN, Random Forest (caret-tuned)
  • Key finding: raw gap ≈ 20.7%, of which a substantial part is mediated by hours worked and business category — but a meaningful hourly-productivity gap persists after controls

R · fixest · glmnet · caret · earth · randomForest
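The repo itself is in R (`fixest::feols`); as an illustrative Python analogue, the progressive-controls logic looks like this — synthetic data, with column names and effect sizes invented for the sketch:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
female = rng.integers(0, 2, n)
hours = 40 - 4 * female + rng.normal(0, 5, n)   # hours partly mediate the gap
category = rng.choice(["retail", "food", "services"], n)
log_income = (
    0.02 * hours - 0.10 * female
    + 0.10 * (category == "food")
    + rng.normal(0, 0.3, n)
)
df = pd.DataFrame({"log_income": log_income, "female": female,
                   "hours": hours, "category": category})

raw = smf.ols("log_income ~ female", data=df).fit()
ctl = smf.ols("log_income ~ female + hours + C(category)", data=df).fit()
# The raw coefficient mixes the direct gap with hours/category
# composition; the controlled one isolates the conditional gap.
gap_raw = raw.params["female"]
gap_ctl = ctl.params["female"]
```

The shrinking of the coefficient as controls enter is the "mediated by hours worked and business category" pattern; whatever survives the full specification is the residual hourly-productivity gap.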


Discrete-choice analysis of how the visual salience of credit terms in digital advertising affects consumer choices. Built on a randomized experiment with 4 ad-design conditions (control + 3 treatments emphasizing financial information at increasing levels) and 6 binary choices per participant.

  • Conditional logit and mixed logit with mlogit — including unobserved heterogeneity via random coefficients
  • ML comparison — CART, SVM, KNN, Random Forest via caret
  • Key finding: simple logits show no treatment effect, but the mixed logit reveals a significant T3 effect once unobserved heterogeneity is allowed — a reminder that model choice can change a policy answer

R · mlogit · caret · randomForest · Discrete choice modeling
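Under the random-utility model behind `mlogit`, the conditional logit is a softmax over alternative-specific utilities. A minimal from-scratch sketch of the mechanics — synthetic choices, two made-up attributes, MLE via scipy (the repo's actual estimation uses R):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n, J = 1000, 3                       # choice situations, alternatives
X = rng.normal(size=(n, J, 2))       # 2 alternative-specific attributes
beta_true = np.array([1.0, -0.5])
# Random-utility model: systematic utility + Gumbel noise -> logit choices
u = X @ beta_true + rng.gumbel(size=(n, J))
choice = u.argmax(axis=1)

def neg_loglik(beta):
    v = X @ beta                          # systematic utilities, shape (n, J)
    v = v - v.max(axis=1, keepdims=True)  # numerical stability
    logp = v - np.log(np.exp(v).sum(axis=1, keepdims=True))  # log-softmax
    return -logp[np.arange(n), choice].sum()

beta_hat = minimize(neg_loglik, np.zeros(2), method="BFGS").x
```

The mixed logit in the project extends this by letting `beta` vary randomly across participants, which is exactly the unobserved heterogeneity that rescues the T3 effect.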


🛠️ Tech Stack

  • Languages: Python · R · SQL
  • ML / Stats: scikit-learn · XGBoost · LightGBM · Statsmodels · mlogit · fixest
  • Data: Pandas · NumPy · PySpark · Databricks
  • Visualization: Matplotlib · Seaborn · ggplot2 · Plotly
  • Tools: Git · Jupyter · RMarkdown · VS Code


🎓 About Me

  • 🏛️ Universidad de Chile — Industrial Engineering
  • 📊 Experienced in discrete choice modeling (Logit, Mixed Logit) and Machine Learning
  • 📍 Based in Santiago, Chile
  • 🌱 Currently exploring: MLOps, production ML pipelines, causal inference

📫 Connect

LinkedIn Email GitHub

📌 Pinned Repositories

  1. bank-marketing-analysis — End-to-end analysis on the UCI Bank Marketing dataset (45k calls): EDA in PySpark, Decision Tree / Random Forest / XGBoost in scikit-learn, plus a v2 branch fixing SMOTE-in-CV leakage with imblearn… (Jupyter Notebook)

  2. credit-choice-experiment — Discrete choice modeling on a randomized credit-ad experiment: conditional logit, mixed logit with unobserved heterogeneity, and ML comparison (CART, SVM, KNN, RF) in R

  3. cv — CV, Jonathan Sánchez Pesantes (LaTeX sources + compiled PDF) (TeX)

  4. gender-income-gap — Quantifying the gender income gap among ~5,000 small merchants in Latin America using fixed-effects regression (fixest), Ridge/LASSO and ML (CART, MARS, KNN, Random Forest) in R