Data Scientist | ML Systems | NLP & LLM Engineering
Bridging applied ML and deep NLP/LLM understanding.
I'm a Data Scientist with a B.Tech in Computer Science (Data Science specialization) from VIT, currently returning to a dedicated data science role after a stint in enterprise IT operations at Schneider Electric.
My work sits at the intersection of rigorous ML engineering and real-world problem framing: I care about systems that are interpretable, statistically grounded, and production-ready.
A geospatial exposure scoring pipeline that converts Sentinel-2 multispectral imagery into tiered environmental risk classifications for mining assets.
- NDVI-based vegetation suppression features across a 4-asset global portfolio (2019–2023)
- Corporate Environmental Risk Index (CERI) via z-score compositing
- Validated with 1,000 bootstrap iterations and leave-one-out cross-validation (LOOCV)
- Deterministic risk tier outputs at scale
Repository
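A minimal, dependency-free sketch of the two scoring steps named above, under stated assumptions: NDVI is computed as (NIR − Red) / (NIR + Red) from Sentinel-2 near-infrared (band B8) and red (band B4) reflectance, and the composite index is taken as the equal-weight mean of per-indicator z-scores across assets. The equal weighting, the second indicator, the reflectance values, and the tier cut-offs are all hypothetical and are not taken from the project.

```python
from statistics import mean, pstdev


def ndvi(nir: float, red: float) -> float:
    """NDVI = (NIR - Red) / (NIR + Red); Sentinel-2 NIR is band B8, red is B4."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom


def zscores(values: list[float]) -> list[float]:
    """Standardise one indicator across assets (population std dev)."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


def composite_index(indicators: dict[str, list[float]]) -> list[float]:
    """Equal-weight mean of per-indicator z-scores (weighting is an assumption).

    Every indicator must list assets in the same order and be oriented so
    that a higher value means higher risk.
    """
    standardised = [zscores(vals) for vals in indicators.values()]
    return [mean(per_asset) for per_asset in zip(*standardised)]


def risk_tier(score: float, low: float = -0.5, high: float = 0.5) -> str:
    """Map a composite score to a tier; these cut-offs are hypothetical."""
    if score >= high:
        return "High"
    if score <= low:
        return "Low"
    return "Medium"


if __name__ == "__main__":
    # Hypothetical (NIR, Red) reflectance pairs for four assets, 2019 vs 2023.
    before = [(0.40, 0.10), (0.35, 0.12), (0.30, 0.15), (0.45, 0.08)]
    after = [(0.30, 0.12), (0.34, 0.12), (0.20, 0.18), (0.44, 0.09)]
    # Vegetation suppression expressed as NDVI decline: larger means more risk.
    decline = [ndvi(*b) - ndvi(*a) for b, a in zip(before, after)]
    # A second, purely illustrative risk indicator.
    other = [0.2, 0.1, 0.5, 0.05]
    scores = composite_index({"ndvi_decline": decline, "other_indicator": other})
    for asset, score in zip("ABCD", scores):
        print(asset, round(score, 2), risk_tier(score))
```

Because the tiers are a fixed function of the standardised scores, the same inputs always map to the same tier, which is one way to obtain deterministic outputs.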
A modular multi-signal attrition risk detection system for early identification of employee disengagement.
- CatBoost + neural tabular network for structured HR modeling
- Behavioral communication drift analysis
- Neural meta-fusion layer for calibrated risk probabilities
- SHAP-based explainability, fairness auditing, threshold optimization
Repository
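The project lists threshold optimization without naming the objective. As a hedged illustration, the sketch below assumes the fusion layer outputs calibrated attrition probabilities and picks the cut-off that maximises F1 on a held-out set by sweeping every observed score. The F1 objective, the function names, and the toy data are assumptions, not the project's actual method.

```python
def f1_at(threshold: float, probs: list[float], labels: list[int]) -> float:
    """F1 score when every probability >= threshold is flagged as at-risk."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        flagged = p >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(probs: list[float], labels: list[int]) -> float:
    """Try each distinct predicted probability as a candidate cut-off."""
    candidates = sorted(set(probs))
    return max(candidates, key=lambda t: f1_at(t, probs, labels))


if __name__ == "__main__":
    # Toy validation-set scores and ground-truth attrition labels (hypothetical).
    probs = [0.10, 0.40, 0.35, 0.80, 0.70, 0.20]
    labels = [0, 0, 1, 1, 1, 0]
    t = best_threshold(probs, labels)
    print(f"threshold={t:.2f}, F1={f1_at(t, probs, labels):.3f}")
```

On this toy data the sweep selects 0.35 (F1 ≈ 0.857). Sweeping only the observed scores is sufficient because the set of flagged employees, and therefore F1, can change only when the threshold crosses one of them.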
Working through a structured NLP → LLM roadmap covering:
- NLP Fundamentals: tokenisation, embeddings (Word2Vec, GloVe), language modelling, text classification
- Transformer Deep Dive: implementing attention, positional encoding, and full encoder blocks from scratch in PyTorch
- LLM Internals: decoder-only architecture, LoRA fine-tuning, RAG pipelines
- Capstone: building a mini GPT from scratch (nanoGPT approach)
Goal: move from using LLMs to understanding and fine-tuning them.
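The roadmap implements attention in PyTorch. To keep this example dependency-free, the sketch below expresses the same scaled dot-product attention, softmax(QKᵀ / √d_k)·V, with plain Python lists, plus an optional causal mask of the kind used in decoder-only models. It is an illustrative sketch with hypothetical function names, not the roadmap's own implementation.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax; exp(-inf) evaluates to 0.0 for masked slots."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(m: list[list[float]]) -> list[list[float]]:
    return [list(col) for col in zip(*m)]


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def scaled_dot_product_attention(q, k, v, causal: bool = False):
    """Return (output, attention_weights) for single-head attention."""
    d_k = len(q[0])
    scale = math.sqrt(d_k)
    scores = matmul(q, transpose(k))  # (n_queries, n_keys)
    weights = []
    for i, row in enumerate(scores):
        row = [s / scale for s in row]
        if causal:
            # Position i may only attend to positions j <= i. The diagonal is
            # never masked, so softmax always sees at least one finite value.
            row = [s if j <= i else float("-inf") for j, s in enumerate(row)]
        weights.append(softmax(row))
    return matmul(weights, v), weights


if __name__ == "__main__":
    q = k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    out, w = scaled_dot_product_attention(q, k, v, causal=True)
    print(out)
    print(w)
```

Dividing by √d_k keeps the dot products from growing with the head dimension, which would otherwise push the softmax toward near one-hot weights.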
Schneider Electric | Aug 2024 – Present
Graduate Engineer Trainee | Digital Workplace Engineer
Enterprise-scale endpoint operations for 100,000+ devices across 50 global sites. Strengthened production awareness, systems thinking, and cross-functional execution.
Samsung Prism | R&D Intern | Apr–Sep 2023
ML model for document vs. scene image classification.
PepsiCo GBS | Data Science Intern | May–Jul 2023
Time series forecasting models integrated into operational dashboards.
"Milestone-Gated, Donor-Governed Crowdfunding: Security Properties and the Price of Safety." ICDTDE 2025, Jordan. Describes a smart contract protocol with on-chain donor governance, evaluated via static analysis and adversarial fuzzing.
ML & Data: Python, Scikit-learn, PyTorch, CatBoost, LightGBM, SHAP, TensorFlow, Keras, NumPy, Pandas, Dash, Plotly
Geospatial: Sentinel-2, NDVI, geospatial risk scoring
Infrastructure: SQL, MongoDB, Docker, Git, GCP, Azure
Open to Data Scientist and ML Engineer roles. Based in Bangalore.