Anisha Ray aniray2908

Hi, I'm Anisha 👋

Data Scientist | ML Systems | NLP & LLM Engineering

Bridging applied ML and deep NLP/LLM understanding.

🚀 About Me

I'm a Data Scientist with a B.Tech in CS (Data Science specialization) from VIT, currently transitioning back to a dedicated DS role after a stint in enterprise IT operations at Schneider Electric.

My work sits at the intersection of rigorous ML engineering and real-world problem framing — I care about systems that are interpretable, statistically grounded, and production-ready.

📌 Featured Projects

🛰 Satellite Environmental Risk Engine

A geospatial exposure scoring pipeline that converts Sentinel-2 multispectral imagery into tiered environmental risk classifications for mining assets.

NDVI-based vegetation suppression features across a 4-asset global portfolio (2019–2023)
Corporate Environmental Risk Index (CERI) via z-score compositing
Validated with 1,000 bootstrap resamples and leave-one-out cross-validation (LOOCV)
Deterministic risk tier outputs at scale

👉 Repository
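
The scoring steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's code. NDVI is computed from Sentinel-2 bands B8 (NIR) and B4 (red). A hypothetical "vegetation suppression" feature is defined as the drop in mean NDVI against a baseline period. Features are z-scored across assets and combined with equal weights into a composite index. Tiers come from hypothetical cutoffs. A percentile bootstrap with 1,000 resamples gives a confidence interval for a mean. All function names, the weighting, and the cutoffs are illustrative.

```python
import numpy as np

# Hypothetical tier labels, ordered by increasing composite score.
TIER_LABELS = np.array(["low", "medium", "high"])


def ndvi(nir: np.ndarray, red: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """NDVI = (NIR - Red) / (NIR + Red); on Sentinel-2 these are bands B8 and B4."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)  # eps guards against 0/0 on dark pixels


def vegetation_suppression(ndvi_baseline: np.ndarray, ndvi_current: np.ndarray) -> float:
    """Hypothetical feature: drop in mean NDVI versus a baseline period (positive = vegetation loss)."""
    return float(np.nanmean(ndvi_baseline) - np.nanmean(ndvi_current))


def zscore_composite(features: np.ndarray, weights=None) -> np.ndarray:
    """Z-score each feature column across assets, then take a weighted sum per asset."""
    X = np.asarray(features, dtype=float)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # a constant feature contributes 0 instead of dividing by zero
    Z = (X - X.mean(axis=0)) / sd
    if weights is None:
        weights = np.full(X.shape[1], 1.0 / X.shape[1])  # equal weights (an assumption)
    return Z @ np.asarray(weights, dtype=float)


def risk_tiers(scores: np.ndarray, cutoffs=(-0.5, 0.5)) -> np.ndarray:
    """Deterministic tiering: the same scores and hypothetical cutoffs always give the same tiers."""
    return TIER_LABELS[np.digitize(scores, cutoffs)]


def bootstrap_ci(values, n_boot: int = 1000, alpha: float = 0.05, seed: int = 0) -> np.ndarray:
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))  # resample with replacement
    boot_means = values[idx].mean(axis=1)
    return np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])


if __name__ == "__main__":
    # Synthetic 4-asset x 2-feature matrix, for illustration only.
    features = np.array([[0.12, 0.3], [0.02, 0.1], [0.08, 0.5], [-0.01, 0.2]])
    scores = zscore_composite(features)
    print(scores.round(2), risk_tiers(scores))
```

Z-scoring puts features with different units on a common scale before they are combined. Fixed cutoffs keep the tier assignment deterministic for a given input.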

🔍 Silent Attrition Detector

A modular multi-signal attrition risk detection system for early identification of employee disengagement.

CatBoost + neural tabular network for structured HR modeling
Behavioral communication drift analysis
Neural meta-fusion layer for calibrated risk probabilities
SHAP-based explainability, fairness auditing, threshold optimization

👉 Repository
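
The fusion, threshold, and fairness steps above can be illustrated with a small NumPy sketch. It swaps in plain logistic regression on the base models' logits ("stacking") as a stand-in for the neural meta-fusion layer. The threshold is chosen by scanning candidate cut-offs for the best F1. The fairness audit is reduced to per-group selection rates and their min/max ratio. These choices and all names are assumptions for illustration; the repository's actual models, metrics, and SHAP analysis are not reproduced here.

```python
import numpy as np


def _logit(p, eps: float = 1e-6) -> np.ndarray:
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)  # keep the log finite
    return np.log(p / (1 - p))


def _sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_stacker(base_probs, y, lr: float = 0.1, epochs: int = 2000) -> np.ndarray:
    """Stand-in for a neural meta-fusion layer: logistic regression on base-model logits.

    base_probs: (n_samples, n_models) probabilities, e.g. from a CatBoost model and a tabular net.
    Fitted by full-batch gradient descent on the mean log-loss.
    """
    X = np.column_stack([np.ones(len(y)), _logit(base_probs)])
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w -= lr * X.T @ (_sigmoid(X @ w) - y) / len(y)  # gradient of the mean log-loss
    return w


def predict_stacked(w: np.ndarray, base_probs) -> np.ndarray:
    X = np.column_stack([np.ones(len(base_probs)), _logit(base_probs)])
    return _sigmoid(X @ w)


def best_f1_threshold(y_true, scores):
    """Try every observed score as a cut-off and keep the one with the highest F1."""
    y_true = np.asarray(y_true).astype(int)
    scores = np.asarray(scores, dtype=float)
    best_t, best_f1 = 0.5, -1.0
    for t in np.unique(scores):
        pred = scores >= t
        tp = np.sum(pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        fn = np.sum(~pred & (y_true == 1))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = float(t), float(f1)
    return best_t, best_f1


def selection_rates(pred, groups) -> dict:
    """Share of each group flagged as at-risk (input to a demographic-parity check)."""
    pred = np.asarray(pred, dtype=bool)
    groups = np.asarray(groups)
    return {g: float(pred[groups == g].mean()) for g in sorted(set(groups.tolist()))}


def disparate_impact_ratio(rates: dict) -> float:
    """Lowest selection rate divided by the highest; values below 0.8 are often flagged."""
    vals = list(rates.values())
    return min(vals) / max(vals) if max(vals) > 0 else float("nan")
```

Optimizing the threshold for F1 is only one option. A cost-weighted objective may suit HR use cases better when false alarms and missed cases carry different costs.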

🧠 Current Focus

Working through a structured NLP → LLM roadmap covering:

NLP Fundamentals — tokenisation, embeddings (Word2Vec, GloVe), language modelling, text classification
Transformer Deep Dive — implementing attention, positional encoding, and full encoder blocks from scratch in PyTorch
LLM Internals — decoder-only architecture, LoRA fine-tuning, RAG pipelines
Capstone — building a mini GPT from scratch (nanoGPT approach)

Goal: move from using LLMs to understanding and fine-tuning them.
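
The attention and positional-encoding pieces from the Transformer and LLM stages of the roadmap can be sketched as follows. This is an illustrative NumPy version; the roadmap itself targets PyTorch. It assumes the sinusoidal encoding and scaled dot-product attention from the original Transformer paper, plus the causal mask used by decoder-only models. Function names and tensor shapes are illustrative choices.

```python
import numpy as np


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(same angle)."""
    assert d_model % 2 == 0, "this sketch assumes an even d_model"
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    two_i = np.arange(0, d_model, 2)[None, :]  # (1, d_model/2)
    angles = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(q, k, v, causal: bool = False):
    """softmax(Q K^T / sqrt(d_k)) V over the last two axes; returns (output, weights)."""
    d_k = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    if causal:
        # Decoder-only models mask future positions: query t may only attend to keys <= t.
        t_q, t_k = scores.shape[-2], scores.shape[-1]
        future = np.triu(np.ones((t_q, t_k), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random token embeddings plus positions; a real block would first project x into Q, K, V.
    x = rng.normal(size=(1, 5, 8)) + sinusoidal_positional_encoding(5, 8)
    out, attn = scaled_dot_product_attention(x, x, x, causal=True)
    print(out.shape, attn[0].round(2))
```

A PyTorch version would follow the same steps with tensors in place of arrays. The division by sqrt(d_k) keeps the dot products from growing with dimension and saturating the softmax.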

💼 Background

Schneider Electric | Aug 2024 – Present
Graduate Engineer Trainee → Digital Workplace Engineer
Enterprise-scale endpoint operations for 100,000+ devices across 50 global sites. Strengthened production awareness, systems thinking, and cross-functional execution.

Samsung Prism | R&D Intern | Apr–Sep 2023
ML model for document vs. scene image classification.

PepsiCo GBS | Data Science Intern | May–Jul 2023
Time series forecasting models integrated into operational dashboards.

📄 Publication

"Milestone-Gated, Donor-Governed Crowdfunding: Security Properties and the Price of Safety." ICDTDE 2025, Jordan — smart contract protocol with on-chain donor governance, evaluated via static analysis and adversarial fuzzing.

🛠 Technical Stack

ML & Data: Python, Scikit-learn, PyTorch, CatBoost, LightGBM, SHAP, TensorFlow, Keras, NumPy, Pandas, Dash, Plotly

Geospatial: Sentinel-2, NDVI, geospatial risk scoring

Infrastructure: SQL, MongoDB, Docker, Git, GCP, Azure

Open to Data Scientist and ML Engineer roles. Based in Bangalore.
