aniray2908/README.md

Hi, I'm Anisha 👋

Data Scientist | ML Systems | NLP & LLM Engineering
Bridging applied ML and deep NLP/LLM understanding.


🚀 About Me

I'm a Data Scientist with a B.Tech in CS (Data Science specialization) from VIT, currently transitioning back to a dedicated DS role after a stint in enterprise IT operations at Schneider Electric.

My work sits at the intersection of rigorous ML engineering and real-world problem framing: I care about systems that are interpretable, statistically grounded, and production-ready.


📌 Featured Projects

🛰 Satellite Environmental Risk Engine

A geospatial exposure scoring pipeline that converts Sentinel-2 multispectral imagery into tiered environmental risk classifications for mining assets.

  • NDVI-based vegetation suppression features across a 4-asset global portfolio (2019–2023)
  • Corporate Environmental Risk Index (CERI) via z-score compositing
  • Validated via 1,000 bootstrap simulations and LOOCV
  • Deterministic risk tier outputs at scale
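
As a rough illustration of the NDVI and z-score compositing steps listed above, here is a minimal sketch. It is not the repository's code: the asset names, reflectance values, the `bare_soil_fraction` feature, and the choice to average z-scores with equal weights are all assumptions made for the example. The only fixed facts are the NDVI formula, (NIR - Red) / (NIR + Red), and that Sentinel-2 provides NIR and red in bands B8 and B4.

```python
from statistics import mean, pstdev


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); returns 0.0 if both bands are zero."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def z_scores(values):
    """Standardize values to zero mean and unit population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def composite_index(features):
    """Average the z-scores of several per-asset features (equal weights assumed)."""
    standardized = [z_scores(column) for column in features.values()]
    return [mean(per_asset) for per_asset in zip(*standardized)]


# Hypothetical per-asset mean reflectances (B8 = NIR, B4 = red); not project data.
assets = ["asset_a", "asset_b", "asset_c"]
bands = [(0.45, 0.10), (0.30, 0.20), (0.25, 0.22)]

features = {
    # Low NDVI suggests suppressed vegetation, so negate it: higher = riskier.
    "vegetation_suppression": [-ndvi(nir, red) for nir, red in bands],
    # Second feature is purely illustrative.
    "bare_soil_fraction": [0.10, 0.35, 0.60],
}

risk_by_asset = dict(zip(assets, composite_index(features)))
```

Under these made-up inputs, `risk_by_asset` ranks `asset_c` as the most exposed and `asset_a` as the least. A real pipeline would compute the features per pixel and per time window before aggregating to the asset level.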

👉 Repository


πŸ” Silent Attrition Detector

A modular multi-signal attrition risk detection system for early identification of employee disengagement.

  • CatBoost + neural tabular network for structured HR modeling
  • Behavioral communication drift analysis
  • Neural meta-fusion layer for calibrated risk probabilities
  • SHAP-based explainability, fairness auditing, threshold optimization

👉 Repository


🧠 Current Focus

Working through a structured NLP → LLM roadmap covering:

  • NLP Fundamentals: tokenisation, embeddings (Word2Vec, GloVe), language modelling, text classification
  • Transformer Deep Dive: implementing attention, positional encoding, and full encoder blocks from scratch in PyTorch
  • LLM Internals: decoder-only architecture, LoRA fine-tuning, RAG pipelines
  • Capstone: building a mini GPT from scratch (nanoGPT approach)

Goal: move from using LLMs to understanding and fine-tuning them.


💼 Background

Schneider Electric | Aug 2024 – Present
Graduate Engineer Trainee → Digital Workplace Engineer
Enterprise-scale endpoint operations spanning 100,000+ devices across 50 global sites. Strengthened production awareness, systems thinking, and cross-functional execution.

Samsung Prism | R&D Intern | Apr–Sep 2023
ML model for document vs. scene image classification.

PepsiCo GBS | Data Science Intern | May–Jul 2023
Time series forecasting models integrated into operational dashboards.


📄 Publication

"Milestone-Gated, Donor-Governed Crowdfunding: Security Properties and the Price of Safety." ICDTDE 2025, Jordan. A smart contract protocol with on-chain donor governance, evaluated via static analysis and adversarial fuzzing.


🛠 Technical Stack

ML & Data: Python, Scikit-learn, PyTorch, CatBoost, LightGBM, SHAP, TensorFlow, Keras, NumPy, Pandas, Dash, Plotly

Geospatial: Sentinel-2, NDVI, geospatial risk scoring

Infrastructure: SQL, MongoDB, Docker, Git, GCP, Azure


Open to Data Scientist and ML Engineer roles. Based in Bangalore.

Pinned

  1. nlp-llm-journey (Public)

    A hands-on journey from classical NLP to building LLMs from scratch: tokenisation, Transformers, LoRA fine-tuning, and RAG.

    Jupyter Notebook

  2. satellite-environmental-risk-engine (Public)

    Satellite-driven environmental degradation modeling for mining asset risk assessment using multi-spectral imagery.

    Jupyter Notebook

  3. silent-attrition-detector (Public)

    Multi-signal machine learning system for employee attrition risk modeling using structured HR data, behavioral communication drift, and neural meta-learning fusion.

    Jupyter Notebook