Turning data into intelligent systems
AG141293/README.md
ANKITA GHOSH
GenAI · RAG · LLMs · NLP · VLM
🚀 Building Production-Ready AI Systems

🚀 About Me

  • 🎓 M.Sc. in Artificial Intelligence & Machine Learning (CGPA: 9.3)
  • 💼 Ex-ML Intern @ Tech Mahindra Makers Lab | AI Intern @ Codec Technologies (Current)
  • 🤖 GenAI & NLP Engineer: I build production-ready AI systems, not just models.
  • 🌟 Contributor to vinta/awesome-python, the #10 most starred repo on GitHub ✅

💡 Focus Areas:

  • RAG Systems & LLM Pipelines
  • Multilingual NLP (Indian Languages, 10,000+ dialect variations)
  • Synthetic Data Generation (10× dataset expansion)
  • Document Intelligence & OCR

🌍 Open Source Contributions

| # | Repo | Type | Description | Status |
|---|------|------|-------------|--------|
| 1 | vinta/awesome-python | 🔀 PR | fix: Remove duplicate ruff entry from Code Linters section. Merged ✅ into #10 most starred GitHub repo! | 🟣 Merged |
| 2 | recodehive/Opensource-practice | 🔀 PR | fix: Resolve misplaced names & duplicate J section. Merged ✅ | 🟣 Merged |
| 3 | canonical/pycloudlib | 🔀 PR | docs: Update contributing guidelines | 🟡 Open |
| 4 | topoteretes/cognee | 🔀 PR | Add beginner-friendly example to documentation | 🟡 Open |
| 5 | recodehive/machine-learning-repos | 🐛 Issue | [Enhancement] Add Multilingual & Indian Language NLP Resources | 🟢 Raised |
| 6 | recodehive/Opensource-practice | 🐛 Issue | [Bug] Multiple names misplaced & duplicate section under letter J | 🟢 Raised |
| 7 | supabase/supabase-py | 💡 Issue | Suggestion to improve documentation clarity for beginners | 🟢 Raised |
| 8 | topoteretes/cognee | 💡 Issue | [Docs] Improve documentation with beginner-friendly knowledge graph example | 🟢 Raised |
| 9 | public-apis/public-apis | 🐛 Issue | [Bug] Multiple defunct/dead APIs still listed in README (FTX, AnonFiles, MetaWeather & more) | 🟢 Raised |
| 10 | Mohitha1406/text-emotion-classifier | 🐛 Issue | [Docs] Add dataset source and download instructions for emotion classification model | 🟢 Raised |
| 11 | shrutiii16/Traffic-Patterns | 🐛 Issue | [Docs] Add README with project overview, dataset description and usage instructions | 🟢 Raised |
| 12 | EpistasisLab/tpot | 🐛 Issue | [Docs] Fix typo: "feautures" should be "features" in README Tips section | 🟢 Raised |
| 13 | TheAlgorithms/Python | 💡 Issue | [Feature] Add Naive Bayes Text Classification Algorithm to machine_learning/ | 🟢 Raised |
| 14 | datasciencemasters/go | 🐛 Issue | [Docs] Update outdated book prices and mark freely available resources | 🟢 Raised |
| 15 | TheAlgorithms/Python | 🔍 Review | [Code Review] Reviewed PR #14665: identified type hint bug in `predict()`, missing edge-case doctests for empty string input, validation order issue in `fit()`, and missing demo output in `__main__` | 🟡 Reviewed |

🛠 Tech Stack

💻 Languages

Python · C · C++ · Java · HTML · SQL · BigQuery

📊 Data & Visualization Tools

Tableau · Excel

🤖 GenAI / LLM Stack

LangChain · LangGraph · Gemini · RAG · HuggingFace · NVIDIA NIM

🧠 ML / NLP

PyTorch · scikit-learn · spaCy · OpenCV

☁️ MLOps & Tools

AWS · Docker · MLflow · Streamlit · FastAPI


💼 Experience

🤖 Artificial Intelligence Intern, Codec Technologies (Current)

Apr 2026 – Jun 2026 | Hybrid / India | National Internship Portal

  • 🌍 Selected through the National Internship Portal (NCS) at Codec Technologies, a global IT & consultancy platform operating in 27+ countries
  • 🤖 Working on applied AI/ML project assignments under Codec Technologies' Google for Education partner platform framework
  • 🧠 Applying GenAI and deep learning skills to real-world business challenges
  • 🚀 Building domain expertise in AI systems, production deployment, and cross-cultural AI problem solving

🧠 Machine Learning Intern, Tech Mahindra Makers Lab

Aug 2025 – Feb 2026 | Project: INDUS (India's Own LLM) | Employee ID: C129669

  • 🌍 Built multilingual NLP pipelines handling 10,000+ dialect variations
  • 🎙️ Developed ASR (Automatic Speech Recognition) systems with WER evaluation
  • 🏷️ Designed automated annotation pipelines using NVIDIA NIM
  • 🔄 Built real-time translation systems for low-resource Indian languages
  • 📦 Built LLM-based Synthetic Data Generation pipelines achieving 10× dataset expansion
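
For readers unfamiliar with the WER (word error rate) metric mentioned above, here is a minimal sketch of how it is commonly computed: the word-level edit distance between a reference transcript and the ASR hypothesis, divided by the number of reference words, i.e. WER = (S + D + I) / N. The function name `word_error_rate` and the whitespace tokenization are assumptions for illustration; this is not the evaluation code used in the INDUS project.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative sketch: tokenizes on whitespace only, with no text
    normalization (casing, punctuation), which real evaluations usually add.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One reference word ("the") is missing from the hypothesis.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.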

🚀 Featured Projects

| Project | Description | Stack | Link |
|---------|-------------|-------|------|
| Netflix Content Dashboard | Interactive visualization of 8,807 Netflix titles: Movies vs Shows, Country Map, Genres & Ratings | Tableau | View |
| LLM Q&A Generation System | Auto-generates high-quality QA pairs from documents | LangChain, Gemini, RAG | |
| Intelligent RAG Pipeline | Multi-source document retrieval with semantic reranking | FAISS, LangGraph, HuggingFace | |
| Vision-Language Model | OCR + LLM fusion for document intelligence | VLM, Tesseract, GPT-4V | |
| Forecasting & ML Projects | Time-series forecasting with ensemble methods | scikit-learn, MLflow, Streamlit | |

🎯 2026 Goals

  • Contribute to top open-source AI projects (LangChain, HuggingFace, etc.)
  • Build and deploy scalable GenAI systems end-to-end
  • Strengthen MLOps, system design & distributed training
  • Land a high-impact AI/ML engineering role 🚀

⭐ Building AI systems that actually work in the real world

Pinned

  1. -Business-Case-Target-SQL Public

    Analyzed 100k+ Target Brazil e-commerce orders (2016–2018) using customer, order, payment, product & review data. Explored sales trends, delivery times, freight costs, payment types & seasonality. …

    2

  2. customer-service-chatbot-nlp Public

    NLP-based Customer Service Chatbot using TF-IDF + Logistic Regression | 27 intent categories | 26K+ real queries | Codec Technologies AI Internship

    Jupyter Notebook 2

  3. twitter-sentiment-analysis Public

    Jupyter Notebook 2

  4. -AdEase-Time-Series Public

    AdEase is an AI-powered ad infrastructure offering design, dispense, and decipher modules for efficient digital advertising. This project analyzes 145k Wikipedia pages over 550 days, forecasting pe…

    1

  5. -Business-Case-Netflix---Data-Exploration-and-Visualisation Public

    Analyzed Netflix dataset of 10k+ shows & movies to uncover trends by country, type, genre, cast & release year. Explored TV vs. movies, yearly growth, top directors/actors & content distribution. G…

    1

  6. -indian-language-apis Public

    🇮🇳 A curated list of public APIs for Indian language processing: Translation, ASR, TTS, NLP, OCR and more.