Skip to content
View dileepkreddy5's full-sized avatar

Block or report dileepkreddy5

Block user

Prevent this user from interacting with your repositories and sending you notifications. Learn more about blocking users.

You must be logged in to block users.

Maximum 250 characters. Please don't include any personal information such as legal names or email addresses. Markdown supported. This note will be visible to only you.
Report abuse

Contact GitHub support about this user’s behavior. Learn more about reporting abuse.

Report abuse
dileepkreddy5/README.md

Hi, I'm Dileep 👋

AI Engineer building secure, production-grade ML and GenAI systems on AWS.

My work sits at the intersection of machine learning, distributed systems, and cloud architecture. I’m particularly interested in building AI systems that are not just intelligent — but observable, scalable, and safe to run in real-world environments.


What I Focus On

  • LLM infrastructure & AI security
  • Retrieval-Augmented Generation (RAG) systems
  • Real-time ML inference platforms
  • Quantitative ML systems with governance controls
  • Cloud-native AI architecture (AWS + Terraform)

I care about performance benchmarking, model reliability, and infrastructure reproducibility just as much as model accuracy.


Flagship Projects

🔐 Secure LLM Gateway

A production-style middleware layer in front of large language models.

It handles:

  • Prompt injection detection (embedding-based classifier with async micro-batching)
  • Role-based model access control (RBAC)
  • PII detection and response redaction
  • Token usage tracking and cost estimation
  • Prometheus instrumentation
  • Concurrency benchmarking (~475 req/sec tested)

The goal was to design a safe, governed LLM deployment layer — something closer to enterprise reality than a demo app.

Repository:
https://github.com/dileepkreddy5/secure-llm-gateway


🧩 Enterprise Hybrid Graph-RAG

A hybrid retrieval architecture combining:

  • Multi-hop graph reasoning
  • Vector similarity search
  • Weighted ranking strategy
  • Retrieval evaluation metrics (Precision@K, Recall@K, MRR)
  • Latency instrumentation
  • Claude integration (Bedrock-ready design)

This project focuses on explainability and measurable retrieval quality — not just “LLM answers.”

Repository:
https://github.com/dileepkreddy5/graph-rag-enterprise


⚡ Real-Time ML Feature Store

A production-style inference system designed around training-serving parity.

  • Online/offline feature consistency
  • Redis for online serving, Parquet for offline training
  • Streaming ingestion (Redpanda)
  • Multi-worker FastAPI inference
  • ~7 ms p50 latency, ~2.6K req/sec sustained load

This project explores low-latency ML deployment and real-time system performance.

Repository:
https://github.com/dileepkreddy5/real-time-ml-feature-store


📈 QuantEdge — Quantitative AI Platform

An institutional-style quantitative analysis system deployed end-to-end on AWS.

Core components include:

  • 8 ML/statistical models (LSTM, XGBoost, LightGBM, HMM, GJR-GARCH, Kalman, Ensemble)
  • Triple-barrier labeling (Lopez de Prado)
  • Regime detection (5-state HMM)
  • Independent risk engine (CVaR, shrinkage covariance, volatility targeting)
  • Portfolio construction via HRP
  • Governance layer with Deflated Sharpe Ratio (DSR)
  • Full infrastructure-as-code using Terraform
  • ECS Fargate backend, Redis caching, Cognito + MFA, WAF hardening

The emphasis here is systems rigor — model governance, risk isolation, and reproducible cloud deployment.

Live system:
https://quant.dileepkapu.com


Tech Stack

Languages
Python · SQL · TypeScript

AI / ML
PyTorch · Scikit-learn · XGBoost · LightGBM
LangChain · Vector Databases · RAG Architectures
Feature Stores · Model Serving · Retrieval Evaluation

Cloud & Infrastructure
AWS (ECS, S3, RDS, Redis, Cognito, WAF, CloudFront, Bedrock)
Terraform · Docker · GitHub Actions

Observability & Performance
Prometheus · Latency Instrumentation
Concurrency Benchmarking · Micro-batching Optimization


Certifications


Current Interests

  • Secure LLM architectures
  • AI governance & model reliability
  • Retrieval quality benchmarking
  • Cost-aware GenAI systems
  • Real-time ML infrastructure design

Connect

LinkedIn: https://www.linkedin.com/in/kapu-dileep-kumar-reddy-1084301a9/

Portfolio: https://dileepkapu.com

Email: dileepkreddy5@gmail.com

Pinned Loading

  1. graph-rag-enterprise graph-rag-enterprise Public

    Enterprise-grade Hybrid Graph-RAG engine with multi-hop reasoning, weighted retrieval ranking, evaluation metrics, and latency instrumentation.

    Python 1

  2. real-time-ml-feature-store real-time-ml-feature-store Public

    Production-style real-time ML feature store with low-latency inference

    Python 1

  3. secure-llm-gateway secure-llm-gateway Public

    Production-grade AI security middleware with async micro-batching, prompt injection detection, RBAC enforcement, PII redaction, cost tracking, and concurrency benchmarking.

    Python 1