prudvikomerelli/README.md

🚀 Prudvi S R Komerelli

Lead Data Engineer | AI Data Platform Modernization | LLM-Assisted ETL & BI Migration | Azure | Power BI | IDMC | OAS

I’m a data engineering and data platform modernization professional with 13+ years of experience building scalable cloud data platforms, lakehouse architectures, ETL/ELT pipelines, BI modernization programs, and analytics systems across Azure, Databricks, Snowflake, Oracle, Power BI, and AWS.

My recent focus is on modernizing legacy enterprise data ecosystems (including Informatica, IDMC, OBIEE, OAS, SSIS, Azure Data Factory, and Power BI) using practical LLM-assisted workflows for migration analysis, SQL refactoring, mapping documentation, test case generation, validation, and reporting modernization.

I enjoy building systems that turn complex legacy data environments into scalable, governed, AI-ready platforms that reduce manual effort, improve trust, and accelerate business decision-making.


🔧 Core Expertise

  • AI Data Platform Modernization: LLM-assisted ETL migration, legacy-to-cloud modernization, AI-assisted validation, migration documentation
  • Cloud Data Platforms: Azure Data Factory, Azure Databricks, Azure Synapse, Snowflake, AWS, IDMC
  • Data Engineering: Python, SQL, PySpark, Spark SQL, ETL/ELT, batch pipelines, orchestration
  • BI & Analytics Modernization: Power BI, Oracle Analytics Server, OBIEE, BI Publisher, semantic models
  • Lakehouse & Warehouse Architecture: Delta Lake, medallion architecture, dimensional modeling, curated data products
  • Governance & Reliability: data quality, source-to-target validation, monitoring, alerting, lineage, compliance-ready delivery

📌 Featured Projects

🔹 ETL Modernization Platform: AI-Assisted Migration SaaS

Production-style SaaS platform that converts legacy ETL pipeline definitions into cloud-native workflow artifacts.

Built to solve a real enterprise modernization challenge: understanding and converting legacy ETL logic buried inside Informatica-style XML before migrating to platforms like Azure Data Factory, Airflow, Databricks Workflows, Dagster, Prefect, and AWS Glue.

Key capabilities:

  • Parses legacy ETL XML and metadata from Informatica, SSIS, Talend, DataStage, and Ab Initio
  • Converts parsed metadata into a platform-neutral canonical JSON model
  • Generates first-draft target artifacts for ADF, Airflow, Databricks, Dagster, Prefect, and AWS Glue
  • Produces validation reports, gap analysis, unsupported logic detection, and remediation suggestions
  • Tracks conversion runs, pipeline steps, errors, duration, and project history
  • Uses LLM-assisted workflows for documentation, artifact generation, and migration analysis
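
As a rough illustration of the parse-then-normalize step listed above, here is a minimal sketch in Python. It is not the platform's actual code (which is TypeScript). The XML shape is a heavily simplified, hypothetical Informatica-style export, and the `to_canonical` function and the `pipeline` / `steps` / `edges` keys are assumptions made for this example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified Informatica-style export.
# Real exports carry far more structure (ports, expressions, sessions).
LEGACY_XML = """
<MAPPING NAME="m_load_customers">
  <TRANSFORMATION NAME="SQ_customers" TYPE="Source Qualifier"/>
  <TRANSFORMATION NAME="EXP_clean" TYPE="Expression"/>
  <TRANSFORMATION NAME="TGT_dim_customer" TYPE="Target"/>
  <CONNECTOR FROMINSTANCE="SQ_customers" TOINSTANCE="EXP_clean"/>
  <CONNECTOR FROMINSTANCE="EXP_clean" TOINSTANCE="TGT_dim_customer"/>
</MAPPING>
"""


def to_canonical(xml_text: str) -> dict:
    """Turn the simplified legacy XML into a platform-neutral dict."""
    root = ET.fromstring(xml_text.strip())
    # Each transformation becomes a step with a normalized kind label.
    steps = [
        {
            "id": t.get("NAME"),
            "kind": t.get("TYPE", "unknown").lower().replace(" ", "_"),
        }
        for t in root.iter("TRANSFORMATION")
    ]
    # Each connector becomes a directed edge between two steps.
    edges = [
        {"from": c.get("FROMINSTANCE"), "to": c.get("TOINSTANCE")}
        for c in root.iter("CONNECTOR")
    ]
    return {"pipeline": root.get("NAME"), "steps": steps, "edges": edges}


model = to_canonical(LEGACY_XML)
print(json.dumps(model, indent=2))
```

Keeping the intermediate model free of target-specific fields is what would let one parse feed several generators (ADF, Airflow, and so on) without re-reading the legacy XML.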

Modeled business impact:

  • Designed for enterprise-scale programs with 500+ workflows, 1,000+ mappings, and 2,000+ transformations
  • Modeled 40-60% reduction in manual migration analysis and rewrite planning
  • Modeled $450K+ cost avoidance through reduced consulting effort, faster assessment, and less rework
  • Estimated AI token cost of under $1 to about $3 per workflow conversion, making the approach practical at scale

Tech Stack: Next.js, TypeScript, PostgreSQL, Prisma, Supabase, Stripe, OpenAI, Tailwind CSS


🔹 ResumeAI: AI-Powered SaaS Platform

End-to-end AI SaaS platform that generates ATS-optimized resumes and cover letters.

Built with: Next.js, Supabase, Prisma, Stripe, and LLM workflows

Focus areas:

  • ATS keyword matching
  • Resume and cover letter generation
  • AI-assisted content optimization
  • SaaS product architecture
  • Authentication, billing, and user workflow design

🔹 OpenWeather ETL Platform

Production-style ETL pipeline using Airflow, Python, PostgreSQL, and Docker.

Features:

  • API ingestion from OpenWeather
  • Airflow TaskFlow API orchestration
  • Dynamic task mapping
  • Idempotent upserts
  • Layered raw, staging, and curated tables
  • Dockerized local development environment
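
The idempotent upsert pattern mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs without a server, and the `weather_staging` table, its columns, and the `load` helper are hypothetical. PostgreSQL accepts the same `INSERT ... ON CONFLICT ... DO UPDATE` shape, with driver-specific placeholders such as `%s` instead of `?`.

```python
import sqlite3

# Hypothetical staging table keyed on (city, observed_at).
UPSERT_SQL = """
INSERT INTO weather_staging (city, observed_at, temp_c)
VALUES (?, ?, ?)
ON CONFLICT (city, observed_at) DO UPDATE SET temp_c = excluded.temp_c
"""


def load(conn: sqlite3.Connection, rows: list) -> None:
    """Insert new rows and overwrite rows whose key already exists."""
    with conn:  # commit on success, roll back on error
        conn.executemany(UPSERT_SQL, rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weather_staging ("
    "city TEXT, observed_at TEXT, temp_c REAL, "
    "PRIMARY KEY (city, observed_at))"
)

batch = [
    ("Austin", "2024-01-01T00:00Z", 12.5),
    ("Oslo", "2024-01-01T00:00Z", -3.0),
]
load(conn, batch)
load(conn, batch)  # re-running the same batch adds no duplicates
load(conn, [("Oslo", "2024-01-01T00:00Z", -2.5)])  # a correction overwrites

row_count = conn.execute("SELECT COUNT(*) FROM weather_staging").fetchone()[0]
oslo_temp = conn.execute(
    "SELECT temp_c FROM weather_staging WHERE city = 'Oslo'"
).fetchone()[0]
print(row_count, oslo_temp)
```

Because the write is keyed on a natural identifier, a retried or backfilled task run leaves the table in the same state as a single successful run.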

Focus: Data ingestion, orchestration, transformation, and analytics-ready modeling


🔹 NYC Taxi Streaming Data Platform

Real-time analytics pipeline using Kafka, Spark, Airflow, PostgreSQL, and Superset.

Focus areas:

  • Streaming ingestion
  • Distributed processing
  • Near-real-time analytics
  • End-to-end data pipeline design
  • Operational dashboards

🧠 What I Focus On

I like building data systems that:

  • Modernize legacy platforms without losing business logic
  • Reduce manual migration and reporting effort
  • Improve data trust through validation and reconciliation
  • Scale reliably across batch, streaming, and BI workloads
  • Support business decisions with clean, governed, analytics-ready data
  • Apply AI and LLMs where they create real engineering leverage

📊 Recent Impact

  • Reduced ETL pipeline maintenance and support overhead by 25% through SSIS-to-ADF modernization
  • Improved platform uptime to 99.9% through cloud data platform modernization
  • Reduced infrastructure spend by 43% through Azure platform optimization
  • Improved time-to-insight by 60% through curated lakehouse and warehouse data products
  • Supported 40+ Power BI reports across 10+ business functions
  • Modeled $450K+ cost avoidance with AI-assisted ETL modernization patterns

🛠️ Tools & Technologies

Languages: Python, SQL, PySpark, TypeScript
Cloud/Data: Azure Data Factory, Azure Databricks, Azure Synapse, Snowflake, AWS, IDMC
BI/Analytics: Power BI, Oracle Analytics Server, OBIEE, BI Publisher, Tableau
AI/LLM: OpenAI, LLM-assisted migration workflows, AI-assisted validation, Microsoft Copilot
Engineering: Airflow, Kafka, Spark, PostgreSQL, Prisma, Supabase, Docker, GitHub Actions, Terraform
Architecture: Lakehouse, Data Warehouse, Semantic Models, ETL/ELT, Data Quality, Governance


📫 Connect With Me

Pinned

  1. ResumeAI (Public)

    AI-powered SaaS platform that generates ATS-optimized resumes and cover letters using LLMs, with scoring, keyword analysis, and Stripe-based billing.

    TypeScript

  2. openweather-airflow-postgres (Public)

    End-to-end ETL pipeline using Apache Airflow to ingest OpenWeather API data, transform it with Python, and load analytics-ready tables into PostgreSQL with Dockerized local setup.

    Python

  3. nyc-taxi-data-pipeline (Public)

    Real-time data pipeline using Kafka, Spark, and Airflow to process NYC taxi data and deliver analytics via PostgreSQL and Superset.

    Python

  4. etl-modernization-platform (Public)

    AI-assisted SaaS for converting legacy ETL XML into cloud-native workflow artifacts for ADF, Airflow, Databricks, Dagster, Prefect, and AWS Glue.

    TypeScript