purusottambuilds-lab/data-engineering-projects

Data Engineering Projects - purusottambuilds-lab

Data Engineering projects by Purusottam Swain

Azure Data Engineering portfolio showcasing end-to-end pipeline development using Microsoft Azure services.


Portfolio Projects

| Project | Tech Stack | Description | Status |
| --- | --- | --- | --- |
| 001-azure-insurance-claims-pipeline | ADF, Databricks, PySpark, Delta Lake, Azure SQL | Ingests raw insurance claims CSV, runs a data quality framework with a scoring gate, applies multi-factor PySpark risk scoring, stores versioned Delta Lake output, and loads to Azure SQL via JDBC with Gmail alerting | ✅ Complete |
| 002-azure-weather-lakehouse | ADF, Open-Meteo API, Databricks, PySpark, Delta Lake, Azure SQL | Ingests live hourly weather data for 4 Indian cities via REST API using ADF ForEach, implements Medallion Architecture (Bronze/Silver/Gold), detects anomalies, and applies rolling window analytics and severity scoring | ✅ Complete |
| 003-stock-market-analytics-pipeline | ADF, Alpha Vantage API, Databricks, PySpark, Delta Lake, Azure SQL | Ingests daily stock price data via a free API, implements SCD Type 2 history tracking using Delta MERGE, and demonstrates Delta time travel, an incremental loading pattern, and broadcast variables | 🔄 In Progress |
| 004-retail-sales-data-warehouse | ADF, REST Countries API, Kaggle Olist Dataset, Databricks, PySpark, Delta Lake, Azure SQL | Multi-source ingestion from CSV and REST API; builds a star schema data warehouse with dimension and fact tables, parameterized ADF pipelines, and bucketing and partitioning strategies | ⏳ Planned |
| 005-github-activity-analytics | ADF, GitHub API, Databricks, PySpark, Delta Lake | Ingests GitHub public events via REST API using ADF ForEach, processes developer activity metrics, and builds productivity scoring with dynamic parameterized pipelines and Z-ordering | ⏳ Planned |
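
The SCD Type 2 history tracking listed for project 003 can be illustrated with a small sketch. This is not the project's code: the table above says the project uses Delta MERGE, whereas the version below is plain Python so it runs without a Spark cluster. The `DimRow` class, the `scd2_merge` function, and the tracked `exchange` attribute are all hypothetical names chosen for illustration. The idea is that when a tracked attribute changes, the current row is closed out (its `valid_to` is set and `is_current` becomes false) and a new current row is appended, so earlier values stay queryable.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension record (hypothetical schema)."""
    symbol: str                 # business key
    exchange: str               # tracked attribute (assumed for illustration)
    valid_from: date
    valid_to: Optional[date]    # None means the row is still open
    is_current: bool


def scd2_merge(dim: List[DimRow], incoming: Dict[str, str], as_of: date) -> List[DimRow]:
    """Apply an SCD Type 2 update and return the new dimension rows.

    `incoming` maps each business key to its latest attribute value.
    Changed keys get their current row expired and a new row added;
    unseen keys get a first row; unchanged keys are left alone.
    """
    result: List[DimRow] = []
    keys_with_current_row = set()

    for row in dim:
        new_value = incoming.get(row.symbol)
        if row.is_current:
            keys_with_current_row.add(row.symbol)
        if row.is_current and new_value is not None and new_value != row.exchange:
            # Close the old version, then open a new current version.
            result.append(replace(row, valid_to=as_of, is_current=False))
            result.append(DimRow(row.symbol, new_value, as_of, None, True))
        else:
            result.append(row)

    # Keys with no current row are inserted as brand-new records.
    for symbol, value in incoming.items():
        if symbol not in keys_with_current_row:
            result.append(DimRow(symbol, value, as_of, None, True))

    return result
```

In a Delta Lake setting the same two steps (expire the matched current row, insert the new version) are usually expressed as a single MERGE against the dimension table; the sketch only shows the row-level logic. Re-running the function with the same input leaves the rows unchanged, which is the property an incremental load relies on.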

🛠️ Tech Stack

Azure, Databricks, PySpark, Python, SQL


Repository Structure

purusottambuilds-lab/
├── data-engineering-projects/     ← this repo (portfolio hub)
├── 001-azure-insurance-claims-pipeline/
├── 002-azure-weather-lakehouse/
├── 003-stock-market-analytics-pipeline/
├── 004-retail-sales-data-warehouse/
└── 005-github-activity-analytics/

📬 Contact

Purusottam Swain (purusottam.builds@gmail.com)

