Skip to content

purva0231/Data-Warehouse-Development-Using-SQL-Server

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

19 Commits
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸš€ SQL Data Warehouse Project

πŸ“– Overview

This project demonstrates the design and implementation of an end-to-end Data Warehouse using Microsoft SQL Server.

The solution integrates CRM and ERP data from CSV files, applies data cleansing and transformation rules, and delivers business-ready datasets through a Medallion Architecture consisting of Bronze, Silver, and Gold layers.

The project follows modern Data Engineering and Data Warehousing best practices, including:

  • ETL Development
  • Data Quality Management
  • Data Cleansing & Standardization
  • Dimensional Modeling
  • Star Schema Design
  • Surrogate Key Generation
  • Business Rule Implementation
  • Error Handling & Monitoring

πŸ—οΈ Architecture

Medallion Architecture

Architecture Diagram


πŸ“‚ Data Sources

data_integration The warehouse integrates data from two business systems.

CRM System

Contains:

  • Customer Information
  • Product Information
  • Sales Transactions

Tables

crm_cust_info
crm_prd_info
crm_sales_details

ERP System

Contains:

  • Customer Master Data
  • Product Categories
  • Location Information

Tables

erp_cust_az12
erp_loc_a101
erp_px_cat_g1v2

πŸ”„ Data Flow Diagram

data_flow

πŸ₯‰ Bronze Layer

Purpose

Stores raw source data exactly as received from source systems.

Characteristics

  • Raw Data Storage
  • No Transformations
  • Full Load Processing
  • Truncate & Insert Strategy
  • CSV File Ingestion

Tables

bronze.crm_cust_info
bronze.crm_prd_info
bronze.crm_sales_details
bronze.erp_cust_az12
bronze.erp_loc_a101
bronze.erp_px_cat_g1v2

Loading Method

EXEC bronze.load_bronze;

Features

  • BULK INSERT
  • Batch Processing
  • ETL Logging
  • Error Handling

πŸ₯ˆ Silver Layer

Purpose

Transforms raw data into clean and standardized datasets.

Tables

silver.crm_cust_info
silver.crm_prd_info
silver.crm_sales_details
silver.erp_cust_az12
silver.erp_loc_a101
silver.erp_px_cat_g1v2

Loading Method

EXEC silver.load_silver;

Data Cleansing

Duplicate Removal

Keeps the latest customer record.

ROW_NUMBER() OVER (
PARTITION BY cst_id
ORDER BY cst_create_date DESC
)

Gender Standardization

Source Standardized
M Male
F Female

Marital Status Standardization

Source Standardized
M Married
S Single

Country Standardization

Source Standardized
US United States
USA United States
DE Germany

Data Quality Checks

Customer Validation

WHERE cst_id IS NOT NULL

Birthdate Validation

WHEN bdate > GETDATE()
THEN NULL

Date Validation

WHEN sls_order_dt = 0
THEN NULL

Sales Validation

Business Rule:

Sales = Quantity * Price

Incorrect sales values are automatically recalculated.


πŸ₯‡ Gold Layer

Purpose

Provides business-ready datasets optimized for analytics and reporting.

Implemented as SQL Views.


⭐ Star Schema

data_model

Fact Table

fact_sales

Stores transactional sales information.

Column
order_number
product_key
customer_key
order_date
shipping_date
due_date
sales_amount
quantity
price

Dimension Table

dim_customers

Column
customer_key
customer_id
customer_number
first_name
last_name
gender
marital_status
birthdate
country

Dimension Table

dim_products

Column
product_key
product_id
product_number
product_name
category
subcategory
maintenance
cost
product_line
start_date

Important Points

πŸ”‘ Surrogate Keys

Generated using:

ROW_NUMBER() OVER(...)

Examples:

customer_key
product_key

Benefits:

  • Improved Query Performance
  • Stable Relationships
  • Data Warehouse Best Practice

πŸ”„ Slowly Changing Dimension Logic

Product history tracking is implemented using:

LEAD(prd_start_dt)

End dates are automatically calculated.

Only active products are loaded into Gold.

WHERE prd_end_dt IS NULL

βš™οΈ ETL Pipeline

Step 1: Load Bronze Layer

EXEC bronze.load_bronze;

Responsibilities

  • Load CSV Files
  • Truncate Existing Data
  • Store Raw Data

Step 2: Load Silver Layer

EXEC silver.load_silver;

Responsibilities

  • Data Cleansing
  • Data Standardization
  • Data Validation
  • Data Enrichment

Step 3: Query Gold Layer

SELECT *
FROM gold.fact_sales;

πŸ“Š Business Calculations

1. Revenue

SUM(sales_amount)

2. Total Orders

COUNT(order_number)

3. Quantity Sold

SUM(quantity)

4. Average Order Value

AVG(sales_amount)

πŸ›‘οΈ Error Handling

Implemented using:

BEGIN TRY

-- ETL Logic

END TRY

BEGIN CATCH

-- Error Logging

END CATCH

Benefits:

  • Robust ETL Process
  • Easier Debugging
  • Better Reliability

⏱️ ETL Monitoring

Execution times are tracked using:

DATEDIFF(
SECOND,
@start_time,
@end_time
)

Tracks:

  • Table Load Duration
  • Batch Duration
  • ETL Performance

πŸ“ˆ Analytics Capabilities

The Data Warehouse supports:

1. Sales Analytics

  • Revenue Analysis
  • Sales Trends
  • Order Analysis

2. Customer Analytics

  • Customer Segmentation
  • Demographic Analysis
  • Country Analysis

3. Product Analytics

  • Product Performance
  • Category Analysis
  • Cost Analysis

πŸ“Š Reporting & BI

The Gold Layer is designed for:

  • Power BI Dashboards
  • Ad-Hoc SQL Queries
  • Executive Reporting
  • KPI Monitoring

Example Dashboards:

  • Sales Dashboard
  • Customer Dashboard
  • Product Dashboard
  • Executive Dashboard

πŸš€ Future Enhancements

Potential improvements include:

  • Incremental Loading
  • Change Data Capture (CDC)
  • Time Dimension
  • Geography Dimension
  • SQL Server Agent Scheduling
  • Azure Data Factory Integration
  • Machine Learning Pipelines

πŸ› οΈ Technologies Used

  • Microsoft SQL Server
  • T-SQL
  • Stored Procedures
  • BULK INSERT
  • Star Schema
  • Dimensional Modeling
  • Medallion Architecture

πŸ“š Skills Demonstrated

Data Warehousing

  • Medallion Architecture
  • Star Schema
  • Fact & Dimension Modeling
  • Surrogate Keys

Data Engineering

  • ETL Development
  • Data Integration
  • Data Transformation
  • Data Quality Management

SQL Development

  • Stored Procedures
  • Window Functions
  • Views
  • Error Handling

🎯 Project Outcomes

βœ… Built a complete SQL Server Data Warehouse

βœ… Integrated CRM and ERP systems

βœ… Implemented Bronze, Silver, and Gold layers

βœ… Developed automated ETL pipelines

βœ… Applied data quality and validation rules

βœ… Created a Star Schema dimensional model

βœ… Enabled BI reporting and analytics

βœ… Prepared data for future Machine Learning workloads


πŸ‘¨β€πŸ’» Author

Purva Kalamabte
B.Tech – Electronics & Communication (ENC) Engineering

  • πŸŽ“ Bachelor of Technology in Electronics & Communication Engineering
  • πŸ† Qualified GATE 2025 with AIR 8067
  • πŸ“„ Published 2 Research Papers in Machine Learning (CNN-based Models)
  • πŸ’‘ Interested in Data Engineering, Machine Learning, and Data Analytics
  • πŸ› οΈ Skilled in SQL Server, T-SQL, Data Warehousing, ETL Development, Python, PowerBI, Excel

About

I designed and implemented a modern Data Warehouse solution using the Medallion Architecture (Bronze, Silver, and Gold layers) to integrate and transform data from multiple business systems. The primary objective of the project was to create a centralized and high-performance analytical platform for reporting and business intelligence.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages