This project demonstrates the design and implementation of an end-to-end Data Warehouse using Microsoft SQL Server.
The solution integrates CRM and ERP data from CSV files, applies data cleansing and transformation rules, and delivers business-ready datasets through a Medallion Architecture consisting of Bronze, Silver, and Gold layers.
The project follows modern Data Engineering and Data Warehousing best practices, including:
- ETL Development
- Data Quality Management
- Data Cleansing & Standardization
- Dimensional Modeling
- Star Schema Design
- Surrogate Key Generation
- Business Rule Implementation
- Error Handling & Monitoring
The warehouse integrates data from two business systems.
Contains:
- Customer Information
- Product Information
- Sales Transactions
crm_cust_info
crm_prd_info
crm_sales_details
Contains:
- Customer Master Data
- Product Categories
- Location Information
erp_cust_az12
erp_loc_a101
erp_px_cat_g1v2
Stores raw source data exactly as received from source systems.
- Raw Data Storage
- No Transformations
- Full Load Processing
- Truncate & Insert Strategy
- CSV File Ingestion
bronze.crm_cust_info
bronze.crm_prd_info
bronze.crm_sales_details
bronze.erp_cust_az12
bronze.erp_loc_a101
bronze.erp_px_cat_g1v2
EXEC bronze.load_bronze;- BULK INSERT
- Batch Processing
- ETL Logging
- Error Handling
Transforms raw data into clean and standardized datasets.
silver.crm_cust_info
silver.crm_prd_info
silver.crm_sales_details
silver.erp_cust_az12
silver.erp_loc_a101
silver.erp_px_cat_g1v2
EXEC silver.load_silver;Keeps the latest customer record.
ROW_NUMBER() OVER (
PARTITION BY cst_id
ORDER BY cst_create_date DESC
)| Source | Standardized |
|---|---|
| M | Male |
| F | Female |
| Source | Standardized |
|---|---|
| M | Married |
| S | Single |
| Source | Standardized |
|---|---|
| US | United States |
| USA | United States |
| DE | Germany |
WHERE cst_id IS NOT NULLWHEN bdate > GETDATE()
THEN NULLWHEN sls_order_dt = 0
THEN NULLBusiness Rule:
Sales = Quantity * PriceIncorrect sales values are automatically recalculated.
Provides business-ready datasets optimized for analytics and reporting.
Implemented as SQL Views.
Stores transactional sales information.
| Column |
|---|
| order_number |
| product_key |
| customer_key |
| order_date |
| shipping_date |
| due_date |
| sales_amount |
| quantity |
| price |
| Column |
|---|
| customer_key |
| customer_id |
| customer_number |
| first_name |
| last_name |
| gender |
| marital_status |
| birthdate |
| country |
| Column |
|---|
| product_key |
| product_id |
| product_number |
| product_name |
| category |
| subcategory |
| maintenance |
| cost |
| product_line |
| start_date |
Generated using:
ROW_NUMBER() OVER(...)Examples:
customer_key
product_keyBenefits:
- Improved Query Performance
- Stable Relationships
- Data Warehouse Best Practice
Product history tracking is implemented using:
LEAD(prd_start_dt)End dates are automatically calculated.
Only active products are loaded into Gold.
WHERE prd_end_dt IS NULLEXEC bronze.load_bronze;- Load CSV Files
- Truncate Existing Data
- Store Raw Data
EXEC silver.load_silver;- Data Cleansing
- Data Standardization
- Data Validation
- Data Enrichment
SELECT *
FROM gold.fact_sales;SUM(sales_amount)COUNT(order_number)SUM(quantity)AVG(sales_amount)Implemented using:
BEGIN TRY
-- ETL Logic
END TRY
BEGIN CATCH
-- Error Logging
END CATCHBenefits:
- Robust ETL Process
- Easier Debugging
- Better Reliability
Execution times are tracked using:
DATEDIFF(
SECOND,
@start_time,
@end_time
)Tracks:
- Table Load Duration
- Batch Duration
- ETL Performance
The Data Warehouse supports:
- Revenue Analysis
- Sales Trends
- Order Analysis
- Customer Segmentation
- Demographic Analysis
- Country Analysis
- Product Performance
- Category Analysis
- Cost Analysis
The Gold Layer is designed for:
- Power BI Dashboards
- Ad-Hoc SQL Queries
- Executive Reporting
- KPI Monitoring
Example Dashboards:
- Sales Dashboard
- Customer Dashboard
- Product Dashboard
- Executive Dashboard
Potential improvements include:
- Incremental Loading
- Change Data Capture (CDC)
- Time Dimension
- Geography Dimension
- SQL Server Agent Scheduling
- Azure Data Factory Integration
- Machine Learning Pipelines
- Microsoft SQL Server
- T-SQL
- Stored Procedures
- BULK INSERT
- Star Schema
- Dimensional Modeling
- Medallion Architecture
- Medallion Architecture
- Star Schema
- Fact & Dimension Modeling
- Surrogate Keys
- ETL Development
- Data Integration
- Data Transformation
- Data Quality Management
- Stored Procedures
- Window Functions
- Views
- Error Handling
✅ Built a complete SQL Server Data Warehouse
✅ Integrated CRM and ERP systems
✅ Implemented Bronze, Silver, and Gold layers
✅ Developed automated ETL pipelines
✅ Applied data quality and validation rules
✅ Created a Star Schema dimensional model
✅ Enabled BI reporting and analytics
✅ Prepared data for future Machine Learning workloads
Purva Kalamabte
B.Tech – Electronics & Communication (ENC) Engineering
- 🎓 Bachelor of Technology in Electronics & Communication Engineering
- 🏆 Qualified GATE 2025 with AIR 8067
- 📄 Published 2 Research Papers in Machine Learning (CNN-based Models)
- 💡 Interested in Data Engineering, Machine Learning, and Data Analytics
- 🛠️ Skilled in SQL Server, T-SQL, Data Warehousing, ETL Development, Python, PowerBI, Excel


