Skip to content

evgeniimatveev/business-sql-analytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

42 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Business SQL Analytics — PostgreSQL · Python · Tableau · Excel

SQL Python Tableau Excel CI/CD License Status


What This Project Does

End-to-end business analytics pipeline built on a simulated retail dataset: customers, orders, products, and transactions across 4 normalized tables.

26+ SQL queries organized from data validation → aggregations → transaction analysis → multi-table joins, with Python automation for data generation and Tableau dashboards for stakeholder reporting.

Pipeline: PostgreSQL → SQL Analytics → Python Export → Tableau + Excel

Dataset: 2,314 customers · 5,000 transactions · $2.58M revenue · 14 months


Business Questions & Findings

Question Finding
Who are the top 10 customers by revenue? Top customer: $5,133 total spend · avg customer $1,115
Which payment methods dominate? Across 5,000 transactions, avg ticket $515.85
What is the monthly revenue trend? Peak month: $240,920 (446 orders) · low: $79,918
Which product categories drive most sales? $2,579,272 total revenue across all categories
Returning vs one-time customers? 59.8% returning (1,384 of 2,314) vs 40.2% one-time

SQL — Advanced Patterns

Window Function: Customer Revenue Ranking

SELECT
    c.customer_name,
    SUM(t.amount)                                               AS total_spent,
    RANK() OVER (ORDER BY SUM(t.amount) DESC)                  AS revenue_rank,
    ROUND(SUM(t.amount) / SUM(SUM(t.amount)) OVER () * 100, 1) AS pct_of_total
FROM customers c
JOIN transactions t ON c.customer_id = t.customer_id
GROUP BY c.customer_name
ORDER BY revenue_rank;

CTE: Monthly Revenue with Month-over-Month Change

WITH monthly_revenue AS (
    SELECT
        DATE_TRUNC('month', transaction_date) AS month,
        SUM(amount)                           AS revenue,
        COUNT(DISTINCT customer_id)           AS active_customers
    FROM transactions
    GROUP BY 1
)
SELECT
    month,
    revenue,
    active_customers,
    ROUND(revenue - LAG(revenue) OVER (ORDER BY month), 0) AS mom_change
FROM monthly_revenue
ORDER BY month;

Multi-Table JOIN: Full Customer Order Profile

SELECT
    c.customer_name,
    COUNT(DISTINCT o.order_id)  AS total_orders,
    SUM(t.amount)               AS total_spent,
    ROUND(AVG(t.amount), 2)    AS avg_transaction,
    MAX(t.transaction_date)     AS last_purchase
FROM customers c
JOIN orders o       ON c.customer_id = o.customer_id
JOIN transactions t ON o.order_id    = t.order_id
GROUP BY c.customer_name
ORDER BY total_spent DESC
LIMIT 10;

Project Structure

mlops_sql_project/
├── sql/
│   ├── ddl/                    # schema, tables, constraints
│   ├── dml/                    # data inserts & resets
│   └── dql/
│       ├── a_checks/           # data validation (nulls, FK, indexes)
│       ├── b_aggregations/     # counts, min/max, category stats
│       ├── c_transactions/     # order & payment analysis
│       └── d_joins/            # multi-table joins, top customers
├── python/
│   ├── generate_customers.py
│   ├── generate_orders.py
│   ├── generate_products.py
│   ├── generate_transactions.py
│   ├── sales_summary.py
│   └── customer_behavior.py
├── Tableau/
│   ├── sales_dashboard.twb
│   ├── transaction_analysis.twb
│   └── customer_insights.twb
├── excel/
│   ├── sales_summary.xlsx
│   ├── transactions_overview.xlsx
│   └── customer_behavior.xlsx
└── env/                        # DB config, logging, settings

Architecture

PostgreSQL (4 tables: customers, orders, products, transactions)
    └── SQL layer: DDL → DML → DQL (26+ queries)
            └── Python automation (data generation + export)
                    ├── Tableau dashboards (sales, transactions, customers)
                    └── Excel reports (summary exports)

How to Run

# 1. Clone and set up environment
git clone https://github.com/evgeniimatveev/business-sql-analytics.git
cd business-sql-analytics
conda env create -f environment.yaml
conda activate mlops_env

# 2. Configure DB credentials
cp env/.env.example env/.env

# 3. Generate synthetic data
python python/generate_customers.py
python python/generate_orders.py
python python/generate_products.py
python python/generate_transactions.py

# 4. Run SQL analytics
# Open sql/dql/ queries in DBeaver or psql

Stack

Layer Technology
Database PostgreSQL
Analytics SQL (CTEs, Window Functions, Multi-table JOINs)
Automation Python (Faker, pandas, psycopg2)
Visualization Tableau
Reporting Excel
CI/CD GitHub Actions

Connect

About

Retail analytics pipeline: 2,314 customers · 5,000 transactions · $2.58M revenue. PostgreSQL + SQL (CTEs, window functions) + Python + Tableau + Excel.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages