This project focuses on analyzing an Amazon-like order transportation and logistics system using SQL. The objective is to evaluate order fulfillment performance, delivery efficiency, carrier reliability, shipping methods, and regional behavior through structured relational data.
The project is designed around operational analytics, not sales analytics, and reflects real-world logistics scenarios such as delayed deliveries, order cancellations, and carrier performance variation.
The database consists of four relational tables:
Stores customer details and region information.
Acts as the central table that tracks the order lifecycle, shipping method, and order status.
Contains shipment-related details including carrier assignment, delivery date, and shipping cost.
Stores logistics carrier information along with their operational regions.
One customer can place multiple orders
Each order has one transportation record
Each transportation record is handled by one carrier
This relational structure enables end-to-end tracking from order placement to delivery execution.
-
Built and Analyzed a Dataset : Created and managed a relational database with over 200 rows of data across four tables (
Orders,Transportation,Carriers, andCustomers), using SQL to simulate Amazon's order transportation system. -
Delivered Key Insights Across 10+ Metrics: Executed 15+ complex SQL queries to evaluate delivery timeliness, regional performance, carrier reliability, and shipping method efficiency, identifying a 15% delay rate and high cancellation rates in specific regions.
-
Enhanced Logistics Optimization: Pinpointed performance drops and bottlenecks in the transportation process, recommending improvements based on quantitative insights that could potentially reduce delays by up to 20%.
SELECT
Status,
COUNT(*) AS TotalOrders
FROM
Orders
GROUP BY
Status;- Purpose: Understand the distribution of orders by status (e.g., Delivered, Processing, Cancelled).
SELECT
t.CarrierID,
c.CarrierName,
AVG(DATEDIFF(t.DeliveryDate, o.OrderDate)) AS AvgDeliveryTime
FROM
Transportation t
JOIN
Orders o ON t.OrderID = o.OrderID
JOIN
Carriers c ON t.CarrierID = c.CarrierID
WHERE
t.DeliveryDate IS NOT NULL
GROUP BY
t.CarrierID, c.CarrierName;- Purpose: Measure the performance of carriers based on average delivery times.
SELECT
o.OrderID,
o.OrderDate,
t.DeliveryDate,
DATEDIFF(t.DeliveryDate, o.OrderDate) AS DeliveryTime
FROM
Orders o
JOIN
Transportation t ON o.OrderID = t.OrderID
WHERE
t.DeliveryDate IS NOT NULL AND DATEDIFF(t.DeliveryDate, o.OrderDate) > 5;- Purpose: Identify orders delivered after 5 days, indicating potential delays.
SELECT
c.Region,
COUNT(*) AS TotalOrders
FROM
Customers c
JOIN
Orders o ON c.CustomerID = o.CustomerID
GROUP BY
c.Region
ORDER BY
TotalOrders DESC;- Purpose: Track the number of orders in each region and identify high-demand areas.
SELECT
c.Region,
COUNT(*) AS CancelledOrders
FROM
Customers c
JOIN
Orders o ON c.CustomerID = o.CustomerID
WHERE
o.Status = 'Cancelled'
GROUP BY
c.Region
ORDER BY
CancelledOrders DESC;- Purpose: Identify regions with the highest number of canceled orders.
SELECT
ShippingMethod,
COUNT(*) AS TotalOrders
FROM
Orders
GROUP BY
ShippingMethod
ORDER BY
TotalOrders DESC;- Purpose: Determine which shipping methods are most frequently chosen by customers.
SELECT
t.CarrierID,
c.CarrierName,
AVG(t.ShippingCost) AS AvgShippingCost
FROM
Transportation t
JOIN
Carriers c ON t.CarrierID = c.CarrierID
GROUP BY
t.CarrierID, c.CarrierName
ORDER BY
AvgShippingCost ASC;- Purpose: Evaluate carriers based on their average shipping costs.
SELECT
t.CarrierID,
c.CarrierName,
COUNT(o.OrderID) AS DeliveredOrders
FROM
Transportation t
JOIN
Orders o ON t.OrderID = o.OrderID
JOIN
Carriers c ON t.CarrierID = c.CarrierID
WHERE
o.Status = 'Delivered'
GROUP BY
t.CarrierID, c.CarrierName
ORDER BY
DeliveredOrders DESC;- Purpose: Identify carriers with the highest number of successfully delivered orders.
SELECT
o.ShippingMethod,
COUNT(CASE WHEN DATEDIFF(t.DeliveryDate, o.OrderDate) > 5 THEN 1 END) AS DelayedOrders
FROM
Orders o
JOIN
Transportation t ON o.OrderID = t.OrderID
WHERE
t.DeliveryDate IS NOT NULL
GROUP BY
o.ShippingMethod;- Purpose: Track delays based on the shipping method used.
SELECT
c.CustomerID,
c.Name,
COUNT(o.OrderID) AS TotalOrders
FROM
Customers c
JOIN
Orders o ON c.CustomerID = o.CustomerID
GROUP BY
c.CustomerID, c.Name
ORDER BY
TotalOrders DESC
LIMIT 10;- Purpose: Identify the top 10 customers with the most orders.
SELECT
c.Region,
SUM(t.ShippingCost) AS TotalShippingCost
FROM
Customers c
JOIN
Orders o ON c.CustomerID = o.CustomerID
JOIN
Transportation t ON o.OrderID = t.OrderID
GROUP BY
c.Region
ORDER BY
TotalShippingCost DESC;- Purpose: Evaluate regions with the highest shipping costs.
SELECT
DATE(o.OrderDate) AS OrderDate,
COUNT(*) AS TotalOrders
FROM
Orders o
GROUP BY
DATE(o.OrderDate)
ORDER BY
OrderDate ASC;- Purpose: Track the trend of orders over time and identify peak days.
involves analyzing trends and identifying areas where logistics or operations may have faltered. We’ll focus on canceled orders, delayed deliveries, and changes in overall order statuses over time. Here’s how we can analyze this:
This query tracks the number of canceled, delayed, and delivered orders on a daily basis.
SELECT
DATE(o.OrderDate) AS OrderDate,
COUNT(CASE WHEN o.Status = 'Cancelled' THEN 1 END) AS CancelledOrders,
COUNT(CASE WHEN t.DeliveryDate IS NOT NULL AND DATEDIFF(t.DeliveryDate, o.OrderDate) > 5 THEN 1 END) AS DelayedOrders,
COUNT(CASE WHEN o.Status = 'Delivered' THEN 1 END) AS DeliveredOrders
FROM
Orders o
LEFT JOIN
Transportation t ON o.OrderID = t.OrderID
GROUP BY
DATE(o.OrderDate)
ORDER BY
OrderDate ASC;- Insight: This query reveals if the number of canceled or delayed orders is increasing over time, which may point to performance drops.
Identify regions with a worsening trend in delayed and canceled orders.
SELECT
c.Region,
COUNT(CASE WHEN o.Status = 'Cancelled' THEN 1 END) AS CancelledOrders,
COUNT(CASE WHEN t.DeliveryDate IS NOT NULL AND DATEDIFF(t.DeliveryDate, o.OrderDate) > 5 THEN 1 END) AS DelayedOrders,
COUNT(CASE WHEN o.Status = 'Delivered' THEN 1 END) AS DeliveredOrders
FROM
Customers c
JOIN
Orders o ON c.CustomerID = o.CustomerID
LEFT JOIN
Transportation t ON o.OrderID = t.OrderID
GROUP BY
c.Region
ORDER BY
CancelledOrders DESC, DelayedOrders DESC;- Insight: Pinpoints regions with declining delivery performance or high cancellations.
Evaluate carriers that show a significant increase in delayed deliveries or cancellations.
SELECT
t.CarrierID,
c.CarrierName,
COUNT(CASE WHEN o.Status = 'Cancelled' THEN 1 END) AS CancelledOrders,
COUNT(CASE WHEN t.DeliveryDate IS NOT NULL AND DATEDIFF(t.DeliveryDate, o.OrderDate) > 5 THEN 1 END) AS DelayedOrders
FROM
Transportation t
JOIN
Orders o ON t.OrderID = o.OrderID
JOIN
Carriers c ON t.CarrierID = c.CarrierID
GROUP BY
t.CarrierID, c.CarrierName
ORDER BY
CancelledOrders DESC, DelayedOrders DESC;- Insight: Identifies carriers contributing to performance drops and allows you to focus improvement efforts.
Determine which shipping methods are leading to higher delays or cancellations.
SELECT
o.ShippingMethod,
COUNT(CASE WHEN o.Status = 'Cancelled' THEN 1 END) AS CancelledOrders,
COUNT(CASE WHEN t.DeliveryDate IS NOT NULL AND DATEDIFF(t.DeliveryDate, o.OrderDate) > 5 THEN 1 END) AS DelayedOrders,
COUNT(CASE WHEN o.Status = 'Delivered' THEN 1 END) AS DeliveredOrders
FROM
Orders o
LEFT JOIN
Transportation t ON o.OrderID = t.OrderID
GROUP BY
o.ShippingMethod
ORDER BY
CancelledOrders DESC, DelayedOrders DESC;- Insight: Pinpoints shipping methods that may be leading to order delays or cancellations.
Analyze the percentage of delayed or canceled orders relative to total orders.
SELECT
COUNT(CASE WHEN o.Status = 'Cancelled' THEN 1 END) * 100.0 / COUNT(*) AS CancelledPercentage,
COUNT(CASE WHEN t.DeliveryDate IS NOT NULL AND DATEDIFF(t.DeliveryDate, o.OrderDate) > 5 THEN 1 END) * 100.0 / COUNT(*) AS DelayedPercentage
FROM
Orders o
LEFT JOIN
Transportation t ON o.OrderID = t.OrderID;- Insight: Provides overall metrics to assess the extent of performance drops.
This project demonstrates the practical use of SQL for analyzing real-world logistics and transportation workflows. By applying relational database design, joins, aggregations, and conditional logic, the analysis identifies delivery inefficiencies, regional bottlenecks, and carrier performance variations. Identified key operational issues such as delivery delays and high cancellation regions. Suggested improvements in carrier performance and shipping methods based on analysis.
The insights derived from this project can support data-driven decision-making in logistics optimization and operational planning.
Sai B
Aspiring Data Scientist | SQL