Key Assumptions in the Project: Data pertains to a company with in-store and online selling facilities. Online selling takes place through a third-party vendor e.g. Amazon.com
Tables: Products lists all products. SalesReport primarily summarizes the sales made for reporting purposes; and SalesBySKU contains before and after the report (recent) information about total orders for a product. All_sessions: Many sellers –like us- on the website. Contains details of all sessions. Analytics contains details provided by the third party vendor.


To answer the following questions using SQL Analysis:
Question 1: What percentage of our products are the online visitors interested in? What do online customers buy?
Question 2: Which products have we been selling online? How many of them?
Question 3: Should I reorder more stock for our online top-seller products?
Question 4: Which country are our leading online purchasers from?
Question 5: What online product categories feature our highest selling offline products?
Step 1. Check the CSV data in Excel to understand the basic table structure.
Step 2: Create Tables that loosely match the datatypes required.
Step 3: Import data from CSVs into tables
Step 4: Added Primary Key to Products table and Foreign Keys in SalesReport and SaleBySKU tables - referencing the primary key of the Products table. I avoided making Foreign Keys on Analytics and All_sessions tables. Both these tables are large. Adding an performance overhead by creating indexes on these did not feel worth it. I used the keys to join particular tables instead. As my queries were not using data from columns that needed further refinement -e.g. converting date from varchar to date, these columns were not further refine.
Step 5: Create queries that to answer the questions described in the project goals/questions.
The results suggest:
-
Out of 1092 products the company sells, only 389 were checked by the online visitors. 13101 product webpages were visited.
-
Only 27 attempted to place an order and 7 total orders had a transaction id so far. Although 7 may look like a small numer of orders, total revenue generated by these orders was 658219992 and 27 orders that could not get placed amounted to 4854309988 or were currently under processing at the time of data collection. This looked like a huge amount of money but then we were to divide it by 1M which brought some sense to data.
-
Following is the list of top three online purchasers:
- United States 4804.33
- Israel 32.99
- Switzerland 16.99
-
What products have generated most of the online revenue? Only two products have been sold online: SPF-15 Slim & Slender Lip Balm Waze Mobile Phone Vent Mount
-
We do have enough stock of our online best-selling products. (For SPF-15 Slim & Slender Lip Balm: In Stock: 4069, Units Sold: 69; and Waze Mobile Phone Vent Mount, In Stock: 129, Units Sold: 8)
The data was given in form of CSV and no details about the data, its collection method, methodology followed was shared. We were left to make some sense of data and about the tables so the work started with some basic assumptions. I had to identify the keys which could serve as relational links to other tables. I did not feel that adding primary keys to all the tables was necessary as it would have caused a performance overhead in index creation and there was no special computing purpose they would have served as I was not going to join any of the two larger tables: all_sessions and analytics. I chose to leave them without any FK and PK constraints.
Another question was regarding: I chose to not change the Date columns in All_Sessions and Analytics tables as both the columns had dates in same format (YYYYMMDD) and could be compared just as they were without any changes.
SalesReport table contained (almost) same information as SalesBySKU table. SalesBySKU table had 8 extra rows. Initially SalesBySKU looked reduncdant but the
If I had more time, I would clarify the correctness of the basic assumptions on which the analysis is based.
It would be interesting to know how duration of online engagement affected actual purchase of a good. Social engagement was another section that i wanted to explore further. I would also have gone into interpretations of sentiment score and magnitude questions.
The online orders in this dataset have really big amounts attached to them, I wanted to know what is the unit of exchange in terms of quantity of goods in return for the money i.e. is it one piece of SPF-15 Slim & Slender Lip Balm or a ship container full of it. (This became clear with divide by 1M factor.) Otherwise I was worried about how these people are going to store all this cash! :)