diff --git a/02_activities/assignments/DC_Cohort/Assignment1.md b/02_activities/assignments/DC_Cohort/Assignment_1/Assignment1.md similarity index 76% rename from 02_activities/assignments/DC_Cohort/Assignment1.md rename to 02_activities/assignments/DC_Cohort/Assignment_1/Assignment1.md index f650c9752..76fdc5abb 100644 --- a/02_activities/assignments/DC_Cohort/Assignment1.md +++ b/02_activities/assignments/DC_Cohort/Assignment_1/Assignment1.md @@ -1,13 +1,12 @@ -# DC Assignment 1: Meet the farmersmarket.db and Basic SQL +# Assignment 1: Meet the farmersmarket.db and Basic SQL 🚨 **Please review our [Assignment Submission Guide](https://github.com/UofT-DSI/onboarding/blob/main/onboarding_documents/submissions.md)** 🚨 for detailed instructions on how to format, branch, and submit your work. Following these guidelines is crucial for your submissions to be evaluated correctly. #### Submission Parameters: -* Submission Due Date: `March 31, 2026` +* Submission Due Date: `November 17, 2025` * Weight: 30% of total grade * The branch name for your repo should be: `assignment-one` * What to submit for this assignment: - * This markdown (Assignment1.md) with written responses in Section 4 * One Entity-Relationship Diagram (preferably in a pdf, jpeg, png format). * One .sql file * What the pull request link should look like for this assignment: `https://github.com//sql/pulls/` @@ -115,7 +114,7 @@ Steps to complete this part of the assignment: - Open the assignment1.sql file in DB Browser for SQLite: - from [Github](./02_activities/assignments/assignment1.sql) - or, from your local forked repository -- Complete each question, by writing responses between the QUERY # and END QUERY blocks +- Complete each question ### Write SQL @@ -126,11 +125,10 @@ Steps to complete this part of the assignment:
-
#### WHERE -1. Write a query that returns all customer purchases of product IDs 4 and 9. Limit to 25 rows of output. +1. Write a query that returns all customer purchases of product IDs 4 and 9. 2. Write a query that returns all customer purchases and a new calculated column 'price' (quantity * cost_to_customer_per_qty), filtered by customer IDs between 8 and 10 (inclusive) using either: 1. two conditions using AND 2. one condition using BETWEEN -Limit to 25 rows of output.
-
@@ -142,7 +140,7 @@ Limit to 25 rows of output.
-
#### JOIN -1. Write a query that `INNER JOIN`s the `vendor` table to the `vendor_booth_assignments` table on the `vendor_id` field they both have in common, and sorts the result by `market_date` then `vendor_name`. Limit to 24 rows of output. +1. Write a query that `INNER JOIN`s the `vendor` table to the `vendor_booth_assignments` table on the `vendor_id` field they both have in common, and sorts the result by `vendor_name`, then `market_date`. *** @@ -153,7 +151,7 @@ Steps to complete this part of the assignment: - Open the assignment1.sql file in DB Browser for SQLite: - from [Github](./02_activities/assignments/assignment1.sql) - or, from your local forked repository -- Complete each question, by writing responses between the QUERY # and END QUERY blocks +- Complete each question ### Write SQL @@ -180,34 +178,36 @@ To insert the new row use VALUES, specifying the value you want for each column: **HINT**: you might need to search for strfrtime modifers sqlite on the web to know what the modifers for month and year are! -Limit to 25 rows of output. - 2. Using the previous query as a base, determine how much money each customer spent in April 2022. Remember that money spent is `quantity*cost_to_customer_per_qty`. -**HINTS**: you will need to AGGREGATE, GROUP BY, and filter...but remember, STRFTIME returns a STRING for your WHERE statement... -AND be sure you remove the LIMIT from the previous query before aggregating!! +**HINTS**: you will need to AGGREGATE, GROUP BY, and filter...but remember, STRFTIME returns a STRING for your WHERE statement!! + + + + +## Section 4: What value systems are embedded in databases and data systems you encounter in your day-to-day life? + +I would suggest that the values that one encounters in databases and data systems in the quotidian are either unassuming/uncritical or naively determined. 
By the latter, I mean that most value systems, with their varying hermeneutical destinations, are not simply willed or agentically assumed, but rather possessive and deterministic. One does not choose which ideas or indexes to follow; one simply deploys oneself without them in mind. One can read, one can study; behaviour, however, tends not to follow the fantasies and predilections, whether intellectual or political, that one may entertain. Rather, one is compelled to act according to what is engraved in the actions and decisions that actually matter, that is, those taken against a horizon of fear and uncertainty. I believe that what is considered a "technical worker", a professional in either the hard sciences or the digital professions such as Computer Science or Data Science, is not merely unaware of these unassuming values; more emphatically, I deem that such workers are untrained in carrying out critical injunctions, and, more affirmatively, that they should not have to be. To use an analogy to stress my point: a knife is a "piece" of technology. A good knife is defined by its ability to cut properly. One can use a knife to cut bread or to stab someone. The knife itself is not bad; it is good because it cuts, whether it slices or stabs. Stabbing is normatively negative and punishable. The function of the knife, like the function of a database or data system, is therefore contingent on the user, not the maker. A technical maker (the creator of the database, like the creator of the knife) cannot "code" a function into the knife; he or she can only strive to make it sharp. In that regard, whatever system of values is coded into the tool is naively determined. At the end of the day, the function of a database or data system is determined by the bearer, not the maker. In that sense, whatever values are "embedded" in databases suffer, in their irrelevance, from the naiveté of their designer. 
-*** -## Section 4: -You can start this section anytime. -Steps to complete this part of the assignment: -- Read the article -- Write, within this markdown file, between 250 and 1000 words. No additional citations/sources are required. -### Ethics -Read: Qadri, R. (2021, November 11). _When Databases Get to Define Family._ Wired.
- https://www.wired.com/story/pakistan-digital-database-family-design/ -Link if you encounter a paywall: https://archive.is/srKHV or https://web.archive.org/web/20240422105834/https://www.wired.com/story/pakistan-digital-database-family-design/ -**What values systems are embedded in databases and data systems you encounter in your day-to-day life?** -Consider, for example, concepts of fariness, inequality, social structures, marginalization, intersection of technology and society, etc. -``` -Your thoughts... -``` diff --git a/02_activities/assignments/DC_Cohort/Assignment1_rubric.md b/02_activities/assignments/DC_Cohort/Assignment_1/Assignment1_rubric.md similarity index 100% rename from 02_activities/assignments/DC_Cohort/Assignment1_rubric.md rename to 02_activities/assignments/DC_Cohort/Assignment_1/Assignment1_rubric.md diff --git a/02_activities/assignments/DC_Cohort/Assignment_1/ERD.pdf b/02_activities/assignments/DC_Cohort/Assignment_1/ERD.pdf new file mode 100644 index 000000000..d1c1dd6ad Binary files /dev/null and b/02_activities/assignments/DC_Cohort/Assignment_1/ERD.pdf differ diff --git a/02_activities/assignments/DC_Cohort/assignment1.sql b/02_activities/assignments/DC_Cohort/Assignment_1/assignment1.sql similarity index 57% rename from 02_activities/assignments/DC_Cohort/assignment1.sql rename to 02_activities/assignments/DC_Cohort/Assignment_1/assignment1.sql index 2ec561e2a..d4fb4a975 100644 --- a/02_activities/assignments/DC_Cohort/assignment1.sql +++ b/02_activities/assignments/DC_Cohort/Assignment_1/assignment1.sql @@ -1,52 +1,47 @@ - /* ASSIGNMENT 1 */ ---Please write responses between the QUERY # and END QUERY blocks +/* ASSIGNMENT 1 */ /* SECTION 2 */ --SELECT /* 1. Write a query that returns everything in the customer table. */ ---QUERY 1 +SELECT * +FROM customer; - - ---END QUERY - - -/* 2. Write a query that displays all of the columns and 10 rows from the customer table, +/* 2. 
Write a query that displays all of the columns and 10 rows from the customer table, sorted by customer_last_name, then customer_first_name. */ ---QUERY 2 - - - - ---END QUERY +SELECT * +FROM customer +ORDER BY customer_last_name, customer_first_name +LIMIT 10; --WHERE -/* 1. Write a query that returns all customer purchases of product IDs 4 and 9. -Limit to 25 rows of output. */ ---QUERY 3 - - - - ---END QUERY - +/* 1. Write a query that returns all customer purchases of product IDs 4 and 9. */ +SELECT * +FROM customer_purchases +WHERE product_id IN (4, 9); /*2. Write a query that returns all customer purchases and a new calculated column 'price' (quantity * cost_to_customer_per_qty), filtered by customer IDs between 8 and 10 (inclusive) using either: 1. two conditions using AND 2. one condition using BETWEEN -Limit to 25 rows of output. */ ---QUERY 4 - +-- option 1 +SELECT *, + quantity * cost_to_customer_per_qty AS price +FROM customer_purchases +WHERE customer_id >= 8 AND customer_id <= 10; - - ---END QUERY +-- option 2 +SELECT *, + quantity * cost_to_customer_per_qty AS price +FROM customer_purchases +WHERE customer_id BETWEEN 8 AND 10; --CASE @@ -54,35 +49,43 @@ Limit to 25 rows of output. Using the product table, write a query that outputs the product_id and product_name columns and add a column called prod_qty_type_condensed that displays the word “unit” if the product_qty_type is “unit,” and otherwise displays the word “bulk.” */ ---QUERY 5 - +SELECT + product_id, + product_name, + CASE + WHEN product_qty_type = 'unit' THEN 'unit' + ELSE 'bulk' + END AS prod_qty_type_condensed +FROM product; ---END QUERY - - /* 2. We want to flag all of the different types of pepper products that are sold at the market. add a column to the previous query called pepper_flag that outputs a 1 if the product_name contains the word “pepper” (regardless of capitalization), and otherwise outputs 0. 
*/ ---QUERY 6 - - - - ---END QUERY +SELECT + product_id, + product_name, + CASE + WHEN product_qty_type = 'unit' THEN 'unit' + ELSE 'bulk' + END AS prod_qty_type_condensed, + CASE + WHEN LOWER(product_name) LIKE '%pepper%' THEN 1 + ELSE 0 + END AS pepper_flag +FROM product; --JOIN /* 1. Write a query that INNER JOINs the vendor table to the vendor_booth_assignments table on the -vendor_id field they both have in common, and sorts the result by market_date, then vendor_name. -Limit to 24 rows of output. */ ---QUERY 7 - - - - ---END QUERY +vendor_id field they both have in common, and sorts the result by vendor_name, then market_date. */ +SELECT * +FROM vendor +INNER JOIN vendor_booth_assignments + ON vendor.vendor_id = vendor_booth_assignments.vendor_id +ORDER BY vendor_name, market_date;
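The JOIN question asks for ordering by `vendor_name`, then `market_date`. A minimal sketch of that expected ordering, run in Python's `sqlite3` against two hypothetical vendors and three hypothetical booth assignments (toy data, not the real farmersmarket.db):

```python
import sqlite3

# Toy sketch of the INNER JOIN question: hypothetical vendor /
# vendor_booth_assignments rows, sorted by vendor_name then market_date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vendor (vendor_id INT, vendor_name TEXT);
CREATE TABLE vendor_booth_assignments (vendor_id INT, booth_number INT, market_date TEXT);
INSERT INTO vendor VALUES (1, 'Zed Farms'), (2, 'Acme Produce');
INSERT INTO vendor_booth_assignments VALUES
    (1, 4, '2022-04-02'), (2, 7, '2022-04-09'), (2, 7, '2022-04-02');
""")
rows = conn.execute("""
    SELECT v.vendor_name, vba.market_date
    FROM vendor v
    INNER JOIN vendor_booth_assignments vba ON v.vendor_id = vba.vendor_id
    ORDER BY v.vendor_name, vba.market_date
""").fetchall()
print(rows)
# [('Acme Produce', '2022-04-02'), ('Acme Produce', '2022-04-09'), ('Zed Farms', '2022-04-02')]
conn.close()
```

Note the join key appears in both tables, so qualifying the column (`v.vendor_id = vba.vendor_id`) is what avoids ambiguity.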
*/ ---QUERY 9 - - - - ---END QUERY +SELECT + c.customer_id, + c.customer_first_name, + c.customer_last_name, + SUM(cp.quantity * cp.cost_to_customer_per_qty) AS total_spent +FROM customer c +INNER JOIN customer_purchases cp + ON c.customer_id = cp.customer_id +GROUP BY c.customer_id, c.customer_first_name, c.customer_last_name +HAVING total_spent > 2000 +ORDER BY c.customer_last_name, c.customer_first_name; --Temp Table @@ -123,37 +132,39 @@ When inserting the new vendor, you need to appropriately align the columns to be -> To insert the new row use VALUES, specifying the value you want for each column: VALUES(col1,col2,col3,col4,col5) */ ---QUERY 10 - - - +CREATE TABLE new_vendor AS +SELECT * +FROM vendor; +SELECT * FROM new_vendor; +INSERT INTO new_vendor +VALUES (10, 'Thomass Superfood Store', 'Fresh Focused', 'Thomas', 'Rosenthal'); ---END QUERY -- Date /*1. Get the customer_id, month, and year (in separate columns) of every purchase in the customer_purchases table. HINT: you might need to search for strfrtime modifers sqlite on the web to know what the modifers for month -and year are! -Limit to 25 rows of output. */ ---QUERY 11 - - - - ---END QUERY +and year are! */ +SELECT + customer_id, + strftime('%m', market_date) AS month, + strftime('%Y', market_date) AS year +FROM customer_purchases; /* 2. Using the previous query as a base, determine how much money each customer spent in April 2022. Remember that money spent is quantity*cost_to_customer_per_qty. HINTS: you will need to AGGREGATE, GROUP BY, and filter... -but remember, STRFTIME returns a STRING for your WHERE statement... -AND be sure you remove the LIMIT from the previous query before aggregating!! */ ---QUERY 12 - +but remember, STRFTIME returns a STRING for your WHERE statement!! 
*/ +SELECT + customer_id, + SUM(quantity * cost_to_customer_per_qty) AS total_spent +FROM customer_purchases +WHERE strftime('%m', market_date) = '04' + AND strftime('%Y', market_date) = '2022' +GROUP BY customer_id; ---END QUERY diff --git a/02_activities/assignments/DC_Cohort/Assignment2.md b/02_activities/assignments/DC_Cohort/Assignment_2/Assignment2.md similarity index 61% rename from 02_activities/assignments/DC_Cohort/Assignment2.md rename to 02_activities/assignments/DC_Cohort/Assignment_2/Assignment2.md index 01f991d02..08d2173a3 100644 --- a/02_activities/assignments/DC_Cohort/Assignment2.md +++ b/02_activities/assignments/DC_Cohort/Assignment_2/Assignment2.md @@ -1,9 +1,9 @@ -# DC Assignment 2: Design a Logical Model and Advanced SQL +# Assignment 2: Design a Logical Model and Advanced SQL 🚨 **Please review our [Assignment Submission Guide](https://github.com/UofT-DSI/onboarding/blob/main/onboarding_documents/submissions.md)** 🚨 for detailed instructions on how to format, branch, and submit your work. Following these guidelines is crucial for your submissions to be evaluated correctly. #### Submission Parameters: -* Submission Due Date: `April 07, 2026` +* Submission Due Date: `November 12, 2025` * Weight: 70% of total grade * The branch name for your repo should be: `assignment-two` * What to submit for this assignment: @@ -40,8 +40,6 @@ Design a logical model for a small bookstore. 📚 At the minimum it should have employee, order, sales, customer, and book entities (tables). Determine sensible column and table design based on what you know about these concepts. Keep it simple, but work out sensible relationships to keep tables reasonably sized. Additionally, include a date table. -A date table (also called a calendar table) is a permanent table containing a list of dates and various components of those dates. 
Some theory, tips, and commentary can be found [here](https://www.sqlshack.com/designing-a-calendar-table/), [here](https://www.mssqltips.com/sqlservertip/4054/creating-a-date-dimension-or-calendar-table-in-sql-server/) and [here](https://sqlgeekspro.com/creating-calendar-table-sql-server/). -Remember, you don't actually need to run any of the queries in these articles, but instead understand *why* date tables in SQL make sense, and how to situate them within your logical models. There are several tools online you can use, I'd recommend [Draw.io](https://www.drawio.com/) or [LucidChart](https://www.lucidchart.com/pages/). @@ -56,7 +54,63 @@ The store wants to keep customer addresses. Propose two architectures for the CU **HINT:** search type 1 vs type 2 slowly changing dimensions. ``` -Your answer... +# SCD Type 1 vs Type 2 — Customer Addresses + +Some data doesn't change often, but when it does, one has a choice: forget the past, or keep it. That's the whole game with Slowly Changing Dimensions. + +--- + +## Type 1 — Amnesia mode + +Customer moves? One overwrites. One row per customer, always current, history gone. + +``` +customer_address (Type 1) +────────────────────────────────────────── +address_id INT PK +customer_id INT FK → customer +street_address VARCHAR(100) +city VARCHAR(50) +state_province VARCHAR(50) +postal_code VARCHAR(20) +``` + +Simple, small, zero drama. Great for fixing typos. Terrible if one ever needs to ask *"where did we ship that order six months ago?"* + +--- + +## Type 2 — Full memory mode + +Customer moves? One closes the old record and opens a new one. Every address the customer ever had lives in this table forever. 
+ +``` +customer_address (Type 2) +────────────────────────────────────────── +address_id INT PK ← new key per version +customer_id INT FK → customer +street_address VARCHAR(100) +city VARCHAR(50) +state_province VARCHAR(50) +postal_code VARCHAR(20) +effective_start_date DATE ← went live on this date +effective_end_date DATE ← NULL = still active +is_current BOOLEAN ← quick filter shortcut +``` + +More rows, more complexity — but one can time-travel. Join on `order_date_key` between `effective_start_date` and `effective_end_date` and every order snaps to the address that existed when it was placed. + +--- + +## Which one for the bookstore? + +| | Type 1 | Type 2 | +|---|---|---| +| Address change | `UPDATE` | `INSERT` new + close old | +| History | Gone | Full timeline | +| Table size | Stays small | Grows over time | +| Best for | Typo corrections | Dispute resolution, delivery history | + +**Type 2 wins here.** A lost package, a disputed order, an audit — all of these require knowing the address *at the time*, not today's address. ``` *** @@ -68,7 +122,7 @@ Steps to complete this part of the assignment: - Open the assignment2.sql file in DB Browser for SQLite: - from [Github](./02_activities/assignments/assignment2.sql) - or, from your local forked repository -- Complete each question, by writing responses between the QUERY # and END QUERY blocks +- Complete each question ### Write SQL @@ -97,16 +151,10 @@ You can either display all rows in the customer_purchases table, with the counte **HINT**: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK(). -Filter the visits to dates before April 29, 2022. - -2. Reverse the numbering of the query so each customer’s most recent visit is labeled 1, then write another query that uses this one as a subquery (or temp table) and filters the results to only the customer’s most recent visit. -**HINT**: Do not use the previous visit dates filter. +2. 
Reverse the numbering of the query from part 1 so each customer’s most recent visit is labeled 1, then write another query that uses this one as a subquery (or temp table) and filters the results to only the customer’s most recent visit. 3. Using a COUNT() window function, include a value along with each row of the customer_purchases table that indicates how many different times that customer has purchased that product_id. -You can make this a running count by including an ORDER BY within the PARTITION BY if desired. -Filter the visits to dates before April 29, 2022. -
-
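The hint in the windowed-functions question distinguishes ROW_NUMBER() from DENSE_RANK(). A sketch in Python's `sqlite3` with hypothetical purchase rows (not the real farmersmarket.db): two purchases on the same `market_date` get distinct ROW_NUMBERs but share one DENSE_RANK, which is why DENSE_RANK suits per-date visit numbering.

```python
import sqlite3

# Hypothetical visit rows: customer 1 buys twice on 2022-04-02 and once
# on 2022-04-09. ROW_NUMBER() numbers every row; DENSE_RANK() numbers
# distinct market_dates, i.e. visits. (Window functions need SQLite 3.25+,
# which ships with Python 3.7+.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_purchases (customer_id INT, market_date TEXT)")
conn.executemany(
    "INSERT INTO customer_purchases VALUES (?, ?)",
    [(1, "2022-04-02"), (1, "2022-04-02"), (1, "2022-04-09")],
)
rows = conn.execute("""
    SELECT market_date,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date) AS rn,
           DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date) AS visit_number
    FROM customer_purchases
    ORDER BY rn
""").fetchall()
print(rows)
# [('2022-04-02', 1, 1), ('2022-04-02', 2, 1), ('2022-04-09', 3, 2)]
conn.close()
```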
#### String manipulations @@ -136,7 +184,7 @@ Steps to complete this part of the assignment: - Open the assignment2.sql file in DB Browser for SQLite: - from [Github](./02_activities/assignments/assignment2.sql) - or, from your local forked repository -- Complete each question, by writing responses between the QUERY # and END QUERY blocks +- Complete each question ### Write SQL @@ -191,5 +239,15 @@ Consider, for example, concepts of labour, bias, LLM proliferation, moderating c ``` -Your thoughts... + +The article reveals a fundamentally unnuanced and, at times, almost puerile understanding of the theory of value and the mechanisms through which labour is priced in a market economy. At its core, the piece conflates observation with explanation: it gestures toward disparities in wages and working conditions but fails to ground these observations in any coherent theoretical framework. This absence is particularly striking given that the author appears to treat the technological context of the labour in question as analytically significant, when in fact it is largely incidental. Whether the labour occurs in software engineering or garment production does not alter the underlying dynamics of value formation; markets do not discriminate in their basic logic simply because the outputs are digital rather than material. + +Indeed, the author’s fixation on the unit labour price of a garment worker—or more specifically, a sewer—serves less as evidence of exploitation than as an inadvertent exposure of their own analytical unpreparedness. To be surprised by low wages in such sectors without situating them within global supply chains, capital mobility, labour bargaining power, and historical conditions of production suggests a superficial engagement with the subject. It is not enough to point at a wage and declare it “undervalued”; one must articulate the criteria by which such a judgment is made. Is the claim rooted in a Marxian conception of surplus value?
A marginal productivity framework? A moral philosophy of fairness? The article provides no such clarification. + +This leads to a deeper issue: the failure to distinguish between price and value. The author implicitly treats wages as representations of an intrinsic worth, rather than as outcomes of exchange processes shaped by supply, demand, institutional constraints, and power asymmetries. Price does not emerge as a transparent expression of value; rather, it is the contingent product of market interactions. To argue that labour is “undervalued” without interrogating the structure of those interactions—or without defining what “value” itself means—is to substitute rhetoric for analysis. + +Moreover, the absence of an explicit argument leaves the piece analytically hollow. There are hints—at best a tepid attempt—to suggest that certain forms of labour are systematically undercompensated, yet these hints are never developed into a rigorous position. The reader is left with impressions rather than arguments, assertions rather than demonstrations. This is particularly problematic in a discussion that implicitly critiques economic systems, where precision and conceptual clarity are indispensable. + +Finally, the broader pattern noted here reflects a recurring limitation among many technically trained commentators. While expertise in computer science or related fields equips individuals with powerful tools for problem-solving, it does not automatically confer the critical frameworks necessary for engaging with political economy or social theory. Without sustained exposure to these traditions, arguments risk collapsing under even basic counterpoints. The result is a mode of commentary that is confident in tone but thin in substance—one that gestures toward critique without possessing the conceptual resources to sustain it. 
``` diff --git a/02_activities/assignments/DC_Cohort/Assignment2_rubric.md b/02_activities/assignments/DC_Cohort/Assignment_2/Assignment2_rubric.md similarity index 100% rename from 02_activities/assignments/DC_Cohort/Assignment2_rubric.md rename to 02_activities/assignments/DC_Cohort/Assignment_2/Assignment2_rubric.md diff --git a/02_activities/assignments/DC_Cohort/Assignment_2/Bookstore ERD.pdf b/02_activities/assignments/DC_Cohort/Assignment_2/Bookstore ERD.pdf new file mode 100644 index 000000000..97ff911ff Binary files /dev/null and b/02_activities/assignments/DC_Cohort/Assignment_2/Bookstore ERD.pdf differ diff --git a/02_activities/assignments/DC_Cohort/assignment2.sql b/02_activities/assignments/DC_Cohort/Assignment_2/assignment2.sql similarity index 53% rename from 02_activities/assignments/DC_Cohort/assignment2.sql rename to 02_activities/assignments/DC_Cohort/Assignment_2/assignment2.sql index f7515f625..d455208cc 100644 --- a/02_activities/assignments/DC_Cohort/assignment2.sql +++ b/02_activities/assignments/DC_Cohort/Assignment_2/assignment2.sql @@ -1,5 +1,4 @@ /* ASSIGNMENT 2 */ ---Please write responses between the QUERY # and END QUERY blocks /* SECTION 2 */ -- COALESCE @@ -21,12 +20,15 @@ nulls, and 'unit' for the second column with nulls. The `||` values concatenate the columns into strings. Edit the appropriate columns -- you're making two edits -- and the NULL rows will be fixed. All the other rows will remain the same. */ ---QUERY 1 - - +SELECT + product_name || ', ' || + COALESCE(product_size, '') || + ' (' || + COALESCE(product_qty_type, 'unit') || + ')' +FROM product; ---END QUERY --Windowed Functions @@ -37,40 +39,65 @@ Each customer’s first visit is labeled 1, second visit is labeled 2, etc. You can either display all rows in the customer_purchases table, with the counter changing on each new market date for each customer, or select only the unique market dates per customer (without purchase details) and number those visits. 
-HINT: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK(). -Filter the visits to dates before April 29, 2022. */ ---QUERY 2 - +HINT: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK(). */ +SELECT + customer_id, + market_date, + DENSE_RANK() OVER ( + PARTITION BY customer_id + ORDER BY market_date + ) AS visit_number +FROM customer_purchases +WHERE market_date < '2022-04-29' +ORDER BY customer_id, market_date; ---END QUERY - -/* 2. Reverse the numbering of the query so each customer’s most recent visit is labeled 1, +/* 2. Reverse the numbering of the query from part 1 so each customer’s most recent visit is labeled 1, then write another query that uses this one as a subquery (or temp table) and filters the results to -only the customer’s most recent visit. -HINT: Do not use the previous visit dates filter. */ ---QUERY 3 - - - - ---END QUERY +only the customer’s most recent visit. */ + +SELECT + customer_id, + market_date, + DENSE_RANK() OVER ( + PARTITION BY customer_id + ORDER BY market_date DESC + ) AS visit_number +FROM customer_purchases +ORDER BY customer_id, market_date DESC; + +SELECT * +FROM ( + SELECT + customer_id, + market_date, + DENSE_RANK() OVER ( + PARTITION BY customer_id + ORDER BY market_date DESC + ) AS visit_number + FROM customer_purchases +) AS ranked_visits +WHERE visit_number = 1 +ORDER BY customer_id; /* 3. Using a COUNT() window function, include a value along with each row of the -customer_purchases table that indicates how many different times that customer has purchased that product_id. - -You can make this a running count by including an ORDER BY within the PARTITION BY if desired. -Filter the visits to dates before April 29, 2022. */ ---QUERY 4 - - - - ---END QUERY - +customer_purchases table that indicates how many different times that customer has purchased that product_id. 
*/ + +SELECT + customer_id, + market_date, + product_id, + quantity, + cost_to_customer_per_qty, + COUNT(*) OVER ( + PARTITION BY customer_id, product_id + ) AS times_purchased +FROM customer_purchases +WHERE market_date < '2022-04-29' +ORDER BY customer_id, product_id, market_date; -- String manipulations /* 1. Some product names in the product table have descriptions like "Jar" or "Organic". @@ -83,22 +110,21 @@ Remove any trailing or leading whitespaces. Don't just use a case statement for | Habanero Peppers - Organic | Organic | Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR will help split the column. */ ---QUERY 5 - - - - ---END QUERY +SELECT + product_name, + CASE + WHEN INSTR(product_name, '-') > 0 THEN + TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1)) + ELSE NULL + END AS description +FROM product; /* 2. Filter the query to show any product_size value that contain a number with REGEXP. */ ---QUERY 6 - - - - ---END QUERY +SELECT * +FROM product +WHERE product_size REGEXP '[0-9]'; -- UNION /* 1. Using a UNION, write a query that displays the market dates with the highest and lowest total sales. @@ -109,14 +135,85 @@ HINT: There are a possibly a few ways to do this query, but if you're struggling "best day" and "worst day"; 3) Query the second temp table twice, once for the best day, once for the worst day, with a UNION binding them. 
*/ ---QUERY 7 - - - - ---END QUERY - +WITH sales_by_date AS ( + SELECT + market_date, + SUM(quantity * cost_to_customer_per_qty) AS total_sales + FROM customer_purchases + GROUP BY market_date +) +SELECT + market_date, + total_sales +FROM sales_by_date +WHERE total_sales = (SELECT MAX(total_sales) FROM sales_by_date) + +UNION + +SELECT + market_date, + total_sales +FROM sales_by_date +WHERE total_sales = (SELECT MIN(total_sales) FROM sales_by_date); + +WITH sales_by_date AS ( + SELECT + market_date, + SUM(quantity * cost_to_customer_per_qty) AS total_sales + FROM customer_purchases + GROUP BY market_date +), +ranked_sales AS ( + SELECT + market_date, + total_sales, + RANK() OVER (ORDER BY total_sales DESC) AS best_rank, + RANK() OVER (ORDER BY total_sales ASC) AS worst_rank + FROM sales_by_date +) +SELECT + market_date, + total_sales, + CASE + WHEN best_rank = 1 THEN 'best day' + WHEN worst_rank = 1 THEN 'worst day' + END AS day_type +FROM ranked_sales +WHERE best_rank = 1 + OR worst_rank = 1; + + WITH sales_by_date AS ( + SELECT + market_date, + SUM(quantity * cost_to_customer_per_qty) AS total_sales + FROM customer_purchases + GROUP BY market_date +), +ranked_sales AS ( + SELECT + market_date, + total_sales, + RANK() OVER (ORDER BY total_sales DESC) AS best_rank, + RANK() OVER (ORDER BY total_sales ASC) AS worst_rank + FROM sales_by_date +) + +SELECT + market_date, + total_sales, + 'best day' AS day_type +FROM ranked_sales +WHERE best_rank = 1 + +UNION + +SELECT + market_date, + total_sales, + 'worst day' AS day_type +FROM ranked_sales +WHERE worst_rank = 1; /* SECTION 3 */ @@ -130,48 +227,67 @@ Remember, CROSS JOIN will explode your table rows, so CROSS JOIN should likely b Think a bit about the row counts: how many distinct vendors, product names are there (x)? How many customers are there (y). Before your final group by you should have the product of those two queries (x*y). 
*/ ---QUERY 8 - - - - ---END QUERY +WITH vendor_products AS ( + SELECT DISTINCT + vi.vendor_id, + vi.product_id, + vi.original_price + FROM vendor_inventory vi +) +SELECT + v.vendor_name, + p.product_name, + SUM(5 * vp.original_price) AS money_per_product +FROM vendor_products vp +CROSS JOIN customer c +INNER JOIN vendor v + ON vp.vendor_id = v.vendor_id +INNER JOIN product p + ON vp.product_id = p.product_id +GROUP BY + v.vendor_name, + p.product_name +ORDER BY + v.vendor_name, + p.product_name; -- INSERT /*1. Create a new table "product_units". This table will contain only products where the `product_qty_type = 'unit'`. It should use all of the columns from the product table, as well as a new column for the `CURRENT_TIMESTAMP`. Name the timestamp column `snapshot_timestamp`. */ ---QUERY 9 - - - - ---END QUERY +CREATE TABLE product_units AS +SELECT + *, + CURRENT_TIMESTAMP AS snapshot_timestamp +FROM product +WHERE product_qty_type = 'unit'; /*2. Using `INSERT`, add a new row to the product_units table (with an updated timestamp). This can be any product you desire (e.g. add another record for Apple Pie). */ ---QUERY 10 - - - - ---END QUERY +INSERT INTO product_units +SELECT + *, + CURRENT_TIMESTAMP +FROM product +WHERE product_name = 'Apple Pie' +LIMIT 1; -- DELETE /* 1. Delete the older record for the whatever product you added. HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/ ---QUERY 11 - - - - ---END QUERY +DELETE FROM product_units +WHERE product_name = 'Apple Pie' + AND snapshot_timestamp < ( + SELECT MAX(snapshot_timestamp) + FROM product_units + WHERE product_name = 'Apple Pie' + ); -- UPDATE /* 1.We want to add the current_quantity to the product_units table. 
@@ -189,12 +305,22 @@ Third, SET current_quantity = (...your select statement...), remembering that WH Finally, make sure you have a WHERE statement to update the right row, you'll need to use product_units.product_id to refer to the correct row within the product_units table. When you have all of these components, you can run the update statement. */ ---QUERY 12 - - - ---END QUERY +ALTER TABLE product_units +ADD COLUMN current_quantity INT; + +UPDATE product_units +SET current_quantity = COALESCE( + ( + SELECT vi.quantity + FROM vendor_inventory AS vi + WHERE vi.product_id = product_units.product_id + ORDER BY vi.market_date DESC + LIMIT 1 + ), + 0 +) +WHERE product_id IS NOT NULL;
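The closing ALTER TABLE plus correlated-subquery UPDATE can be exercised end to end on a toy schema. A sketch in Python's `sqlite3` with hypothetical two-table data (not the real farmersmarket.db): product 1 has two inventory snapshots, so the latest `market_date` wins, while product 2 has none, so COALESCE falls back to 0.

```python
import sqlite3

# Sketch of the ALTER TABLE + correlated-subquery UPDATE pattern above,
# on a hypothetical two-table toy schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_units (product_id INT, product_name TEXT);
CREATE TABLE vendor_inventory (product_id INT, market_date TEXT, quantity INT);
INSERT INTO product_units VALUES (1, 'Apple Pie'), (2, 'Banana Bread');
-- product 1 has two snapshots; the latest market_date should win
INSERT INTO vendor_inventory VALUES (1, '2022-04-01', 5), (1, '2022-04-08', 3);
ALTER TABLE product_units ADD COLUMN current_quantity INT;
UPDATE product_units
SET current_quantity = COALESCE(
    (SELECT vi.quantity
     FROM vendor_inventory vi
     WHERE vi.product_id = product_units.product_id
     ORDER BY vi.market_date DESC
     LIMIT 1),
    0);
""")
rows = conn.execute(
    "SELECT product_id, current_quantity FROM product_units ORDER BY product_id"
).fetchall()
print(rows)  # [(1, 3), (2, 0)] -- latest snapshot for 1, COALESCE fallback for 2
conn.close()
```

The subquery is re-evaluated per row of `product_units` because it references `product_units.product_id`, which is what makes the UPDATE correlated.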