Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
206 changes: 206 additions & 0 deletions course-materials/answer-keys/eod-day7-key.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,206 @@
---
title: "Day 7: Tasks & Activities"
subtitle: "Plant Hardiness Exercise"
format:
html:
toc: true
toc-depth: 3
code-fold: show
jupyter: eds217_2024
---

![](images/farm_panda.jpeg)

## Introduction

Today, we looked at the latest plant hardiness zone map distributed by the US Department of Agriculture (USDA), and how it differs from the previous map, which came out in 2012.

The map is meant to help gardeners, and people interested in gardening and plants, know which plants will survive and thrive in different parts of the United States. The main data point used by the map is the mean low temperature recorded (in Fahrenheit) for a given location. Locations with similar mean low temperatures, within a 10-degree range, are categorized as being in the same zone. Each zone is then further divided into two sub-zones, with a 5-degree range for minimum temperatures.

The latest zone map showed that for many locations, the minimum temperature had risen somewhat. This doesn’t come as a massive surprise, given trends in climate change, but I thought that it would be interesting to explore the data and understand just where things had changed.

The data was collected by the PRISM research group at Oregon State University (https://prism.oregonstate.edu/projects/plant_hardiness_zones.php). A new data set was just released on November 15th by the US Department of Agriculture, at https://www.ars.usda.gov/news-events/news/research-news/2023/usda-unveils-updated-plant-hardiness-zone-map/.

The data is available in a variety of formats, but I decided that it would probably be easiest to work with it via zip codes, because they are spread out through the entire country. In order to figure out just where the zip codes are located, we downloaded an additional data set, a map of zip codes along with location information for each one, joining it together with our other data.

### Datasets

You will work with three CSV datasets:

First, we'll use the latest (2023) Plant Hardiness Zone report from https://prism.oregonstate.edu/projects/plant_hardiness_zones.php . The data comes in several formats and parts; we'll use the CSV file that provides us with data per US zip code:

https://prism.oregonstate.edu/projects/phm_data/phzm_us_zipcode_2023.csv

Next, we'll download data from the previous survey in 2012, described at https://prism.oregonstate.edu/projects/plant_hardiness_zones_2012.php :

https://prism.oregonstate.edu/projects/public/phm/2012/phm_us_zipcode_2012.csv

Finally, we'll download and work with a CSV file containing US zip codes:

http://uszipcodelist.com/zip_code_database.csv

## Tasks

### 1. Setup

First, import pandas, matplotlib, and seaborn and load the three datasets.

```{python}
#| echo: true
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Define data urls
url_20203 = 'https://prism.oregonstate.edu/projects/phm_data/phzm_us_zipcode_2023.csv'
url_2012 = 'https://prism.oregonstate.edu/projects/public/phm/2012/phm_us_zipcode_2012.csv'
url_zips = 'http://uszipcodelist.com/zip_code_database.csv'

# Load the data
df_2023 = pd.read_csv(url_20203)
df_2012 = pd.read_csv(url_2012)
df_zipcodes = pd.read_csv(url_zips)
```

Next, display the first few rows and print out the dataset info to get an idea of the contents of each dataset.

```{python}
#| echo: true
# Display the first few rows:
print(df_2023.head())
print(df_2012.head())
print(df_zipcodes.head())
```

```{python}
#| echo: true
# Display the dataframe info:
print(df_2023.info())
print(df_2012.info())
print(df_zipcodes.info())
```

You may have noticed that the zipcodes were read in as integers rather than strings, and therefore might not be 5 digits long. Ensure the `zipcode` or `zip` column in all datasets is a 5-character string, filling in any zeros that were dropped.

```{python}
# Ensure 5-character string zipcodes
df_2023['zipcode'] = df_2023['zipcode'].astype(str).str.zfill(5)
df_2012['zipcode'] = df_2012['zipcode'].astype(str).str.zfill(5)
df_zipcodes['zip'] = df_zipcodes['zip'].astype(str).str.zfill(5)
```

Combine the 2012 and 2023 data together by adding a `year` column and then stacking them together.

```{python}
# Add the year column
df_2023['year'] = 2023
df_2012['year'] = 2012

# Bind the datasets together
df = pd.concat([df_2023, df_2012], axis=0)
```

In the combined plant hardiness dataframe, create two new columns, `trange_min` and `trange_max`, containing the min and max temperatures of the `trange` column. Remove the original `trange` column.

Hint: use `str.split()` to split the `trange` strings where they have spaces and retrieve the first and last components (min and max, respectively)

```{python}
# Split the trange string and get the first (min) and last (max) pieces of it
df['trange_min'] = df['trange'].str.split().str.get(0).astype(int)
df['trange_max'] = df['trange'].str.split().str.get(-1).astype(int)

df = df.drop('trange', axis='columns')
```

## Tasks

### 2. Exploration and visualization

On average, how much has the minimum temperature in a zip code changed from 2012 to 2023?
```{python}
# Get the mean of the minimum temperatures
mean_2023 = df[df['year'] == 2023]['trange_min'].mean()
mean_2012 = df[df['year'] == 2012]['trange_min'].mean()

# Print the difference between the two means
print(mean_2023 - mean_2012)
```

Merge together the combined plant hardiness dataset and the zipcode dataset by zipcode. This will give us more informtaion in the plant hardiness dataset, such as the latitude and longitude for each zipcode.
```{python}
df = pd.merge(df, df_zipcodes, left_on='zipcode', right_on='zip')
df.head()
```

Create two scatter plot where the x axis is the longitude, the y axis is the latitude, the color is based on the minimum temperature in 2012 for one and 2023 for the other. Only look at longitude < -60.

```{python}
# Filter the data for longitude less than -60
df_filtered = df[df['longitude'] < -60]

# Create scatter plot for 2012
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df_filtered[df_filtered['year'] == 2012],
x='longitude', y='latitude', hue='trange_min', palette='coolwarm', s=10)
plt.title('Minimum Temperature in 2012 by Latitude and Longitude')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.legend(title='Min Temp (2012)')
plt.show()

# Create scatter plot for 2023
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df_filtered[df_filtered['year'] == 2023],
x='longitude', y='latitude', hue='trange_min', palette='coolwarm', s=10)
plt.title('Minimum Temperature in 2023 by Latitude and Longitude')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.legend(title='Min Temp (2023)')
plt.show()
```

Now create a single scatter plot where you look at the difference between the minimum temperature in 2012 and 2023. Only look at longitude < -60. Color any zipcodes where you do not have information from both years in grey.

```{python}
# Get the difference in minimum temperature between 2023 and 2012
df_diff = df_filtered.pivot_table(index=['zipcode', 'latitude', 'longitude'],
columns='year', values='trange_min').reset_index()
df_diff['temp_diff'] = df_diff[2023] - df_diff[2012]

# Create a scatter plot showing the difference
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df_diff, x='longitude', y='latitude', hue='temp_diff',
palette='coolwarm', s=10, edgecolor='gray', legend='full')
plt.title('Temperature Difference (2023 - 2012) by Latitude and Longitude')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.legend(title='Temp Difference')
plt.show()
```

Create a bar plot showing the top 10 states where the average minimum temperature increased the most. Label your axes appropriately.

```{python}
# Filter the data for only 2012 and 2023
df_2012_filtered = df_filtered[df_filtered['year'] == 2012]
df_2023_filtered = df_filtered[df_filtered['year'] == 2023]

# Group by state and calculate the mean minimum temperature for 2012 and 2023
mean_temp_2012 = df_2012_filtered.groupby('state')['trange_min'].mean()
mean_temp_2023 = df_2023_filtered.groupby('state')['trange_min'].mean()

# Calculate the difference between 2023 and 2012 for each state
temp_diff_by_state = mean_temp_2023 - mean_temp_2012

# Sort the states by the highest temperature increase and get the top 10
top_10_states = temp_diff_by_state.sort_values(ascending=False).head(10)

# Plot the top 10 states
plt.figure(figsize=(10, 6))
sns.barplot(x=top_10_states.values, y=top_10_states.index, palette='viridis')
plt.title('Top 10 States with Highest Increase in Minimum Temperature (2012-2023)')
plt.xlabel('Average Temp Increase (°F)')
plt.ylabel('State')
plt.show()
```

Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
69 changes: 60 additions & 9 deletions course-materials/eod-practice/eod-day7.qmd
Original file line number Diff line number Diff line change
@@ -1,19 +1,70 @@
---
title: "Day 7: Tasks & activities"
subtitle: "Plant Hardiness Zones"
execute:
echo: false
include: false
python: eds217_2024
title: "Day 7: Tasks & Activities"
subtitle: "Plant Hardiness Exercise"
format:
html:
toc: true
toc-depth: 3
code-fold: show
jupyter: eds217_2024
---

![](images/farm_panda.jpeg)

## Introduction

::: {.center-text .body-text-xl .teal-text}
End Activity Session (Day 7)
:::
Today, we looked at the latest plant hardiness zone map distributed by the US Department of Agriculture (USDA), and how it differs from the previous map, which came out in 2012.

The map is meant to help gardeners, and people interested in gardening and plants, know which plants will survive and thrive in different parts of the United States. The main data point used by the map is the mean low temperature recorded (in Fahrenheit) for a given location. Locations with similar mean low temperatures, within a 10-degree range, are categorized as being in the same zone. Each zone is then further divided into two sub-zones, with a 5-degree range for minimum temperatures.

The latest zone map showed that for many locations, the minimum temperature had risen somewhat. This doesn’t come as a massive surprise, given trends in climate change, but I thought that it would be interesting to explore the data and understand just where things had changed.

The data was collected by the PRISM research group at Oregon State University (https://prism.oregonstate.edu/projects/plant_hardiness_zones.php). A new data set was just released on November 15th by the US Department of Agriculture, at https://www.ars.usda.gov/news-events/news/research-news/2023/usda-unveils-updated-plant-hardiness-zone-map/.

The data is available in a variety of formats, but I decided that it would probably be easiest to work with it via zip codes, because they are spread out through the entire country. In order to figure out just where the zip codes are located, we downloaded an additional data set, a map of zip codes along with location information for each one, joining it together with our other data.

### Datasets

You will work with three CSV datasets:

First, we'll use the latest (2023) Plant Hardiness Zone report from https://prism.oregonstate.edu/projects/plant_hardiness_zones.php . The data comes in several formats and parts; we'll use the CSV file that provides us with data per US zip code:

https://prism.oregonstate.edu/projects/phm_data/phzm_us_zipcode_2023.csv

Next, we'll download data from the previous survey in 2012, described at https://prism.oregonstate.edu/projects/plant_hardiness_zones_2012.php :

https://prism.oregonstate.edu/projects/public/phm/2012/phm_us_zipcode_2012.csv

Finally, we'll download and work with a CSV file containing US zip codes:

http://uszipcodelist.com/zip_code_database.csv

## Tasks

### 1. Setup

First, import pandas, matplotlib, and seaborn and load the three datasets.

Next, display the first few rows and print out the dataset info to get an idea of the contents of each dataset.

You may have noticed that the zipcodes were read in as integers rather than strings, and therefore might not be 5 digits long. Ensure the `zipcode` or `zip` column in all datasets is a 5-character string, filling in any zeros that were dropped.

Combine the 2012 and 2023 data together by adding a `year` column and then stacking them together.

In the combined plant hardiness dataframe, create two new columns, `trange_min` and `trange_max`, containing the min and max temperatures of the `trange` column. Remove the original `trange` column.

Hint: use `str.split()` to split the `trange` strings where they have spaces and retrieve the first and last components (min and max, respectively)

## Tasks

### 2. Exploration and visualization

On average, how much has the minimum temperature in a zip code changed from 2012 to 2023?

Merge together the combined plant hardiness dataset and the zipcode dataset by zipcode. This will give us more informtaion in the plant hardiness dataset, such as the latitude and longitude for each zipcode.

Create two scatter plot where the x axis is the longitude, the y axis is the latitude, the color is based on the minimum temperature in 2012 for one and 2023 for the other. Only look at longitude < -60.

Now create a single scatter plot where you look at the difference between the minimum temperature in 2012 and 2023. Only look at longitude < -60. Color any zipcodes where you do not have information from both years in grey.

Create a bar plot showing the top 10 states where the average minimum temperature increased the most. Label your axes appropriately.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.