Skip to content

thyphan2025/Electric-Vehicle-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 

Repository files navigation

Analyzing Electric Vehicles Population in Washington State

This project analyzes the population of electric vehicles in Washington State, focusing on attributes such as make, model, electric vehicle type, Clean Alternative Fuel Vehicle (CAFV) eligibility, electric range, county, and location. After cleaning the dataset and performing Exploratory Data Analysis (EDA) to uncover key patterns, a Decision Tree classifier is applied. To address class imbalance in the training data, the Synthetic Minority Oversampling Technique (SMOTE) is used.

Data Source

This dataset shows the Battery Electric Vehicles (BEVs) and Plug-in Hybrid Electric Vehicles (PHEVs) that are currently registered through Washington State Department of Licensing (DOL).

Data ranges from 2010 to 2026 Model Year (Metadata Updated: June 14, 2025)

https://catalog.data.gov/dataset/electric-vehicle-population-data

Methods for Data Analysis

  1. Perform Data Cleaning
  2. Univariate Analysis
  3. Bivariate Analysis
  4. Encoding Categorical Features
  5. Apply Machine Learning Model: Decision Tree
  6. Evaluate Model Performance Using Classification Report and Confusion Matrix
  7. Addressing Class Imbalance using SMOTE
  8. Train the new Decision Tree Model
  9. Evaluate Model Performance using Classification Report and Confusion Matrix

Tools Used

Python (pandas, matplotlib, seaborn, scikit-learn, imbalanced-learn)

Project Files

  1. README.md - Project Description and Summary of Findings
  2. EV Analysis _Thy Phan.ipynb - Python Notebook

Summary of Findings

Univariate Analysis

1. Univariate Analysis of Electric Range

image

The most common electric range is around 20 miles, with approximately 17,500 vehicles falling into this category — clearly dominating the distribution.

image

The electric range varies widely, with most vehicles falling between approximately 30 and 220 miles. The median range is around 60 miles, highlighting a strong skew toward lower-range vehicles.

2. Univariate Anlaysis of Electric Vehicle Types

image

Battery Electric Vehicles (BEVs) make up approximately 200,000 registrations—about four times more than Plug-In Hybrid Electric Vehicles (PHEVs), which total around 50,000. This reflects a strong shift in consumer and manufacturer preference toward BEVs.

3. Univariate Analysis of Model Year

image

Electric vehicles started showing up around 2012, and their numbers kept growing every year. The peak was in 2023 with about 60,000 vehicles. There’s a drop in 2024, and again in 2025 and 2026— probably because those years aren’t over yet and not all vehicles have been registered.

4. Univariate Analysis of Vehicle Make

image

Tesla leads the top 10 with 105,001 vehicles — nearly five times more than Chevrolet and Nissan, which have 17,840 and 15,892 respectively. Apart from Tesla, the number of vehicles across other top manufacturers is relatively similar.

5. Univariate Analysis of Vehicle Model

image

Tesla Model Y and Model 3 dominate the Top 10 Electric Vehicle Models with 51,528 and 37,427 vehicles, respectively. Interestingly, the Nissan Leaf ranks third with 13,950 vehicles—outpacing the Tesla Model S, which ranks fourth with 7,912 vehicles.

Bivariate Analysis

1. Bivariate Analysis for Eletric Vehicle Types with Electric Range under CAFV Eligibility

image

Battery Electric Vehicles (BEVs) generally have a significantly longer electric range—averaging around 200 miles—which qualifies most of them for Clean Alternative Fuel Vehicle (CAFV) eligibility. In contrast, BEVs with shorter ranges (~25 miles) are typically not eligible.

Plug-in Hybrid Electric Vehicles (PHEVs), with a maximum range of about 45 miles, show a smaller distinction. Some PHEVs qualify for CAFV eligibility, while others fall just below the 20-mile threshold and are not eligible.

2. Bivariate Analysis for Electric Range and Model Year

image

Electric range has shown notable fluctuations over time. It peaked around 2010 with an average of 250 miles and again in 2020 at approximately 256 miles. However, a sharp decline is observed after 2020, continuing into 2025. This recent drop may be attributed to incomplete data for newer model years, as 2025 is still ongoing.

3. Bivariate Analysis of Makes (Top 10) and Location (longtitude, latitude)

image

The top 10 EV makes are visualized based on geographic coordinates (longitude and latitude), with each make represented by a unique color. This scatter plot highlights how different EV brands are distributed across Washington State.

Decision Tree

1. Plot Tree

image

The following decision tree visualization shows how the model splits on features like Model Year and Electric Range to distinguish between BEV and PHEV.

2. Classification Report and Confusion Matrix

image

3. Normalized Confusion Matrix (showed percentage of each label)

image

The initial Decision Tree model shows a strong bias toward the majority class — Battery Electric Vehicles (BEV) — over the minority class — Plug-in Hybrid Electric Vehicles (PHEV). While the model achieves 78.90% accuracy for classifying BEV (class 0), it only correctly identifies 12.77% of PHEV (class 1) cases.

This imbalance is showed in both the confusion matrix and the classification report, where BEV metrics notably outperform those for PHEV. To address this class imbalance and improve model fairness, I will apply the SMOTE (Synthetic Minority Over-sampling Technique) method in the next step.

Address Class Imbalance

image

Before applying SMOTE, the dataset was imbalanced, with 121,897 Battery Electric Vehicles (BEV) and only 22,690 Plug-in Hybrid Electric Vehicles (PHEV).

After SMOTE, the minority class (PHEV) was synthetically oversampled to match the majority class, resulting in a balanced dataset with 121,897 records for both BEV and PHEV.

Decision Tree After Applying SMOTE

1. Plot Tree

image

Compared to the original model, the new decision tree after applying SMOTE has more branches. This is likely because the balanced data allows the model to make more detailed splits.

2. Classification Report and Confusion Matrix

image

3. Normalized Confusion Matrix (showed percentage of each label)

image

After applying SMOTE, the updated decision tree model achieves 93.17% accuracy in classifying BEV (class 0) and 83.36% accuracy in classifying PHEV (class 1). This marks a significant improvement compared to the original model’s performance, which classified BEV at 78.90% and PHEV at only 12.77%.

Project Wrap Up

Websites Explored During My Electric Vehicle Analysis Project

  1. Mastering Exploratory Data Analysis
  2. Master Visualization in EDA
  3. Confusion Matrix Visualization
  4. Class Imbalance Strategies
  5. Seaborn documentation
  6. Sklearn documentation
  7. Matplotlib documentation

About

Electric Vehicle Analysis (Cleaning, Perform Exploratory Data Analysis and Apply Decision Tree Model in Python)

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors