DataFun-04-EDA: Exploratory Data Analysis of the Iris Dataset

This project demonstrates a complete exploratory data analysis (EDA) workflow using the classic Iris dataset. The goal is to show how Python tools can be used to load, inspect, visualize, and engineer features from data, and to draw meaningful insights about the relationships between variables and species.

Project Overview

Dataset: Iris flower dataset (150 samples, 4 features, 3 species)
Tools Used: Python, pandas, seaborn, matplotlib
Notebook: All analysis is documented in TestDrive.ipynb.

Key Steps Performed

Data Loading: Imported the Iris dataset using seaborn and loaded it into a pandas DataFrame.
Data Inspection: Explored the structure, types, and summary statistics of the data.
Visualization: Created histograms, pairplots, and scatter plots to visualize distributions and relationships.
Feature Engineering: Created a new feature (Sepal Area) to explore additional relationships.
Analysis: Compared species using visualizations and statistics to identify which features best separate them.
Insights: Summarized findings and highlighted the most predictive features for species classification.
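
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names (`load_iris`, `inspect`, `add_sepal_area`, `plot_overview`) are hypothetical, the column names assume seaborn's `iris` table (`sepal_length`, `sepal_width`, `petal_length`, `petal_width`, `species`), and Sepal Area is assumed to be length times width. The `SAMPLE` frame is a handful of hand-typed rows used only so the helpers can be tried without a download.

```python
import pandas as pd


def load_iris():
    """Load Iris via seaborn (fetches a small CSV on first use, so it needs network access)."""
    import seaborn as sns  # imported lazily so the other helpers work without seaborn

    return sns.load_dataset("iris")


def inspect(df):
    """Collect the basic facts an initial inspection usually looks at."""
    return {
        "shape": df.shape,                           # (rows, columns)
        "dtypes": df.dtypes.astype(str).to_dict(),   # column name -> dtype name
        "missing": int(df.isna().sum().sum()),       # total count of missing cells
        "summary": df.describe(),                    # count, mean, std, quartiles
    }


def add_sepal_area(df):
    """Return a copy with sepal_area = sepal_length * sepal_width (assumed definition)."""
    out = df.copy()
    out["sepal_area"] = out["sepal_length"] * out["sepal_width"]
    return out


def plot_overview(df):
    """Draw histograms, a pairplot, and a petal scatter plot (requires seaborn and matplotlib)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    df.hist(figsize=(8, 6))                          # one histogram per numeric column
    sns.pairplot(df, hue="species")                  # pairwise relationships, colored by species
    plt.figure()
    sns.scatterplot(data=df, x="petal_length", y="petal_width", hue="species")
    plt.show()


# Hand-typed illustrative rows in the same layout as seaborn's iris table.
SAMPLE = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 7.0, 6.3],
        "sepal_width": [3.5, 3.0, 3.2, 3.3],
        "petal_length": [1.4, 1.4, 4.7, 6.0],
        "petal_width": [0.2, 0.2, 1.4, 2.5],
        "species": ["setosa", "setosa", "versicolor", "virginica"],
    }
)
```

In a notebook, `df = add_sepal_area(load_iris())` followed by `inspect(df)` and `plot_overview(df)` would cover the loading, inspection, feature engineering, and visualization steps in order.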

Results

Petal measurements (length and width) are the most effective for distinguishing species. Setosa in particular separates cleanly from the other two species on petal size alone.
Sepal measurements and engineered features like Sepal Area provide additional, but less powerful, separation.
The pairplots and scatter plots show these patterns directly and support the conclusions drawn from the data.
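
One way to put a number on "which features best separate species" is a rough between/within variance ratio per feature. The sketch below is an assumption about how that could be done, not the method used in the notebook: `separation_scores` is a hypothetical helper, the ratio is in the spirit of an ANOVA F-statistic without being one, and the `toy` frame contains made-up values chosen only to show the behavior.

```python
import pandas as pd


def separation_scores(df, label="species"):
    """Score each numeric feature by how far apart the species means are
    relative to the spread inside each species (higher = better separation)."""
    features = list(df.select_dtypes("number").columns)
    grouped = df.groupby(label)[features]
    between = grouped.mean().var()   # variance of the per-species means
    within = grouped.var().mean()    # average variance within a species
    return (between / within).sort_values(ascending=False)


# Made-up toy values (NOT real Iris measurements), two rows per species.
toy = pd.DataFrame(
    {
        "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
        "petal_length": [1.0, 1.2, 4.0, 4.2, 5.5, 5.7],
        "sepal_width": [3.0, 3.4, 2.8, 3.2, 2.9, 3.3],
    }
)

scores = separation_scores(toy)
print(scores)
```

On the toy data, `petal_length` scores far higher than `sepal_width` because its species means are spread out while the values within each species stay close together. Each species needs at least two rows, otherwise the within-species variance is undefined.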

How to Use This Project

Clone the repository:
```
git clone https://github.com/KHenn22/datafun-04-eda.git
cd datafun-04-eda
```
Create and activate a virtual environment:
```
python3 -m venv .venv
source .venv/bin/activate
```
Install dependencies:
```
pip install -r requirements.txt
```
Open TestDrive.ipynb in Jupyter or VS Code and run the cells to reproduce the analysis.

Requirements

Python 3.8 or higher
See requirements.txt for package list

Project Status

Complete as of 9/10/2025.

Prerequisites

Python 3.8+
Recommended: Use a virtual environment
