US Flight Delays Analysis & Dashboard

Comprehensive exploratory data analysis, visualization, and interactive dashboard for US domestic flight delays using historical airline on-time performance data combined with weather information.

📊 Project Overview

This project analyzes US flight delay patterns using publicly available Bureau of Transportation Statistics (BTS) on-time performance data, enriched with weather conditions at departure and arrival airports.

Key features:

  • Data cleaning and feature engineering
  • Exploratory Data Analysis (EDA): carriers, airports, time-of-day, day-of-week, month, delay causes
  • Geographic visualization of delay hotspots
  • Interactive Streamlit dashboard with filters and dynamic charts
  • Basic predictive modeling foundation (Random Forest / XGBoost ready)
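
As a rough illustration of the feature engineering step, the sketch below derives a departure hour, a time-of-day bucket, and a delayed flag from raw values. It assumes a scheduled departure time stored as an hhmm integer (the BTS `CRS_DEP_TIME` convention) and an arrival delay in minutes; the function names, the bucket boundaries, and the 15-minute threshold are illustrative choices, not taken from the project's notebook.

```python
def dep_hour(crs_dep_time: int) -> int:
    """Convert an hhmm integer such as 1435 to the hour 14 (2400 wraps to 0)."""
    return (int(crs_dep_time) // 100) % 24


def time_of_day(hour: int) -> str:
    """Bucket an hour into a coarse label; the boundaries are an assumption."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 22:
        return "evening"
    return "night"


def is_delayed(arr_delay_minutes: float, threshold: float = 15.0) -> bool:
    """Flag a flight as delayed when it arrives `threshold` or more minutes late."""
    return arr_delay_minutes >= threshold


# Hypothetical sample record, for illustration only.
hour = dep_hour(1435)
print(hour, time_of_day(hour), is_delayed(22.0))
```

With pandas, helpers like these could be applied column-wise, for example via `Series.map`.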

Main questions answered:

  • Which airlines have the highest delay rates?
  • What are the most common causes of delays?
  • When (time of day, day of week, month) do delays peak?
  • Which airports are the worst delay hotspots?
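
Questions like these mostly reduce to grouped aggregations. The following is a minimal sketch on a tiny made-up DataFrame; the column names (`OP_UNIQUE_CARRIER`, `CRS_DEP_TIME`, `ARR_DEL15`) follow the BTS download naming and are an assumption about how the cleaned dataset is laid out, and the rows are invented for illustration.

```python
import pandas as pd

# Made-up rows; ARR_DEL15 is 1 when a flight arrived 15+ minutes late.
flights = pd.DataFrame(
    {
        "OP_UNIQUE_CARRIER": ["AA", "AA", "DL", "DL", "DL", "UA"],
        "CRS_DEP_TIME": [600, 1730, 915, 1810, 2205, 1745],
        "ARR_DEL15": [0, 1, 0, 1, 0, 1],
    }
)

# Share of delayed flights per carrier, worst first.
delay_rate_by_carrier = (
    flights.groupby("OP_UNIQUE_CARRIER")["ARR_DEL15"]
    .mean()
    .sort_values(ascending=False)
)

# Share of delayed flights per scheduled departure hour (hhmm -> hour).
flights["dep_hour"] = flights["CRS_DEP_TIME"] // 100
delay_rate_by_hour = flights.groupby("dep_hour")["ARR_DEL15"].mean()

print(delay_rate_by_carrier)
print(delay_rate_by_hour)
```

The same pattern extends to day of week, month, or airport by changing the grouping column.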

📁 Project Structure

flight-delay-analysis/
├── data/
│ └── raw/ # original downloaded CSVs
├── notebooks/
│ ├── notebook.ipynb # main Jupyter notebook with EDA & modeling
│ └── cleaned_flights.parquet # cleaned & processed dataset
├── scripts/
│ ├── dashboard.py # web interface to the data analysis
│ ├── cleaned_flights.parquet # cleaned & processed dataset
│ └── airport_coords.py # airport coordinates for mapping
├── dashboard.py # Streamlit app
├── requirements.txt
├── readme.md
└── demos/ # screenshots & saved plots

🛠️ Technologies & Libraries

  • Python 3.10+
  • Data processing: pandas, numpy
  • Visualization: matplotlib, seaborn, plotly, folium
  • Dashboard: streamlit, streamlit-folium
  • Modeling (optional): scikit-learn, xgboost
# requirements.txt
pandas
numpy
matplotlib
seaborn
plotly
folium
streamlit
streamlit-folium
scikit-learn          # optional for modeling
xgboost               # optional

🚀 Getting Started

  1. Clone the repository

git clone https://github.com/prashant-sharma-cmd/flight-delay-analysis.git
cd flight-delay-analysis

  2. Install dependencies

# Recommended: create a virtual environment first

python -m venv venv
source venv/bin/activate # Linux/Mac
venv\Scripts\activate # Windows

pip install -r requirements.txt

  3. Prepare the data

Download a suitable dataset, for example:

  • Kaggle: Airline Delay and Cancellation Data
  • BTS TranStats: https://www.transtats.bts.gov/DL_SelectFields.aspx?gnoyr_VQ=FGJ

Save it to the data/ folder and name the dataset you want to analyze flights_with_weather.

  4. Explore the notebook

Open notebooks/notebook.ipynb in Jupyter / VS Code / JupyterLab for detailed EDA, charts, and modeling experiments.

  5. Run the interactive dashboard

cd scripts
streamlit run dashboard.py

Open http://localhost:8501 in your browser.

Known Limitations

  • Charts do not render data points in the Streamlit app.
  • The application can only read datasets covering a single month.
  • Large datasets (>5M rows) may need sampling or Dask for faster processing.
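
One way to sample a large CSV without loading it fully into memory is reservoir sampling, which keeps a fixed-size uniform random sample while streaming over the rows once. The sketch below uses only the standard library; `reservoir_sample` is a hypothetical helper, not part of this project, and the in-memory CSV with its column names stands in for a real file.

```python
import csv
import io
import random


def reservoir_sample(rows, k, seed=0):
    """Return up to k rows drawn uniformly at random from an iterable (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


# Stand-in for a large file: 1,000 made-up rows held in memory.
csv_text = "FL_DATE,ARR_DELAY\n" + "\n".join(
    f"2024-01-01,{i}" for i in range(1000)
)
reader = csv.DictReader(io.StringIO(csv_text))
sample = reservoir_sample(reader, k=50)
print(len(sample))
```

For a file on disk, the same function can be fed `csv.DictReader(open(path, newline=""))`. Reading in chunks with pandas is another option when aggregates, rather than a sample, are needed.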

Data Sources & Licenses

  • Primary data: U.S. Department of Transportation – Bureau of Transportation Statistics (BTS), https://www.transtats.bts.gov/
  • Weather integration: commonly found in Kaggle merged datasets
  • Airport coordinates: OpenFlights / OurAirports (public domain)

Acknowledgements

Inspired by many excellent Kaggle notebooks on flight delays. Thanks to the open data community and BTS for making this kind of analysis possible.
