Linear Regression Model to Predict CO2 Emissions in East Africa

Description of Mission and Problem

My mission focuses on climate change and on using technology to address environmental challenges in Africa.
The problem addressed here is predicting CO2 emission trends to support better climate-mitigation planning.
The goal is to contribute practical, data-driven tools that support Africa's climate adaptation efforts.

Dataset Information

This project uses historical country-level CO2 emissions and socioeconomic indicators from 2000 to 2020 to build a model that predicts total CO2 emissions excluding land-use change and forestry (LUCF) for East African countries.

Dataset: africa-co2-emissions.csv

Dataset Characteristics:

Rows: 1,134
Columns: 20 total (3 non-numeric, 17 numeric)
Source: African countries CO2 emissions data

Project Structure

├── summative/
│   ├── API/
│   │   ├── app.py
│   │   ├── prediction.py
│   │   ├── requirements.txt
│   ├── FlutterApp/
│   │   ├── east_africa_co2_prediction_mobile_app
│   └── linear_regression/
│       ├── multivariate.ipynb
│       ├── data/
│       │   └── africa-co2-emissions.csv
│       └── final_model/
│           ├── best_linear_regression_model.joblib
│           └── fastapi_model_artifacts.joblib
└── README.md

Setup Instructions

Clone or Download the Repository

git clone https://github.com/hd77alu/linear_regression_model
cd linear_regression_model

Create and activate a Python virtual environment.
Install notebook dependencies:
- pip install numpy pandas scikit-learn matplotlib joblib jupyter

How to Use the Notebook

Method 1: Using Jupyter Notebook (Local)

# Navigate to project directory
cd linear_regression_model

# Launch Jupyter Notebook
jupyter notebook

# Open the file: multivariate.ipynb

Method 2: Using Google Colab

Click the "Open in Colab" badge at the top of the notebook
Upload africa-co2-emissions.csv to the Colab Files panel
Run all cells

Method 3: Using VS Code

Open VS Code
Install the Jupyter extension (if not already installed)
Open the folder summative/linear_regression
Click on multivariate.ipynb
Select Run All
Select your virtual environment's Python kernel when prompted

How to Run API Locally

Install API dependencies (from repository root):

pip install -r summative/API/requirements.txt

Start the FastAPI server (from repository root):

python -m uvicorn summative.API.app:app --reload --host 0.0.0.0 --port 8000

Verify service health:

API root: http://127.0.0.1:8000/
Swagger UI: http://127.0.0.1:8000/docs

Test API with Swagger UI

Open http://127.0.0.1:8000/docs, or use the deployed Render Swagger UI at https://linear-regression-model-wk1e.onrender.com/docs.
Expand POST /predict and click Try it out.

Use a sample payload:

{
  "country": "Kenya",
  "year": 2020,
  "population": 53771300,
  "transportation_mt": 5.1,
  "manufacturing_construction_mt": 2.3,
  "electricity_heat_mt": 3.8,
  "building_mt": 1.7
}

Click Execute and inspect prediction_mt in the response.

Expand POST /predict/batch and click Try it out.

Use a sample payload:

{
  "rows": [
    { "country": "Kenya", "year": 2020, "population": 53771300, "transportation_mt": 5.1, "manufacturing_construction_mt": 2.3, "electricity_heat_mt": 3.8, "building_mt": 1.7 },
    { "country": "Uganda", "year": 2019, "population": 45741000, "transportation_mt": 3.2, "manufacturing_construction_mt": 1.1, "electricity_heat_mt": 1.5, "building_mt": 0.9 }
  ]
}

Click Execute and inspect prediction_mt in the response.

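To refresh the model with new labeled rows, test POST /retrain. Each row takes the same fields as a prediction request plus target_mt, the observed emission value used as the label.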

Sample payload:

{
  "persist_new_rows": false,
  "rows": [
    {
      "country": "Kenya",
      "year": 2021,
      "population": 54985711,
      "transportation_mt": 5.3,
      "manufacturing_construction_mt": 2.5,
      "electricity_heat_mt": 3.9,
      "building_mt": 1.8,
      "target_mt": 12.4
    }
  ]
}
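The same requests can also be sent from a script instead of Swagger UI. The sketch below uses only Python's standard library (`urllib.request`, `json`); `build_request` and `post_json` are hypothetical helper names, and the 30-second default timeout is borrowed from the Flutter app's cold-start handling rather than from anything the API requires.

```python
import json
import urllib.request

# Base URLs from this README.
LOCAL_URL = "http://127.0.0.1:8000"
RENDER_URL = "https://linear-regression-model-wk1e.onrender.com"

# Sample payload copied from the POST /predict example above.
kenya_2020 = {
    "country": "Kenya",
    "year": 2020,
    "population": 53771300,
    "transportation_mt": 5.1,
    "manufacturing_construction_mt": 2.3,
    "electricity_heat_mt": 3.8,
    "building_mt": 1.7,
}


def build_request(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for an endpoint such as /predict."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(base_url: str, path: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body; raises urllib.error.HTTPError on 4xx/5xx."""
    request = build_request(base_url, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the local server running, `post_json(LOCAL_URL, "/predict", kenya_2020)` should return a JSON object that includes `prediction_mt`; the batch and retrain payloads shown above can be sent the same way with the paths `/predict/batch` and `/retrain`.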

Project Implementation

1. Problem Framing And Scope

This implementation is built around a climate-focused regression task: estimating Total CO2 Emission excluding LUCF (Mt) for East African countries from historical economic and sectoral emission indicators. The geographic scope was intentionally constrained to Eastern Africa to keep the model aligned with the mission objective and to avoid mixing very different regional emission dynamics.

2. Data Processing Pipeline

The notebook implementation follows a structured preprocessing pipeline:

Load the dataset from summative/linear_regression/data/africa-co2-emissions.csv.
Replace placeholder missing values (N/A, na, empty strings) with nulls.
Filter records to Eastern Africa only.
Convert selected columns to numeric types using safe coercion.
Drop highly sparse columns (such as Fugitive Emissions) when needed.
Fill remaining predictor missing values using country-wise median imputation.
Apply a global median fallback for any predictor values still missing after country-level fill.
Drop rows where the target is missing, since they cannot be used for training or evaluation.

These steps ensure that the model receives clean, consistent numeric inputs and that downstream analysis is reproducible.
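A minimal pandas sketch of this pipeline follows. The column names come from the features listed in this README, but `REGION_COL = "Sub-Region"` is a hypothetical name for the column used to filter Eastern Africa, and the 50% missing-value cutoff for dropping sparse columns is an assumed threshold, not the notebook's exact rule.

```python
import numpy as np
import pandas as pd

# Assumed column names. REGION_COL is hypothetical; the others follow this README.
REGION_COL = "Sub-Region"
TARGET = "Total CO2 Emission excluding LUCF (Mt)"
PREDICTORS = [
    "Population",
    "Transportation (Mt)",
    "Manufacturing/Construction (Mt)",
    "Electricity/Heat (Mt)",
    "Building (Mt)",
]


def preprocess(df: pd.DataFrame, sparse_threshold: float = 0.5) -> pd.DataFrame:
    # Turn placeholder strings into real missing values.
    df = df.replace(["N/A", "na", ""], np.nan)
    # Keep Eastern Africa only; .copy() avoids chained-assignment warnings below.
    df = df[df[REGION_COL] == "Eastern Africa"].copy()
    # Coerce every non-label column to numbers; unparseable values become NaN.
    numeric_cols = [c for c in df.columns if c not in ("Country", REGION_COL)]
    df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")
    # Drop columns that are mostly empty (assumed cutoff: sparse_threshold).
    sparse = [c for c in numeric_cols if df[c].isna().mean() > sparse_threshold]
    df = df.drop(columns=sparse)
    # Country-wise median fill, then a global median fallback for what remains.
    cols = [c for c in PREDICTORS if c in df.columns]
    df[cols] = df.groupby("Country")[cols].transform(lambda s: s.fillna(s.median()))
    df[cols] = df[cols].fillna(df[cols].median())
    # Rows without a target cannot be used for training or evaluation.
    return df.dropna(subset=[TARGET])
```

The global fallback only matters for countries whose values are missing in every year, where the country-wise median is itself missing.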

3. Exploratory Data Analysis (EDA)

EDA was used to validate assumptions and guide feature decisions:

A. Correlation heatmap to inspect relationships between predictors and the target.

B. Histograms to understand the distribution and spread of key variables.

C. Scatter plots to inspect directional patterns and potential linear relationships.

These visualizations were not only descriptive; they directly informed which variables were likely to be useful and where multicollinearity/leakage risks might appear.
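One way to make the heatmap's multicollinearity check repeatable is to list predictor pairs whose absolute Pearson correlation exceeds a cutoff. This is a sketch assuming a cleaned numeric DataFrame; the 0.9 threshold and the `correlated_pairs` name are illustrative choices.

```python
import pandas as pd


def correlated_pairs(df: pd.DataFrame, threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Return column pairs whose absolute Pearson correlation exceeds `threshold`."""
    corr = df.select_dtypes(include="number").corr().abs()
    cols = corr.columns
    pairs = []
    # Walk the upper triangle only, so each pair is reported once.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] > threshold:
                pairs.append((cols[i], cols[j], float(corr.iloc[i, j])))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Leaving the target column in the frame also surfaces predictors that track it almost perfectly, which is often a sign of leakage worth investigating.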

4. Feature Engineering And Selection

Feature engineering was implemented with explicit decision logic:

Build an intermediate feature-engineering dataframe.
Drop columns with redundancy or leakage risk.
Create derived features (for example, population density) and test their usefulness.
Rank numeric candidates by correlation strength to the target.
Finalize a decomposition-aware set of predictors for training.

Final modeling features:

Country
Year
Population
Transportation (Mt)
Manufacturing/Construction (Mt)
Electricity/Heat (Mt)
Building (Mt)

This made the model easier to interpret while reducing unstable feature overlap.
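The derive-then-rank steps of this section might look like the sketch below. The area column name (`Area (km2)`) is hypothetical, since the README does not say which column population density was derived from; the ranking sorts by absolute Pearson correlation but keeps the sign so the direction of each relationship stays visible.

```python
import pandas as pd

TARGET = "Total CO2 Emission excluding LUCF (Mt)"


def add_population_density(df: pd.DataFrame, area_col: str = "Area (km2)") -> pd.DataFrame:
    """Add a derived density feature; `area_col` is a hypothetical column name."""
    out = df.copy()
    out["Population Density"] = out["Population"] / out[area_col]
    return out


def rank_by_target_correlation(df: pd.DataFrame, target: str = TARGET) -> pd.Series:
    """Signed correlation of each numeric column with the target, strongest first."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)
```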

5. Data Standardization And Encoding

To prepare features for model training:

Separate features (X) and target (y).
Standardize numeric predictors with StandardScaler.
One-hot encode the categorical Country feature.
Split into training and testing sets using an 80/20 ratio.

This ensures numeric features are on comparable scales and categorical information is represented in a model-friendly way.
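A scikit-learn sketch of these steps, assuming the final feature names listed in section 4. Unlike the order written above, it splits first and fits `StandardScaler` on the training rows only, so no test-set statistics leak into training; `random_state=42` and the function name are illustrative.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

TARGET = "Total CO2 Emission excluding LUCF (Mt)"
NUMERIC = ["Year", "Population", "Transportation (Mt)",
           "Manufacturing/Construction (Mt)", "Electricity/Heat (Mt)", "Building (Mt)"]
CATEGORICAL = ["Country"]


def split_and_transform(df: pd.DataFrame, random_state: int = 42):
    X = df[NUMERIC + CATEGORICAL]
    y = df[TARGET]
    # 80/20 split before any fitting, so the scaler never sees test rows.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    preprocessor = ColumnTransformer(
        [
            ("num", StandardScaler(), NUMERIC),
            # Countries unseen during training become all-zero one-hot columns.
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ],
        sparse_threshold=0,  # always return a dense array
    )
    X_train_t = preprocessor.fit_transform(X_train)
    X_test_t = preprocessor.transform(X_test)
    return X_train_t, X_test_t, y_train, y_test, preprocessor
```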

6. Model Training Strategy

Three regression models were trained on the same processed dataset:

Linear Regression
Decision Tree Regressor
Random Forest Regressor

All three were trained and evaluated under the same split and preprocessing path for fair comparison.

7. Evaluation, Selection, And Validation

Model comparison used Mean Squared Error (MSE) as the primary metric. The implementation includes:

MSE calculation for each model.
Ranking models by lowest test MSE.
Additional quick prediction checks (actual vs predicted samples).
Train-vs-test loss visualization for comparison across models.

The best-performing model (lowest test MSE) was selected automatically from the evaluation table.
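The training and selection loop from sections 6 and 7 can be sketched as follows, assuming the preprocessed train/test matrices already exist. The hyperparameters (`n_estimators=100`, `random_state=42`) are placeholders rather than the notebook's values. Both train and test MSE are kept so the resulting table can feed a train-vs-test loss plot.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor


def compare_models(X_train, X_test, y_train, y_test, random_state=42):
    """Fit three regressors on the same split and rank them by test MSE."""
    models = {
        "Linear Regression": LinearRegression(),
        "Decision Tree": DecisionTreeRegressor(random_state=random_state),
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=random_state),
    }
    rows = []
    for name, model in models.items():
        model.fit(X_train, y_train)
        rows.append({
            "model": name,
            "train_mse": mean_squared_error(y_train, model.predict(X_train)),
            "test_mse": mean_squared_error(y_test, model.predict(X_test)),
        })
    # Lowest test MSE first; the top row is the selected model.
    table = pd.DataFrame(rows).sort_values("test_mse").reset_index(drop=True)
    return table, models[table.loc[0, "model"]]
```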

8. Visualization of the Final Linear Fit

A scatter plot was implemented to show the fitted linear relationship after training the Linear Regression model. For clarity, the fitted line is visualized against a chosen feature slice (Transportation (Mt)) while other features are held at baseline values. This provides a readable 2D interpretation of a multivariate model.
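The 2D slice can be computed by sweeping one feature across its observed range while holding every other column fixed. This sketch assumes "baseline" means the column mean of the (already scaled and encoded) feature matrix; the notebook may use a different baseline, and `fitted_line_slice` is a hypothetical name.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fitted_line_slice(model, X, feature_idx: int, num: int = 50):
    """Vary one feature over its observed range; hold the others at their column means."""
    X = np.asarray(X, dtype=float)
    baseline = X.mean(axis=0)
    xs = np.linspace(X[:, feature_idx].min(), X[:, feature_idx].max(), num)
    grid = np.tile(baseline, (num, 1))
    grid[:, feature_idx] = xs
    return xs, model.predict(grid)
```

Because the model is linear, the returned points lie on a straight line whose slope is that feature's coefficient; `xs` and `ys` can be drawn with matplotlib's `plt.plot(xs, ys)` over a `plt.scatter` of the observed points.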

9. Model Persistence, Inference Script, And API Layer

After selection, the best model and its preprocessing artifacts are saved to:

summative/linear_regression/final_model/best_linear_regression_model.joblib
summative/linear_regression/final_model/fastapi_model_artifacts.joblib
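Saving and reloading with `joblib` might look like this. The file names match the paths listed above, but storing the scaler and training columns as a plain dict inside `fastapi_model_artifacts.joblib` is an assumed layout; the actual file may hold different keys.

```python
from pathlib import Path

import joblib


def save_artifacts(model, scaler, training_columns, out_dir) -> None:
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, out_dir / "best_linear_regression_model.joblib")
    # Assumed layout: a dict holding everything inference needs besides the model.
    joblib.dump(
        {"scaler": scaler, "training_columns": list(training_columns)},
        out_dir / "fastapi_model_artifacts.joblib",
    )


def load_artifacts(out_dir):
    out_dir = Path(out_dir)
    model = joblib.load(out_dir / "best_linear_regression_model.joblib")
    artifacts = joblib.load(out_dir / "fastapi_model_artifacts.joblib")
    return model, artifacts["scaler"], artifacts["training_columns"]
```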

To support backend integration, a standalone inference module was built:

summative/API/prediction.py

The module:

Loads the saved model.
Reuses notebook-equivalent preprocessing for training-frame construction.
Saves and reloads preprocessing artifacts (scaler and training columns) so inference remains consistent with training.
Validates typed input payloads.
Supports both single and batch predictions.
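Training/inference consistency comes down to rebuilding the exact training column layout for each request. In this sketch, `PAYLOAD_TO_COLUMN` is a hypothetical mapping from the API's snake_case fields to the training column names, and the one-hot columns are assumed to come from `pd.get_dummies` with its default `Country_` prefix. `reindex` then fills dummy columns absent from the request with 0 and drops any country the model never saw.

```python
import pandas as pd

NUMERIC = ["Year", "Population", "Transportation (Mt)",
           "Manufacturing/Construction (Mt)", "Electricity/Heat (Mt)", "Building (Mt)"]

# Hypothetical mapping from API payload keys to training column names.
PAYLOAD_TO_COLUMN = {
    "country": "Country",
    "year": "Year",
    "population": "Population",
    "transportation_mt": "Transportation (Mt)",
    "manufacturing_construction_mt": "Manufacturing/Construction (Mt)",
    "electricity_heat_mt": "Electricity/Heat (Mt)",
    "building_mt": "Building (Mt)",
}


def payloads_to_features(rows, scaler, training_columns):
    """Turn validated payload dicts into a frame laid out exactly like the training data."""
    frame = pd.DataFrame(rows).rename(columns=PAYLOAD_TO_COLUMN)
    # Reuse the scaler fitted at training time; never refit on request data.
    frame[NUMERIC] = scaler.transform(frame[NUMERIC])
    encoded = pd.get_dummies(frame, columns=["Country"], dtype=float)
    # Match training columns and order: missing dummies -> 0, unseen countries dropped.
    return encoded.reindex(columns=training_columns, fill_value=0.0)
```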

The API service in summative/API/app.py:

Exposes /predict, /predict/batch, and /retrain endpoints.
Uses Pydantic request schemas for validation.
Delegates model/preprocessing logic to prediction.py.
Supports CORS configuration through the ALLOWED_ORIGINS environment variable.
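For the CORS setting, a small parser for `ALLOWED_ORIGINS` is sketched below. It assumes the variable holds a comma-separated list of origins and falls back to `["*"]` when it is empty; both the format and the fallback are assumptions, not documented behavior of `app.py`.

```python
import os

# Hypothetical fallback when ALLOWED_ORIGINS is unset or empty.
DEFAULT_ORIGINS = ["*"]


def parse_allowed_origins(raw=None) -> list[str]:
    """Split an (assumed) comma-separated ALLOWED_ORIGINS value into a clean list."""
    if raw is None:
        raw = os.environ.get("ALLOWED_ORIGINS", "")
    origins = [origin.strip() for origin in raw.split(",") if origin.strip()]
    return origins or list(DEFAULT_ORIGINS)
```

The resulting list would be passed to FastAPI's `CORSMiddleware` (imported from `fastapi.middleware.cors`), e.g. `app.add_middleware(CORSMiddleware, allow_origins=parse_allowed_origins(), allow_methods=["*"], allow_headers=["*"])`.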

10. Flutter Mobile Application

A Flutter app was built to provide a user-friendly interface for interacting with the prediction API. The app is located at summative/FlutterApp/east_africa_co2_prediction_mobile_app and connects to the deployed API at https://linear-regression-model-wk1e.onrender.com.

The app is structured as a single prediction page with two modes:

Single Entry mode — the user fills in 7 input fields (Country, Year, Population, Transportation, Manufacturing/Construction, Electricity/Heat, Building) and submits to POST /predict. The predicted CO₂ emission value is displayed in a result box below the form.
Multiple Entries mode — the user adds as many entry cards as needed, each with the same 7 fields, and submits all at once to POST /predict/batch. Results are displayed as expandable cards (expanded by default), one per entry, each showing the predicted emission value.

Key implementation details:

All fields are validated before submission — required check, numeric type check, and non-negative value check.
The Predict button is disabled and shows a loading spinner while a request is in flight.
API errors and network failures are caught and displayed in a dedicated red-bordered error box.
A 30-second request timeout is applied to handle Render free-tier cold starts.
The UI uses a dark blue (#0A1E3C) background with yellow (#FFC107) accents throughout.
State is managed using setState within a single StatefulWidget, keeping the implementation simple and self-contained.
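The app implements its form checks in Dart; the sketch below restates the same three rules (required, numeric, non-negative) in Python, the language used elsewhere in this project, which can be useful for checking payloads before calling the API. Field names follow the API payloads shown earlier; the function name and error messages are hypothetical.

```python
FIELDS = ["country", "year", "population", "transportation_mt",
          "manufacturing_construction_mt", "electricity_heat_mt", "building_mt"]
NUMERIC_FIELDS = set(FIELDS[1:])


def validate_entry(entry: dict[str, str]) -> dict[str, str]:
    """Check raw form strings; return field -> error message (empty dict means valid)."""
    errors = {}
    for field in FIELDS:
        value = entry.get(field, "").strip()
        if not value:
            errors[field] = "Required"
            continue
        if field in NUMERIC_FIELDS:
            try:
                number = float(value)
            except ValueError:
                errors[field] = "Must be a number"
                continue
            if number < 0:
                errors[field] = "Must be non-negative"
    return errors
```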

For Flutter setup and run instructions, see Flutter Setup Instructions.
