- My mission focuses on climate change and on using technology to address environmental challenges in Africa.
- The problem addressed here is the prediction of CO2 emissions trends to support improved climate-mitigation planning.
- The goal is to contribute practical, data-driven tools that support Africa's efforts in effective climate adaptation.
This project uses historical country-level CO2 emissions and socioeconomic indicators from 2000 to 2020 to create a model that predicts total CO2 emissions excluding LUCF in East Africa.
Dataset: africa-co2-emissions.csv
- Rows: 1,134
- Columns: 20 total (3 non-numeric, 17 numeric)
- Source: African countries CO2 emissions data
├── summative/
│   ├── API/
│   │   ├── app.py
│   │   ├── prediction.py
│   │   └── requirements.txt
│   ├── FlutterApp/
│   │   └── east_africa_co2_prediction_mobile_app
│   └── linear_regression/
│       ├── multivariate.ipynb
│       ├── data/
│       │   └── africa-co2-emissions.csv
│       └── final_model/
│           ├── best_linear_regression_model.joblib
│           └── fastapi_model_artifacts.joblib
└── README.md
- Clone or Download the Repository
git clone https://github.com/hd77alu/linear_regression_model
cd linear_regression_model
- Create and activate a Python virtual environment.
- Install notebook dependencies:
pip install numpy pandas scikit-learn matplotlib joblib jupyter
# Navigate to project directory
cd linear_regression_model
# Launch Jupyter Notebook
jupyter notebook
# Open the file: multivariate.ipynb
- Click the "Open in Colab" badge at the top of the notebook
- Upload africa-co2-emissions.csv to your files
- Run all cells
- Open VS Code
- Install the Jupyter extension (if not already installed)
- Open the folder linear_regression
- Click on multivariate.ipynb
- Select Run all
- Select the virtual environment's Python kernel when prompted
- Install API dependencies (from repository root):
pip install -r summative/API/requirements.txt
- Start the FastAPI server (from repository root):
python -m uvicorn summative.API.app:app --reload --host 0.0.0.0 --port 8000
- Verify service health:
  - API root: http://127.0.0.1:8000/
  - Swagger UI: http://127.0.0.1:8000/docs
- Open http://127.0.0.1:8000/docs or use the deployed Render Swagger UI: https://linear-regression-model-wk1e.onrender.com/docs.
- Expand POST /predict and click Try it out.
- Use a sample payload:
{
  "country": "Kenya",
  "year": 2020,
  "population": 53771300,
  "transportation_mt": 5.1,
  "manufacturing_construction_mt": 2.3,
  "electricity_heat_mt": 3.8,
  "building_mt": 1.7
}
- Click Execute and inspect prediction_mt in the response.
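The same single-row request can also be sent from a script instead of the Swagger UI. The following is a minimal sketch using only the Python standard library; the helper names `build_predict_request` and `predict` are hypothetical, and it assumes the API is running locally on port 8000 as started above.

```python
import json
import urllib.request

# Assumption: the FastAPI server is running locally as described above.
BASE_URL = "http://127.0.0.1:8000"


def build_predict_request(payload, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for /predict."""
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload, base_url=BASE_URL, timeout=30):
    """Send the request and return the decoded JSON response body."""
    request = build_predict_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Same sample payload as in the Swagger UI walkthrough.
sample = {
    "country": "Kenya",
    "year": 2020,
    "population": 53771300,
    "transportation_mt": 5.1,
    "manufacturing_construction_mt": 2.3,
    "electricity_heat_mt": 3.8,
    "building_mt": 1.7,
}
```

With the server running, `predict(sample)["prediction_mt"]` should return the predicted value; pass the Render URL as `base_url` to target the deployed service.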
- Expand POST /predict/batch and click Try it out.
- Use a sample payload:
{
  "rows": [
    { "country": "Kenya", "year": 2020, "population": 53771300, "transportation_mt": 5.1, "manufacturing_construction_mt": 2.3, "electricity_heat_mt": 3.8, "building_mt": 1.7 },
    { "country": "Uganda", "year": 2019, "population": 45741000, "transportation_mt": 3.2, "manufacturing_construction_mt": 1.1, "electricity_heat_mt": 1.5, "building_mt": 0.9 }
  ]
}
- Click Execute and inspect prediction_mt in the response.
- For model refresh with labeled rows, test POST /retrain.
- Sample payload:
{
  "persist_new_rows": false,
  "rows": [
    {
      "country": "Kenya",
      "year": 2021,
      "population": 54985711,
      "transportation_mt": 5.3,
      "manufacturing_construction_mt": 2.5,
      "electricity_heat_mt": 3.9,
      "building_mt": 1.8,
      "target_mt": 12.4
    }
  ]
}
This implementation is built around a climate-focused regression task: estimate East Africa Total CO2 Emission excluding LUCF (Mt) from historical economic and sectoral emission indicators. The geographic scope was intentionally constrained to Eastern Africa to keep the model aligned with the mission objective and to avoid mixing very different regional emission dynamics.
The notebook implementation follows a structured preprocessing pipeline:
- Load the dataset from summative/linear_regression/data/africa-co2-emissions.csv.
- Replace placeholder missing values (N/A, na, empty strings) with nulls.
- Filter records to Eastern Africa only.
- Convert selected columns to numeric types using safe coercion.
- Drop highly sparse columns (such as Fugitive Emissions) when needed.
- Fill remaining predictor missing values using country-wise median imputation.
- Apply a global median fallback for any predictor values still missing after country-level fill.
- Drop rows where the target is missing to keep model prediction valid.
This step ensures that the model receives clean, consistent numeric inputs and that downstream analysis is reproducible.
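The core of this pipeline can be sketched with pandas as follows. This is an illustration, not the notebook's actual code: the function name `clean_emissions`, the column names (`Sub-Region`, `Country`, `Population`, `Total CO2`), and the tiny demo frame are assumptions made for the example.

```python
import numpy as np
import pandas as pd

PLACEHOLDERS = ["N/A", "na", ""]


def clean_emissions(df, predictors, target,
                    region_col="Sub-Region", region="Eastern Africa",
                    country_col="Country"):
    """Hypothetical helper mirroring the preprocessing steps listed above."""
    df = df.replace(PLACEHOLDERS, np.nan)          # placeholders -> nulls
    df = df[df[region_col] == region].copy()       # Eastern Africa only
    numeric_cols = predictors + [target]
    # Safe coercion: values that cannot be parsed become NaN.
    df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")
    # Country-wise median imputation for predictors.
    country_median = df.groupby(country_col)[predictors].transform("median")
    df[predictors] = df[predictors].fillna(country_median)
    # Global median fallback for anything still missing.
    df[predictors] = df[predictors].fillna(df[predictors].median())
    # Rows without a target cannot be used.
    return df.dropna(subset=[target]).reset_index(drop=True)


# Illustrative data only (not taken from the real dataset).
raw = pd.DataFrame({
    "Country": ["Kenya", "Kenya", "Kenya", "Uganda", "Nigeria"],
    "Sub-Region": ["Eastern Africa"] * 4 + ["Western Africa"],
    "Population": ["10", "N/A", "30", "na", "5"],
    "Total CO2": ["1.0", "2.0", "", "4.0", "9.0"],
})
cleaned = clean_emissions(raw, ["Population"], "Total CO2")
```

In the demo, the missing Kenyan population is filled with Kenya's median, the Ugandan one falls back to the global median because Uganda has no observed value, and the row with an empty target is dropped.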
EDA was used to validate assumptions and guide feature decisions:
A. Correlation heatmap to inspect relationships between predictors and the target.
B. Histograms to understand the distribution and spread of key variables.
C. Scatter plots to inspect directional patterns and potential linear relationships.
These visualizations were not only descriptive; they directly informed which variables were likely to be useful and where multicollinearity/leakage risks might appear.
Feature engineering was implemented with explicit decision logic:
- Build an intermediate feature-engineering dataframe.
- Drop columns with redundancy or leakage risk.
- Create derived features (for example, population density) and test their usefulness.
- Rank numeric candidates by correlation strength to the target.
- Finalize a decomposition-aware set of predictors for training.
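The correlation-ranking step can be illustrated with a short pandas sketch. The function name `rank_by_target_correlation` and the toy columns are hypothetical; the notebook may compute the ranking differently.

```python
import pandas as pd


def rank_by_target_correlation(df, target):
    """Rank numeric columns by absolute Pearson correlation with the target."""
    numeric = df.select_dtypes(include="number")
    correlations = numeric.corr()[target].drop(target)
    return correlations.abs().sort_values(ascending=False)


# Toy data for illustration only.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 1, 4, 3, 5],
    "target": [2, 4, 6, 8, 10],
})
ranking = rank_by_target_correlation(toy, "target")
```

A ranking like this is only a screening aid: a predictor that correlates almost perfectly with the target may be a leakage candidate rather than a good feature.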
Final modeling features:
- Country
- Year
- Population
- Transportation (Mt)
- Manufacturing/Construction (Mt)
- Electricity/Heat (Mt)
- Building (Mt)
This made the model easier to interpret while reducing unstable feature overlap.
To prepare features for model training:
- Separate features (X) and target (y).
- Standardize numeric predictors with StandardScaler.
- One-hot encode the categorical Country feature.
- Split into training and testing sets using an 80/20 ratio.
This ensures numeric features are on comparable scales and categorical information is represented in a model-friendly way.
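One way to express these steps with scikit-learn is a `ColumnTransformer`, sketched below. Note the assumptions: this sketch splits first and fits the scaler and encoder on the training split only, which may differ from the order used in the notebook, and the function name, column names, and demo data are illustrative.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def split_and_transform(df, numeric_cols, categorical_cols, target, seed=42):
    """Hypothetical helper: 80/20 split, then scale and one-hot encode."""
    X = df[numeric_cols + categorical_cols]
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    preprocessor = ColumnTransformer(
        [
            ("num", StandardScaler(), numeric_cols),
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ],
        sparse_threshold=0.0,  # always return a dense array
    )
    X_train_t = preprocessor.fit_transform(X_train)  # fit on training data only
    X_test_t = preprocessor.transform(X_test)
    return X_train_t, X_test_t, y_train, y_test, preprocessor


# Illustrative data only.
demo = pd.DataFrame({
    "Country": ["Kenya", "Uganda"] * 5,
    "Year": list(range(2011, 2021)),
    "Population": [40.0, 35.0, 41.0, 36.0, 42.0, 37.0, 43.0, 38.0, 44.0, 39.0],
    "Total CO2": [10.0, 4.0, 10.5, 4.2, 11.0, 4.4, 11.5, 4.6, 12.0, 4.8],
})
X_train_t, X_test_t, y_train, y_test, preprocessor = split_and_transform(
    demo, ["Year", "Population"], ["Country"], "Total CO2"
)
```

Keeping the fitted `preprocessor` object around makes it possible to apply exactly the same scaling and encoding at inference time.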
Three regression models were trained on the same processed dataset:
- Linear Regression
- Decision Tree Regressor
- Random Forest Regressor
All three were trained and evaluated under the same split and preprocessing path for fair comparison.
Model comparison used Mean Squared Error (MSE) as the primary metric. The implementation includes:
- MSE calculation for each model.
- Ranking models by lowest test MSE.
- Additional quick prediction checks (actual vs predicted samples).
- Train-vs-test loss visualization for comparison across models.
The best-performing model (lowest loss) was selected automatically from the evaluation table.
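The comparison and automatic selection described above can be sketched as follows. The function name `compare_models`, the hyperparameters, and the synthetic data are assumptions for illustration; they are not taken from the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor


def compare_models(X_train, X_test, y_train, y_test, seed=42):
    """Train three regressors on the same split and pick the lowest test MSE."""
    models = {
        "Linear Regression": LinearRegression(),
        "Decision Tree": DecisionTreeRegressor(random_state=seed),
        "Random Forest": RandomForestRegressor(n_estimators=50, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = {
            "train_mse": mean_squared_error(y_train, model.predict(X_train)),
            "test_mse": mean_squared_error(y_test, model.predict(X_test)),
        }
    best_name = min(scores, key=lambda name: scores[name]["test_mse"])
    return scores, best_name, models[best_name]


# Synthetic, noise-free linear data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 3.0
scores, best_name, best_model = compare_models(X[:80], X[80:], y[:80], y[80:])
```

Recording both train and test MSE per model is what makes the train-vs-test loss comparison possible: a large gap between the two suggests overfitting.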
A scatter plot was implemented to show the fitted linear relationship after training the Linear Regression model. For clarity, the fitted line is visualized against a chosen feature slice (Transportation (Mt)) while other features are held at baseline values. This provides a readable 2D interpretation of a multivariate model.
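A feature slice of this kind can be computed as sketched below. It assumes the baseline for the other features is their column mean, which the text does not specify, and the function name `fitted_line_slice` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fitted_line_slice(model, X, feature_index, num_points=50):
    """Predict along one feature while holding the others at their mean."""
    baseline = X.mean(axis=0)  # assumption: baseline = column means
    grid = np.linspace(X[:, feature_index].min(),
                       X[:, feature_index].max(), num_points)
    X_slice = np.tile(baseline, (num_points, 1))
    X_slice[:, feature_index] = grid
    return grid, model.predict(X_slice)


# Synthetic data for illustration only.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(60, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=60)
line_model = LinearRegression().fit(X_demo, y_demo)
grid, line = fitted_line_slice(line_model, X_demo, feature_index=0)
```

The returned `grid` and `line` can be drawn over a scatter of the chosen feature against the target, for example with matplotlib's `plot` and `scatter`. For a linear model, the slope of this line equals the fitted coefficient of the chosen feature.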
After selection, the best model is saved to:
summative/linear_regression/final_model/best_linear_regression_model.joblib
summative/linear_regression/final_model/fastapi_model_artifacts.joblib
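Persisting and reloading these two files with joblib might look like the sketch below. The file names come from the repository layout, but the dictionary layout of the artifacts file (`scaler`, `training_columns`), the helper names, and the demo model are assumptions.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

MODEL_FILE = "best_linear_regression_model.joblib"
ARTIFACTS_FILE = "fastapi_model_artifacts.joblib"


def save_model_and_artifacts(model, scaler, training_columns, directory):
    """Write the model and its preprocessing artifacts to `directory`."""
    os.makedirs(directory, exist_ok=True)
    joblib.dump(model, os.path.join(directory, MODEL_FILE))
    joblib.dump(
        {"scaler": scaler, "training_columns": list(training_columns)},
        os.path.join(directory, ARTIFACTS_FILE),
    )


def load_model_and_artifacts(directory):
    """Read back the model and the artifacts dictionary."""
    model = joblib.load(os.path.join(directory, MODEL_FILE))
    artifacts = joblib.load(os.path.join(directory, ARTIFACTS_FILE))
    return model, artifacts


# Round trip on a toy model, written to a temporary directory.
X_fit = np.array([[1.0], [2.0], [3.0]])
y_fit = np.array([2.0, 4.0, 6.0])
scaler = StandardScaler().fit(X_fit)
model = LinearRegression().fit(scaler.transform(X_fit), y_fit)
with tempfile.TemporaryDirectory() as tmp_dir:
    save_model_and_artifacts(model, scaler, ["Population"], tmp_dir)
    loaded_model, loaded_artifacts = load_model_and_artifacts(tmp_dir)
```

Storing the scaler and column order next to the model is what lets the API reproduce the training-time preprocessing exactly.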
To support backend integration, a standalone inference module was built:
summative/API/prediction.py
The module:
- Loads the saved model.
- Reuses notebook-equivalent preprocessing for training-frame construction.
- Saves and reloads preprocessing artifacts (scaler and training columns) so inference remains consistent with training.
- Validates typed input payloads.
- Supports both single and batch predictions.
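Keeping inference consistent with training mostly comes down to rebuilding the same column layout for incoming rows. A minimal sketch, assuming one-hot columns named like `Country_Kenya` and a hypothetical helper `align_to_training_columns`:

```python
import pandas as pd


def align_to_training_columns(rows, training_columns, categorical_col="Country"):
    """One-hot encode incoming rows and match the training column order.

    Columns seen in training but absent from the request are filled with 0;
    columns not seen in training are dropped.
    """
    frame = pd.DataFrame(rows)
    encoded = pd.get_dummies(frame, columns=[categorical_col])
    return encoded.reindex(columns=training_columns, fill_value=0)


# Illustrative column list; the real one is stored with the model artifacts.
training_columns = ["Year", "Population", "Country_Kenya", "Country_Uganda"]
aligned = align_to_training_columns(
    [{"Country": "Kenya", "Year": 2020, "Population": 53771300}],
    training_columns,
)
```

The same function handles single and batch predictions, since a single request is just a list with one row.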
The API service in summative/API/app.py:
- Exposes /predict, /predict/batch, and /retrain endpoints.
- Uses Pydantic request schemas for validation.
- Delegates model/preprocessing logic to prediction.py.
- Supports CORS configuration through the ALLOWED_ORIGINS environment variable.
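How `ALLOWED_ORIGINS` is parsed is not specified here; a plausible sketch, assuming a comma-separated list that falls back to allowing all origins when unset or empty, is shown below. The function name `parse_allowed_origins` and the fallback behavior are assumptions.

```python
import os


def parse_allowed_origins(raw=None):
    """Turn a comma-separated ALLOWED_ORIGINS value into a list of origins.

    Assumption: an unset or empty variable means "allow all origins".
    """
    if raw is None:
        raw = os.environ.get("ALLOWED_ORIGINS", "*")
    origins = [origin.strip() for origin in raw.split(",") if origin.strip()]
    return origins or ["*"]


example_origins = parse_allowed_origins("https://a.example, https://b.example")
```

In a FastAPI app, a list like this is typically passed as the `allow_origins` argument when registering `CORSMiddleware`.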
A Flutter app was built to provide a user-friendly interface for interacting with the prediction API.
The app is located at summative/FlutterApp/east_africa_co2_prediction_mobile_app and connects to the deployed API at https://linear-regression-model-wk1e.onrender.com.
The app is structured as a single prediction page with two modes:
- Single Entry mode: the user fills in 7 input fields (Country, Year, Population, Transportation, Manufacturing/Construction, Electricity/Heat, Building) and submits to POST /predict. The predicted CO₂ emission value is displayed in a result box below the form.
- Multiple Entries mode: the user adds as many entry cards as needed, each with the same 7 fields, and submits all at once to POST /predict/batch. Results are displayed as expandable cards (expanded by default), one per entry, each showing the predicted emission value.
Key implementation details:
- All fields are validated before submission — required check, numeric type check, and non-negative value check.
- The Predict button is disabled and shows a loading spinner while a request is in flight.
- API errors and network failures are caught and displayed in a dedicated red-bordered error box.
- A 30-second request timeout is applied to handle Render free-tier cold starts.
- The UI uses a dark blue (#0A1E3C) background with yellow (#FFC107) accents throughout.
- State is managed using setState within a single StatefulWidget, keeping the implementation simple and self-contained.
For Flutter setup and run instructions, see Flutter Setup Instructions.




