This project applies machine learning techniques to predict the dominant forest cover type of 30m × 30m land cells within the Roosevelt National Forest in northern Colorado. Using cartographic variables derived from USGS and US Forest Service datasets, the project compares multiple supervised learning algorithms for large-scale environmental classification and identifies the most effective model for forest cover prediction.
- Dataset: UCI Forest Covertype Dataset
- Original Size: 581,012 instances
- Features: 54 cartographic variables (10 quantitative, 4 binary wilderness-area indicators, 40 binary soil-type indicators)
- Target Classes: 7 forest cover types
- Project Sample: 50,000 randomly selected records
- Source: https://archive.ics.uci.edu/ml/datasets/covertype
Accurate forest cover classification is important for:
- Environmental monitoring
- Forest resource management
- Biodiversity conservation
- Land-use planning
Traditional mapping techniques are expensive and time-consuming. This project explores how machine learning can automate and improve forest cover prediction.
The project uses the following tools:
- Python
- Pandas
- NumPy
- Scikit-Learn
- Matplotlib
- Seaborn
- Jupyter Notebook
The workflow proceeds through these steps:
- Data Collection
- Data Preprocessing
- Random Sampling (50,000 records)
- Feature Scaling
- Correlation-Based Feature Selection
- Train-Test Split (80/20)
- Model Training
- Model Evaluation
- Performance Comparison
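The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the notebook's actual code: the column names (`feature_0` ... `feature_7`, `cover_type`), the 0.9 correlation threshold, the reduced sample size, and the stratified split are all assumptions made for the example. The scaler is fitted after the split here to avoid leaking test-set statistics; the project may order these steps differently.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-in for the Covertype table (hypothetical column names).
n_rows, n_features = 5_000, 8
X_full = pd.DataFrame(
    rng.normal(size=(n_rows, n_features)),
    columns=[f"feature_{i}" for i in range(n_features)],
)
# Make one column nearly redundant so the correlation filter has something to drop.
X_full["feature_7"] = 0.99 * X_full["feature_0"] + rng.normal(scale=0.01, size=n_rows)
y_full = pd.Series(rng.integers(1, 8, size=n_rows), name="cover_type")  # classes 1..7

# 1. Random sampling (the project draws 50,000 rows; 1,000 here to keep it fast).
sample_idx = X_full.sample(n=1_000, random_state=42).index
X, y = X_full.loc[sample_idx], y_full.loc[sample_idx]

# 2. Correlation-based feature selection: for each highly correlated pair,
#    drop the later column (threshold 0.9 is an assumed value).
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
X = X.drop(columns=to_drop)

# 3. 80/20 train-test split, stratified so class proportions are preserved.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 4. Feature scaling: fit on the training split only, then apply to both.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("Dropped:", to_drop)
print("Train shape:", X_train_scaled.shape, "Test shape:", X_test_scaled.shape)
```

Because Pearson correlation is unaffected by standardization, running the correlation filter before or after scaling selects the same columns.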
Five classifiers were trained and compared:
- Logistic Regression
- K-Nearest Neighbors (KNN)
- Decision Tree
- Support Vector Machine (SVM)
- Random Forest
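A comparison of these five model families might look like the sketch below. It uses a small synthetic 7-class dataset from `make_classification` rather than the Covertype sample, and every hyperparameter (`max_iter`, `n_neighbors`, `n_estimators`, the RBF kernel) is an assumed default for illustration, not the project's tuned configuration. Accuracies printed by this sketch will not match the project's reported results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic 7-class problem standing in for the forest cover data.
X, y = make_classification(
    n_samples=2_000,
    n_features=20,
    n_informative=10,
    n_classes=7,
    n_clusters_per_class=1,
    random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale-sensitive models get a StandardScaler in front; tree-based ones do not need it.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model on the training split and score it on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:20s} {acc:.3f}")
```

Wrapping the scaler and estimator in one pipeline keeps the scaling statistics tied to the training data for each model.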
Each model was evaluated on the following metrics:
- Accuracy
- Precision
- Recall
- F1-Score
- G-Mean
- False Positive Rate (FPR)
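G-Mean and FPR are not built into Scikit-Learn's standard classification report, so they are typically derived from the confusion matrix. The sketch below shows one common multi-class formulation, assuming a one-vs-rest view of each class: G-Mean is taken as the geometric mean of the per-class recalls, and FPR is computed per class as FP / (FP + TN). The project may use a different definition; the helper names `per_class_rates` and `g_mean` and the small confusion matrix are hypothetical.

```python
import math


def per_class_rates(cm):
    """Return (recalls, fprs) for a square confusion matrix.

    Rows are true classes, columns are predicted classes.
    Each class is treated one-vs-rest.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    recalls, fprs = [], []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                         # true k, predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp    # another class, predicted as k
        tn = total - tp - fn - fp
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
        fprs.append(fp / (fp + tn) if fp + tn else 0.0)
    return recalls, fprs


def g_mean(recalls):
    """Geometric mean of per-class recalls (assumed multi-class definition)."""
    return math.prod(recalls) ** (1 / len(recalls))


# Illustrative 3-class confusion matrix (made-up counts, not project results).
cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
recalls, fprs = per_class_rates(cm)
print("Recall per class:", recalls)
print("FPR per class:   ", fprs)
print("G-Mean:          ", round(g_mean(recalls), 3))
```

Under this definition a model that completely misses any single class gets a G-Mean of zero, which is why the metric is useful alongside accuracy on imbalanced data.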
| Model | Accuracy |
|---|---|
| Logistic Regression | 72% |
| KNN | 82% |
| Decision Tree | 81% |
| SVM | 77% |
| Random Forest | 87% |
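Random Forest achieved the highest accuracy (87%), five points ahead of KNN (82%) and six ahead of a single Decision Tree (81%), while Logistic Regression scored lowest (72%). The result suggests that an ensemble of trees handles the 54-feature environmental data better than the other models tested.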
The repository is organized as follows:
forest-cover-classification/
│
├── notebooks/
│ └── ml_project.ipynb
│
├── reports/
│ └── Forest_Cover_Classification.pdf
│
├── images/
│ ├── correlation_matrix.png
│ ├── confusion_matrix.png
│ └── feature_importance.png
│
├── requirements.txt
├── README.md
└── LICENSE
Concepts applied in this project:
- Feature engineering and feature selection
- Multi-class classification
- Model comparison and benchmarking
- Evaluation metrics beyond accuracy
- Ensemble learning using Random Forest
Planned improvements:
- Train on the complete 581K-instance dataset
- Implement XGBoost and LightGBM
- Address class imbalance using SMOTE
- Deploy the model using Flask/FastAPI
- Create a web application for real-time predictions



